Deployment Configuration
Configuration Options
Option | Description |
---|---|
Version Tag | A tag identifying this deployment. Useful for model versioning. |
Region | Region where the deployment is hosted (e.g. us or asia-east1). |
Instance Type | An identifier describing a fixed configuration of compute resources to be allocated to the deployment. |
Number of Replicas | Number of instances to be spun up for the deployment. |
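These options are set when creating or editing a deployment. As a rough illustration only, the snippet below sketches how the four options might be supplied in an API request; the endpoint URL, field names, and authentication header are hypothetical placeholders, not the actual deployment API.

```python
import requests

# Hypothetical endpoint and field names, for illustration only.
# Substitute the actual deployment API and schema for your project.
API_URL = "https://api.example.com/v1/deployments"

deployment_config = {
    "versionTag": "yolov8n-v1.2.0",    # Version Tag: identifies this deployment
    "region": "asia-east1",            # Region: where the deployment is hosted
    "instanceType": "x1cpu-standard",  # Instance Type: 6 vCPUs, 26624 MiB RAM
    "numReplicas": 2,                  # Number of Replicas: instances to spin up
}

response = requests.post(
    API_URL,
    json=deployment_config,
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},  # placeholder credential
    timeout=30,
)
response.raise_for_status()
print(response.json())
```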
Instance Types
Deployment resources can be allocated through predefined instance types. These instance types are categorized based on model compute requirements and inference request demands. Larger models may also require GPUs to be provisioned for the deployment.
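As a rough starting point, the heuristic below maps an approximate model size and expected traffic level to one of the instance type identifiers listed in the tables that follow. The thresholds are illustrative assumptions only and should be validated against your own latency and throughput benchmarks.

```python
def suggest_instance_type(model_size_mb: float, requests_per_second: float) -> str:
    """Very rough heuristic for picking an instance type.

    The thresholds below are illustrative assumptions, not official
    sizing guidance -- benchmark your own model before committing.
    """
    if model_size_mb < 100 and requests_per_second < 5:
        # Lightweight models (e.g. smaller-resolution YOLOv8 variants)
        # with low traffic can usually run on CPU-only instances.
        return "x1cpu-standard"
    if model_size_mb < 500 and requests_per_second < 20:
        # Larger CPU instances, or x2cpu if AMX acceleration helps.
        return "x2cpu-large"
    if requests_per_second < 100:
        # Compute-heavy models or moderate traffic: start with a single GPU.
        return "t4-standard-1g"
    # High traffic or multi-model deployments: scale up to larger GPU instances.
    return "l4-large-2g"
```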
CPU Configuration
CPU-only instances are typically used for testing purposes, in low inference traffic environments, or to save costs. They are recommended if the model to be deployed is lightweight, such as an image classification model or a smaller-resolution variant of an object detection model like YOLOv8.
x1cpu Platform
The x1cpu platform family offers mid-to-high performance CPU instances suitable for low-traffic deployments with larger models.
CPU Type: AMD EPYC Milan
Identifier | vCPUs | RAM (MiB) |
---|---|---|
x1cpu-micro | 2 | 10240 |
x1cpu-standard | 6 | 26624 |
x1cpu-large | 14 | 59392 |
x1cpu-extreme | 30 | 124928 |
x1cpu-ultra | 54 | 223232 |
x2cpu Platform
This platform family is only available in the US region. Please contact us if you wish to utilize this platform in other regions.
The x2cpu platform offers high performance CPU instances suitable for low-to-medium traffic deployments with large models.
CPU Type: Intel Sapphire Rapids, with Intel AMX extensions
Identifier | vCPUs | RAM (MiB) |
---|---|---|
x2cpu-micro | 2 | 10240 |
x2cpu-standard | 6 | 26624 |
x2cpu-large | 20 | 83968 |
x2cpu-extreme | 42 | 174080 |
x2cpu-ultra | 86 | 354304 |
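To confirm that the AMX extensions are actually exposed inside a running x2cpu deployment (for example, before enabling an AMX-optimized inference backend), a minimal check of the Linux CPU flags might look like the following. The flag names shown are the standard /proc/cpuinfo flags for AMX; whether your runtime image exposes /proc/cpuinfo is an assumption.

```python
def has_amx() -> bool:
    """Check /proc/cpuinfo for the Intel AMX CPU flags (Linux only)."""
    amx_flags = {"amx_tile", "amx_bf16", "amx_int8"}
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    cpu_flags = set(line.split(":", 1)[1].split())
                    return amx_flags.issubset(cpu_flags)
    except OSError:
        pass  # Not Linux, or /proc is unavailable in this environment
    return False

if __name__ == "__main__":
    print("AMX available:", has_amx())
```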
GPU Configuration
GPU instances are used in high inference traffic environments or for multi-model deployments. These are also recommended for more compute-demanding models such as instance segmentation and semantic segmentation models (SegFormer, Mask2Former, etc.).
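Once a GPU instance is up, it can be useful to verify that the GPUs provisioned for the chosen instance type are visible to the inference runtime. A minimal sketch, assuming PyTorch is installed in the deployment image:

```python
import torch

# Sanity-check that the GPUs provisioned for the instance type are visible.
gpu_count = torch.cuda.device_count()
print(f"Visible GPUs: {gpu_count}")

for i in range(gpu_count):
    props = torch.cuda.get_device_properties(i)
    total_gib = props.total_memory / (1024 ** 3)
    print(f"  GPU {i}: {props.name}, {total_gib:.1f} GiB")
```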
NVIDIA Tesla T4 Platform
The T4 platform family offers medium performance GPU instances suitable for medium traffic deployments.
CPU Type: Intel Skylake, Broadwell, Haswell, Sandy Bridge, Ivy Bridge
Identifier | vCPUs | RAM (MiB) | GPUs |
---|---|---|---|
t4-standard-1g | 6 | 24576 | 1x T4 |
t4-large-2g | 14 | 55926 | 2x T4 |
t4-extreme-4g | 39 | 116736 | 4x T4 |
t4-ultra-4g | 62 | 239616 | 4x T4 |
NVIDIA L4 Platform
The L4 platform family offers medium-to-high performance GPU instances suitable for high traffic deployments.
CPU Type: Intel Cascade Lake
Identifier | vCPUs | RAM (MiB) | GPUs |
---|---|---|---|
l4-standard-1g | 6 | 26624 | 1x L4 |
l4-large-2g | 22 | 88064 | 2x L4 |
l4-extreme-4g | 46 | 174080 | 4x L4 |
l4-ultra-8g | 94 | 370688 | 8x L4 |
NVIDIA A100 Platform
This platform family is only available in the US region. Please contact us if you wish to utilize this platform in other regions.
The A100 platform family offers high performance GPU instances suitable for high traffic deployments.
CPU Type: Intel Cascade Lake
NVIDIA A100 (40GB)
Identifier | vCPUs | RAM (MiB) | GPUs |
---|---|---|---|
a140-standard-1g | 10 | 75776 | 1x A100 (40GB) |
a140-large-2g | 22 | 151552 | 2x A100 (40GB) |
a140-extreme-4g | 44 | 317440 | 4x A100 (40GB) |
a140-ultra-8g | 94 | 647168 | 8x A100 (40GB) |
a140-ultra-16g | 94 | 1294336 | 16x A100 (40GB) |
NVIDIA A100 (80GB)
Identifier | vCPUs | RAM (MiB) | GPUs |
---|---|---|---|
a180-standard-1g | 10 | 151552 | 1x A100 (80GB) |
a180-large-2g | 22 | 317440 | 2x A100 (80GB) |
a180-extreme-4g | 46 | 647168 | 4x A100 (80GB) |
a180-ultra-8g | 94 | 1294336 | 8x A100 (80GB) |
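When deciding between the 40GB and 80GB A100 variants, a back-of-the-envelope estimate of the model's GPU memory footprint can help. The calculation below is a simplified rule of thumb (weight memory multiplied by an assumed overhead factor for activations and runtime buffers), not a guarantee of fit.

```python
def estimate_gpu_memory_gib(num_params: float, bytes_per_param: int = 2,
                            overhead_factor: float = 1.5) -> float:
    """Rough per-GPU memory estimate for inference.

    bytes_per_param: 4 for FP32, 2 for FP16, 1 for INT8 weights.
    overhead_factor: assumed multiplier covering activations, batching, and
    framework buffers -- an illustrative assumption, not a measured value.
    """
    return num_params * bytes_per_param * overhead_factor / (1024 ** 3)

# Example: a 10-billion-parameter model served in FP16 comes out to roughly
# 28 GiB under this estimate, fitting a single 40GB A100; at FP32 (~56 GiB)
# the 80GB variant would be the safer choice.
print(round(estimate_gpu_memory_gib(10e9, bytes_per_param=2), 1))  # ~27.9
print(round(estimate_gpu_memory_gib(10e9, bytes_per_param=4), 1))  # ~55.9
```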
NVIDIA H100 SXM Platform
This platform family is only available in the US region, and upon request. Please contact us if you wish to use this platform.
The H100 SXM platform family offers high performance GPU instances suitable for high traffic deployments.
CPU Type: Intel Sapphire Rapids
Identifier | vCPUs | RAM (MiB) | GPUs |
---|---|---|---|
h100s-ultra-8g | 206 | 1837056 | 8x H100 |