Deployment Configuration
Configuration Options
Option | Description |
---|---|
Version Tag | A tag identifying this deployment. Useful for model versioning. |
Region | Region where the deployment is hosted (e.g. us or asia-east1). |
Instance Type | An identifier describing a fixed configuration of compute resources to be allocated to the deployment. |
Number of Replicas | Number of instances to be spun up for the deployment. |
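These options are set when creating or editing a deployment. As a rough illustration only, the snippet below sketches how the four options might be supplied in an API request; the endpoint URL, field names, and authentication header are hypothetical placeholders, not the actual deployment API.

```python
import requests

# Hypothetical endpoint and field names, for illustration only.
# Substitute the actual deployment API and schema for your project.
API_URL = "https://api.example.com/v1/deployments"

deployment_config = {
    "versionTag": "yolov8n-v1.2.0",    # Version Tag: identifies this deployment
    "region": "asia-east1",            # Region: where the deployment is hosted
    "instanceType": "x1cpu-standard",  # Instance Type: 6 vCPUs, 26624 MiB RAM
    "numReplicas": 2,                  # Number of Replicas: instances to spin up
}

response = requests.post(
    API_URL,
    json=deployment_config,
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},  # placeholder credential
    timeout=30,
)
response.raise_for_status()
print(response.json())
```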
Instance Types
Deployment resources can be allocated through predefined instance types. These instance types are categorized based on model compute requirements and inference request demands. Larger models may also require GPUs to be provisioned for the deployment.
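As a rough starting point, the heuristic below maps an approximate model size and expected traffic level to one of the instance type identifiers listed in the tables that follow. The thresholds are illustrative assumptions only and should be validated against your own latency and throughput benchmarks.

```python
def suggest_instance_type(model_size_mb: float, requests_per_second: float) -> str:
    """Very rough heuristic for picking an instance type.

    The thresholds below are illustrative assumptions, not official
    sizing guidance -- benchmark your own model before committing.
    """
    if model_size_mb < 100 and requests_per_second < 5:
        # Lightweight models (e.g. smaller-resolution YOLOv8 variants)
        # with low traffic can usually run on CPU-only instances.
        return "x1cpu-standard"
    if model_size_mb < 500 and requests_per_second < 20:
        # Larger CPU instances, or x2cpu if AMX acceleration helps.
        return "x2cpu-large"
    if requests_per_second < 100:
        # Compute-heavy models or moderate traffic: start with a single GPU.
        return "t4-standard-1g"
    # High traffic or multi-model deployments: scale up to larger GPU instances.
    return "l4-large-2g"
```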
CPU Configuration
CPU-only instances are typically used for testing purposes, in low inference traffic environments, or to save costs. They are recommended if the model to be deployed is lightweight, such as an image classification model or a smaller-resolution variant of an object detection model like YOLOv8.
x1cpu Platform
The x1cpu platform family offers mid-to-high performance CPU instances suitable for low-traffic deployments with larger models.
CPU Type: AMD EPYC Milan
Identifier | vCPUs | RAM (MiB) |
---|---|---|
x1cpu-micro | 2 | 10240 |
x1cpu-standard | 6 | 26624 |
x1cpu-large | 14 | 59392 |
x1cpu-extreme | 30 | 124928 |
x1cpu-ultra | 54 | 223232 |
x2cpu Platform
This platform family is only available in the US region. Please contact us if you wish to utilize this platform in other regions.
The x2cpu platform offers high performance CPU instances suitable for low-to-medium traffic deployments with large models.
CPU Type: Intel Sapphire Rapids, with Intel AMX extensions
Identifier | vCPUs | RAM (MiB) |
---|---|---|
x2cpu-micro | 2 | 10240 |
x2cpu-standard | 6 | 26624 |
x2cpu-large | 20 | 83968 |
x2cpu-extreme | 42 | 174080 |
x2cpu-ultra | 86 | 354304 |
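To confirm that the AMX extensions are actually exposed inside a running x2cpu deployment (for example, before enabling an AMX-optimized inference backend), a minimal check of the Linux CPU flags might look like the following. The flag names shown are the standard /proc/cpuinfo flags for AMX; whether your runtime image exposes /proc/cpuinfo is an assumption.

```python
def has_amx() -> bool:
    """Check /proc/cpuinfo for the Intel AMX CPU flags (Linux only)."""
    amx_flags = {"amx_tile", "amx_bf16", "amx_int8"}
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    cpu_flags = set(line.split(":", 1)[1].split())
                    return amx_flags.issubset(cpu_flags)
    except OSError:
        pass  # Not Linux, or /proc is unavailable in this environment
    return False

if __name__ == "__main__":
    print("AMX available:", has_amx())
```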
GPU Configuration
GPU instances are used in high inference traffic environments or for multi-model deployments. These are also recommended for more compute-demanding models such as instance segmentation and semantic segmentation models (SegFormer, Mask2Former, etc.).
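Once a GPU instance is up, it can be useful to verify that the GPUs provisioned for the chosen instance type are visible to the inference runtime. A minimal sketch, assuming PyTorch is installed in the deployment image:

```python
import torch

# Sanity-check that the GPUs provisioned for the instance type are visible.
gpu_count = torch.cuda.device_count()
print(f"Visible GPUs: {gpu_count}")

for i in range(gpu_count):
    props = torch.cuda.get_device_properties(i)
    total_gib = props.total_memory / (1024 ** 3)
    print(f"  GPU {i}: {props.name}, {total_gib:.1f} GiB")
```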
NVIDIA Tesla T4 Platform
The T4 platform family offers medium performance GPU instances suitable for medium traffic deployments.
CPU Type: Intel Skylake, Broadwell, Haswell, Sandy Bridge, Ivy Bridge
Identifier | vCPUs | RAM (MiB) | GPUs |
---|---|---|---|
t4-standard-1g | 6 | 24576 | 1x T4 |
t4-large-2g | 14 | 55926 | 2x T4 |
t4-extreme-4g | 39 | 116736 | 4x T4 |
t4-ultra-4g | 62 | 239616 | 4x T4 |
NVIDIA L4 Platform
The L4 platform family offers medium-to-high performance GPU instances suitable for high traffic deployments.
CPU Type: Intel Cascade Lake
Identifier | vCPUs | RAM (MiB) | GPUs |
---|---|---|---|
l4-standard-1g | 6 | 26624 | 1x L4 |
l4-large-2g | 22 | 88064 | 2x L4 |
l4-extreme-4g | 46 | 174080 | 4x L4 |
l4-ultra-8g | 94 | 370688 | 8x L4 |
NVIDIA A100 Platform
This platform family is only available in the US region. Please contact us if you wish to utilize this platform in other regions.
The A100 platform family offers high performance GPU instances suitable for high traffic deployments.
CPU Type: Intel Cascade Lake
NVIDIA A100 (40GB)
Identifier | vCPUs | RAM (MiB) | GPUs |
---|---|---|---|
a140-standard-1g | 10 | 75776 | 1x A100 (40GB) |
a140-large-2g | 22 | 151552 | 2x A100 (40GB) |
a140-extreme-4g | 44 | 317440 | 4x A100 (40GB) |
a140-ultra-8g | 94 | 647168 | 8x A100 (40GB) |
a140-ultra-16g | 94 | 1294336 | 16x A100 (40GB) |
NVIDIA A100 (80GB)
Identifier | vCPUs | RAM (MiB) | GPUs |
---|---|---|---|
a180-standard-1g | 10 | 151552 | 1x A100 (80GB) |
a180-large-2g | 22 | 317440 | 2x A100 (80GB) |
a180-extreme-4g | 46 | 647168 | 4x A100 (80GB) |
a180-ultra-8g | 94 | 1294336 | 8x A100 (80GB) |
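When deciding between the 40GB and 80GB A100 variants, a back-of-the-envelope estimate of the model's GPU memory footprint can help. The calculation below is a simplified rule of thumb (weight memory multiplied by an assumed overhead factor for activations and runtime buffers), not a guarantee of fit.

```python
def estimate_gpu_memory_gib(num_params: float, bytes_per_param: int = 2,
                            overhead_factor: float = 1.5) -> float:
    """Rough per-GPU memory estimate for inference.

    bytes_per_param: 4 for FP32, 2 for FP16, 1 for INT8 weights.
    overhead_factor: assumed multiplier covering activations, batching, and
    framework buffers -- an illustrative assumption, not a measured value.
    """
    return num_params * bytes_per_param * overhead_factor / (1024 ** 3)

# Example: a 10-billion-parameter model served in FP16 comes out to roughly
# 28 GiB under this estimate, fitting a single 40GB A100; at FP32 (~56 GiB)
# the 80GB variant would be the safer choice.
print(round(estimate_gpu_memory_gib(10e9, bytes_per_param=2), 1))  # ~27.9
print(round(estimate_gpu_memory_gib(10e9, bytes_per_param=4), 1))  # ~55.9
```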
NVIDIA H100 SXM Platform
This platform family is only available in the US region, and upon request. Please contact us if you wish to use this platform.
The H100 SXM platform family offers high performance GPU instances suitable for high traffic deployments.
CPU Type: Intel Sapphire Rapids
Identifier | vCPUs | RAM (MiB) | GPUs |
---|---|---|---|
h100s-ultra-8g | 206 | 1837056 | 8x H100 |