Deployment Configuration

Configuration Options

| Option | Description |
| --- | --- |
| Version Tag | A tag to identify the current deployment. Useful for model versioning. |
| Region | Region where the deployment will be hosted (e.g. us or asia-east1). |
| Instance Type | An identifier describing a fixed configuration of compute resources to be allocated to the deployment. |
| Number of Replicas | Number of instances to be spun up for the deployment. |
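
For reference, the snippet below sketches how these options might be gathered together when scripting a deployment. The keys and values are illustrative only; they mirror the table above rather than a specific API, so the exact field names accepted by the platform may differ.

```python
# Illustrative only: keys mirror the configuration options above, values are examples.
deployment_config = {
    "version_tag": "v1.2.0",            # Version Tag: identifies this deployment for model versioning
    "region": "us",                     # Region: e.g. "us" or "asia-east1"
    "instance_type": "x1cpu-standard",  # Instance Type: predefined compute configuration (see below)
    "num_replicas": 2,                  # Number of Replicas: instances spun up for the deployment
}
```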

Instance Types

Deployment resources can be allocated through predefined instance types. These instance types are categorized based on model compute requirements and inference request demands. Larger models may also require GPUs to be provisioned for the deployment.
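
If you need to reason about instance types programmatically (for example, to pick the smallest one that satisfies a model's requirements), they can be modelled as simple records. The sketch below is not part of any SDK; the identifiers and figures are copied from the tables in this section, and the helper function is a hypothetical convenience.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class InstanceType:
    identifier: str
    vcpus: int
    ram_mib: int
    gpus: Optional[str] = None  # e.g. "1x T4"; None for CPU-only instances

# A few entries copied from the tables below; extend as needed.
INSTANCE_TYPES = [
    InstanceType("x1cpu-micro", 2, 10240),
    InstanceType("x1cpu-standard", 6, 26624),
    InstanceType("t4-standard-1g", 6, 24576, "1x T4"),
    InstanceType("l4-standard-1g", 6, 26624, "1x L4"),
]

def smallest_fitting(min_vcpus: int, min_ram_mib: int, need_gpu: bool = False) -> Optional[InstanceType]:
    """Return the smallest listed instance type that meets the requirements, or None."""
    candidates = [
        t for t in INSTANCE_TYPES
        if t.vcpus >= min_vcpus
        and t.ram_mib >= min_ram_mib
        and (t.gpus is not None or not need_gpu)
    ]
    return min(candidates, key=lambda t: (t.vcpus, t.ram_mib), default=None)
```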


CPU Configuration

CPU-only instances are typically used for testing, in low inference traffic environments, or to save costs. They are recommended if the model to be deployed is lightweight, such as an image classification model or a smaller-resolution variant of an object detection model like YOLOv8.

x1cpu Platform

The x1cpu platform family offers mid-to-high performance CPU instances suitable for low-traffic deployments with larger models.

CPU Type: AMD EPYC Milan

| Identifier | vCPUs | RAM (MiB) |
| --- | --- | --- |
| x1cpu-micro | 2 | 10240 |
| x1cpu-standard | 6 | 26624 |
| x1cpu-large | 14 | 59392 |
| x1cpu-extreme | 30 | 124928 |
| x1cpu-ultra | 54 | 223232 |

x2cpu Platform

📘 This platform family is only available in the US region. Please contact us if you wish to utilize this platform in other regions.

The x2cpu platform offers high performance CPU instances suitable for low-to-medium traffic deployments with large models.

CPU Type: Intel Sapphire Rapids, with Intel AMX extensions

| Identifier | vCPUs | RAM (MiB) |
| --- | --- | --- |
| x2cpu-micro | 2 | 10240 |
| x2cpu-standard | 6 | 26624 |
| x2cpu-large | 20 | 83968 |
| x2cpu-extreme | 42 | 174080 |
| x2cpu-ultra | 86 | 354304 |

GPU Configuration

GPU instances are used in high inference traffic environments or for multi-model deployments. They are also recommended for more compute-demanding models, such as instance segmentation and semantic segmentation models (SegFormer, Mask2Former, etc.).
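
Continuing the hypothetical sketch from the Instance Types section, selecting a GPU-backed instance type for such a model is a matter of filtering for entries with GPUs attached; again, this is illustrative helper code, not an SDK feature.

```python
# Reuses INSTANCE_TYPES and smallest_fitting() from the earlier sketch (illustrative only).
choice = smallest_fitting(min_vcpus=4, min_ram_mib=16384, need_gpu=True)
if choice is not None:
    print(choice.identifier)  # "t4-standard-1g" with the example entries above
```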

NVIDIA Tesla T4 Platform

The T4 platform family offers medium performance GPU instances suitable for medium traffic deployments.

CPU Type: Intel Skylake, Broadwell, Haswell, Sandy Bridge, Ivy Bridge

| Identifier | vCPUs | RAM (MiB) | GPUs |
| --- | --- | --- | --- |
| t4-standard-1g | 6 | 24576 | 1x T4 |
| t4-large-2g | 14 | 55926 | 2x T4 |
| t4-extreme-4g | 39 | 116736 | 4x T4 |
| t4-ultra-4g | 62 | 239616 | 4x T4 |

NVIDIA L4 Platform

The L4 platform family offers medium-to-high performance GPU instances suitable for high traffic deployments.

CPU Type: Intel Cascade Lake

| Identifier | vCPUs | RAM (MiB) | GPUs |
| --- | --- | --- | --- |
| l4-standard-1g | 6 | 26624 | 1x L4 |
| l4-large-2g | 22 | 88064 | 2x L4 |
| l4-extreme-4g | 46 | 174080 | 4x L4 |
| l4-ultra-4g | 94 | 370688 | 8x L4 |

NVIDIA A100 Platform

📘 This platform family is only available in the US region. Please contact us if you wish to utilize this platform in other regions.

The A100 platform family offers high performance GPU instances suitable for high traffic deployments.

CPU Type: Intel Cascade Lake

NVIDIA A100 (40GB)

| Identifier | vCPUs | RAM (MiB) | GPUs |
| --- | --- | --- | --- |
| a140-standard-1g | 10 | 75776 | 1x A100 (40GB) |
| a140-large-2g | 22 | 151552 | 2x A100 (40GB) |
| a140-extreme-4g | 44 | 317440 | 4x A100 (40GB) |
| a140-ultra-8g | 94 | 647168 | 8x A100 (40GB) |
| a140-ultra-16g | 94 | 1294336 | 16x A100 (40GB) |

NVIDIA A100 (80GB)

| Identifier | vCPUs | RAM (MiB) | GPUs |
| --- | --- | --- | --- |
| a180-standard-1g | 10 | 151522 | 1x A100 (80GB) |
| a180-large-2g | 22 | 317440 | 2x A100 (80GB) |
| a180-extreme-4g | 46 | 647168 | 4x A100 (80GB) |
| a180-ultra-8g | 94 | 1294336 | 8x A100 (80GB) |

NVIDIA H100 SXM Platform

📘 This platform family is only available in the US region, and upon request. Please contact us if you wish to use this platform.

The H100 SXM platform family offers high performance GPU instances suitable for high traffic deployments.

CPU Type: Intel Sapphire Rapids

| Identifier | vCPUs | RAM (MiB) | GPUs |
| --- | --- | --- | --- |
| h100s-ultra-8g | 206 | 1837056 | 8x H100 |