Model Training Management

list

Lists all training runs regardless of status.

Return

A list of msgspec structs containing the training run metadata with the following structure:

[Run(
    id='run_610ba26c-6fbb-42eb-b838-839c61b68b26',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_63bbd3bf8a78eb906f417396',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_T4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_ALWAYS_SAVE_LATEST',
            evaluation_interval=250,
            metric=None
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    features=RunFeatures(preview=True, matrix=True, using_sliding_window=False),
    status=RunStatus(
        overview='Creating',
        message='Creating service.',
        update_data=1705466384796
    ),
    create_date=1705466384796,
    update_date=1705466392067,
    log_ids=['runlog_UjYxMGJhMjZjLTZmYmItNDJlYi1iODM4LTgzOWM2MWI2OGIyNg']
)]
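The create_date and update_date values in the sample look like Unix epoch timestamps in milliseconds. That is an assumption based on their magnitude, not something this reference states. Under that assumption, a minimal sketch for converting them to readable UTC datetimes (millis_to_datetime is a hypothetical helper, not part of the SDK):

```python
from datetime import datetime, timezone


def millis_to_datetime(epoch_ms: int) -> datetime:
    """Convert an epoch timestamp in milliseconds to a timezone-aware UTC datetime.

    Assumption: the SDK reports timestamps as milliseconds since the Unix epoch.
    """
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


# Values copied from the sample output above.
created = millis_to_datetime(1705466384796)
updated = millis_to_datetime(1705466392067)

print(created.isoformat())
print((updated - created).total_seconds(), "seconds between creation and last update")
```

With a real run, the same helper could be applied to run.create_date and run.update_date.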

Examples

  • List all trainings:
from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

project.runs.list()
  • List the latest created training:
from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

# Runs are msgspec structs, so fields are read as attributes.
latest_training = max(project.runs.list(), key=lambda run: run.create_date)

get

Retrieves a specific training run using the run ID.

Parameters

Name    Type  Description
run_id  str   The ID of the training run.

Return

A msgspec struct containing the specific training run metadata with the following structure:

Run(
    id='run_610ba26c-6fbb-42eb-b838-839c61b68b26',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_63bbd3bf8a78eb906f417396',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_T4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_ALWAYS_SAVE_LATEST',
            evaluation_interval=250,
            metric=None
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    features=RunFeatures(preview=True, matrix=True, using_sliding_window=False),
    status=RunStatus(
        overview='Creating',
        message='Creating service.',
        update_data=1705466384796
    ),
    create_date=1705466384796,
    update_date=1705466392067,
    log_ids=['runlog_UjYxMGJhMjZjLTZmYmItNDJlYi1iODM4LTgzOWM2MWI2OGIyNg']
)

Examples

from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

project.runs.get("run_610ba26c-6fbb-42eb-b838-839c61b68b26")

kill

Kills a specific training run using the run ID.

Warning: Killed trainings cannot be resumed, and all progress of a killed training is lost. To train again, you must start a new training run from scratch. This action cannot be undone, so kill a training only when you are absolutely sure.

Parameters

Name    Type  Description
run_id  str   The ID of the training run.

Return

A msgspec struct containing the killed training metadata with the following structure:

Run(
    id='run_e2a14cee-eacc-4335-bc95-94c3ee196b04',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_64e812a7e47592ef374cbbc2',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_L4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_LOWEST_VALIDATION_LOSS',
            evaluation_interval=220,
            metric='Loss/total_loss'
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    status=RunStatus(
        overview='Cancelled',
        message='Training cancelled.',
        update_data=1700204316180
    ),
    create_date=1701927649302,
    update_date=1701927649302,
    features=RunFeatures(preview=True, matrix=True),
    log_ids=['runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA']
)

Examples

  • Kill a specific training by run ID:
from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

project.runs.kill("run_63eb212ff0f856bf95085095")
  • Kill the latest created training:
from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

# Runs are msgspec structs, so fields are read as attributes.
latest_training_id = max(project.runs.list(),
                         key=lambda run: run.create_date).id
project.runs.kill(latest_training_id)
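Because a kill cannot be undone, it may be worth checking a run's status before calling kill. The sketch below is one possible guard, not part of the SDK: kill_if_active and FakeRuns are hypothetical names, and the set of stopped states contains only 'Cancelled' because that is the only such value shown in this reference. Extend it with whatever statuses your runs actually report.

```python
from types import SimpleNamespace


def kill_if_active(runs, run_id, stopped_states=frozenset({"Cancelled"})):
    """Kill a run only if its status is not already a stopped state.

    `runs` is expected to behave like `project.runs` (it needs `get` and `kill`).
    Returns the killed run, or None when the run had already stopped.
    """
    run = runs.get(run_id)
    if run.status.overview in stopped_states:
        return None
    return runs.kill(run_id)


class FakeRuns:
    """Stand-in for `project.runs` so the sketch runs without network access."""

    def __init__(self, overview):
        self._overview = overview
        self.killed = []  # records which run IDs were killed

    def get(self, run_id):
        status = SimpleNamespace(overview=self._overview)
        return SimpleNamespace(id=run_id, status=status)

    def kill(self, run_id):
        self.killed.append(run_id)
        status = SimpleNamespace(overview="Cancelled")
        return SimpleNamespace(id=run_id, status=status)


active = FakeRuns("Creating")
stopped = FakeRuns("Cancelled")

result_active = kill_if_active(active, "run_a")     # run is killed
result_stopped = kill_if_active(stopped, "run_b")   # nothing happens
```

In real use, the first argument would be project.runs rather than the stand-in.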

start

Starts a new training run from a specific workflow using the flow ID.

Parameters

Name     Type    Description
flow_id  str     The ID of the workflow.
setup    object  The metadata of the training.

Return

A msgspec struct containing the newly-initialized training run metadata with the following structure:

Run(
    id='run_e2a14cee-eacc-4335-bc95-94c3ee196b04',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_64e812a7e47592ef374cbbc2',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_L4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_LOWEST_VALIDATION_LOSS',
            evaluation_interval=220,
            metric='Loss/total_loss'
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    status=RunStatus(
        overview='Creating',
        message='Training starting.',
        update_data=1700204316180
    ),
    create_date=1701927649302,
    update_date=1701927649302,
    features=RunFeatures(preview=True, matrix=True),
    log_ids=['runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA']
)

Examples

  • Start training with a specific workflow ID:
from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

project.runs.start("flow_63d0f2d5fb1f9189db9b1c4b", {
    "accelerator": {
        "name": "GPU_T4",
        "count": 1
    },
    "checkpoint": {
        "strategy": "STRAT_ALWAYS_SAVE_LATEST",
        "evaluation_interval": 250
    },
    "limit": {
        "metric": "LIM_NONE",
        "value": 0
    },
    "preview": True,
    "matrix": True
})
  • Start a training for the workflow named "My Awesome Workflow":
from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

flow_id = [
    flow for flow in project.workflows.list()
    if flow.title == "My Awesome Workflow"
][0].id

project.runs.start(flow_id, {
    "accelerator": {
        "name": "GPU_T4",
        "count": 1
    },
    "checkpoint": {
        "strategy": "STRAT_ALWAYS_SAVE_LATEST",
        "evaluation_interval": 200
    },
    "limit": {
        "metric": "LIM_NONE",
        "value": 0
    },
    "preview": True,
    "matrix": True
})
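Both examples above pass nearly identical setup dictionaries. A small builder can reduce the repetition and catch obviously invalid values before a request is sent. This is only a sketch: build_setup is a hypothetical helper, the key names and default values are copied from the examples above, and the validation rules (positive count and interval) are assumptions rather than documented constraints.

```python
def build_setup(accelerator="GPU_T4", count=1, evaluation_interval=250,
                preview=True, matrix=True):
    """Build a setup dictionary shaped like the ones in the examples above."""
    # Assumed sanity checks; the SDK may enforce different rules.
    if count < 1:
        raise ValueError("count must be at least 1")
    if evaluation_interval < 1:
        raise ValueError("evaluation_interval must be at least 1")

    return {
        "accelerator": {"name": accelerator, "count": count},
        "checkpoint": {
            "strategy": "STRAT_ALWAYS_SAVE_LATEST",
            "evaluation_interval": evaluation_interval,
        },
        "limit": {"metric": "LIM_NONE", "value": 0},
        "preview": preview,
        "matrix": matrix,
    }


setup = build_setup(evaluation_interval=200)
print(setup)
```

The result could then be passed as the second argument, for example project.runs.start(flow_id, setup).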

get_logs

Retrieves a specific training log using the log ID.

Parameters

Name    Type  Description
log_id  str   The ID of the training log.

Return

A msgspec struct with the specific training log metadata with the following structure:

RunLogs(
    id='runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA',
    logs=[
        {
            'ev': 'trainingCheckpoint',
            't': 1700200466147,
            'pl': {
                'step': 0,
                'log': 'Step 0, totalLoss: 6.4055E+01, boxLoss: 5.3928E+00, classificationLoss: 5.2646E+01, distributedFocalLoss: 6.0163E+00.',
                'totalLoss': 64.055,
                'boxLoss': 5.3928,
                'classificationLoss': 52.646,
                'distributedFocalLoss': 6.0163
            }
        }
    ]
)
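The log entries in the sample are plain dictionaries with an event name (ev), a timestamp (t), and a payload (pl). Assuming every checkpoint entry follows that shape, a loss curve can be pulled out with a few lines. loss_curve is a hypothetical helper, and the sample data below is copied from the structure above rather than fetched from the API.

```python
def loss_curve(logs, event="trainingCheckpoint", key="totalLoss"):
    """Return (step, value) pairs for one metric, sorted by step.

    Entries with a different event name, or without the metric, are skipped.
    """
    points = []
    for entry in logs:
        payload = entry.get("pl", {})
        if entry.get("ev") == event and key in payload and "step" in payload:
            points.append((payload["step"], payload[key]))
    return sorted(points)


# Sample entry copied from the structure shown above (log text omitted).
sample_logs = [
    {
        "ev": "trainingCheckpoint",
        "t": 1700200466147,
        "pl": {
            "step": 0,
            "totalLoss": 64.055,
            "boxLoss": 5.3928,
            "classificationLoss": 52.646,
            "distributedFocalLoss": 6.0163,
        },
    }
]

total = loss_curve(sample_logs)
box = loss_curve(sample_logs, key="boxLoss")
print(total, box)
```

With a real run, the logs field of the struct returned by get_logs would take the place of sample_logs.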

Examples

  • Retrieve training logs by specific log ID:
from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

project.runs.get_logs("runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA")
  • Retrieve training logs for the workflow named "My Awesome Workflow":
from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

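# Find the workflow by title. Structs expose fields as attributes.
flow_id = [
    flow for flow in project.workflows.list()
    if flow.title == "My Awesome Workflow"
][0].id

# Take the first log ID of the first run started from that workflow.
log_id = [
    run for run in project.runs.list()
    if run.flow_id == flow_id
][0].log_ids[0]

project.runs.get_logs(log_id)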

get_confusion_matrix

Retrieves a training confusion matrix using the run ID.

Parameters

Name    Type  Description
run_id  str   The ID of the training run.

Return

An object containing the training confusion matrix as a JSON string, with the following structure:

RunConfusionMatrix(
    "object" = "confusionMatrix",
    "data" = "{"0":[{"id":"RBC","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"WBC","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"Platelets","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"boat","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"Background","data":[{"x":"RBC","y":302},{"x":"WBC","y":27},{"x":"Platelets","y":22},{"x":"boat","y":2},{"x":"Background","y":0}]}]}"
)

In the decoded data string, each id is a column name and its data array holds that column's cells, where x is the row name and y is the value of that cell in the confusion matrix.
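Since data arrives as a JSON string, it has to be decoded before the cells can be read. The sketch below turns it into nested dictionaries indexed as matrix[key][column][row]. parse_confusion_matrix is a hypothetical helper, the sample string is a shortened version of the output above, and the meaning of the top-level key ("0" in the sample) is not stated in this reference, so it is simply preserved.

```python
import json


def parse_confusion_matrix(data: str) -> dict:
    """Decode the JSON string into {key: {column: {row: value}}}."""
    decoded = json.loads(data)
    return {
        key: {
            column["id"]: {cell["x"]: cell["y"] for cell in column["data"]}
            for column in columns
        }
        for key, columns in decoded.items()
    }


# Shortened two-class version of the sample output above.
sample_data = (
    '{"0":['
    '{"id":"RBC","data":[{"x":"RBC","y":0},{"x":"Background","y":0}]},'
    '{"id":"Background","data":[{"x":"RBC","y":302},{"x":"Background","y":0}]}'
    ']}'
)

matrix = parse_confusion_matrix(sample_data)
print(matrix["0"]["Background"]["RBC"])
```

With a real result, the data field returned by get_confusion_matrix would replace sample_data.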

Examples

  • Retrieve the training confusion matrix using the run ID:
from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")
project.runs.get_confusion_matrix("run_63eb212ff0f856bf95085095")