Model Training Management

`list`

Lists all training runs regardless of status.

Return

A list of msgspec structs containing the training run metadata with the following structure:

[Run(
    id='run_610ba26c-6fbb-42eb-b838-839c61b68b26',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_63bbd3bf8a78eb906f417396',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_T4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_ALWAYS_SAVE_LATEST',
            evaluation_interval=250,
            metric=None
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    features=RunFeatures(preview=True, matrix=True, using_sliding_window=False),
    status=RunStatus(
        overview='Creating',
        message='Creating service.',
        update_data=1705466384796
    ),
    create_date=1705466384796,
    update_date=1705466392067,
    log_ids=['runlog_UjYxMGJhMjZjLTZmYmItNDJlYi1iODM4LTgzOWM2MWI2OGIyNg']
)]
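
The `create_date` and `update_date` values in the sample above look like Unix epoch timestamps in milliseconds. That is an assumption based on their magnitude, not something this page states. Under that assumption, a minimal sketch for turning them into readable UTC datetimes (the helper name `ms_to_datetime` is hypothetical, not part of the SDK):

```python
from datetime import datetime, timezone


def ms_to_datetime(ms: int) -> datetime:
    """Convert an epoch-milliseconds value to a timezone-aware UTC datetime.

    Assumption: timestamps are milliseconds since the Unix epoch.
    """
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


created = ms_to_datetime(1705466384796)  # create_date from the sample above
updated = ms_to_datetime(1705466392067)  # update_date from the sample above

# Time between creation and the last update, in seconds.
elapsed_seconds = (updated - created).total_seconds()
print(created.isoformat(), elapsed_seconds)
```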

Examples

List all trainings:

from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

project.runs.list()

List the latest created training:

from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

# Run is a msgspec struct, so read its fields as attributes.
latest_training = max(project.runs.list(), key=lambda run: run.create_date)
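
Because `list` returns runs regardless of status, it can help to group them by `status.overview` before acting on them. The sketch below uses `SimpleNamespace` stand-ins shaped like the `Run` struct shown earlier so that it runs without API access; the helper name `group_by_status` and the run IDs are hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_status(runs):
    """Map each status.overview value to the IDs of the runs in that status."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run.status.overview].append(run.id)
    return dict(grouped)


# Stand-in objects; real code would pass the result of project.runs.list().
sample_runs = [
    SimpleNamespace(id="run_a", status=SimpleNamespace(overview="Creating")),
    SimpleNamespace(id="run_b", status=SimpleNamespace(overview="Cancelled")),
    SimpleNamespace(id="run_c", status=SimpleNamespace(overview="Creating")),
]

by_status = group_by_status(sample_runs)
print(by_status)
```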

`get`

Retrieves a specific training run using the run ID.

Parameters

Name	Type	Description
`run_id`	`str`	The ID of the training run.

Return

A msgspec struct containing the specific training run metadata with the following structure:

Run(
    id='run_610ba26c-6fbb-42eb-b838-839c61b68b26',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_63bbd3bf8a78eb906f417396',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_T4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_ALWAYS_SAVE_LATEST',
            evaluation_interval=250,
            metric=None
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    features=RunFeatures(preview=True, matrix=True, using_sliding_window=False),
    status=RunStatus(
        overview='Creating',
        message='Creating service.',
        update_data=1705466384796
    ),
    create_date=1705466384796,
    update_date=1705466392067,
    log_ids=['runlog_UjYxMGJhMjZjLTZmYmItNDJlYi1iODM4LTgzOWM2MWI2OGIyNg']
)

Examples

from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

project.runs.get("run_610ba26c-6fbb-42eb-b838-839c61b68b26")

`kill`

Kills a specific training run using the run ID.

🚧
Killed trainings cannot be resumed: all progress of a killed training is lost, and you will need to start a new training from scratch. This action cannot be undone, so kill a training only when you are absolutely sure.
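
Since a kill cannot be undone, one option is to wrap the call in a guard that requires the run ID to be typed a second time. This is only a sketch of that idea: `kill_if_confirmed`, `KillNotConfirmed`, and `FakeRuns` are hypothetical names, and `FakeRuns` stands in for `project.runs` so the example runs offline.

```python
class KillNotConfirmed(Exception):
    """Raised when the typed confirmation does not match the run ID."""


def kill_if_confirmed(runs, run_id: str, typed_confirmation: str):
    """Call runs.kill(run_id) only when the caller re-typed the exact run ID."""
    if typed_confirmation != run_id:
        raise KillNotConfirmed(
            f"Confirmation does not match {run_id!r}; nothing was killed."
        )
    return runs.kill(run_id)


class FakeRuns:
    """Stand-in for project.runs that records calls instead of calling the API."""

    def __init__(self):
        self.killed = []

    def kill(self, run_id):
        self.killed.append(run_id)
        return run_id


fake = FakeRuns()
kill_if_confirmed(fake, "run_123", "run_123")  # IDs match, so kill is called

try:
    kill_if_confirmed(fake, "run_456", "run_45")  # mismatch, so kill is refused
except KillNotConfirmed as exc:
    refused = str(exc)
```

In real use, `runs` would be `project.runs` and `typed_confirmation` would come from user input.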

Parameters

Name	Type	Description
`run_id`	`str`	The ID of the training run.

Return

A msgspec struct containing the killed training metadata with the following structure:

Run(
    id='run_e2a14cee-eacc-4335-bc95-94c3ee196b04',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_64e812a7e47592ef374cbbc2',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_L4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_LOWEST_VALIDATION_LOSS',
            evaluation_interval=220,
            metric='Loss/total_loss'
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    status=RunStatus(
        overview='Cancelled',
        message='Training cancelled.',
        update_data=1700204316180
    ),
    create_date=1701927649302,
    update_date=1701927649302,
    features=RunFeatures(preview=True, matrix=True),
    log_ids=['runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA']
)

Examples

Kill a specific training by run ID:

from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

project.runs.kill("run_63eb212ff0f856bf95085095")

Kill the latest created training:

from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

# Run is a msgspec struct, so read its fields as attributes.
latest_training_id = max(project.runs.list(),
                         key=lambda run: run.create_date).id
project.runs.kill(latest_training_id)

`start`

Starts a new training run from a specific workflow using the flow ID.

Parameters

Name	Type	Description
`flow_id`	`str`	The ID of the workflow.
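`setup`	`object`	The training configuration: the `accelerator`, `checkpoint`, and `limit` settings plus the `preview` and `matrix` flags, as shown in the examples below.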

Return

A msgspec struct containing the newly-initialized training run metadata with the following structure:

Run(
    id='run_e2a14cee-eacc-4335-bc95-94c3ee196b04',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_64e812a7e47592ef374cbbc2',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_L4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_LOWEST_VALIDATION_LOSS',
            evaluation_interval=220,
            metric='Loss/total_loss'
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    status=RunStatus(
        overview='Creating',
        message='Training starting.',
        update_data=1700204316180
    ),
    create_date=1701927649302,
    update_date=1701927649302,
    features=RunFeatures(preview=True, matrix=True),
    log_ids=['runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA']
)

Examples

Start training with a specific workflow ID:

from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

project.runs.start("flow_63d0f2d5fb1f9189db9b1c4b", {
    "accelerator": {
        "name": "GPU_T4",
        "count": 1
    },
    "checkpoint": {
        "strategy": "STRAT_ALWAYS_SAVE_LATEST",
        "evaluation_interval": 250
    },
    "limit": {
        "metric": "LIM_NONE",
        "value": 0
    },
    "preview": True,
    "matrix": True
})

Start a training for the workflow named "My Awesome Workflow":

from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

# Pick the first workflow whose title matches; fields are read as attributes.
flow_id = [
    flow for flow in project.workflows.list()
    if flow.title == "My Awesome Workflow"
][0].id

project.runs.start(flow_id, {
    "accelerator": {
        "name": "GPU_T4",
        "count": 1
    },
    "checkpoint": {
        "strategy": "STRAT_ALWAYS_SAVE_LATEST",
        "evaluation_interval": 200
    },
    "limit": {
        "metric": "LIM_NONE",
        "value": 0
    },
    "preview": True,
    "matrix": True
})
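
Both examples pass the same nested `setup` dictionary, so a small builder can cut down on repetition. The sketch below is an illustration, not part of the SDK: `build_setup` is a hypothetical helper, its defaults are copied from the examples above, and the two range checks are assumptions about what the service accepts.

```python
def build_setup(accelerator_name, accelerator_count=1,
                strategy="STRAT_ALWAYS_SAVE_LATEST", evaluation_interval=250,
                preview=True, matrix=True):
    """Build a setup dictionary shaped like the ones in the examples above."""
    # Assumed constraints: both values must be positive integers.
    if accelerator_count < 1:
        raise ValueError("accelerator_count must be at least 1")
    if evaluation_interval < 1:
        raise ValueError("evaluation_interval must be at least 1")
    return {
        "accelerator": {"name": accelerator_name, "count": accelerator_count},
        "checkpoint": {
            "strategy": strategy,
            "evaluation_interval": evaluation_interval,
        },
        # No training limit, as in the examples above.
        "limit": {"metric": "LIM_NONE", "value": 0},
        "preview": preview,
        "matrix": matrix,
    }


setup = build_setup("GPU_T4", evaluation_interval=200)
print(setup)
```

The result can then be passed as the second argument, for example `project.runs.start(flow_id, setup)`.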

`get_logs`

Retrieves a specific training log using the log ID.

Parameters

Name	Type	Description
`log_id`	`str`	The ID of the training log.

Return

A msgspec struct containing the training log entries, with the following structure:

RunLogs(
    id='runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA',
    logs=[
        {
            'ev': 'trainingCheckpoint',
            't': 1700200466147,
            'pl': {
                'step': 0,
                'log': 'Step 0, totalLoss: 6.4055E+01, boxLoss: 5.3928E+00, classificationLoss: 5.2646E+01, distributedFocalLoss: 6.0163E+00.',
                'totalLoss': 64.055,
                'boxLoss': 5.3928,
                'classificationLoss': 52.646,
                'distributedFocalLoss': 6.0163
            }
        }
    ]
)
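
Each entry in `logs` is a plain dictionary, so a loss curve can be pulled out with ordinary dictionary access. The sketch below assumes that every `trainingCheckpoint` entry carries `step` and `totalLoss` in its `pl` payload, as the sample above does. The helper name `loss_curve` is hypothetical, and the second sample entry is invented for illustration.

```python
def loss_curve(log_entries, event="trainingCheckpoint", key="totalLoss"):
    """Return (step, value) pairs for the given event type, sorted by step."""
    points = [
        (entry["pl"]["step"], entry["pl"][key])
        for entry in log_entries
        if entry.get("ev") == event and key in entry.get("pl", {})
    ]
    return sorted(points)


sample_logs = [
    # Invented entry, listed first to show that the result is sorted by step.
    {"ev": "trainingCheckpoint", "t": 1700200500000,
     "pl": {"step": 50, "totalLoss": 12.5}},
    # Entry taken from the sample structure above.
    {"ev": "trainingCheckpoint", "t": 1700200466147,
     "pl": {"step": 0, "totalLoss": 64.055, "boxLoss": 5.3928}},
]

curve = loss_curve(sample_logs)
print(curve)
```

In real use, `log_entries` would be the `logs` field of the struct returned by `get_logs`.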

Examples

Retrieve training logs by specific log ID:

from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

project.runs.get_logs("runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA")

Retrieve training logs for the workflow named "My Awesome Workflow":

from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")

# Pick the first workflow whose title matches.
flow_id = [
    flow for flow in project.workflows.list()
    if flow.title == "My Awesome Workflow"
][0].id

# Take the first log ID of the first run that belongs to that workflow.
log_id = [
    run for run in project.runs.list()
    if run.flow_id == flow_id
][0].log_ids[0]

project.runs.get_logs(log_id)
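
The lookups above index `[0]` into a filtered list, which raises `IndexError` when nothing matches. A minimal sketch of a more forgiving lookup follows. `first_match` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the structs returned by `project.workflows.list()`.

```python
from types import SimpleNamespace


def first_match(items, predicate):
    """Return the first item satisfying predicate, or None when none does."""
    return next((item for item in items if predicate(item)), None)


# Stand-ins for workflow structs; the IDs and the second title are made up.
flows = [
    SimpleNamespace(id="flow_1", title="Another Workflow"),
    SimpleNamespace(id="flow_2", title="My Awesome Workflow"),
]

found = first_match(flows, lambda flow: flow.title == "My Awesome Workflow")
missing = first_match(flows, lambda flow: flow.title == "No Such Workflow")

if missing is None:
    print("No workflow with that title; skipping log retrieval.")
```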

`get_confusion_matrix`

Retrieves a training confusion matrix using the run ID.

Parameters

Name	Type	Description
`run_id`	`str`	The ID of the training run.

Return

A msgspec struct whose `data` field holds the training confusion matrix as a JSON string, with the following structure:

RunConfusionMatrix(
    object='confusionMatrix',
    data='{"0":[{"id":"RBC","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"WBC","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"Platelets","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"boat","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"Background","data":[{"x":"RBC","y":302},{"x":"WBC","y":27},{"x":"Platelets","y":22},{"x":"boat","y":2},{"x":"Background","y":0}]}]}'
)

In the decoded JSON, each entry's `id` is a column name and its `data` array lists the cells of that column, where `x` is the row name and `y` is the value of that cell in the confusion matrix.
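
The column and row layout described above can be decoded into nested dictionaries with the standard `json` module. This is a minimal sketch: `parse_confusion_matrix` is a hypothetical helper, the sample string is a trimmed copy of the structure shown above, and the top-level key (`"0"` in the sample) is kept as-is because this page does not say what it denotes.

```python
import json


def parse_confusion_matrix(data_json: str) -> dict:
    """Decode the JSON string into {top_level_key: {column: {row: value}}}."""
    decoded = json.loads(data_json)
    return {
        key: {
            column["id"]: {cell["x"]: cell["y"] for cell in column["data"]}
            for column in columns
        }
        for key, columns in decoded.items()
    }


# Trimmed sample with two classes, following the structure shown above.
sample = (
    '{"0":['
    '{"id":"RBC","data":[{"x":"RBC","y":0},{"x":"Background","y":0}]},'
    '{"id":"Background","data":[{"x":"RBC","y":302},{"x":"Background","y":0}]}'
    ']}'
)

matrix = parse_confusion_matrix(sample)
# Column "Background", row "RBC".
print(matrix["0"]["Background"]["RBC"])
```

In real use, `data_json` would be the `data` value returned by `get_confusion_matrix`.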

Examples

Retrieve the training confusion matrix using the run ID:

from datature.nexus import Client

project = Client("5aa41e8ba........").get_project("proj_b705a........")
project.runs.get_confusion_matrix("run_63eb212ff0f856bf95085095")