Model Training Management
list
Lists all training runs regardless of status.
Return
A list of msgspec structs containing the training run metadata with the following structure:
[Run(
    id='run_610ba26c-6fbb-42eb-b838-839c61b68b26',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_63bbd3bf8a78eb906f417396',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_T4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_ALWAYS_SAVE_LATEST',
            evaluation_interval=250,
            metric=None
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    features=RunFeatures(preview=True, matrix=True, using_sliding_window=False),
    status=RunStatus(
        overview='Creating',
        message='Creating service.',
        update_data=1705466384796
    ),
    create_date=1705466384796,
    update_date=1705466392067,
    log_ids=['runlog_UjYxMGJhMjZjLTZmYmItNDJlYi1iODM4LTgzOWM2MWI2OGIyNg']
)]
Examples
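Because list returns every run regardless of status, a small client-side filter can be handy. The sketch below assumes each returned run exposes status.overview and create_date as attributes, matching the structure shown above. The helper name runs_with_status is hypothetical, and SimpleNamespace objects stand in for real Run structs so the snippet runs without an API key.

```python
from types import SimpleNamespace


def runs_with_status(runs, overview):
    """Return the runs whose status.overview equals `overview`, newest first."""
    matching = [run for run in runs if run.status.overview == overview]
    return sorted(matching, key=lambda run: run.create_date, reverse=True)


# Stand-in objects that mimic only the fields used above (not real SDK objects).
sample_runs = [
    SimpleNamespace(id="run_a", create_date=1, status=SimpleNamespace(overview="Creating")),
    SimpleNamespace(id="run_b", create_date=3, status=SimpleNamespace(overview="Cancelled")),
    SimpleNamespace(id="run_c", create_date=2, status=SimpleNamespace(overview="Creating")),
]
creating_ids = [run.id for run in runs_with_status(sample_runs, "Creating")]
```

With a real project, the same helper would presumably be called as runs_with_status(project.runs.list(), "Creating").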
- List all trainings:
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
project.runs.list()
- List the latest created training:
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
latest_training = max(project.runs.list(), key=lambda x: x.create_date)
get
Retrieves a specific training run using the run ID.
Parameters
| Name | Type | Description |
|---|---|---|
| run_id | str | The ID of the training run. |
Return
A msgspec struct containing the specific training run metadata with the following structure:
Run(
    id='run_610ba26c-6fbb-42eb-b838-839c61b68b26',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_63bbd3bf8a78eb906f417396',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_T4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_ALWAYS_SAVE_LATEST',
            evaluation_interval=250,
            metric=None
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    features=RunFeatures(preview=True, matrix=True, using_sliding_window=False),
    status=RunStatus(
        overview='Creating',
        message='Creating service.',
        update_data=1705466384796
    ),
    create_date=1705466384796,
    update_date=1705466392067,
    log_ids=['runlog_UjYxMGJhMjZjLTZmYmItNDJlYi1iODM4LTgzOWM2MWI2OGIyNg']
)
Examples
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
project.runs.get("run_610ba26c-6fbb-42eb-b838-839c61b68b26")
kill
Kills a specific training run using the run ID.
A killed training cannot be resumed, and all of its progress is lost. To train again, you must start a new training run from scratch. This action cannot be undone, so kill a training only when you are absolutely sure.
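Since a kill cannot be undone, one option is to gate the call behind an explicit confirmation step. The following is a minimal sketch, not part of the SDK: kill_run_if_confirmed and the confirm callback are hypothetical names, and _StubRuns is a stand-in for project.runs so the snippet runs offline.

```python
from typing import Any, Callable, Optional


def kill_run_if_confirmed(runs: Any, run_id: str,
                          confirm: Callable[[str], bool]) -> Optional[Any]:
    """Call runs.kill(run_id) only when confirm(run_id) returns True."""
    if not confirm(run_id):
        return None  # Nothing was killed.
    return runs.kill(run_id)


class _StubRuns:
    """Offline stand-in for project.runs; records which IDs were killed."""

    def __init__(self):
        self.killed = []

    def kill(self, run_id):
        self.killed.append(run_id)
        return {"id": run_id}


stub = _StubRuns()
declined = kill_run_if_confirmed(stub, "run_example", lambda run_id: False)
accepted = kill_run_if_confirmed(stub, "run_example", lambda run_id: True)
```

In an interactive script, confirm could wrap input(), for example lambda run_id: input(f"Kill {run_id}? [y/N] ").strip().lower() == "y", with project.runs passed in place of the stub.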
Parameters
| Name | Type | Description |
|---|---|---|
| run_id | str | The ID of the training run. |
Return
A msgspec struct containing the killed training metadata with the following structure:
Run(
    id='run_e2a14cee-eacc-4335-bc95-94c3ee196b04',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_64e812a7e47592ef374cbbc2',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_L4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_LOWEST_VALIDATION_LOSS',
            evaluation_interval=220,
            metric='Loss/total_loss'
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    status=RunStatus(
        overview='Cancelled',
        message='Training cancelled.',
        update_data=1700204316180
    ),
    create_date=1701927649302,
    update_date=1701927649302,
    features=RunFeatures(preview=True, matrix=True),
    log_ids=['runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA']
)
Examples
- Kill a specific training by run ID:
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
project.runs.kill("run_63eb212ff0f856bf95085095")
- Kill the latest created training:
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
latest_training_id = max(project.runs.list(),
                         key=lambda x: x.create_date).id
project.runs.kill(latest_training_id)
start
Starts a new training run from a specific workflow using the flow ID.
Parameters
| Name | Type | Description |
|---|---|---|
| flow_id | str | The ID of the workflow. |
| setup | object | The metadata of the training. |
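The setup object can also be assembled programmatically. This is a minimal sketch, assuming setup is a plain dictionary with the keys used in the examples on this page (accelerator, checkpoint, limit, preview, matrix). The helper build_setup and its default values are hypothetical, and the positive-integer checks are illustrative assumptions rather than documented constraints.

```python
def build_setup(accelerator_name="GPU_T4", accelerator_count=1,
                strategy="STRAT_ALWAYS_SAVE_LATEST", evaluation_interval=250,
                limit_metric="LIM_NONE", limit_value=0,
                preview=True, matrix=True):
    """Build a setup dictionary shaped like the ones in this page's examples."""
    # Illustrative sanity checks (assumed, not documented by the SDK).
    if accelerator_count < 1:
        raise ValueError("accelerator_count must be at least 1")
    if evaluation_interval < 1:
        raise ValueError("evaluation_interval must be at least 1")
    return {
        "accelerator": {"name": accelerator_name, "count": accelerator_count},
        "checkpoint": {
            "strategy": strategy,
            "evaluation_interval": evaluation_interval,
        },
        "limit": {"metric": limit_metric, "value": limit_value},
        "preview": preview,
        "matrix": matrix,
    }


setup = build_setup(accelerator_name="GPU_L4", evaluation_interval=220)
```

The resulting dictionary would then be passed as the second argument, as in project.runs.start(flow_id, setup).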
Return
A msgspec struct containing the newly-initialized training run metadata with the following structure:
Run(
    id='run_e2a14cee-eacc-4335-bc95-94c3ee196b04',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_64e812a7e47592ef374cbbc2',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_L4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_LOWEST_VALIDATION_LOSS',
            evaluation_interval=220,
            metric='Loss/total_loss'
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    status=RunStatus(
        overview='Creating',
        message='Training starting.',
        update_data=1700204316180
    ),
    create_date=1701927649302,
    update_date=1701927649302,
    features=RunFeatures(preview=True, matrix=True),
    log_ids=['runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA']
)
Examples
- Start training with a specific workflow ID:
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
project.runs.start("flow_63d0f2d5fb1f9189db9b1c4b", {
    "accelerator": {
        "name": "GPU_T4",
        "count": 1
    },
    "checkpoint": {
        "strategy": "STRAT_ALWAYS_SAVE_LATEST",
        "evaluation_interval": 250
    },
    "limit": {
        "metric": "LIM_NONE",
        "value": 0
    },
    "preview": True,
    "matrix": True
})
- Start a training for the workflow named "My Awesome Workflow":
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
flow_id = [
    flow for flow in project.workflows.list()
    if flow.title == "My Awesome Workflow"
][0].id
project.runs.start(flow_id, {
    "accelerator": {
        "name": "GPU_T4",
        "count": 1
    },
    "checkpoint": {
        "strategy": "STRAT_ALWAYS_SAVE_LATEST",
        "evaluation_interval": 200
    },
    "limit": {
        "metric": "LIM_NONE",
        "value": 0
    },
    "preview": True,
    "matrix": True
})
get_logs
Retrieves a specific training log using the log ID.
Parameters
| Name | Type | Description |
|---|---|---|
| log_id | str | The ID of the training log. |
Return
A msgspec struct containing the specific training log metadata with the following structure:
RunLogs(
    id='runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA',
    logs=[
        {
            'ev': 'trainingCheckpoint',
            't': 1700200466147,
            'pl': {
                'step': 0,
                'log': 'Step 0, totalLoss: 6.4055E+01, boxLoss: 5.3928E+00, classificationLoss: 5.2646E+01, distributedFocalLoss: 6.0163E+00.',
                'totalLoss': 64.055,
                'boxLoss': 5.3928,
                'classificationLoss': 52.646,
                'distributedFocalLoss': 6.0163
            }
        }
    ]
)
Examples
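The logs list shown above can be reduced to a loss curve for plotting or monitoring. This is a minimal sketch, assuming each entry is a plain dictionary with the 'ev' and 'pl' keys shown in the example; other event types may exist and are simply skipped. The helper name loss_curve is hypothetical, and the sample data is a trimmed copy of the entry above.

```python
def loss_curve(log_entries, loss_key="totalLoss"):
    """Return (step, loss) pairs from 'trainingCheckpoint' entries, sorted by step."""
    points = []
    for entry in log_entries:
        if entry.get("ev") != "trainingCheckpoint":
            continue  # Skip any other event type.
        payload = entry.get("pl", {})
        if "step" in payload and loss_key in payload:
            points.append((payload["step"], payload[loss_key]))
    return sorted(points)


# Trimmed copy of the log entry shown in the structure above.
sample_logs = [
    {
        "ev": "trainingCheckpoint",
        "t": 1700200466147,
        "pl": {"step": 0, "totalLoss": 64.055, "boxLoss": 5.3928},
    },
]
curve = loss_curve(sample_logs)
```

With a real log, the entries would presumably come from the logs field of the returned struct, for example loss_curve(project.runs.get_logs(log_id).logs).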
- Retrieve training logs by specific log ID:
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
project.runs.get_logs("runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA")
- Retrieve training logs for the workflow named "My Awesome Workflow":
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
flow_id = [
    flow for flow in project.workflows.list()
    if flow.title == "My Awesome Workflow"
][0].id
# Take the first log ID of the first run that belongs to this workflow.
log_id = [
    run for run in project.runs.list()
    if run.flow_id == flow_id
][0].log_ids[0]
project.runs.get_logs(log_id)
get_confusion_matrix
Retrieves a training confusion matrix using the run ID.
Parameters
| Name | Type | Description |
|---|---|---|
| run_id | str | The ID of the training run. |
Return
A dictionary containing the specific training matrix JSON string with the following structure:
RunConfusionMatrix(
"object" = "confusionMatrix",
"data" = "{"0":[{"id":"RBC","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"WBC","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"Platelets","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"boat","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"Background","data":[{"x":"RBC","y":302},{"x":"WBC","y":27},{"x":"Platelets","y":22},{"x":"boat","y":2},{"x":"Background","y":0}]}]}"
)
Each id is a column name, and its data array lists the cells of that column: x is the row name and y is the value of that cell in the confusion matrix.
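Since data is a JSON string, it has to be decoded before individual cells can be looked up. This is a minimal sketch, assuming the layout shown above: a top-level key (here "0", whose meaning this page does not state, so it is treated as opaque) mapping to a list of columns. The helper name parse_confusion_matrix and the shortened sample are hypothetical.

```python
import json


def parse_confusion_matrix(data: str):
    """Decode the JSON string into {key: {column_name: {row_name: value}}}."""
    parsed = json.loads(data)
    return {
        key: {
            column["id"]: {cell["x"]: cell["y"] for cell in column["data"]}
            for column in columns
        }
        for key, columns in parsed.items()
    }


# Shortened sample with the same shape as the data string above.
sample_data = json.dumps({"0": [
    {"id": "RBC", "data": [{"x": "RBC", "y": 0}, {"x": "Background", "y": 0}]},
    {"id": "Background", "data": [{"x": "RBC", "y": 302}, {"x": "Background", "y": 0}]},
]})
matrix = parse_confusion_matrix(sample_data)
```

A cell is then addressed as matrix[key][column_name][row_name], for instance matrix["0"]["Background"]["RBC"].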
Examples
- Retrieve the training confusion matrix using the run ID:
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
project.runs.get_confusion_matrix("run_63eb212ff0f856bf95085095")