Model Training Management
list
Lists all training runs regardless of status.
Return
A list of msgspec structs containing the training run metadata with the following structure:
[Run(
    id='run_610ba26c-6fbb-42eb-b838-839c61b68b26',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_63bbd3bf8a78eb906f417396',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_T4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_ALWAYS_SAVE_LATEST',
            evaluation_interval=250,
            metric=None
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    features=RunFeatures(preview=True, matrix=True, using_sliding_window=False),
    status=RunStatus(
        overview='Creating',
        message='Creating service.',
        update_data=1705466384796
    ),
    create_date=1705466384796,
    update_date=1705466392067,
    log_ids=['runlog_UjYxMGJhMjZjLTZmYmItNDJlYi1iODM4LTgzOWM2MWI2OGIyNg']
)]
Examples
- List all trainings:
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
project.runs.list()
- List the latest created training:
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
# Runs are msgspec structs, so fields are read as attributes.
latest_training = max(project.runs.list(), key=lambda x: x.create_date)
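When a project has many runs, a per-status summary can be easier to read than the full list. The following is a minimal sketch, assuming each run exposes `status.overview` as an attribute, as in the structure shown above. `count_by_status` is a hypothetical helper, not part of the SDK, and the stand-in objects only mimic the fields it reads.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_status(runs):
    """Count runs per status overview (e.g. 'Creating', 'Cancelled')."""
    return Counter(run.status.overview for run in runs)


# Stand-in objects that mimic the fields used (hypothetical IDs).
# With the SDK, pass the result of project.runs.list() instead.
sample_runs = [
    SimpleNamespace(id="run_a", status=SimpleNamespace(overview="Creating")),
    SimpleNamespace(id="run_b", status=SimpleNamespace(overview="Cancelled")),
    SimpleNamespace(id="run_c", status=SimpleNamespace(overview="Creating")),
]
status_counts = count_by_status(sample_runs)
print(status_counts)
```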
get
Retrieves a specific training run using the run ID.
Parameters
Name | Type | Description |
---|---|---|
run_id | str | The ID of the training run. |
Return
A msgspec struct containing the specific training run metadata with the following structure:
Run(
    id='run_610ba26c-6fbb-42eb-b838-839c61b68b26',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_63bbd3bf8a78eb906f417396',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_T4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_ALWAYS_SAVE_LATEST',
            evaluation_interval=250,
            metric=None
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    features=RunFeatures(preview=True, matrix=True, using_sliding_window=False),
    status=RunStatus(
        overview='Creating',
        message='Creating service.',
        update_data=1705466384796
    ),
    create_date=1705466384796,
    update_date=1705466392067,
    log_ids=['runlog_UjYxMGJhMjZjLTZmYmItNDJlYi1iODM4LTgzOWM2MWI2OGIyNg']
)
Examples
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
project.runs.get("run_610ba26c-6fbb-42eb-b838-839c61b68b26")
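The `create_date` and `update_date` values in the structure above look like Unix timestamps in milliseconds. That reading is an assumption based on their magnitude, not something this page states. Under that assumption, a small hypothetical helper such as `ms_to_datetime` makes them readable:

```python
from datetime import datetime, timezone


def ms_to_datetime(timestamp_ms):
    """Convert a Unix timestamp in milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


# create_date taken from the example structure above.
created = ms_to_datetime(1705466384796)
print(created.isoformat())
```

With the SDK, the same call would take `run.create_date` from the struct returned by `project.runs.get(...)`.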
kill
Kills a specific training run using the run ID.
A killed training run cannot be resumed, and all of its training progress is lost. To train again, you must start a new run from scratch. This action cannot be undone, so kill a training only when you are certain.
Parameters
Name | Type | Description |
---|---|---|
run_id | str | The ID of the training run. |
Return
A msgspec struct containing the killed training metadata with the following structure:
Run(
    id='run_e2a14cee-eacc-4335-bc95-94c3ee196b04',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_64e812a7e47592ef374cbbc2',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_L4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_LOWEST_VALIDATION_LOSS',
            evaluation_interval=220,
            metric='Loss/total_loss'
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    status=RunStatus(
        overview='Cancelled',
        message='Training cancelled.',
        update_data=1700204316180
    ),
    create_date=1701927649302,
    update_date=1701927649302,
    features=RunFeatures(preview=True, matrix=True),
    log_ids=['runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA']
)
Examples
- Kill a specific training by run ID:
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
project.runs.kill("run_63eb212ff0f856bf95085095")
- Kill the latest created training:
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
# Runs are msgspec structs, so fields are read as attributes.
latest_training_id = max(project.runs.list(),
                         key=lambda x: x.create_date).id
project.runs.kill(latest_training_id)
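Because a kill cannot be undone, scripts may want an explicit confirmation step before calling it. The following is a minimal sketch of such a guard. `kill_if_confirmed` is a hypothetical wrapper, not part of the SDK, and it only assumes that the object passed in has a `kill(run_id)` method, as `project.runs` does above.

```python
from types import SimpleNamespace


def kill_if_confirmed(runs_api, run_id, confirmed):
    """Kill the run only when the caller has explicitly confirmed.

    runs_api is any object with a kill(run_id) method, such as project.runs.
    Returns the result of kill(), or None when not confirmed.
    """
    if not confirmed:
        return None
    return runs_api.kill(run_id)


# Stand-in for project.runs that records calls instead of contacting the API.
killed_ids = []
fake_runs = SimpleNamespace(
    kill=lambda run_id: killed_ids.append(run_id) or run_id
)

skipped = kill_if_confirmed(fake_runs, "run_example", confirmed=False)
result = kill_if_confirmed(fake_runs, "run_example", confirmed=True)
```

In an interactive script, `confirmed` could come from a prompt or a command-line flag, so that the destructive call is never the default path.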
start
Starts a new training run from a specific workflow using the flow ID.
Parameters
Name | Type | Description |
---|---|---|
flow_id | str | The ID of the workflow. |
setup | object | The training configuration: accelerator, checkpoint, limit, preview, and matrix settings, as shown in the examples below. |
Return
A msgspec struct containing the newly-initialized training run metadata with the following structure:
Run(
    id='run_e2a14cee-eacc-4335-bc95-94c3ee196b04',
    project_id='proj_cd067221d5a6e4007ccbb4afb5966535',
    flow_id='flow_64e812a7e47592ef374cbbc2',
    execution=RunExecution(
        accelerator=RunAccelerator(name='GPU_L4', count=1),
        checkpoint=RunCheckpoint(
            strategy='STRAT_LOWEST_VALIDATION_LOSS',
            evaluation_interval=220,
            metric='Loss/total_loss'
        ),
        limit=RunLimit(metric='LIM_NONE', value=0)
    ),
    status=RunStatus(
        overview='Creating',
        message='Training starting.',
        update_data=1700204316180
    ),
    create_date=1701927649302,
    update_date=1701927649302,
    features=RunFeatures(preview=True, matrix=True),
    log_ids=['runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA']
)
Examples
- Start training with a specific workflow ID:
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
project.runs.start("flow_63d0f2d5fb1f9189db9b1c4b", {
    "accelerator": {
        "name": "GPU_T4",
        "count": 1
    },
    "checkpoint": {
        "strategy": "STRAT_ALWAYS_SAVE_LATEST",
        "evaluation_interval": 250
    },
    "limit": {
        "metric": "LIM_NONE",
        "value": 0
    },
    "preview": True,
    "matrix": True
})
- Start a training for the workflow named "My Awesome Workflow":
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
# Workflows are read with attribute access, matching flow.title.
flow_id = [
    flow for flow in project.workflows.list()
    if flow.title == "My Awesome Workflow"
][0].id
project.runs.start(flow_id, {
    "accelerator": {
        "name": "GPU_T4",
        "count": 1
    },
    "checkpoint": {
        "strategy": "STRAT_ALWAYS_SAVE_LATEST",
        "evaluation_interval": 200
    },
    "limit": {
        "metric": "LIM_NONE",
        "value": 0
    },
    "preview": True,
    "matrix": True
})
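Looking up a workflow by title with `[...][0]` raises a bare `IndexError` when no title matches, and silently picks the first one when several do. The following is a minimal sketch of a stricter lookup. `find_flow_id` is a hypothetical helper, not part of the SDK, and it assumes workflow objects expose `id` and `title` attributes.

```python
from types import SimpleNamespace


def find_flow_id(flows, title):
    """Return the ID of the single workflow whose title matches exactly."""
    matches = [flow.id for flow in flows if flow.title == title]
    if not matches:
        raise LookupError(f"No workflow titled {title!r}")
    if len(matches) > 1:
        raise LookupError(
            f"{len(matches)} workflows titled {title!r}; expected exactly one"
        )
    return matches[0]


# Stand-in objects with hypothetical IDs; with the SDK, pass
# project.workflows.list() instead.
sample_flows = [
    SimpleNamespace(id="flow_a", title="My Awesome Workflow"),
    SimpleNamespace(id="flow_b", title="Another Workflow"),
]
flow_id = find_flow_id(sample_flows, "My Awesome Workflow")
```

The returned ID can then be passed as the first argument to `project.runs.start`.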
get_logs
Retrieves a specific training log using the log ID.
Parameters
Name | Type | Description |
---|---|---|
log_id | str | The ID of the training log. |
Return
A msgspec struct with the specific training log metadata with the following structure:
RunLogs(
    id='runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA',
    logs=[
        {
            'ev': 'trainingCheckpoint',
            't': 1700200466147,
            'pl': {
                'step': 0,
                'log': 'Step 0, totalLoss: 6.4055E+01, boxLoss: 5.3928E+00, classificationLoss: 5.2646E+01, distributedFocalLoss: 6.0163E+00.',
                'totalLoss': 64.055,
                'boxLoss': 5.3928,
                'classificationLoss': 52.646,
                'distributedFocalLoss': 6.0163
            }
        }
    ]
)
Examples
- Retrieve training logs by specific log ID:
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
project.runs.get_logs("runlog_UmUyYTE0Y2VlLWVhY2MtNDMzNS1iYzk1LTk0YzNlZTE5NmIwNA")
- Retrieve training logs for the workflow named "My Awesome Workflow":
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
flow_id = [
flow for flow in project.workflows.list()
if flow.title == "My Awesome Workflow"
][0]["id"]
# Take the first log ID of the first run that belongs to this workflow.
log_id = [
    run for run in project.runs.list()
    if run.flow_id == flow_id
][0].log_ids[0]
project.runs.get_logs(log_id)
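Once the logs are retrieved, the checkpoint entries can be reduced to a loss curve for plotting or comparison. The following is a minimal sketch, assuming each entry in `logs` is a dictionary shaped like the example above (`ev`, `t`, and a `pl` payload with `step` and loss values). `loss_curve` is a hypothetical helper, not part of the SDK.

```python
def loss_curve(log_entries, metric="totalLoss"):
    """Return sorted (step, value) pairs from checkpoint events with the metric."""
    points = []
    for entry in log_entries:
        payload = entry.get("pl", {})
        is_checkpoint = entry.get("ev") == "trainingCheckpoint"
        if is_checkpoint and "step" in payload and metric in payload:
            points.append((payload["step"], payload[metric]))
    return sorted(points)


# Sample entry trimmed from the structure above; with the SDK, pass the
# logs field of the struct returned by project.runs.get_logs(...).
sample_logs = [
    {
        "ev": "trainingCheckpoint",
        "t": 1700200466147,
        "pl": {"step": 0, "totalLoss": 64.055, "boxLoss": 5.3928},
    },
]
curve = loss_curve(sample_logs)
box_curve = loss_curve(sample_logs, metric="boxLoss")
```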
get_confusion_matrix
Retrieves a training confusion matrix using the run ID.
Parameters
Name | Type | Description |
---|---|---|
run_id | str | The ID of the training run. |
Return
A RunConfusionMatrix object whose data field holds the confusion matrix as a JSON string, with the following structure:
RunConfusionMatrix(
    object='confusionMatrix',
    data='{"0":[{"id":"RBC","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"WBC","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"Platelets","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"boat","data":[{"x":"RBC","y":0},{"x":"WBC","y":0},{"x":"Platelets","y":0},{"x":"boat","y":0},{"x":"Background","y":0}]},{"id":"Background","data":[{"x":"RBC","y":302},{"x":"WBC","y":27},{"x":"Platelets","y":22},{"x":"boat","y":2},{"x":"Background","y":0}]}]}'
)
Each `id` is a column name, and the accompanying `data` is the array of elements in that column, where `x` is the name of the row and `y` is the value of that element in the confusion matrix.
Examples
- Retrieve the training confusion matrix using the run ID:
from datature.nexus import Client
project = Client("5aa41e8ba........").get_project("proj_b705a........")
project.runs.get_confusion_matrix("run_63eb212ff0f856bf95085095")
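Since the matrix arrives as a JSON string, it usually needs to be decoded before use. The following is a minimal sketch that turns it into nested dictionaries indexed by row and then column, following the `id`/`x`/`y` layout described above. `parse_confusion_matrix` is a hypothetical helper, not part of the SDK, the sample values are made up for illustration, and the top-level keys (such as `"0"`) are kept as-is because their meaning is not described here.

```python
import json


def parse_confusion_matrix(data_json):
    """Decode the JSON string into {key: {row: {column: value}}}."""
    parsed = json.loads(data_json)
    matrices = {}
    for key, columns in parsed.items():
        matrix = {}
        for column in columns:
            for cell in column["data"]:
                # cell["x"] is the row name, column["id"] the column name.
                matrix.setdefault(cell["x"], {})[column["id"]] = cell["y"]
        matrices[key] = matrix
    return matrices


# Illustrative two-class sample in the same layout as the structure above.
sample = json.dumps({"0": [
    {"id": "RBC", "data": [{"x": "RBC", "y": 5}, {"x": "WBC", "y": 1}]},
    {"id": "WBC", "data": [{"x": "RBC", "y": 2}, {"x": "WBC", "y": 7}]},
]})
matrices = parse_confusion_matrix(sample)
```

With the SDK, the input would be the `data` field of the object returned by `project.runs.get_confusion_matrix(...)`.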