Model training goes through three stages and has three possible error types, each indicated by the status on the trainings page.
Once you start model training, the process will take several minutes to initialize. This includes setting up the instance, preprocessing the images, etc.
Upon successful initialization, model training will commence. You can monitor training performance through the graphs on the page, explained in the following sections.
Model training is completed. Model performance is displayed on the graphs, explained in the following sections.
If you see Status: Error Occurred, please contact us.
This error occurs when the GPU has insufficient RAM to support model training. Ways to prevent this error include:
- Reducing batch size
- Selecting a GPU with more RAM
- Selecting multiple GPUs
- Choosing a smaller model
In general, the default options for each model will not result in model training errors.
Model training uses your quota for Compute Minutes. Once you hit the quota, model training will stop, even if your model is still training. In addition, a new training will not be allowed to start if your current usage plus the compute minutes estimated for that training would exceed the quota. Before starting a training, check your compute minutes and make sure your usage is not close to the quota. If you see this error even though your usage is well below the quota, please contact us.
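The start-of-training check described above comes down to simple arithmetic. The following is a minimal sketch in Python, not the platform's actual implementation; the function names and the example numbers are hypothetical and only illustrate the rule that a training is blocked when used minutes plus estimated minutes exceed the quota.

```python
def remaining_minutes(used_minutes: float, quota_minutes: float) -> float:
    """Compute minutes still available under the quota (never negative)."""
    return max(quota_minutes - used_minutes, 0.0)


def can_start_training(
    used_minutes: float,
    estimated_minutes: float,
    quota_minutes: float,
) -> bool:
    """Return True if a new training fits within the quota.

    Hypothetical helper: the training is blocked only when current usage
    plus the estimated usage of the new training exceeds the quota.
    """
    return used_minutes + estimated_minutes <= quota_minutes


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(remaining_minutes(used_minutes=400, quota_minutes=500))
    print(can_start_training(used_minutes=400, estimated_minutes=90, quota_minutes=500))
    print(can_start_training(used_minutes=450, estimated_minutes=90, quota_minutes=500))
```

Running the same arithmetic against the figures on your usage page before starting a long training shows whether the training is likely to be blocked.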
Saving trained models uses your quota for Artifacts Stored. Once you hit the quota, future trained models will no longer be saved as artifacts, even if your model is still training.
Click here for information on current usage and quota.
If you want to better understand the metrics that are being displayed during training, go to Evaluating Model Performance.
Go to Training Option : Checkpoint Strategy to see how to change the evaluation interval size.
You cannot change the settings of your workflow mid-training. If you are certain that the change is necessary, delete the training, then go to your workflow and make the change there.
Yes, the training will still continue even if you leave. To find the dashboard for your training again, go to the Training tab on the Project Overview sidebar, where you will be able to see the current status of your training, when it started, the general model framework it is operating on, and the number and type of GPU being used. If you click on the ... button on the bottom right, you can go to the workflow from which the training is running and Delete Training.