Advanced Evaluation for Model Performance
Evaluation Preview
Next to the Metrics tab, you can select Evaluation Preview, which shows sample inferences from the model at each evaluation checkpoint as a direct comparison between Ground Truth and Checkpoint Prediction. In the Controls Panel on the right, you can show or hide the labels, display the tag name, and display the confidence score for each prediction. A slider lets you compare inference on the same image at each evaluation checkpoint, so you can analyse how the model is progressing in learning your chosen task.
The image below displays an example of the Evaluation Preview page.
Confusion Matrix
Nexus’ Confusion Matrix can be used as another form of evaluation during training, alongside other features like our real-time training dashboard and Advanced Evaluation. During training, the Confusion Matrix tab can be found at the top of the training dashboard, on the Run page, as shown below.
Our confusion matrix is computed following the methods described here, with ground truth classes represented as columns and prediction classes represented as rows. To aid interpretability, it uses color gradients to highlight differences in distributions, where lighter colors represent lower proportions and darker colors represent higher proportions.
Background Class
We’ve also added a few options to improve interpretability. In the default view, the background class entries are omitted, but they can be toggled with the ‘Yes’ and ‘No’ buttons under Background Class. The background class represents any area in which there is no object of interest. How an area is interpreted depends on the computer vision task involved. For classification, the background class can be assigned to a whole image. For semantic segmentation, the background class is assigned on a per-pixel basis. For instance segmentation, object detection, and keypoint detection, the background class is assigned to spatial regions that contain no class instances. If the ground truth for an entry is the background class, there was no annotation in that region. If the prediction is the background class, the model made no prediction there.
Additionally, in the default view, the confusion matrix is normalized row-wise, so each entry is a percentage of its row total and every row sums to 100%. Because rows correspond to predicted classes, these normalized values show, for each predicted class, how its predictions are distributed across the ground truth classes, with the per-class precision along the diagonal. The percentages update automatically to include or exclude the background class, depending on whether it is toggled on or off. The percentage view makes the distribution of results easier to read. To view and verify the underlying raw counts before normalization, select Absolute.
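As a rough illustration of the row-wise percentages described above, the sketch below builds a small matrix with predicted classes as rows and ground truth classes as columns, then converts each row to percentages. The function names (build_confusion_matrix, normalize_rows, drop_class), the class list, and the sample label pairs are all hypothetical. This is a minimal sketch under those assumptions and does not show how Nexus computes its matrix internally.

```python
def build_confusion_matrix(pairs, classes):
    """Count (ground_truth, prediction) pairs.

    Rows are predicted classes and columns are ground truth classes.
    """
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ground_truth, prediction in pairs:
        matrix[index[prediction]][index[ground_truth]] += 1
    return matrix


def normalize_rows(matrix):
    """Convert each row to percentages of that row's total."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # A row with no entries stays at zero instead of dividing by zero.
        normalized.append([100.0 * v / total if total else 0.0 for v in row])
    return normalized


def drop_class(matrix, classes, name):
    """Remove the row and column of one class, such as the background class."""
    k = classes.index(name)
    kept = [c for c in classes if c != name]
    reduced = [
        [v for j, v in enumerate(row) if j != k]
        for i, row in enumerate(matrix)
        if i != k
    ]
    return reduced, kept


# Hypothetical sample data: one (ground truth, prediction) pair per object.
CLASSES = ["cat", "dog", "background"]
PAIRS = [
    ("cat", "cat"), ("cat", "cat"), ("cat", "cat"),
    ("dog", "cat"),             # a dog predicted as a cat
    ("dog", "dog"), ("dog", "dog"),
    ("cat", "background"),      # a cat with no prediction
    ("background", "dog"),      # a dog prediction with no annotation
]

absolute = build_confusion_matrix(PAIRS, CLASSES)   # the "Absolute" style view
percent = normalize_rows(absolute)                  # the percentage style view

# Toggle the background class off, then renormalize the remaining rows.
no_bg_absolute, no_bg_classes = drop_class(absolute, CLASSES, "background")
no_bg_percent = normalize_rows(no_bg_absolute)

for name, row in zip(CLASSES, percent):
    print(name, [round(v, 1) for v in row])
```

In this sample, three of the four "cat" predictions are correct, so the diagonal entry of the "cat" row is 75%, which is the precision for that class. Dropping the background class changes the "dog" row total, so its percentages change as well.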
Finally, similar to Advanced Evaluation, you can scroll across the evaluation checkpoints to see how the confusion matrix for the model has evolved. Ideally, the proportions in each row converge towards the downward diagonal as training progresses, so that prediction and ground truth classes are maximally aligned.
Training Log
The Training Log tab provides more transparency by showing the raw logs of the training run while it is ongoing. This makes it easy to copy and parse results for your documentation, and it shows exactly which logs we register and which data we use to render our graphs.
Recommendations
View Model Finetuning Workflow
You can view the workflow and training settings used for a run, so that you can compare model performance across different hyperparameter settings.
Assets With Lowest Confidence Scores
Low-confidence sampling is a technique used in training computer vision models to identify instances where the model exhibits uncertainty in its predictions. These low-confidence samples often stem from ambiguous data or challenging features within the dataset.
Low-confidence sampling involves monitoring the model’s prediction scores for each image and focusing on those with confidence scores below a specified threshold. By pinpointing these instances, we can update the dataset to better represent such scenarios in future training iterations, which can improve the model's robustness and accuracy on complex data.
Once training is complete, go to the Recommendations tab and navigate to the Assets with Lowest Confidence Scores section. This highlights up to 12 assets in the evaluation dataset with the lowest average prediction scores, allowing you to visualize areas where the model shows uncertainty. You can use the confidence threshold slider to filter out high-confidence predictions, helping you focus on the objects that challenge the model the most.
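The selection described above can be sketched in a few lines. The function name lowest_confidence_assets, the sample asset names, and the scores are hypothetical. How the threshold slider interacts with the per-asset average inside Nexus is not documented here, so the filtering step below is an assumption made for illustration only.

```python
from statistics import mean


def lowest_confidence_assets(predictions, threshold=1.0, limit=12):
    """Rank assets by average prediction score, lowest first.

    predictions maps an asset name to the confidence scores of its predictions.
    Scores above threshold are ignored (assumed slider behaviour), and assets
    left with no scores are skipped.
    """
    averages = {}
    for asset, scores in predictions.items():
        kept = [s for s in scores if s <= threshold]
        if kept:
            averages[asset] = mean(kept)
    ranked = sorted(averages.items(), key=lambda item: item[1])
    return ranked[:limit]


# Hypothetical evaluation results: confidence scores per asset.
EVAL_PREDICTIONS = {
    "img_001.jpg": [0.95, 0.91],
    "img_002.jpg": [0.40, 0.90],
    "img_003.jpg": [0.30, 0.20],
    "img_004.jpg": [0.85],
}

top_two = lowest_confidence_assets(EVAL_PREDICTIONS, limit=2)
filtered = lowest_confidence_assets(EVAL_PREDICTIONS, threshold=0.5)

for asset, score in top_two:
    print(f"{asset}: {score:.2f}")
```

With a threshold of 0.5, the two assets whose predictions are all high-confidence drop out of the list, which leaves only the assets that contain at least one uncertain prediction.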
Confidence Score Histogram of Evaluation Dataset
This histogram shows the number of assets within each confidence band, providing insight into the overall asset confidence distribution as well as the classes that contribute most to the low-confidence predictions. By focusing on these class-level insights, you can address imbalances or weaknesses in certain categories that may be hidden by overall performance metrics. This helps the model perform more consistently across all classes, which is especially important in real-world applications where success often depends on the model’s ability to handle minority or challenging classes effectively.
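A minimal sketch of the two views mentioned above, banding asset scores and counting low-confidence predictions per class, might look like the following. The function names, the ten equal-width bands, the 0.5 cut-off, and the sample data are assumptions chosen for illustration and are not taken from Nexus.

```python
from collections import Counter


def confidence_histogram(scores, num_bands=10):
    """Count scores per equal-width confidence band.

    Band 0 covers [0.0, 0.1), band 1 covers [0.1, 0.2), and so on when
    num_bands is 10. A score of exactly 1.0 falls into the last band.
    """
    counts = Counter()
    for score in scores:
        band = min(int(score * num_bands), num_bands - 1)
        counts[band] += 1
    return counts


def low_confidence_by_class(predictions, cutoff=0.5):
    """Count predictions below the cutoff for each class name."""
    return Counter(name for name, score in predictions if score < cutoff)


# Hypothetical average confidence score per asset in the evaluation dataset.
ASSET_SCORES = [0.92, 0.88, 0.35, 0.42, 0.81, 0.15]

# Hypothetical individual predictions as (class name, confidence score).
PREDICTIONS = [
    ("cat", 0.92), ("cat", 0.88),
    ("dog", 0.35), ("dog", 0.42), ("dog", 0.81),
    ("bird", 0.15),
]

histogram = confidence_histogram(ASSET_SCORES)
weak_classes = low_confidence_by_class(PREDICTIONS)

for band in sorted(histogram):
    print(f"{band / 10:.1f}-{(band + 1) / 10:.1f}: {histogram[band]} asset(s)")
print(weak_classes.most_common())
```

In this sample, "dog" contributes the most low-confidence predictions even though one of its predictions scores well, which is the kind of class-level weakness an overall metric can hide.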
Training Recommendations
[Coming Soon!] The results of each training run will be supplemented by AI-powered recommendations, allowing you to know exactly what's wrong with the model or your dataset, and take action on our platform immediately.