Active Learning

What is Active Learning?

Active learning is an iterative machine learning approach where the model identifies the data points it is least confident about and prioritizes them for human review. Rather than labeling large volumes of data indiscriminately, active learning enables a feedback loop between your deployed model and your annotation workflow, ensuring that every new label contributes the maximum possible improvement to model performance.

When active learning is enabled on an API deployment, each inference request is evaluated using a chosen evaluation strategy. Predictions that fall below a defined confidence threshold are flagged as uncertain and automatically routed back into your Nexus project for review and re-annotation.
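The entropy score behind the default strategy can be sketched in a few lines. This is an illustrative calculation only, not the exact scoring code Nexus runs:

```python
import math

def entropy_score(probs):
    """Shannon entropy of a prediction's class-probability distribution.
    Higher entropy means a flatter distribution, i.e. a less confident model."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A peaked distribution scores low; a near-uniform one scores high.
confident = entropy_score([0.95, 0.03, 0.02])   # ~0.23
uncertain = entropy_score([0.40, 0.35, 0.25])   # ~1.08
```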

Why Active Learning Matters

Training a high-performing computer vision model requires high-quality labeled data. However, labeling is expensive and time-consuming. Active learning addresses this by focusing annotation effort where it matters most:

  • Reduced Labeling Cost: Only uncertain or low-confidence predictions are surfaced for review, dramatically reducing the volume of data that requires manual annotation.
  • Faster Model Improvement: By targeting the samples the model struggles with, each retraining cycle yields larger accuracy gains compared to random sampling.
  • Continuous Learning Loop: Deployed models continuously feed difficult examples back into the training pipeline, creating a self-improving cycle without manual intervention.
  • Better Data Coverage: Active learning naturally identifies edge cases and underrepresented scenarios in your dataset that would otherwise go unnoticed.

How Active Learning Works on Nexus

The active learning pipeline on Nexus operates through the following steps:

  1. Deploy your model as an API: Follow the steps in Deploying Your Trained Model as an API to create an API deployment from your trained artifact.
  2. Configure evaluation settings: During deployment, set the Evaluation Strategy, Evaluation Threshold, and Upload Group (see Configuration Options below).
  3. Run inference: Send prediction requests to your deployed API as described in Using and Managing Your Deployed API.
  4. Automatic evaluation: Each prediction is scored using the selected evaluation strategy. Predictions with scores below the threshold are flagged as uncertain.
  5. Route to annotation: Flagged assets are automatically uploaded to the designated Upload Group in your Nexus project, ready for review and re-annotation.
  6. Retrain and redeploy: Once the newly annotated data is added to your dataset, retrain your model and redeploy to continue the improvement cycle.
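Conceptually, steps 4 and 5 amount to a simple triage over each batch of predictions. The sketch below is hypothetical (`Prediction` and `triage` are illustrative names, and the real evaluation and routing happen server-side in Nexus), but it mirrors the below-threshold rule described above:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    asset_id: str
    score: float  # evaluation score from the chosen strategy

def triage(predictions, threshold):
    """Split a batch into confident results and assets flagged for re-annotation."""
    flagged = [p for p in predictions if p.score < threshold]
    confident = [p for p in predictions if p.score >= threshold]
    return confident, flagged

batch = [Prediction("img_001", 0.92), Prediction("img_002", 0.41)]
confident, flagged = triage(batch, threshold=0.5)
# img_002 would be uploaded to the configured Upload Group for review.
```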

Configuration Options

Active learning is configured during the API deployment creation process. The relevant options are:

  • Evaluation Strategy: The metric used to assess prediction uncertainty. Currently defaults to the entropy score, which measures the disorder or uncertainty in the output probability distribution. Higher entropy indicates lower confidence.
  • Evaluation Threshold: A numeric threshold that determines when a prediction is considered uncertain. Predictions with scores below this threshold are flagged for review. A higher threshold flags more samples; a lower threshold captures only the most uncertain ones.
  • Upload Group: The asset group within your Nexus project where flagged assets are automatically uploaded. Assigning a dedicated group helps organize and track active learning samples separately from your primary dataset.
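As a rough illustration, the three options above map to a configuration like the following. The field names here are illustrative, not the exact Nexus API schema:

```python
active_learning_config = {
    "evaluation_strategy": "entropy_score",    # default strategy
    "evaluation_threshold": 0.5,               # predictions scoring below this are flagged
    "upload_group": "active-learning-review",  # dedicated group for flagged assets
}

def is_valid(config):
    """Basic sanity check before creating the deployment."""
    required = {"evaluation_strategy", "evaluation_threshold", "upload_group"}
    return required <= config.keys() and config["evaluation_threshold"] > 0
```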
📘 Choosing a Threshold

Start with a moderate threshold and adjust based on the volume of flagged samples and your annotation capacity. If too many samples are being flagged, lower the threshold. If model improvement plateaus, raise it to capture more edge cases.
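One way to pick a starting point is to replay recent evaluation scores and see what volume of assets each candidate threshold would flag. A minimal sketch, assuming the below-threshold flagging rule described above:

```python
def flagged_fraction(scores, threshold):
    """Share of predictions that would be routed for review at this threshold."""
    return sum(s < threshold for s in scores) / len(scores)

def fit_threshold(scores, target_fraction):
    """Choose the candidate threshold whose flagged volume best matches
    your annotation capacity (e.g. 'we can review ~40% of traffic')."""
    return min(sorted(scores),
               key=lambda t: abs(flagged_fraction(scores, t) - target_fraction))

scores = [0.1, 0.3, 0.5, 0.7, 0.9]  # recent evaluation scores
starting_threshold = fit_threshold(scores, target_fraction=0.4)
```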

Best Practices

  • Start early: Enable active learning from your first deployment to begin capturing difficult samples immediately.
  • Use dedicated upload groups: Keep active learning samples separate so you can track their impact on retraining.
  • Review flagged samples regularly: The value of active learning depends on timely annotation of the surfaced data.
  • Monitor model metrics across cycles: Track precision, recall, and mAP across retraining iterations to measure the impact of active learning on your model.
  • Adjust thresholds over time: As your model improves, its confidence scores rise and fewer predictions fall below the threshold, so you may need to raise the threshold to continue surfacing meaningful samples.
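To make metric monitoring across cycles concrete, even a minimal log like the following (a sketch, not a Nexus feature) makes plateaus visible:

```python
history = []  # one entry per retraining cycle

def record_cycle(cycle, precision, recall, map50):
    """Log headline metrics after each retrain-and-redeploy cycle."""
    history.append({"cycle": cycle, "precision": precision,
                    "recall": recall, "mAP50": map50})

def improved(metric):
    """Did the latest cycle beat the previous one on this metric?"""
    return len(history) >= 2 and history[-1][metric] > history[-2][metric]

record_cycle(1, precision=0.81, recall=0.74, map50=0.69)
record_cycle(2, precision=0.84, recall=0.78, map50=0.73)
```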