Module : Model
Model Selection and Options
We understand that your projects may have different accuracy requirements, and you may (or may not) be willing to trade accuracy for computational complexity. Therefore, we provide 28 model architectures (with more to come) ranging from lightweight to highly complex models. Selecting the appropriate model for your use case can help you reduce training time and computational cost while maintaining high accuracy. The impact is further detailed in Improving Model Performance.
Each model also has options for training parameters.
Options | Input | Description |
---|---|---|
Batch Size | Any power of 2, e.g. 8 | Number of images or pieces of visual data that your model sees and trains upon at each training step. Your dataset is split into batches of this size, and the model trains on one batch at a time. |
Training Steps | Any non-negative integer, recommended to be at least 500 | Number of training iterations. Each step corresponds to training on a single batch, whose size is set by Batch Size. The number of training epochs is also shown in green, which indicates the number of times your model trains over your whole dataset. |
Max Detections Per Class | Any non-negative integer | Upper bound on the number of instances per class that the model can predict, which limits the size of the list of output predictions. |
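The relationship between Batch Size, Training Steps, and epochs described in the table above can be sketched as simple arithmetic. This is a minimal illustration, not the platform's implementation: the function name `epochs_for` is hypothetical, and we assume that a final partial batch still counts as one step.

```python
import math


def epochs_for(training_steps: int, batch_size: int, dataset_size: int) -> float:
    """Approximate number of passes over the dataset for a given step count.

    Assumption: a final, smaller batch still counts as one training step.
    """
    if min(training_steps, batch_size, dataset_size) <= 0:
        raise ValueError("all arguments must be positive")
    # One epoch is the number of batches needed to cover the dataset once.
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return training_steps / steps_per_epoch


# 1000 images with a batch size of 8 gives 125 steps per epoch,
# so 500 training steps is roughly 4 epochs.
print(epochs_for(training_steps=500, batch_size=8, dataset_size=1000))
```

Doubling the batch size while keeping the same number of training steps roughly doubles the number of epochs, which is worth keeping in mind when comparing runs.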
Advanced Options
There are also advanced options available for tuning specific hyperparameters.
Options | Input | Description |
---|---|---|
Solver / Optimizer | Choice of Momentum, SGD, and Adam, depending on the model architecture. | Algorithm designed to efficiently update the weights of a model during training, typically using gradient descent. |
Learning Rate | Any real number between 0.00001 (1e-5) and 0.1 (1e-1). | Step size at which a model's weights are updated during the training process, effectively controlling how quickly or slowly a model learns from its training data. Larger values may speed up training, but risk non-convergence. Smaller values result in slower convergence, and training results may be sub-optimal if the optimization gets stuck in a local minimum. |
Momentum | Any real number between 0 and 1 | Technique used to accelerate the convergence of the training process by smoothing out the variations in the gradient updates over time. |
Scheduler | WarmCos (more options coming soon!) | Technique used to dynamically adjust the learning rate during the training process. It can help to avoid issues like slow convergence, oscillations, and overshooting the optimal parameter values. |
Checkpoint Selection | Pretrained weights by default, with previously trained valid checkpoints listed below | Checkpoint selection allows previously trained weights of the same model type to be used as the initial weights for a new training. This allows the new training to start from a stronger baseline and helps the model make smoother adjustments to the new dataset. |
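To make the Learning Rate, Momentum, and Scheduler options more concrete, here is a minimal sketch of how they typically interact. We assume that WarmCos stands for a linear warmup followed by cosine decay; the exact schedule used by the platform is not documented here, and the names `warm_cos_lr`, `momentum_step`, and `warmup_steps` are hypothetical.

```python
import math


def warm_cos_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to 0.

    Assumption: this is one common reading of a "warm cosine" scheduler.
    """
    if step < warmup_steps:
        # Ramp up linearly so early updates do not overshoot.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def momentum_step(weight: float, velocity: float, grad: float,
                  lr: float, momentum: float) -> tuple[float, float]:
    """One SGD-with-momentum update for a single scalar weight."""
    # The velocity is a decaying sum of past gradients, which smooths updates.
    velocity = momentum * velocity - lr * grad
    return weight + velocity, velocity


# Minimise f(w) = w^2 (gradient 2w) for a few steps with both pieces combined.
w, v = 1.0, 0.0
for step in range(500):
    lr = warm_cos_lr(step, total_steps=500, base_lr=0.01, warmup_steps=50)
    w, v = momentum_step(w, v, grad=2.0 * w, lr=lr, momentum=0.9)
print(w)
```

The warmup phase keeps the first updates small while the velocity term is still building up, and the cosine phase shrinks the step size as training approaches its final step.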
For Video Classification Model Architectures
Options | Input | Description |
---|---|---|
Frame Size | Any integer between 1 and 120. | The total number of frames in the frame group that is fed to the model during each step. |
Frame Stride | Any integer between 1 and 100. | The sampling interval of the video when creating frame groups. |
Discard Threshold | Any real number between 0.1 and 1.0. | Frame groups with a total number of frames smaller than (Frame Size * Discard Threshold) are discarded; those above have their last frame duplicated to meet size requirements. |
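The three video options above can be pictured with a short sketch that builds frame groups from frame indices. This is an illustration under stated assumptions, not the platform's implementation: we assume the video is sampled every Frame Stride frames, that sampled frames are split into non-overlapping groups of Frame Size, and that a group exactly at the threshold is kept. The function name `make_frame_groups` is hypothetical.

```python
def make_frame_groups(num_frames: int, frame_size: int,
                      frame_stride: int, discard_threshold: float) -> list[list[int]]:
    """Return groups of frame indices for a video with `num_frames` frames."""
    # Sample one frame every `frame_stride` frames.
    sampled = list(range(0, num_frames, frame_stride))
    groups = []
    for start in range(0, len(sampled), frame_size):
        group = sampled[start:start + frame_size]
        if len(group) < frame_size * discard_threshold:
            continue  # too short to be useful, so discard it
        # Pad a short group by duplicating its last frame.
        group += [group[-1]] * (frame_size - len(group))
        groups.append(group)
    return groups


# 20 frames, stride 2 -> 10 sampled frames -> two full groups and one short one.
print(make_frame_groups(20, frame_size=4, frame_stride=2, discard_threshold=0.5))
print(make_frame_groups(20, frame_size=4, frame_stride=2, discard_threshold=0.75))
```

With a threshold of 0.5 the short final group of two frames is padded to four; with 0.75 it is dropped because it has fewer than three frames.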
General Model Selection Tips
In general, larger dimensions at the end of a model's name indicate a more robust, complex model that can take in more data and is thus capable of learning more complex features for prediction. If you need higher accuracy and more complexity for your use case, then you should opt for higher dimensionality. However, if compactness and quicker training and inference are more important to you, then you should consider smaller dimensions.
Names such as ResNet, MobileNet, or InceptionV2 within a model name refer to different backbone models. The backbone is responsible for extracting image features that the rest of the model then uses for its own processes. Following the same trend as described above, MobileNet is the most compact and ResNet is in the middle. The number next to ResNet, like 50 in ResNet50, indicates how many layers the model has, so the higher the number, the more complex it is. The most complex and robust is InceptionV2.
Model outputs differ based on the task that they are designed to solve. Datature currently offers models for the following tasks:
Task | Subtype | Description | Output |
---|---|---|---|
Classification | Image | Classifies images with tags. | Outputs class tags. |
Classification | Video | Classifies videos with tags. | Outputs class tags. |
Object Detection | | Identifies objects in an image with bounding boxes and class tags. | Outputs bounding box coordinates and a class tag for each detected instance. |
Semantic Segmentation | | Describes which regions of pixels correspond to specific classes. | Outputs a mask array where each pixel has a value that is associated with a class. |
Instance Segmentation | | Describes which regions of pixels correspond to individual class instances. | Outputs a list of polygons with their associated class. |
Keypoint Detection | | Describes the pose and structure of an object using groups of keypoints joined together to form a skeleton. | Outputs a list of keypoints for each object with their associated class. |
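To picture an object detection output and how the Max Detections Per Class option might act on it, here is a minimal sketch. The `Detection` structure, its confidence `score` field, the box coordinate order, and the function `cap_per_class` are all assumptions made for illustration; they are not the platform's actual output format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Detection:
    class_tag: str
    score: float  # assumed confidence score, not described in the table above
    box: tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def cap_per_class(detections: list[Detection], max_per_class: int) -> list[Detection]:
    """Keep at most `max_per_class` highest-scoring detections for each class."""
    kept = []
    counts = defaultdict(int)
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if counts[det.class_tag] < max_per_class:
            counts[det.class_tag] += 1
            kept.append(det)
    return kept


detections = [
    Detection("cat", 0.9, (0, 0, 10, 10)),
    Detection("cat", 0.8, (5, 5, 15, 15)),
    Detection("cat", 0.3, (20, 20, 30, 30)),
    Detection("dog", 0.7, (40, 40, 60, 60)),
]
print([(d.class_tag, d.score) for d in cap_per_class(detections, max_per_class=2)])
```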
Models
Classification
YOLOv8-CLS
YOLOv8 extends and improves upon previous versions of the YOLO family of algorithms, which is known for its real-time object detection capabilities. YOLOv8 builds upon the concepts of the original YOLO algorithm, aiming to improve both accuracy and speed. It incorporates advancements such as feature pyramid networks, spatial attention modules, and other architectural improvements to enhance the detection performance. YOLOv8-CLS contains a classification head used for image classification tasks.
Architecture | Resolution |
---|---|
YOLOv8-CLS Nano | 80x80, 320x320, 640x640, 1280x1280, 1600x1600, 1920x1920 |
YOLOv8-CLS Small | 80x80, 320x320, 640x640, 1280x1280, 1600x1600, 1920x1920 |
YOLOv8-CLS Medium | 80x80, 320x320, 640x640, 1280x1280, 1600x1600, 1920x1920 |
YOLOv8-CLS Large | 80x80, 320x320, 640x640, 1280x1280, 1600x1600, 1920x1920 |
YOLOv8-CLS Xtra | 80x80, 320x320, 640x640, 1280x1280, 1600x1600, 1920x1920 |
MoViNet
MoViNet is a family of CNN model architectures with a focus on efficient video recognition, particularly suited for mobile devices. Its design prioritizes computational efficiency while maintaining high accuracy, making it ideal for tasks like real-time video analysis on smartphones and tablets.
Architecture | Resolution |
---|---|
MoViNet A0 | 172x172 |
MoViNet A1 | 172x172 |
MoViNet A2 | 224x224 |
MoViNet A3 | 256x256 |
MoViNet A4 | 290x290 |
MoViNet A5 | 320x320 |
Object Detection
YOLOv9 [New!]
YOLOv9 extends and improves upon previous versions of the YOLO family of algorithms, which is known for its real-time object detection capabilities. YOLOv9 builds upon the concepts of the original YOLO algorithm, aiming to improve both accuracy and speed. It incorporates advancements such as Programmable Gradient Information and the Generalized Efficient Layer Aggregation Network to enhance the detection performance.
Architecture | Resolution |
---|---|
YOLOv9 Compact | |
YOLOv9 Extended |
YOLOv8
YOLOv8 extends and improves upon previous versions of the YOLO family of algorithms, which is known for its real-time object detection capabilities. YOLOv8 builds upon the concepts of the original YOLO algorithm, aiming to improve both accuracy and speed. It incorporates advancements such as feature pyramid networks, spatial attention modules, and other architectural improvements to enhance the detection performance.
Architecture | Resolution |
---|---|
YOLOv8 Nano | 320x320, 640x640, 1280x1280, 2048x2048 |
YOLOv8 Small | 320x320, 640x640, 1280x1280, 2048x2048 |
YOLOv8 Medium | 320x320, 640x640, 1280x1280 |
YOLOv8 Large | 320x320, 640x640, 1280x1280 |
YOLOv8 Xtra | 320x320, 640x640, 1280x1280 |
RetinaNet
RetinaNet is a one-stage object detection model that uses a focal loss function to address class imbalance in the training dataset. It performs well on dense and small-scale objects.
Architecture | Resolution |
---|---|
RetinaResNet50 | 640x640, 1024x1024 |
RetinaResNet101 | 640x640, 1024x1024 |
RetinaResNet152 | 640x640, 1024x1024 |
Retina MobileNetV2 | 320x320, 640x640 |
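The focal loss that RetinaNet relies on can be sketched for a single binary prediction as follows. This is a minimal illustration of the general formula, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t); the default values alpha=0.25 and gamma=2.0 are the ones commonly quoted for focal loss, and we do not know which values the platform uses.

```python
import math


def binary_focal_loss(p: float, y: int, alpha: float = 0.25,
                      gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Focal loss for one predicted probability `p` and a label `y` in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    p_t = p if y == 1 else 1.0 - p   # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified examples,
    # so abundant easy examples do not dominate training.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, 1)  # confident and correct
hard = binary_focal_loss(0.10, 1)  # confident and wrong
print(easy, hard)
```

Setting `gamma` to 0 removes the down-weighting and leaves an alpha-weighted cross-entropy, which is a useful sanity check when experimenting with the formula.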
FasterRCNN
Faster R-CNN introduces a Region Proposal Network (RPN) that shares convolutional features with the detection network, enabling low-cost region proposals. The RPN is then merged with Fast R-CNN (another end-to-end object detection network built for quick object detection) into a single unified network to achieve high-quality, rapid object detection results.
Architecture | Resolution |
---|---|
FasterRCNN ResNet50 | 640x640, 1024x1024 |
FasterRCNN ResNet101 | 640x640, 1024x1024 |
FasterRCNN ResNet152 | 640x640, 1024x1024 |
FasterRCNN InceptionV2 | 640x640, 1024x1024 |
EfficientDet
EfficientDet is another object detection model, which relies on optimizations and scalable tweaks rather than additional modules to improve object detection. Its main advantages are its efficiency and its ability to scale adaptively.
Architecture | Resolution |
---|---|
EfficientDetD1 | 640x640 |
EfficientDetD2 | 768x768 |
EfficientDetD3 | 896x896 |
EfficientDetD4 | 1024x1024 |
EfficientDetD5 | 1280x1280 |
EfficientDetD6 | 1408x1408 |
EfficientDetD7 | 1536x1536 |
YOLOv4 [DEPRECATED]
YOLOv4 is a one-stage object detection model running on DarkNet that improves the trade-off between detection speed and accuracy.
Architecture | Resolution |
---|---|
YOLOv4 DarkNet | 320x320, 640x640 |
YOLOX [DEPRECATED]
YOLOX is an anchor-free version of YOLO that makes several modifications to YOLOv3, resulting in a simpler design with better performance.
Architecture | Resolution |
---|---|
YOLOX Small | 320x320, 640x640 |
YOLOX Medium | 320x320, 640x640 |
YOLOX Large | 320x320, 640x640 |
Semantic Segmentation
DeepLabV3 Semantic Segmentation
DeepLabv3 is a semantic segmentation architecture with improvements to handle the problem of segmenting objects at multiple scales.
Architecture | Resolution |
---|---|
DeepLabV3 ResNet50 | 320x320, 640x640, 1024x1024, 1600x1600, 1920x1920 |
DeepLabV3 ResNet101 | 320x320, 640x640, 1024x1024, 1600x1600, 1920x1920 |
DeepLabV3 MobileNetV3 | 320x320, 640x640, 1024x1024, 1600x1600, 1920x1920 |
UNet Semantic Segmentation
U-Net is a semantic segmentation architecture. It consists of a contracting path and an expansive path that consider both typical features from a convolutional network and a progressively upsampled feature map to improve detail.
Architecture | Resolution |
---|---|
UNet ResNet50 | 320x320, 640x640, 1024x1024, 1600x1600, 1920x1920 |
FCN Semantic Segmentation
Fully Convolutional Network is a semantic segmentation architecture. It exclusively uses locally connected layers, such as convolution, pooling, and upsampling, and avoids the use of dense layers. This makes it faster to train and reduces parameter size.
Architecture | Resolution |
---|---|
FCN ResNet50 | 320x320, 640x640, 960x960, 1280x1280, 1600x1600, 1920x1920 |
FCN ResNet101 | 320x320, 640x640, 960x960, 1280x1280, 1600x1600, 1920x1920 |
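To see why avoiding dense layers reduces parameter size, the sketch below compares the parameter count of a 1x1 convolution with that of a dense layer applied to a flattened feature map. The layer sizes (256 channels, 21 classes, 10x10 and 20x20 feature maps) are illustrative assumptions and do not reflect the platform's actual FCN configuration.

```python
def conv2d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights plus biases of a square 2D convolution."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


channels, classes = 256, 21
for side in (10, 20):
    # The convolution reuses the same weights at every spatial location,
    # so its size does not depend on the feature map resolution.
    conv = conv2d_params(channels, classes, kernel_size=1)
    # The dense layer needs one weight per flattened input value.
    dense = dense_params(side * side * channels, classes)
    print(f"{side}x{side} feature map: conv={conv} params, dense={dense} params")
```

The convolution's parameter count stays constant as the feature map grows, while the dense layer's count scales with the number of spatial positions.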
Instance Segmentation
YOLOv8-SEG
YOLOv8 extends and improves upon previous versions of the YOLO family of algorithms, which is known for its real-time object detection capabilities. YOLOv8 builds upon the concepts of the original YOLO algorithm, aiming to improve both accuracy and speed. It incorporates advancements such as feature pyramid networks, spatial attention modules, and other architectural improvements to enhance the detection performance. YOLOv8-SEG contains a segmentation head used for instance segmentation tasks.
Architecture | Resolution |
---|---|
YOLOv8-SEG Nano | 320x320, 640x640, 1280x1280, 1600x1600, 1920x1920 |
YOLOv8-SEG Small | 320x320, 640x640, 1280x1280, 1600x1600, 1920x1920 |
YOLOv8-SEG Medium | 320x320, 640x640, 1280x1280, 1600x1600, 1920x1920 |
YOLOv8-SEG Large | 320x320, 640x640, 1280x1280, 1600x1600 |
YOLOv8-SEG Xtra | 320x320, 640x640, 1280x1280, 1920x1920 |
MaskRCNN
MaskRCNN is Datature's instance segmentation model designed for predicting segmentation masks using RCNN as the base model.
Architecture | Resolution |
---|---|
MaskRCNN InceptionV2 | 1024x1024 |
Keypoint Detection
YOLOv8-Pose
Architecture | Resolution |
---|---|
YOLOv8-Pose Nano | 320x320, 640x640, 1280x1280, 1600x1600, 1920x1920 |
YOLOv8-Pose Small | 320x320, 640x640, 1280x1280, 1600x1600, 1920x1920 |
YOLOv8-Pose Medium | 320x320, 640x640, 1280x1280, 1600x1600, 1920x1920 |
YOLOv8-Pose Large | 320x320, 640x640, 1280x1280, 1600x1600, 1920x1920 |
YOLOv8-Pose Xtra | 320x320, 640x640, 1280x1280, 1600x1600, 1920x1920 |