Module: Dataset
Dataset Setup
Every workflow requires a Dataset module. To create one, right-click on the blank canvas, scroll to Datasets, and select the Dataset that you want your model to use. To see its options, select the Dataset block and a side menu will appear on the right, as shown below.
Once the module appears on the canvas, you can select it and change the following settings.
Options | Input | Description |
---|---|---|
Train-Test Split | 0.0 to 1.0 | The proportion of your dataset that is randomly held out for model evaluation; the remaining data is used for training. Enter a value between 0 and 1, e.g. 0.3 reserves 30% of your assets for testing. |
Shuffle Dataset | Enable/Disable | Randomly reorders the data before it is used, so the model receives the same data but not always in the same order. This increases variability and robustness. |
Random Seed | Any non-negative integer, e.g. 3338 | Used for reproducibility in experiments. Setting the Random Seed to the same value produces the same random selection and ordering, provided all other variables and settings are unchanged. |
Include Background | Enable/Disable | Includes all assets designated as background in training, without annotations. This is typically used for negative class sampling, letting the model train on negative examples where objects and regions of interest are absent, which can help minimize false positives. Disabled by default. |
Include Unlabelled Assets | Enable/Disable | Automatically designates unlabelled assets as background and includes them in training. This option is only shown when Include Background is enabled. |
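The first three options follow the standard train-test split pattern used across ML tooling. As a rough sketch of the behaviour (plain Python for illustration, not this platform's code; the platform's actual splitting logic may differ):

```python
import random

def split_dataset(assets, test_split=0.3, shuffle=True, seed=3338):
    """Hold out `test_split` of the assets for evaluation; use the rest for training."""
    items = list(assets)
    if shuffle:
        # Random Seed keeps the shuffle (and therefore the split) reproducible
        random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_split)  # Train-Test Split proportion
    return items[n_test:], items[:n_test]  # (train, test)

assets = [f"image_{i:03d}.jpg" for i in range(10)]
train, test = split_dataset(assets, test_split=0.3)
print(len(train), len(test))  # 7 3
```

Running the same function twice with the same seed yields the identical split, which is exactly what fixing the Random Seed gives you in the module.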
Advanced Options
Options | Description |
---|---|
Define Asset Groups | Select the Asset Groups to use for the training dataset. For example, if you have asset groups `main`, `Dataset 1`, and `Dataset 2`, where `main` usually contains all images and `Dataset 1` and `Dataset 2` contain specific subsets of the overall dataset, you can select only `Dataset 1` and/or `Dataset 2`. |
Define Annotation Tags | Select the specific annotation tags you want to train on. For example, if you have the tags `RBC`, `WBC`, and `Platelets` and want a model that only detects `RBC` and `Platelets`, select those two tags here. |
Define Images for Evaluation | Specify images by file name to be used by tools such as Advanced Evaluation for Model Performance. This gives you finer control over how you evaluate models and acts as a benchmark dataset. You can name up to 9 images for evaluation. |
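Conceptually, Define Annotation Tags filters the training annotations down to the selected tags. A minimal sketch of that idea (the annotation records and their dict layout here are hypothetical, not the platform's internal format):

```python
# Hypothetical annotation records, for illustration only
annotations = [
    {"image": "blood_01.jpg", "tag": "RBC"},
    {"image": "blood_01.jpg", "tag": "WBC"},
    {"image": "blood_02.jpg", "tag": "Platelets"},
]

selected_tags = {"RBC", "Platelets"}  # the Define Annotation Tags selection

# Only annotations with a selected tag reach the training dataset
training_annotations = [a for a in annotations if a["tag"] in selected_tags]
print([a["tag"] for a in training_annotations])  # ['RBC', 'Platelets']
```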
Sliding Window Setup
The sliding window option slices your images into smaller crops so that your model can focus on each crop individually. This helps the model detect relatively small objects, and is well suited to image inputs containing small objects or with very high resolutions. To use the Sliding Window setup, select the toggle to Enable Sliding Window.
Advanced Options
Sliding Window also has the following advanced options which can be found through Show Advanced Options:
Options | Values | Description |
---|---|---|
Sliding Window Crop | 320x320, 640x640, 960x960, 1024x1024, 1280x1280, 1600x1600, 1920x1920 | The size of the sliding window crops. Because the crops are fed directly into your models, you are limited to these specific input resolutions. |
Sliding Window Width Overlap Ratio | 0.1 to 0.95 | The proportion by which adjacent crops overlap horizontally. |
Sliding Window Height Overlap Ratio | 0.1 to 0.95 | The proportion by which adjacent crops overlap vertically. |
Annotation Completeness Threshold | 0.0 to 1.0 | Removes sliced annotations whose Intersection over Union (IoU) with their original annotation falls below this threshold. |
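The overlap ratios and the completeness threshold can be sketched with a little arithmetic. This is an illustrative Python sketch, not the platform's implementation; the exact crop placement and edge handling may differ:

```python
def window_origins(length, crop, overlap):
    """Start offsets of sliding-window crops along one image axis."""
    stride = max(1, int(crop * (1 - overlap)))   # overlap ratio -> step size
    origins = list(range(0, max(length - crop, 0) + 1, stride))
    if origins[-1] + crop < length:              # add a final edge-aligned window
        origins.append(length - crop)
    return origins

def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# 1920x1080 image, 640x640 crops, width overlap 0.5, height overlap 0.25
xs = window_origins(1920, 640, 0.5)
ys = window_origins(1080, 640, 0.25)
print(xs)  # [0, 320, 640, 960, 1280]
print(ys)  # [0, 440]

# Annotation Completeness Threshold: an annotation clipped by the crop at
# (0, 440) keeps only part of its area; its IoU with the original
# annotation decides whether the sliced copy is kept.
original = (600, 300, 700, 500)                   # full annotation box
crop_box = (0, 440, 640, 1080)                    # crop at x=0, y=440
clipped = (max(600, 0), max(300, 440), min(700, 640), min(500, 1080))
print(round(iou(clipped, original), 2))  # 0.12 -> dropped at threshold 0.7
```

With a width overlap of 0.5, each 640-pixel crop starts 320 pixels after the previous one, so a small object near a crop boundary still appears whole in at least one crop.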
Preview Sliding Window
You can also preview your sliding window setup by selecting Preview Sliding Window, which appears at the bottom of the page once the sliding window option is enabled.
To learn more about why you might want to use sliding windows, you can read this blog here!
Common Questions
How do I decide what Train-Test Split to select?
Literature typically suggests a value between 0.2 and 0.3. However, the right choice depends on your use case and training results. If, after Evaluating Model Performance or Generating Predictions, you notice that the evaluation metrics are not representative of training or test inference, you may want to increase the size of your test split to determine whether your model is being evaluated correctly.
Will we be able to use different datasets?
Yes. You can use the advanced options above to select groups of data within your dataset for training, and to select specific images for evaluation.