Module : Dataset

Dataset Setup

Every workflow requires a Dataset module. To create one, right-click on the blank canvas, scroll to Datasets, and select the Dataset that you want your model to use. To see its options, select the Dataset block and a side menu will appear on the right.

Once the module appears on the canvas, you can select the module and change the following settings.

| Option | Input | Description |
| --- | --- | --- |
| Train-Test Split | 0.0 to 1.0 | The proportion of your dataset, selected at random, that is set aside for model evaluation. The remaining data is used for training. Enter a value between 0 and 1. |
| Shuffle Dataset | Enable/Disable | Randomly reorders the data in the dataset, so that the model receives the same data but not always in the same order, which increases variability and robustness. |
| Random Seed | Any non-negative integer, e.g. 3338 | Typically used in experiments to allow for reproducibility. If you set the Random Seed to the same value, the random generation or selection will be the same, provided all other variables and settings are the same. |
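
To make the interaction between Train-Test Split and Random Seed concrete, here is a minimal sketch in plain Python. It is not the platform's implementation: the function name `split_dataset`, the rounding rule, and the use of Python's `random` module are assumptions made for illustration only.

```python
import random


def split_dataset(items, test_ratio, seed=None):
    """Randomly set aside `test_ratio` of the items for evaluation (hypothetical helper)."""
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0.0 and 1.0")
    items = list(items)
    # A dedicated Random instance keeps the seed local to this call.
    rng = random.Random(seed)
    n_test = round(len(items) * test_ratio)  # rounding rule is an assumption
    test_idx = set(rng.sample(range(len(items)), n_test))
    train = [x for i, x in enumerate(items) if i not in test_idx]
    test = [x for i, x in enumerate(items) if i in test_idx]
    return train, test


images = [f"img_{i:03d}.jpg" for i in range(10)]
train_a, test_a = split_dataset(images, test_ratio=0.3, seed=3338)
train_b, test_b = split_dataset(images, test_ratio=0.3, seed=3338)

print(len(train_a), len(test_a))  # 7 3
print(test_a == test_b)           # True: same seed, same selection
```

With the same seed and the same inputs, both calls pick the same evaluation images, which is the reproducibility the Random Seed setting describes.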

Advanced Options

| Option | Description |
| --- | --- |
| Define Asset Groups | Select the Asset Groups to use for the training dataset. For example, if you have asset groups main, Dataset 1, and Dataset 2, where main usually contains all images and Dataset 1 and Dataset 2 contain specific sets of images within the overall dataset, you can select only Dataset 1 and/or Dataset 2. |
| Define Annotation Tags | Select the annotation tags that you want to train on. For example, if you have tags RBC, WBC, and Platelets and you want to train a model that detects only RBC and Platelets, select those two tags here. |
| Define Images for Evaluation | Specify images by file name to be used by tools such as Advanced Evaluation for Model Performance. This gives finer control over how you evaluate models, and the selected images act as a benchmark dataset. You can name up to 9 images for evaluation. |
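
The effect of Define Asset Groups and Define Annotation Tags can be pictured as a filter over images and their annotations. The sketch below is illustrative only: the dictionary layout and the `select_training_data` helper are hypothetical, and whether the platform keeps images that end up with no annotations is an assumption here.

```python
def select_training_data(images, asset_groups, tags):
    """Keep images in any selected asset group and annotations with a selected tag."""
    selected = []
    for image in images:
        if not asset_groups & set(image["groups"]):
            continue  # image belongs to none of the selected groups
        kept = [a for a in image["annotations"] if a["tag"] in tags]
        # Assumption: images are kept even if no annotations remain.
        selected.append({**image, "annotations": kept})
    return selected


images = [
    {"filename": "a.jpg", "groups": ["main", "Dataset 1"],
     "annotations": [{"tag": "RBC"}, {"tag": "WBC"}]},
    {"filename": "b.jpg", "groups": ["main", "Dataset 2"],
     "annotations": [{"tag": "Platelets"}]},
    {"filename": "c.jpg", "groups": ["main"],
     "annotations": [{"tag": "RBC"}]},
]

training = select_training_data(
    images, asset_groups={"Dataset 1", "Dataset 2"}, tags={"RBC", "Platelets"}
)
for image in training:
    print(image["filename"], [a["tag"] for a in image["annotations"]])
# a.jpg ['RBC']
# b.jpg ['Platelets']
```

Here c.jpg is dropped because it is only in main, and the WBC annotation on a.jpg is dropped because WBC was not selected.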

Sliding Window Setup

The sliding window option slices your images into smaller crops so that your model can focus on individual crops in your dataset.

This helps your model detect relatively small objects, and is well suited to images that contain small objects or have very high resolutions. To use the Sliding Window setup, select the Enable Sliding Window toggle.

Advanced Options

Sliding Window also has the following advanced options which can be found through Show Advanced Options:

| Option | Values | Description |
| --- | --- | --- |
| Sliding Window Crop | 320x320, 640x640, 960x960, 1024x1024, 1280x1280, 1600x1600, 1920x1920 | Determines the size of the sliding window that is applied. Because the crops are used to train your models, you are limited to the resolutions that can be fed into the models. |
| Sliding Window Width Overlap Ratio | 0.1-0.95 | Determines the proportion of each crop that overlaps horizontally with the adjacent crop. |
| Sliding Window Height Overlap Ratio | 0.1-0.95 | Determines the proportion of each crop that overlaps vertically with the adjacent crop. |
| Annotation Completeness Threshold | 0.0-1.0 | Removes sliced annotations whose Intersection over Union (IoU) with their original annotation is below the threshold. |
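
The following sketch shows one way these options could combine. It is an illustration under stated assumptions, not the platform's algorithm: the stride is taken as crop size times (1 - overlap ratio), a final crop is placed flush with the image edge, and a sliced annotation is taken to be the original bounding box clipped to the crop. Under that last assumption the clipped box lies entirely inside the original, so its IoU with the original reduces to clipped area divided by original area. All function names are hypothetical.

```python
def window_starts(length, crop, overlap):
    """Start offsets of crops along one axis."""
    if crop >= length:
        return [0]
    stride = max(1, round(crop * (1 - overlap)))
    starts = list(range(0, length - crop + 1, stride))
    if starts[-1] != length - crop:
        starts.append(length - crop)  # assumption: last crop sits flush with the edge
    return starts


def sliding_windows(width, height, crop, overlap_w, overlap_h):
    """All crops as (x_min, y_min, x_max, y_max), row by row."""
    return [(x, y, x + crop, y + crop)
            for y in window_starts(height, crop, overlap_h)
            for x in window_starts(width, crop, overlap_w)]


def completeness(box, window):
    """IoU of a box clipped to the window against the original box."""
    inter_w = max(0, min(box[2], window[2]) - max(box[0], window[0]))
    inter_h = max(0, min(box[3], window[3]) - max(box[1], window[1]))
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter_w * inter_h / area if area > 0 else 0.0


windows = sliding_windows(1920, 1080, crop=640, overlap_w=0.5, overlap_h=0.5)
print(len(windows))  # 15 (5 columns x 3 rows)

box = (600, 100, 700, 200)  # a 100x100 annotation near a crop boundary
threshold = 0.5
print(completeness(box, (0, 0, 640, 640)))    # 0.4 -> below threshold, removed
print(completeness(box, (320, 0, 960, 640)))  # 1.0 -> kept
```

A higher overlap ratio produces more crops, which makes it more likely that an object cut by one crop boundary appears whole in a neighbouring crop.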

Preview Sliding Window

You can also preview your sliding window setup by selecting Preview Sliding Window, which appears at the bottom of the page after you enable the sliding window option.

To learn more about why you might want to use sliding windows, you can read this blog here!


Common Questions

How do I decide what Train-Test Split to select?

Literature typically suggests a value between 0.2 and 0.3. However, the right value depends on your use case and the results of your training. If, after Evaluating Model Performance or Generating Predictions, you notice that the evaluation metrics do not reflect how the model behaves in training or in test inference, you may want to increase the size of your test split to check whether your model is being evaluated correctly.

Will we be able to use different datasets?

You can now use the Dataset module's advanced options (Define Asset Groups and Define Images for Evaluation) to select groups of data within your dataset for training and to select specific images for evaluation.