What is Deep Learning?

Machine learning is an approach in which models are trained on training input data to learn to accomplish certain tasks in that data space. Deep learning is a subsection of machine learning based on artificial neural networks as a way to learn representations of the space of interest. Deep learning architectures like convolutional neural networks have been the foundation of the wave of state-of-the-art improvements in computer vision tasks and challenges. In this section, we will outline the general process of a deep learning pipeline, different deep learning architectures and techniques, and why you would want to choose a deep learning approach to solve your computer vision tasks or problems.

Machine Learning Pipeline

Machine learning pipelines begin with the development of a training dataset. It is important to have a high quality, representative, well annotated training dataset for the computer vision task that the model is meant to solve.

Once a dataset has been compiled, one can artificially augment the dataset with data modifying techniques, such that the model receives more diverse data input and is more robust towards different data inputs.

Once the data is prepared, one typically needs to select or design a deep neural network that is suitable for your use case. Some general considerations are how robust you need the model to be, how long you can train the model for, and whether the model produces an output that is tuned for the context of your task.

Finally, the model can be trained through various settings. Once the model is trained, you should evaluate the model on a separate test dataset that is distinct from your training set. From there you might want to make changes to your model setup based on the results of the evaluation and retrain the model.

Once you are happy with your trained model, it can now be implemented and deployed to your use case.

Deep Learning Architecture

The base version of an artificial neural network is a multilayer perceptron (MLP) neural network, which is a series of layers of neurons that are all connected to each of the neurons in the next layer. This means that every neuron in the network is fully connected in some way to another neuron, so in theory, given enough neurons, an MLP can be tuned to model any function because it has enough adjustable parameters to adapt to any input. However, in practice, this means that MLPs are susceptible to overfitting to their training data and don't generalize well to related but unseen data input.

Convolutional neural networks are a type of artificial neural network that are used ubiquitously in computer vision deep learning techniques. They are essentially regularized versions of MLPs in that they take advantage of hierarchical patterns in data to collate patterns of increasing complexity using smaller and simpler pattern detectors in the form of small filters that go through the data. This kind of hierarchical pattern extraction is well suited for extracting multi-level features in visual data, going from tracking edges and contours initially to full object representations in later stages.

State-of-the-art deep learning computer vision approaches are all reliant on this feature extraction fundamentally, and are continuing to improve in the traditional Computer Vision tasks but also in novel applications as well.

If you are ready to try out the power of deep learning yourself, go to Quickstart . If you want to see the incredible feats that have been accomplished with deep learning and computer vision, go to Use Cases.