1. Introduction

In this article, we’ll explain what the term “depth” refers to in convolutional neural networks. We’ll also explain the difference between the depth of the whole neural network and the depth of a single convolutional layer.

2. Neural Networks

Neural networks are algorithms inspired by biological neural networks. Their basic building blocks are neurons, which interconnect in a way that depends on the type of network. Initially, the idea was to create an artificial system that would function just like the human brain.

There are many types of neural networks, but they roughly fall into three main classes:

For the most part, the difference between them is the type of neurons that form them and how the information flows through the network. In this article, we’ll briefly explain only convolutional neural networks.

3. Convolutional Neural Networks

Convolutional neural networks (CNNs) are a type of artificial neural network, which is a machine learning technique. They’ve been around for a while but have recently gained more exposure because of their success in image recognition. A convolutional neural network is a powerful tool that we can use to process any data to which the convolution operation can be applied.

CNNs owe their success to their ability to process large amounts of data such as images, videos, and text. Primarily, we can use them to classify images, localize objects, and extract features from an image, such as edges or corners. They’re typically composed of one or more hidden layers, each of which contains a set of learnable filters, also called kernels.

4. Depth in Convolutional Neural Networks

When it comes to CNNs, the term “depth” appears in the literature in two contexts:

  • Depth of the whole neural network.
  • Depth of a single convolutional layer.

Below, we’ll explain both of the concepts.

4.1. Neural Network Depth

Neural networks consist of layers, where each layer has multiple neurons. The number of layers in a neural network defines its depth. Also, a neural network must have at least two layers:

  • Input layer – it brings the input data into the system and represents the beginning of the neural network architecture.
  • Output layer – this is the last layer in the neural network, and it produces the result of the model.

In addition, all layers other than the input and output layers are hidden layers. CNNs commonly have around five to ten layers, but some modern architectures have a hundred layers or more.
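To make the counting concrete, here is a minimal sketch that follows the convention used above, where every layer (input and output included) contributes to the depth. The layer names are hypothetical and only illustrate the idea; note that other sources count only the layers with learnable weights.

```python
# Hypothetical layer list for a small CNN; the names are illustrative only.
layers = ["input", "conv", "pool", "conv", "pool", "dense", "output"]

# Network depth under the convention used in this article: the number of layers.
depth = len(layers)

# Hidden layers are all layers other than the input and output layers.
hidden = [name for name in layers if name not in ("input", "output")]

print(depth)        # 7
print(len(hidden))  # 5
```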

4.2. Depth of the Convolutional Layer

A convolutional layer is a layer where we apply filters to the input images or tensors. We can visualize this process more intuitively by looking at the following figure:

[Figure: convolution of matrix I with filter K]

The figure above shows the matrix I, to which we apply the convolution using filter K. This means that filter K slides over matrix I, and at each position we multiply, element by element, the entries of K with the corresponding entries of I. Then we sum the results of this multiplication into a single number.
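The sliding-window operation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not an optimized implementation: the function name conv2d_valid and the example values of I and K are our own assumptions, no padding is used, and, as in most deep learning libraries, the kernel isn't flipped (strictly speaking, this is cross-correlation).

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image`; at each position, sum the element-wise products."""
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (n_rows - k_rows) // stride + 1
    out_cols = (n_cols - k_cols) // stride + 1
    result = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        result.append(row)
    return result


# Example values chosen only for illustration.
I = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
K = [
    [1, 0],
    [0, 1],
]
R = conv2d_valid(I, K)
print(R)  # [[2, 4, 6], [0, 2, 4], [6, 0, 2]]
```

Each entry of R is one "sum into a number": for example, the top-left entry is 1*1 + 2*0 + 0*0 + 1*1 = 2.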

Usually, the inputs to a CNN are color images. They consist of three channels that represent the intensity of red, green, and blue. Every pixel in the image combines these three colors, where the intensity of each color is described by an integer from 0 to 255.
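As a small illustration, we can store a color image as a nested list in which each pixel is a (red, green, blue) triple. The 2 x 2 image below is made up for this sketch, and the height-by-width-by-channel layout is only one common convention.

```python
# A tiny 2 x 2 color image; each pixel is an (R, G, B) triple with values in 0..255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

height = len(image)
width = len(image[0])
depth = len(image[0][0])  # number of color channels

# Extract the red channel as a separate height x width matrix.
red_channel = [[pixel[0] for pixel in row] for row in image]

print(height, width, depth)  # 2 2 3
print(red_channel)           # [[255, 0], [0, 255]]
```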

Hence, input images have a width, a height, and a depth. The depth of the input image defines the depth of the input layer. Consequently, the depth of the second layer depends on the number of kernels we apply to the input layer.

For instance, let the input image I have dimensions 6\times 6\times 3 and the filter, or kernel, K have dimensions 3\times 3\times 3. Notice that the kernel depth must be the same as the depth of the input image. Let the convolution between I and K be the matrix R. With a kernel stride of one and no padding, the kernel fits into 6 - 3 + 1 = 4 positions along each side, so the matrix R has dimensions 4\times 4\times 1.
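We can check these dimensions with a short sketch. It assumes a square kernel, no padding, and an image stored as height by width by depth; the function name conv_single_filter and the all-ones values are hypothetical and chosen only so that the result is easy to verify by hand. The output side length follows the usual rule (n - f) / s + 1 for input size n, kernel size f, and stride s.

```python
def conv_single_filter(image, kernel, stride=1):
    """Convolve an H x W x D image with an F x F x D kernel; returns a 2-D map."""
    height, width, depth = len(image), len(image[0]), len(image[0][0])
    k_size = len(kernel)  # assumes a square kernel
    assert len(kernel[0][0]) == depth, "kernel depth must match input depth"
    out_h = (height - k_size) // stride + 1
    out_w = (width - k_size) // stride + 1
    return [
        [
            # Sum over the kernel window and over all depth channels.
            sum(
                image[r * stride + i][c * stride + j][d] * kernel[i][j][d]
                for i in range(k_size)
                for j in range(k_size)
                for d in range(depth)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# 6 x 6 x 3 input and 3 x 3 x 3 kernel, both filled with ones for illustration.
I = [[[1] * 3 for _ in range(6)] for _ in range(6)]
K = [[[1] * 3 for _ in range(3)] for _ in range(3)]

R = conv_single_filter(I, K)
print(len(R), len(R[0]))  # 4 4
print(R[0][0])            # 27
```

Every output value is 27 here because each window covers 3 x 3 x 3 = 27 ones. The sum runs over all three channels, which is why a single filter yields a result of depth one.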

Similarly, if we apply two filters K_{1} and K_{2}, we’ll get two result matrices R_{1} and R_{2}. After that, we stack R_{1} and R_{2} into one tensor R with dimensions 4\times 4\times 2. Analogously, if we apply k filters, the output tensor will have dimensions 4\times 4\times k, where k defines the depth:

[Figure: depth of a convolutional layer]
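The stacking step can be sketched as follows. Again, this is an illustration under assumptions rather than a reference implementation: the function name conv_layer, the square kernels, the lack of padding, and the constant-valued filters K1 and K2 are all our own choices.

```python
def conv_layer(image, kernels, stride=1):
    """Apply each F x F x D kernel to an H x W x D image and stack the maps depth-wise."""
    height, width, depth = len(image), len(image[0]), len(image[0][0])
    k_size = len(kernels[0])  # assumes square kernels of equal size
    out_h = (height - k_size) // stride + 1
    out_w = (width - k_size) // stride + 1
    # Output tensor of shape out_h x out_w x (number of kernels).
    output = [[[0] * len(kernels) for _ in range(out_w)] for _ in range(out_h)]
    for f, kernel in enumerate(kernels):
        for r in range(out_h):
            for c in range(out_w):
                output[r][c][f] = sum(
                    image[r * stride + i][c * stride + j][d] * kernel[i][j][d]
                    for i in range(k_size)
                    for j in range(k_size)
                    for d in range(depth)
                )
    return output


# 6 x 6 x 3 input of ones; two 3 x 3 x 3 filters filled with ones and twos.
I = [[[1] * 3 for _ in range(6)] for _ in range(6)]
K1 = [[[1] * 3 for _ in range(3)] for _ in range(3)]
K2 = [[[2] * 3 for _ in range(3)] for _ in range(3)]

R = conv_layer(I, [K1, K2])
shape = (len(R), len(R[0]), len(R[0][0]))
print(shape)    # (4, 4, 2)
print(R[0][0])  # [27, 54]
```

Passing a list of k kernels gives an output of shape 4 x 4 x k, so the number of filters alone sets the depth of the layer’s output.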

5. Conclusion

In this short article, we presented the relationship between the term “depth” and CNNs. Dimensions such as width, height, and depth often confuse beginners, so we provided a simple example with illustrations.