Convolutional Neural Network (CNN) or ConvNet
A convolutional neural network (CNN) is a type of machine learning (ML) model. It is one of several kinds of artificial neural network (ANN), which are used for different applications and data types. A CNN is a network architecture for deep learning algorithms and is used mainly for image recognition and other tasks that involve processing pixel data.
The ANN is a core element of deep learning algorithms. A recurrent
neural network (RNN) is one type of ANN that uses sequential or time series data
as input. It is suitable for applications involving natural language processing
(NLP), language translation, speech recognition and image captioning.
The CNN is another type of neural network that can expose key
information in both time series and image data. For this reason, it is highly
valuable for image-related tasks, such as image recognition, object classification
and pattern recognition. To identify patterns within an image, a CNN leverages
principles from linear algebra, such as matrix multiplication. CNNs can also
classify audio and signal data.
CNN layers
A deep learning CNN consists of three main types of layers: a convolutional
layer, a pooling layer and a fully connected (FC) layer. The convolutional layer
is the first layer while the FC layer is the last.
From the convolutional layer to the FC layer, the complexity of the CNN
increases. It is this increasing complexity that allows the CNN to successively
identify larger portions and more complex features of an image until it finally
identifies the object in its entirety.
Convolutional layer. The majority of computations happen in the convolutional layer, which is the core
building block of a CNN. A second convolutional layer can follow the initial
convolutional layer. The process of convolution involves a kernel or filter
inside this layer moving across the receptive fields of the image, checking if
a feature is present in the image.
Over multiple iterations, the kernel sweeps over the entire image. At
each position, a dot product is calculated between the input pixels under the
kernel and the filter weights. The final output from this series of dot products
is known as a feature map or convolved feature. In this way, the layer converts
the image into numerical feature values, which allows the CNN to interpret the
image and extract relevant patterns from it.
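The sweep-and-dot-product step can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name convolve2d, the 4x4 image and the 2x2 edge kernel are all hypothetical, and the sketch assumes no padding and a stride of 1. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the image patch it currently covers.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 grayscale image: dark left half, bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The feature map is nonzero only in the middle column, which is where the kernel straddles the edge between the dark and bright halves of the image.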
Pooling layer. Like the convolutional layer, the pooling layer also sweeps a kernel or filter
across its input. But unlike the convolutional layer, the pooling layer
reduces the size of its input (and so the number of parameters needed in later
layers) and also results in some information loss. On the positive side, this
layer reduces complexity and improves the efficiency of the CNN.
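As a rough illustration of how pooling shrinks its input, the sketch below uses max pooling, which keeps only the largest value in each window. Max pooling is an assumption here, since the text does not name a specific pooling type. The function name max_pool2d, the 2x2 window, the stride of 2 and the sample feature map are likewise hypothetical.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window of `feature_map`."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map produced by an earlier convolutional layer.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 7, 5],
    [1, 1, 3, 8],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [2, 8]]
```

Sixteen values go in and four come out. The other twelve are discarded, which is the information loss mentioned above.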
Fully connected layer. The FC layer is where image classification happens in the CNN based on
the features extracted in the previous layers. Here, fully connected means
that all the inputs or nodes from one layer are connected to every activation
unit or node of the next layer.
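A fully connected layer can be sketched as a weighted sum in which every output node reads every input value. In the minimal example below, the feature values, weights and biases are hypothetical and hand-picked rather than learned. The softmax step that turns scores into probabilities is a common choice but an assumption here, as the text does not specify one.

```python
import math


def fully_connected(inputs, weights, biases):
    """Each output node is a weighted sum over every input, plus a bias."""
    return [
        sum(w * x for w, x in zip(node_weights, inputs)) + bias
        for node_weights, bias in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical flattened features from the previous layers.
features = [6.0, 2.0, 2.0, 8.0]

# One row of weights per output class; every row touches every feature.
weights = [
    [0.5, -0.5, -0.5, 0.5],   # class 0
    [-0.5, 0.5, 0.5, -0.5],   # class 1
]
biases = [0.0, 0.0]

scores = fully_connected(features, weights, biases)
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
print(scores, predicted_class)  # [5.0, -5.0] 0
```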
Not all layers in the CNN are fully connected, because that would
result in an unnecessarily dense network. It would also increase losses,
affect the output quality and be computationally expensive.
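A quick parameter count suggests why full connectivity everywhere would be expensive. The sizes below are illustrative assumptions: a 224x224 RGB input, a dense layer with 1,000 nodes and a convolutional layer with 64 filters of size 3x3. The two layers are not equivalent in what they compute, so the comparison only indicates orders of magnitude.

```python
def dense_parameter_count(n_inputs, n_outputs):
    """One weight per input-output pair, plus one bias per output node."""
    return n_inputs * n_outputs + n_outputs


def conv_parameter_count(kernel_h, kernel_w, in_channels, out_channels):
    """One small kernel per (input channel, filter) pair, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Hypothetical input: a 224x224 image with 3 color channels.
height, width, channels = 224, 224, 3
n_inputs = height * width * channels

dense_params = dense_parameter_count(n_inputs, 1000)
conv_params = conv_parameter_count(3, 3, channels, 64)

print(n_inputs)      # 150528
print(dense_params)  # 150529000
print(conv_params)   # 1792
```

Because the same small kernel is reused at every position in the image, the convolutional layer needs far fewer weights than a dense layer connected to every pixel.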
How do CNNs work?
A CNN can have multiple layers, each of which learns to detect
different features of an input image. A filter or kernel is applied to each
image to produce an output that becomes progressively more detailed
after each layer. In the lower layers, the filters typically detect simple
features, such as edges.
At each successive layer, the filters increase in complexity to check
and identify features that uniquely represent the input object. Thus, the
output of each convolved image -- the partially recognized image after each
layer -- becomes the input for the next layer. In the last layer, which is an
FC layer, the CNN recognizes the image or the object it represents.
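One way to see why later layers can respond to larger portions of the image is to track the receptive field, meaning the patch of the original image that influences a single output value. The sketch below applies the standard recurrence for stacked layers. The function name and the particular stack of 3x3 convolutions and 2x2 pooling layers are hypothetical.

```python
def receptive_field_sizes(layers):
    """Return the receptive field width after each (kernel_size, stride) layer."""
    field = 1  # one input pixel sees only itself
    jump = 1   # distance in input pixels between neighboring outputs
    sizes = []
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
        sizes.append(field)
    return sizes


# Hypothetical stack: conv 3x3, pool 2x2, conv 3x3, pool 2x2, conv 3x3.
layers = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]

sizes = receptive_field_sizes(layers)
print(sizes)  # [3, 4, 8, 10, 18]
```

Under these assumptions, a value in the first layer depends on a 3x3 patch of the image, while a value in the fifth layer depends on an 18x18 patch.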
With convolution, the input image goes through a set of these filters.
As each filter activates certain features from the image, it does its work and
passes on its output to the filter in the next layer. Each layer learns to
identify different features and the operations end up being repeated for
dozens, hundreds or even thousands of layers. Finally, the image data
progressing through the CNN's multiple layers allows the CNN to identify the
entire object.
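The flow from filters to a final decision can be sketched end to end with a toy network. Everything here is an illustrative assumption: the kernels and FC weights are hand-picked rather than learned, there is a single convolution stage instead of many, and the ReLU activation and max pooling are common choices that the text does not name.

```python
def conv2d(image, kernel):
    """Valid convolution (no padding, stride 1, kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def dense(inputs, weights):
    """Fully connected layer without biases."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def classify(image, kernels, weights):
    """Convolution -> ReLU -> pooling for each kernel, then one FC layer."""
    features = []
    for kernel in kernels:
        pooled = max_pool(relu(conv2d(image, kernel)))
        features.extend(value for row in pooled for value in row)
    scores = dense(features, weights)
    return scores.index(max(scores))


# Hand-picked (not learned) kernels: vertical-edge and horizontal-edge detectors.
kernels = [
    [[-1, 1], [-1, 1]],
    [[-1, -1], [1, 1]],
]
# Hand-picked FC weights: class 0 follows the first feature, class 1 the second.
weights = [[1, 0], [0, 1]]
labels = ["vertical edge", "horizontal edge"]

vertical_image = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
horizontal_image = [[0, 0, 0], [0, 0, 0], [1, 1, 1]]

vertical_prediction = classify(vertical_image, kernels, weights)
horizontal_prediction = classify(horizontal_image, kernels, weights)
print(labels[vertical_prediction])    # vertical edge
print(labels[horizontal_prediction])  # horizontal edge
```

In a trained CNN the kernel and FC weights would be learned from data, and the convolution and pooling stage would typically be repeated many times before the FC layer.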
Benefits of using CNNs for deep learning
Deep learning is a subset
of machine learning that uses neural networks with at least three layers.
Compared to a network with just one layer, a network with multiple layers can
deliver more accurate results. Both RNNs and CNNs are used in deep learning,
depending on the application.
For image recognition,
image classification and computer vision (CV) applications, CNNs are
particularly useful because they provide highly accurate results, especially
when a lot of data is involved. The CNN also learns the object's features in
successive iterations as the object data moves through the CNN's many layers.
This direct (and deep) learning eliminates the need for manual feature
extraction.
CNNs can be retrained for
new recognition tasks and built on preexisting networks. Because an existing
network can be reused rather than trained from scratch, CNNs can be applied to
new real-world applications without a large increase in computational
complexity or cost.
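Reusing a preexisting network for a new task, often called transfer learning, can be sketched as keeping the early layers frozen and training only a new final layer. The example below is a toy under stated assumptions: pretrained_features is a hypothetical stand-in for frozen convolutional layers, and the new head is trained with the simple perceptron update rule rather than the gradient-based training used for real CNNs.

```python
def pretrained_features(image):
    """Hypothetical stand-in for frozen, pretrained convolutional layers.

    Returns [vertical-edge strength, horizontal-edge strength].
    """
    vertical = sum(
        abs(row[j + 1] - row[j]) for row in image for j in range(len(row) - 1)
    )
    horizontal = sum(
        abs(image[i + 1][j] - image[i][j])
        for i in range(len(image) - 1)
        for j in range(len(image[0]))
    )
    return [vertical, horizontal]


def predict(image, weights, bias):
    """New classification head applied on top of the frozen features."""
    features = pretrained_features(image)
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_head(images, labels, epochs=20, learning_rate=0.1):
    """Train only the head's weights; the feature extractor is never updated."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for image, label in zip(images, labels):
            error = label - predict(image, weights, bias)
            features = pretrained_features(image)
            # Perceptron rule: adjust the head only when the prediction is wrong.
            weights = [w + learning_rate * error * f
                       for w, f in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical new task: label 1 = vertical stripes, label 0 = horizontal stripes.
train_images = [[[0, 1], [0, 1]], [[0, 0], [1, 1]]]
train_labels = [1, 0]

weights, bias = train_head(train_images, train_labels)

# Unseen images with the stripe polarity reversed.
new_vertical = predict([[1, 0], [1, 0]], weights, bias)
new_horizontal = predict([[1, 1], [0, 0]], weights, bias)
print(new_vertical, new_horizontal)  # 1 0
```

Only the two head weights and the bias are learned here, which hints at why adapting an existing network can be cheaper than training every layer again.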