Deep Convolutional Neural Networks


Deep convolutional neural networks (CNNs) such as AlexNet are used to analyse visual imagery. The AlexNet architecture was much larger than the convolutional neural networks previously used for computer vision tasks when it appeared in 2012: it had 60 million parameters and 650,000 neurons, and took almost six days to train on two GeForce GTX 580 (3 GB) graphics processing units. Today, far deeper CNNs such as ResNets can be trained much faster on modern GPUs, using vast datasets such as ImageNet.

These networks are used to recognise objects within scenes, cluster similar images for photo search, and classify images by label. For example, CNNs can recognise faces and street signs, and in medicine they can identify tumours and other features in visual data. CNNs are driving major advances in computer vision, especially in self-driving cars, drones, robotics, cyber security, assistive technology for the visually impaired, and medical diagnosis. Convolutional networks can also perform optical character recognition (OCR) to digitise text.

A deep neural network is simply one made up of many layers; what distinguishes a CNN is its use of convolution and pooling layers. The convolution layer combines an area of the input into a smaller output to extract a feature. It does this by filtering the area, which amounts to multiplying the input data by a set of weights and summing the result. The pooling layer then selects the highest value within each area. Together, these layers extract the important features from the input for classification.
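The convolution and pooling operations described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: it assumes a single-channel (greyscale) input, a "valid" sliding window with stride 1, and non-overlapping max pooling. Note that, as in most deep-learning libraries, the "convolution" here is really cross-correlation (the kernel is not flipped).

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; at each position, multiply the
    covered patch by the kernel weights element-wise and sum the result."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep only the largest value
    in each size x size window of the feature map."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = feature_map[i * size:(i + 1) * size,
                                    j * size:(j + 1) * size].max()
    return out

# A 5x5 input convolved with a 3x3 kernel yields a 3x3 feature map;
# 2x2 max pooling then reduces it to a single strongest response.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))
features = convolve2d(image, kernel)   # shape (3, 3)
pooled = max_pool(features, size=2)    # shape (1, 1)
```

Real CNN layers apply many such kernels in parallel (one per output channel) and learn the kernel weights by backpropagation, but the sliding-window mechanics are the same.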