Convolutional Neural Networks: Skillful Mimicry

It’s undeniable: modern computers still pale in comparison to the human brain on many problems. But as artificial intelligence (AI) advances and proliferates, one is compelled to wonder how these two intelligences are connected. It seems a natural goal to develop a program that mimics the brain, but many researchers have dismissed this approach as impractical. Instead, AI has often departed from its biologically inspired roots, borrowing from the brain’s example only when necessary. Its purpose is supposedly clear: to rival the practical capabilities of the brain, even if in an entirely different fashion.

A recent indicator of accelerated progress in AI is ImageNet, a database of over 14 million labeled images designed for computer vision research. Over the past five years, participants in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) have invented statistical models that exceed 95% top-5 accuracy in image classification, meaning the correct label appears among the model’s five best guesses. Using these techniques, computers have outperformed humans in identifying certain objects in images. The same classifiers can run on cell-phone hardware in near real time, making them an inexpensive and versatile choice for deployment in jobs where constant vigilance is necessary, such as in transportation and security.

The enabling technology for this startling progress in computer vision is the convolutional neural network, or CNN. A variant of the long-established neural network model, the CNN takes inspiration from the layered structure of the brain’s visual cortex, a discovery of Hubel and Wiesel. In CNNs and the visual cortex alike, the image flows through a deep hierarchy of neural layers whose features grow more specific and more abstract with depth. At the network’s input, neurons respond to small details like edges; they pass this information to a second layer that processes corners, then to a layer that detects specific combinations of earlier features, and so on. At the output of the network one can find “grandmother neurons,” each of which signals the presence of a particular kind of object.
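To make the first layer concrete, here is a minimal sketch in plain Python of what a single edge-detecting neuron computes. The names conv2d, relu, and vertical_edge are illustrative, and the filter values are hand-picked (a Sobel-like pattern) rather than learned; in a real CNN the filters are learned from data and the arithmetic is done by an optimized library.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the
    element-wise products at each position. This is the sliding-window
    operation that CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


# Toy 5x6 grayscale image: dark on the left (0), bright on the right (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked filter that responds to dark-to-bright vertical edges.
# Assumption for illustration only: a trained CNN learns its own filters.
vertical_edge = [[-1, 0, 1],
                 [-2, 0, 2],
                 [-1, 0, 1]]

edges = relu(conv2d(image, vertical_edge))
for row in edges:
    print(row)  # each row is [0, 4, 4, 0]: strong response only at the edge
```

The output is nonzero only where the window straddles the boundary between the dark and bright halves. Deeper layers apply the same operation to feature maps like this one, which is how combinations of edges can become corners and, eventually, object parts.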

This explains how a CNN functions in practice, but how are these statistical models “taught” to be so accurate? In essence, researchers feed many labeled images (training examples) into the network. They then apply machine learning, updating the network’s parameters with each image to improve the accuracy and confidence of classification. The result of many iterations is a network that accurately classifies training data, and if well configured, one that generalizes to images it has not yet encountered.
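The update loop described above can be sketched with a deliberately tiny stand-in for a CNN: a single artificial neuron trained by gradient descent on two-pixel "images". Everything here (the train and predict functions, the toy data, the learning rate, and the number of passes) is an assumption chosen for illustration; real CNNs have millions of parameters and compute their updates by backpropagation through many layers, but the principle of nudging parameters after each labeled example is the same.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Return the neuron's confidence (0 to 1) that x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(examples, lr=0.5, epochs=200):
    """Adjust weights after every labeled example (stochastic gradient descent)."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            error = predict(w, b, x) - y  # gradient of the log loss w.r.t. z
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b


# Toy training set: label 1 if the right pixel is brighter than the left.
training = [([0.0, 1.0], 1), ([1.0, 0.0], 0),
            ([0.2, 0.9], 1), ([0.8, 0.1], 0)]

w, b = train(training)

# Two examples the neuron never saw during training.
print(predict(w, b, [0.1, 0.7]) > 0.5)  # right pixel brighter
print(predict(w, b, [0.9, 0.3]) > 0.5)  # left pixel brighter
```

After training, the neuron classifies the four training examples correctly and also handles the two held-out inputs, a small-scale version of the generalization that a well-configured network is expected to show.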

CNNs have come into widespread use only recently, for three reasons. In the last decade, Moore’s law and the advent of parallel processors like GPUs have reduced CNNs’ lengthy training times; the accumulation of large datasets like ImageNet has allowed for deeper, more expansive networks; and further theoretical research has improved the overall quality of CNN classification.

With potential deployment in street imaging, surveillance, medical imaging, traffic monitoring, and self-driving vehicles, CNNs prove to be a highly practical technology. But beyond all the problems they solve, CNNs challenge the notion that artificial intelligence is alien intelligence, or that just because a computer is made of transistors and a brain is made of cells, their ways of processing information must also be different. Is it a coincidence that silicon’s image recognition works like the brain’s? Whether an emergent property, a mathematical optimum, or simply a convenient starting point for investigation, CNNs’ biological mimicry is just one instance of the interplay between abstract computing and the natural world—a connection we are only beginning to unearth.