ANN-Artificial Neural Networks: A Deep Yet Intuitive Guide to the Foundations of Deep Learning
by Helen K Joy
Artificial Neural Networks (ANNs) are at the heart of modern artificial intelligence. From recognizing faces and translating languages to powering autonomous vehicles and medical diagnostics, ANNs enable machines to learn complex patterns directly from data. Despite their widespread use, the fundamental ideas behind neural networks are often misunderstood or viewed as overly complex.
This blog aims to explain artificial neural networks in a simple, intuitive, and readable way, bridging the gap between theory and understanding.
1. What Is an Artificial Neural Network?
Artificial Neural Networks are called ANNs because they are inspired by the structure and functioning of biological neural networks, particularly the human brain. The name reflects both a conceptual analogy and a historical motivation, rather than a claim that they perfectly replicate real neurons.
An Artificial Neural Network is a computational model designed to approximate an unknown target function. Suppose there exists a true function y = f*(x) that maps each input x to an output y. A neural network defines a parametric mapping y = f(x; θ) and learns the values of the parameters θ that give the best approximation of f* on the available data.
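As a minimal sketch of this idea (the target function f_star and the tiny one-unit model below are hypothetical, chosen only for illustration):

```python
import numpy as np

# A hypothetical "true" function we never observe directly,
# only through (input, output) samples.
def f_star(x):
    return np.sin(3 * x) + 0.5 * x

# A parametrized model f(x; theta). Training means searching for
# the theta that makes f(x; theta) closest to f_star(x) on the data.
def f(x, theta):
    w1, b1, w2, b2 = theta
    h = np.maximum(0, w1 * x + b1)   # a single hidden unit with ReLU
    return w2 * h + b2

# Approximation error on sampled inputs (mean squared error).
x = np.linspace(-1, 1, 100)
theta = (1.0, 0.0, 1.0, 0.0)         # arbitrary initial parameters
error = np.mean((f(x, theta) - f_star(x)) ** 2)
```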
2. Why Are Neural Networks Called “Networks”?
Neural networks are structured as a sequence of interconnected transformations. Each transformation applies a simple operation, but when composed together, they create a powerful system capable of modeling highly complex relationships.
It is called an Artificial Neural Network because:
- Neural: It is inspired by how biological neurons process information
- Network: It consists of many interconnected processing units
- Artificial: It is a computational, man-made system
Together, the name reflects the original idea of building machines that learn by imitating, in a simplified way, how the brain works.
Layers and Depth
Neural networks are organized into layers:
- Input layer: Receives raw data such as numerical values, pixels, or word embeddings.
- Hidden layers: Perform intermediate computations and extract increasingly abstract features.
- Output layer: Produces the final prediction or decision.
The number of hidden layers determines the depth of the network. Deep neural networks typically contain many hidden layers, allowing them to learn hierarchical representations of data.
Hidden Units and Width
Each hidden layer consists of multiple units, also called neurons. These neurons operate in parallel, and the number of neurons in a layer determines the width of the network.
- A narrow network may struggle to represent complex patterns.
- A wider network can capture richer representations but may require more data and computation.
Hidden units are not directly supervised. Instead, they organize themselves during training to support the final task.
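As a concrete sketch of depth and width, here is how both appear when defining a small network with the PyTorch API; the input size (10), output size (3), and layer widths (64) are arbitrary choices for illustration:

```python
import torch.nn as nn

# A small feedforward network: depth = 2 hidden layers, width = 64 units each.
model = nn.Sequential(
    nn.Linear(10, 64),   # input layer -> first hidden layer
    nn.ReLU(),           # nonlinearity (see Section 3)
    nn.Linear(64, 64),   # second hidden layer: width = 64
    nn.ReLU(),
    nn.Linear(64, 3),    # output layer: 3 scores, e.g., for 3 classes
)
```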
3. The Role of Nonlinearity
An activation function determines whether a neuron should be activated and how strongly it should contribute to the next layer.
A neuron first computes a weighted sum of its inputs, z = w·x + b, and then applies an activation function to it, a = g(z).
Without nonlinearity, a neural network would collapse into a single linear transformation, regardless of how many layers it had. This would severely limit its expressive power.
To overcome this limitation, neural networks apply a nonlinear activation function after each affine transformation. Common activation functions include sigmoid, hyperbolic tangent, and Rectified Linear Units (ReLU).
The ReLU function is defined as ReLU(z) = max(0, z).
Nonlinear activations enable neural networks to learn complex, non-linear decision boundaries and interactions within the data.
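A quick NumPy check makes the "collapse" argument concrete: composing two linear layers without an activation is exactly one linear layer, while inserting a ReLU between them breaks that equivalence. The matrices here are random stand-ins for learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)
W1, W2 = rng.normal(size=(4, 5)), rng.normal(size=(3, 4))

# Two stacked linear layers with no activation...
two_linear = W2 @ (W1 @ x)
# ...are identical to a single linear layer with weights W2 @ W1.
one_linear = (W2 @ W1) @ x
print(np.allclose(two_linear, one_linear))  # True

# Inserting ReLU between the layers breaks this equivalence.
relu = lambda z: np.maximum(0, z)
nonlinear = W2 @ relu(W1 @ x)
print(np.allclose(nonlinear, one_linear))   # False (in general)
```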
4. Learning Representations Automatically
One of the defining characteristics of deep learning is representation learning. Traditional machine learning approaches often rely on manually engineered features, crafted using domain expertise. Neural networks, in contrast, learn a transformation from raw inputs to useful internal representations directly from the data.
Lower layers may learn simple features, while deeper layers combine them into more abstract concepts. This hierarchical learning process is what allows deep neural networks to excel in tasks such as image recognition and natural language processing.
5. Training Neural Networks
Training a neural network involves adjusting its parameters so that its predictions align with the training data.
Cost Functions
A cost function (also called a loss function or objective function) is a mathematical function that measures how well a neural network performs on a given task. Most neural networks are trained using the principle of maximum likelihood estimation, which leads to cost functions such as:
- Negative log-likelihood
- Cross-entropy loss for classification tasks
The cost function quantifies how far the model’s predictions are from the true labels.
Loss function: error for one training example
A loss function is a mathematical function that measures how incorrect a model’s prediction is for a single training example.
Why Is a Loss Function Important?
A loss function is important because it:
- Quantifies error: it converts prediction mistakes into a numerical value.
- Guides learning: the model updates its parameters to reduce the loss.
- Enables backpropagation: gradients of the loss are used to adjust the weights.
Cost function: average or total loss over the entire dataset
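As a small illustration of the loss/cost distinction, the snippet below computes the cross-entropy loss for individual classification examples and the cost as their average; the predicted probabilities are made-up numbers standing in for model outputs:

```python
import numpy as np

# Predicted probability of the correct class for 4 training examples
# (hypothetical model outputs).
p_correct = np.array([0.9, 0.6, 0.3, 0.99])

# Loss function: cross-entropy (negative log-likelihood) per example.
losses = -np.log(p_correct)

# Cost function: average loss over the whole dataset.
cost = losses.mean()
print(losses)  # one error value per example
print(cost)    # a single number the optimizer tries to reduce
```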
Backpropagation
Backpropagation is the algorithm that enables neural networks to learn efficiently. It works in two phases:
- Forward propagation: input data flows through the network to produce an output and compute the loss.
- Backward propagation: the loss is propagated backward through the network using the chain rule of calculus, computing gradients with respect to each parameter.
These gradients indicate how each parameter should change to reduce the error.
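The two phases can be seen in a handful of lines using PyTorch's autograd; the tiny model and the random inputs and targets here are placeholders for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A placeholder model and made-up data, just to show the two phases.
model = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 1))
x = torch.randn(8, 2)   # a batch of 8 inputs
y = torch.randn(8, 1)   # targets

# Forward propagation: data flows through the network, producing a loss.
loss = F.mse_loss(model(x), y)

# Backward propagation: the chain rule fills .grad for every parameter.
loss.backward()
print(model[0].weight.grad.shape)  # gradient w.r.t. the first layer's weights
```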
Optimization Algorithms
Once gradients are computed, an optimization algorithm updates the parameters. Common optimizers include:
- Stochastic Gradient Descent (SGD)
- Adam
- RMSProp
Through repeated updates over many iterations, the network gradually improves its approximation of the target function.
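The core update behind SGD is simply "move each parameter a small step against its gradient." A minimal NumPy version for a one-parameter model looks like this; the data, learning rate, and step count are arbitrary toy choices:

```python
import numpy as np

# Toy problem: fit y = w * x to data generated with w_true = 2.
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x

w, lr = 0.0, 0.1                         # initial parameter, learning rate
for step in range(50):
    grad = np.mean(2 * (w * x - y) * x)  # d/dw of the mean squared error
    w -= lr * grad                       # gradient descent update
print(w)  # approaches 2.0
```

Optimizers such as Adam and RMSProp refine this basic rule by adapting the step size for each parameter individually.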
6. Historical Development of Neural Networks
The evolution of neural networks can be divided into three major historical waves.
Cybernetics Era (1940s–1960s): Early research focused on biologically inspired models such as the McCulloch–Pitts neuron and the Perceptron. These models demonstrated that machines could learn simple decision rules but were limited to linearly separable problems.
Connectionism Era (1980s–1990s): This period emphasized networks of interconnected units and introduced multilayer architectures. The backpropagation algorithm became widely known, enabling the training of deeper networks. However, limited data and computational power restricted practical applications.
Deep Learning Era (2006–Present): The modern deep learning era began with breakthroughs in training deep architectures, combined with the availability of large datasets and powerful hardware such as GPUs. These advances transformed neural networks into practical and highly effective tools.
7. Key Theoretical Properties
Universal Approximation
The universal approximation theorem states that a feedforward neural network with at least one hidden layer and a suitable activation function can approximate any Borel measurable function to arbitrary accuracy, provided it has enough hidden units. This result highlights the expressive power of neural networks.
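As a small demonstration of this expressive power, a one-hidden-layer network can be trained to approximate a smooth function such as sin(x); this is only a sketch, and the width, learning rate, and number of steps below are arbitrary choices:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# One hidden layer of 64 tanh units; hyperparameters are arbitrary.
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=0.01)

x = torch.linspace(-math.pi, math.pi, 256).unsqueeze(1)
y = torch.sin(x)

for _ in range(2000):
    opt.zero_grad()
    loss = F.mse_loss(net(x), y)   # how far the net is from sin(x)
    loss.backward()
    opt.step()
print(loss.item())  # small after training: the net closely matches sin(x)
```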
Distributed Representations
Neural networks rely on distributed representations, where information is encoded across many neurons rather than in a single unit. Each neuron participates in representing multiple concepts, allowing the network to generalize efficiently across related inputs.
8. An Intuitive Analogy
A neural network can be compared to a relay race team. The input layer starts the process by receiving raw information. Each hidden layer refines and transforms that information before passing it forward. The output layer delivers the final result. Each layer only interacts with its immediate neighbors, yet the final outcome depends on the combined effort of all layers.
Artificial Neural Networks are powerful function approximators built from simple, composable components. Their success lies in their ability to learn representations, incorporate nonlinearity, and optimize parameters using gradient-based methods. Understanding these foundations is essential for anyone studying deep learning, artificial intelligence, or modern data-driven systems.
https://medium.com/@helenjoy88/ann-artificial-neural-networks-a-deep-yet-intuitive-guide-to-the-foundations-of-deep-learning-6b20dbb79e92a