Artificial neural network: Learning

Theoretical basis

The concept of learning, although known since ancient Sumer, cannot be modeled within the framework of deductive logic: deduction proceeds from already established knowledge, from which derived knowledge is drawn. Learning is the opposite approach: drawing plausible generalizations from a limited set of observations. It is a process of induction.

The concept of learning covers two realities, often treated in succession:

  • memorization: assimilating, in a compact form, a possibly large number of examples;
  • generalization: being able, thanks to the learned examples, to handle distinct examples that have not yet been encountered but are similar.

These two goals are partly in opposition: a system designed to favor one will not necessarily handle the other very effectively.

In statistical learning systems, used to optimize conventional statistical models, neural networks, and Markov automata, it is generalization that is the object of attention.

This notion of generalization is treated more or less completely by several theoretical approaches.

  • Generalization is treated in a comprehensive and generic way by the theory of statistical regularization introduced by Vladimir Vapnik. This theory, originally developed in the Soviet Union, has spread in the West since the fall of the Berlin Wall. It circulated widely among those studying neural networks because of the generic shape of the residual-error curves for learning and generalization produced by iterative training procedures such as gradient descent, used to optimize multilayer perceptrons. These generic shapes are exactly those predicted by the theory: since learning by gradient descent starts from an initial configuration of synaptic weights and progressively explores the space of possible weights, one encounters the problem of the gradual growth of learning capacity, a fundamental concept at the heart of the theory of statistical regularization.
  • Generalization is also at the heart of Bayesian inference, which has been taught for longer. The Cox-Jaynes theorem provides an important basis for such learning, by telling us that any learning method is either isomorphic to probability theory equipped with Bayes' rule, or inconsistent. This is an extremely strong result, which is why Bayesian methods are widely used in the field.
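The role Bayes' rule plays in such learning can be illustrated with a small worked example (the coin-flip setting and all numbers are illustrative assumptions, not from the text): a posterior over hypotheses is obtained by multiplying a prior by the likelihood of each observation and renormalizing.

```python
# Bayesian updating on a discrete grid: posterior ∝ likelihood × prior.
# Illustrative example: inferring a coin's bias from observed flips.

def bayes_update(prior, likelihood):
    """Combine a prior over hypotheses with per-hypothesis likelihoods."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Three candidate biases for the coin (probability of heads).
biases = [0.25, 0.5, 0.75]
posterior = [1 / 3, 1 / 3, 1 / 3]          # uniform prior

# Observe the sequence: heads, heads, tails.
for outcome in ("H", "H", "T"):
    likelihood = [b if outcome == "H" else 1 - b for b in biases]
    posterior = bayes_update(posterior, likelihood)

print([round(p, 3) for p in posterior])     # [0.15, 0.4, 0.45]
```

After two heads and one tail, the belief shifts toward the heads-biased hypothesis, while the tail keeps some mass on the others.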

Classes of solvable problems

Depending on the network structure, different types of functions can be approximated by neural networks:

Functions representable by a perceptron

A perceptron (a single-unit network) can represent the following Boolean functions: AND, OR, NAND and NOR, but not XOR. Since any Boolean function can be expressed using AND, OR, NAND and NOR, a network of perceptrons can represent any Boolean function.
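As a minimal sketch (the weights below are hand-picked for illustration, not from the text), a single threshold unit realizes each of the linearly separable gates; no choice of weights yields XOR, because the inputs where XOR is 1 cannot be separated from the rest by a line.

```python
# A perceptron here is a single threshold unit:
#   output = 1 if w1*x1 + w2*x2 + b > 0 else 0
# The weights below are hand-picked for illustration.

def perceptron(w1, w2, b):
    return lambda x1, x2: 1 if w1 * x1 + w2 * x2 + b > 0 else 0

AND  = perceptron(1, 1, -1.5)
OR   = perceptron(1, 1, -0.5)
NAND = perceptron(-1, -1, 1.5)
NOR  = perceptron(-1, -1, 0.5)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([AND(*p) for p in inputs])   # [0, 0, 0, 1]
print([OR(*p) for p in inputs])    # [0, 1, 1, 1]
# No (w1, w2, b) produces XOR's truth table [0, 1, 1, 0]:
# XOR is not linearly separable.
```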

Functions representable by acyclic multilayer neural networks

  • Boolean functions: all Boolean functions can be represented by a two-layer network. In the worst case, the number of neurons in the hidden layer grows exponentially with the number of inputs.
  • Continuous functions: all bounded continuous functions can be represented, with arbitrary precision, by a two-layer network (Cybenko, 1989). This theorem applies to networks whose hidden-layer neurons use a sigmoid activation and whose output-layer neurons are linear (without a threshold). The number of neurons in the hidden layer depends on the function to be approximated.
  • Arbitrary functions: any function can be approximated with arbitrary precision by a three-layer network (Cybenko's theorem, 1988).
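To illustrate the two-layer case, here is a classic hand-wired construction (the weights are chosen by hand for the example, not taken from the text): one hidden layer of threshold units computes XOR, the function a single perceptron cannot represent.

```python
# XOR with one hidden layer of threshold units (classic construction):
# hidden unit h1 computes OR, h2 computes AND, and the output fires
# when h1 is on but h2 is off, i.e. XOR = OR AND NOT AND.

def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)       # OR(x1, x2)
    h2 = step(x1 + x2 - 1.5)       # AND(x1, x2)
    return step(h1 - h2 - 0.5)     # h1 AND NOT h2

print([xor_net(*p) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```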


The vast majority of neural-network algorithms include a "training" phase, which consists in modifying the synaptic weights according to a set of data presented at the network's input. The purpose of this training is to enable the network to "learn" from examples. If training goes well, the network becomes able to produce output values very close to the original values of the training data set.
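The weight-adjustment loop described above can be sketched in its simplest form: a single linear neuron trained by the delta rule (the data, learning rate, and epoch count below are illustrative assumptions, not from the text).

```python
# Minimal sketch of training as weight adjustment: a single linear
# neuron y = w*x + b, fitted to examples by the delta rule.

data = [(x, 2 * x + 1) for x in [-2, -1, 0, 1, 2]]   # targets on y = 2x + 1
w, b, lr = 0.0, 0.0, 0.05

for epoch in range(200):
    for x, t in data:
        y = w * x + b          # present an example at the input
        err = t - y            # error on this example
        w += lr * err * x      # adjust each weight against its error
        b += lr * err

print(round(w, 3), round(b, 3))
```

After training, the weights converge close to the slope and intercept that generated the examples.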

But the whole point of neural networks lies in their ability to generalize to unseen examples, as measured on a test set.

A neural network can also be used to implement a memory; this is called a neural memory.

From a topological point of view, learning corresponds to determining a hypersurface in R^(n+1), where R is the set of real numbers and n is the number of inputs to the network.


Overfitting

Often, the examples in the training set contain approximate or noisy values. If the network is forced to respond almost perfectly on these examples, the result can be a network biased by erroneous values. For example, imagine presenting the network with pairs (xi, f(xi)) located on a line y = ax + b, but noisy, so that the points do not lie exactly on the line. With good learning, the network responds ax + b for any value of x presented. With overfitting, the network responds a little more or a little less than ax + b, because each pair (xi, f(xi)) positioned off the line influences the decision. To avoid overfitting, there is a simple method: split the example base into two subsets. The first is used for learning and the second for evaluating learning. As long as the error obtained on the second set decreases, learning can continue; otherwise it is stopped.
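The two-subset procedure can be sketched as follows (the data, model, and learning rate are illustrative assumptions; with a model this simple the validation error mainly flattens out at convergence, but the stopping logic is the same).

```python
# Sketch of early stopping: fit a line to noisy samples of y = a*x + b,
# training on one subset and stopping once the error on the held-out
# subset no longer decreases.
import random

random.seed(0)
a, b = 2.0, 1.0
samples = [(x, a * x + b + random.gauss(0, 0.3))
           for x in [i / 10 for i in range(-20, 21)]]
train, valid = samples[::2], samples[1::2]      # split into two subsets

def mse(w0, w1, data):
    return sum((w0 * x + w1 - t) ** 2 for x, t in data) / len(data)

w0 = w1 = 0.0
lr = 0.01
best = float("inf")
epoch = 0
while True:
    for x, t in train:                          # one pass of gradient steps
        err = (w0 * x + w1) - t
        w0 -= lr * err * x
        w1 -= lr * err
    v = mse(w0, w1, valid)
    epoch += 1
    if v >= best or epoch >= 500:               # stop when the validation
        break                                   # error stops decreasing
    best = v

print(epoch, round(w0, 2), round(w1, 2))
```

The recovered slope and intercept land close to the true a and b despite the noise.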


Backpropagation

Backpropagation consists in propagating the error committed by a neuron back to its synapses and to the neurons connected to them. For neural networks, one usually uses backpropagation of the error gradient, which corrects the errors according to the importance of the elements that contributed to producing them: the synaptic weights that contribute to generating a large error are modified more significantly than the weights that generated only a marginal error.
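One way to see what backpropagation computes is to apply the chain rule through a tiny network and check the result against finite differences (the network shape and numbers below are illustrative assumptions).

```python
# Minimal sketch of gradient backpropagation through one hidden neuron:
# the output error is sent back through the chain rule to each weight.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def forward(w, x):
    w1, b1, w2, b2 = w
    h = sigmoid(w1 * x + b1)        # hidden neuron
    y = w2 * h + b2                 # linear output neuron
    return h, y

def grads(w, x, t):
    """Backpropagate the squared error E = (y - t)^2 / 2."""
    w1, b1, w2, b2 = w
    h, y = forward(w, x)
    dy = y - t                      # dE/dy at the output
    dh = dy * w2                    # error sent back through w2
    dz = dh * h * (1 - h)           # back through the sigmoid
    return [dz * x, dz, dy * h, dy] # dE/dw1, dE/db1, dE/dw2, dE/db2

w = [0.5, -0.2, 0.8, 0.1]
x, t = 1.5, 0.7

# Check the backpropagated gradient against finite differences.
analytic = grads(w, x, t)
numeric = []
eps = 1e-6
for i in range(4):
    wp = list(w); wp[i] += eps
    wm = list(w); wm[i] -= eps
    ep = (forward(wp, x)[1] - t) ** 2 / 2
    em = (forward(wm, x)[1] - t) ** 2 / 2
    numeric.append((ep - em) / (2 * eps))

print(all(abs(g - n) < 1e-6 for g, n in zip(analytic, numeric)))  # True
```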


Pruning

Pruning is a method for avoiding overfitting while limiting the complexity of the model. It consists in removing connections (synapses), inputs, or neurons from the network once learning is complete. In practice, the elements that have the smallest influence on the network's output error are removed. The two pruning algorithms commonly used are:

  • Optimal Brain Damage (OBD), by Yann LeCun et al.
  • Optimal Brain Surgeon (OBS), by B. Hassibi and D. G. Stork
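The pruning idea can be sketched minimally as follows. Note the simplification: OBD and OBS rank connections by an estimated increase in error using second-derivative information; here, as a crude stand-in, each weight is scored by its magnitude.

```python
# Sketch of pruning: rank connections by an importance score and delete
# the least important ones. As a simplified stand-in for OBD's
# second-derivative saliency, each weight is scored by its magnitude.

def prune(weights, fraction):
    """Zero out the given fraction of weights with smallest magnitude."""
    n_prune = int(len(weights) * fraction)
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in ranked[:n_prune]:
        pruned[i] = 0.0             # remove the connection
    return pruned

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.1]
print(prune(w, 0.5))    # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

Only the three largest-magnitude connections survive; the rest are removed from the network.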