The perceptron is a simple mathematical model of how a nerve cell functions: it receives signals from sensory cells and other nerve cells (the input variables) and, based on these, either sends a signal to the next nerve cell or remains inactive. In spite of all its shortcomings, the perceptron has strongly influenced the way of thinking about neural networks, so it is a good starting point for the discussion of the components from which neural networks are constructed. The perceptron works in two steps:
- the input variables $x_1, \ldots, x_p$ are multiplied by weights $w_1, \ldots, w_p$ and summed,
- a threshold operation is applied to the result.
Let $x = (x_1, \ldots, x_p)^\top$ and $w = (w_1, \ldots, w_p)^\top$ represent the input vector and weight vector respectively, and for a given threshold $b$ let

$$\psi_b(t) = \mathbf{1}(t > b)$$

be the corresponding threshold function. The output variable

$$y = \psi_b(w^\top x) = \mathbf{1}\!\left(\sum_{i=1}^{p} w_i x_i > b\right)$$

of the perceptron is 1 (the nerve cell ''fires'') when the sum of the weighted input signals lies above the threshold $b$, and is 0 otherwise (the nerve cell remains inactive).
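To make the two steps concrete, here is a minimal Python sketch of the perceptron's output computation; the function name `perceptron_output` and the example numbers are ours, not taken from the text.

```python
def perceptron_output(x, w, b):
    """Two-step perceptron: weighted sum of the inputs, then a threshold.

    Returns 1 if sum_i w_i * x_i exceeds the threshold b, otherwise 0.
    """
    weighted_sum = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return 1 if weighted_sum > b else 0


# Example: two inputs, weights (0.5, 0.5), threshold 0.7
print(perceptron_output([1, 1], [0.5, 0.5], 0.7))  # 1 (the cell fires)
print(perceptron_output([1, 0], [0.5, 0.5], 0.7))  # 0 (the cell stays inactive)
```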
A perceptron can be trained to solve classification problems of the following type: given are objects which belong to one of two classes, $C_0$ or $C_1$. Based on the observations $x_1, \ldots, x_p$ of an object, it must be decided whether it belongs to $C_0$ or $C_1$. The perceptron characterized by the weights $w_1, \ldots, w_p$ and the threshold $b$ classifies an object as belonging to $C_0$ or $C_1$, respectively, when the output variable $y = \psi_b(w^\top x)$ is 0 or 1, respectively.
So that the classification problem may be solved, the weights and the threshold must be ''learned''. To do this there is a training set

$$(x^{(1)}, z^{(1)}), \ldots, (x^{(T)}, z^{(T)}),$$

consisting of input vectors $x^{(t)}$ together with their correct classifications $z^{(t)} \in \{0, 1\}$. In statistical terms the problem is to estimate the parameters of the perceptron from the data $(x^{(t)}, z^{(t)})$, $t = 1, \ldots, T$. A learning rule is an estimation method which produces estimates $\hat w_1, \ldots, \hat w_p$ and $\hat b$.
A learning rule is, for example, the Delta or Widrow-Hoff learning rule: the input vectors $x^{(t)}$ are used consecutively as input variables of the perceptron, and the output variables $y^{(t)}$ are compared to the correct classifications $z^{(t)}$. If in one step $y^{(t)} = z^{(t)}$, then the weights remain unchanged. If on the other hand $y^{(t)} \neq z^{(t)}$, then the weight vector is adjusted in the following manner:

$$w^{\text{new}} = w^{\text{old}} + \eta\, (z^{(t)} - y^{(t)})\, x^{(t)},$$

where $\eta > 0$ is a small learning rate; the threshold $b$ can be updated analogously by treating it as the weight of an additional constant input $-1$.
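A minimal sketch of this learning rule in Python, under the assumption of a fixed learning rate `eta` and repeated passes over the training set; all names and the AND example are illustrative, not from the text.

```python
def widrow_hoff_epoch(data, w, b, eta=0.1):
    """One pass of the Delta (Widrow-Hoff) rule over the training set.

    data: list of (x, z) pairs with correct classifications z in {0, 1}.
    Weights stay unchanged when the perceptron output y equals z; otherwise
    the weight vector is shifted by eta * (z - y) in the direction of x,
    and the threshold is shifted in the opposite direction.
    """
    for x, z in data:
        s = sum(w_i * x_i for w_i, x_i in zip(w, x))
        y = 1 if s > b else 0                    # current output of the perceptron
        if y != z:                               # misclassified: adjust
            w = [w_i + eta * (z - y) * x_i for w_i, x_i in zip(w, x)]
            b = b - eta * (z - y)                # threshold acts like a weight on input -1
    return w, b


# Example: learn the logical AND function
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = [0.0, 0.0], 0.0
for _ in range(20):
    w, b = widrow_hoff_epoch(data, w, b)
# the perceptron now reproduces the AND classification on all four inputs
```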
The weights $w_1, \ldots, w_p$ and the threshold $b$ can only be identified up to a positive scale factor, i.e., for any $\lambda > 0$ the parameters $(\lambda w, \lambda b)$ and $(w, b)$ lead to the same classification. By applying a learning rule such as that of Widrow-Hoff, it can happen that $\|w\|$ continuously increases, which can lead to numerical problems. In order to prevent this, one uses the so-called weight decay technique, i.e., a modified learning rule under which $\|w\|$ remains stable.
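There are several variants of weight decay; the following sketch shows one common form, in which every update first shrinks the weights by a small factor. The constant `decay` and its placement are assumptions for illustration, not the text's exact rule.

```python
def delta_update_with_decay(x, z, w, b, eta=0.1, decay=0.01):
    """One Delta-rule step with weight decay for a single training pair (x, z).

    The weights are first shrunk by the factor (1 - decay), which keeps the
    norm of w from growing without bound; then the usual correction
    eta * (z - y) * x is applied whenever the perceptron output y is wrong.
    """
    w = [(1.0 - decay) * wi for wi in w]          # shrink step: weight decay
    b = (1.0 - decay) * b
    y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > b else 0
    if y != z:
        w = [wi + eta * (z - y) * xi for wi, xi in zip(w, x)]
        b = b - eta * (z - y)
    return w, b
```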
In the following, the input vectors $x^{(1)}, \ldots, x^{(T)}$ are used as the training set with the correct classifications $z^{(1)}, \ldots, z^{(T)}$. The perceptron with the weights $w_1, \ldots, w_p$ and threshold $b$ classifies an object as 1 if and only if $w^\top x > b$. One distinguishes the following types of learning:
Supervised Learning: The network outputs $y^{(t)}$ are compared with the correct values $z^{(t)}$. When $y^{(t)} \neq z^{(t)}$, the weights are changed according to the learning rule.

Reinforcement Learning: For every network output $y^{(t)}$ one only discovers whether it is ''correct'' or ''incorrect''; in the latter case, however, one does not know the correct value. When $y^{(t)}$ is ''incorrect'', the weights are changed according to the learning rule.
Unsupervised Learning: There is no feedback while learning. As in cluster analysis, random errors are filtered out of the data with the help of redundant information.
For a two-class problem with $z^{(t)} \in \{0, 1\}$, supervised and reinforcement learning are the same, since knowing that an output is incorrect already determines the correct value. Included in this type is the Widrow-Hoff learning rule for the perceptron.
The perceptron cannot learn all of the desired classifications. The classical counterexample is the logical operation XOR (''exclusive or''), for which the output should be 1 exactly when one of the two inputs $x_1, x_2 \in \{0, 1\}$ is 1 and the other is 0. The perceptron with weights $w_1, w_2$ and threshold $b$ separates the two classes by the line $w_1 x_1 + w_2 x_2 = b$, and the four XOR points cannot be separated by any such line.
LS classification problem: Assume that the training set $(x^{(1)}, z^{(1)}), \ldots, (x^{(T)}, z^{(T)})$ is given. Determine, for a given threshold $b$, the weights $w_1, \ldots, w_p$ so that

$$\sum_{t=1}^{T} \left( z^{(t)} - y^{(t)} \right)^2 = \min_{w_1, \ldots, w_p}, \qquad \text{with } y^{(t)} = \psi_b\!\left(w^\top x^{(t)}\right).$$
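As a small numerical illustration (not a proof) of why XOR cannot be learned, the following sketch evaluates the LS criterion over a coarse grid of weights for a fixed threshold; the grid, the threshold value 0.5, and the comparison with AND are our choices.

```python
def ls_error(data, w, b):
    """LS criterion: sum of squared deviations between the desired class z
    and the perceptron output psi_b(w'x) over the training set."""
    err = 0
    for x, z in data:
        y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > b else 0
        err += (z - y) ** 2
    return err


XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

b = 0.5                                          # fixed threshold
grid = [i / 10 for i in range(-20, 21)]          # coarse grid of candidate weights
best_xor = min(ls_error(XOR, [w1, w2], b) for w1 in grid for w2 in grid)
best_and = min(ls_error(AND, [w1, w2], b) for w1 in grid for w2 in grid)
print(best_xor, best_and)                        # 1 0: XOR is never classified exactly
```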
The Widrow-Hoff learning rule solves the LS classification problem; there are, however, a number of other learning rules or estimation methods which can also solve the problem. The perceptron has proven to be too inflexible for many applications. Therefore, one considers more general forms of neurons as the components from which a neural network is built:
Let $x = (x_1, \ldots, x_p)^\top$ and $w = (w_1, \ldots, w_p)^\top$ be the input and weight vectors, respectively. For a threshold $b$ and a scale factor $\beta > 0$, the output of the neuron is

$$y = \psi\{\beta (w^\top x - b)\},$$

where $\psi$ is typically the logistic function $\psi(t) = 1/(1 + e^{-t})$. For $\beta \to \infty$ the neuron approaches a threshold function:

$$\psi\{\beta (w^\top x - b)\} \longrightarrow \mathbf{1}(w^\top x > b) = \psi_b(w^\top x).$$

The scale factor $\beta$ is often not explicitly chosen, since it can be integrated as a scale factor into the other parameters $w_1, \ldots, w_p$ and $b$ of the neuron. If one also sets $x_0 = 1$ and $w_0 = -b$, then the output variable can also be written in the form

$$y = \psi\!\left(\sum_{i=0}^{p} w_i x_i\right).$$
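A minimal sketch of such a general neuron, assuming the logistic function as $\psi$ and using the convention $x_0 = 1$, $w_0 = -b$ just described; the names and numbers are illustrative.

```python
import math


def logistic(t):
    """Logistic function psi(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def neuron_output(x, w):
    """General neuron: y = psi(sum_{i=0}^{p} w_i x_i), with x_0 = 1 and
    w_0 = -b playing the role of the (negative) threshold."""
    x = [1.0] + list(x)                          # prepend the constant input x_0 = 1
    return logistic(sum(wi * xi for wi, xi in zip(w, x)))


# A scale factor beta need not be a separate parameter: psi(beta * (w'x - b))
# equals neuron_output(x, [beta * wi for wi in w]) with the scaled weights.
y = neuron_output([0.2, -0.4], [-0.1, 1.5, 0.8])  # w = (w_0, w_1, w_2) = (-b, w_1, w_2)
```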
Neural networks can also be constructed with multiple hidden layers and with multiple output variables. The connections do not have to be complete, i.e., edges between the nodes of consecutive layers may be missing, or, equivalently, the corresponding weights may be set to 0. Instead of the logistic function or similar sigmoid functions, threshold functions may also appear in some neurons. Another possibility are the so-called radial basis functions (RBF); to these belong the density of the standard normal distribution and similar symmetric kernel functions. In this case one no longer speaks of an MLP, but of an RBF network.
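To make the layered structure concrete, here is a schematic forward pass for an MLP, assuming logistic activations and complete connections except where a weight is set to 0; all names, sizes, and numbers are ours. An RBF network would replace the logistic activation by, e.g., a Gaussian kernel applied to a distance.

```python
import math


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def layer(inputs, weight_matrix, activation=logistic):
    """One layer: each neuron applies the activation to its weighted sum.

    weight_matrix[j] holds the weights (w_0, w_1, ..., w_p) of neuron j,
    with w_0 acting on the constant input 1; a weight of 0 means the
    corresponding edge is missing from the network graph.
    """
    x = [1.0] + list(inputs)
    return [activation(sum(w * xi for w, xi in zip(row, x))) for row in weight_matrix]


def mlp(x, weight_matrices):
    """Forward pass through consecutive layers (hidden layers, then output)."""
    for W in weight_matrices:
        x = layer(x, W)
    return x


# Two inputs, one hidden layer with three neurons, two output variables
W_hidden = [[0.1, 0.5, -0.3], [0.0, 0.2, 0.7], [-0.2, 0.0, 0.4]]   # 0.0 = missing edge
W_out = [[0.0, 1.0, -1.0, 0.5], [0.3, 0.0, 0.8, -0.6]]
print(mlp([1.0, -2.0], [W_hidden, W_out]))
```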
Figure 18.7 shows an incomplete neural network with two output variables. Two of the weights are set to 0, and the corresponding edges are not displayed in the network graph. The output variables are then computed in the same manner as above, but using only the remaining edges with nonzero weights.
Until now we have only discussed the cases most often handled in the literature, where a neuron acts on a linear combination of the variables from the previous layer. Occasionally the case is also considered where the output of a neuron is a function of other combinations of the previous layer's outputs, for example of their products instead of a weighted sum.
Neural networks of the MLP type can be used for classification problems as well as for regression and forecasting problems. In order to find an adequate network for a given problem, the weights have to be learned from a training set, i.e., the network parameters are estimated from the data. Since we are restricting ourselves to the case of supervised learning, this means that pairs $(x^{(t)}, z^{(t)})$, $t = 1, \ldots, T$, are given as the training set. The $x^{(t)}$ are input vectors, and the $z^{(t)}$ are the corresponding desired output values of the network. The vectors $z^{(t)}$ are compared to the actual output vectors $y^{(t)}$ of the network, and the weights are determined so that the deviations between $z^{(t)}$ and $y^{(t)}$ are small. An example of this is the least squares (LS) approach already mentioned in the analysis of the perceptron:
Assume that the training set $(x^{(1)}, z^{(1)}), \ldots, (x^{(T)}, z^{(T)})$ is given. The threshold weights $w_{01}, \ldots, w_{0H}$ are given as well, where $H$ is the number of neurons in the first hidden layer. The weights of all the other edges in the network (between the input layer, the hidden layers and the output layer) are determined so that

$$\sum_{t=1}^{T} \left\| z^{(t)} - y^{(t)} \right\|^{2} = \min .$$
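For illustration, the criterion itself is easy to write down once a forward pass is available; `network` below stands for any function mapping an input vector to an output vector (for instance the `mlp` sketch above), and all names are ours.

```python
def network_ls_error(network, data):
    """Sum over the training set of the squared Euclidean distance between
    the desired outputs z and the network outputs y = network(x)."""
    total = 0.0
    for x, z in data:
        y = network(x)
        total += sum((zk - yk) ** 2 for zk, yk in zip(z, y))
    return total


# The free weights would then be chosen (e.g. numerically) to make this small:
# minimize network_ls_error(lambda x: mlp(x, weight_matrices), training_data)
# over the weight matrices that are not fixed in advance.
```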
Instead of the LS criterion, other loss functions can also be minimized, for example weighted quadratic distances or, above all in classification problems, the Kullback-Leibler distance: