8.2 Computing a Neural Network


net = nnrnet (x, y, w, size {, param {, wts}})
trains a single hidden layer feed-forward network with input x, output y, prior weights w, and number of hidden units size; optionally, the type of the network can be set via param and initial weights wts can be given
ypred = nnrpredict (x, net)
predicts the responses for the given variables x and network net
nnrinfo (net)
shows information about network net
nnrsave (net, "nnfile")
saves network net to files nnfile.*
net = nnrload ("nnfile")
loads network net from files nnfile.*

The function nnrnet constructs and trains a single hidden layer network with at most 100 units. The call looks like

  net = nnrnet (x, y, w, size, param, wts)
where x and y are the input and output variables. Note that x as well as y can consist of several variables (columns). We assume that x and y have dimensions $ n\times I$ and $ n\times Q$, respectively.

With the w parameter, we can associate a prior weight to each observation. This is useful, e.g. for ties in the data. Note that the prior weights w have nothing in common with the weights calculated in the net.

The parameter size determines the number of units in the hidden layer. The total number of units must not exceed 100, i.e.

columns of $ x$ + columns of $ y$ + units in hidden layer $ \le 100$.

The default network is a classification network: logistic output units, no softmax, no ``skip-layer'' connections, no weight decay and the training stops after 100 iterations. The default model for the output units $ y_k$, $ k=1,\ldots,Q$, is hence

$\displaystyle f_k(x)= F\left\{w_{0k}^{(2)} + \sum_{j=1}^{\textrm{size}}
\ w_{jk}^{(2)}F \left( w_{0j}^{(1)}
+ \sum_{i=1}^{I} w_{ij}^{(1)}x_i \right)\right\}
$

with $ F(\bullet)$ the logistic function. If a model different from the default is to be fitted, the parameter param needs to be modified. We explain this in more detail in Subsection 8.2.1.
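The model above is simple to evaluate directly. The following Python sketch (our own illustration; the names forward and logistic and the weight arrays are not part of nnrnet) mimics the default network with logistic activations in both layers:

```python
import math

def logistic(z):
    # F(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, b1, w2, b2):
    """Default single hidden layer model: logistic hidden and output units.
    x: I inputs; w1: size x I weights w_ij^(1); b1: biases w_0j^(1);
    w2: Q x size weights w_jk^(2); b2: biases w_0k^(2)."""
    hidden = [logistic(b1[j] + sum(w1[j][i] * x[i] for i in range(len(x))))
              for j in range(len(b1))]
    return [logistic(b2[k] + sum(w2[k][j] * hidden[j] for j in range(len(hidden))))
            for k in range(len(b2))]

# Tiny 2-2-1 example with hand-picked weights
y = forward([1.0, 0.0],
            w1=[[1.0, -1.0], [0.5, 0.5]], b1=[0.0, 0.0],
            w2=[[2.0, -2.0]], b2=[0.0])
```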

The result of nnrnet is a composed object net. More information on the components of net can be found in Subsection 8.2.2. The function nnrinfo displays brief information about the fitted network. The result of

  nnrinfo(net)
could for example print the following information in the output window:
  [ 1,] "A 2 - 1 - 1 network:"
  [ 2,] "# weights     : 5"
  [ 3,] "linear output : no"
  [ 4,] "error function: least squares"
  [ 5,] "log prob model: no"
  [ 6,] "skip links    : no"
  [ 7,] "decay         : 0"
  [ 8,] ""
  [ 9,] " From    To Weights"
  [10,] "    0     3  -0.751"
  [11,] "    1     3    0.81"
  [12,] "    2     3   0.575"
  [13,] "    0     4   -4.95"
  [14,] "    3     4    14.8"
The abbreviation 2 - 1 - 1 means two input units, one hidden unit and one output unit. Altogether five weights $ w_{st}$ have been calculated; their values are given in the last lines. The other items show which parameters have been specified for the network.
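The listed weights can be plugged into the model by hand: unit 0 is the bias, units 1 and 2 are the inputs, unit 3 the hidden unit and unit 4 the output. A small Python sketch (illustrative only; not part of the library):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Weights copied from the nnrinfo listing above, keyed by (from, to)
w = {(0, 3): -0.751, (1, 3): 0.81, (2, 3): 0.575,
     (0, 4): -4.95, (3, 4): 14.8}

def predict(x1, x2):
    # Hidden unit 3, then output unit 4, both with logistic activation
    h = logistic(w[(0, 3)] + w[(1, 3)] * x1 + w[(2, 3)] * x2)
    return logistic(w[(0, 4)] + w[(3, 4)] * h)

p = predict(0.5, 0.5)
```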

Typically, a neural network is applied to a subsample of the data which is used as a training data set. The remaining observations are then used to validate the network. To compute predicted values for the validation set, nnrpredict is used:

  ypred = nnrpredict (xval, net)
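The split into training and validation sets is left to the user. A simple random holdout can be sketched as follows (Python, purely illustrative; holdout_split is our own helper, not a library function):

```python
import random

def holdout_split(x, y, frac_train=0.7, seed=0):
    # Randomly partition the observations into training and validation sets.
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    cut = int(frac_train * len(idx))
    train, val = idx[:cut], idx[cut:]
    return ([x[i] for i in train], [y[i] for i in train],
            [x[i] for i in val], [y[i] for i in val])

xtr, ytr, xval, yval = holdout_split(list(range(10)), list(range(10)))
```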

Since the result of a neural network fitting is a composed object, two convenient functions for saving and loading neural networks are provided. The network net can be stored into a set of files by

  nnrsave (net, "mynet")
All created files start with the prefix mynet. The network can be reloaded by
  net = nnrload ("mynet")


8.2.1 Controlling the Parameters of the Neural Network

The type of the network and the control parameters for the iteration are determined by the parameter param of nnrnet. If, for instance, a model different from the default is to be fitted, this parameter needs to be modified. param is a vector of eight elements:

param[1]
determines if the activation function for the output is the logistic function (default value 0). Setting param[1] to the value 1 changes the activation function of the output unit to the identity function.
param[2]
determines the error function (the optimization criterion). The default value 0 indicates the quadratic least squares error function

$\displaystyle \sum_{k=1}^{Q} \sum_{i=1}^{n} \left\{ f_k(x_i) - y_{i,k} \right\}^2\,.
$

Setting param[2] to the value 1 changes the error function to the entropy for the classification case

$\displaystyle \sum_{k=1}^{Q} \sum_{i=1}^{n} \left\{ f_k(x_i) \log \left( \frac{f_k(x_i)}{y_{i,k}} \right) +
\{1-f_k(x_i)\} \log \left( \frac{1-f_k(x_i)}{1-y_{i,k}} \right)
\right\}\,.
$
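Both criteria are straightforward to state in code. In the Python sketch below (our illustration, not library code) values are clipped away from 0 and 1 so that the logarithms of the entropy stay finite; this clipping is our own assumption, not documented nnrnet behaviour:

```python
import math

def least_squares_error(f, y):
    """Quadratic criterion: sum over observations i and outputs k of
    {f_k(x_i) - y_ik}^2; f and y are n x Q lists of lists."""
    return sum((fik - yik) ** 2
               for frow, yrow in zip(f, y)
               for fik, yik in zip(frow, yrow))

def entropy_error(f, y, eps=1e-12):
    """Entropy criterion: sum of f*log(f/y) + (1-f)*log((1-f)/(1-y));
    eps keeps the logarithms finite when values hit 0 or 1 exactly."""
    total = 0.0
    for frow, yrow in zip(f, y):
        for fik, yik in zip(frow, yrow):
            fik = min(max(fik, eps), 1.0 - eps)
            yik = min(max(yik, eps), 1.0 - eps)
            total += (fik * math.log(fik / yik)
                      + (1.0 - fik) * math.log((1.0 - fik) / (1.0 - yik)))
    return total
```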

param[3]
If param[3] is set to the value 1, then the softmax activation function is used for the outputs. Writing $ f^*_k$ for the linear output value of output unit $ k$, the final output is then

$\displaystyle f_k(x_i)=\frac{\exp\{f^*_k(x_i)\}}{\sum_{\ell=1}^{Q} \exp\{f^*_\ell(x_i)\}}\,. $

The default value is 0, which means no softmax.
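The softmax transformation itself can be sketched as follows (Python, illustrative; the shift by the maximum is a standard numerical safeguard of our own, not documented nnrnet behaviour):

```python
import math

def softmax(scores):
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

p = softmax([1.0, 2.0, 3.0])
```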
param[4]
includes ``skip-layer'' connections. Setting param[4] to the value 1 generates ``skip-layer'' connections, i.e.

$\displaystyle f_k(x) = w_{0k}^{(2)} + \sum_{i=1}^{I} \tilde{w}_{ik}\,x_i +
\sum_{j=1}^{\textrm{size}} w_{jk}^{(2)}F \left( w_{0j}^{(1)} +
\sum_{i=1}^{I} w_{ij}^{(1)}x_i \right)\,,
$

where the $ \tilde{w}_{ik}$ denote the additional direct input-to-output weights.

The default value is 0, which means no ``skip-layer'' connections.
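A forward pass with skip-layer connections can be sketched as follows (Python, illustrative; written here with linear output units, and all names our own):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_skip(x, w1, b1, w2, b2, wskip):
    """Adds direct input-to-output terms wskip[k][i]*x[i] to the usual
    hidden layer contribution; output units are linear here."""
    hidden = [logistic(b1[j] + sum(w1[j][i] * xi for i, xi in enumerate(x)))
              for j in range(len(b1))]
    return [b2[k]
            + sum(wskip[k][i] * xi for i, xi in enumerate(x))
            + sum(w2[k][j] * hj for j, hj in enumerate(hidden))
            for k in range(len(b2))]

# With all hidden-layer output weights zero, only bias and skip terms remain
out = forward_skip([1.0, 1.0], w1=[[0.0, 0.0]], b1=[0.0],
                   w2=[[0.0]], b2=[0.5], wskip=[[1.0, 2.0]])
```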
param[5]
sets the maximal value $ \delta$ for the initial weights. If the optional input parameter wts is not given, uniform random numbers from $ [-\delta,\delta]$ are used. The default value is $ \delta=0.7$.
param[6]
sets the weight decay, the default is 0.
param[7]
sets the maximal number of iterations, the default is 100.
param[8]
shows information about the iteration. Setting param[8] to the value 1 produces control output in the output window during the optimization. The default is 0, i.e. not to show control output.
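As an example of how the eight elements fit together, the following (hypothetical) settings request a classification network with entropy error, softmax outputs, a small weight decay and a longer iteration limit; shown as a Python list for illustration, with 0-based positions corresponding to the 1-based param indices above:

```python
param = [0,     # param[1]: logistic (0) rather than linear (1) output units
         1,     # param[2]: entropy error instead of least squares
         1,     # param[3]: softmax outputs
         0,     # param[4]: no skip-layer connections
         0.7,   # param[5]: maximal absolute initial weight (default)
         1e-4,  # param[6]: weight decay
         200,   # param[7]: maximal number of iterations
         0]     # param[8]: no control output during optimization
```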


8.2.2 The Resulting Neural Network

The result of nnrnet is a composed object, the list net, which contains the resulting fit and information about the network. The components are the following:

net.n
vector of three elements containing the number of input, hidden and output units, respectively
net.nunits, net.nconn, net.conn
internal information about the network topology
net.decay
scalar, the weight decay parameter (=param[6])
net.entropy
scalar, the value of the entropy
net.softmax
scalar, softmax indicator (=param[3])
net.value
scalar, the value of the error function
net.wts
vector of final weights
net.yh.result
$ n\times Q$ matrix, the estimated outputs
net.yh.hess
the Hessian matrix