8.3 Running a Neural Network

In the following two subsections we run simple neural networks on clustered data. Before proceeding to the examples, the following libraries need to be loaded:

  library ("plot")
  library ("nn")
The nn library contains the functions for fitting and evaluating the networks; the plot library is used to produce scatter plots of the clusters.


8.3.1 Implementing a Simple Discriminant Analysis

In the following, we will use a single hidden layer network with one hidden unit to perform a discriminant analysis on an artificially generated data set with two clusters.

All XploRe code for this subsection can be found in XLGnn1.xpl. The first step is to generate the training data set:

  randomize(0)
  n  = 200
  xt = normal(n,2)+#(-1,-1)' | normal(n,2)+#(+1,+1)'
Here, a mixture of two two-dimensional normal distributions is generated. Each cluster consists of n = 200 observations. The variances are identical (equal to 1 in both directions), while the means are shifted to (-1,-1) and (+1,+1), respectively.
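In distributional terms, the two clusters of the training sample are

\[
X_i \sim N_2\bigl((-1,-1)^\top, I_2\bigr), \; i=1,\ldots,n, \qquad
X_i \sim N_2\bigl((+1,+1)^\top, I_2\bigr), \; i=n+1,\ldots,2n,
\]

since normal(n,2) generates n rows of two independent standard normal variates. The following lines of code can be used to display the data set graphically: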
  color  = string("red",1:n) | string("blue",1:n)
  symbol = string("circle",1:n) | string("triangle",1:n)
  xt     = setmask(xt, color, symbol)
  plot(xt)
  xl="x1"
  yl="x2"
  tl="Training Data Set"
  setgopt(plotdisplay,1,1,"title",tl,"xlabel",xl,"ylabel",yl)
The generated two-dimensional data are shown in Figure 8.3. We have labeled the observations from the first cluster by red circles, while the observations from the second cluster are labeled by blue triangles.

Figure 8.3: A generated training data set with two clusters.
\includegraphics[scale=0.45]{nndata1}

To apply the neural network, we now need to create the output variable y and the prior weights w. For y, we use the value 0 for the first cluster and 1 for the second. The prior weights are all set to 1. (In XploRe, matrix(n) returns an n x 1 vector of ones, so matrix(n)-1 is a vector of zeros.) The last statement of the following code fits the neural network with one hidden unit and assigns the result to net.

  yt  = (matrix(n)-1)|matrix(n)  ; 0 for first, 1 for second cluster
  w   = matrix(2*n)              ; prior weights, all 1
  net = nnrnet(xt,yt,w,1)        ; one hidden unit

We can obtain a summary of the fitted network from

  nnrinfo(net)
which prints the following into the output window:
  Contents of ts
  [ 1,] "A 2 - 1 - 1 network:"
  [ 2,] "# weights     : 5"
  [ 3,] "linear output : no"
  [ 4,] "error function: least squares"
  [ 5,] "log prob model: no"
  [ 6,] "skip links    : no"
  [ 7,] "decay         : 0"
  [ 8,] ""
  [ 9,] " From    To Weights"
  [10,] "    0     3   -1.18"
  [11,] "    1     3  -0.285"
  [12,] "    2     3  -0.198"
  [13,] "    0     4    99.8"
  [14,] "    3     4    44.8"
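The weight table can be read as follows: unit 0 is the constant (bias) unit, units 1 and 2 are the two input coordinates, unit 3 is the hidden unit, and unit 4 is the output unit. Since the summary reports "linear output : no", the output unit, like the hidden unit, applies a sigmoid activation. Assuming the usual logistic function (the standard choice for this type of network), the fitted network computes

\[
\hat y(x) = \sigma\bigl(w_{04} + w_{34}\,\sigma(w_{03} + w_{13}\,x_1 + w_{23}\,x_2)\bigr),
\qquad \sigma(u) = \frac{1}{1+e^{-u}},
\]

where w_{jk} denotes the weight from unit j to unit k as listed above. The 3 + 2 = 5 weights of this 2-1-1 architecture match the reported "# weights : 5".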

To validate the obtained network, we generate new random data from the same mixture of two-dimensional normal distributions. The classification of these data using the network net is done by nnrpredict.

  x = normal(n,2)+#(-1,-1)' | normal(n,2)+#(+1,+1)'
  pred  = nnrpredict(x, net)
  prob  = pred.result
The macro nnrpredict calculates the predicted values and the Hessian matrix; pred.result extracts the predicted values. Since the output unit applies a sigmoid activation, these values lie between 0 and 1 and can be read as estimated probabilities of belonging to the second cluster.

Now we determine the misclassified observations and display them together with the correctly classified data x.

  y  = (matrix(n)-1)|matrix(n) ; true 
  yp = prob > 0.5              ; predicted
  misc = paf(1:2*n,y!=yp)      ; misclassified
  good = paf(1:2*n,y==yp)      ; correctly classified
  nm = rows(misc)
  sm = string("fill",1:nm)+symbol[misc]
  xm = setmask(x[misc],color[misc],sm,"huge")
  xg = setmask(x[good],color[good],symbol[good])

  pm = 100*nm/(2*n)            ; percentage of misclassified
  spm = string("%1.2f",pm)+"%"
  Network = createdisplay(1,1)
  show(Network,1,1,xg,xm)
  tl="Network: misclassified = "+spm
  setgopt(Network,1,1,"title",tl,"xlabel",xl,"ylabel",yl)

Figure 8.4 shows the two-dimensional data that we used for validation. As before, observations from the first cluster are labeled by red circles, while the observations from the second cluster are labeled by blue triangles. All misclassified data are labeled by large filled symbols.

Figure 8.4: Neural network classification.
\includegraphics[scale=0.45]{nnnet1}

Let us now compare the classification obtained from the neural network with that from a classical linear discriminant analysis. Apart from the discrimination rule used for prediction, the code is almost identical to the above.
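In formulas, the rule implemented below assigns an observation x to the second cluster whenever

\[
(x - \bar\mu)^\top\, \hat\Sigma^{-1} (\hat\mu_0 - \hat\mu_1) \le 0,
\qquad \bar\mu = \tfrac{1}{2}(\hat\mu_0 + \hat\mu_1),
\]

where the estimates are the sample means mu0 and mu1 of the two training clusters and the covariance matrix cov(xt) of the complete training sample.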

  mu0 = mean(xt[1:n])
  mu1 = mean(xt[n+1:2*n])
  mu  = (mu0+mu1)/2
  lin = inv(cov(xt))*(mu0-mu1)'

  y  = (matrix(n)-1)|matrix(n) ; true 
  yp = (x-mu)*lin<=0           ; predicted
  misc = paf(1:2*n,y!=yp)      ; misclassified
  good = paf(1:2*n,y==yp)      ; correctly classified
  nm = rows(misc)
  sm = string("fill",1:nm)+symbol[misc]
  xm = setmask(x[misc],color[misc],sm,"huge")
  xg = setmask(x[good],color[good],symbol[good])
  x  = setmask(x, color, symbol)

  pm = 100*nm/(2*n)            ; percentage of misclassified
  spm = string("%1.2f",pm)+"%"
  Discrim = createdisplay(1,1)
  show(Discrim,1,1,xg,xm)
  tl="Linear misclassified = "+spm
  setgopt(Discrim,1,1,"title",tl,"xlabel",xl,"ylabel",yl)

Figure 8.5: Linear discriminant analysis.
\includegraphics[scale=0.45]{nndis1}

Figure 8.5 shows the resulting classification. Again, all misclassified data are labeled by large filled symbols. Comparing Figures 8.4 and 8.5 shows that the percentage of misclassification is nearly equal for both methods, with the linear discriminant analysis performing slightly better. This is not surprising: linear discriminant analysis is the optimal classification rule for two Gaussian clusters with identical covariance matrices and equal prior probabilities, which is exactly how the data were generated.
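This can be made precise: for two classes following N_2(\mu_0, \Sigma) and N_2(\mu_1, \Sigma) with equal prior probabilities, the Bayes-optimal rule assigns x to the first class exactly when

\[
(x - \mu_0)^\top \Sigma^{-1} (x - \mu_0) \le (x - \mu_1)^\top \Sigma^{-1} (x - \mu_1)
\quad\Longleftrightarrow\quad
\Bigl(x - \tfrac{1}{2}(\mu_0 + \mu_1)\Bigr)^\top \Sigma^{-1} (\mu_0 - \mu_1) \ge 0,
\]

which is the linear rule used above with the population parameters replaced by their sample estimates. For such data, no classifier can be expected to improve systematically on the linear discriminant analysis.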


8.3.2 Implementing a More Complex Discriminant Analysis

In contrast to the previous subsection, we will now consider a generated data set for which the linear discriminant analysis performs worse than the neural network. The XploRe code is largely identical to that of the previous example and can be found in XLGnn2.xpl.

As before, we generate a training data set consisting of two groups; this time the first group is itself a mixture of three normal clouds.

  randomize(0)
  n  = 100
  xt = normal(n,2)+#(-1,-1)' | normal(n,2)+#(+1,-2)'
  xt = xt | normal(n,2)+#(+4, 0)' | normal(n,2)+#(+1,+1)'

  color  = string("red",1:3*n) | string("blue",1:n)
  symbol = string("circle",1:3*n) | string("triangle",1:n)
  xt     = setmask(xt, color, symbol)
  plot(xt)
  xl="x1"
  yl="x2"
  tl="Training Data Set"
  setgopt(plotdisplay,1,1,"title",tl,"xlabel",xl,"ylabel",yl)
The generated two-dimensional data are shown in Figure 8.6. Here the points from the second group (labeled by blue triangles) overlap those from the first group (red circles) in a more complicated way than before.

Figure 8.6: A generated training data set with two clusters.
\includegraphics[scale=0.45]{nndata2}

We proceed in the same way as before, i.e. we create the output variable y and set all prior weights w to 1. Then the neural network is fitted. In contrast to the previous subsection, we now use three hidden units to take the more complex structure of the data into account.

  yt  = (matrix(3*n)-1)|matrix(n)  ; 0 for first, 1 for second group
  w   = matrix(4*n)                ; prior weights, all 1
  net = nnrnet(xt,yt,w,3)          ; three hidden units
  nnrinfo(net)
The resulting fit is summarized as follows:
  Contents of ts
  [ 1,] "A 2 - 3 - 1 network:"
  [ 2,] "# weights     : 13"
  [ 3,] "linear output : no"
  [ 4,] "error function: least squares"
  [ 5,] "log prob model: no"
  [ 6,] "skip links    : no"
  [ 7,] "decay         : 0"
  [ 8,] ""
  [ 9,] " From    To Weights"
  [10,] "    0     3    1.26"
  [11,] "    1     3  -0.106"
  [12,] "    2     3   -5.15"
  [13,] "    0     4    2.92"
  [14,] "    1     4   -1.32"
  [15,] "    2     4    0.37"
  [16,] "    0     5   -7.61"
  [17,] "    1     5   -56.1"
  [18,] "    2     5   -28.3"
  [19,] "    0     6   -2.76"
  [20,] "    3     6   -4.38"
  [21,] "    4     6    7.64"
  [22,] "    5     6   -4.71"
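Reading the table as in the previous subsection (unit 0 the bias, units 1 and 2 the inputs, units 3 to 5 the hidden units, unit 6 the output), the fitted network has the form

\[
\hat y(x) = \sigma\Bigl(w_{06} + \sum_{j=3}^{5} w_{j6}\,\sigma\bigl(w_{0j} + w_{1j}\,x_1 + w_{2j}\,x_2\bigr)\Bigr),
\]

again assuming the logistic activation in all units. The 3·3 + 4 = 13 weights of this 2-3-1 architecture match the reported "# weights : 13"; it is the superposition of three sigmoid functions that makes a nonlinear decision boundary possible.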
Again, we assess the quality of the obtained network by counting the misclassified observations for a validation data set.
  x = normal(n,2)+#(-1,-1)' | normal(n,2)+#(+1,-2)'
  x = x | normal(n,2)+#(+4, 0)' | normal(n,2)+#(+1,+1)'
  pred  = nnrpredict(x, net)
  prob  = pred.result

  y  = (matrix(3*n)-1)|matrix(n) ; true 
  yp = prob > 0.5                ; predicted
  misc = paf(1:4*n,y!=yp)        ; misclassified
  good = paf(1:4*n,y==yp)        ; correctly classified
  nm = rows(misc)
  sm = string("fill",1:nm)+symbol[misc]
  xm = setmask(x[misc],color[misc],sm,"huge")
  xg = setmask(x[good],color[good],symbol[good])

  pm = 100*nm/(4*n)                ; percentage of misclassified
  spm = string("%1.2f",pm)+"%"
  Network = createdisplay(1,1)
  show(Network,1,1,xg,xm)
  tl="Network: misclassified = "+spm
  setgopt(Network,1,1,"title",tl,"xlabel",xl,"ylabel",yl)

Figure 8.7: Neural network classification.
\includegraphics[scale=0.45]{nnnet2}

Figure 8.7 shows the resulting plot of the two-dimensional data that we used for validation, with misclassified data labeled by large filled symbols.

The comparison with the classical linear discriminant analysis is implemented in the following lines:

  mu0 = mean(xt[1:3*n])
  mu1 = mean(xt[3*n+1:4*n])
  mu  = (mu0+mu1)/2
  lin = inv(cov(xt))*(mu0-mu1)'

  y  = (matrix(3*n)-1)|matrix(n) ; true 
  yp = (x-mu)*lin<=0             ; predicted
  misc = paf(1:4*n,y!=yp)        ; misclassified
  good = paf(1:4*n,y==yp)        ; correctly classified
  nm = rows(misc)
  sm = string("fill",1:nm)+symbol[misc]
  xm = setmask(x[misc],color[misc],sm,"huge")
  xg = setmask(x[good],color[good],symbol[good])
  x  = setmask(x, color, symbol)

  pm = 100*nm/(4*n)            ; percentage of misclassified
  spm = string("%1.2f",pm)+"%"
  Discrim = createdisplay(1,1)
  show(Discrim,1,1,xg,xm)
  tl="Linear misclassified = "+spm
  setgopt(Discrim,1,1,"title",tl,"xlabel",xl,"ylabel",yl)

Figure 8.8: Linear discriminant analysis.
\includegraphics[scale=0.45]{nndis2}

Figure 8.8 shows the resulting classification. Comparing Figures 8.7 and 8.8 now reveals that the neural network separates the clusters more accurately. This is because the neural network with three hidden units can adapt to a nonlinear decision boundary, whereas the linear discriminant analysis is restricted to a linear one.