In the following two sections we run simple neural nets on clustered data. Before proceeding to the examples, the following libraries need to be loaded:
library ("plot") library ("nn")The nn library contains the functions for running the networks. The plot library is used to produce scatter plots of the clusters.
In the following, we use a network with one hidden layer containing a single hidden unit to perform a discriminant analysis on an artificially generated data set with two clusters.
All XploRe code for this subsection can be found in XLGnn1.xpl.
The first step is to generate the training data set:
randomize(0)
n = 200
xt = normal(n,2)+#(-1,-1)' | normal(n,2)+#(+1,+1)'
Here, a mixture of two two-dimensional normal distributions is generated. Each cluster consists of n = 200 observations. The variances are identical (equal to 1 in both directions), whereas the means are shifted by (-1,-1) and (+1,+1), respectively. The following code lines can be used to display the data set graphically:
color = string("red",1:n) | string("blue",1:n) symbol = string("circle",1:n) | string("triangle",1:n) xt = setmask(xt, color, symbol) plot(xt) xl="x1" yl="x2" tl="Training Data Set" setgopt(plotdisplay,1,1,"title",tl,"xlabel",xl,"ylabel",yl)The generated two-dimensional data are shown in Figure 8.3. We have labeled the observations from the first cluster by red circles, whereas the observation from the second cluster are labeled as blue triangles.
To apply the neural network, we now need to create the output variable yt and the prior weights w. For yt, we use the value 0 for the first cluster and the value 1 for the second cluster. The prior weights are all set to 1. The last statement of the following code fits the neural network with one hidden unit and assigns the result to net.
yt = (matrix(n)-1)|matrix(n)
w = matrix(2*n)
param = 1
net = nnrnet(xt,yt,w,1)
We can obtain a summary of the fitted network from
nnrinfo(net)
which prints into the output window:
Contents of ts
[ 1,] "A 2 - 1 - 1 network:"
[ 2,] "# weights : 5"
[ 3,] "linear output : no"
[ 4,] "error function: least squares"
[ 5,] "log prob model: no"
[ 6,] "skip links : no"
[ 7,] "decay : 0"
[ 8,] ""
[ 9,] "  From  To  Weights"
[10,] "    0    3   -1.18"
[11,] "    1    3   -0.285"
[12,] "    2    3   -0.198"
[13,] "    0    4   99.8"
[14,] "    3    4   44.8"
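Reading the table: units 1 and 2 are the two inputs, unit 3 is the hidden unit, unit 4 is the output, and the weights "From 0" are the bias terms. As an illustration of what this parameterization computes, the following Python sketch evaluates such a 2-1-1 network; the logistic activation (applied at the output unit as well, since the summary reports "linear output: no") is our assumption about the underlying implementation, and all variable names are ours:

import numpy as np

def sigmoid(u):
    # logistic activation; an assumption about the nn library's convention
    return 1.0 / (1.0 + np.exp(-u))

def net_2_1_1(x1, x2, w):
    # w collects the five weights from the table above:
    # "b_h", "x1_h", "x2_h" correspond to rows 0->3, 1->3, 2->3,
    # "b_o", "h_o" correspond to rows 0->4, 3->4
    h = sigmoid(w["b_h"] + w["x1_h"]*x1 + w["x2_h"]*x2)  # hidden unit 3
    return sigmoid(w["b_o"] + w["h_o"]*h)                # output unit 4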
To validate the obtained network, we generate new random data from the same mixture of two-dimensional normal distributions. The classification of these data using the network net is done by nnrpredict.
x = normal(n,2)+#(-1,-1)' | normal(n,2)+#(+1,+1)'
pred = nnrpredict(x, net)
prob = pred.result
The macro nnrpredict calculates the predicted values and the Hessian matrix. pred.result extracts the predicted values.
Now we compute the misclassified observations and show them in comparison with the original data x.
y = (matrix(n)-1)|matrix(n)    ; true
yp = prob > 0.5                ; predicted
misc = paf(1:2*n,y!=yp)        ; misclassified
good = paf(1:2*n,y==yp)        ; correctly classified
nm = rows(misc)
sm = string("fill",1:nm)+symbol[misc]
xm = setmask(x[misc],color[misc],sm,"huge")
xg = setmask(x[good],color[good],symbol[good])
pm = 100*nm/(2*n)              ; percentage of misclassified
spm = string("%1.2f",pm)+"%"
Network = createdisplay(1,1)
show(Network,1,1,xg,xm)
tl="Network: misclassified = "+spm
setgopt(Network,1,1,"title",tl,"xlabel",xl,"ylabel",yl)
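The bookkeeping above translates almost line by line into other languages. A toy Python version (the names y_true and prob are ours, standing for the true labels and the network outputs):

import numpy as np

y_true = np.array([0, 0, 1, 1])          # illustrative true labels
prob = np.array([0.2, 0.7, 0.9, 0.4])    # illustrative network outputs
y_pred = (prob > 0.5).astype(int)        # threshold the output at 0.5
misc = np.flatnonzero(y_true != y_pred)  # indices of misclassified points
misc_rate = 100.0 * misc.size / y_true.size
print(misc_rate)                         # 50.0 for this toy input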
Figure 8.4 shows the two-dimensional data that we used for validation. As before, observations from the first cluster are labeled by red circles and observations from the second cluster by blue triangles. All misclassified data are labeled by large filled symbols.
Let us now compare the classification obtained from the neural network with that from a classical linear discriminant analysis. Apart from the discrimination rule used for prediction, the code is almost identical to the above.
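Written out, the rule coded below assigns an observation $x$ to the second cluster ($y = 1$) whenever

\[
(x - \bar{\mu})^{\top} \hat{\Sigma}^{-1} (\hat{\mu}_0 - \hat{\mu}_1) \le 0,
\qquad
\bar{\mu} = \tfrac{1}{2} (\hat{\mu}_0 + \hat{\mu}_1),
\]

where $\hat{\mu}_0$ and $\hat{\mu}_1$ are the sample means of the two training clusters and $\hat{\Sigma}$ is the covariance matrix estimated from the training data.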
mu0 = mean(xt[1:n])
mu1 = mean(xt[n+1:2*n])
mu = (mu0+mu1)/2
lin = inv(cov(xt))*(mu0-mu1)'
y = (matrix(n)-1)|matrix(n)    ; true
yp = (x-mu)*lin<=0             ; predicted
misc = paf(1:2*n,y!=yp)        ; misclassified
good = paf(1:2*n,y==yp)        ; correctly classified
nm = rows(misc)
sm = string("fill",1:nm)+symbol[misc]
xm = setmask(x[misc],color[misc],sm,"huge")
xg = setmask(x[good],color[good],symbol[good])
x = setmask(x, color, symbol)
pm = 100*nm/(2*n)              ; percentage of misclassified
spm = string("%1.2f",pm)+"%"
Discrim = createdisplay(1,1)
show(Discrim,1,1,xg,xm)
tl="Linear misclassified = "+spm
setgopt(Discrim,1,1,"title",tl,"xlabel",xl,"ylabel",yl)
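A minimal Python counterpart of this rule, continuing the NumPy sketch given after Figure 8.3 (cluster1 and cluster2 are the training clusters defined there; all names are ours):

import numpy as np

def lda_predict(x, mu0, mu1, sigma):
    # label 1 (second cluster) when the linear score is <= 0, cf. the rule above
    mu_bar = 0.5 * (mu0 + mu1)
    lin = np.linalg.solve(sigma, mu0 - mu1)   # Sigma^{-1} (mu0 - mu1)
    return ((x - mu_bar) @ lin <= 0).astype(int)

# estimates from the training data; like the XploRe code, sigma is the
# covariance of the entire training sample
mu0 = cluster1.mean(axis=0)
mu1 = cluster2.mean(axis=0)
sigma = np.cov(np.vstack([cluster1, cluster2]), rowvar=False)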
Figure 8.5 shows the resulting classification. Again, all misclassified data are labeled by large filled symbols. Comparing Figures 8.4 and 8.5 shows that the percentage of misclassification is nearly equal for both methods. The linear discriminant analysis performs slightly better. This is not surprising, since linear discriminant analysis is designed for exactly the situation we generated: two Gaussian clusters with equal covariance matrices.
In contrast to the previous subsection, we will now consider a generated data set for which linear discriminant analysis performs worse than the neural network. The XploRe code is largely identical to the previous examples and can be found in XLGnn2.xpl.
As before, we generate a training data set that features two clusters.
randomize(0)
n = 100
xt = normal(n,2)+#(-1,-1)' | normal(n,2)+#(+1,-2)'
xt = xt | normal(n,2)+#(+4, 0)' | normal(n,2)+#(+1,+1)'
color = string("red",1:3*n) | string("blue",1:n)
symbol = string("circle",1:3*n) | string("triangle",1:n)
xt = setmask(xt, color, symbol)
plot(xt)
xl="x1"
yl="x2"
tl="Training Data Set"
setgopt(plotdisplay,1,1,"title",tl,"xlabel",xl,"ylabel",yl)
The generated two-dimensional data are shown in Figure 8.6. The first cluster is now itself a mixture of three normal distributions, so the points from the second group (blue triangles) overlap the points from the first group (red circles) in a more complicated way.
We proceed in the same way as before, i.e. we create the output variable yt and set all prior weights w to 1. Then the neural network is fitted. In contrast to the previous example, we now use three hidden units (still in a single hidden layer) to take the more complex structure of the data into account.
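In formula terms, such a network computes

\[
f(x) = \sigma\Bigl( v_0 + \sum_{j=1}^{3} v_j \,
\sigma\bigl( w_{0j} + w_{1j} x_1 + w_{2j} x_2 \bigr) \Bigr),
\qquad
\sigma(u) = \frac{1}{1 + e^{-u}},
\]

where $w_{0j}$ and $v_0$ are the bias weights (the rows "From 0" in the summary below). As before, the logistic activation $\sigma$ at the output unit is our assumption, suggested by the summary entry "linear output: no".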
yt = (matrix(3*n)-1)|matrix(n)
w = matrix(4*n)
param = 1
net = nnrnet(xt,yt,w,3)
nnrinfo(net)
The resulting fit is summarized as follows:
Contents of ts
[ 1,] "A 2 - 3 - 1 network:"
[ 2,] "# weights : 13"
[ 3,] "linear output : no"
[ 4,] "error function: least squares"
[ 5,] "log prob model: no"
[ 6,] "skip links : no"
[ 7,] "decay : 0"
[ 8,] ""
[ 9,] "  From  To  Weights"
[10,] "    0    3    1.26"
[11,] "    1    3   -0.106"
[12,] "    2    3   -5.15"
[13,] "    0    4    2.92"
[14,] "    1    4   -1.32"
[15,] "    2    4    0.37"
[16,] "    0    5   -7.61"
[17,] "    1    5  -56.1"
[18,] "    2    5  -28.3"
[19,] "    0    6   -2.76"
[20,] "    3    6   -4.38"
[21,] "    4    6    7.64"
[22,] "    5    6   -4.71"
Again, we assess the quality of the obtained network by counting the misclassified observations for a validation data set.
x = normal(n,2)+#(-1,-1)' | normal(n,2)+#(+1,-2)'
x = x | normal(n,2)+#(+4, 0)' | normal(n,2)+#(+1,+1)'
pred = nnrpredict(x, net)
prob = pred.result
y = (matrix(3*n)-1)|matrix(n)   ; true
yp = prob > 0.5                 ; predicted
misc = paf(1:4*n,y!=yp)         ; misclassified
good = paf(1:4*n,y==yp)         ; correctly classified
nm = rows(misc)
sm = string("fill",1:nm)+symbol[misc]
xm = setmask(x[misc],color[misc],sm,"huge")
xg = setmask(x[good],color[good],symbol[good])
pm = 100*nm/(4*n)               ; percentage of misclassified
spm = string("%1.2f",pm)+"%"
Network = createdisplay(1,1)
show(Network,1,1,xg,xm)
tl="Network: misclassified = "+spm
setgopt(Network,1,1,"title",tl,"xlabel",xl,"ylabel",yl)
Figure 8.7 shows the resulting plot of the two-dimensional data that we used for prediction, with misclassified data labeled by large filled symbols.
The comparison with the classical linear discriminant analysis is implemented in the following lines:
mu0 = mean(xt[1:3*n])
mu1 = mean(xt[3*n+1:4*n])
mu = (mu0+mu1)/2
lin = inv(cov(xt))*(mu0-mu1)'
y = (matrix(3*n)-1)|matrix(n)   ; true
yp = (x-mu)*lin<=0              ; predicted
misc = paf(1:4*n,y!=yp)         ; misclassified
good = paf(1:4*n,y==yp)         ; correctly classified
nm = rows(misc)
sm = string("fill",1:nm)+symbol[misc]
xm = setmask(x[misc],color[misc],sm,"huge")
xg = setmask(x[good],color[good],symbol[good])
x = setmask(x, color, symbol)
pm = 100*nm/(4*n)               ; percentage of misclassified
spm = string("%1.2f",pm)+"%"
Discrim = createdisplay(1,1)
show(Discrim,1,1,xg,xm)
tl="Linear misclassified = "+spm
setgopt(Discrim,1,1,"title",tl,"xlabel",xl,"ylabel",yl)
Figure 8.8 shows the resulting classification. The comparison of Figures 8.7 and 8.8 now reveals that the neural network separates the clusters more accurately. This is due to the fact that the network with three hidden units can adapt to a nonlinear discrimination rule, whereas the linear discriminant analysis is restricted to a linear boundary.