13.4 Example: Media


13.4.1 Description of the Data Set

The data set comes from a survey in which 12,388 contacts with various media were identified (Lebart, L., Morineau, A., and Piron, M.; 1995). The statistical units are the media contacts, which are cross-classified by activity. They are also cross-classified by some supplementary variables: sex, age and education level.

The active data are stored in the file m.dat, which contains six media items (columns) and eight activities (rows):

    96      118        2       71       50       17
   122      136       11       76       49       41
   193      184       74       63      103       79
   360      365       63      145      141      184
   511      593       57      217      172      306
   385      457       42      174      104      220
   156      185        8       69       42       85
  1474     1931      181      852      642      782
The column labels are stored in the file mctxt.dat as shown below:
  RADIO
  TV
  N_NEWS
  R_NEWS
  MAGAZ
  TVMAG
The vector of row labels is stored in the file mltxt.dat:
  la_Farmer
  s_busin
  h_manag
  i_manag
  empl
  skil
  unsk
  Nowork
Supplementary row data are stored in the file msl.dat:
  1630     1900      285      854      621      776
  1667     2069      152      815      683      938
   660      713       69      216      234      360
   640      719       84      230      212      380
   888     1000      130      429      345      466
   617      774       84      391      262      263
   491      761       70      402      251      245
   908     1307       73      642      360      435
   869     1008      107      408      336      494
   901     1035       80      140      311      504
   619      612      177      209      298      281
The eleven supplementary row labels are stored in the file msltxt.dat:
  MALE
  FEMALE
  A14-24
  A25-34
  A35-49
  A50-64
  A65+
  PRIMARY
  SECOND
  H_TECH
  UNIVER
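
As a quick consistency check on the active table, the following Python sketch (not XploRe code; the variable names are illustrative assumptions) enters the counts listed above for m.dat and computes the grand total together with the relative weights (masses) of the rows and columns. The grand total should equal the 12,388 contacts mentioned in the description.

```python
import numpy as np

# Active table as listed for m.dat: 8 activities (rows) x 6 media (columns).
N = np.array([
    [96, 118, 2, 71, 50, 17],
    [122, 136, 11, 76, 49, 41],
    [193, 184, 74, 63, 103, 79],
    [360, 365, 63, 145, 141, 184],
    [511, 593, 57, 217, 172, 306],
    [385, 457, 42, 174, 104, 220],
    [156, 185, 8, 69, 42, 85],
    [1474, 1931, 181, 852, 642, 782],
])
row_labels = ["la_Farmer", "s_busin", "h_manag", "i_manag",
              "empl", "skil", "unsk", "Nowork"]
col_labels = ["RADIO", "TV", "N_NEWS", "R_NEWS", "MAGAZ", "TVMAG"]

n = N.sum()                   # grand total of media contacts
row_mass = N.sum(axis=1) / n  # relative weight of each activity
col_mass = N.sum(axis=0) / n  # relative weight of each medium

print("total contacts:", n)
for label, mass in zip(row_labels, row_mass):
    print(f"{label:10s}{mass:8.4f}")
for label, mass in zip(col_labels, col_mass):
    print(f"{label:10s}{mass:8.4f}")
```

The masses computed this way are the relative weights that a correspondence analysis attaches to each row and column profile.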


13.4.2 Calling the Quantlet

The following code calls the quantlet corresp and analyzes the data set m.dat.

  library("stats")
  corresp("m.dat","msl.dat","null","MEDIA","mltxt.dat",
                               "mctxt.dat","msltxt.dat","null")
XAGcorre02.xpl


13.4.3 Brief Interpretation

We obtain the following output.

  [1,] EIGENVALUES AND PERCENTAGES
 
  Contents of seig 
 
  [1,]   0.0139   62.1982   62.1982
  [2,]   0.0072   32.3650   94.5632
  [3,]   0.0008    3.7018   98.2650
  [4,]   0.0003    1.3638   99.6288
  [5,]   0.0001    0.3712  100.0000
The first two axes together account for about 94.6% of the total variation and are clearly dominant. This percentage indicates the share of information accounted for by the first two principal axes.
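
The eigenvalue table can be illustrated outside XploRe with a minimal Python sketch of a standard correspondence analysis. It assumes the textbook definition, in which the eigenvalues are the squared singular values of the matrix of standardized residuals; whether the corresp quantlet is implemented exactly this way is an assumption. Under that assumption the eigenvalues, percentages and cumulated percentages should agree with the printed output up to rounding.

```python
import numpy as np

# Active table as listed for m.dat (8 activities x 6 media).
N = np.array([
    [96, 118, 2, 71, 50, 17],
    [122, 136, 11, 76, 49, 41],
    [193, 184, 74, 63, 103, 79],
    [360, 365, 63, 145, 141, 184],
    [511, 593, 57, 217, 172, 306],
    [385, 457, 42, 174, 104, 220],
    [156, 185, 8, 69, 42, 85],
    [1474, 1931, 181, 852, 642, 782],
], dtype=float)

P = N / N.sum()    # correspondence matrix (relative frequencies)
r = P.sum(axis=1)  # row masses
c = P.sum(axis=0)  # column masses

# Standardized residuals from the independence model r_i * c_j.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# Singular values; the last one is the trivial (zero) dimension.
sv = np.linalg.svd(S, compute_uv=False)
k = min(N.shape) - 1  # number of nontrivial axes (5 here)
eig = sv[:k] ** 2     # eigenvalues (principal inertias)

pct = 100 * eig / eig.sum()  # share of total inertia per axis
cum = np.cumsum(pct)         # cumulated share

for i in range(k):
    print(f"[{i + 1},] {eig[i]:8.4f} {pct[i]:9.4f} {cum[i]:9.4f}")
```

The sum of the eigenvalues equals the total inertia of the table, that is, the chi-square statistic of the contingency table divided by the grand total.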

The coordinates on the different axes and other indices that help in interpreting the results are shown in the following output, which also includes the coordinates and the squared correlations of the supplementary items.

  [1,] Row relative weights and distances to the origin 
 
  Contents of spdai 
 
  [1,]   0.0286    0.0032
  [2,]   0.0351    0.0016
  [3,]   0.0562    0.0039
  [4,]   0.1015    0.0011
  [5,]   0.1498    0.0009
  [6,]   0.1116    0.0011
  [7,]   0.0440    0.0014
  [8,]   0.4732    0.0005

  [1,] Coordinates of the rows

  Contents of scoordi

  [1,]   -0.0015   -0.0028    0.0006    0.0001   -0.0002
  [2,]   -0.0006   -0.0013    0.0006   -0.0002    0.0002
  [3,]    0.0039   -0.0005    0.0000   -0.0002   -0.0001
  [4,]    0.0010    0.0003    0.0003    0.0002    0.0001
  [5,]   -0.0001    0.0009    0.0000    0.0002    0.0000
  [6,]   -0.0004    0.0009    0.0002   -0.0003    0.0000
  [7,]   -0.0011    0.0009    0.0004    0.0000   -0.0002
  [8,]   -0.0003   -0.0003   -0.0002    0.0000    0.0000
In the following output we note, for instance, that the relative frequency of national newspapers (N_NEWS, the 3rd active column item) is very small (3.54%).
  [1,] Column relative weights and distances to the origin
 
  Contents of spdaj
 
  [1,]   0.2661    0.0005
  [2,]   0.3204    0.0005
  [3,]   0.0354    0.0049
  [4,]   0.1346    0.0014
  [5,]   0.1052    0.0015
  [6,]   0.1384    0.0015
 
  [1,] Coordinates of the columns

  Contents of scoordj

  [1,]    0.0001    0.0002    0.0004    0.0000    0.0000
  [2,]   -0.0005    0.0000   -0.0001   -0.0001   -0.0001
  [3,]    0.0049   -0.0001   -0.0002   -0.0004    0.0001
  [4,]   -0.0010   -0.0010    0.0000   -0.0001    0.0001
  [5,]    0.0009   -0.0012   -0.0002    0.0003    0.0000
  [6,]   -0.0001    0.0015   -0.0002    0.0001    0.0001
However, the distance of N_NEWS to the origin is the largest among the columns (0.0049), which indicates that its profile in terms of activities is very specific. As a result, it contributes 74.6% to the construction of the first axis, as can be seen from the following output. Geometrically it is very close to this axis (squared correlation 0.99).
  [1,] Contributions of the columns
 
  Contents of scontrj
 
  [1,]   0.4287    1.8037   70.3836    0.6207    0.1489
  [2,]   6.5641    0.0192   10.5160   13.2700   37.5915
  [3,]  74.5877    0.0189    1.8090   18.1763    1.8723
  [4,]  11.5011   22.4356    0.4460    7.5324   44.6282
  [5,]   6.8233   25.6080    4.4877   50.8035    1.7592
  [6,]   0.0950   50.1145   12.3576    9.5970   13.9999
 
  [1,] Squared correlations of the columns 
 
  Contents of scorrj 

  [1,]   0.0770    0.1685    0.7520    0.0024    0.0002
  [2,]   0.8508    0.0013    0.0811    0.0377    0.0291
  [3,]   0.9930    0.0001    0.0014    0.0053    0.0001
  [4,]   0.4866    0.4940    0.0011    0.0070    0.0113
  [5,]   0.3168    0.6186    0.0124    0.0517    0.0005
  [6,]   0.0035    0.9587    0.0270    0.0077    0.0031
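
The two column indices above can be sketched in Python under the usual definitions, which are assumed here rather than taken from the quantlet's source: the contribution of column j to axis k is its mass times its squared principal coordinate divided by the eigenvalue, and the squared correlation is the squared coordinate divided by the squared distance of the column to the origin. Both indices are unaffected by the sign and overall scale of the coordinates, so they can be compared with the printed output even if the quantlet scales its coordinates differently.

```python
import numpy as np

# Active table as listed for m.dat (8 activities x 6 media).
N = np.array([
    [96, 118, 2, 71, 50, 17],
    [122, 136, 11, 76, 49, 41],
    [193, 184, 74, 63, 103, 79],
    [360, 365, 63, 145, 141, 184],
    [511, 593, 57, 217, 172, 306],
    [385, 457, 42, 174, 104, 220],
    [156, 185, 8, 69, 42, 85],
    [1474, 1931, 181, 852, 642, 782],
], dtype=float)
col_labels = ["RADIO", "TV", "N_NEWS", "R_NEWS", "MAGAZ", "TVMAG"]

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
k = min(N.shape) - 1  # drop the trivial zero dimension
eig = sv[:k] ** 2

# Principal coordinates of the columns (one row per medium, one column per axis).
G = (Vt[:k].T * sv[:k]) / np.sqrt(c)[:, None]

# Contribution (in %) of each column to each axis; every axis sums to 100.
contrib = 100 * c[:, None] * G ** 2 / eig

# Squared correlation of each column with each axis; every column sums to 1.
cos2 = G ** 2 / (G ** 2).sum(axis=1, keepdims=True)

for j, label in enumerate(col_labels):
    print(f"{label:8s} contrib axis 1: {contrib[j, 0]:7.2f}   cos2 axis 1: {cos2[j, 0]:.4f}")
```

The same formulas applied to the left singular vectors and the row masses give the corresponding indices for the rows.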
The first axis is largely explained by the 3rd active row item, high manager (h_manag), which contributes about 75% to it, as the following output shows:
  [1,] Contributions of the rows
 
  Contents of scontri
 
  [1,]   5.6928   37.9892   17.8813    1.9590   15.8850
  [2,]   1.1848    9.9793   17.6701    4.7954   28.0180
  [3,]  74.9579    2.8872    0.0622    5.2257    8.5732
  [4,]   8.3279    1.4964   11.7552   21.4483   17.5522
  [5,]   0.2675   18.9376    0.4701   20.3081    2.1711
  [6,]   1.5383   15.9009    5.0508   46.0393    0.4038
  [7,]   4.4054    5.4906    8.4193    0.1767   26.8961
  [8,]   3.6255    7.3188   38.6910    0.0476    0.5005

  [1,] Squared correlations of the rows
 
  Contents of scorri
 
  [1,]   0.2135    0.7414    0.0399    0.0016    0.0036
  [2,]   0.1538    0.6742    0.1366    0.0137    0.0217
  [3,]   0.9782    0.0196    0.0000    0.0015    0.0007
  [4,]   0.8022    0.0750    0.0674    0.0453    0.0101
  [5,]   0.0252    0.9289    0.0026    0.0420    0.0012
  [6,]   0.1383    0.7437    0.0270    0.0907    0.0002
  [7,]   0.5557    0.3604    0.0632    0.0005    0.0202
  [8,]   0.3722    0.3910    0.2364    0.0001    0.0003

  [1,] SUPPLEMENTARY ITEMS
 
  [1,] Row relative weights and distances to the origin
 
  Contents of spdsl
 
  [ 1,]   0.1644    0.0006
  [ 2,]   0.1714    0.0006
  [ 3,]   0.0610    0.0012
  [ 4,]   0.0614    0.0012
  [ 5,]   0.0883    0.0004
  [ 6,]   0.0648    0.0010
  [ 7,]   0.0602    0.0016
  [ 8,]   0.1010    0.0015
  [ 9,]   0.0873    0.0004
  [10,]   0.0805    0.0024
  [11,]   0.0595    0.0026
The 11th supplementary row item, university education (UNIVER), is closely linked to factor 1 (squared correlation 0.99); see the following output:
  [1,] Squared correlations of the rows
 
  Contents of scontrsi
 
  [ 1,]   0.4813    0.1104    0.0215    0.3239    0.0629
  [ 2,]   0.4910    0.1025    0.0213    0.3261    0.0591
  [ 3,]   0.0150    0.5609    0.0762    0.2102    0.1377
  [ 4,]   0.0542    0.8704    0.0100    0.0350    0.0304
  [ 5,]   0.6140    0.1026    0.0726    0.0316    0.1791
  [ 6,]   0.0478    0.8030    0.0011    0.1184    0.0296
  [ 7,]   0.1438    0.5840    0.1552    0.0894    0.0275
  [ 8,]   0.6289    0.2446    0.0209    0.1034    0.0023
  [ 9,]   0.0002    0.6872    0.0001    0.2908    0.0218
  [10,]   0.0132    0.4614    0.0187    0.1283    0.3783
  [11,]   0.9882    0.0033    0.0024    0.0025    0.0037

  [1,] Coordinates of the rows
 
  Contents of scodsi
 
  [ 1,]   0.0004   -0.0002    0.0001   -0.0004    0.0002
  [ 2,]  -0.0004    0.0002   -0.0001    0.0004   -0.0002
  [ 3,]   0.0001    0.0009    0.0003    0.0006   -0.0004
  [ 4,]   0.0003    0.0011    0.0001    0.0002   -0.0002
  [ 5,]   0.0003    0.0001    0.0001    0.0001    0.0001
  [ 6,]  -0.0002   -0.0009    0.0000   -0.0003    0.0002
  [ 7,]  -0.0006   -0.0012   -0.0006   -0.0005    0.0003
  [ 8,]  -0.0012   -0.0007   -0.0002   -0.0005    0.0001
  [ 9,]   0.0000    0.0004    0.0000    0.0002    0.0001
  [10,]   0.0003    0.0017    0.0003    0.0009   -0.0015
  [11,]   0.0026   -0.0002    0.0001    0.0001   -0.0002
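
How supplementary rows are placed on the axes can be sketched in Python with the standard transition formula, assumed here to match what the quantlet does: a supplementary profile does not take part in the construction of the axes and is projected afterwards as the weighted average of the standard column coordinates, with its own profile as weights. The coordinates obtained this way may differ in sign and scale from the printed ones, but the squared correlations do not depend on either.

```python
import numpy as np

# Active table (m.dat) and supplementary rows (msl.dat) as listed in the text.
N = np.array([
    [96, 118, 2, 71, 50, 17],
    [122, 136, 11, 76, 49, 41],
    [193, 184, 74, 63, 103, 79],
    [360, 365, 63, 145, 141, 184],
    [511, 593, 57, 217, 172, 306],
    [385, 457, 42, 174, 104, 220],
    [156, 185, 8, 69, 42, 85],
    [1474, 1931, 181, 852, 642, 782],
], dtype=float)
N_sup = np.array([
    [1630, 1900, 285, 854, 621, 776],
    [1667, 2069, 152, 815, 683, 938],
    [660, 713, 69, 216, 234, 360],
    [640, 719, 84, 230, 212, 380],
    [888, 1000, 130, 429, 345, 466],
    [617, 774, 84, 391, 262, 263],
    [491, 761, 70, 402, 251, 245],
    [908, 1307, 73, 642, 360, 435],
    [869, 1008, 107, 408, 336, 494],
    [901, 1035, 80, 140, 311, 504],
    [619, 612, 177, 209, 298, 281],
], dtype=float)
sup_labels = ["MALE", "FEMALE", "A14-24", "A25-34", "A35-49", "A50-64",
              "A65+", "PRIMARY", "SECOND", "H_TECH", "UNIVER"]

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
k = min(N.shape) - 1  # nontrivial axes

# Standard coordinates of the active columns.
Gamma = Vt[:k].T / np.sqrt(c)[:, None]

# Supplementary row profiles (each row sums to 1) projected onto the axes.
profiles = N_sup / N_sup.sum(axis=1, keepdims=True)
F_sup = profiles @ Gamma

# Squared chi-square distance to the origin and squared correlations.
d2 = (((profiles - c) ** 2) / c).sum(axis=1)
cos2_sup = F_sup ** 2 / d2[:, None]

for label, row in zip(sup_labels, cos2_sup):
    print(f"{label:8s}" + "".join(f"{v:9.4f}" for v in row))
```

Because the five axes span the whole space of centered profiles, the squared correlations of each supplementary row sum to one across the axes.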

Figure 13.2: Biplot for media data set.

This analysis makes clear that the main trait (first axis) is that contact with national newspapers corresponds, very strongly, to high managers and (or) people with university education.

The second axis mostly characterizes an opposition between TV magazines (TVMAG), associated with employees (empl), workers and younger people, and magazines (MAGAZ) and regional newspapers (R_NEWS), associated with farmers, small business (s_busin) and older people (A50-64, A65+). Figure 13.2 summarizes this set of associations.

The positions of the items in Figure 13.2 allow a more nuanced interpretation of the second axis: employees and workers, together with people of middle level education (SECOND), are associated in particular with the young (A14-24, A25-34) and with media such as TV magazines (TVMAG). They are opposed to small business and farmers, who are primarily older (A50-64, A65+), have less education (PRIMARY) and have contact with media such as magazines (MAGAZ) and regional newspapers (R_NEWS).