In XploRe, data can be stored in matrices or arrays. Here, we will concentrate on data matrices. Small data matrices can be created directly from the command line or within an XploRe quantlet. Large data matrices are typically read from data files.
The following subsections provide a short introduction to matrix and data handling. Consult Read and Write (15) to learn more about loading data files into XploRe. More details on data and matrix manipulation can be found in Matrix handling (16).
Small data matrices can be entered directly at the command line or within an XploRe program. The XploRe code that follows is available from the quantlet XLGdesc01.xpl.
As a first example, consider the data matrix

    1    2    3.4
    5    6    7.8
    9    0    1.44
    8    7   10.432

We build it column by column. The first column can be entered in either of two ways:

    col1=#(1,5,9,8)
    col1=1|5|9|8

Both statements create the same column vector. Entering

    col1

at the command line then displays

    Contents of col1
    [1,]        1
    [2,]        5
    [3,]        9
    [4,]        8

in the output window. In the same way as for col1, we build the second and third columns:

    col2=#(2.0,6.0,0.0,7.0)
    col3=#(3.4,7.8,1.44,10.432)

and group all three vectors together by means of the horizontal concatenation operator ~:

    mat=col1~col2~col3

When we check the contents of mat we see

    Contents of mat
    [1,]        1        2      3.4
    [2,]        5        6      7.8
    [3,]        9        0     1.44
    [4,]        8        7   10.432

Note that we could have created mat within a single step:

    mat= #(1,5,9,8) ~ #(2.0,6.0,0.0,7.0) ~ #(3.4,7.8,1.44,10.432)

Let us also remark that XploRe does not distinguish between integer and float values. Therefore, the first two columns of the matrix mat appear in the same format.
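For readers without access to XploRe, the column-wise construction above can be mimicked in a few lines of Python. This is only a rough analogue under stated assumptions: the helper `hcat` is a hypothetical name introduced here to play the role of the ~ operator, and a matrix is represented as a plain list of rows.

```python
# Rough Python analogue of the XploRe construction above (not XploRe code).
# hcat is a hypothetical helper standing in for the ~ operator.

def hcat(*columns):
    """Place equal-length column vectors side by side; return a list of rows."""
    lengths = {len(c) for c in columns}
    if len(lengths) != 1:
        raise ValueError("all columns must have the same length")
    return [list(row) for row in zip(*columns)]

col1 = [1, 5, 9, 8]
col2 = [2.0, 6.0, 0.0, 7.0]
col3 = [3.4, 7.8, 1.44, 10.432]

# Counterpart of mat=col1~col2~col3
mat = hcat(col1, col2, col3)
for row in mat:
    print(row)
```

Unlike XploRe, Python keeps integers and floats as distinct types, so the first column stays integer here while the other two are floats.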
It is also possible to create text matrices. For example,

    textmat= #("aa","c") ~ #("b","d2")

creates the 2 × 2 text matrix

    aa   b
    c    d2
Large data sets are usually stored in data files. XploRe can read data from ASCII files, consisting of both numeric and text data. In the following we will use two data sets: cps85 and uscomp2 (see Data Sets (B.2)).
The file cps85.dat consists of a subsample of the 1985 U.S. Current Population Survey. The file contains only numeric data. We will assign columns 1 (years of education), 2 (1 if living in the South), 5 (1 if female), 8 (years of labor market experience), 10 (1 if working on a union job), 11 (natural logarithm of average hourly earnings) and 12 (age in years) to the XploRe variable earn:

    earn=read("cps85")
    earn=earn[,1|2|5|8|10|11|12]

The second data set, uscomp2, is read with readm. From the result we take column 2 of its text component as branch and columns 2 and 4 of its numeric (double) component as salpro:

    uscomp=readm("uscomp2")
    branch=uscomp.text[,2]
    salpro=uscomp.double[,2|4]
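The two-step pattern used above (read the whole file, then keep only selected columns) can be sketched in Python as follows. This is an illustration rather than XploRe code: `read_numeric` and `select_columns` are hypothetical helpers, and the sample text is made up for the example and is not taken from cps85.dat.

```python
# Python sketch of "read, then keep selected columns" (not XploRe code).
# The sample values below are invented; they do not come from cps85.dat.

def read_numeric(text):
    """Parse whitespace-separated numbers, one matrix row per non-empty line."""
    return [[float(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]

def select_columns(matrix, columns):
    """Keep the given columns, numbered from 1 as in the text above."""
    return [[row[c - 1] for c in columns] for row in matrix]

sample = """
10 0 1 21 3.5
12 1 0 30 2.0
"""

data = read_numeric(sample)                 # full matrix, 2 rows by 5 columns
subset = select_columns(data, [1, 2, 5])    # keep columns 1, 2 and 5
print(subset)
```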
The first step in data analysis is to find out the dimension of the data. In general, this can be done with the function dim. We now apply this function to the data matrices mat, earn, branch, and salpro that we specified in Subsections 2.1.1 and 2.1.2. The code for this section is available from the quantlet XLGdesc02.xpl.

    dim(mat)
    dim(earn)
    dim(branch)
    dim(salpro)

yields

    Contents of dim
    [1,]        4
    [2,]        3
    Contents of dim
    [1,]      534
    [2,]        7
    Contents of dim
    [1,]       79
    Contents of dim
    [1,]       79
    [2,]        2

and tells us that mat is a 4 × 3 matrix, earn is a 534 × 7 matrix, branch is a vector of length 79, and salpro is a 79 × 2 matrix. The numbers of rows and columns can also be obtained separately:

    rows(earn)
    cols(earn)

gives

    Contents of rows
    [1,]      534
    Contents of cols
    [1,]        7
To extract elements or submatrices of a matrix, we can use the subarray operator []. For example, the following three lines extract the first row, the second column, and the (4,3)-element (fourth row, third column):

    mat[1,]
    mat[,2]
    mat[4,3]

This operator can also be used for extracting several rows and columns at once. The statement mat[1:3,1|3] extracts the elements which are in the 1st to 3rd rows of mat and in the 1st and 3rd columns. The operator : is used to specify a range of consecutive integers.
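The extraction rules just described can be checked against a small Python sketch. It is not XploRe code: `span` and `submatrix` are hypothetical helpers that imitate the : operator and the [] operator, assuming indices count from 1 and an omitted index means "all rows" or "all columns".

```python
# Python sketch of the subarray extraction described above (not XploRe code).
# span and submatrix are hypothetical helpers; indices are 1-based.

def span(first, last):
    """Imitate first:last, an inclusive range of consecutive integers."""
    return list(range(first, last + 1))

def submatrix(matrix, rows=None, cols=None):
    """Return the selected rows and columns; None selects all of them."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    rows = span(1, n_rows) if rows is None else rows
    cols = span(1, n_cols) if cols is None else cols
    return [[matrix[r - 1][c - 1] for c in cols] for r in rows]

mat = [[1, 2, 3.4], [5, 6, 7.8], [9, 0, 1.44], [8, 7, 10.432]]

first_row = submatrix(mat, rows=[1])                   # like mat[1,]
second_col = submatrix(mat, cols=[2])                  # like mat[,2]
element_43 = submatrix(mat, rows=[4], cols=[3])        # like mat[4,3]
block = submatrix(mat, rows=span(1, 3), cols=[1, 3])   # like mat[1:3,1|3]
print(block)
```

Every result is returned as a list of rows, so even a single element comes back as a 1 × 1 matrix, which keeps the helper uniform at the cost of one extra level of nesting.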