The function readascii is a powerful tool to read arbitrary ASCII data files. The use of format strings allows us to handle different delimiters when reading sequences of ASCII strings.
You can find the XploRe code for the following examples in XLGiofmt02.xpl. As input, the data file testascii.dat is used:
  1;2;3;4
  5;6;7;8
  ab;cd;de;fg

We can read this file and print the resulting list with
  dat = readascii ("testascii.dat", "\t\n;")
  dat

The list dat consists of two components, dat.data and dat.type, which print as follows:
  Contents of dat.data
  [ 1,] "1"
  [ 2,] "2"
  [ 3,] "3"
  [ 4,] "4"
  [ 5,] " "
  [ 6,] "5"
  [ 7,] "6"
  [ 8,] "7"
  [ 9,] "8"
  [10,] " "
  [11,] "ab"
  [12,] "cd"
  [13,] "de"
  [14,] "fg"
  [15,] " "

  Contents of dat.type
  [ 1,] 0
  [ 2,] 0
  [ 3,] 0
  [ 4,] 0
  [ 5,] 20
  [ 6,] 0
  [ 7,] 0
  [ 8,] 0
  [ 9,] 0
  [10,] 20
  [11,] 10
  [12,] 10
  [13,] 10
  [14,] 10
  [15,] 20
The data component is a string vector built from the contents of the input file. In this example, the format string is "\t\n;". Here, \t stands for the tab character and \n for newline. The important part is the ";", which tells XploRe that a semicolon is used as a delimiter. This way, XploRe is able to read ASCII strings from files that contain delimiters other than the default blank delimiter.
The type component indicates which data type has been read for each cell of the vector. Possible types are 0 (number), 1 (missing), 10 (text), and 20 (newline). Thus, in the example above, cells 1 through 4 and 6 through 9 are numbers, cells 11 through 14 are text strings, and cells 5, 10, and 15 are newlines.
To see the difference, we study what happens when using another format string for the same data file:
  dat = readascii ("testascii.dat", "\t\n")
  dat

produces
  Contents of dat.data
  [1,] "1;2;3;4"
  [2,] " "
  [3,] "5;6;7;8"
  [4,] " "
  [5,] "ab;cd;de;fg"
  [6,] " "

  Contents of dat.type
  [1,] 10
  [2,] 20
  [3,] 10
  [4,] 20
  [5,] 10
  [6,] 20
Here, each line of the data file is read as a single string. The ";" is treated as a regular character rather than as a delimiter, as it was in the previous example. Consequently, only three text strings (cells 1, 3, and 5) are read, each followed by a newline (cells 2, 4, and 6).
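
For readers who want to experiment with the splitting rule outside XploRe, the following Python sketch emulates the behavior seen in the two outputs above. It is not XploRe's implementation: the function name emulate_readascii and the variable sample are hypothetical, the sketch assumes that empty fields between adjacent delimiters are skipped, and type 1 (missing) is not modeled because the text does not say how missing values are encoded in the file. Only the type codes 0, 10, and 20 are taken from the text.

```python
# Type codes as listed in the text (type 1, "missing", is not modeled here).
NUMBER, TEXT, NEWLINE = 0, 10, 20


def emulate_readascii(text, delimiters):
    """Rough emulation: split text on delimiter characters and tag each cell.

    Assumption: a "\n" listed in delimiters ends the current cell and also
    produces its own cell " " with type NEWLINE, as in the outputs above.
    """
    data, types = [], []
    token = ""

    def flush():
        # Store the pending token, classified as number or text.
        nonlocal token
        if token:
            try:
                float(token)
                types.append(NUMBER)
            except ValueError:
                types.append(TEXT)
            data.append(token)
            token = ""

    for ch in text:
        if ch == "\n" and "\n" in delimiters:
            flush()
            data.append(" ")
            types.append(NEWLINE)
        elif ch in delimiters:
            flush()
        else:
            token += ch
    flush()
    return data, types


# Contents of testascii.dat as shown above, assuming a trailing newline.
sample = "1;2;3;4\n5;6;7;8\nab;cd;de;fg\n"

with_semicolon = emulate_readascii(sample, "\t\n;")
without_semicolon = emulate_readascii(sample, "\t\n")

for label, (data, types) in [("\\t\\n;", with_semicolon),
                             ("\\t\\n", without_semicolon)]:
    print("format string", label)
    for i, (cell, kind) in enumerate(zip(data, types), start=1):
        print(f"[{i:2d},] {cell!r:15} type {kind}")
```

With the semicolon in the delimiter set, the sketch yields fifteen cells; without it, each line stays one text cell followed by a newline cell, matching the two listings above.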