15.2 Input Format Strings


x = readascii ("file")
reads any type of ASCII data from file.dat

The function readascii is a powerful tool to read arbitrary ASCII data files. The use of format strings allows us to handle different delimiters when reading sequences of ASCII strings.

The XploRe code for the following examples can be found in XLGiofmt02.xpl. The examples use the data file testascii.dat as input:

  1;2;3;4
  5;6;7;8
  ab;cd;de;fg
We can read this file and print the resulting list by
  dat = readascii ("testascii.dat", "\t\n;")
  dat
The result dat is a list with two components, dat.data and dat.type, which print as follows:
  Contents of dat.data
  [ 1,] "1"
  [ 2,] "2"
  [ 3,] "3"
  [ 4,] "4"
  [ 5,] "
  "
  [ 6,] "5"
  [ 7,] "6"
  [ 8,] "7"
  [ 9,] "8"
  [10,] "
  "
  [11,] "ab"
  [12,] "cd"
  [13,] "de"
  [14,] "fg"
  [15,] "
  "
  Contents of dat.type
  [ 1,]        0 
  [ 2,]        0 
  [ 3,]        0 
  [ 4,]        0 
  [ 5,]       20 
  [ 6,]        0 
  [ 7,]        0 
  [ 8,]        0 
  [ 9,]        0 
  [10,]       20 
  [11,]       10 
  [12,]       10 
  [13,]       10 
  [14,]       10 
  [15,]       20

The data component is a string vector holding the contents of the input file, one cell per item read. In this example, the format string is "\t\n;", where \t stands for the tab character and \n for newline. The important part is the ";", which tells XploRe to treat the semicolon as a delimiter. In this way, XploRe can read ASCII files that use delimiters other than the default blank.

The type component gives the data type that has been read for each cell of the vector. Possible types are 0 (number), 1 (missing), 10 (text), and 20 (newline). In the example above, cells 1 through 4 and 6 through 9 are numbers, cells 11 through 14 are text, and cells 5, 10, and 15 are newlines.
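The mapping from cells to type codes can be illustrated with a short Python sketch. This is only an emulation for illustration, not XploRe code and not how readascii is implemented. The helper name cell_type is hypothetical, and the rule "anything Python can parse as a float counts as a number" is an assumption. Type 1 (missing) is defined but never assigned, since the text does not describe how a missing value appears in the file.

```python
# Type codes as listed in the text above.
NUMBER, MISSING, TEXT, NEWLINE = 0, 1, 10, 20


def cell_type(cell):
    """Return the type code for one cell (hypothetical helper, Python emulation)."""
    if cell == "\n":
        return NEWLINE
    try:
        float(cell)  # assumption: a cell that parses as a float is a number
        return NUMBER
    except ValueError:
        return TEXT  # MISSING is not modelled in this sketch


# The 15 cells of dat.data from the example above.
cells = ["1", "2", "3", "4", "\n",
         "5", "6", "7", "8", "\n",
         "ab", "cd", "de", "fg", "\n"]

types = [cell_type(c) for c in cells]
print(types)
```

Under these assumptions, the printed list matches the dat.type column shown above: eight 0 entries for the numeric cells, four 10 entries for the text cells, and a 20 after every fourth cell.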

To see the effect of the format string, we read the same data file with a different one:

  dat = readascii ("testascii.dat", "\t\n")
  dat
produces
  Contents of dat.data
  [1,] "1;2;3;4"
  [2,] "
  "
  [3,] "5;6;7;8"
  [4,] "
  "
  [5,] "ab;cd;de;fg"
  [6,] "
  "
  Contents of dat.type
  [1,]       10 
  [2,]       20 
  [3,]       10 
  [4,]       20 
  [5,]       10 
  [6,]       20

Here, each line of the data file is read as a single string. The ";" is treated as a regular character rather than as a delimiter, as it was in the previous example. Consequently, only three text cells (cells 1, 3, and 5) are read, each followed by a newline (cells 2, 4, and 6).
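The effect of the delimiter set on the number of cells can be reproduced with a small Python emulation. Again, this is a sketch under stated assumptions and not XploRe's actual algorithm. The function split_cells is hypothetical. It assumes that every character of the format string acts as a delimiter, that a newline is additionally kept as its own cell, and that consecutive delimiters produce no empty cells.

```python
def split_cells(text, delimiters):
    """Split text into cells at any delimiter character (hypothetical helper).

    Assumptions: a newline delimiter is also emitted as its own cell,
    and runs of delimiters do not create empty cells.
    """
    cells, current = [], ""
    for ch in text:
        if ch in delimiters:
            if current:
                cells.append(current)
                current = ""
            if ch == "\n":
                cells.append("\n")
        else:
            current += ch
    if current:  # trailing cell when the text does not end with a delimiter
        cells.append(current)
    return cells


# Contents of testascii.dat as listed above.
text = "1;2;3;4\n5;6;7;8\nab;cd;de;fg\n"

with_semicolon = split_cells(text, "\t\n;")
without_semicolon = split_cells(text, "\t\n")

print(len(with_semicolon), len(without_semicolon))
print(without_semicolon)
```

With ";" in the delimiter set, the sketch yields 15 cells. Without it, the sketch yields 6 cells, and each data line stays intact as one string, which agrees with the two listings of dat.data.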