
Next: Additional documentation Up: Introduction to Madagascar Previous: Madagascar's design

RSF file format

As previously mentioned, the lowest level of Madagascar is the RSF file format, which Madagascar programs use to exchange information. Conceptually, RSF is one of the easiest file formats to understand: an RSF file is simply a regularly sampled hypercube of data. A hypercube is the $N$-dimensional generalization of a cube, best visualized as an $N$-dimensional array, where $N$ is between 1 and 9 in Madagascar.
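To make "regularly sampled hypercube" concrete, here is a minimal sketch. It assumes the Madagascar convention that each axis is described by a number of samples, an origin, and a sampling interval (n, o, d), and that axis 1 varies fastest in the binary file. The function names are hypothetical and are not part of any Madagascar API.

```python
def flat_index(idx, n):
    """Offset of sample idx = (i1, i2, ..., iN) in a hypercube of shape
    n = (n1, n2, ..., nN), assuming axis 1 varies fastest on disk."""
    if len(idx) != len(n):
        raise ValueError("index and shape must have the same length")
    offset = 0
    stride = 1  # number of samples spanned by one step along the current axis
    for i, ni in zip(idx, n):
        if not 0 <= i < ni:
            raise IndexError(f"index {i} out of range for axis of length {ni}")
        offset += i * stride
        stride *= ni
    return offset


def axis_coordinate(i, o, d):
    """Physical coordinate of sample i on a regularly sampled axis: o + i*d."""
    return o + i * d
```

For a cube of shape (4, 3, 2), stepping once along axis 2 skips 4 samples, and the last sample (3, 2, 1) sits at offset 23. Because sampling is regular, a header needs only n, o, and d per axis to describe every sample position.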

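An RSF hypercube is defined by two files: the header file and the binary file. The header file describes the dimensionality of the hypercube and the type of data it contains. Typical header entries include the number of samples, origin, and sampling interval along each axis (n1, o1, d1, n2, and so on), optional axis labels and units (label1, unit1, ...), the data format and element size in bytes (data_format and esize), and the location of the binary file (in).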

Because we often want to inspect this information without decoding it, the header file is stored as an ASCII text file in the local directory, usually with the suffix .rsf. At any time, you can view or edit a header file with a text editor such as gedit, Vim, or Emacs.
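Because the header is plain text made of key=value pairs, it is easy to read programmatically. The sketch below is a hypothetical helper, not Madagascar's own reader. It assumes that values are either double-quoted strings or runs of non-space characters, and that a later assignment overrides an earlier one, since each program appends its own lines to the header history.

```python
import re

# key=value, where the value is a double-quoted string or a run of non-space characters
_PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_rsf_header(text):
    """Collect key=value pairs from RSF header text into a dict of strings.

    Sketch assumption: when a key appears more than once, the last value wins.
    """
    params = {}
    for key, value in _PAIR.findall(text):
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # strip the surrounding quotes
        params[key] = value
    return params
```

Values are returned as strings; converting n1 to an integer or d1 to a float is left to the caller, because the header itself does not record the type of each parameter.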

The binary file, which contains the actual hypercube data, is usually stored remotely (that is, in a separate directory). Because the hypercube data can be very large (tens of gigabytes or even terabytes), we usually store the binary files, which carry the suffix .rsf@, in a remote directory that the user specifies with the DATAPATH environment variable. The advantage of this arrangement is that the large binary files can live on a fast remote filesystem if desired, and we avoid storing large files in local directories.
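The sketch below shows how large a binary file gets and where it might end up. It assumes a simplified naming rule (DATAPATH followed by the header's base name and an @ suffix) and falls back to the current directory when DATAPATH is unset. Real Madagascar programs may build the name differently, so the authoritative location is always the in= entry in the header. Both function names are hypothetical.

```python
import os
from math import prod


def binary_path(header_name, datapath=None):
    """Guess the binary location for header_name as DATAPATH/basename@.

    Assumption for this sketch: this naming rule and the './' fallback are
    simplifications; check the header's in= entry for the real path.
    """
    if datapath is None:
        datapath = os.environ.get("DATAPATH", "./")
    return os.path.join(datapath, os.path.basename(header_name) + "@")


def binary_size_bytes(shape, esize=4):
    """Bytes needed for a hypercube of the given shape with esize bytes per sample."""
    return prod(shape) * esize
```

For example, a 1000 x 1000 x 1000 cube of 4-byte floats needs 4,000,000,000 bytes (about 4 GB), while its header is only a few hundred bytes of text.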

Figure 2: Cartoon of the RSF file format. The header file points to the binary file, and the two can be stored separately. The header file, which is text, is very small compared with the binary file.

Because the header and binary are separate, it is possible to lose either one for a particular RSF file. If the header is lost, we can reconstruct it with a text editor, using our prior knowledge of the data. If the binary file is lost, however, the data cannot be recovered no matter what we do. Therefore, you should try to avoid losing either the header or the binary. The best protection against data loss is to make your research reproducible, so that your results can be regenerated later.
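A simple way to notice a lost or truncated binary is to compare its size with the size the header implies. The sketch below takes a header that has already been parsed into a dict of strings. It assumes that esize defaults to 4 bytes when the header omits it and that in="stdin" marks a combined header/binary file. The function names are hypothetical.

```python
import os
from math import prod


def expected_bytes(params):
    """Binary size implied by a parsed header: n1 * n2 * ... * esize.

    Assumption for this sketch: esize defaults to 4 when absent.
    """
    dims = [int(params[f"n{i}"]) for i in range(1, 10) if f"n{i}" in params]
    return prod(dims) * int(params.get("esize", 4))


def check_rsf_pair(params):
    """Return a list of problems with the binary referenced by a parsed header."""
    path = params.get("in")
    if not path:
        return ["header has no in= entry"]
    if path == "stdin":
        return []  # combined header/binary file; no separate binary to check
    if not os.path.isfile(path):
        return [f"binary file missing: {path}"]
    actual = os.path.getsize(path)
    expected = expected_bytes(params)
    if actual != expected:
        return [f"binary has {actual} bytes, header implies {expected}"]
    return []
```

An empty list means the binary exists and has the expected size; it does not prove that the contents are intact.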

Sometimes, though, we need to store RSF files for archiving or transfer them to other machines. Fortunately, we can avoid transferring the header and binary separately by using the combined header/binary format. Any Madagascar program can write its output in this format when given an additional command-line parameter, -out=stdout. The output file then contains both the header and the binary, so it can be transferred without fear of losing either part. Be careful, though: combined header/binary files can be very large and might slow down your local filesystem. It is best practice to use combined files only when absolutely necessary for file transfers. Note that combined header/binary files are usually converted automatically back to separate header and binary files when processed by a Madagascar program.
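To see how a single file can hold both parts, here is a sketch that splits a combined file back into header text and binary bytes. It assumes the convention inherited from SEPlib, where the header text is followed by the byte sequence form feed, form feed, end-of-transmission (0x0c 0x0c 0x04) and the data starts immediately after it. Treat that separator as an assumption to verify against the Madagascar source; the function name is hypothetical.

```python
# Assumed header/binary separator: form feed, form feed, end-of-transmission
SEPARATOR = b"\x0c\x0c\x04"


def split_combined(blob):
    """Split a combined RSF file's bytes into (header_text, binary_bytes)."""
    pos = blob.find(SEPARATOR)
    if pos < 0:
        raise ValueError("no header/binary separator found")
    header = blob[:pos].decode("ascii", errors="replace")
    return header, blob[pos + len(SEPARATOR):]
```

Searching for the first occurrence of the separator is safe here because the header is plain text and should never contain these control characters, while the binary part may contain any bytes.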




2011-11-03