Systems

Continuous integration and reproducibility

November 11, 2013 Systems No comments

Continuous Integration (CI) is a technique in software engineering, usually described as one of the common techniques in extreme programming (XP). CI implies maintaining a shared code repository, where developers contribute frequently (possibly several times per day), and an automated build system that includes testing scripts.

Using a CI tool, such as TeamCity, it is easy to implement CI for Madagascar, which includes both compilation tests and reproducibility tests. One of the computers at the University of Texas at Austin has been dedicated to such testing and has helped to detect several bugs and reproducibility problems. You can subscribe to testing reports using the RSS feed. To implement similar testing on your own computer, install TeamCity and configure it as follows:

1. Configure Version Control Settings to connect to the Madagascar Subversion repository.

2. Configure Build Step to use a command-line script such as the following:

./configure --prefix=/tmp/rsfroot 
make install 
source env.sh 
export RSFFIGS=$RSFROOT/share/madagascar/figs 
cd book 
scons test

3. Go out for a cup of coffee and come back to check the results of testing. On a CI server, the run of the “compile & test” script is triggered every time somebody commits a new change to the repository.

4. Fix detected problems and commit your changes back to the repository to continue the integration loop.

Adopting the technique of Continuous Integration, in combination with reproducibility testing, provides a robust development environment with well-debugged code and continuously-maintained reproducible examples. It should encourage an active participation of the Madagascar development community.

Reproducible research and IPython notebooks

August 6, 2013 Systems No comments

Reproducible science was one of the main general themes at the recent SciPy conference in Austin. While different tools for accomplishing reproducible research are being proposed, IPython notebooks are often mentioned as one of the main tools. In his review of the meeting, Eric Jones of Enthought writes:

We can safely say that 2013 is the year of the IPython notebook. It was everywhere. I’’d guess 80+% of the talks and tutorials for the conference used it in their presentations.

As illustrated by the screenshots below, it is possible to combine SCons-based processing workflows with IPython notebooks.

The notebook used in this example is in rsf/rsf/test/test.ipynb

Extending MATLAB interface

June 13, 2013 Systems No comments

For those using MATLAB interface to Madagascar, a new function, m8r, allows running Madagascar programs and workflows directly on MATLAB objects. The following MATLAB code runs regularized time-frequency analysis using sftimefreq:

% create chirp signal
t=0:0.001:1;
y=chirp(t,100,1,25,'q',[],'convex');

% spectragram analysis in MATLAB
spectrogram(y,256,200,256,1000);
set(gca,'xlim',[0,250]);
set(gca,'ydir','reverse');
ylabel('Time (s)');

% time-frequency analysis in Madagascar
tf = m8r('sftimefreq rect=50 nw=251 dw=1',y',0.001);

f = 0:250;
imagesc(t,f,tf);
xlabel('Frequency (Hz)');
ylabel('Time (s)');

The necessity of object orientation in geophysical software applications

August 16, 2012 Systems 3 comments

I will begin now acting on an older idea that I espoused on the blog — discussing topics of programming and software engineering related to HPC in general and Madagascar in particular. Comment feedback is more than welcome 🙂

Today’s discussion is about the necessity of light object orientation in many geophysical applications. Like medicine, object orientation can be very useful, but when used inappropriately or in large quantities can lead to severe problems. One necessity for this medicine is the case when the same algorithm applies to 2D and 3D data, with minimal changes other than adding an extra loop. Geophysicists writing code usually either create two versions of the program, or they try to only write a 3-D version that also deals with 2-D data. Creating two versions of the program leads to severe maintenance issues in the long term. The potential for bugs and forking the code is enormous. There are also other variations (anisotropy, alternative kernels, etc), so the amount of code duplication can easily explode. Writing a 3-D version that also works on 2-D data is a better option, however this usually requires contortions to introduce length-1 axes upfront in some of the inputs, and may create outputs with different dimensions for 2-D vs. 3-D, breaking workflows. The biggest block, however, comes from the fact that the 3-D code is more verbose, harder to understand and keep in mind in its entirety, and more prone to bugs. This is why people first implement a feature in the 2-D program, test it, understand it, and then upgrade the 3-D to it. Also, the simplicity of 2-D programs comes in handy when trying to explain the algorithm to other people. The way to both have the cake and eat it is through overloading (link) and light object orientation. This makes it possible to write a program that looks like a 2-D program, but also works on 3-D data. Neither C nor F90 are truly object-oriented, but they do have user-defined structures and they have static global variables that can be used to remember whether the problem is 2-D or 3-D. These features should be enough to work out a solution for this light OO problem. To convert pairs of programs that already exist as both 2D and 3D, one would first define structures that have both x and y members (i.e. offset.x and offset.y), then convert both programs to use those instead of the native types (i.e. offset_x and offset_y). Then, a test for whether the data is 2-D or 3-D is implemented, and the value preserved as a global variable so it can be accessed by various procedures without being passed as an explicit argument. Then, code sequences that have extra operations for 3D are moved into functions that are in effect overloaded, so they call a 3-D operator for 3-D data, and 2-D operator for 2-D data. In the end, the 2-D and 3-D main programs should look the same, and what is left is removing one file, renaming another, and keeping only one version.

What do you think is the best way to avoid code duplication in geophysical applications?

FFTW

July 30, 2012 Systems No comments

FFTW (the Fastest Fourier Transform in the West) is a famous library implementing an FFT algorithm. It was developed at MIT by Matteo Frigo and Steven G. Johnson and is distributed under a GPL license.

By popular demand, Madagascar’s FFT-based programs (such as sffft1 and sffft3) include now an optional support for FFTW. The presence of the FFTW single-precision library is detected during compilation. The following table shows some peformance measurements (CPU time in seconds for 1,000 complex-valued FFTs using sffft3):

data length KISS FFT FFTW
4,096 0.11 0.08
8,192 0.27 0.18
16,384 0.50 0.32
32,768 1.21 0.80
65,536 2.61 1.78

IWAVE

June 7, 2012 Systems No comments

A major enhancement of Madagascar is the recent addition of IWAVE, a wave modeling package from Bill Symes and The Rice Inversion Project (TRIP). IWAVE includes includes a variable-density acoustics modeling application, and the initial release of an isotropic elastic modeling code, both with PML boundary conditions, parallelization over loops and tasks, and other features. Its authors are Igor S. Terentyev, Tatyana Vdovina, William W. Symes, Xin Wang, and Dong Sun. IWAVE was used as a benchmarking and QC tool by SEAM, the SEG modeling project.

You can see a simple example of using IWAVE with Madagascar in this reproducible paper.

Running Madagascar in an interactive console

July 18, 2011 Systems 3 comments

The latest versions of IPython, a rich interactive shell for Python, provide a Qt-based console with enhanced GUI controls, such as embedded images, multiline editing, and session sharing. It works nicely with the Python interface to Madagascar. This allows for a Matlab-like or Mathematica-like experience with running Madagascar interactively. This style of computing is not recommended, however, because it may lead to computations that are hard to reproduce.

Madagascar for users of Seismic Unix

August 17, 2010 Systems No comments

Seismic Unix (SU) is a famous open-source seismic processing package maintained by John Stockwell at the Center for Wave Phenomena, Colorado School of Mines.

SU has been around for 25 years and has attracted many devoted users. If you are one of them, please consider the following:

  • Using Seismic Unix is not an excuse for non-reproducible computational experiments. To facilitate reproducibility, you can use Python and SCons with the rsf.suproj module supplied by Madagascar. The book/rsf/su directory contains many examples of seismic data processing flows using SU and their loose translation to Madagascar analogs. Here is an example SConstruct script from rsf/su/sulab1
from rsf.suproj import *

Flow('plane',None,'suplane')
Result('plane','suxwigb label1="Time (s)" label2=Trace')

Flow('specfx','plane','suspecfx')
Result('specfx','suxwigb label1="Frequency (Hz)" label2=Trace')

End()

Its loose Madagascar translation is in rsf/su/rsflab1

from rsf.proj import *

Flow('plane',None,
     '''
     spike n1=64 n2=32 d2=1 o2=1 label2=Trace unit2=
     nsp=3 k1=8,20,32 k2=4 l2=28 p2=2,1,0 
     ''')
Flow('specfx','plane','spectra | scale axis=2')

for plot in ('plane','specfx'):
    Plot(plot,
         '''
         wiggle clip=1 transp=y yreverse=y poly=y
         wanttitle=n wheretitle=b wherexlabel=t
         ''')

Result('specfx','plane specfx','SideBySideAniso')

End()

If you want only rsf.suproj but not the rest of Madagascar, download madagascar-framework package from SourceForge.

  • It is also possible to convert between SU and RSF file formats with sfsuread and sfsuwrite and to combine SU and Madagascar programs in common processing flows.

  • If you decide to switch to Madagascar but are missing certain functionality from Seismic Unix, it is possible and legal to borrow code from SU and to add it to Madagascar. The opposite is not true, because the Madagascar license (GPL) is more restrictive than the SU license (BSD-like). The su directory contains some of the codes Madagascar has borrowed from SU by requests of the users. Naturally, we try to limit such borrowing to avoid unnecessary forks.

  • In a recent message to the Seismic Unix mailing list, John Stockwell described a proposal for S3, the third generation SU. The main requirements for S3 are:

    1. a new project managed on SourceForge
    2. GPL license
    3. flexible trace headers
    4. integration with scientific libraries
    5. integration with the current SU as well as other GPL- or BSD-licensed packages

One cannot help thinking that the project that John describes is Madagascar!

Java API

January 23, 2010 Systems No comments

Jeff Godwin adds Java API based on Dave Hale’s Mines Java Toolkit.

Madagascar on Playstation 3

December 18, 2009 Systems No comments

Madagascar has been successfully installed on a Playstation 3 running YellowDog Linux. See the slides below for a walkthrough!

PowerPoint