RSF School and Workshop, Vancouver 2006

Reproducible Research in Computational Geophysics
using Madagascar software package
Photo courtesy of Joe Dellinger
Day 1 Wednesday, August 30
08:15-09:00 AM Registration and coffee
Workshop Part I: General introduction
09:00-09:10 AM Opening workshop by Felix Herrmann
09:10-09:40 AM Reproducible computational research - A history of hurdles, mostly overcome by Jon Claerbout, Stanford University html
09:40-10:10 AM Reproducible Research - Opportunities in non-research communities by Matthias Schwab, Google
10:10-10:30 AM Coffee break
Workshop Part I: Seismic data processing packages
10:30-11:00 AM Integrated open-source geophysical code framework by Igor Morozov, University of Saskatchewan html show
11:00-11:30 AM DDS, A Seismic Processing Architecture by Joe Dellinger, BP ppt 195 Kb
11:30-12:00 AM PSEIS, A Processing Architecture Blueprint by Randy Selzler, Data-Warp ppt 374 Kb
12:00-12:30 PM PSEIS, A Blueprint for Parallel Processing by Randy Selzler, Data-Warp ppt 518 Kb
12:30-01:45 PM Lunch break
Tutorial Part I: Madagascar: an open-source tool for technology transfer in seismic processing and imaging
01:45-02:00 PM Introduction and objectives (Felix Herrmann)
02:00-02:30 PM Installation and compilation by Sergey Fomel ppt 438 Kb
02:30-03:00 PM Basic command-line usage by Paul Sava pdf 271 Kb
03:00-03:30 PM Coffee break
03:30-04:00 PM Basic processing flows with SCons by Gilles Hennenfent pdf 933 Kb
04:00-04:30 PM Vplot graphics language - past, present, and future by Joe Dellinger ppt 1.5 Mb
04:30-05:00 PM LaTeX and Web tools by Sergey Fomel pdf 246 Kb
Day 2 Thursday, August 31
Workshop Part II: Graphical User Interfaces
09:00-09:30 AM INSP - Internet seismic processing system by Iulian Musat, 3DGeo
09:30-10:00 AM TKSU - Experiences with a graphical front end to SU by Jeff Thorson, Henry Thorson Consulting pdf 392 Kb
10:00-10:30 AM Coffee break
Workshop Part II: Object-oriented abstraction, parallelization and numerical linear algebra
10:30-11:00 AM An Overview of the Thyra Interoperability Effort for Abstract Numerical Algorithms within Trilinos by Ross Bartlett, Sandia National Laboratories
11:00-11:30 AM PyTrilinos: A Python Interface to Trilinos by Bill Spotz, Sandia National Laboratories ppt 1.2 Mb
11:30-12:00 AM A Software Framework for Inversion by Bill Symes, Rice University pdf 884 Kb
12:00-12:30 PM (P)SLIMpy: an object-oriented abstraction for (Parallel) numerical linear algebra by Felix Herrmann, University of British Columbia
12:30-01:45 PM Lunch break
Tutorial Part II: Technology transfer through Madagascar
01:45-02:00 PM Introduction and objectives (Felix Herrmann)
02:00-02:30 PM Writing your own applications by Sergey Fomel ppt 617 Kb
02:30-03:00 PM Coffee break
03:00-03:30 PM Advanced processing flows: Seismic imaging by Paul Sava pdf 1.6 Mb
03:30-04:00 PM Advanced processing flows: Geostatistics by Jim Jennings ppt 6.5 Mb
04:00-05:00 PM Panel discussion


Liu Institute for Global Issues
The University of British Columbia
6476 NW Marine Drive
Vancouver BC

Organizing Committee

  • Sergey Fomel (University of Texas at Austin)
  • Felix Herrmann (University of British Columbia)
  • Paul Sava (Colorado School of Mines)


Jon Claerbout: Reproducible computational research - A history of hurdles, mostly overcome

I discovered reproducibility in computational research when I learned about makefile syntax and how to use it to incorporate figures in documents. Here I summarize the reproducibility obstacles I faced in writing textbooks and teaching reproducibility, how SEP has set up its reproducibility rules, and how it uses them. An unanswered question is what we can do to enable reproducible research to spread more widely throughout the community. The next "killer application" will be "reproducible lectures."

Matthias Schwab: Reproducible Research - Opportunities in non-research communities

Reproducible research was created by academic scientists doing data processing. Unfortunately, reproducible research has not spread widely among this research community, nor has a standard technology evolved. Why? And do other potential user communities exist that would benefit even more from a similar technology? What are their needs? And would they potentially be more capable of adopting such a technology? Is there a business case out there?

Igor Morozov: Integrated open-source geophysical code framework

In the SIA package, we propose an open-source code framework to meet the needs of both academic and commercial researchers in several areas of geophysics. SIA currently includes over 200 dynamically-linked plug-in tools which are tightly integrated with a content-agnostic processing monitor and serve a wide range of processing tasks. The package consists of three main components operating as parallel UNIX processes: 1) multiple and optionally distributed processing flows using dynamically-linked object libraries, 2) the user interface implemented using Qt, and 3) a 3D/2D interactive visualization server based on OpenGL. These components communicate with each other by means of an object protocol currently based on the Parallel Virtual Machine (PVM). Object and class libraries, templates, and software development utilities were developed, including an automated, distributed, web-based code update service.

At present, the scope of SIA applications includes reflection, refraction, and some earthquake seismology, numerical modeling, 2- and 3-D potential field processing and inversion, graphics, and web services. Several key algorithms (for example, 1-D and 3-D finite-difference seismic modeling and 2-D ray tracing) were ported into the system.

After 10 years of its development, batch execution remains the main mode of SIA operation, and UNIX shells can be used to implement complex and fully reproducible data processing. Seismic Un*x sub-flows can be used in CMP processing applications, and GMT is extensively used for PostScript plotting. The new OpenGL visualization program allows building custom interactive user interfaces, such as for travel-time picking, ray tracing, and 2-D gravity modeling. Reproducibility of interactive processing may present some concern; however, with flexible parameterization and two-way interaction of the GUI with the calling processing flows, this goal is well within the capability of the framework and implementation presented here.

Joe Dellinger: DDS, A Seismic Processing Architecture

The Data Dictionary System (DDS) is a seismic processing architecture. It has proved its value for both research and production in seismic imaging inside Amoco and BP for more than a decade. The software has been ported to more than a dozen hardware platforms and the data sets can be efficiently accessed across any combination of them. A key (and at this time, unique) feature of DDS is its ability to read and write almost any data format. This is primarily useful because it allows DDS to inter-operate with other processing systems, allowing DDS to serve as a bridge between different processing environments. However, in the spirit of reproducible research, it also allows DDS to document data so that it can be read and understood into the future. DDS was released by BP America in 2003 under an open-source license. The goal of this paper is to reveal useful innovations so that subsequent projects can benefit from DDS, as DDS has benefited from concepts from earlier processing systems.

Randy Selzler: PSEIS, A Processing Architecture Blueprint

The Parallel Seismic Earth Imaging System (PSEIS) is a software package that is envisioned as a successor to DDS. It will retain the high efficiency that made DDS suitable for production processing at a leading HPC facility. Its flexibility will continue to facilitate advanced research in seismic processing and imaging. It will also continue to inter-operate with other processing systems in order to maximize total value. The new system provides an alternative to MPI for parallel processing on distributed-memory machines. By tailoring it specifically for seismic processing, applications will be easier to develop, maintain and execute, while remaining efficient and scalable. This scheme is described in a companion paper, “PSEIS, A Blueprint for Parallel Processing.” The system supports parallel I/O by slicing data set dimensions across independent file systems. I/O can split headers and samples into separate files for special processing. Two user interfaces are provided that complement each other. Power users can leverage existing skills with scripts and casual users can benefit from the GUI. Object-oriented technology is used to improve robustness, functionality and long-term cost.

Randy Selzler: PSEIS, A Blueprint for Parallel Processing

The Parallel Seismic Earth Imaging System (PSEIS) is a software package that is envisioned as a successor to DDS. It provides an alternative to MPI for parallel programming on distributed (and shared) memory machines. The objective is to make it easier to develop, maintain and execute parallel applications that are efficient and scalable. This is accomplished by integrating the technology into the I/O architecture and by tailoring it specifically to seismic processing. The new scheme provides a strategy for fault-tolerant processing, check-point restarts, persistent data storage, out-of-core solvers, data monitoring and parallel debugging.

Joe Dellinger: Vplot graphics language - past, present, and future

Vplot is the graphical language used by the Stanford Exploration Project's "SEPlib" seismic processing system. Application programs such as "Graph", "Wiggle", and "Grey" write out specifications for plots in "vplot format". The appropriate vplot "pen" filter then reads in the vplot file and produces the desired plot on the corresponding graphical device.

Vplot was originally created to allow consistent plotting on any of a dozen or so different graphical devices, each of which could only be accessed using its own unique and proprietary programming interface. With so many devices (and new ones arriving every few months!), writing a separate version of each plotting program for each device would have been completely impractical. Today, however, that is exactly what we do: most graphical programs directly support "X", "postscript", and/or "bitmaps".

Vplot is still in use today at least partly because the challenge of supporting so many different incompatible devices imposed a discipline that produced a flexible and powerful internal programming logic. Even with the universe of plotting devices now collapsed down to only 3, vplot has been useful enough in making those 3 compatible to survive.

In my talk I will explain the "vplot virtual device" and show how it was meant to be used. I will show what I think are the "good ideas" vplot contains that even today have not been replicated in other systems. Foremost among these is the "4th canonical graphical device", the "capture the changes I've made and turn that back into a new figure" device ("vppen"). I will also show useful features in vplot that should still be used, but have been forgotten.

Finally, I will discuss "where should we go from here". If we wish to use vplot with Madagascar, now is our chance to update it!

Jeff Thorson: TKSU – Experiences with a graphical front end to SU

Seismic Unix (SU) is a readily available seismic processing package from the Center for Wave Phenomena, CSM, with a command-line-driven interface. TKSU is a graphical user interface to SU that provides the ability to interactively build a processing flow out of SU modules, set values for module parameters from menus, and create a shell script to execute the processing flow. Although developed specifically for the SU package, TKSU is loosely coupled to SU and can manage any set of programs that follows the command-line parameter conventions of SU. TKSU is written entirely in Tcl/Tk. In this talk I discuss the advantages and shortcomings I’ve experienced with a generic graphical front end to processing flow scripts.
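The flow-building idea in this abstract can be illustrated with a minimal Python sketch. It is not TKSU code (TKSU itself is written in Tcl/Tk); it only assumes the SU conventions that each module reads standard input, writes standard output, and takes `key=value` parameters. The function names `build_stages` and `build_script`, the file names, and the parameter values are hypothetical.

```python
import shlex


def build_stages(modules):
    """Turn (program, params) pairs into one command string per stage."""
    stages = []
    for program, params in modules:
        # SU-style parameters are written as key=value on the command line.
        args = [f"{key}={value}" for key, value in params.items()]
        stages.append(" ".join(shlex.quote(tok) for tok in [program, *args]))
    return stages


def build_script(modules, infile, outfile):
    """Generate a shell script that pipes infile through every stage."""
    stages = build_stages(modules)
    # Input is attached to the first stage and output to the last one,
    # because a shell redirection applies only to its own command.
    stages[0] += f" < {shlex.quote(infile)}"
    stages[-1] += f" > {shlex.quote(outfile)}"
    return "#!/bin/sh\n" + " | ".join(stages) + "\n"


# Illustrative flow: gain followed by a band-pass filter.
flow = [
    ("sugain", {"agc": 1}),
    ("sufilter", {"f": "10,20,30,40"}),
]
script = build_script(flow, "in.su", "out.su")
print(script)
```

Because the generator depends only on the command-line convention, the same sketch would work for any set of programs that follows it, which is the loose coupling the abstract describes.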

Roscoe Bartlett: An Overview of the Thyra Interoperability Effort for Abstract Numerical Algorithms within Trilinos

The Trilinos Project is an effort to develop and implement robust parallel algorithms using modern object-oriented software design, while still leveraging the value of established numerical libraries such as PETSc, Aztec, the BLAS and LAPACK. It emphasizes abstract interfaces for maximum flexibility of component interchanging, and provides a full-featured set of concrete classes that implement all abstract interfaces. The number of Trilinos packages continues to grow along with the variety and sophistication of the numerical algorithms (and other numerical support software) contained in these packages. It is clear that the standalone use of these packages and algorithms is insufficient to solve tomorrow's challenging multi-physics analysis and design problems. The Thyra effort seeks to develop a uniform set of Trilinos standard software interfaces and protocols by which any reasonable combination of Trilinos algorithms and other tools may be automatically supported, even configurations that the individual package and algorithm developers never imagined. In this talk I will provide an overview of the Thyra effort discussing its context, requirements, history, current status, and future plans. Thyra is both a Trilinos package (named thyra) and a more general collaboration between Trilinos developers. The primary focus of Thyra is the support for abstract numerical algorithms (ANAs) such as iterative linear solvers (e.g. Belos), eigensolvers (e.g. Anasazi), nonlinear equation solvers (e.g. NOX), stability and bifurcation analysis (e.g. LOCA), ODE/DAE solvers (e.g. Rythmos), and constrained optimization (e.g. MOOCHO). The primary foundation for Thyra's ANA support is a minimal set of fundamental operator/vector interoperability interfaces expressed as abstract C++ classes. Using the foundation of these basic operator/vector interfaces, more sophisticated mathematical interfaces are being developed to address higher-level capabilities such as preconditioner factories, linear solver factories, nonlinear model evaluators, and nonlinear solvers.

Bill Spotz: PyTrilinos: A Python Interface to Trilinos

PyTrilinos is a Python interface to selected Trilinos packages, discussed in the previous talk. Such an interface provides all the power of a high-level scripting language combined with the computational efficiency of compiled solver code. This makes it a suitable environment for both rapid prototyping and application development. Python is particularly well-suited for Trilinos, because the language was designed from the ground up to be object-oriented, which allows for nearly one-to-one proxies of Trilinos C++ classes. This talk will give an overview of the current status of PyTrilinos, with some emphasis on efforts made to make PyTrilinos compatible with other scientific Python software such as NumPy and SciPy. I will give a short interactive demonstration to convey the utility of the PyTrilinos interface, and will conclude with a proposal to provide a Python interface to the Thyra package.

Bill Symes: A Software Framework for Inversion

This talk describes an experimental framework for algorithm organization and software development in support of research on geophysical inversion. The central assumptions of this project are that software components can be designed to closely mimic the mathematical concepts they implement, and that such mimicry eases algorithm construction and hypothesis-testing. I will report some applications of these ideas to modeling and inversion of reflection seismograms via finite difference modeling.

Felix J. Herrmann: (P)SLIMpy: an object-oriented abstraction for (Parallel) numerical linear algebra

This talk gives an overview of current efforts at SLIM towards an out-of-core and parallel Python library. This library contains abstractions of pipe-based stdin/stdout programs wrapped into elementwise vector and matrix-vector operators. With this framework, we are able to develop MATLAB-like abstract numerical linear algorithms that are reusable and void of coordinate information. I will describe our progress towards parallelization of embarrassingly parallel algorithms and towards algorithms that require domain decompositions. I will also show how the abstraction allows us to use the same code for out-of-core serial as well as out-of-core parallel (MPI) abstract numerical algorithms. Our library is an example of how to interoperate different software packages that include PETSc (parallel vectors), Madagascar (vector and matrix-vector operations), CurveLab (Curvelet transform) and Python implementations of numerical arrays. This is joint work with Sean Ross-Ross, Darren Thomson and Henryk Modzelewski.
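As a rough illustration of wrapping pipe-based programs into operators, the following Python sketch treats any command that maps standard input to standard output as a composable operator. It is not the SLIMpy API: the class `PipeOperator`, the helper `py_filter`, and the two toy filters are hypothetical stand-ins, and the data is passed through memory here rather than kept out of core.

```python
import subprocess
import sys


class PipeOperator:
    """A chain of commands, each mapping stdin bytes to stdout bytes."""

    def __init__(self, *commands):
        self.commands = list(commands)  # each command is an argv list

    def __matmul__(self, other):
        # (A @ B)(x) applies B first and then A, as in a matrix product.
        return PipeOperator(*other.commands, *self.commands)

    def __call__(self, data: bytes) -> bytes:
        # Run each command in turn, feeding the previous output as input.
        for argv in self.commands:
            result = subprocess.run(
                argv, input=data, stdout=subprocess.PIPE, check=True
            )
            data = result.stdout
        return data


def py_filter(expr):
    """Stand-in for a real pipe-based program: a one-line Python filter."""
    code = f"import sys; sys.stdout.write({expr})"
    return PipeOperator([sys.executable, "-c", code])


# Two toy operators; real ones would wrap seismic processing programs.
upper = py_filter("sys.stdin.read().upper()")
reverse = py_filter("sys.stdin.read()[::-1]")

composed = upper @ reverse  # reverse first, then upper-case
print(composed(b"abc"))
```

An algorithm written against such operators never needs to know which external program, or how many processes, sits behind each one, which is the kind of reuse the abstract describes.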

Speaker biographies


Roscoe Bartlett, PhD, Carnegie Mellon University 2001, is a computational scientist at Sandia National Laboratories. His thesis area was large-scale optimization, successive quadratic programming methods, and large-scale object-oriented numerics. He started at SNL in 2001 and he works on a number of projects related to large-scale optimization. Roscoe is the head of the Thyra interoperability effort within Trilinos, which seeks to develop and ensure the interoperability of numerical algorithms both internal and external to Trilinos, from linear solvers all the way to nonlinear optimizers. He is also the lead developer of MOOCHO, a Trilinos package for massively parallel, simulation-constrained, nonlinear, derivative-based optimization.


Jon F. Claerbout (M.I.T., B.S. physics, 1960; M.S. 1963; Ph.D. geophysics, 1967), professor at Stanford University, 1967-present. Consulted with Chevron (1967-73). Best Presentation Award from the Society of Exploration Geophysicists (SEG) for his paper, "Extrapolation of Wave Fields." Honorary member and SEG Fessenden Award "in recognition of his outstanding and original pioneering work in seismic wave analysis." Founded the Stanford Exploration Project (SEP) in 1973. Elected a Fellow of the American Geophysical Union. Authored three published books (two translated to Russian and Chinese) and five internet books. Elected to the National Academy of Engineering. SEG's highest award, the Maurice Ewing Medal. Honorary Member of the European Assn. of Geoscientists & Engineers (EAGE). EAGE's highest recognition, the Erasmus Award.

Joe Dellinger graduated with a PhD in Geophysics from the Stanford Exploration Project in 1991 and currently works for BP in Houston, specializing in anisotropy and multicomponent seismology. Joe has often provided advice to the SEG (much of it unsolicited) on how they should best advance into the brave new online/digital world, for which he was awarded Life Membership in 2001. Joe currently is the editor of the Software and Algorithms section of GEOPHYSICS, and maintains the accompanying software and data website.

Sergey Fomel is a research scientist at the Bureau of Economic Geology, University of Texas at Austin. Received a Diploma in Geophysics from the Novosibirsk State University and worked at the Institute of Geophysics in Novosibirsk, Russia in 1990-1994. Received a Ph.D. in Geophysics from Stanford University in 2001 and was a postdoctoral fellow at the Lawrence Berkeley National Laboratory before joining UT Austin. For six months in 1998, worked at Schlumberger Geco-Prakla in England. Received a J. Clarence Karcher award from SEG "for numerous contributions to seismology".

Gilles Hennenfent received in 2003 the Engineering diploma in applied physics from the École Nationale Supérieure de Physique de Strasbourg, France, and the M.Sc. in photonics, image and cybernetics from the Louis Pasteur University, France. Since 2004, he has been a PhD student at the Seismic Laboratory for Imaging and Modeling at the University of British Columbia. His research interests include fast approximate algorithms and multi-scale methods applied to stable seismic signal recovery.

Felix J. Herrmann received the Ph.D. degree in engineering physics from the Delft University of Technology, the Netherlands, in 1997. He was a visiting scholar in 1998 at Stanford University, California, and a postdoctoral fellow at the Massachusetts Institute of Technology from 1999 to 2002. He is currently an assistant professor at the Department of Earth & Ocean Sciences at the University of British Columbia, Canada, and the head of the Seismic Laboratory for Imaging and Modeling. He is interested in theoretical aspects of exploration and global reflection seismology. His research is directed towards creating a fundamental understanding of seismic imaging and inversion as well as establishing a direct link between local aspects of seismic reflectivity and major events in the geological and rock-physical processes that are responsible for rapid changes in Earth’s elastic properties.


Igor Morozov received his Ph.D. in Theoretical and Mathematical Physics (1985) and M.Sc. in Physics (1982) from Moscow State University (Russia). Worked at the Institute of Physics of the Earth (Russian Academy of Sciences, Moscow, Russia), the University of Wyoming and Rice University (U.S.A.). Since 2002, Professor of Geophysics at the University of Saskatchewan (Canada). Research interests include studies of the deep crust and mantle, in particular using ultra-long range nuclear-explosion profiles, reflection, wide-angle and earthquake seismology, seismic nuclear test monitoring, and also development of computational methods and geophysical software.

Paul Sava is an Assistant Professor of Geophysics and a member of the Center for Wave Phenomena at Colorado School of Mines. Prior to arriving at CSM, he was a Research Associate at the Bureau of Economic Geology, University of Texas (Austin). Paul holds an Engineering degree in Geophysics (1995) from the University of Bucharest, an M.Sc. (1998) and a Ph.D. (2004) in Geophysics from Stanford University where he was a member of the Stanford Exploration Project. He is a recipient of a Stanford Graduate Fellowship (1997-2000) and of three Awards of Merit for best student presentations at the SEG conventions (1999, 2001 and 2004). He has received an Honorable Mention in the category Best Paper in Geophysics (2003) for "Angle-domain common-image gathers by wavefield continuation methods", co-authored with Sergey Fomel. Paul's main research interests are in seismic imaging and velocity analysis using wavefield extrapolation techniques, computational methods for wave propagation, optimization and high performance computing.

Matthias Schwab -- Formerly manager at The Boston Consulting Group serving clients in the field of Energy and Industrial Goods. Received PhD from Stanford University in exploration geophysics (SEP). Before that, he studied geophysics and mathematics at Technical University of Clausthal-Zellerfeld, and attended Rice University as a Fulbright student. During his studies he participated in seismic surveys in Western Europe, Kenya, and Alaska. After high school he worked in a subsurface coal mine. His 1999 PhD thesis on detecting faults in seismic 3-D images is programmed in Java and is reproducible.

Randy Selzler is currently President of Data-Warp, Inc. It was founded in 1999 and provides consulting services for seismic imaging software on High-Performance Computers (HPC). Randy has a BSEE from SDSU. He worked for Amoco at the Tulsa Research Center in the Geophysical Research Department for 24 years. This provided valuable exposure to bleeding-edge geophysics, seismic processing and the people that make it happen. Randy’s primary interests include advanced processing architectures, High-Performance Computing and seismic imaging applications. He designed and implemented the Data Dictionary System (DDS) in 1995 for Amoco. It was released by BP in 2003 under an open-source license. DDS continues to provide the software infrastructure for advanced seismic imaging at one of the world’s largest geophysical HPC centers.

Bill Spotz, PhD, University of Texas 1995, is a computational scientist at Sandia National Laboratories. His thesis area was computational fluid dynamics with a focus on high-order numerical methods. He was a postdoc in the Advanced Studies Program and then a Project Scientist in the Scientific Computing Division at the National Center for Atmospheric Research in Boulder, Colorado. In 2001, he accepted his current position, where he works to apply high performance computing techniques to climate modeling and is the lead developer of PyTrilinos, a Python interface for selected packages from the Trilinos Project, which are object-oriented PDE solvers.

Bill Symes, BA, UC Berkeley 1971, PhD Harvard 1975, both Mathematics. Joined Rice University in 1984, currently Noah Harding Professor of Computational and Applied Mathematics. Director of The Rice Inversion Project, a university-industry research consortium for research on seismic inversion. Recipient of SIAM's Kleinman Prize in 2001, for contributions to analysis and numerical analysis of inverse problems and scientific software engineering. Managing Editor of Inverse Problems.

Jeff Thorson, a PhD graduate of Stanford University, was a participant in the Stanford Exploration Project in the early 1980s. He has worked for Getty Oil, Sierra Geophysics and Amerada Hess Corp. in interpretation and processing program development. Since 1993, he and his wife, Marilee Henry, have been independent consultants based in Seattle, WA, specializing in seismic processing design and development.