The predictive power of modern science that led to today’s technologies is not a lucky accident. The scientific method is based on a set of rigorously defined principles. One of them is that experiments have to be described in sufficient detail to be reproducible by other researchers within statistically justified error bars.
Let numerically-intensive sciences be defined as those branches of science and engineering in which the experiment is conducted entirely inside a computer. These sciences benefited directly from the extraordinary increase in computing capacity of the last few decades. Every “lifting” of the “ceiling” of hardware constraints has brought with it new low-hanging fruit, and the corresponding rush to pick it. With it came disorganization. In the gold rushes of yore, disorganization did not matter too much as long as gold was found by the bucket, but scientific principles and industrial methods took over as soon as there was not much left to fight over. The scientific method and engineering practice are invoked by necessity whenever the going gets tough.
Such is the case with reproducibility in numerically-intensive sciences. The other thing that exploded together with hardware capacity was the complexity of implementations, which became impossible to describe on the necessarily limited number of pages of a paper printed on… paper. I will use seismic processing and imaging as a concrete example from now on. Whoever assumes that most numerical experiments published in peer-reviewed papers are reproducible should try their hand at reproducing exactly the figures and numerical results from some of these papers. What? No access to the input data? No knowledge of the exact parameters? Not even pseudocode for the main algorithm, just a few equations and two paragraphs of text? Have to take the researcher’s word that the implementation of a competing method used for comparison is state-of-the-art? Ouch. We get to miss the articles from the 1970s, when the whole Fortran program fit on two pages. But who cares, as long as oil is being found?
I read with interest the section dedicated to wide/multi-azimuth surveys in a recent issue of First Break. Such surveys are at the leading edge today, but, inevitably, they will become commoditized in a few years. It is worth noting that the vast majority of the theory they are based on has been mature since the mid-1980s. Indeed, such surveys are just providing properly sampled data to the respective algorithms. What next, though, after this direction gets exploited? What is left? Going after amplitudes? Joint inversions? Anisotropy? Absorption? Mathematically more and more complex imaging methods? 9-C? 10-C? The objective soul with no particular technology to sell cannot help seeing diminishing returns, increasing complexity, and the only way out being through more data.
The “more data” avenue will indeed work well, and finally fulfill some long-standing theoretical predictions that did not work out for lack of sampling, but guess what: after we get to full-azimuth, over/under, Q-system surveys, with receivers in the well/water column and on the ocean bottom too, the “more data” avenue for improvement will close. That may happen even sooner, should oil prices or exploration resource constraints make it uneconomical. Then, in order to take our small steps forward, we will need to start getting pedantic about the small details, such as the scientific method and reproducibility. And it is then that we will discover that the next low-hanging fruit is actually reproducibility.
No matter how big today’s gap between having just a paper and having its working implementation, this gap is actually easy to fill. That is because the filling existed and was thrown away. The scripts, data, and parameter files needed to create the figures existed at some point, and filing them methodically incurs only a small marginal time cost. All that is needed is to chain those scripts together so that they run automatically. This means that, for the first time, it is possible to give other researchers not only the description of the experiment, but the experiment itself. Really. As if a physicist could clone his actual lab, complete with a copy of himself that can redo the experiment. Imagine the productivity boost that this would provide to physicists! Such boosts are actually available to geophysicists. Experiments that were once at the frontiers of science will be commoditized, and the effort moved to where more value is added. The frontiers will be pushed forward.
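To make this concrete, here is a minimal sketch in the style of a Madagascar SConstruct, the Python build script the package uses to chain processing steps into figures. The dataset name (marmvel.hh on the public data server) and the parameter values are illustrative assumptions, not a recipe; the point is that every figure is just a target in a dependency chain that anyone can rerun.

from rsf.proj import *

# Fetch a publicly available dataset from the data server
# (directory and file name assumed for illustration)
Fetch('marmvel.hh', 'marm')

# Each Flow records exactly how an intermediate file is produced
Flow('vel', 'marmvel.hh',
     'dd form=native | put label1=Depth label2=Distance')
Flow('smooth', 'vel', 'smooth rect1=5 rect2=5')

# The published figure is just another target in the same chain
Result('smooth', 'grey color=j scalebar=y title="Smoothed velocity"')

End()

Running scons rebuilds everything from the raw data to the final figure, which is precisely the experiment being handed over, not just its description.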
What is needed for a reproducible numerical experiment? To start with, publicly available datasets. Thankfully, those already exist. The next step is having reference implementations of geophysical algorithms that anybody can take, build their additions on, and use for comparisons. In other words, a package released under an open-source license as defined by the OSI. The software should foster a community that balances its interests so that all advance. To illustrate with a well-known debate (now winding down): Kirchhoff proponents would want others to have a good-quality Kirchhoff implementation to compare their wavefield-extrapolation algorithms against, the WE people would want the same for their own methods, and in the end all reference implementations would be very good. The network effect will enhance the usefulness, completeness, and quality of the software: the more participants, the more and better algorithms and documentation. The tragedy of the commons in reverse. For more reasons why particular people and institutions in our field would adopt open source, see Why Madagascar.
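As a hedged sketch of what such a comparison could look like once both methods ship as reference implementations in one open package: the program names refkirchhoff and refwe below are hypothetical stand-ins, not actual Madagascar modules, and data.rsf and velocity.rsf are assumed to have been built earlier in the same script. Only the Flow/Result pattern from rsf.proj is real.

from rsf.proj import *

# Hypothetical comparison of a Kirchhoff and a wavefield-extrapolation
# reference implementation on the same input. The programs refkirchhoff
# and refwe are placeholders; the shared dependency pattern is the point.
for method in ('refkirchhoff', 'refwe'):
    Flow('img-' + method, 'data velocity',
         method + ' vel=${SOURCES[1]}')  # parameter name assumed
    Result('img-' + method, 'grey title="Image: %s"' % method)

End()

With both images built from the same inputs in the same place, the comparison figure in a paper stops being a matter of trust and becomes something any reader can regenerate.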
I believe Madagascar does have the incipient qualities for becoming the Reference Implementation of Geophysical Algorithms. Should we have such high ambitions? Should we adopt this as an easy-to-remember motto? Madagascar: the Reference Implementation of Geophysical Algorithms. I think it sounds good, and something like that in big bold letters is needed on the main page, above the tame and verbose mission statement. Maybe some day Madagascar will fulfill the motto and grow enough to become institutionalized, i.e., become an SEG Foundation (SF) or something 🙂
This was fun, but I want to restate the main idea of this rant: in a few years algorithm details will start to matter, people will need reproducibility to get ahead, and Madagascar is ideally positioned to take advantage of this!
Are the times good, or what?