I will begin now acting on an older idea that I espoused on the blog — discussing topics of programming and software engineering related to HPC in general and Madagascar in particular. Comment feedback is more than welcome 🙂
Today’s discussion is about the necessity of light object orientation in many geophysical applications. Like medicine, object orientation can be very useful, but when used inappropriately or in large quantities can lead to severe problems. One necessity for this medicine is the case when the same algorithm applies to 2D and 3D data, with minimal changes other than adding an extra loop. Geophysicists writing code usually either create two versions of the program, or they try to only write a 3-D version that also deals with 2-D data. Creating two versions of the program leads to severe maintenance issues in the long term. The potential for bugs and forking the code is enormous. There are also other variations (anisotropy, alternative kernels, etc), so the amount of code duplication can easily explode. Writing a 3-D version that also works on 2-D data is a better option, however this usually requires contortions to introduce length-1 axes upfront in some of the inputs, and may create outputs with different dimensions for 2-D vs. 3-D, breaking workflows. The biggest block, however, comes from the fact that the 3-D code is more verbose, harder to understand and keep in mind in its entirety, and more prone to bugs. This is why people first implement a feature in the 2-D program, test it, understand it, and then upgrade the 3-D to it. Also, the simplicity of 2-D programs comes in handy when trying to explain the algorithm to other people. The way to both have the cake and eat it is through overloading (link) and light object orientation. This makes it possible to write a program that looks like a 2-D program, but also works on 3-D data. Neither C nor F90 are truly object-oriented, but they do have user-defined structures and they have static global variables that can be used to remember whether the problem is 2-D or 3-D. These features should be enough to work out a solution for this light OO problem. To convert pairs of programs that already exist as both 2D and 3D, one would first define structures that have both x and y members (i.e. offset.x and offset.y), then convert both programs to use those instead of the native types (i.e. offset_x and offset_y). Then, a test for whether the data is 2-D or 3-D is implemented, and the value preserved as a global variable so it can be accessed by various procedures without being passed as an explicit argument. Then, code sequences that have extra operations for 3D are moved into functions that are in effect overloaded, so they call a 3-D operator for 3-D data, and 2-D operator for 2-D data. In the end, the 2-D and 3-D main programs should look the same, and what is left is removing one file, renaming another, and keeping only one version.
What do you think is the best way to avoid code duplication in geophysical applications?