Behind the scene, temporary files are created, and Madgascar programs run in the usual way, but, for the user, they appears like native Python functions. This way, the full power of Madagascar becomes available to people who prefer to work on data analysis projects in a Python environment.
However, there is no good reason to abandon Madagascar’s use of SCons for managing data analysis workflows even when working in a Python framework. Because SConstruct scripts are written in Python, they are easy to adapt for including Python functions in place of command-line instructions. See an example of using Keras with SCons or an example of using PyTorch with SCons.
In deep learning projects, the training data, the neural-network model, and the testing data can be treated as files and handled effectively through SCons workflows while mixing with Madagascar commands and workflows.
Plotting with Matplotlib may offer some advanced functionality in comparison with Vplot, such as the possibility of using $\LaTeX$ code in figure labels. It is now possible to use Matplotlib plots in papers reproducible with Madagascar through an application of sfmatplotlib. The figures will be saved in the PDF format and included in reproducible papers in the usual way. See an example.
Joe Dellinger, the author of Vplot, suggests adjusting parameters for raster figures when including them in Word documents. He writes:
Wow, working on my SEG abstract I had a helluva time getting my vplot raster figures to look decent in word. Then I realized… wait a minute, it’s doing just the bad things plotters back in the 80’s were doing. I fiddled a little with pixc and greyc, and voila! Beautiful raster figures.
pixc is used only when dithering is being performed, and also should only be used for hardcopy devices. It alters the grey scale to correct for pixel overlap on the device, which (if uncorrected) causes grey raster images to come out much darker on paper than on graphics displays.
greyc is used only when dithering is being performed, and really should only be used for hardcopy devices. It alters the grey scale so that grey rasters come out on paper with the same nonlinear appearance that is perceived on display devices.
The default values are pixc=1 greyc=1. The values used by Joe in his Word document were pixc=1.15 greyc=1.25.
To convert Vplot plots to other forms of graphics, you can use vpconvert.
Continuous Integration (CI) is a powerful discipline of software engineering, which involves a shared code repository, where developers contribute frequently (possibly several times per day), and an automated build system which includes testing scripts.
As previously suggested, CI tools can be easily adopted to perform continuous reproducibility: repeatedly testing if previously reproducible results remain reproducibe after software changes. Continuous reproducibility can assure that reproducible documents stay “alive” and continue to be usable.
Numerous tools have appeared in recent years to offer CI services in the cloud: Travis CI, Semaphore, Codeship, Shippable, etc. It is hard to choose one. I would pick CircleCI. CircleCI is developed by a startup company from San Francisco. Its product is not fundamentally different from analogous services but provides a solid implementation, which includes:
Julia is a new open-source programming/scripting language designed for high-performance scientific computing. The goal is to combine the simplicity of Python with the performance approaching that of statically-compiled languages like C.
Julia has a number of other attractive features including:
Dynamic type system
Powerful shell-like capabilities for managing other processes
Designed for parallelism and distributed computation
Automatic generation of efficient, specialized code for different argument types
Elegant and extensible conversions and promotions for numeric and other types
The most popular colormap in Madagascar, other than the default greyscale, is color=j, modeled after “jet“, which used to be the default colormap in MATLAB. More than 1,000 Madagascar examples use color=j. In October 2014, with release R2014b (Version 8.4), MATLAB switched the default colormap to a different one, called “parula“. The “parula” colormap is copyrighted by MathWorks as a result of a creative process (solving an optimization problem). No open-source license is given to use it outside of MATLAB. According to Steve Eddins, “this colormap is MathWorks intellectual property, and it would not be appropriate or acceptable to copy or re-use it in non-MathWorks plotting tools.” Stéfan van der Walt and Nathaniel Smith from the Berkeley Institute for Data Science have developed several new open-source colormaps with good perceptual properties. One of them (named “viridis“) is proposed as a good replacement for “jet” and as the default colormap in matplotlib 2.0. Is it a good colormap? We can find out by using tools from Matteo Niccoli’s tutorial on colormaps. This analysis shows the intensity and lightness distributions of “viridis” are nicely linear. In his presentation at SciPy-2015, Nathaniel Smith explains the rational for this choice.
Claerbout’s principle of reproducible research, as formulated by Buckheit and Donoho (1995), states:
An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.
The geophysics class in the SEGTeX package features a new option: reproduce, which attaches SConstruct files or other appropriate code (Matlab scripts, Python scripts, etc.) directly to the PDF file of the paper, with a button under every reproducible figure for opening the corresponding script. Unfortunately, not every PDF viewer supports this kind of links. The screenshot below shows evince viewer on Linux, where clicking the button opens the file with gedit editor.
Literate programming is a concept promoted by Donald Knuth, the famous computer scientist (and the author of the Art of Computer Programming.) According to this concept, computer programs should be written in a combination of the programming language (the usual source code) and the natural language, which explains the logic of the program.
When it comes to scientific programming, using comments for natural-language explanations is not always convenient. Moreover, it is limited, because such explanations may require figures, equations, and other common elements of scientific texts. IPython/Jupyter notebooks provide a convenient tool for combining different text elements with code. See the notebook at https://github.com/sfomel/ipython/blob/master/LiterateProgramming.ipynb for an example on how to implement literate programming using an IPython notebook with reproducible SConstruct data-analysis workflows in Madagascar.
As an alternative to installing Madagascar, you can now run a Crunchbang (Debian) virtual machine (VM) with it pre-installed. Just download, unzip, and run the file with Oracle VirtualBox (free software). Detailed instructions for running the VM for the first time or installing VirtualBox can be found in the readme. Downloads: README.txt MadagascarVM.zip (~3.0 GB) MadagascarVM.7z (~2.1 GB, but requires 7zip to unpack)
SageMathCloud provides a rich environment, which allows one, for example, to easily install Madagascar and to access it interactively through its Python interface. The example above shows Madagascar running interactively in the cloud using an IPython notebook hosted by SageMathCloud. Support for interactive widgets is a new feature in IPython version 2 released earlier this year.
Continuous Integration (CI) is a technique in software engineering, usually described as one of the common techniques in extreme programming (XP). CI implies maintaining a shared code repository, where developers contribute frequently (possibly several times per day), and an automated build system that includes testing scripts.
Using a CI tool, such as TeamCity, it is easy to implement CI for Madagascar, which includes both compilation tests and reproducibility tests. One of the computers at the University of Texas at Austin has been dedicated to such testing and has helped to detect several bugs and reproducibility problems. You can subscribe to testing reports using the RSS feed. To implement similar testing on your own computer, install TeamCity and configure it as follows:
2. Configure Build Step to use a command-line script such as the following:
3. Go out for a cup of coffee and come back to check the results of testing. On a CI server, the run of the “compile & test” script is triggered every time somebody commits a new change to the repository.
4. Fix detected problems and commit your changes back to the repository to continue the integration loop.
Adopting the technique of Continuous Integration, in combination with reproducibility testing, provides a robust development environment with well-debugged code and continuously-maintained reproducible examples. It should encourage an active participation of the Madagascar development community.