Several enhancements have been added to Madagascar’s Python interface.
Behind the scenes, temporary files are created and Madagascar programs run in the usual way, but to the user they appear as native Python functions. This way, the full power of Madagascar becomes available to people who prefer to work on data analysis projects in a Python environment.
Because SConstruct scripts are written in Python, they are easy to adapt to include Python functions in place of command-line instructions. See an example of using Keras with SCons or an example of using PyTorch with SCons.
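The general mechanism of wrapping a command-line program so that it looks like a Python function can be sketched as follows. This is only an illustration under stated assumptions, not Madagascar's actual implementation: the helper name as_function is hypothetical, and a small Python one-liner stands in for a real Madagascar program.

```python
import os
import subprocess
import sys
import tempfile


def as_function(command):
    """Wrap a stdin-to-stdout command-line filter as a Python function.

    Hypothetical helper for illustration; it is not part of Madagascar.
    """
    def run(data):
        # Temporary files hold the input and output of the external program.
        with tempfile.TemporaryDirectory() as tmp:
            inp = os.path.join(tmp, "in.dat")
            out = os.path.join(tmp, "out.dat")
            with open(inp, "wb") as f:
                f.write(data)
            # Run the program in the usual way, redirecting stdin and stdout.
            with open(inp, "rb") as fin, open(out, "wb") as fout:
                subprocess.run(command, stdin=fin, stdout=fout, check=True)
            with open(out, "rb") as f:
                return f.read()
    return run


# Stand-in filter (an assumption): converts its input to upper case.
upper = as_function(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"]
)
```

Calling upper(b"madagascar") then behaves like an ordinary function call, while the work is done by a separate process.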
In deep learning projects, the training data, the neural-network model, and the testing data can be treated as files and handled effectively through SCons workflows while mixing with Madagascar commands and workflows.
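As a minimal sketch of a Python function used in place of a command-line instruction, SCons accepts function actions with the signature (target, source, env). The function name split_data, the file names, and the 80/20 split are assumptions made for illustration only.

```python
def split_data(target, source, env):
    """SCons-style action: split source[0] into training and testing files.

    target and source are lists of nodes; str() gives their file paths.
    """
    with open(str(source[0])) as f:
        lines = f.readlines()
    ntrain = (4 * len(lines)) // 5  # hypothetical 80/20 split
    with open(str(target[0]), "w") as f:
        f.writelines(lines[:ntrain])
    with open(str(target[1]), "w") as f:
        f.writelines(lines[ntrain:])
    return 0  # SCons treats 0 (or None) as success


# In an SConstruct, the function could be registered like a shell command
# (file names are hypothetical):
#   Command(['train.txt', 'test.txt'], 'data.txt', split_data)
```

Because the training and testing sets are ordinary targets, SCons rebuilds them only when the source data changes.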
Madagascar users are invited to try the new functionality and contribute to its further development.
Joe Dellinger, the author of Vplot, suggests adjusting parameters for raster figures when including them in Word documents. He writes:
Wow, working on my SEG abstract I had a helluva time getting my vplot raster figures to look decent in word. Then I realized… wait a minute, it’s doing just the bad things plotters back in the 80’s were doing. I fiddled a little with pixc and greyc, and voila! Beautiful raster figures.
From the Vplot documentation:
pixc is used only when dithering is being performed, and also should only be used for hardcopy devices. It alters the grey scale to correct for pixel overlap on the device, which (if uncorrected) causes grey raster images to come out much darker on paper than on graphics displays.
greyc is used only when dithering is being performed, and really should only be used for hardcopy devices. It alters the grey scale so that grey rasters come out on paper with the same nonlinear appearance that is perceived on display devices.
The default values are pixc=1 greyc=1. The values used by Joe in his Word document were pixc=1.15 greyc=1.25.
To convert Vplot plots to other forms of graphics, you can use vpconvert.
Continuous Integration (CI) is a powerful discipline of software engineering, which involves a shared code repository, where developers contribute frequently (possibly several times per day), and an automated build system which includes testing scripts.
As previously suggested, CI tools can be easily adopted to perform continuous reproducibility: repeatedly testing whether previously reproducible results remain reproducible after software changes. Continuous reproducibility can ensure that reproducible documents stay “alive” and continue to be usable.
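The core check behind continuous reproducibility can be sketched as a comparison of regenerated result files against stored reference digests. This is a simplified illustration, not Madagascar's own testing mechanism: the function names are hypothetical, and a bitwise comparison is stricter than what floating-point results usually need.

```python
import hashlib


def file_digest(path):
    """SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_results(reference):
    """Return the result files whose content no longer matches.

    reference maps a file path to the digest recorded when the
    result was last known to be reproducible.
    """
    return [path for path, digest in sorted(reference.items())
            if file_digest(path) != digest]
```

A CI job could rebuild the results after every commit and fail when changed_results returns a non-empty list.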
Numerous tools have appeared in recent years to offer CI services in the cloud: Travis CI, Semaphore, Codeship, Shippable, etc. It is hard to choose one. I would pick CircleCI. CircleCI is developed by a startup company from San Francisco. Its product is not fundamentally different from analogous services but provides a solid implementation, which includes:
Let us test if it can serve as a good platform for Madagascar’s continuous reproducibility.
Julia is a new open-source programming/scripting language designed for high-performance scientific computing. The goal is to combine the simplicity of Python with performance approaching that of statically compiled languages like C.
Julia has a number of other attractive features including:
#!/usr/bin/env julia

using m8r

m8r.init()
inp = m8r.input("in")
out = m8r.output("out")

n1 = m8r.histint(inp,"n1")
n2 = m8r.leftsize(inp,1)
clip = m8r.getfloat("clip")

trace = Array(Float32,n1)
for i2 in 1:n2
    m8r.floatread(trace,n1,inp)
    trace = clamp(trace,-clip,clip)
    m8r.floatwrite(trace,n1,out)
end
Compare it with scripts or programs in other languages.
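For comparison, the clipping logic alone can be sketched in plain Python. This sketch deliberately leaves out Madagascar's file input and output and works on lists of numbers; the function names are chosen for illustration.

```python
def clip_trace(trace, clip):
    """Limit every sample of one trace to the range [-clip, clip]."""
    return [max(-clip, min(clip, x)) for x in trace]


def clip_traces(traces, clip):
    """Clip traces one at a time, mirroring the trace-by-trace loop."""
    for trace in traces:
        yield clip_trace(trace, clip)
```

Processing one trace at a time keeps the memory footprint independent of the number of traces, as in the Julia version.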
The most popular colormap in Madagascar, other than the default greyscale, is color=j, modeled after “jet”, which used to be the default colormap in MATLAB. More than 1,000 Madagascar examples use color=j.
In October 2014, with release R2014b (Version 8.4), MATLAB switched the default colormap to a different one, called “parula”. The “parula” colormap is copyrighted by MathWorks as a result of a creative process (solving an optimization problem). No open-source license is given to use it outside of MATLAB. According to Steve Eddins, “this colormap is MathWorks intellectual property, and it would not be appropriate or acceptable to copy or re-use it in non-MathWorks plotting tools.”
Stéfan van der Walt and Nathaniel Smith from the Berkeley Institute for Data Science have developed several new open-source colormaps with good perceptual properties. One of them (named “viridis”) is proposed as a good replacement for “jet” and as the default colormap in matplotlib 2.0. Is it a good colormap? We can find out by using tools from Matteo Niccoli’s tutorial on colormaps. This analysis shows that the intensity and lightness distributions of “viridis” are nicely linear. In his presentation at SciPy-2015, Nathaniel Smith explains the rationale for this choice.
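A rough version of the lightness check can be sketched with the standard sRGB to CIE L* conversion. The sample colors below are assumptions: five approximate “viridis” samples and six approximate key colors of a “jet”-style map, not the full colormap tables, and this is not the analysis from the tutorial itself.

```python
def srgb_to_lightness(r, g, b):
    """CIE L* (0 to 100) of an sRGB color with components in [0, 1]."""
    def linearize(c):
        # Undo the sRGB gamma encoding.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    # Relative luminance Y for sRGB primaries (D65 white point).
    y = 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
    delta = 6.0 / 29.0
    if y > delta ** 3:
        f = y ** (1.0 / 3.0)
    else:
        f = y / (3.0 * delta ** 2) + 4.0 / 29.0
    return 116.0 * f - 16.0


def hex_to_rgb(code):
    """Convert '#rrggbb' to a tuple of floats in [0, 1]."""
    return tuple(int(code[i:i + 2], 16) / 255.0 for i in (1, 3, 5))


def is_increasing(values):
    """True if every value is strictly larger than the previous one."""
    return all(a < b for a, b in zip(values, values[1:]))


# Approximate samples (assumptions), ordered from low to high data values.
VIRIDIS = ["#440154", "#3b528b", "#21918c", "#5ec962", "#fde725"]
JET = [(0, 0, 0.5), (0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0), (0.5, 0, 0)]

viridis_lightness = [srgb_to_lightness(*hex_to_rgb(c)) for c in VIRIDIS]
jet_lightness = [srgb_to_lightness(*c) for c in JET]

print("viridis lightness increasing:", is_increasing(viridis_lightness))
print("jet lightness increasing:", is_increasing(jet_lightness))
```

A colormap whose lightness rises steadily preserves the ordering of data values even when printed in greyscale, which is the property examined above.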
Claerbout’s principle of reproducible research, as formulated by Buckheit and Donoho (1995), states:
An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.
The geophysics class in the SEGTeX package features a new option: reproduce, which attaches SConstruct files or other appropriate code (Matlab scripts, Python scripts, etc.) directly to the PDF file of the paper, with a button under every reproducible figure for opening the corresponding script. Unfortunately, not every PDF viewer supports this kind of link. The screenshot below shows the Evince viewer on Linux, where clicking the button opens the file in the gedit editor.
Literate programming is a concept promoted by Donald Knuth, the famous computer scientist and author of The Art of Computer Programming. According to this concept, computer programs should be written in a combination of a programming language (the usual source code) and a natural language, which explains the logic of the program.
When it comes to scientific programming, using comments for natural-language explanations is not always convenient. Moreover, it is limited, because such explanations may require figures, equations, and other common elements of scientific texts. IPython/Jupyter notebooks provide a convenient tool for combining different text elements with code. See the notebook at https://github.com/sfomel/ipython/blob/master/LiterateProgramming.ipynb for an example of how to implement literate programming using an IPython notebook with reproducible SConstruct data-analysis workflows in Madagascar.
As an alternative to installing Madagascar, you can now run a Crunchbang (Debian) virtual machine (VM) with it pre-installed. Just download, unzip, and run the file with Oracle VirtualBox (free software). Detailed instructions for running the VM for the first time or installing VirtualBox can be found in the readme.
MadagascarVM.zip (~3.0 GB)
MadagascarVM.7z (~2.1 GB, but requires 7zip to unpack)
SageMathCloud provides a rich environment, which allows one, for example, to easily install Madagascar and to access it interactively through its Python interface. The example above shows Madagascar running interactively in the cloud using an IPython notebook hosted by SageMathCloud. Support for interactive widgets is a new feature in IPython version 2 released earlier this year.