Links

Reproducible Research 2012

July 23, 2012 Links No comments

Three years after the first special issue on reproducible research, Computing in Science & Engineering published another special issue this month. The material for this issue comes from the 2011 AMP workshop Reproducible Research: Tools and Strategies for Scientific Computing, organized by Randall LeVeque, Ian Mitchell, and Victoria Stodden. In an editorial paper, the workshop organizers write:

The principal goal of these discussions and workshops is to develop publication standards akin to both the proof in mathematics and the deductive sciences, and the detailed descriptive protocols in the empirical sciences (the “methods” section of a paper describing the mechanics of the controlled experiment and hypothesis test). Computational science is only a few decades old and must develop similar standards, so that other researchers in the field can independently verify published results.

Sage interview

May 30, 2012 Links No comments

There are not many examples of successful open-source projects in scientific software. One of them is Sage, a mathematical software system aimed at creating a viable free open-source alternative to Magma, Maple, Mathematica, and Matlab.

In a recent interview, the developers of Sage gave the following answer to the question “Why do you consider free/libre open source software important for the advancement of your field?”

[…] When we contribute to mathematics, it is important to contribute both the results and the methods. When software plays an essential role in research, this is a valuable part of the public contribution. If other mathematicians can’t learn how it works, modify it, and use it for new purposes, then there is a serious loss of value.

[…] Second topic: Unifying the ecosystem. There are several different commercial software tools. Software libraries developed by one research group on top of one of them are unusable by other research groups. Students and universities don’t have the money to finance all of this. A unified ecosystem also has the advantage that anyone can better understand the code written by others.

[…] Third topic: Quality of research. If your code is included directly in the code base of the whole Sage project, it will be part of it for the foreseeable future. This means the code must meet a certain level of quality, and there needs to be a set of tests for each part of the contributed code. In preparation for each new release of Sage, it is made certain that all those tests pass on all supported systems. Therefore, your code continues to be functional, and probably actively maintained, even after you have stopped working on it. This is very different from publishing half-working code once on a private website and then abandoning it.

[…] Fourth topic: Accessibility (Cost). We want the code we write to be usable by students and researchers without access to large department budgets. People who can’t (or don’t want to) afford an expensive software license are shut out from using work developed for that software. This is a restriction on users and on developers. A related example is developing and emerging countries: just think of the importance of the freely accessible Wikipedia for education, through projects such as OLPC. The very same holds true for higher education and, in our case, for access to advanced mathematical software.

Naturally, some of this discussion applies to Madagascar as well.

                                 Sage                             Madagascar
First public release             2005                             2006
Main language                    Python                           C
License                          GPL                              GPL
Lines of code (per ohloh.net)    nearly 500,000                   more than 500,000
Contributors (per ohloh.net)     more than 500                    more than 50
User mailing list                more than 2,000 (sage-support)   more than 200 (RSF-user)
Developer mailing list           nearly 1,500 (sage-devel)        more than 70 (RSF-devel)

For integration of Madagascar and Sage, one can use Madagascar's Python interface, as sketched below.
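
Here is a minimal sketch of driving Madagascar from a Python (and hence Sage) session. It assumes Madagascar is installed, with its m8r Python module importable and its command-line programs (such as sfspike) on the PATH; the file name is made up, and the exact reading idiom should be checked against the current m8r documentation.

    # Minimal sketch: create data with a Madagascar program, then read it
    # into NumPy from Python/Sage. Assumes the m8r module and the sf*
    # programs shipped with Madagascar are installed and on the PATH.
    import subprocess
    import numpy as np
    import m8r

    # generate a synthetic trace with an impulse at sample 50
    subprocess.run('sfspike n1=100 k1=50 > spike.rsf',
                   shell=True, check=True)

    inp = m8r.Input('spike.rsf')   # open the RSF file
    n1 = inp.int('n1')             # trace length from the RSF header
    data = np.zeros(n1, 'f')       # single precision, as stored in RSF
    inp.read(data)                 # fill the array with the file contents
    print(data.sum())              # the spike contributes a single 1.0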

Performance evaluation of SU and Madagascar

May 1, 2012 Links No comments

The paper Performance Evaluation of Open Source Seismic Data Processing Packages by Izzatdin A. Aziz, Andrzej M. Goscinski, and Michael M. Hobbs from Deakin University was presented at the 11th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2011).

“The goal in this paper was to demonstrate the capability of open source packages, SU and Madagascar, to execute a sequence of seismic functions representing the actual industrial work process. We succeeded in this by conducting two sets of tasks: first, investigating whether or not open source seismic data processing packages can be executed using the same set of seismic data through data format conversions; second, whether or not they can achieve reasonable performance and speedup when executing parallel seismic functions on an HPC cluster.”
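
As an illustration of the first task, the sketch below moves one data set between the SU format and Madagascar's RSF format so that both packages can process the same data. It is a hypothetical example: sfsegyread and sfsegywrite are Madagascar programs, but the su=, tape=, and tfile= parameters and the file names used here should be checked against their self-documentation.

    # Hypothetical sketch: convert a data set between SU and RSF formats.
    import subprocess

    # read an SU file into RSF (trace headers go to a separate tfile)
    subprocess.run('sfsegyread su=y tape=shots.su tfile=theaders.rsf '
                   '> shots.rsf', shell=True, check=True)

    # ... process shots.rsf with Madagascar programs ...

    # write the result back to SU for further processing in Seismic Unix
    subprocess.run('sfsegywrite su=y tape=result.su tfile=theaders.rsf '
                   '< result.rsf', shell=True, check=True)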

“The case for open computer programs” in Nature magazine

February 22, 2012 Links No comments

A recent article by Darrel C. Ince, Leslie Hatton, and John Graham-Cumming argues:

Scientific communication relies on evidence that cannot be entirely included in publications, but the rise of computational science has added a new layer of inaccessibility. Although it is now accepted that data should be made available on request, the current regulations regarding the availability of software are inconsistent. We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail.

Read the full article here.

Les Hatton worked in computational geophysics research for 15 years and is the primary author of an excellent introductory book, “Seismic Data Processing: Theory and Practice”. He then switched careers to study software and systems failure.

Elusive Goal

January 29, 2012 Links No comments

Scientists’ Elusive Goal: Reproducing Study Results, an article published on the first page of the Wall Street Journal, describes the crisis of scientific reproducibility in biomedical research.

Many of the issues described in the article sound familiar.

…Reproducibility is the foundation of all modern research, the standard by which scientific claims are evaluated. In the U.S. alone, biomedical research is a $100-billion-a-year enterprise. So when published medical findings can’t be validated by others, there are major consequences…

…There is also a more insidious and pervasive problem: a preference for positive results…

…Some studies can’t be redone for a more prosaic reason: the authors won’t make all their raw data available to rival scientists…

Geophysical research does not affect human lives directly, but its quality can suffer from non-reproducibility in much the same way.

Beijing survey

January 1, 2012 Links No comments

A short user survey was conducted after the 2011 Madagascar School in Beijing.

The results are overwhelmingly positive: 100% of those who responded stated that they would be interested in attending Madagascar events in the future, and 100% would recommend it to their colleagues. As for the location of a future event, 95% suggested Beijing again. Many respondents praised the school’s excellent organization; the main complaints were that the school was too short and lacked space to accommodate all interested students. Organizers of future schools should take all survey suggestions into account.

Science Code Manifesto

October 15, 2011 Links 1 comment

You can endorse or discuss the Science Code Manifesto, published this week at http://sciencecodemanifesto.org/

Software is a cornerstone of science. Without software, twenty-first century science would be impossible. Without better software, science cannot progress.
But the culture and institutions of science have not yet adjusted to this reality. We need to reform them to address this challenge, by adopting these five principles:

Code
All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper.
Copyright
The copyright ownership and license of any released source code must be clearly stated.
Citation
Researchers who use or adapt science source code in their research must credit the code’s creators in resulting publications.
Credit
Software contributions must be included in systems of scientific assessment, credit, and recognition.
Curation
Source code must remain available, linked to related materials, for the useful lifetime of the publication.

Nick Barnes, the author of the Manifesto, explains its creation as follows:

I wrote it for the Climate Code Foundation, initially as a response and contribution to the Royal Society’s policy study on “Science as a Public Enterprise”. It is partly inspired by the Panton Principles, a bold statement of ideals in scientific data sharing. It refines the ideas I laid out in an opinion piece for Nature in 2010.
However, I did not originate these ideas. They are simply extensions of the core principle of science: publication. Publication is what distinguishes science from alchemy, and is what has propelled science – and human society – so far and so fast in the last 300 years. The Manifesto is the natural application of this principle to the relatively new, and increasingly important, area of science software.
My own ideals, influenced by the Free and Open Source Software movement, go beyond those stated in the Manifesto: I believe that Open Source publication of all science software will be one outcome of the current revolution in scientific methods, a revolution in which I hope this Manifesto will play a part.

Open-source governance

August 17, 2011 Links 1 comment

In a recent report, VisionMobile points out that open source is not only about licensing but also about the governance model adopted by open-source projects.

The governance model used by an open source project encapsulates all the hard questions about a project. Who decides on the project roadmap? How transparent are the decision-making processes? Can anyone follow the discussions and meetings taking place in the community? […] Governance determines who has influence and control over the project or platform – beyond what is legally required in the open source license.

The governance model adopted by Madagascar is exceptionally flat and open. So far, 50 people have been given write access to the Subversion repository, and nobody who has asked for access has been denied it. Every one of the 50 developers has equal rights to add, remove, or modify code. We coordinate our efforts through the developer mailing list and annual meetings. This open governance model is a distinctive feature of the Madagascar project and should be emphasized when comparing it with other projects. VisionMobile states in its report: “Our research suggests that platforms that are most open will be most successful in the long-term.”

Executable papers

June 26, 2011 Links No comments

In addition to six different workshops and special sessions devoted to reproducible research, an important event of this year is the Executable Paper Grand Challenge organized by Elsevier.

The Grand Challenge was a “contest created to improve the way scientific information is communicated and used”. Many of the participants focused on implementing reproducible research. The winners were announced this month at the International Conference on Computational Science in Singapore, with winning entries, as well as other solutions, published in Procedia Computer Science.

The Madagascar approach to reproducible papers works but is starting to show its age. Perhaps we can learn from others how to modernize it. For reference, a typical reproducible-paper script in Madagascar looks like the sketch below.
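
For readers unfamiliar with that approach, each figure in a reproducible Madagascar paper is built by an SCons script along the following lines (a minimal sketch: Flow, Result, and End come from Madagascar's rsf.proj module, while the particular programs and parameters here are made up for illustration):

    # SConstruct: minimal sketch of a Madagascar reproducible-paper script
    from rsf.proj import *

    # build a synthetic trace with an impulse at sample 300
    Flow('spike', None, 'spike n1=1000 k1=300')

    # band-pass filter it; each Flow records how a result is computed
    Flow('filtered', 'spike', 'bandpass fhi=2')

    # Result registers a reproducible figure for inclusion in the paper
    Result('filtered', 'graph title="Filtered spike"')

    End()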

Reproducible Research conferences in 2011

January 12, 2011 Links No comments

A sign that reproducible research is becoming a mainstream idea is the six different events taking place this year:

  1. Minisymposium The Digitization of Science: Reproducibility and Interdisciplinary Knowledge Transfer at AAAS Annual Meeting in Washington (organized by Victoria Stodden from Columbia University) on February 19:
    http://aaas.confex.com/aaas/2011/webprogram/Session3166.html
  2. Minisymposium Verifiable, Reproducible Research and Computational Science at SIAM CSE conference in Reno (organized by Jarrod Millman from UC Berkeley) on March 4:
    http://meetings.siam.org/sess/dsp_programsess.cfm?SESSIONCODE=11844
    http://meetings.siam.org/sess/dsp_programsess.cfm?SESSIONCODE=11845
  3. Minisymposium Reproducible Science and Open-Source Software in the Geosciences at SIAM Geosciences conference in Long Beach (organized by Bernd Flemisch from University of Stuttgart, Kristin Flornes from IRIS, and Atgeirr Rasmussen from SINTEF) on March 22-23:
    http://meetings.siam.org/sess/dsp_programsess.cfm?SESSIONCODE=11822
    http://meetings.siam.org/sess/dsp_programsess.cfm?SESSIONCODE=11823
  4. Special session Reproducible Research at Interface 2011 (Statistical, Machine Learning, and Visualization Algorithms) in Cary (organized by Jürgen Symanzik from Utah State University) on June 1:
    http://www.interfacesymposia.org/Interface2011/
  5. Workshop Reproducible Research: Tools and Strategies for Scientific Computing at Applied Mathematics Perspectives conference in Vancouver (organized by Randall LeVeque from University of Washington, Ian Mitchell from UBC, Cleve Moler from Mathworks, and Victoria Stodden from Columbia University) on July 13-16:
    http://www.mitacs.ca/goto/amp_reproducible
  6. Minisymposium Reproducible Research in Computational Science: What, Why and How at ICIAM in Vancouver (organized by Randall LeVeque from University of Washington, Ian Mitchell from UBC, and Victoria Stodden from Columbia University) on July 18-22:
    http://meetings.siam.org/sess/dsp_programsess.cfm?SESSIONCODE=11435