The race is on! Although the official (December 1) deadline has passed, papers continue to be accepted for the special section Reproducible research: Geophysics papers of the future in Geophysics. As long as the paper can be reviewed in time for the designated issue (November-December 2017), it will be accepted for the special section. Otherwise, it will appear in a regular issue of the journal.
If you use Madagascar to prepare your “paper of the future”, you can commit the code to the Madagascar repository and add a link to your submission. Alternatively, the code can be submitted as an attachment.
The concept of reproducible research, pioneered 25 years ago by Jon Claerbout, suggests the discipline of attaching software code and data to scientific publications in order to enable the reader to verify, reproduce, and extend computational experiments described in the publication. A framework for reproducible research is provided by the Madagascar open-source software project, which was started 10 years ago. This special section will collect papers on different subjects in exploration geophysics united
by the discipline of reproducible research. Each paper in the section will be reviewed according to the guidelines of the Geophysics Software & Algorithms section, which means that not only the text of the paper but also its associated software codes will be examined by the reviewers, and the reproducibility of computational experiments will be independently verified. For more information, visit http://software.seg.org.
Papers in the special section also qualify under the recent Geoscience Papers of the Future (GPF) Initiative. Supported by the National Science Foundation, “GPF is an initiative to encourage geoscientists to publish papers together with the associated digital products of their research. This means that a paper would include: 1) Documentation of data sets, including descriptions, unique identifiers, and availability in public repositories; 2) Documentation of software, including preprocessing
of data and visualization steps, described with metadata and with unique identifiers and pointers to public code repositories; [and] 3) Documentation of the provenance and workflow for each figure or result.” For more information, visit http://www.ontosoft.org/gpf/.
Use of the Madagascar framework is encouraged but not required, as long as the submitted paper satisfies the reproducibility conditions. Use of proprietary data is allowed as long as it is restricted to one section of the paper while other parts of the paper use publicly available or synthetically generated data.
Among other ideas and proposals, the 2016 US presidential candidates shared some thoughts on the issue of reproducible research in science.
Quoted from the traditional questionnaire in Scientific American, with emphasis added.
I believe federal policies can do even more to reinforce public trust in the integrity of science throughout the research enterprise. Though very rare, deliberate fraud in how scientists use public research dollars must be exposed, punished, and prevented. We can and will create further incentives to encourage scientists not only to maintain accountability and accuracy checks, but also to share data, code, and research results for reuse and support replication by others.
Science is science and facts are facts.
This is not a political endorsement. You can read the questionnaire to form your own opinion.
A high-profile workshop, Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results, was organized by the National Academies of Sciences and the National Science Foundation and took place in Washington, DC, last year. The workshop summary report was recently published by the National Academies Press.
Here is an extract, which lists recommendations from the panel discussion:
- Establish publication requirements for open data and code. Journal editors and referees should confirm that data and code are linked and accessible before a paper is published. (Keith Baggerly)
- Clarify strength of evidence for findings. The strength of evidence should be clearly stated for theories and results (in publications, press releases, etc.) to ensure that initial explorations are not misrepresented as being more conclusive than they actually are. (Keith Baggerly)
- Align incentives. Communities need to examine how to build a culture that rewards researchers who put effort into verifying their own results rather than rushing quickly to publication. (Marcia McNutt)
- Improve training.
  - Institutions need to make extra efforts to instill students with an ethos of care and reproducibility. (Marcia McNutt)
  - Universities need to change the curriculum to incorporate topics such as version control, code review, and general data management, and communities need to revise their incentives to improve the chances of reproducible, trustworthy research in the future. Steps to improve the future workforce are necessary to maintain public trust in science. (Randy LeVeque)
  - Many graduates are well steeped in open-source software norms and ethics, and they are used to this as a normal way of operating. However, they come into a scientific research setting where codes are not shared, transparent, or open; instead, codes are being built or constructed in a way that feels haphazard to them. This training disconnect can interfere with mentorship and with their continuation in science. Better understanding of these norms is needed at all levels of research. (Victoria Stodden)
  - Prevention and motivation need to be components of instilling the proper ethos. This could be part of National Institutes of Health (NIH)-mandated ethics courses. (Keith Baggerly)
- Clarify terminology. A clearer set of terms is needed, especially for teaching students and creating guidelines and best practices. Some examples of how to do this can be found within the uncertainty quantification community, which successfully clarified the terms verification and validation, which were used almost synonymously 10-15 years ago. (Ronald Boisvert)
The authors of these recommendations are Keith Baggerly, Marcia McNutt, Randy LeVeque, Victoria Stodden, and Ronald Boisvert.
This blog is changing its appearance by moving from the Serendipity engine to WordPress. The new engine should make it more convenient to leave comments and to interact with the blog content. Markdown editing is enabled. The old blog will remain available until all links are properly redirected. According to wappalyzer.com, WordPress currently dominates the market for Content Management Systems (CMS). According to builtwith.com, it powers 50% of all websites on the entire Internet. This is another success story for free and open-source software. The WordPress software was originally developed by Matt Mullenweg and Mike Little and is released under the GPL license.
In a major administrative change, the code repository of the Madagascar project, which has been hosted by SourceForge for more than 9 years and through more than 12,000 revisions, is being moved to GitHub. The location of the Madagascar project on GitHub is https://github.com/ahay/. The move has been discussed several times previously. Formerly the major hub of open-source projects, SourceForge has been losing popularity among open-source software developers and recently went through some bad publicity because of its practice of injecting malware into open-source projects. The final straw, which prompted our move, was the whole site going down in July 2015 and taking more than a week to restore access to repositories. DHI Group, Inc. announced today its plans to sell the Slashdot Media business, which includes Slashdot and SourceForge.
GitHub brings a social-networking aspect to open-source software development, as well as many other useful tools and enhancements. Its success story was described in a recent Wired article, How GitHub Conquered Google, Microsoft, and Everyone Else.
The repository has been converted to Git, but if you prefer to use Subversion, you can continue to do so thanks to the SVN bridge. See https://www.ahay.org/wiki/Download#Currentdevelopmentversion for instructions. If you need developer access to commit changes directly to the master branch of the repository, please register at GitHub and send your GitHub login name to the project administrator. Everyone else can participate in the project development through Git’s preferred mechanism of “pull requests”.
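For readers new to the pull-request model, the basic workflow can be sketched with plain Git commands. The snippet below simulates it entirely locally, with a bare repository standing in for the GitHub remote; the repository and branch names are illustrative, not the actual Madagascar layout.

```shell
set -e
# A bare repository stands in for the remote hosted on GitHub.
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare upstream.git
git clone -q upstream.git work           # your local working copy
cd work
git config user.name "Example Developer"
git config user.email dev@example.com
git checkout -q -b my-feature            # develop on a topic branch
echo "a proposed change" > notes.txt
git add notes.txt
git commit -q -m "Describe the proposed change"
git push -q origin my-feature            # on GitHub, this branch would back a pull request
```

After pushing the topic branch to your fork on GitHub, the web interface offers to open a pull request against the project's master branch, where maintainers can review and merge it.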
An article, A Perfect Storm: The Record of a Revolution, by Eric-Jan Wagenmakers, a mathematical psychologist from the University of Amsterdam, describes the reproducibility revolution taking place in psychology:
The dynamics of political revolutions are in some ways similar to the academic revolution that has recently gripped the field of psychology. Over the last two decades, increasing levels of competition for scarce research funding have created a working environment that rewards productivity over reproducibility; this perverse incentive structure has caused some of the findings in the psychological literature to be spectacular and counter-intuitive, but likely false […] The general dissatisfaction with the state of the field was expressed in print only occasionally, until in 2011 two major events ignited the scientific revolution that is still in full force today.
The article was published this month by the Inquisitive Mind (In-Mind) magazine.
Although some researchers are less enthusiastic about the “replicability movement” than others, it is my prediction that the movement will grow until its impact is felt in other empirical disciplines including the neurosciences, biology, economics, and medicine. The problems that confront psychology are in no way unique, and this affords an opportunity to lead the way and create dependable guidelines on how to do research well. Such guidelines have tremendous value, both to individual scientists and to society as a whole.
Yesterday (April 1, 2015), a group of computer scientists from the UK (Neil Chue Hong, Tom Crick, Ian Gent, and Lars Kotthoff) announced a seminal paper, Top Tips to Make Your Research Irreproducible.
Here are the tips that the authors share:
- Think “Big Picture”. People are interested in the science, not the dull experimental setup, so don’t describe it. If necessary, camouflage this absence with brief, high-level details of insignificant aspects of your methodology.
- Be abstract. Pseudo-code is a great way of communicating ideas quickly and clearly while giving readers no chance to understand the subtle implementation details (particularly the custom toolchains and manual interventions) that actually make it work.
- Short and sweet. Any limitations of your methods or proofs will be obvious to the careful reader, so there is no need to waste space on making them explicit. However much work it takes colleagues to fill in the gaps, you will still get the credit if you just say you have amazing experiments or proofs.
- The deficit model. You’re the expert in the domain, only you can define what algorithms and data to run experiments with. In the unhappy circumstance that your methods do not do well on community curated benchmarks, you should create your own bespoke benchmarks and use those (and preferably not make them available to others).
- Don’t share. Doing so only makes it easier for other people to scoop your research ideas, understand how your code actually works instead of why you say it does, or worst of all to understand that your code doesn’t actually work at all.
These tips will undoubtedly be embraced by all scientists trying to make their research irreproducible. The paper ends with an important conjecture:
We make a simple conjecture: an experiment that is irreproducible is exactly equivalent to an experiment that was never carried out at all. The happy consequences of this conjecture for experts in irreproducibility will be published elsewhere, with extremely impressive experimental support.
The paper Reproducible Research as a Community Effort: Lessons from the Madagascar Project was published in the January/February 2015 issue of Computing in Science &amp; Engineering, a special issue on Scientific Software Communities.
Reproducible research is the discipline of attaching software code and data to publications, which enables the reader to reproduce, verify, and extend published computational experiments. Instead of being the responsibility of an individual author, computational reproducibility should become the responsibility of open source scientific-software communities. A dedicated community effort can keep a body of computational research alive by actively maintaining its reproducibility. The Madagascar open source software project offers an example of such a community.
The favorite tool of all Madagascar users, SCons, is featured as the December 2014 Community Choice Project of the Month at SourceForge.
SCons is a software construction tool (build tool, or make tool) implemented in Python, which uses Python scripts as configuration files for software builds. It is an easier, more reliable, and faster way to build software, solving a number of problems associated with other build tools, including the classic and ubiquitous make itself.
Distinctive features of SCons include: a modular design that lends itself to being embedded in other applications; a global view of all dependencies in the source tree; an improved model for parallel (-j) builds; automatic scanning of files for dependencies; use of MD5 signatures for deciding whether a file is up-to-date; use of Python functions or objects to build target files; and easy user extensibility.
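The content-signature feature deserves a closer look: unlike make, which compares file timestamps, SCons decides whether a target is stale by hashing the contents of its sources. Below is a minimal sketch of that idea, not SCons's actual implementation; all function names are illustrative.

```python
import hashlib
import os

def content_signature(path):
    """MD5 digest of a file's contents (an SCons-style content signature)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def needs_rebuild(target, sources, sigdb):
    """Rebuild if the target is missing or any source's *content* changed,
    regardless of file timestamps."""
    if not os.path.exists(target):
        return True
    return any(sigdb.get(s) != content_signature(s) for s in sources)

def build(target, sources, sigdb):
    """Concatenate sources into target only when needed; record signatures.
    Returns True if the target was rebuilt, False if it was up to date."""
    if not needs_rebuild(target, sources, sigdb):
        return False
    with open(target, "w") as out:
        for s in sources:
            with open(s) as f:
                out.write(f.read())
            sigdb[s] = content_signature(s)
    return True
```

Touching a source file without changing its contents leaves the target up to date under this scheme, which is what makes content signatures more reliable than the timestamp comparisons used by make.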
A large number of open-source projects, companies, universities, and other scientific institutions use SCons as their build system, and are very happy with its stability and ease of maintenance. There are also several projects like Parts, PlatformIO, Madagascar, and FuDePAN, which use the SCons framework as a building block to provide highly specialized build environments to their users.
Back in 2006, when Madagascar became an open-source project, SourceForge was the dominant platform for such projects. Since then, it has remained a highly useful resource but has lost its popularity to GitHub.
Madagascar developers have not yet seen a compelling need to migrate the Madagascar repository from SourceForge to GitHub or to switch from Subversion (SVN) to Git, but will keep all options open.
Simultaneous editorials in Science and Nature state:
Reproducibility, rigour, transparency and independent verification are cornerstones of the scientific method. Of course, just because a result is reproducible does not make it right, and just because it is not reproducible does not make it wrong. A transparent and rigorous approach, however, will almost always shine a light on issues of reproducibility. This light ensures that science moves forward, through independent verifications as well as the course corrections that come from refutations and the objective examination of the resulting data.
The editorials describe Proposed Principles and Guidelines for Reporting Preclinical Research developed this summer and endorsed by dozens of leading scientific journals publishing in the field of biomedical research. The guidelines focus on the issue of reproducibility of scientific experiments and include provisions for sharing data and software.
Nature explains its software sharing policy further in the following statement:
Nature and the Nature journals have decided that, given the diversity of practices in the disciplines we cover, we cannot insist on sharing computer code in all cases. But we can go further than we have in the past, by at least indicating when code is available. Accordingly, our policy now mandates that when code is central to reaching a paper’s conclusions, we require a statement describing whether that code is available and setting out any restrictions on accessibility. Editors will insist on availability where they consider it appropriate: any practical issues preventing code sharing will be evaluated by the editors, who reserve the right to decline a paper if important code is unavailable.
These changes in publication policies by the leading scientific journals may lead to a fundamental change in scientific standards for reproducibility of computational experiments in different fields.