## Petition to raise awareness about the role of software in research

The Software Sustainability Institute in the UK has created an online petition addressed to “everyone in the research community”, which states: “We must accept that software is fundamental to research, or we will lose our ability to make groundbreaking discoveries.”

1. We want software to be treated as a valuable research object which merits the same level of investment and effort as any other aspect of the research infrastructure.
2. We want researchers to be encouraged to spend time learning about software, because the value of that knowledge is understood to improve research.
3. We want the people who develop research software to be recognised and rewarded for their invaluable contribution to research.
4. We want a research environment in which software-reliant projects are encouraged to hire software developers, rather than having to hide these valuable staff members in anonymous postdoctoral positions.
5. Ultimately, we want the research community to recognise software's fundamental role in research.

You can sign the petition at Change.org.

## High-performance computing and open-source software

A recent Report on High Performance Computing by the US Secretary of Energy Advisory Board contains a bizarre section on open-source software, which states:

There has been very little open source that has made its way into broad use within the HPC commercial community where great emphasis is placed on serviceability and security.

In his thoughtful blog post in response to this report, Will Schroeder, the CEO and co-founder of the legendary Kitware Inc., makes a number of strong points defending the role of open source in the past and future development of HPC. He concludes:

The basic point here is that issues of scale require us to remove inefficiencies in researching, deploying, funding, and commercializing technology, and to find ways to leverage the talents of the broader community. Open source is a vital, strategic tool to do this as has been borne out by the many OS software systems now being used in HPC applications… It's easy to overlook open source as a vital tool to accomplish this important goal, but in a similar way that open source Linux has revolutionized commercial computing, open source HPC software will carry us forward to meet the demands of increasingly complex computing systems.

## Reproducibility workshop @ XSEDE

XSEDE (Extreme Science and Engineering Discovery Environment) is hosting a workshop on reproducibility as a full-day event on Monday, July 14, 2014, during the XSEDE conference in Atlanta, Georgia. The workshop promises to address the reproducibility crisis in computational science, an issue previously addressed during the Data and Code Sharing Roundtable at Yale in 2009.

XSEDE is the world’s largest, most comprehensive distributed cyberinfrastructure for open scientific research, which integrates high-performance computers and other facilities around the US.

## Madagascar paper published

A paper describing Madagascar has been published in the Journal of Open Research Software (JORS), a new peer-reviewed open-access journal, which features papers describing research software with high reuse potential.

This paper should become a standard reference for those who use Madagascar in their research and wish to reference it in scientific publications. Following a recommendation by Robin Wilson, a file called CITATION.txt has been placed in the top Madagascar directory to provide reference information. Here are the current contents of this file:

Fomel, S., Sava, P., Vlad, I., Liu, Y. and Bashkardin, V. 2013. Madagascar:
open-source software project for multidimensional data analysis and
reproducible computational experiments. Journal of Open Research
Software 1(1):e8, DOI: http://dx.doi.org/10.5334/jors.ag

@Article{m8r,
  author  = {S. Fomel and P. Sava and I. Vlad and Y. Liu and
             V. Bashkardin},
  title   = {Madagascar: open-source software project for
             multidimensional data analysis and reproducible
             computational experiments},
  journal = {Journal of Open Research Software},
  year    = 2013,
  volume  = 1,
  number  = 1,
  pages   = {e8},
  doi     = {http://dx.doi.org/10.5334/jors.ag}
}


It is hard to give proper credit to everyone who has contributed to such a collaborative project as Madagascar. Even the smallest contribution can be crucially important. The five authors of the paper are the five most active all-time contributors to Madagascar by the number of commits to the repository at the time of the paper's submission.

## Trust, but verify

“Doveryai, no proveryai” (translated as “trust, but verify”) is a Russian proverb, which became a signature phrase of Ronald Reagan during his nuclear-disarmament negotiations with Mikhail Gorbachev.

Last week, this phrase was used by *The Economist* to describe the troublesome state of modern scientific research. The editorial article How science goes wrong states:

“A SIMPLE idea underpins science: trust, but verify. Results should always be subject to challenge from experiment. That simple but powerful idea has generated a vast body of knowledge. Since its birth in the 17th century, modern science has changed the world beyond recognition, and overwhelmingly for the better. But success can breed complacency. Modern scientists are doing too much trusting and not enough verifying, to the detriment of the whole of science, and of humanity.”

The article goes on to describe the problems of non-reproducible unverifiable science

“Too many of the findings that fill the academic ether are the result of shoddy experiments or poor analysis. A rule of thumb among biotechnology venture-capitalists is that half of published research cannot be replicated. Even that may be optimistic…”

and eventually suggests a possible cure

“Ideally, research protocols should be registered in advance and monitored in virtual notebooks. This would curb the temptation to fiddle with the experiment's design midstream so as to make the results look more substantial than they are. Where possible, trial data also should be open for other researchers to inspect and test.”

This sounds like another powerful message in support of reproducible research and a call for changes in the culture of scientific publications. In application to computational science, “virtual notebooks” are reproducible scripts that, in the words of Jon Claerbout, “along with required data should be linked with the document itself”. The article ends with a call for science to fix itself:
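In the computational setting, a “virtual notebook” can be as simple as a script that records every step from raw input to reported result, so that anyone can rerun it end to end. A minimal, hypothetical sketch (the data and analysis below are invented for illustration, not taken from any real study):

```python
# Hypothetical end-to-end reproducible analysis: every step from input
# data to the reported number lives in one rerunnable script.
import hashlib
import json

def run_experiment(samples):
    """The full analysis -- here just a trimmed mean, for illustration."""
    trimmed = sorted(samples)[1:-1]  # drop one extreme value at each end
    return sum(trimmed) / len(trimmed)

# Raw inputs are recorded in the script (or loaded from a versioned file),
# never hand-edited results.
data = [4.1, 3.9, 4.0, 12.0, 4.2, 0.1]
result = run_experiment(data)

# A provenance stamp (input hash + result) lets readers check a published
# number against their own rerun of the script.
stamp = {
    "input_sha256": hashlib.sha256(json.dumps(data).encode()).hexdigest(),
    "result": round(result, 4),
}
print(json.dumps(stamp, indent=2))
```

Systems such as Madagascar automate this discipline at a larger scale, regenerating every figure in a paper from scripted workflows rather than from a one-line sketch like the one above.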

“Science still commands enormous, if sometimes bemused, respect. But its privileged status is founded on the capacity to be right most of the time and to correct its mistakes when it gets things wrong. And it is not as if the universe is short of genuine mysteries to keep generations of scientists hard at work. The false trails laid down by shoddy research are an unforgivable barrier to understanding.”

## Results may vary

This slide about the emerging reproducible-research ecosystem comes from Carole Goble's keynote presentation at a computational biology conference last month. The presentation summarizes the issues involved in reproducible research and the recent progress.

## Reasons not to share your code

In Top Ten Reasons To Not Share Your Code (and why you should anyway), published by SIAM News, Randy LeVeque, Professor of Applied Mathematics at the University of Washington, elegantly destroys the common excuses computational scientists and applied mathematicians come up with when they refuse to share their software codes.

Today, most mathematicians find the idea of publishing a theorem without its proof laughable, even though many great mathematicians of the past apparently found it quite natural. Mathematics has since matured in healthy ways, and it seems inevitable that computational mathematics will follow a similar path, no matter how inconvenient it may seem. I sense growing concern among young people in particular about the way we’ve been doing things and the difficulty of understanding or building on earlier work […] We can all help our field mature by making the effort to share the code that supports our research.

As if to illustrate LeVeque's point, major news outlets report on a story about a reproducibility error (an Excel bug) discovered in the famous, politically influential paper by economists Reinhart and Rogoff:

The validity of the Reinhart-Rogoff assertion “once debt exceeds 90 percent of GDP, economic growth drops off sharply” continues to be debated by economists, but it is now clear that their data were flawed and that the error would have been discovered much more easily had the publication followed the reproducible-research discipline.
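The Reinhart-Rogoff error was a spreadsheet formula that silently excluded several rows of data, and it illustrates why scripted analysis favors reproducibility: in a script, the rows entering a computation are explicit and auditable. A minimal sketch with invented growth figures (the numbers below are purely illustrative, not the actual Reinhart-Rogoff data):

```python
# Hypothetical average-growth calculation for high-debt countries.
# The numbers are illustrative only -- NOT the actual Reinhart-Rogoff data.
growth = {
    "Australia": 3.8, "Belgium": 2.6, "Canada": 2.2, "Denmark": 3.1,
    "Greece": 2.9, "Ireland": 2.4, "Italy": 1.0, "Japan": 0.7,
    "New Zealand": 2.6, "UK": 1.8, "US": -1.9,
}

# Spreadsheet-style mistake: a cell range that silently drops the first
# five rows, analogous to writing AVERAGE(B7:B16) instead of AVERAGE(B2:B16).
truncated = list(growth.values())[5:]
mean_truncated = sum(truncated) / len(truncated)

# Scripted version: the full selection is explicit, so an omission like the
# slice above is visible to any reader who reruns the analysis.
mean_full = sum(growth.values()) / len(growth)

print(f"truncated mean: {mean_truncated:.2f}")
print(f"full mean:      {mean_full:.2f}")
```

In a shared script, the stray `[5:]` is an obvious bug for any reviewer to spot and rerun; inside a spreadsheet, the same omission hides in a cell formula that no reader of the published paper ever sees.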

## US government gets serious about reproducible research

The debate on open science and reproducible research has reached Washington, DC.
On February 22, John Holdren (Assistant to the President for Science and Technology and Director of the White House Office of Science and Technology Policy) issued a Memorandum on Increasing Access to the Results of Federally Funded Scientific Research to the heads of all federal agencies that sponsor research and development projects. The memo states

Access to digital data sets resulting from federally funded research allows companies to focus resources and efforts on understanding and exploiting discoveries. For example, open weather data underpins the forecasting industry, and making genome sequences publicly available has spawned many biotechnology innovations. In addition, wider availability of peer reviewed publications and scientific data in digital formats will create innovative economic markets for services related to curation, preservation, analysis, and visualization. Policies that mobilize these publications and data for re-use through preservation and broader public access also maximize the impact and accountability of the Federal research investment. These policies will accelerate scientific breakthroughs and innovation, promote entrepreneurship, and enhance economic growth and job creation.

The memo obliges every agency to come up with a strategy for making both scientific publications and digital scientific data resulting from Federally funded research publicly available.

On March 5, the Subcommittee on Research of the US House Committee on Science, Space, and Technology held a hearing on the issue of access to data from federally funded published research. In his opening statement, Dan Lipinski, a Democratic U.S. Representative from Illinois, said:

…the more data are open, the faster we will validate new theories and overturn old ones, and the more efficiently we will transform new discoveries into innovations that will create jobs and make us healthier and more prosperous. The movement toward open data is not primarily about scientific integrity, it's mostly about speeding up the process of scientific discovery and innovation.

Victoria Stodden, an Assistant Professor of Statistics at Columbia University and a famous advocate for reproducible research, testified:

Making research data and software conveniently available also has valuable corollary effects beyond validating the original associated published results. Other researchers can use them for new research, linking datasets and augmenting results in other areas, or applying the software and methods to new research applications. These powerful benefits will accelerate scientific discovery. Benefits can also accrue to private industry. Again, data and software availability permit business to apply these methods to their own research problems, link with their own datasets, and accelerate innovation and economic growth.

## Setting the default to reproducible

A well-attended workshop Reproducibility in Computational and Experimental Mathematics was hosted by ICERM in Providence, Rhode Island, on December 10-14, 2012, and continued the previous sequence of workshops and special sessions devoted to computational reproducibility.

The workshop participants developed a set of recommendations for changing the culture of computational research in favour of openness and reproducibility.

The main recommendations that emerged from the workshop discussions are:

1. It is important to promote a culture change that will integrate computational reproducibility into the research process.
2. Journals, funding agencies, and employers should support this culture change.
3. Reproducible research practices and the use of appropriate tools should be taught as standard operating procedure in relation to computational aspects of research.

Madagascar was presented among different tools available to aid in the effort. For more information, see the Huffington Post article by David Bailey and Jonathan Borwein, two of the workshop co-organizers.