The code is old, can it still run?

Nicolas Rougier needed a disk. Not a USB flash drive or a CD-ROM, but a genuine floppy disk. Readers born after the 1990s may not know that a floppy disk is a thin, flexible disk housed in a square shell, with a hole in the middle of the shell and one corner notched, that can store a few hundred kilobytes of data. In the 1983 Cold War movie “WarGames”, high-school hacker David Lightman used a floppy disk to hack into his school’s computer and changed his girlfriend’s biology grade to full marks.

He then hacked into a military network and almost triggered a global thermonuclear war. Rougier’s needs were less dramatic. He just wanted to transfer a text file from his Mac desktop to an antique computer: the 1977 Apple II, Apple’s first consumer product.

Rougier is a computational neuroscientist and programmer at the French National Institute for Research in Computer Science and Automation (INRIA). Transferring the file was the last step of a computational challenge of his own devising: the Ten Years Reproducibility Challenge. In 2019, he and Konrad Hinsen, a theoretical biophysicist at the French National Center for Scientific Research (CNRS), jointly launched the challenge, which asks researchers to find old code and re-execute it in order to reproduce computational papers published at least ten years ago. Participants were originally due to discuss their experiences at a workshop in Bordeaux in June, but COVID-19 forced a postponement (it is now tentatively scheduled for June 2021).

Illustration by The Project Twins

Although computation plays an increasingly critical role in science, scientific articles rarely include the underlying code, Rougier said. Even when they do, others can find it difficult to execute, and even the original authors may run into problems after some time has passed. Programming languages evolve, and so do the computing environments in which code runs. Code that runs smoothly today may fail tomorrow.

In 2015, Rougier and Hinsen founded ReScience C, a journal that publishes researchers’ attempts to replicate computational methods from other authors’ papers, based only on the original articles and their own newly written open-source code. Reviewers then examine the code to confirm that it works. But even in this idealized scenario, in which the authors are willing to reproduce the code, the reviewers are proficient in computing and the code is newly written, the process still runs into many difficulties.

The goal of the Ten Years Reproducibility Challenge is to “find out which techniques for writing and publishing code ten years ago still work today,” Hinsen said. The challenge was timed to coincide with the “retirement” of Python 2 on January 1, 2020, when support for this language, which is very popular in science, ended 20 years after its release. (Python 3, released in 2008, is still under active development, but the two versions differ considerably, and code written for one may not run under the other.)
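The gap between the two versions is easy to illustrate. The sketch below is not taken from any challenge entry, and the names `legacy_line` and `runs_under_python3` are made up for illustration. Run under Python 3, it shows two well-known differences: the Python 2 `print` statement no longer parses, and `/` applied to two integers now returns a float.

```python
# Illustrative only: two well-known Python 2 / Python 3 incompatibilities.
# Run with Python 3. The names below are hypothetical, not from the article.

legacy_line = 'print "mean =", 7 / 2'   # valid Python 2, not valid Python 3


def runs_under_python3(source):
    """Return True if `source` at least parses as Python 3 code."""
    try:
        compile(source, "<legacy>", "exec")
        return True
    except SyntaxError:
        return False


# The Python 2 print statement is a syntax error in Python 3.
legacy_parses = runs_under_python3(legacy_line)

# Even after the print call is ported, the arithmetic silently changes:
# Python 2 treated 7 / 2 as integer division; Python 3 returns a float.
true_division = 7 / 2     # float result in Python 3
floor_division = 7 // 2   # the Python 2 behaviour for ints, spelled explicitly

if __name__ == "__main__":
    print("legacy line parses under Python 3:", legacy_parses)
    print("7 / 2  =", true_division)
    print("7 // 2 =", floor_division)
```

The second difference is the more dangerous one for scientific code, because a ported program can keep running while quietly producing different numbers.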

“In the world of software, ten years is a long, long, long time,” said Victoria Stodden, who studies computational reproducibility at the University of Illinois at Urbana-Champaign. By setting that benchmark, she said, the challenge essentially encourages researchers to explore the limits of code reproducibility over a period that is “almost infinite in the software world”.

In total, 35 participants entered the challenge. Of the 43 articles they proposed to reproduce, 28 resulted in reproducibility reports, which ReScience C has been publishing since the beginning of this year. The programming languages used range from C and R to Mathematica and Pascal; one participant reproduced not code but a molecular model encoded in the Systems Biology Markup Language (SBML).

Although it takes place in the digital world, the participants’ experience resembles real-world archaeology: by learning from the past, it can suggest the best strategies for keeping code reproducible in the future. One common thread is that scientists must improve their documentation if they want their code to be reproducible. “In 2002, I thought I would remember everything for a lifetime,” said Karl Broman, a biostatistician at the University of Wisconsin-Madison. “I realized later that I would forget it within a month.”

Reproduce scientific research

Rougier’s entry reproduces the oldest code in the entire challenge[1]: an image magnifier for the Apple II that he wrote when he was 16 years old and published in a French hobbyist magazine, “Micro Springboard”. (The oldest scientific code in the challenge is a 28-year-old Pascal program for mapping water-quality data, which will be published later in ReScience C.) Thirty-two years on, Rougier could no longer remember how the code worked; written in AppleSoft BASIC, it read like an incantation. “It’s strange, I wrote it myself after all,” he said. But he managed to find the code on the Internet and ran it successfully in a web-based Apple II emulator. That step was relatively simple, he said; running it on an actual Apple II was the real difficulty.

The hardware was not a problem: Rougier has an Apple II in his office, which a colleague salvaged while clearing out an office. “Young people ask, ‘What is this?’” he said. “And then you have to explain, ‘This is a computer.’ Older people say, ‘Oh, I remember this machine,’ when they see it.” But because the Apple II predates USB and the Internet, and because modern computers cannot connect directly to its old disk drives, Rougier needed some custom-made hardware, plus a box of old floppy disks, to get the code onto the computer. He found the disks on Amazon, sold as “new” although they were made in 1993. After writing to them three times to make sure that all the bits were stable, he confirmed that the disks were usable.

Bruno Levy, a computer scientist at an INRIA research center, reviewed Rougier’s write-up. Levy also owns an Apple II, and he posted a short video of his test on Twitter. After a few clicks of the old keyboard, he loads the code and runs it successfully, and the screen slowly displays a line of plain green text: “We reproduce scientific research!”

Outdated hardware, dead language

When Charles Robert, a biophysical chemist at the French National Center for Scientific Research, heard about the challenge, he decided to use it as an opportunity to revisit a research topic that he had not thought about for many years. “This challenge gave me a kick and got me working in that direction,” he said.

In 1995, Robert used a computational notebook in the commercial software Mathematica to model the three-dimensional structure of eukaryotic chromosomes. Robert has Mathematica on his MacBook but, for fun, he bought a Raspberry Pi for 100 euros (approximately 800 RMB). This single-board computer for hobbyists runs Linux and comes with Mathematica 12 pre-installed.

Robert’s code ran essentially without problems, but the exercise exposed difficulties that computational notebooks can cause[2], such as a lack of code structure and the possibility of executing code cells out of order. Robert has since broken the code into modules and written tests for it. He also uses version control to track changes to the code and to record which version of the software produced which results. “When I read old code, I occasionally get goose bumps, and then I think about how I could do it better now,” he said. “However, I also feel that the whole process let me revisit some of what I have learned since then.”

Robert’s success was not an isolated case: of the 13 reproduction reports published so far, only 2 describe failures. One of them was written by Hinsen, who was tripped up by the magnetic tapes he had used to store his code systematically in the early 1990s[3]. “This is what happens when you make backups but never check whether they can still be read ten years later,” he said. “You once had this nice set of tapes and backups, but now there is no device to read them.” (Hinsen also published a successful reproduction[4].) Other participants who did not complete the challenge cited a lack of time, especially during the pandemic.

Another problem that participants commonly encountered was an obsolete computing environment. Sabino Maggi, a computational physicist now working at the Institute of Air Pollution of the Italian National Research Council, had used the programming language Fortran to model a superconducting device called a Josephson junction, and had processed the results with Microsoft Visual Basic. Fortran has changed little since then, so Maggi needed only a few minor tweaks to compile the code successfully. Visual Basic caused more trouble.

“Visual Basic,” Maggi wrote in his article[5], “is a dead language that was replaced long ago by Visual Basic.NET, and the two share only a name.” To run the code, he had to recreate a decades-old Windows virtual machine on his Mac laptop. He installed Microsoft DOS 6.22 and Windows 3.11 (both from around 1994), as well as Visual Basic, using installation disks found on the Internet. “Even for software this old, installing copyrighted software in an emulator may still raise legal issues,” Maggi admitted. However, because he held a valid license when he did the original research, he said he felt “at least morally entitled” to use it.

But which version of Visual Basic should he use? Microsoft released several versions within a few years, and not all of them were compatible with one another. Maggi could not remember which version he had used in 1996, and a water leak in his basement had destroyed the early notebooks in which he recorded such details. “I had to start from scratch,” he said.

A 1994-era Windows environment emulated on a Mac, running Microsoft Visual Basic. Source: Sabino Maggi

INRIA research engineer Ludovic Courtès reproduced a 2006 study that compared different data-compression strategies, with code written in C[6]. But the application programming interfaces (APIs) his program relied on had changed, so it could no longer be compiled against modern software libraries. “Everything had evolved, except of course the software used in the paper,” he said. In the end, he had to roll back five or six libraries to older versions, in what he called “a chain reaction of downgrades.” “It was quite a rabbit hole,” he said.

Today, researchers can use Docker containers[7] and Conda virtual environments[8] to package a computing environment for reuse. But several participants chose another route, one that Courtès said “probably represents the ‘gold standard’ for reproducing scientific papers”: a Linux package manager called Guix. It promises environments that can be reproduced down to the last bit and that are fully transparent about the versions of the code from which they are built. “The entire environment, and in fact the entire paper, can be inspected and built from source code,” he said. Hinsen called it “probably the best thing for reproducing scientific research so far.”

Need documentation

Roberto Di Cosmo, a computer scientist at INRIA and the University of Paris, ran into another problem common among participants when he tried to reproduce[9] one of his own papers: working out where he had put the code. Di Cosmo tackled a 1998 paper that described a parallel programming system called OcamlP3l. He searched all of his hard drives and backups, and asked his 1998 collaborators to do the same, but found nothing. Then he searched Software Heritage, a service he founded in 2015. “I found it, incredibly,” he said.

Software Heritage regularly crawls code-sharing sites such as GitHub and archives source code, much as the Internet Archive archives web pages. Developers can also ask the service to archive their own repositories, and the challenge rules require participants to do so. Di Cosmo did not think to search Software Heritage at first, because it did not exist when he developed OcamlP3l. However, someone, he does not know who, had uploaded his code to a code-hosting site called Gitorious. Gitorious has since disappeared, but Software Heritage had archived it first, OcamlP3l included.

Of course, finding the code does not mean knowing how to use it. Broman’s article, for example, notes that when he reproduced a 2003 paper[10], the lack of documentation and a “weird” file structure meant it took considerable effort to work out which code to run. “In the end, I had to resort to reading the original paper,” he wrote.

“It’s not uncommon for the documentation of a well-structured program to be longer than the code,” said Karthik Ram, who focuses on computational reproducibility at the University of California, Berkeley. “It helps to think about documentation more broadly: descriptions of the analysis methods, the data sources, and metadata about the data and the code are all critical.”

Melanie Stefan, a neuroscientist at the University of Edinburgh, used the challenge to assess the reproducibility of computational models she had written in SBML. Although the code was easy to find, she could not find the parameters she had used (such as molecular concentrations), and key details of the data normalization had not been documented either. As a result, Stefan was unable to reproduce part of the research. “What seemed almost obvious when you were doing the research is not so obvious anymore to you 10 to 12 years later. Who would have thought!” she said wryly.

Reproducibility spectrum

The experience prompted Stefan to set documentation rules for her laboratory. For example, each model must be accompanied by a description of the form: “If you want to reproduce Figure 5, you need to follow the steps below.”

But creating these resources takes time, Stodden said. Cleaning up and documenting code, writing tests, organizing data sets and reproducing computing environments: “That is a whole lot of work that goes unrewarded.” Researchers have little incentive to do these things, she added, and the scientific community has no consensus on what a reproducible paper should look like. To complicate matters further, computing systems keep evolving, so it is hard to predict which strategies will stand the test of time.

Reproducibility is a spectrum, says Carole Goble, a computer scientist at the University of Manchester who studies reproducibility. It ranges from scientists reproducing their own research, to peer reviewers test-driving code to show that it works, to researchers applying published algorithms to new data. Likewise, the measures researchers can take to ensure reproducibility span a spectrum (see “Reproducibility checklist” below), although the list can be very long. At a minimum, Goble says, release your source code so that others can browse it and rewrite it as needed in the future, an approach she calls “reproducibility by reading”. “Software is alive,” she said, “and living things eventually decay, so they need constant repair and eventually have to be replaced.”

Reproducibility checklist

The following practices cannot guarantee 100% computational reproducibility, but they can improve your chances of success.

Code - A workflow that consists of clicking through a graphical interface, such as Excel, is not reproducible. Write your calculations and data manipulations as code.

Document - Use comments, computational notebooks and README files to explain how the program works, and to define the expected parameters and the required computing environment.

Record - Record key parameters, such as the seed of the random-number generator. Such records help you to reproduce runs, find bugs and track down unexpected results.

Test - Write a set of test functions. Use positive and negative control data sets to make sure you get the expected results, and run the tests throughout development so that programming errors are caught as soon as they occur.

Guide - Write a main script file that downloads the required data sets and variables, executes the workflow and provides an obvious entry point to your code.

Archive - GitHub is a popular but impermanent online code repository. Use archiving services such as Zenodo, Figshare and Software Heritage to ensure long-term stability.

Track - Use version-control tools such as Git to record the project history. Record which version produced each result.

Package - Use containerization tools (such as Docker and Singularity), online services (Code Ocean, Gigantum, Binder) or a virtual-environment manager (Conda) to create a ready-to-use computing environment.

Automate - Use continuous-integration services (such as Travis CI) to test code automatically, regularly and in a variety of computing environments.

Simplify - Avoid third-party code libraries that are rare or difficult to install, so that your code is easier to reuse.

Verify - Run your code in different computing environments to confirm its portability.
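Several of the items above (recording, testing and providing a guide script) can be combined in a single entry-point file. The following is a minimal sketch rather than a prescribed workflow: the function names `run_analysis` and `build_record`, the output file name `run_record.json`, the JSON layout and the toy calculation are all assumptions made for illustration. It fixes the random seed, stores it alongside the Python version and platform, and checks that rerunning with the same seed gives the same result.

```python
"""Hypothetical entry-point script: record seed and environment, then self-test."""
import json
import platform
import random
import sys


def run_analysis(seed, n=1000):
    """Toy stand-in for a real calculation: mean of n pseudo-random draws."""
    rng = random.Random(seed)   # local generator, so no hidden global state
    return sum(rng.random() for _ in range(n)) / n


def build_record(seed, result):
    """Collect the details a future reader would need to rerun the calculation."""
    return {
        "seed": seed,
        "result": result,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "argv": sys.argv,
    }


def test_same_seed_same_result():
    """The same seed must reproduce the same result."""
    assert run_analysis(seed=42) == run_analysis(seed=42)


if __name__ == "__main__":
    test_same_seed_same_result()
    seed = 42
    record = build_record(seed, run_analysis(seed))
    # Write the record next to the results so the two stay together.
    with open("run_record.json", "w") as fh:
        json.dump(record, fh, indent=2)
    print(json.dumps(record, indent=2))
```

A record file like this one could then be committed with Git or deposited in an archive together with the results it describes.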

Counterintuitively, many participants found that code written in older languages was easier to reuse. Newer languages have APIs that are updated frequently, and the third-party libraries they depend on make code more prone to breaking. In this sense, the retirement of Python 2.7 at the beginning of this year presents scientists with an opportunity, Rougier and Hinsen said. Python 2.7 gives us “a high-level programming language that is guaranteed not to change,” Rougier wrote[1].

Whatever programming language and reproducibility strategy researchers use, it is wise to put them to the test, says Anna Krystalli, a research software engineer at the University of Sheffield. Krystalli runs workshops called ReproHacks, in which researchers submit published papers, code and data, and other participants are asked to reproduce the results. In most cases, she said, the attempts fail: the authors have left out key details that seemed obvious to them but are unknown to others. “Whatever we are doing, unless we actually use it and tinker with it, we cannot know whether it can be reproduced,” Krystalli said. “In fact, it’s much harder than people think.”