On How Computers Did (or Didn't) Break Science

By John Pellman · Jun 14, 2020

Recently I came across an article (How computers broke science – and what we can do to fix it by Ben Marwick) that argues that electronic computers are breaking science. Namely, computers are blamed for:

Making data processing methods more opaque and converting processes that were formerly transparent into black boxes.
Being too versatile, and thus complicating methods reporting in journal articles. Making results reproducible now also involves documenting both your software and your data management efforts.

While the article contains many positions that I agree with to varying degrees, such as increased sharing of data, the use of open source toolkits, the use of open formats, and a shift away from exclusively point-and-click applications, I find its central premise (that computers in and of themselves have somehow disrupted the scientific process) to be poorly supported. I do think that there is an element of truth to what the author says in this regard, and I think that by and large we are in agreement on finer matters, but his central position lacks precision.

Causes of Computational Reproducibility Issues

From my perspective as a technical professional with a different mindset and different background from many researchers, I regard problems with computational reproducibility as originating from two separate causes:

Nondeterministic algorithms, which do not necessarily produce the same outputs given the same input across multiple runs.
Human factors issues.
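The first cause can be made concrete with a small sketch. The Monte Carlo estimator below is a hypothetical illustration (the function name `estimate_pi` and the choice of algorithm are assumptions made for this example, not something taken from the article under discussion). It shows that a stochastic algorithm gives varying outputs for the same input unless the source of randomness is pinned down with a seed.

```python
import random


def estimate_pi(n_samples, seed=None):
    """Monte Carlo estimate of pi (hypothetical example for illustration)."""
    # seed=None lets Python seed the generator from OS entropy,
    # so each call follows a different random sequence.
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        # Count points that fall inside the unit quarter circle.
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


if __name__ == "__main__":
    # Unseeded: two runs on the same input will usually disagree slightly.
    print(estimate_pi(10_000), estimate_pi(10_000))
    # Seeded: the same input and the same seed give the same output.
    print(estimate_pi(10_000, seed=2020) == estimate_pi(10_000, seed=2020))
```

Reporting the seed alongside the code is one plausible way to make such an analysis repeatable; it does not help with nondeterminism that comes from elsewhere, such as the ordering of parallel floating-point operations.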

The article in question does not address mathematical causes of irreproducibility, and instead focuses on the human factors front. From this vantage point, there are two possibilities for why computational reproducibility might be challenging:

That scientists are misusing digital tools because computational illiteracy is reasonably widespread in the sciences, and the publish or perish incentive structure of the modern academy does not reward scientists who take the time to properly understand computation and its role in bringing about their study’s conclusions.
That there is a fundamental issue with how computing interfaces are designed for scientists, and that this leads them to perform actions that are maladaptive. Computational reproducibility is fundamentally hampered by poor user interface design decisions rather than by the researchers themselves. Scientific computation would benefit from more of what Don Norman has called user-centered design.

It is my belief that both possibilities are responsible for issues of computational reproducibility in varying proportions. The remedy for the former possibility is more education, and it is for this reason that efforts such as Software Carpentry exist. In an ideal world, education about scientific computing would begin even earlier at the undergraduate level, since computing is becoming essential to all areas of research, and would be better learned before the competing burden of needing to publish scholarly articles comes into play.

The latter possibility is addressed by efforts such as Jupyter Notebooks, Galaxy, brainlife, and NeuroCaaS, which simplify computing by abstracting away elements of general-purpose computing that are irrelevant to science while keeping elements that fit within a researcher’s cognitive schema, or understanding of the world. Jupyter, for instance, uses a notebook analogy, similar to how a researcher might make notes in a literal notebook while performing bench work. The other tools perform specific tasks in well-defined pipelines with fixed inputs and outputs, deliberately constraining the problem space (and the software elements that a researcher must manage) while increasing the consistency of research outputs. When running the mriqc pipeline on brainlife, for instance, functionality is restricted to a clear and obvious goal: ensuring that data quality is acceptable. While to some extent these are black boxes, they are also built upon transparent components that can be audited if need be. For this reason, I should clarify that I am not wholesale against the use of point-and-click applications, as long as such applications are built upon versatile and reasonably transparent components.

As a brief disclaimer, it is also worth pointing out that, at the time that the article I’m responding to was written, Jupyter Notebooks were relatively new and not as established as they are today, although other notebook interfaces such as MATLAB and Mathematica were (but were not yet web-based).

Human factors issues related to data management in particular are also being partially addressed by metadata standards such as BIDS, NIDM, EML, and CF Conventions. These standards encourage reproducibility by decreasing the number of ways in which files on a filesystem can be organized, constraining researchers with a default set of good data management practices. Efforts such as DataJoint go even further, encouraging researchers to manage data within structured database tables. In the long term, I believe that the data science world will come to influence data management practices within science positively. Most analyses will be performed in a transactional manner on data stored within highly structured databases instead of on files directly, while files enriched with metadata schemas will come to be used as intermediate, portable representations of datasets that can be imported into databases via various connectors. Phrased differently, a structured database abstraction will force researchers to keep their data and its provenance organized by restricting the number of operations that can be performed on them, much like how photo organizer programs such as Apple Photos or digiKam tame the chaos of managing one’s own personal photos.
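To illustrate what "restricting the number of operations" might look like in practice, here is a minimal sketch using Python's built-in `sqlite3` module. It is not DataJoint's API, and the table names, column names, and allowed modality values are all hypothetical choices made for this example. The schema rejects records that do not fit the expected structure, and a batch of inserts is applied as a single transaction, so a bad record leaves the dataset untouched.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical, constrained schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce REFERENCES clauses
    conn.executescript(
        """
        CREATE TABLE subject (
            subject_id TEXT PRIMARY KEY
        );
        CREATE TABLE scan (
            scan_id    INTEGER PRIMARY KEY,
            subject_id TEXT NOT NULL REFERENCES subject(subject_id),
            modality   TEXT NOT NULL CHECK (modality IN ('anat', 'func', 'dwi')),
            source     TEXT NOT NULL  -- provenance: where the record came from
        );
        """
    )
    return conn


def add_scans(conn, rows):
    """Insert a batch of (subject_id, modality, source) rows atomically."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(
                "INSERT INTO scan (subject_id, modality, source) VALUES (?, ?, ?)",
                rows,
            )
        return True
    except sqlite3.IntegrityError:
        # A row violated a constraint, so none of the batch was stored.
        return False


if __name__ == "__main__":
    conn = make_db()
    with conn:
        conn.execute("INSERT INTO subject (subject_id) VALUES ('sub-01')")
    # The second row uses a modality outside the allowed set, so the
    # whole batch is rejected and the scan table stays empty.
    print(add_scans(conn, [("sub-01", "anat", "scanner"),
                           ("sub-01", "final_v2", "desktop")]))
    print(conn.execute("SELECT COUNT(*) FROM scan").fetchone()[0])
```

The design choice here is that the constraints live in the schema rather than in each researcher's habits, which is roughly the role a standard such as BIDS plays for files on a filesystem.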

Are Computers Fundamentally Different from Other Instrumentation?

The author of How Computers Broke Science also cites a claim by Victoria Stodden that a computer is fundamentally different from other pieces of scientific instrumentation. I am slightly skeptical of this claim as it stands in the modern world, in no small part because many pieces of modern instrumentation themselves contain full-blown onboard computers (similar to Raspberry Pis) that perform a portion of data processing to produce the “raw” data.

Phrased differently, unless you’re using analog instrumentation, your microscope almost certainly is running a full-fledged copy of Linux, a BSD, Windows, or Minix to ensure that the output you receive is encoded in a digital format. In fMRI research, MRI scanners don’t even use onboard PCs, instead using whole workstations that perform some rudimentary image processing steps as part of data acquisition (i.e., k-space transformations) that often go unreported and are typically forgotten about.
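As a toy illustration of why a "raw" image is already a computed product, the sketch below treats a one-dimensional list of numbers as the object being imaged, computes its discrete Fourier transform as a stand-in for the acquired k-space samples, and reconstructs the object with the inverse transform. This is a deliberate simplification under stated assumptions: real scanners work in two or three dimensions and apply vendor-specific corrections that are not modeled here, and the function names are hypothetical.

```python
import cmath


def dft(samples):
    """Discrete Fourier transform: object profile -> simulated k-space."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(kspace):
    """Inverse transform: k-space samples -> reconstructed profile."""
    n = len(kspace)
    return [
        sum(kspace[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


if __name__ == "__main__":
    profile = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0]  # made-up intensities
    kspace = dft(profile)           # what the scanner would acquire
    image = inverse_dft(kspace)     # what is later handed over as "raw" data
    print([round(value.real, 6) for value in image])
```

Any choice made inside the reconstruction step (filtering, zero-padding, numerical precision) shapes the data before the researcher ever sees it, which is the sense in which this processing goes unreported.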

Outside the realm of science, printers, ATMs, and New York City subway kiosks have been running complete copies of Windows for years. Components within a computer, such as hard disk controllers or CPUs, have themselves become computers running their own operating systems. In 2017, it was even revealed that a large number of modern Intel processors have been secretly running the Minix operating system.

If the author is to critique computers for being too opaque, he cannot claim that most modern instrumentation is somehow fundamentally less opaque, since modern instruments are, by and large, application-specific digital computers. In fact, such application-specific computers are arguably even more opaque than your average general-purpose computer, since the processes that they use to transform data into a “raw” format during data acquisition are often proprietary, undocumented by the instrument manufacturer, or both.

Beyond considerations of the opaqueness of on-board computers used by digital instrumentation, it’s important to note that even analog instrumentation performs processing steps upon data as it is acquired that are not too dissimilar from the processing done by digital instruments. Most methods reporting sections do not delve into the engineering details of such instrumentation and stop at mentioning the make and model of a particular data acquisition device, effectively making most instrumentation just as much of a black box as a digital computer.

Methods Reporting and Rosy Retrospection

In the article, the author claims that “For most of the history of science, researchers have reported their methods in a way that enabled independent reproduction of their results.” I suspect that characterizing methods reporting before the electronic computer as more transparent is an example of rosy retrospection. To back up this hunch, I’d like to explore a few historical instances of methods reporting: the modeling of the action potential in Hodgkin and Huxley (1952), irreplicable findings on the chemical basis of schizophrenia in Heath et al. (1958), and the (probable) discovery of the Red Spot storm on Jupiter by Hooke (1667).

Firstly, how transparent was the methods section of Hodgkin and Huxley? While it was transparent in giving the necessary formulae to reproduce its results (these formulae would be analogous to analysis source code or a Jupyter notebook today), nowhere does the article indicate how these formulae were applied. The most popular contemporary means of calculating results for articles were human computers, mechanical calculators, and vacuum tube computers such as Cambridge University’s EDSAC. Hodgkin and Huxley say nothing about which of these methods was used, or about the pitfalls and potential for error that each introduces. The methods were treated as interchangeable, and either the specific calculating method was regarded as superfluous or it never even occurred to Hodgkin and Huxley to include this detail. By modern standards of methods reporting, which often require that you report, at a minimum, your computer’s CPU and the software used, the Hodgkin and Huxley study is almost certainly more opaque. Eventually, it came out in 1992 that Hodgkin and Huxley had wanted to use EDSAC, but were forced to use a Brunsviga mechanical calculator instead due to an extended maintenance window on EDSAC (see Schwiening (2012)).

Heath et al. is a very dramatic historical case of poor transparency in methods reporting. This psychiatric study concluded that a chemical substance called taraxein was a direct cause of schizophrenic behavior. It was never replicated because a coauthor (Matthew Cohen) deliberately kept the methods reporting vague. As Matthew Cobb explains in his book The Idea of the Brain (2020):

[Matthew Cohen] had deliberately withheld key parts of the relevant protocol from their scientific publications, rendering their work impossible to replicate. Cohen was in fact a fraud with no scientific training; he was a gangster on the run and had kept part of the taraxein technique secret as an insurance policy in case of discovery.

Lastly, we have Hooke (1667), one of the first published articles in the first recognized journal (The Philosophical Transactions). Hooke (1667) is short enough to quote in full:

The Ingenious Mr. Hook did, some months since, intimate to a friend of his, that he had, with an excellent twelve foot Telescope, observed, some days before he then spoke of it (viz. on the ninth of May, 1664. about 9 of the clock at night) a small Spot in the biggest of the 3 obscurer Belts of Jupiter, and that, observing it from time to time, he found, that within 2. hours after, the said Spot had moved from East to West, about half the length of the Diameter of Jupiter.

This article, while brief (it is, in fact, shorter than most modern abstracts), is essentially an observation with precise details about the instrumentation used and fairly imprecise measurements (“about half”). There is no obvious computation, analysis, or statistical testing. In spite of this, there is certainly computation occurring; it’s just occurring within Hooke’s brain as he identifies an object and tracks it with his eyes. This is no different from the methodology of most modern pseudoscientific UFOlogists, whose main motto appears to be “seeing is believing” as they stare at objects in the sky. Even the details that are given were likely not reported by Hooke himself (Henry Oldenburg wrote most of the articles in Philosophical Transactions on behalf of other researchers), and key details about instrumentation, such as the nature of the lenses (who manufactured them, lens thickness, what combination of lens types was used, etc.), are omitted. Because of the vagueness of this article, it has long been debated whether Hooke or Cassini discovered the Great Red Spot on Jupiter (Falorni (1987)). Creating an authentic reproduction of Hooke’s analysis (an effort more likely to be undertaken by a museum than a scientist) is thus somewhat difficult because of the brevity and lack of clarity of Hooke (1667).

In Brief

I do not think that it can be said that computers, in and of themselves, have broken science fundamentally. Rather, I think that science tends to break down whenever there is a lack of focus or absence of constraints, as occurs any time there is a paradigm shift that disrupts what Thomas Kuhn has called “normal science”.
Computational reproducibility can be improved, not only by education, but also by re-engineering general-purpose computers into domain-specific computers through better interfaces.
Scientific instrumentation is often not as transparent or straightforward as one would think.
When discussing issues of reproducibility, it’s important not to romanticize the past. If we apply modern standards of academic publishing and methods reporting to articles from previous decades, we will find that most fall woefully short of our contemporary expectations.