Open code for open science?
Article discussed: Easterbrook, S. Open code for open science? Nature Geosci 7, 779–781 (2014). https://doi.org/10.1038/ngeo2283 (shareable link to full text)
Paper Summary
Journal Club Discussion
Further Thoughts
- Alex Byrnes (ReproTea attendee) suggested discussing the implications of the Black Spatula Project for the use of LLMs in research integrity checks. I think this is a good idea, and I hope to open a forum/comment feature on this blog to continue these conversations. More to come…
Attendees
Redacted Redacted, Christian Sodano, Redacted Redacted¹²
Footnotes
1 tl;dr \(\equiv\) too long; didn’t read↩︎
2 See “Paraphrased Points” for context. Easterbrook defines “repeatability” as the ability to “re-run the same code at a later time or on a different machine” and “reproducibility” as the ability “to recreate the results, whether by re-running the same code, or by writing a new program”.↩︎
3 The mixing of R-words here can be quite confusing. I believe what Easterbrook means is that we should have more confidence in the robustness of an effect reported by one group if re-analyses of the same dataset using different analysis approaches (e.g. different in-house pipelines that are generally considered valid) produce the same effect. This is the general concept behind many/multi-analyst studies. This is not what I would put value on as a researcher. Rather, for effects that have a large impact on scientific discourse, I’d prefer to see first and foremost that the code reflects best practice (according to the current best theories of how to analyze those data), is error-free (passing a suite of test cases, possibly including forensic metascience tests), and is capable of running on any machine. If experts in that field of study agree that the method of analysis is appropriate, and the code is shown to be a valid translation of that method, then I think it would be better to define a “replication”/verification of that effect as “the same, previously shown-to-be-valid code produces results that continue to support the initially tested hypothesis when run on new, preferably more generalizable, data”.↩︎
4 However, I think it is a serious mistake to assume that the reason journals encourage code-posting is to stimulate the scientific software sharing ecosystem. It could be that librarians negotiating contracts with publishers begin to put pressure on code availability as a criterion for deciding whether to subscribe, or that scientists begin to treat open-code journals as more credible, or that more highly cited scientists begin to opt to submit to journals that have code availability requirements (none of these reasons are mutually exclusive). A glaring omission in this article is that, from a research integrity perspective, posting code, regardless of how portable, configurable, or readable the code is, is a crucial first step to evaluating the integrity of the reported results. On multiple occasions I’ve tried to replicate a paper and found that the posted codebase includes an analysis step that differs from what is reported in the methods section of the paper.↩︎
5 In the future, I will post a blog about denial-of-service attacks. Given the scale at which scientific-ecosystem crime organizations like papermills operate, and the policy implications of controlling a scientific narrative, it’s not so strange to think that this is a concern to prepare for. Already, research integrity staff at publishers spend much of their time tracking down, verifying, and barring papermill products from their journals. That is time they could instead spend doing reproducibility checks or expediting the investigation of a paper with inconsistencies noted on PubPeer.↩︎
6 In this section I use the terms ‘repeatability’ and ‘reproducibility’ not as they are defined in the paper, but as we used them in discussion: “repeatable” code is code whose outputs (figures, summary statistics, results of statistical tests) the original researcher can reproduce on their own machine, and “reproducible” code is code that another researcher can use to reproduce the paper’s outputs on their (different) machine using the same data and source code files. Both would count as “repeatability” under Easterbrook’s definition, but the distinction was important for explaining the work of a journal editor trying to reproduce the results before agreeing to publish the code submitted alongside a manuscript.↩︎
7 According to the editor.↩︎
8 When writing this post, I came across the term “Dependency Hell”, which I think accurately describes this situation.↩︎
9 I plan a future post describing the cases where unit tests can make or break an analysis script; if you’re unfamiliar with this concept, stay tuned (a minimal sketch appears after the footnotes).↩︎
10 There have been times when I was able to reduce the runtime of an analysis script tenfold merely by implementing parallelism (see the sketch after the footnotes).↩︎
11 Silent errors are errors that do not throw an exception: in compiled languages the program still compiles, and in interpreted languages the code keeps running even though the error has occurred. This can mean, for example, that one step of your data analysis pipeline isn’t performed and no error message alerts you to that fact, resulting in a disconnect between your stated methods and the actual methods that produced your figures, statistical test results, etc. (A contrived sketch follows the footnotes.)↩︎
12 Redacting until I know they are okay with me posting.↩︎
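
For readers unfamiliar with unit testing of analysis code (the unit-tests footnote above), here is a minimal, hypothetical sketch. The function `proportion_significant` and the values it is tested against are invented for illustration; the general idea is simply to assert expected behavior on small, hand-checked inputs so that a broken analysis step fails loudly.

```python
# Hypothetical analysis helper plus a unit test for it.
# The function and the test values are illustrative, not from any real pipeline.

def proportion_significant(p_values, alpha=0.05):
    """Return the fraction of p-values below the significance threshold."""
    if not p_values:
        raise ValueError("p_values must be non-empty")
    return sum(p < alpha for p in p_values) / len(p_values)

def test_proportion_significant():
    # Hand-checked case: 2 of 4 values fall below alpha = 0.05
    assert proportion_significant([0.01, 0.04, 0.20, 0.80]) == 0.5
    # Edge case: an empty input should fail loudly, not return a misleading number
    try:
        proportion_significant([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")

if __name__ == "__main__":
    test_proportion_significant()
    print("all tests passed")
```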
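
As context for the parallelism footnote, here is a minimal sketch of the kind of change I mean: replacing a serial loop over independent units of work with a process pool. `analyze_subject` and the subject list are hypothetical stand-ins for whatever per-unit computation dominates a script’s runtime; the actual speedup depends on the machine and the workload.

```python
# Minimal sketch: parallelizing an embarrassingly parallel per-subject analysis.
# analyze_subject and the subject IDs are hypothetical placeholders.
from multiprocessing import Pool

def analyze_subject(subject_id):
    """Stand-in for an expensive, independent per-subject computation."""
    return subject_id + sum(i * i for i in range(100_000))

subjects = list(range(50))

if __name__ == "__main__":
    # Serial version:
    # results = [analyze_subject(s) for s in subjects]

    # Parallel version: distribute subjects across worker processes.
    with Pool() as pool:
        results = pool.map(analyze_subject, subjects)
    print(len(results), "subjects analyzed")
```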
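
To make the silent-error footnote concrete, here is a contrived sketch in which a typo in a column name is swallowed by an exception handler: every row is silently dropped, the script runs to completion, and downstream results would be produced from the wrong data. The data, column names, and “pipeline” are invented for the example.

```python
# Contrived example of a silent error in a small analysis "pipeline".
# The data and column names are invented for illustration.
rows = [
    {"subject": 1, "score": 0.9, "exclude": False},
    {"subject": 2, "score": 0.2, "exclude": True},
]

def drop_excluded(rows):
    kept = []
    for row in rows:
        try:
            if not row["excluded"]:  # bug: the key is actually "exclude"
                kept.append(row)
        except KeyError:
            # The handler swallows the typo: every row raises KeyError and is
            # silently skipped, so no exception ever reaches the user.
            continue
    return kept

cleaned = drop_excluded(rows)
# The script keeps running on an empty dataset; the stated exclusion step was
# never really applied, yet downstream summaries would still be produced.
print("rows after exclusion step:", len(cleaned))  # prints 0, not the expected 1
```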