Open preprint reviews by Karl Broman

Tools and techniques for computational reproducibility

Stephen R Piccolo, Adam B Lee, Michael B Frampton

Review posted on 22nd July 2015

I had a somewhat less-positive take on this manuscript. Here's my review:

The authors describe a number of tools for dealing with the issue of computational reproducibility (organizing the software and data underlying a scientific research project so that others can re-derive the results). They hit the important tools, but I wonder a bit about the target audience. I suspect that the less computationally-savvy reader is not likely to get much sense of what's important, or really of what any of this stuff means.

1. Virtual machines and containers are pretty high-powered, and they're a bit much when one's analysis scripts aren't well organized or documented. Why not focus on some more pragmatic, immediately applicable techniques, like organizing files and naming things well?

2. I think the figures aren't terribly helpful. Better would be to include some examples of the earlier tools. In particular, inclusion of a Makefile, an IPython notebook, and some sort of knitr document (for example, in R Markdown) would really help.

3. "The practices described in this review are accessible to all scientists and can be implemented with a modest extra effort." (lines 474-5, pg 24) This is rather optimistic. I think Docker is definitely not yet in the "modest effort" category, and switching from Excel to Python and IPython notebooks is not an easy transition.

4. The authors don't explain the issues very fully. My colleagues who resist concerted efforts towards reproducibility would argue, "what we're doing is fine," and I don't think this paper will do much to change their minds.

5. To me, GNU Make and workflow software (such as Galaxy) serve much the same purpose. Rather than include Make in the "scripts" section, I'd put it with workflows.
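
To illustrate point 5, here is a minimal Python sketch of the idea I take Make and workflow systems to share: declare which outputs depend on which inputs, then redo only the steps whose inputs have changed. This is not how Make or Galaxy is actually implemented. The function name, the file names, and the integer "timestamps" are all hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def rebuild_order(rules, mtimes):
    """Return the targets that need rebuilding, prerequisites first.

    rules  : dict mapping each target to a list of its prerequisites
    mtimes : dict mapping file name to a modification time
             (a target absent from mtimes is treated as never built)
    Both arguments are hypothetical stand-ins for a Makefile and the file system.
    """
    stale = []
    rebuilt = set()
    # static_order() yields every node after all of its prerequisites.
    for name in TopologicalSorter(rules).static_order():
        deps = rules.get(name, [])
        if not deps:
            continue  # a source file: nothing to build
        missing = name not in mtimes
        outdated = any(
            dep in rebuilt or mtimes.get(dep, 0) > mtimes.get(name, 0)
            for dep in deps
        )
        if missing or outdated:
            stale.append(name)
            rebuilt.add(name)
    return stale


# Hypothetical three-step analysis: clean the data, plot it, compile the paper.
rules = {
    "clean.csv": ["raw.csv", "clean.py"],
    "fig1.pdf": ["clean.csv", "plot.py"],
    "paper.pdf": ["fig1.pdf", "paper.tex"],
}
mtimes = {
    "raw.csv": 1, "clean.py": 5, "clean.csv": 3,
    "plot.py": 2, "fig1.pdf": 4,
    "paper.tex": 2, "paper.pdf": 6,
}
# clean.py was edited after clean.csv was made, so everything downstream is stale.
print(rebuild_order(rules, mtimes))
```

Whether the steps are written as Makefile rules or drawn as boxes in a workflow tool, the underlying bookkeeping looks much like this, which is why I would group the two together.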
