Completed on 29 Nov 2016 by Jeffrey G. Reid. Sourced from http://biorxiv.org/content/early/2016/11/08/086470.
The authors offer a dissection of the impact of missense mutations on disease by focusing on genes that affect cellular organelles and have previously characterized Mendelian disease associations. In general, predicting variant impact is an incredibly important question, and one that takes on increasing relevance as sequencing is deployed ever more widely for drug target discovery and clinical genetics. Unfortunately, this study adds no new or useful insights to the problem at hand. Specifically:
1. It is disappointing that the authors provide so much data but so little insight into their results. The results section is far too long and, at times, is so laden with numerical data and dryly descriptive text that it reads as if it were automatically generated. I found it extremely difficult to keep track of which results were being reported in any given section or paragraph due to this disorienting abuse of language. To improve this section the authors must…
a. …focus on only the most meaningful measures of interest. In particular, the decision to use three different methods for conservation scoring is pointless, distracting, and makes the conservation results extremely confusing. No good justification for reporting all three scores is provided beyond ‘lack of consensus’, and to my eye, there is nothing to suggest that any one is significantly better than the others, so pick one and stick with it.
b. …completely rewrite the case analysis sections. The overuse of the phrase “we also see” is emblematic of the problem in this section: it simply lists observations with no real attempt to package them so that the reader can digest or understand them. Better visualizations (I found the existing figures to be unhelpful) that capture and highlight the most relevant and important results could replace, and improve on, perhaps as many as 7 pages of this text.
c. …only report a justifiable number of significant digits. Do we really need to know the 9th significant digit of the average Shannon entropy?
d. …avoid reporting observations that are not necessary to support the discussion and conclusions. Reporting every possible measurement or observation obfuscates what is important or useful, and diminishes the impact of your findings by burying the signal in so much noise.
2. Restricting effect prediction comparisons to only SIFT and PolyPhen2 is short-sighted, as a growing number of newer, better tools have been published and widely adopted (e.g., CADD; http://cadd.gs.washington.edu/... ). The authors must more thoroughly address the relevant existing literature by comparing against a broader set of effect predictors. Without this kind of comparison it is hard to know how useful this analysis is, particularly in terms of understanding the value of predicted protein stability calculations.
3. The biggest problem with this work is its lack of novelty and insight, as it provides nothing over and above the current conventional wisdom. Base conservation as a proxy for variant impact at a site and recognition of the value of model organisms are not new. Observing that these insights hold true for small disease gene subsets adds nothing. For this data to be useful and impactful to the community, it needs to provide much more than what is presented here.