Completed on 5 Jun 2018 by Rachael Huntley.
The manuscript by Jacobson et al. describes the GOTrack web interface that can be used to monitor the changes in Gene Ontology and its associated annotations over time.
This is a novel method that potentially will assist users of the GO and its annotations to interpret their observations and analyses of gene product datasets. The paper is well written and the standard of English is good. As expected from these authors, a good knowledge of the GO and how it is developed and curated is demonstrated.
The authors highlight the importance of recognising that GO and its associated annotations are dynamic and therefore any analysis is likely to change over time, a concept which is sadly lacking in many papers that use GO for interpreting results. Additionally, the authors explain that GOTrack can be used, in combination with annotation dataset reports from UniProt-GOA, to focus in on the possible reason for these changes. Therefore, this paper should go a long way to help further educate the users of GO.
It has previously been suggested that, when performing enrichment analysis, researchers should look for key, relevant GO terms that are enriched for their dataset in more than one analysis tool, because of the large variability between tool outputs (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3235096/). The ability to compare a dataset over time using GOTrack is therefore a very useful addition that will help researchers interpret their datasets in an informed manner.
The authors describe the various uses of GOTrack with appropriate examples that are easy to understand and well illustrated with figures. They compare GOTrack with previous evaluations of enrichment result stability and describe the measures they have taken to improve on the earlier studies. They also assist the reader with traceability of datasets by providing the sources of all files they have used.
I have only a few minor comments.
1. GOTrack currently only enables tracking of annotation changes for protein-coding genes. Could the authors indicate whether they have considered including the newer GO annotation datasets that have been made available, i.e. macromolecular complexes and non-coding RNAs? Although there are significantly fewer annotations for these entities, I am aware of no other tools that provide direct enrichment analysis for these gene products, so this would be a very useful addition to consider.
2. In Supp. Fig. 3A the authors show a comparison of 4 editions, but on the website it explains only how to compare two. I discovered by trial and error that you can add more editions by using
3. Page 9 Line 21: remove 1st "the" from "because the we get the same result"
4. Page 10 Line 4: "Notably, annotations coming from the Reference Genome Project (The Reference Genome Group of the Gene Ontology Consortium, 2009) are not identified so we were unable to establish any specific impact this may have had on the events of early 2012 (Figure 2C)"
The Reference Genome Project annotations can be identified either by a Panther ID in the "With/From" field or by a PAINT_REF in the "Reference" field of the GO annotation files.
5. Page 10 Line 7: Figure 2C should be Figure 3C.
6. Under "Global Trends" on the website, the Size of Gene Ontology graph appears to include obsolete terms (47204 in June 2018, whereas the AmiGO website currently lists 44947 non-obsolete GO terms). I was expecting to see a count of active terms, so the authors should clarify what this graph is showing.
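As a rough illustration of comments 4 and 6, the checks I have in mind could be sketched in Python as follows. This is only a sketch under stated assumptions, not a description of how GOTrack works: it assumes the GAF 2.x column order (Reference in column 6, With/From in column 8), that Panther identifiers carry a `PANTHER:` prefix and PAINT references a `PAINT_REF:` prefix, and that obsolete terms in a GO OBO file are flagged with `is_obsolete: true`. The function names are hypothetical.

```python
def reference_genome_rows(gaf_lines):
    """Yield GAF rows that look like Reference Genome Project annotations.

    Assumes GAF 2.x tab-separated columns: index 5 is the Reference
    field and index 7 is the With/From field.
    """
    for line in gaf_lines:
        if line.startswith("!"):  # header and comment lines
            continue
        row = line.rstrip("\n").split("\t")
        if len(row) < 8:  # malformed or truncated line
            continue
        references = row[5].split("|")
        # With/From entries may be separated by "|" or ","
        with_from = [v for part in row[7].split("|") for v in part.split(",")]
        has_paint_ref = any(r.startswith("PAINT_REF:") for r in references)
        has_panther_id = any(v.startswith("PANTHER:") for v in with_from)
        if has_paint_ref or has_panther_id:
            yield row


def count_go_terms(obo_lines):
    """Return (active, obsolete) counts of [Term] stanzas in an OBO file."""
    active = obsolete = 0
    in_term = False
    is_obsolete = False
    for raw in obo_lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts: record the previous [Term], if any.
            if in_term:
                if is_obsolete:
                    obsolete += 1
                else:
                    active += 1
            in_term = line == "[Term]"
            is_obsolete = False
        elif in_term and line.startswith("is_obsolete: true"):
            is_obsolete = True
    if in_term:  # record the final stanza
        if is_obsolete:
            obsolete += 1
        else:
            active += 1
    return active, obsolete
```

Both functions accept any iterable of lines, so an open annotation or ontology file can be passed in directly. Comparing the two counts returned by `count_go_terms` for a given edition would show whether a total such as the one plotted on the website includes obsolete terms.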