Completed on 23 Aug 2016
The manuscript by Chen et al. has been significantly revised in light of the comments made in my earlier review and by one other reviewer. Nevertheless, the response to the following concern remains less than satisfactory, and the manuscript is not ready to be accepted. Can the authors address the following concern more adequately?
The authors claim their method to be very sensitive (lines 55-58, Page 5). How does their approach avoid picking up false positives or even simple sequencing artifacts? Can they show that with further analyses?
The authors run the sCNAphase algorithm with a normal sample posing as the tumor sample. While they estimate tumor purity to be < 0.1% in this scenario, they also detect many spurious sCNA peaks. The authors' recommendation is to discard any sCNAs called at <1% tumor cellularity. However, they do not seem to address the fact that spurious peaks are observed at all, which implies a certain likelihood of such peaks appearing at other (possibly all) ranges of tumor purity as well. How, then, can one rely on an approach that tends to be over-sensitive and to call more sCNAs than are really present?
I recommend that a basal noise correction module be modeled on the spurious peaks observed in the <1% tumor cellularity scenario and built into the sCNAphase algorithm, in order to make it sensitive and specific at the same time.
We apologise for not addressing this question adequately in our original resubmission. A low tumour purity parameter has the effect of 'zooming in' on the data, magnifying the effect of small changes in allele frequency. Thus, if the model correctly identifies a very low tumour proportion, small fluctuations are magnified into potential copy number signals. The other methods we tested also suffer a drop in specificity at the boundary of detectable cellularity, with the difference that the boundary for these tools is much higher. We provide sensitivity and specificity results for higher purity, and we show that the specificity remains high down to 10% tumour purity.
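The 'zooming in' effect can be illustrated with the standard mixture formula for the expected major-allele read fraction at a heterozygous site. The sketch below is not taken from sCNAphase; the function name and the single-copy-gain example (2 of 3 tumour copies on the major haplotype) are assumptions chosen for illustration. It shows how small the expected shift from 0.5 becomes as purity falls, which is why fluctuations of similar size can be read as copy number signal.

```python
def expected_major_allele_fraction(purity, tumour_major, tumour_total):
    """Expected major-allele read fraction in a tumour/normal mixture.

    Assumes normal cells are diploid and contribute 1 of 2 copies
    at a heterozygous site (an illustrative assumption).
    """
    numerator = purity * tumour_major + (1.0 - purity) * 1
    denominator = purity * tumour_total + (1.0 - purity) * 2
    return numerator / denominator


# Hypothetical single-copy gain: 3 tumour copies, 2 on the major haplotype.
for purity in (0.5, 0.1, 0.05, 0.01):
    shift = expected_major_allele_fraction(purity, 2, 3) - 0.5
    print(f"purity={purity:.2f}  expected shift from 0.5 = {shift:.4f}")
```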
However, we do agree with the reviewer that the cellularity threshold of 1% is somewhat arbitrary, and this is unsatisfying. In order to provide a robust way of avoiding spurious sCNA calls without an arbitrary purity threshold, we show in the current version how significance testing can be used to identify the presence of detectable tumour DNA. In particular, we show that all of the 0% mixtures have no detectable tumour DNA, whereas all of the 10% mixtures and three of the four 5% mixtures have detectable tumour DNA. In this way, investigators who use our tool will be able to first assess the presence of tumour DNA, and then run sCNAphase for sCNA segmentation, confident that the detected sCNAs have a low false positive rate.
In the sCNAphase workflow (lines 248-252), we added the following lines:
“By default, sCNAphase uses Equation 4 to calculate the distribution of p-values for the observed parental haplotype counts under a null model. This is used to infer the presence of tumor DNA. If the distribution of p-values matches the expected distribution in the QQ plot, then running sCNAphase is not recommended as there is not a strong enough signal in the data to infer the copy number segmentation. ”
These lines emphasize that the tumor-presence test routinely runs before the copy number profiling. This model-free test is able to recognize no or very low tumor content, yet is still sensitive enough to recognize 5% or 10% tumor content, as shown in the new figure (Supplementary Figure S6) and described in lines 388-394:
“It did, however, identify spurious sCNAs present at this level of purity. On this basis, we can recommend that the sCNAphase segmentation should be disregarded if the estimated tumor cellularity is less than 1%. In order to avoid inference of spurious sCNA, sCNAphase also performs an initial significance test that infers whether there is detectable tumor DNA in the mixture (see Methods). The output QQ-plots for the four cell-lines at 0% showed no inflation of the test statistics, indicating no detectable tumor DNA, whereas substantial inflation of the test statistic was observed for 10% mixture samples and most of the 5% samples (Supplementary Figure S6).”
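The logic of a QQ-plot-based tumour-presence check can be sketched as follows. Equation 4 is not reproduced in this response, so the sketch substitutes an exact two-sided binomial test against a 50:50 null for the haplotype counts; that substitution, the simulated counts, and all function names are assumptions for illustration and not the sCNAphase implementation.

```python
import math
import random


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


def qq_points(p_values):
    """(expected, observed) -log10(p) pairs against Uniform(0, 1) quantiles."""
    observed = sorted(p_values)
    m = len(observed)
    expected = [(i + 0.5) / m for i in range(m)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


def fraction_below(p_values, alpha=0.05):
    """Crude inflation summary: share of p-values below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


def simulate_p_values(n_regions, depth, major_fraction, rng):
    """Simulate haplotype counts per region and test each against 50:50."""
    p_values = []
    for _ in range(n_regions):
        k = sum(rng.random() < major_fraction for _ in range(depth))
        p_values.append(binom_two_sided_p(k, depth))
    return p_values


rng = random.Random(0)
null_p = simulate_p_values(300, 200, 0.50, rng)     # no allelic imbalance
shifted_p = simulate_p_values(300, 200, 0.60, rng)  # hypothetical imbalance
print("share of p < 0.05 without imbalance:", fraction_below(null_p))
print("share of p < 0.05 with imbalance:   ", fraction_below(shifted_p))
```

Plotting the pairs returned by `qq_points` gives the QQ plot itself: points that follow the diagonal correspond to no detectable signal, while points rising above it correspond to inflated test statistics.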