Completed on 5 May 2016
The manuscript by Chen et al. describes a novel tool that can characterize ploidy and identify sCNAs in aneuploid, low-cellularity tumors and in tumor-normal admixtures with as little as 10% tumor purity. The biology of such tumors is indeed difficult to understand and requires a combinatorial approach to ascertain the validity of discovered variants. The novelty of their approach lies in what they term 'pre-phasing heterozygote germline SNPs' to estimate a parental-haplotype frequency instead of the commonly used 'B-allele frequency'.
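To illustrate why pre-phasing can help, the following is a minimal sketch (not the authors' implementation) that contrasts a phase-unaware average B-allele frequency with a haplotype frequency obtained by summing read counts per parental haplotype over a window. The input format, per-SNP reference and alternative read counts plus a hypothetical `ref_on_hap_a` phase flag, is an assumption made for illustration.

```python
from typing import Iterable, Tuple

# Each SNP: (ref_count, alt_count, ref_on_hap_a). The phase flag is a
# hypothetical input saying whether the reference allele sits on haplotype A.
Snp = Tuple[int, int, bool]


def mean_baf(snps: Iterable[Snp]) -> float:
    """Phase-unaware average B-allele (alt) frequency over a window."""
    fracs = [alt / (ref + alt) for ref, alt, _ in snps if ref + alt > 0]
    return sum(fracs) / len(fracs) if fracs else float("nan")


def haplotype_frequency(snps: Iterable[Snp]) -> float:
    """Fraction of reads in the window assigned to parental haplotype A."""
    hap_a = hap_b = 0
    for ref, alt, ref_on_hap_a in snps:
        if ref_on_hap_a:
            hap_a += ref
            hap_b += alt
        else:
            hap_a += alt
            hap_b += ref
    total = hap_a + hap_b
    return hap_a / total if total else float("nan")


# Three SNPs with alternating phase: the allelic imbalance cancels in the
# BAF average but accumulates once reads are assigned to haplotypes.
window = [(12, 8, True), (7, 13, False), (11, 9, True)]
print(round(mean_baf(window), 3))             # 0.5
print(round(haplotype_frequency(window), 3))  # 0.6
```

In this toy window, haplotype A carries 36 of 60 reads, a signal that the unphased average hides.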
However, the manuscript is not ready to be published until the authors fully address the following major concerns. The authors must perform additional analyses, rather than only provide explanations, to address these concerns.
1. The authors claim their method to be very sensitive (lines 55-58, Page 5). How does their approach avoid picking up false positives or even simple sequencing artifacts? Can they show that with further analyses?
We analysed the '0%' tumour mixtures in more detail to demonstrate the robustness of the method to sequencing and mapping artifacts, and have added the following text at Page 13, lines 379-388.
"To test the tolerance of the methods to sequencing artifacts, we applied the tools to a matched tumor-normal pair in which the tumor was an independent normal sample (i.e. a 0% tumor sample). In this case, the estimated tumor DNA purity should be 0%, and any sCNAs predicted are due to noise alone. For all four 0% tumor samples, sCNAphase identified < 0.1% tumor purity (Figure 4C; Supplementary Table 3). It did, however, identify spurious sCNAs present at this level of purity. On this basis, we recommend that the sCNAphase segmentation be disregarded if the estimated tumor cellularity is less than 1%. ASCAT and CLImAT reported tumor purity estimates above 20% for these 0% tumor mixtures, which indicates that the copy number segmentation of these tools may only be reliable down to 40% purity (Figure 4C)."
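The 1% recommendation can be applied as a simple post-processing filter. The sketch below is a hypothetical helper: its name and the convention of passing cellularity as a fraction are assumptions, and only the 1% cut-off comes from the text.

```python
# Threshold proposed in the text: disregard sCNAphase segmentation when the
# estimated tumor cellularity is below 1% (expressed here as a fraction).
MIN_CELLULARITY = 0.01


def segmentation_is_usable(estimated_cellularity: float,
                           min_cellularity: float = MIN_CELLULARITY) -> bool:
    """Return True if the copy number segmentation should be kept."""
    if not 0.0 <= estimated_cellularity <= 1.0:
        raise ValueError("cellularity must be a fraction between 0 and 1")
    return estimated_cellularity >= min_cellularity


print(segmentation_is_usable(0.0008))  # False: e.g. a 0% tumor mixture
print(segmentation_is_usable(0.10))    # True: a 10% tumor sample
```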
We also added the following to the Discussion (Page 17, lines 519-523) to explain why sCNAphase is robust to mapping artefacts.
"Equally importantly, samples without any tumor DNA were predicted to have < 0.1% tumor cellularity by sCNAphase, whereas both of the other approaches tested reported at least 20% tumor DNA. sCNAphase has this in-built robustness to sequencing and mapping artifacts because it models the observed regional tumor depth data (including total depth and haplotype-specific depth), conditional on normal depth data from the same region."
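As an illustration of why conditioning on the normal sample confers this robustness, here is a minimal sketch of the expected tumor signal in a window under a standard two-component mixture (diploid normal cells plus a single tumor clone). It is a generic model, not necessarily the exact likelihood used by sCNAphase; the function names, parameter names and the `depth_scale` library-size factor are assumptions. With zero cellularity, the expected tumor depth simply tracks the normal depth and the haplotype frequency stays at 0.5, so artifacts shared by both samples do not resemble copy number changes.

```python
def expected_tumor_depth(normal_depth: float, cellularity: float,
                         copy_number: int, ploidy: float,
                         depth_scale: float = 1.0) -> float:
    """Expected tumor read depth in a window given the matched normal depth.

    Assumes diploid normal cells and a tumor library whose genome-wide mean
    depth is depth_scale times that of the normal library.
    """
    local_cn = cellularity * copy_number + 2 * (1 - cellularity)
    mean_cn = cellularity * ploidy + 2 * (1 - cellularity)
    return normal_depth * depth_scale * local_cn / mean_cn


def expected_haplotype_frequency(cellularity: float, cn_a: int, cn_b: int) -> float:
    """Expected fraction of reads from parental haplotype A in the mixture."""
    reads_a = cellularity * cn_a + (1 - cellularity)
    reads_total = cellularity * (cn_a + cn_b) + 2 * (1 - cellularity)
    return reads_a / reads_total


# A 0% tumor sample: no deviation from the normal, whatever the copy number.
print(expected_tumor_depth(30.0, 0.0, copy_number=5, ploidy=3.0))  # 30.0
print(expected_haplotype_frequency(0.0, cn_a=3, cn_b=0))          # 0.5
# 50% cellularity with copy-neutral LOH (2 copies of A, 0 of B).
print(expected_haplotype_frequency(0.5, cn_a=2, cn_b=0))          # 0.75
```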
2. Can the authors comment on how their approach would perform on tumors with varying ranges of heterogeneity within themselves?
We have not fully investigated the impact of heterogeneity, as we feel it is outside the scope of this paper. However, we do comment on it in the Discussion at Page 17, lines 524-527.
"To focus on characterizing the copy number profile of low purity tumors, we have made the simplifying assumption that there is a dominant tumor clone with low heterogeneity in the tumor biopsy. Regions with heterogeneous copy number between clones with similar abundance would lead to a failure of the merging test statistic, and thus these regions would likely be excluded."
3. The authors need to explain the low specificity of sCNAphase in detecting the focal sCNAs in COSMIC (Table 3).
We have further investigated the low specificity relative to the COSMIC array-based annotation and confirmed that it is due, at least in part, to the sequence data providing better discrimination of high copy numbers. Many of the regions annotated by sCNAphase as focal amplifications (according to our definition: a copy number of at least twice the ploidy and a length between 100 kb and 4 Mb) are indeed amplifications, but do not have a copy number greater than twice the ploidy according to COSMIC. We have included the following paragraph at Page 16, lines 467-474, as well as a new Supplementary Table S6.
"We hypothesized that one reason for a low specificity (on average, only 41% and 31% of focal amplifications detected by sCNAphase and CLImAT respectively were validated by COSMIC) could be that COSMIC underestimates copy number of highly duplicated regions (due to fluorescence signal saturation) and so the 2 * ploidy threshold for declaring a focal amplification is not reached. To test this, we re-calculated specificity and sensitivity after increasing the sCNAphase detection threshold, but keeping the COSMIC threshold (Supplementary Table S6). As the sCNAphase threshold increased, the specificities almost doubled across all tumor purities, with a much smaller effect on sensitivity."
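The re-analysis described above can be sketched as follows. This is a hypothetical illustration, not the authors' code: segments are reduced to intervals with a copy number, a call counts as validated if it overlaps a reference focal amplification, and 'specificity' is computed, as in the quoted text, as the fraction of detected focal amplifications that are validated. Only the call threshold (`call_factor`) is raised; the reference threshold stays at 2 × ploidy. The toy segments are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical copy number segment: half-open interval [start, end)."""
    chrom: str
    start: int
    end: int
    copy_number: float


def is_focal_amplification(seg: Segment, ploidy: float, factor: float = 2.0,
                           min_len: int = 100_000, max_len: int = 4_000_000) -> bool:
    """Copy number >= factor * ploidy and length between 100 kb and 4 Mb."""
    length = seg.end - seg.start
    return seg.copy_number >= factor * ploidy and min_len <= length <= max_len


def overlaps(a: Segment, b: Segment) -> bool:
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def validated_fraction(calls: List[Segment], reference: List[Segment], ploidy: float,
                       call_factor: float = 2.0, ref_factor: float = 2.0) -> float:
    """Fraction of called focal amplifications overlapping a reference one."""
    detected = [s for s in calls if is_focal_amplification(s, ploidy, call_factor)]
    ref_amps = [s for s in reference if is_focal_amplification(s, ploidy, ref_factor)]
    if not detected:
        return float("nan")
    hits = sum(any(overlaps(d, r) for r in ref_amps) for d in detected)
    return hits / len(detected)


ploidy = 3.0
calls = [Segment("chr8", 0, 1_000_000, 7.0), Segment("chr11", 0, 1_000_000, 6.5)]
reference = [Segment("chr8", 500_000, 1_500_000, 8.0),
             Segment("chr11", 0, 1_000_000, 5.0)]  # reference under-calls chr11
print(validated_fraction(calls, reference, ploidy))                   # 0.5
print(validated_fraction(calls, reference, ploidy, call_factor=2.2))  # 1.0
```

Raising the call threshold drops the borderline chr11 call, so the validated fraction rises while the strongly amplified chr8 call is retained.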
1. From the definitions and usage of the terms tumor purity and tumor cellularity, it is not clear how one differs from the other, or why they are sometimes used interchangeably and sometimes together.
We apologise for any confusion and have made a substantial effort to make these terms clearer. We have included the following sentences in the Introduction (Page 3, lines 49-54):
"A typical tumor biopsy will contain both tumor cells and cells with a normal, diploid genome. This can be quantified via the cellularity (the proportion of tumor cells in this mixture) or via the tumor DNA purity (the proportion of tumor DNA in the mixture of normal and tumor DNA). Tumor purity is a function of both the cellularity and the tumor ploidy (which we define as the average copy number of the tumor); for example, a tetraploid tumor with 50% cellularity will have a 66% tumor purity."
A subsection of the Methods (Page 6, lines 170-178) further explains how to convert one measurement into the other using Equation (5), given the tumor ploidy.
“The percentage of tumor content can be measured on two different scales: 1) tumor cellularity (tc), defined as the percentage of tumor cells, or 2) tumor purity (tp), defined as the percentage of tumor DNA in mixtures of tumor and normal cells. These two quantities are related via the tumor ploidy, which we define as the average copy number of the tumor over all windows.”
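Equation (5) itself is not reproduced here, but under the assumptions stated in the quoted text (diploid normal cells, tumor ploidy P defined as the average tumor copy number) the conversion can be sketched as tp = tc·P / (tc·P + 2(1 − tc)), with inverse tc = 2·tp / (P(1 − tp) + 2·tp). The function names below are hypothetical; the example reproduces the worked case from the Introduction, where a tetraploid tumor at 50% cellularity has a purity of two thirds.

```python
def purity_from_cellularity(tc: float, ploidy: float) -> float:
    """Tumor DNA purity from cellularity, assuming diploid normal cells."""
    tumor_dna = tc * ploidy
    normal_dna = 2 * (1 - tc)
    return tumor_dna / (tumor_dna + normal_dna)


def cellularity_from_purity(tp: float, ploidy: float) -> float:
    """Inverse of purity_from_cellularity for the same ploidy."""
    return 2 * tp / (ploidy * (1 - tp) + 2 * tp)


tp = purity_from_cellularity(0.5, ploidy=4.0)
print(round(tp, 3))                                       # 0.667
print(round(cellularity_from_purity(tp, ploidy=4.0), 3))  # 0.5
```

For a diploid tumor (P = 2) the two measures coincide, which is one reason they are easy to conflate.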
2. The basis for the ploidy assumption for the cell lines for calculating sCNAs from the microarray validation analyses is not explained.
We compared the ploidy estimates from SKY data, flow cytometry and COSMIC, and found them to be consistent. For example, HCC1187 was seen as hypo-triploid by SKY and estimated at 2.64 by COSMIC. We then rounded this number to the nearest integer.
This information has been added to the manuscript on Page 14, lines 397-401.
“To compare the annotations of these cell lines (Table 2), the base ploidy of each cell line was determined by rounding the ploidy estimates from SKY, flow cytometry and PICNIC to the nearest integer and taking the consensus value. In this process, hyper- or hypo-tetraploidy was rounded to tetraploidy, and hypo-triploidy to triploidy. On this basis, HCC1187 was considered triploid and all other cell lines were treated as tetraploid.”
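The consensus rule described above can be sketched as follows. The label-to-integer mapping covers only the karyotype categories named in the text, and the tie-breaking behaviour (`Counter.most_common` returns equally common values in first-seen order) is an assumption for illustration. Rounding is done half-up explicitly because Python's built-in `round` rounds halves to even.

```python
import math
from collections import Counter
from typing import Iterable, Union

# Hypothetical mapping for the qualitative karyotype calls named in the text.
KARYOTYPE_PLOIDY = {
    "hypo-triploid": 3,
    "triploid": 3,
    "hypo-tetraploid": 4,
    "tetraploid": 4,
    "hyper-tetraploid": 4,
}


def to_integer_ploidy(estimate: Union[float, str]) -> int:
    """Round a numeric estimate to the nearest integer, or map a karyotype label."""
    if isinstance(estimate, str):
        return KARYOTYPE_PLOIDY[estimate]
    return math.floor(estimate + 0.5)  # round half up


def base_ploidy(estimates: Iterable[Union[float, str]]) -> int:
    """Consensus (most common) integer ploidy across sources."""
    counts = Counter(to_integer_ploidy(e) for e in estimates)
    return counts.most_common(1)[0][0]


# HCC1187: hypo-triploid by SKY and 2.64 by COSMIC/PICNIC (values from the text).
print(base_ploidy(["hypo-triploid", 2.64]))  # 3
```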