In this month's Nature Communications, Agnieszka Witkiewicz et al. published the first whole-exome sequencing of pancreatic cancer. This was a true tour de force for pancreatic ductal adenocarcinoma (PDA). Normally, PDA is difficult to dissect cleanly: the biopsies from which DNA is obtained contain many cell types beyond tumor cells, e.g., inflammatory or stromal cells. In this paper, the authors used 109 micro-dissected tissues, resulting in much cleaner samples for exome sequencing. While there appears to be histological evidence of a cleaner sample, the authors state they believe it is cleaner because they detected more mutations than previous PDA studies did. They also detect numbers of mutations similar to those in other solid tumors. I’m not sure I buy this argument, because it implies that if you do not detect more mutations, the study or sample is not valid. I think this logic sets the stage for dishonesty.

But, alas, the additional mutations detected beyond those in previous studies do indeed make biological sense. The primary samples were sequenced to a depth of 50x, and then a smaller subset was sequenced to a depth of 120x in an effort to detect even more mutations. While the extra sequencing depth did detect additional mutations, those mutations were found to have low allelic frequency.
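To see why extra depth mostly surfaces low-allelic-frequency variants, here is a minimal sketch that treats the number of variant-supporting reads as binomial. The threshold of 5 supporting reads and the example allele fractions are my own assumptions for illustration, not values from the paper, and the model ignores sequencing error and uneven coverage.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads=5):
    """Chance of seeing at least `min_alt_reads` variant reads at one site.

    Assumes alt-read counts follow Binomial(depth, vaf). `min_alt_reads`
    is a hypothetical caller threshold, not one taken from the paper.
    """
    # Probability of seeing fewer supporting reads than the threshold.
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Compare the two depths mentioned in the post across assumed allele fractions.
for vaf in (0.02, 0.05, 0.10, 0.30):
    p50 = detection_probability(50, vaf)
    p120 = detection_probability(120, vaf)
    print(f"VAF {vaf:.2f}: P(detect) at 50x = {p50:.3f}, at 120x = {p120:.3f}")
```

Under these assumptions, a high-fraction variant is already nearly certain to be called at 50x, so the gain from 120x is concentrated at the low-fraction end, which is consistent with what the authors report.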

Organizing the data to determine the overall mutational landscape of these samples yielded the following conclusions:

Consistent with other solid tumors, mismatch repair genes had the highest relative mutational burden, with a T > C transition at CTG trinucleotides.
C > T mutations were associated with age.
C > A transversions were associated with a mutational ‘smoking signature’, in common with other cancers where smoking is a significant risk factor.
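For readers new to the notation: substitutions are conventionally reported from the pyrimidine strand, so a G > A call on one strand is counted as C > T. Below is a small sketch of that bookkeeping. The function names and the toy variant list are hypothetical, and this is not the authors' pipeline.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a single-base change onto the pyrimidine (C or T) reference."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Purine reference: report the change as seen on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def is_transition(sub_class):
    """Transitions stay within pyrimidines (C<->T); the rest are transversions."""
    return sub_class in {"C>T", "T>C"}


# Toy (ref, alt) calls, made up purely for illustration.
toy_calls = [("G", "A"), ("C", "T"), ("G", "T"), ("A", "G"), ("C", "A")]
spectrum = Counter(substitution_class(ref, alt) for ref, alt in toy_calls)
for sub_class, count in sorted(spectrum.items()):
    kind = "transition" if is_transition(sub_class) else "transversion"
    print(f"{sub_class}: {count} ({kind})")
```

Collapsing to six classes this way is what lets a C > A ‘smoking’ pattern or an age-related C > T pattern be compared across studies regardless of which strand each variant was called on.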

Figure 1 outlines these findings, but some improvements could be made to make the figures more understandable.

1b: all I see is ‘blue’ Missense; it would be helpful to cut some of that out and zoom in on the other mutational categories. Further, what is the scale for the heat bar on the bottom and the inset?
1d: it would be very helpful if the order from top to bottom matched the left-to-right order in the legend.

The authors then chose affinity propagation clustering (APC) for the copy number variation (CNV) analysis. APC is nice in that it does not require knowing the approximate number of clusters a priori, unlike k-means, which needs the number of clusters as an input. It’s also great that the authors compared this relatively novel clustering method with standard Euclidean-distance hierarchical clustering and achieved similar results. Which raises the question: why the novel approach?
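One way to put a number on "similar results" between two clusterings is the adjusted Rand index (ARI), which scores how often pairs of samples are grouped the same way in both, corrected for chance. This is my own suggestion, not something the paper reports, and the label vectors below are made up for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two clusterings of the same samples.

    1.0 means identical partitions (label names do not matter);
    values near 0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same samples")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of samples placed together in both, in A only, and in B only.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = same_a * same_b / total_pairs
    max_index = (same_a + same_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one big cluster
    return (same_both - expected) / (max_index - expected)


# Hypothetical cluster assignments for eight samples from two methods.
apc_labels = [0, 0, 0, 1, 1, 2, 2, 2]
hier_labels = ["a", "a", "a", "b", "b", "c", "c", "b"]
print(f"ARI = {adjusted_rand_index(apc_labels, hier_labels):.3f}")
```

If the two methods really do agree this closely on the CNV data, a score like this would make the case for (or against) the novel approach much easier to judge than eyeballing two heatmaps.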

Not surprisingly, significant common amplification and deletion events were detected in MYC, CCND1, SMAD4 and CDKN2A. When these CNV data were integrated with survival statistics, amplification of MYC was associated with poor outcome and was over-represented in the adenosquamous subtype of PDA, even though there were no associations with other mutations.

Again, some figure improvements would make these findings easier to understand.

2a: Keep some color consistency with the clustering. The GISTIC legend needs some kind of unit: are those log-transformed CNVs?
2b: Make it clear that clusters 5 and 6 carried down into this figure.
2d: What are the units on each top/bottom axis? What are the numbers in parentheses?

The rest of the paper is quite descriptive, basically outlining the specific gene mutations and their associations with each other, patient characteristics, tumor classification and overall survival.

I had a hard time deciphering Figure 4a. I interpreted it as a flow chart, but as a flow chart it is incorrect, because BRAF mutation is not completely inclusive of KRAS mutation. The protein schematic (lollipop plot) comparison of this PDA cohort to the available data in cBioPortal was nifty. Fundamentally, however, I have an issue with using a protein schematic to represent genomics or transcriptomics data. In these figures, the authors are conceptually translating the transcript, accounting for the mutations detected, and then drawing the resulting conceptual amino acid changes. Without sequencing the protein, this is, IMHO, a misrepresentation. I liked the idea, though, so maybe just show the sequence data you have actually measured but still highlight putative protein domain locations.

The OncoMaps in Figure 5 are definitely a great improvement over most pathway mapping schematics. While I see color consistency between the map and the cluster figures below it, I’m unsure what exactly the colors indicate. Is blue for missense mutations and red for CNV amplification? Despite what the caption states, there are some correlations, i.e., interactions between pathways, e.g., TGF-beta with the SNF complex and TP53, or Hedgehog and TP53. These can be seen in the 5c cluster analysis.

Overall, awesome work. The final table, summarizing the potential pharmacological strategies, is brilliant. Thanks for opening the shades a little more on this high-morbidity disease.
