May 09, 2003
breast cancer genomics paper

Our group's paper is released for publication today. The associated commentary by Sridhar Ramaswamy and Chuck Perou is cogent.

My initial response to their commentary:
They query the use of metagenes (they call them "highly abstracted structures") to summarize the impact of multiple genes, suggesting that treating genes individually is enough and that deconvoluting the roles of individual genes from metagene data is difficult. I'd venture that (1) metagenes reduce noise; (2) more importantly, they reduce this noise in the context of discrete biological functions: all analytical techniques aggregate genes in some manner, and what we do is distill the signal and raise the signal-to-noise ratio for genes that share functional associations; and (3) metagenes actually simplify the process of understanding how individual genes interact within functional roles, because they allow us to prioritize and estimate the impact of individual genes.
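
To make the noise-reduction point in (1) concrete, here is a minimal sketch on simulated data. It assumes a metagene is simply the average of standardized expression values across a group of related genes, which is a stand-in for illustration and not necessarily the construction used in the paper. The names `zscore`, `metagene`, and `pearson`, and the simulated "hidden signal", are all hypothetical.

```python
import random
import statistics


def zscore(values):
    """Standardize one gene's expression values across samples."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def metagene(gene_rows):
    """Average the standardized rows of a gene group: one value per sample.

    Assumption: a plain average, chosen for simplicity.
    """
    standardized = [zscore(row) for row in gene_rows]
    n_samples = len(gene_rows[0])
    return [statistics.fmean(row[j] for row in standardized)
            for j in range(n_samples)]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Simulated data: 20 genes that share one hidden signal plus independent noise.
rng = random.Random(0)
signal = [rng.gauss(0, 1) for _ in range(40)]                        # 40 samples
genes = [[s + rng.gauss(0, 2) for s in signal] for _ in range(20)]   # 20 noisy genes

summary = metagene(genes)
metagene_r = pearson(summary, signal)
mean_single_r = statistics.fmean(pearson(g, signal) for g in genes)

print("average single-gene correlation with signal:", round(mean_single_r, 3))
print("metagene correlation with signal:", round(metagene_r, 3))
```

In this toy setting the aggregate tracks the shared signal more closely than a typical single gene does, because the independent noise partly cancels when related genes are averaged.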

They also ask what the point of predicting lymph node status is if it's an "imperfect surrogate", but the point is that lymph node status is currently the single best clinical prognostic indicator, so (1) shouldn't we try to understand genomic data in this context? (2) isn't there pure scientific interest in understanding the metagenes and biology involved in lymph node status? and (3) what about those patients who, within a narrow window of time at workup, are lymph node negative but are about to convert to lymph node positive? Isn't it of use to be able to identify these patients?

Finally, they offhandedly say that out-of-sample cross-validation "generally overestimates" accuracy. I'm not so sure this is a proper generalization to make. If we "locked" our predictive model and just cross-validated the samples, perhaps it would. But we cross-validate not only the samples but also the model. This is about as stringent as one can get. Naturally we are actively augmenting our sample size and hope to be working with thousands of samples in the near future.
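
The difference between a "locked" model and cross-validating the model itself can be sketched on pure noise, where an honest accuracy estimate should sit near chance. Everything below is a toy stand-in and not the model from the paper: the classifier is an assumed nearest-centroid rule on the `k` features with the largest class-mean gap, and `fit`, `predict`, `loo_full`, and `loo_locked` are hypothetical names.

```python
import random


def class_means(rows, labels, feature):
    """Mean of one feature within each class (labels are 0 or 1)."""
    sums, counts = [0.0, 0.0], [0, 0]
    for row, label in zip(rows, labels):
        sums[label] += row[feature]
        counts[label] += 1
    return [sums[c] / counts[c] for c in (0, 1)]


def fit(rows, labels, k):
    """Pick the k features with the largest class-mean gap; keep their centroids."""
    n_features = len(rows[0])
    means = [class_means(rows, labels, f) for f in range(n_features)]
    ranked = sorted(range(n_features),
                    key=lambda f: abs(means[f][0] - means[f][1]),
                    reverse=True)
    chosen = ranked[:k]
    return chosen, {f: means[f] for f in chosen}


def predict(model, row):
    """Assign the class whose centroid is nearest on the chosen features."""
    chosen, centroids = model
    dist = [sum((row[f] - centroids[f][c]) ** 2 for f in chosen) for c in (0, 1)]
    return 0 if dist[0] <= dist[1] else 1


def loo_full(rows, labels, k):
    """Leave-one-out where feature selection AND fitting are redone in every fold."""
    hits = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = fit(train_rows, train_labels, k)
        hits += predict(model, rows[i]) == labels[i]
    return hits / len(rows)


def loo_locked(rows, labels, k):
    """Leave-one-out where features are chosen once from ALL samples (the leak)."""
    chosen, _ = fit(rows, labels, k)
    hits = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        centroids = {f: class_means(train_rows, train_labels, f) for f in chosen}
        hits += predict((chosen, centroids), rows[i]) == labels[i]
    return hits / len(rows)


# Pure noise: 30 samples, 200 features, labels unrelated to the data.
rng = random.Random(1)
rows = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(30)]
labels = [0] * 15 + [1] * 15

full_acc = loo_full(rows, labels, k=10)
locked_acc = loo_locked(rows, labels, k=10)
print("model re-selected in every fold:", full_acc)
print("features locked using all samples:", locked_acc)
```

The locked variant lets the held-out sample influence which features are used, which is the usual route to an optimistic estimate; re-running the whole selection inside each fold removes that leak.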

Reasonable questions. Perou and Ramaswamy do find common ground in that the gene expression data all point to metastatic potential being present in primary tumors.

