May 22, 2003

Collaboration Philosophy

The CODEx Project isn't only a research group dedicated to a particular research end, whether that is clinico-genomic predictive models of breast cancer or of cancer X; it is also a philosophy of how a research team interacts. In December 2001, I wrote a memo after a productive CODEx meeting in Taipei, Taiwan, that proposed, among other things, finding a way to have the statisticians, clinicians, and scientists regularly working shoulder-to-shoulder. Starting last year, we began assembling our group once a week for half-day working/brainstorming "retreats". Currently we meet every Thursday for the entire day.

We convene in a room with both hard-wired and wireless internet access, plasma screens, and videoconferencing capabilities; we bring our laptops and discuss work in the Project and related projects during the session. At times we'll have informal presentations of new data or new analytic methods. Sometimes we'll invite someone to give a more formal presentation. The rest of the time is filled with discussion and work. Because all data resides on a secure server, we are able to work on analyses remotely over the secure shell. Our computational statisticians can even remotely access a 100-node Beowulf cluster. If we need to, we can project figures on the large plasma screens. And when we need to discuss CODEx work directly with our collaborators in Taipei, we can videoconference with them while projecting work from our laptops.

Why do all of this? Though it may seem an inefficient way to spend our time, it has actually allowed the Project to move quickly. Genomics is inherently an interdisciplinary science. At this stage, most biomedical scientists are unaccustomed to dealing with millions of datapoints; most statisticians have little exposure to molecular biology. By working with one another intimately, in the same room, exploring the data and how different representations of it facilitate biological interpretation, the CODEx Project has been able to move rapidly from a promising experimental plan to publications such as the recent Lancet and Nature Genetics papers, and ultimately toward clinical application.

This is the reason the first word in the Project acronym is "Collaborative". Further, because our goal is seeing clinico-genomic data inform the practice of medicine, we involve industry in the CODEx Project.

Our game is applied research, and this is what industry knows how to do. The original research was sponsored by SYNPAC-NC, and we are currently developing relationships with other partners to bring our technology as quickly as possible to patients.

An important part of this cross-fertilization is creating biomedical researchers who understand statistics, statisticians who understand biology, and people in both camps who understand what it takes to move from research and development to application.

The "Post-Genomic" age needs people who make the effort to cross traditional disciplinary boundaries and the CODEx Project is also about creating these types of people.

Posted by erich at 12:16 PM | Comments (0) | TrackBack

May 20, 2003

News link on Nature Genetics Paper

The Atlanta Journal-Constitution has a small piece (via wire service) on the latest Nature Genetics paper. Actually a bit surprising considering this paper is pure science and not as overtly clinical as the other.

Posted by erich at 12:01 PM | Comments (0) | TrackBack

ScienCentral Piece on Both Papers

This piece addresses personalized medicine for cancer in general, and covers the CODEx Project along with the NKI/Bernards/Friend group and their current clinical trials.

Posted by erich at 12:00 PM | Comments (0) | TrackBack

May 18, 2003

Nature Genetics Paper Published

Our paper is released for online publication today. Here's the link to Nature Genetics's Advance Online Publication.

High-density DNA microarrays measure expression of large numbers of genes in one assay. The ability to find underlying structure in complex gene expression data sets and rigorously test association of that structure with biological conditions is essential to developing multi-faceted views of the gene activity that defines cellular phenotype. We sought to connect features of gene expression data with biological hypotheses by integrating 'metagene' patterns from DNA microarray experiments in the characterization and prediction of oncogenic phenotypes. We applied these techniques to the analysis of regulatory pathways controlled by the genes HRAS (Harvey rat sarcoma viral oncogene homolog), MYC (myelocytomatosis viral oncogene homolog) and E2F1, E2F2 and E2F3 (encoding E2F transcription factors 1, 2 and 3, respectively). The phenotypic models accurately predict the activity of these pathways in the context of normal cell proliferation. Moreover, the metagene models trained with gene expression patterns evoked by ectopic production of Myc or Ras proteins in primary tissue culture cells properly predict the activity of in vivo tumor models that result from deregulation of the MYC or HRAS pathways. We conclude that these gene expression phenotypes have the potential to characterize the complex genetic alterations that typify the neoplastic state, whether in vitro or in vivo, in a way that truly reflects the complexity of the regulatory pathways that are affected.

Published online: 18 May 2003, doi:10.1038/ng1167

Posted by erich at 02:49 PM | Comments (0) | TrackBack

May 14, 2003

more news outlet linkage

From US News & World Report

Posted by erich at 01:03 PM | Comments (0) | TrackBack

May 10, 2003

lancet paper news outlet linkage

Here are some newslinks related to the Lancet Paper:

From the BBC
From Wired News
From Reuters via Yahoo
And Reuters Health (apparently a separate news outlet).

Posted by erich at 08:42 AM | Comments (0) | TrackBack

May 09, 2003

lancet paper published today

Today is the day. Linkage to the online abstract is here. Chuck Perou and Sridhar Ramaswamy write a cogent commentary that poses some excellent questions for us to answer regarding our metagenes, or as they put it "highly abstracted structures", as well as validation. We will address these in follow-on papers and responses.

Here are some initial responses to their commentary:
They query the use of metagenes (they call them "highly abstracted structures") to summarize the impact of multiple genes, suggesting that treating genes individually is enough, and that deconvoluting the roles of individual genes from metagene data is a "formidable challenge". I'd venture that (1) metagenes reduce noise; (2) more importantly, they reduce this noise in the context of discrete biological functions: all analytical techniques aggregate genes in some manner, and what we do is distill and heighten the signal-to-noise ratio for genes that share functional associations; and (3) metagenes actually simplify the task of understanding how individual genes interact within functional roles, because they allow us to prioritize genes and estimate each one's impact.
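To make points (1) and (3) concrete, here is a minimal sketch of one common way a metagene can be formed: summarize a cluster of co-expressed genes by its first singular factor, so each sample gets one summary value and each gene gets a loading that can be used to prioritize its contribution. This is an illustration of the general idea on simulated data, not necessarily the exact construction used in our papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated expression matrix: 20 co-regulated genes x 30 samples.
# A shared latent "pathway activity" plus independent per-gene noise.
activity = rng.normal(size=30)                  # latent signal, one value per sample
loadings = rng.uniform(0.5, 1.5, size=20)       # per-gene sensitivity to the signal
X = np.outer(loadings, activity) + rng.normal(scale=1.0, size=(20, 30))

# Center each gene, then take the first singular factor of the cluster:
# the top right singular vector is the "metagene" (one value per sample);
# the top left singular vector scores each gene's contribution.
Xc = X - X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
metagene = Vt[0]          # summary pattern across samples
gene_scores = U[:, 0]     # per-gene weights, usable for prioritizing genes

# The metagene typically tracks the latent activity more cleanly than
# any single noisy gene does (sign is arbitrary, hence abs()).
print("metagene vs. activity correlation:",
      round(abs(np.corrcoef(metagene, activity)[0, 1]), 2))
```

Because averaging across functionally related genes cancels independent noise, the aggregate recovers the underlying activity better than individual genes, while the loadings keep the door open to gene-by-gene interpretation.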

They also ask what the point of predicting lymph node status is if it's an "imperfect surrogate". But the point is that lymph node status is currently the single best clinical prognostic indicator, so (1) shouldn't we try to understand genomic data in this context? (2) isn't there pure scientific interest in understanding the metagenes and biology involved in lymph node status? and (3) what about those patients who, within a narrow window of time at workup, are lymph node negative but are about to convert to lymph node positive? Isn't it of use to be able to identify these patients?

Finally, they offhandedly say that out-of-sample cross-validation "generally overestimates" accuracy. I'm not so sure this is a proper generalization to make. If we "locked" our predictive model and merely cross-validated the samples, perhaps. But we cross-validate not only the samples but the model itself: the model-fitting procedure is re-run within each validation fold. This is about as stringent as one can get. Naturally, we are actively augmenting our sample size and hope to be working with thousands of samples in the near future.
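The distinction between cross-validating only the samples and cross-validating the whole model can be illustrated with a toy experiment on pure noise, where true accuracy is 50%. The gene-selection rule and nearest-centroid classifier below are deliberately simple stand-ins, not the binary regression models from the paper: when gene selection is done once on all samples ("locked") and only then cross-validated, apparent accuracy is wildly inflated; re-running selection inside each leave-one-out fold removes the bias.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pure-noise "expression" data: 1000 genes, 40 samples, random labels,
# so no gene is truly predictive and honest accuracy should hover near 50%.
n_genes, n_samples, k = 1000, 40, 10
X = rng.normal(size=(n_samples, n_genes))
y = rng.integers(0, 2, size=n_samples)

def select_genes(X, y, k):
    """Pick the k genes whose mean expression differs most between classes."""
    diff = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
    return np.argsort(diff)[-k:]

def nearest_centroid_predict(Xtr, ytr, xte):
    """Toy classifier: assign the test sample to the nearer class centroid."""
    c0, c1 = Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)
    return int(np.linalg.norm(xte - c1) < np.linalg.norm(xte - c0))

def loocv_accuracy(X, y, preselected=None):
    hits = 0
    for i in range(len(y)):
        tr = np.arange(len(y)) != i
        # Honest CV re-runs gene selection on the training fold only;
        # "leaky" CV reuses genes selected once on ALL samples.
        genes = (select_genes(X[tr], y[tr], k)
                 if preselected is None else preselected)
        hits += nearest_centroid_predict(X[tr][:, genes], y[tr],
                                         X[i, genes]) == y[i]
    return hits / len(y)

leaky_genes = select_genes(X, y, k)     # selection already saw every test sample
print("leaky :", loocv_accuracy(X, y, leaky_genes))
print("honest:", loocv_accuracy(X, y))  # selection redone inside each fold
</```

The leaky estimate looks impressive on data with no signal at all; the honest one stays near chance. That is why re-fitting the entire procedure within each fold is the stringent test.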

Reasonable questions. Perou and Ramaswamy do find common ground with us in that the gene expression data all point to metastatic potential being present already in primary tumors.

Posted by erich at 08:40 AM | Comments (0) | TrackBack

May 08, 2003

infrastructure, infrastructure

One important aspect of the CODEx Project is the fact that an infrastructure was already in place to do everything we had to do. Because the KF-SYSCC was built from scratch 12 years ago, it had the luxury of putting systems and procedures in place that would be difficult to install in an established institution due to institutional inertia. Salient among these was the establishment of a tumor tissue bank from the very beginning. The physicians at KF-SYSCC couldn't anticipate how this resource might be used in the future, but in spite of the high overhead in cost and time, they went ahead and did it.

This wasn't a trivial undertaking, because the workflows in the OR and in the Surgical Path labs were altered for the sake of an abstraction. That investment is paying off now that we have the genomic technology and analytic tools that can squeeze astonishing value out of these tissues. Tissue banking also means having Standard Operating Procedures that standardize who handles a tissue sample, how it is handled, how it is stored, how long it remains in the specimen bucket, &c. This is essential for reducing confounding variables.

Since we use this technology to analyze gene expression in these samples, and since expression "profiles" are essentially phenotypes, we want to understand how these molecular phenotypes relate to clinical phenotypes. Here again, KF-SYSCC presciently built a computerized, double-keyed clinical database.

These seemingly mundane, painstaking infrastructural investments now provide a virtual scientific playground for the members of the CODEx Project. We can range across these tissues, their gene expression patterns, and integrate them with long-established wisdom with regard to clinical behavior.

The point is that infrastructure is far from boring, mundane, or a waste of time. If you put robust procedures and systems in place early, you will save yourself heartache in the future.

Now we are helping Oncology at Duke, in an advisory role, to start laying down the foundation for genomic, individualized medicine. This necessarily entails a great deal of commitment, a willingness to sweat the details, and, more than anything, the desire to get it done. The return is data, and data is always good. Academic medical centers are known as bastions of research, yet, oddly enough, virtually all the data in a university hospital's day-to-day operations is inaccessible, in spite of the fact that every clinic and ward is pumping out usable information. Extracting data is usually a painful process of applying for funds and manually compiling information from hundreds of paper medical charts. Ideally, a research hospital should have a standardized electronic medical record, robust procedures for protecting patient confidentiality, and easy means of accessing data in anonymized form. Duke should be a data pump.

Right now, Duke is no different from every other "prestige" academic medical center in the world: data is expensive and time-consuming to access. Data collection is the responsibility of individual investigators, so hundreds of unconnected databanks exist where there should be a unified whole. Instead of being an easily squeezed sponge, a "modern" medical center yields data like a squeezed rock. With modern computer technology, there's no reason this should continue.

Posted by erich at 08:33 PM | Comments (0) | TrackBack

genomics & duke

Reproduced here is an editorial I wrote published in the Duke Chronicle in December of 2001. Many of the issues addressed in the closing paragraphs are now being addressed as Duke's IGSP has had its new Director, Hunt Willard, in place since January, and we now have the focused leadership to start carrying projects forward...

Look out for two papers by our group coming out this month. One in The Lancet specifically addressing prognostic modeling of gene expression data for high- and low-risk breast cancer patients, as well as for predicting axillary lymph node status. The second paper will be published in Nature Genetics at the end of the month, and shows how binary regression models of gene expression data can provide quantitative estimates of oncogenic activities in vitro. See the sidebar on the right for the specific references, once the papers are out.


"Betty found a lump in her breast. A proactive woman, she arranged an appointment at Duke through a friend's oncologist and was naturally apprehensive about the challenges she was about to face. She was otherwise a physically fit and vivacious woman who enjoyed talking about her hobbies.

In medical school, it's a common admonishment that we treat patients as individuals. We are taught to take careful histories and physical exams and summon the story that brought this particular patient to the hospital on this particular day. Ironically, all that work to render a patient unique is left by the wayside once a diagnosis is made. Modern medicine is really quite monolithic. A new heart-failure drug or a new leukemia treatment protocol is validated by massive studies on thousands of patients who possess virtually identical clinical parameters treated in an identical manner. Right now, this is the only robust way to prove that new treatments present any added benefit over what is already available. Consequently, though patients are genetically and biologically distinct individuals, we tend to treat them in a cookie-cutter fashion. And only after something doesn't work do we begin the trial-and-error process of customizing treatment.

For the most part, medicine is still empirical. A lot of what we do is because such-and-such a study on 15,000 patients showed drug A worked better than drug B. Why? Simply because it worked. From a biological or pharmacologic standpoint, we might have a notion why, but honestly, much of this is hand-waving. This is the reason, I suppose, that some people look elsewhere. When Suzanne Somers chooses to inject herself with mistletoe extract for breast cancer, as kooky as it may seem, the medical establishment is evidently failing to answer some need. The cookie cutter works for most--and the statistics prove it--but there are always exceptions.

So how do we move beyond the cookie-cutter paradigm? Genomics. With genomics, we finally have the tools that permit us to treat patients as genetically and biologically distinct individuals.

A patient like Betty doesn't only have a unique story--she has a distinct genetic makeup that accounts for not just her curly hair, or how tall she is, but how she responds to infection, her susceptibility to cancer, whether she eventually has high blood pressure or even how she metabolizes certain drugs. Genomic technology is the only means available that captures the biological complexity of an individual person or even an individual cancer. One such application, recently published in Nature, was developed by a group at the National Cancer Institute. Using DNA microarrays, they managed to sub-categorize what was thought to be a single type of lymphoma into two distinct types, one that responds to standard therapy and one that is less amenable to that therapy. Genomic technology allows us to see what used to be monolithic and undifferentiated as variegated and unique. Here at Duke, a research team is applying similar approaches to breast cancer. As a member of this group, I am convinced that the day isn't far off when a "molecular phenotype" of a woman's breast tumor biopsy sample will permit us to customize her treatment by measuring the tumor's propensity to metastasize, how responsive it will be to certain drugs or radiation therapy and how rigorously it will have to be monitored. However, to reach this point will take steadiness, discipline and considerable resources. While scattered individual scientists at Duke are committed to this goal, success ultimately depends on a focused institutional commitment.

Over a year ago, Duke launched its $200 million Institute for Genome Sciences and Policy. This commitment to genomics is one of the most important the University has made in the 70 years since its founding. Yet my sense is that the effort has become diffuse. Our recent difficulties in recruiting a director for the IGSP are only one symptom of this fuzziness.

In order to find focus, the University has to address several realities:

1. The critical technological advances at the bleeding edge of genomics were developed at other institutions including Stanford University and the Whitehead Institute at the Massachusetts Institute of Technology as well as at companies like Celera. Duke is consequently embarking at a slight disadvantage.

2. Replicating the same efforts taking place at dozens of other universities across the country, such as the University of North Carolina at Chapel Hill, likely isn't the most efficient use of funds.

3. Duke Medical Center is a heavyweight in the management and analysis of clinical trials. But there does not seem to be any concerted effort to take advantage of this strength in the genomic arena.

That Duke was not involved in the genesis of many of the genomic technologies being used today is not a terrible handicap. Ultimately, it is the application of that technology that will have palpable impact for patients like Betty. Duke can offer its own distinctive contribution to the genomic sciences by welding the incredibly complex and rich biological data that genomic technology provides with the Medical Center's proven expertise in clinical research. I don't mean to stir the pot by suggesting that the funds earmarked for the IGSP and its five centers be redistributed; much of what I suggest is purely a matter of institutional emphasis. By focusing on projects that push Duke's uniqueness forward, many other advantages will accrue. In the near future, all drugs and all clinical protocols will have to be evaluated in the light of genomics; I cannot see any reason why Duke should not lead this charge. Further, in arriving at the goal of tailoring treatments for individual patients, the accompanying interpretive expertise and computational infrastructure will benefit not only patients but also the zoologist studying zebrafish or the cell biologist studying nematodes. My experience in the lab indicates that the tools for analyzing genomics in a clinical context are easily transferred to basic science applications.

Duke has the resources to compellingly usher in an era of genomic medicine that does justice to the uniqueness of patients like Betty. Only with clarity of purpose can we mold the huge commitment the University has made to the IGSP into a contribution to science and society that is distinctively 'Duke.'"

Posted by erich at 06:26 PM | Comments (0) | TrackBack