What types of complexity would be involved in computationally comparing two genomes to find over-expressed genes?<p>There has been news of success in comparing the genome sequences of cancer cells and healthy cells. I would like to hear any expert opinions regarding the complexity.<p>Some background:<p>The goal is to sequence a patient's genome and their cancer's genome, then compute candidate genes to target with drug therapy based on the difference between the two (I know it's more complicated than that). It took researchers a month to process the data and diff the normal and malignant cells. They then found a likely candidate gene that was over-expressed in the cancer cells, and, coincidentally, a drug existed that targeted that gene, apparently to the patient's benefit.<p>http://www.charlierose.com/view/interview/12455
http://www.nytimes.com/2012/07/08/health/in-gene-sequencing-treatment-for-leukemia-glimpses-of-the-future.html
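Strictly speaking, over-expression is measured from RNA (transcript) read counts rather than from the DNA sequence itself, so for that part the "diff" is a per-gene comparison of expression levels. Here is a minimal sketch, assuming per-gene RNA-seq read counts for one tumor and one normal sample already exist (the gene names, counts, and the `min_log2fc` cutoff are made up): normalize each sample to counts per million, then rank genes by log2 fold change. Real pipelines use replicates and statistical models (e.g. DESeq2 or edgeR) rather than a raw cutoff like this.

```python
from __future__ import annotations

import math


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so samples sequenced to different depths are comparable."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_changes(tumor: dict[str, int], normal: dict[str, int],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """log2((tumor CPM + pseudocount) / (normal CPM + pseudocount)) for genes in both samples.

    The pseudocount avoids division by zero for genes with no reads in one sample.
    """
    t_cpm = counts_per_million(tumor)
    n_cpm = counts_per_million(normal)
    shared = t_cpm.keys() & n_cpm.keys()
    return {g: math.log2((t_cpm[g] + pseudocount) / (n_cpm[g] + pseudocount))
            for g in shared}


def overexpressed(tumor: dict[str, int], normal: dict[str, int],
                  min_log2fc: float = 2.0) -> list[tuple[str, float]]:
    """Genes whose log2 fold change meets the (hypothetical) cutoff, largest first."""
    lfc = log2_fold_changes(tumor, normal)
    hits = [(g, v) for g, v in lfc.items() if v >= min_log2fc]
    return sorted(hits, key=lambda gv: gv[1], reverse=True)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    tumor = {"GENE_A": 5000, "GENE_B": 1000, "GENE_C": 4000}
    normal = {"GENE_A": 500, "GENE_B": 1000, "GENE_C": 8500}
    print(overexpressed(tumor, normal))
```

With those made-up counts, only GENE_A clears the cutoff (log2 fold change of roughly 3.32, i.e. about tenfold higher in the tumor); GENE_C comes out negative because it is lower in the tumor.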
Sure. There are teams doing this every day right now, with clinical patients. It's still research, but we are pushing variant calls back into patients' medical records to help make treatment decisions.<p>From a computational-complexity perspective, it's not nearly as hard as you would think (by which I mean there are <i>many</i> software tools that already do this pretty efficiently). The day-to-day problems are much simpler, and more standard IT, than you might expect. For example, you can safely assume that finding variants is a solved problem. The real questions are things like: where do you source the drugs that are in clinical trials or approved as treatments for a specific mutation, and can you get that information in front of pathologists quickly?<p>You probably want to take a look at The Cancer Genome Atlas (TCGA) project. They are sequencing normal and tumor tissue from patients across a large number of cancer types and making the resulting sequence data available for research.<p>Edit (additional info):
It does NOT take a month to do this sequencing, and it is getting much faster every day. The MiSeq from Illumina can pump out the FASTQ file for a normal or tumor sample in 24 hours (and it can do something like 96 samples at a time, but I don't know if people push that in production).
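The "finding variants is done, now find the drugs" workflow can be pictured roughly like this. A sketch, assuming variant calls for the tumor and its matched normal already exist and are annotated with a gene: somatic candidates are tumor variants absent from the normal, and the drug step is a lookup against a hypothetical `drug_table`. Real somatic callers weigh read support and tumor purity rather than taking a plain set difference, and real drug resources usually key on specific alterations rather than whole genes; both simplifications are assumptions made for illustration.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int    # 1-based position, as in VCF
    ref: str
    alt: str
    gene: str   # gene annotation, assumed to be added upstream


def site_key(v: Variant) -> tuple[str, int, str, str]:
    """Identify a variant by its genomic change only, ignoring annotation."""
    return (v.chrom, v.pos, v.ref, v.alt)


def somatic_variants(tumor: list[Variant], normal: list[Variant]) -> list[Variant]:
    """Variants called in the tumor but not in the matched normal (germline) sample."""
    germline = {site_key(v) for v in normal}
    return [v for v in tumor if site_key(v) not in germline]


def drug_candidates(somatic: list[Variant],
                    drug_table: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each mutated gene to drugs listed for it in a hypothetical drug_table."""
    hits: dict[str, set[str]] = {}
    for v in somatic:
        for drug in drug_table.get(v.gene, []):
            hits.setdefault(v.gene, set()).add(drug)
    return hits


if __name__ == "__main__":
    tumor = [Variant("chr1", 100, "A", "G", "GENE_X"),
             Variant("chr2", 200, "C", "T", "GENE_Y")]
    normal = [Variant("chr2", 200, "C", "T", "GENE_Y")]  # shared, so germline
    drug_table = {"GENE_X": ["drug_1"], "GENE_Y": ["drug_2"]}  # made-up entries
    print(drug_candidates(somatic_variants(tumor, normal), drug_table))
```

In this toy run only the chr1 change is somatic, so only GENE_X's listed drug is reported; the shared chr2 change is treated as germline and dropped.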
I'm no expert, but it's an incredibly complex process. One of the things about cancer cells is that they mutate at an amazingly fast rate because they are constantly dividing, so it can be difficult to target just the single gene responsible for the transformation from a normal cell to a cancerous one. Unfortunately, even when we can tell that a particular gene has been mutated in a way that leads to overexpression of X, the story is often not just the overexpression of one gene; it's a mutation in some enzyme that blocks another protein, which in turn would have acted as a repressor of some other gene. There are immense bodies of research, but there's a ton that we simply don't know yet.
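The indirect mechanism in that comment (an enzyme that blocks a repressor of some other gene) is easier to see as a signed regulation graph. A toy sketch, with made-up gene names and edges: walk backwards from the over-expressed gene and report each upstream regulator with its distance and net sign (+1 means more regulator activity tends to raise the target). It only illustrates why the over-expressed gene may not be the mutated one; real regulatory networks are far larger and much less certain.

```python
from collections import deque


def upstream_regulators(edges, target):
    """Breadth-first search backwards from `target` through a signed regulation graph.

    edges: iterable of (regulator, regulated, sign), where sign is +1 for
    activation and -1 for inhibition/repression.
    Returns {regulator: (distance, net_sign)}. net_sign is the product of the
    signs along the first shortest path BFS finds: +1 means more regulator
    activity tends to raise the target, -1 means it tends to lower it.
    """
    incoming = {}
    for reg, regulated, sign in edges:
        incoming.setdefault(regulated, []).append((reg, sign))

    found = {}
    seen = {target}
    queue = deque([(target, 0, 1)])
    while queue:
        node, dist, net = queue.popleft()
        for reg, sign in incoming.get(node, []):
            if reg not in seen:
                seen.add(reg)
                found[reg] = (dist + 1, net * sign)
                queue.append((reg, dist + 1, net * sign))
    return found


if __name__ == "__main__":
    # Made-up chain: an enzyme inhibits a repressor, which represses GENE_T.
    edges = [("ENZYME_E", "REPRESSOR_R", -1),
             ("REPRESSOR_R", "GENE_T", -1)]
    print(upstream_regulators(edges, "GENE_T"))
```

Here ENZYME_E comes back with net sign +1: a mutation that makes the enzyme more active removes the repressor and pushes GENE_T up, even though GENE_T itself carries no mutation.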