Even when lab work and statistical analysis are carried out perfectly, the conclusions of a microarray experiment have a high probability of being incorrect for purely probabilistic reasons. And of course lab work and statistical analysis are never carried out perfectly. I went to a talk earlier this week that demonstrated reproducibility problems coming both from the wet lab and from the statistical analysis.
The talk presented a study that supposedly discovered genes that can distinguish patients who will respond to a certain therapy from those who will not. On closer analysis, the paper actually demonstrated that it is possible to distinguish microarray experiments conducted on one day from experiments conducted on another day. That is, batch effects from the lab were much larger than the differences between patients who did and did not respond to therapy. I hear that this is typical unless gene expression levels vary dramatically between subgroups.
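To see how easily this can happen, here is a toy simulation (my own illustration, not data from the talk): a modest day-to-day shift applied to every gene swamps a small genuine response signal, so a naive per-gene test "discovers" the batch, not the biology.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_per_group = 1000, 10

# Baseline expression on a log2 scale: 1000 genes, 40 samples.
data = rng.normal(8.0, 1.0, size=(n_genes, 4 * n_per_group))

# First 20 samples run on day 1, last 20 on day 2.
# Responders are balanced across batches.
batch = np.repeat([0, 1], 2 * n_per_group)
responder = np.tile(np.repeat([1, 0], n_per_group), 2)

# A batch shift on every gene, ten times larger than the real effect,
# which touches only 20 genes.
data[:, batch == 1] += 1.0
signal_genes = rng.choice(n_genes, size=20, replace=False)
data[np.ix_(signal_genes, np.flatnonzero(responder == 1))] += 0.1

def t_stat(x, groups):
    """Per-gene Welch-style t statistic between the two groups."""
    a, b = x[:, groups == 0], x[:, groups == 1]
    return (a.mean(1) - b.mean(1)) / np.sqrt(a.var(1) / a.shape[1] + b.var(1) / b.shape[1])

# The batch separates far more strongly than the biology does.
print("median |t| by batch:   ", np.median(np.abs(t_stat(data, batch))))
print("median |t| by response:", np.median(np.abs(t_stat(data, responder))))
```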
The talk also discussed problems with reproducing the statistical analysis. As is so often the case, data were mislabeled; in fact, 3/4 of the samples were mislabeled. Simply keeping track of indices is the biggest barrier to reproducibility. It is shocking how often a study simply did not analyze the data it says it analyzed. This seems like a simple matter to get right; perhaps people give it little attention precisely because it seems so simple.
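A cheap way to guard against this is to make index bookkeeping fail loudly. Here is a minimal sketch (the file and column names are hypothetical placeholders) that refuses to proceed when the sample sheet and the expression matrix disagree, instead of silently analyzing the wrong columns:

```python
import pandas as pd

# Hypothetical inputs: a sample annotation sheet and a genes-by-samples matrix.
samples = pd.read_csv("sample_sheet.csv", index_col="sample_id")
expr = pd.read_csv("expression_matrix.csv", index_col=0)

missing = set(samples.index) - set(expr.columns)
extra = set(expr.columns) - set(samples.index)
assert not missing, f"samples with no expression data: {sorted(missing)}"
assert not extra, f"expression columns with no annotation: {sorted(extra)}"

# Reorder the matrix columns to match the sample sheet, so group labels
# downstream line up with the right measurements.
expr = expr[samples.index]
```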
So, three reasons to be skeptical of microarray experiment conclusions:
- High probability of false discovery
- Statistical reproducibility problems
- Physical reproducibility problems
This hasn’t been my experience at all. I should mention first that I use exclusively expression-based microarray methodologies, so I can’t speak about other platforms. We don’t simply report p-values; instead, we use FDR-corrected q-values. Among the genes that change the most, we only rarely find false positives, and when we do, it’s usually because probe annotations in public databases were faulty (the probe actually represented some intergenic transcript rather than the annotated gene). After hand-curating the offending probes, we can usually reproduce the data. I admit I don’t know the true rate of false negatives. As for reproducibility, that’s why we run multiple replicates, principal component analyses, etc. Again, in my hands, most array differences arise from biological variance rather than from array-methodology variance, and I don’t believe I’m alone in that respect; see PMID: 16964229 and PMID: 18793455, for example. We’ve even had some of our studies repeated by other labs, and the lists of differentially expressed genes, even with a slightly different statistical analysis and different array normalization procedures, were remarkably similar.
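For readers unfamiliar with the q-values mentioned in that comment, here is a minimal sketch of the Benjamini-Hochberg procedure they are based on. In practice you would use an established implementation such as statsmodels' `multipletests` or R's `p.adjust`; this is only to show what the correction does.

```python
import numpy as np

def bh_qvalues(pvals):
    """Benjamini-Hochberg q-values: the smallest FDR at which each test
    would be called significant."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity by taking running minima from the largest p down.
    q = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(q, 1.0)
    return out

print(bh_qvalues([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]))
```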
If they use our two-color platform and a dye-swap experimental design, that will remove batch artifacts of this kind and the false positives they produce.
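For context, the arithmetic behind dye swaps: on a two-color array each gene's measurement is a log-ratio M = log2(R/G), and an additive dye bias enters with opposite sign on the swapped hybridization, so averaging the pair cancels it. A sketch of that cancellation with made-up numbers, not real array data:

```python
import numpy as np

rng = np.random.default_rng(1)
true_m = rng.normal(0.0, 0.5, size=5)  # true log2 ratios for 5 genes
dye_bias = 0.8                          # same additive dye bias on both arrays

# Original hybridization and its dye swap (samples exchanged between dyes),
# each with a little measurement noise.
m_original = true_m + dye_bias + rng.normal(0, 0.05, 5)
m_swapped = -true_m + dye_bias + rng.normal(0, 0.05, 5)

# Differencing the pair cancels the bias and recovers the true log-ratio.
estimate = (m_original - m_swapped) / 2
print(np.round(true_m, 2))
print(np.round(estimate, 2))
```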