Even when lab work and statistical analysis are carried out perfectly, conclusions drawn from microarray experiments have a high probability of being wrong for purely probabilistic reasons. And of course, lab work and statistical analysis are not carried out perfectly. Earlier this week I went to a talk that demonstrated reproducibility problems arising both in the wet lab and in the statistical analysis.
The talk presented a study that supposedly discovered genes that can distinguish patients who will respond to a certain therapy from those who will not. On closer analysis, the paper actually demonstrated that it is possible to distinguish microarray experiments conducted on one day from experiments conducted on another day. That is, batch effects from the lab were much larger than the differences between patients who did and did not respond to therapy. I hear this is typical unless gene expression levels vary dramatically between subgroups.
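A toy simulation shows how a confounded design produces this kind of result. It models one hypothetical gene with no biological difference at all between responders and non-responders, only a constant shift for samples processed on day 2. The function name, the size of the shift, and the normal noise model are illustrative assumptions, not details from the study.

```python
import random
import statistics


def simulate_group_difference(n_per_group, batch_shift, confounded, seed=0):
    """Mean expression of responders minus non-responders for a hypothetical
    gene with NO true biological effect, only a processing-day shift."""
    rng = random.Random(seed)
    responders = []
    non_responders = []
    for i in range(n_per_group):
        if confounded:
            # Confounded design: every responder run on day 1,
            # every non-responder run on day 2.
            responder_day, non_responder_day = 1, 2
        else:
            # Balanced design: within each group, alternate samples between days 1 and 2.
            responder_day = non_responder_day = 1 + (i % 2)
        # Pure noise, plus the batch shift for samples processed on day 2.
        responders.append(rng.gauss(0.0, 1.0) + (batch_shift if responder_day == 2 else 0.0))
        non_responders.append(rng.gauss(0.0, 1.0) + (batch_shift if non_responder_day == 2 else 0.0))
    return statistics.mean(responders) - statistics.mean(non_responders)


# Confounded: the "responder signature" is really the day-2 shift.
print(simulate_group_difference(200, 2.0, confounded=True))
# Balanced: the day effect cancels and the difference is close to zero.
print(simulate_group_difference(200, 2.0, confounded=False))
```

In the confounded run the groups differ by roughly the size of the batch shift even though the gene does nothing biologically, which is exactly the pattern that can pass for a biomarker. Spreading each group across processing days is what lets a day effect be separated from the outcome.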
The talk also discussed problems with reproducing the statistical analysis. As is so often the case, data were mislabeled; in fact, 3/4 of the samples were mislabeled. The mundane task of keeping indexes straight is the biggest barrier to reproducibility. It is shocking how often studies do not analyze the data they say they analyzed. This seems like a simple matter to get right; perhaps people give it little attention precisely because it seems so simple.
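One defensive habit is to attach labels to samples by ID rather than by position, and to fail loudly when the IDs do not match up. The sketch below assumes a hypothetical setup: the expression data supply a list of sample IDs in column order, and the clinical data supply a dict from sample ID to response label. The function `align_labels` is illustrative, not part of any particular package.

```python
def align_labels(expression_sample_ids, clinical_labels):
    """Return response labels in the same order as the expression columns.

    expression_sample_ids: sample IDs in the column order of the expression
        matrix (hypothetical input format).
    clinical_labels: dict mapping sample ID -> label, e.g. "responder".
    Raises ValueError on duplicated, missing, or unused IDs.
    """
    if len(set(expression_sample_ids)) != len(expression_sample_ids):
        raise ValueError("duplicate sample IDs in expression data")
    missing = [s for s in expression_sample_ids if s not in clinical_labels]
    if missing:
        raise ValueError(f"no clinical label for: {missing}")
    unused = sorted(set(clinical_labels) - set(expression_sample_ids))
    if unused:
        raise ValueError(f"clinical labels with no expression data: {unused}")
    # Look each label up by ID instead of assuming both tables share a row order.
    return [clinical_labels[s] for s in expression_sample_ids]


ids = ["S3", "S1", "S2"]
labels = {"S1": "responder", "S2": "non-responder", "S3": "responder"}
print(align_labels(ids, labels))  # ['responder', 'responder', 'non-responder']
```

Checking in both directions matters: a label with no matching array is as much a sign of an indexing slip as an array with no label.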
So, three reasons to be skeptical of microarray experiment conclusions:
- High probability of false discovery
- Statistical reproducibility problems
- Physical reproducibility problems
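The first item can be made concrete with Bayes' theorem. If only a small fraction of genes are truly differentially expressed, then even a correctly run test at a conventional significance level produces far more false "discoveries" than true ones. The numbers below (10,000 genes, 1% truly affected, power 0.8, per-gene alpha 0.05 with no multiple-testing correction) are hypothetical, chosen only to illustrate the arithmetic.

```python
def positive_predictive_value(prior, power, alpha):
    """Probability that a gene declared significant is truly differentially
    expressed, given the fraction of truly affected genes (prior), the
    power of each test, and the per-test significance level alpha."""
    true_pos = power * prior
    false_pos = alpha * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


def expected_calls(n_genes, prior, power, alpha):
    """Expected numbers of true and false 'discoveries' among n_genes tests."""
    true_calls = n_genes * prior * power
    false_calls = n_genes * (1.0 - prior) * alpha
    return true_calls, false_calls


# Hypothetical inputs: 1% of genes truly affected, power 0.8, alpha 0.05.
print(positive_predictive_value(0.01, 0.8, 0.05))  # about 0.139
print(expected_calls(10_000, 0.01, 0.8, 0.05))     # about (80, 495)
```

Under these assumptions roughly 86% of the genes flagged as significant are false positives, before any lab or analysis errors enter the picture.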