False Positives

One of the big problems in the life sciences is that the messiness of the measurements, coupled with the effort typically needed for a single measurement, leads to studies being under-powered. When a large screen is done, a lot of genes may show a positive effect with p-values < 0.05 when each is considered individually. If you screen 20,000 genes with a less-than-perfect assay, you should expect a lot of positive hits just by chance: at a 0.05 threshold, roughly 20,000 × 0.05 = 1,000 hits even if no gene has any real effect. Of course we biologists are all acutely aware of this phenomenon as it has been pounded into us since we were fresh grad students. And yet the problem still plagues many biology experiments.
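To make that arithmetic concrete, here is a minimal simulation sketch (the numbers are illustrative, not from any real screen): under the null hypothesis, p-values are uniformly distributed on [0, 1], so a 0.05 cutoff flags about 5% of genes no matter how many you test.

```python
# Minimal sketch: if none of 20,000 genes has a real effect,
# a p < 0.05 cutoff still "discovers" about 1,000 of them.
import numpy as np

rng = np.random.default_rng(0)

n_genes = 20_000
alpha = 0.05

# Under the null hypothesis, p-values are uniform on [0, 1].
p_values = rng.uniform(0.0, 1.0, size=n_genes)

false_positives = int(np.sum(p_values < alpha))
print(f"Expected by chance: {n_genes * alpha:.0f}")
print(f"Observed in this simulation: {false_positives}")
```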

But this problem of false positives is only a problem if those positives can’t be easily tested individually. If you are doing a genome-wide association study (GWAS) on human participants using microarrays and you observe 20 alleles associated with your target disease, you can be sure that most of them are false positives. Exploring any one of those 20 alleles in depth is a PhD project on its own. So the GWAS result is often presented as-is, with varying degrees of statistical analysis following but no other experimental investigation1.

At Mozza I have received my fair share of criticism for the sloppiness of many of our screens. Very generally, we are screening for traits (either gene knockdowns or gene overexpressions) that lead to higher accumulation of casein protein in our soybeans. The measurement in the screen is typically a Western blot or some other immunological assay, which is inherently noisy. This leads to a lot of false positives. But my goal is never to claim that these traits are effective or groundbreaking or worthy of awe from peers gawking at how small my p-value is. In fact, I don’t even care if I have grossly misunderstood the mechanism through which the observed effect arises. I just need something that makes more casein protein accumulate.

So I did a screen and got some hits - most or all of which I have to assume are false positives. Now I can transform those traits into plants that already reliably produce the same amount of casein generation after generation. I have a pretty reliable way of measuring the amount of casein in a plant, and the magnitude of the effects I am looking for is substantial2. So a few months later I know pretty conclusively whether the trait was any good.

Now, there is a substantial cost to generating transgenic plants with new traits: people hours, reagent costs, greenhouse space, etc. Hence the large initial screen to weed out the majority of ideas. Screen first, then test the smaller number of hits in full in real plants.

False negatives are trickier to deal with; there is not much to do besides improving the assay accuracy. If I have to test all positives and all negatives, then the initial screen was not needed. So I generally only test positives and, when our team has the bandwidth, try to improve the methods to minimize false negatives.

In summary, I am happy with my sloppy screens.


  1. I’m not trying to dunk on those who do GWAS. Large-scale screens are a critical part of the biological discovery pipeline. Testing whether a certain allele is actually involved in a disease could be decades of work for a small team. So it’s not unthinkable that one might want to publish the results of a large screen without exploring which positives are indeed false positives. ↩︎

  2. I am usually looking for a 2- or 3-fold increase in protein accumulation. That is very easy to notice in an ELISA or even on a Western blot when the experimental samples and the control plant sample are run side by side. ↩︎