We have a test for some condition, say it is the dreaded Lurgi, and are testing some sample of the population, say 10,000 folks.
We have reason to believe that the prevalence of Lurgi is 1% of our population. That means that 99% do not have the condition.
We have an estimate that this test correctly identifies 99% of the people who have Lurgi, missing the other 1%. The first of these percentages, the share of true cases the test detects, is called the sensitivity of the test.
We also have an estimate that it correctly clears 99% of the people who do not have Lurgi, giving false alarms for the other 1%. The first of these percentages, the share of unaffected people the test correctly clears, is called the specificity (or selectivity) of the test.
Higher sensitivity, higher specificity, or both reduce the rate of errors. Informally, we can call this the power of the test.
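The same three estimates can be written in conditional-probability notation. The symbols here are introduced only for illustration: $D$ means having Lurgi, $\bar{D}$ means not having it, and $T^{+}$ and $T^{-}$ mean a positive and a negative test result.

```latex
\begin{aligned}
P(D) &= 0.01 && \text{(prevalence)} \\
P(T^{+} \mid D) &= 0.99 && \text{(sensitivity)} \\
P(T^{-} \mid \bar{D}) &= 0.99 && \text{(specificity)}
\end{aligned}
```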
If we test the 1% of people with Lurgi, we can expect to identify 99% of that 1%, or 0.99% of the sample, correctly as having the condition, and to miss 1% of that 1%, or 0.01% of the sample.
Likewise, if we test the 99% of people without Lurgi, we can expect to identify 99% of that 99%, or 98.01% of the sample, correctly as not having the condition, and to raise false alarms for 1% of that 99%, or 0.99% of the sample.
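The arithmetic in the two paragraphs above amounts to multiplying a group's share of the sample by the rate at which the test treats that group correctly or incorrectly. Here is a minimal Python sketch using the text's figures. The function name `cell_fractions` is hypothetical and chosen only for this illustration.

```python
# Figures from the text: 1% prevalence, 99% sensitivity, 99% specificity.
prevalence = 0.01
sensitivity = 0.99
specificity = 0.99

def cell_fractions(prevalence, sensitivity, specificity):
    """Return each of the four outcomes as a fraction of the whole sample."""
    return {
        "hits": prevalence * sensitivity,                      # have Lurgi, test positive
        "misses": prevalence * (1 - sensitivity),              # have Lurgi, test negative
        "false_alarms": (1 - prevalence) * (1 - specificity),  # no Lurgi, test positive
        "correct_rejections": (1 - prevalence) * specificity,  # no Lurgi, test negative
    }

for name, frac in cell_fractions(prevalence, sensitivity, specificity).items():
    print(f"{name:>18}: {frac:.2%}")
```

The four fractions always add up to 100% of the sample, because every person lands in exactly one cell.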
Since our sample is 10,000 people, we get the following expected results:
| Prevalence | With Condition (1%) | Without Condition (99%) | Totals |
|---|---|---|---|
| Tests Positive | Hits = 99 (10,000 × .99 × .01) | False Alarms = 99 (10,000 × .01 × .99) | 198 |
| Tests Negative | Misses = 1 (10,000 × .01 × .01) | Correct Rejections = 9,801 (10,000 × .99 × .99) | 9,802 |
| Totals | 100 | 9,900 | 10,000 |
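The table can be generated from the sample size and the three rates. This is a minimal sketch. The function `expected_counts` and its key names are hypothetical. Counts are rounded to whole people only for display, because the underlying products are floating-point values.

```python
def expected_counts(n, prevalence, sensitivity, specificity):
    """Expected counts for each cell of the 2x2 table, plus row and column totals."""
    with_condition = n * prevalence
    without_condition = n * (1 - prevalence)
    hits = with_condition * sensitivity
    misses = with_condition * (1 - sensitivity)
    false_alarms = without_condition * (1 - specificity)
    correct_rejections = without_condition * specificity
    return {
        "hits": hits,
        "false_alarms": false_alarms,
        "misses": misses,
        "correct_rejections": correct_rejections,
        "tests_positive": hits + false_alarms,          # row total
        "tests_negative": misses + correct_rejections,  # row total
        "with_condition": with_condition,               # column total
        "without_condition": without_condition,         # column total
    }

counts = expected_counts(10_000, 0.01, 0.99, 0.99)
for name, value in counts.items():
    print(f"{name:>18}: {round(value):,}")
```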
You can see that, for this population and test power, the number of false alarms is as great as the number of hits. That is, the 198 positive results overstate the 99 hits by a factor of two. Only half of the people who test positive actually have Lurgi, and the test cannot tell us which ones are the false alarms, only their expected frequency.
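The chance that a positive result is a hit is usually called the positive predictive value. That term does not appear in this article but is standard in medical statistics. By Bayes' rule, it equals hits divided by all positives. The helper below is a hypothetical sketch of that calculation.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(has the condition | tests positive) = hits / (hits + false alarms)."""
    hits = prevalence * sensitivity
    false_alarms = (1 - prevalence) * (1 - specificity)
    return hits / (hits + false_alarms)

print(f"{positive_predictive_value(0.01, 0.99, 0.99):.0%}")  # 50%
```

This gives the same one-in-two figure as reading 99 hits out of 198 positives from the table.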
The misses are not a problem here, since we expect only one. If you reverse the prevalence numbers, so that 99% have the condition, the false alarm problem goes away and the miss problem becomes significant. For prevalence figures in the middle, both kinds of error make up a much smaller share of the results.
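The effect of prevalence can be checked by computing two shares for several prevalence values, keeping the text's 99% sensitivity and specificity. The first share is the fraction of positive results that are false alarms. The second is the fraction of negative results that are misses. The function `error_shares` is a hypothetical helper for this sketch.

```python
def error_shares(prevalence, sensitivity=0.99, specificity=0.99):
    """Return (share of positives that are false alarms,
               share of negatives that are misses)."""
    hits = prevalence * sensitivity
    misses = prevalence * (1 - sensitivity)
    false_alarms = (1 - prevalence) * (1 - specificity)
    correct_rejections = (1 - prevalence) * specificity
    return (false_alarms / (hits + false_alarms),
            misses / (misses + correct_rejections))

for p in (0.01, 0.50, 0.99):
    fa_share, miss_share = error_shares(p)
    print(f"prevalence {p:.0%}: false alarms are {fa_share:.1%} of positives, "
          f"misses are {miss_share:.1%} of negatives")
```

At 1% prevalence, half the positives are false alarms. At 99%, half the negatives are misses. At 50%, each kind of error is only about 1% of its row.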
With poorer tests (less power), we get more errors. With even lower prevalence, false alarms make up an even larger share of the positive results.
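Both effects can be compared by counting the expected false alarms for every hit. In the sketch below, the first scenario uses the text's figures. The "poorer test" (90% sensitivity and specificity) and "lower prevalence" (0.1%) scenarios are hypothetical values chosen only for illustration, as is the helper `false_alarms_per_hit`.

```python
def false_alarms_per_hit(prevalence, sensitivity, specificity):
    """Expected number of false alarms for every hit."""
    hits = prevalence * sensitivity
    false_alarms = (1 - prevalence) * (1 - specificity)
    return false_alarms / hits

# The first row uses the text's figures; the other two are hypothetical.
scenarios = [
    ("text's example", 0.01, 0.99, 0.99),
    ("poorer test", 0.01, 0.90, 0.90),
    ("lower prevalence", 0.001, 0.99, 0.99),
]
for label, p, sens, spec in scenarios:
    ratio = false_alarms_per_hit(p, sens, spec)
    print(f"{label:>16}: {ratio:.1f} false alarms per hit")
```

In this sketch, the text's example gives about one false alarm per hit. Either change on its own pushes the ratio to roughly ten or more.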
In medicine and statistics in general:
- hits are called true positives
- misses are called false negatives
- correct rejections are called true negatives
- false alarms are called false positives
You may encounter these terms. The language I have used is both more intuitive and the language of the formal method underlying this type of analysis, Signal Detection Theory. Once you understand it, you can relate it to smoke detectors, fire alarms, and many other common situations.