Testing: simplest explanation of false alarms to hits

 

We have a test for some condition, say it is the dreaded Lurgi, and are testing some sample of the population, say 10,000 folks.

We have reason to believe that the prevalence of Lurgi is 1% of our population. That means that 99% do not have the condition.

We have an estimate that this test correctly identifies 99% of the people who have Lurgi, missing 1%. The former percentage is called the sensitivity of the test.

We also have an estimate that it correctly identifies 99% of the people who do not have Lurgi, giving false alarms for 1%. The former percentage is called the specificity (sometimes selectivity) of the test.

Higher sensitivity or specificity, or both, reduces the rate of errors. Here we will loosely call this combination the power of the test (in formal statistics, "power" refers to sensitivity alone).

If we test the 1% of the people with Lurgi, we can expect to identify 99% of that 1%, or 0.99% of the sample, correctly as having the condition, missing 1% of that 1%, or 0.01% of the sample.

Likewise, if we test the 99% of the people without Lurgi, we can expect to identify 99% of that 99%, or 98.01% of the sample, correctly as not having the condition, with false alarms for 1% of that 99%, or 0.99% of the sample.

Since our sample is 10,000 people, we get the following results:

| Prevalence | With Condition (1%) | Without Condition (99%) | Totals |
|---|---|---|---|
| Tests Positive | Hits = 99 (10,000 X .99 X .01) | False Alarms = 99 (10,000 X .01 X .99) | 198 |
| Tests Negative | Misses = 1 (10,000 X .01 X .01) | Correct Rejections = 9,801 (10,000 X .99 X .99) | 9,802 |
| Totals | 100 | 9,900 | 10,000 |
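
The arithmetic behind these four counts can be sketched in a few lines of Python. This is only an illustration: the function name `expected_counts` and its parameter names are my own, and exact fractions are used so that the results come out as whole numbers rather than rounded decimals.

```python
from fractions import Fraction


def expected_counts(n, prevalence, sensitivity, specificity):
    """Expected count in each cell of the 2x2 table (hypothetical helper)."""
    with_condition = n * prevalence
    without_condition = n - with_condition
    hits = with_condition * sensitivity              # have it, test positive
    misses = with_condition - hits                   # have it, test negative
    correct_rejections = without_condition * specificity
    false_alarms = without_condition - correct_rejections
    return {
        "hits": hits,
        "misses": misses,
        "false_alarms": false_alarms,
        "correct_rejections": correct_rejections,
    }


pct = Fraction(1, 100)
table = expected_counts(10_000, 1 * pct, 99 * pct, 99 * pct)
for name, count in table.items():
    print(f"{name}: {count}")
```

Changing the three percentages in the last call is enough to try other populations or other tests.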

You can see that for this population and test power, the number of false alarms is as great as the number of hits. That is, the 198 positive results overstate the number of people with Lurgi by a factor of two: only half of those who test positive actually have the condition. The test cannot tell us which ones are the false alarms, only their expected frequency.
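
To put a number on this, we can compute the share of positive results that are real hits. In medicine this share is usually called the positive predictive value; the function name below is my own, and the counts are the ones from the example above.

```python
def share_of_positives_that_are_hits(hits, false_alarms):
    """Fraction of all positive test results that are true hits."""
    return hits / (hits + false_alarms)


# Counts from the example: 99 hits and 99 false alarms.
share = share_of_positives_that_are_hits(99, 99)
print(share)  # 0.5
```

A share of 0.5 means that a positive result, taken alone, is no better than a coin flip at telling us whether this particular person has Lurgi.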

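At this prevalence the misses are not much of a problem: we expect only 1 among the 10,000 people tested. If you reverse the prevalence numbers, so that 99% have the condition, the false alarm problem goes away, and the miss problem becomes significant: we would expect 99 misses against only 99 correct rejections, so half of the negative results would be wrong. For prevalence figures in the middle, both problems become smaller. At 50% prevalence, for example, we would expect 50 false alarms against 4,950 hits, and 50 misses against 4,950 correct rejections.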
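
Here is a small sketch that recomputes the expected table at three prevalence figures, so the shift from a false alarm problem to a miss problem can be seen directly. It assumes, as in the example, that sensitivity and specificity are both 99%; the helper name `expected_table` is hypothetical.

```python
from fractions import Fraction


def expected_table(n, prevalence, accuracy):
    """Return (hits, misses, false_alarms, correct_rejections).

    Assumes sensitivity and specificity are both equal to `accuracy`.
    """
    with_condition = n * prevalence
    without_condition = n - with_condition
    hits = with_condition * accuracy
    correct_rejections = without_condition * accuracy
    return (
        hits,
        with_condition - hits,
        without_condition - correct_rejections,
        correct_rejections,
    )


accuracy = Fraction(99, 100)
results = {}
for percent in (1, 50, 99):
    results[percent] = expected_table(10_000, Fraction(percent, 100), accuracy)
    hits, misses, false_alarms, correct_rejections = results[percent]
    print(
        f"prevalence {percent}%: hits={hits}, misses={misses}, "
        f"false alarms={false_alarms}, correct rejections={correct_rejections}"
    )
```

Note that the total number of errors is 100 in every case; what changes with prevalence is which kind of error dominates and how it compares with the correct results of the same kind.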

With poorer tests (less power), we get more errors. With even lower prevalence, we get even more false alarms relative to hits.
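
Both effects can be checked by computing the expected number of false alarms per hit. The sketch below compares the example with two variations that I have made up purely for illustration: a prevalence of 0.1% with the same test, and the original 1% prevalence with a test that is only 90% sensitive and 90% specific.

```python
from fractions import Fraction


def false_alarms_per_hit(prevalence, sensitivity, specificity):
    """Expected false alarms for every true hit (hypothetical helper)."""
    false_alarm_share = (1 - prevalence) * (1 - specificity)
    hit_share = prevalence * sensitivity
    return false_alarm_share / hit_share


pct = Fraction(1, 100)
baseline = false_alarms_per_hit(1 * pct, 99 * pct, 99 * pct)  # the example
rarer = false_alarms_per_hit(pct / 10, 99 * pct, 99 * pct)    # assumed 0.1% prevalence
poorer = false_alarms_per_hit(1 * pct, 90 * pct, 90 * pct)    # assumed 90%/90% test

for label, ratio in (("baseline", baseline), ("rarer", rarer), ("poorer", poorer)):
    print(f"{label}: {float(ratio):.2f} false alarms per hit")
```

In the example there is one false alarm per hit; under either variation there are roughly ten or more.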

In medicine and statistics in general:

  1. hits are called true positives
  2. misses are called false negatives
  3. correct rejections are called true negatives
  4. false alarms are called false positives

You may encounter these terms. The language I have used is both more intuitive and the language of the formal method underlying this type of analysis, Signal Detection Theory. Once you understand it, you can relate it to smoke detectors, fire alarms, and many other common situations.

 
