With respect to medical testing in general, the prevalence of a condition in the population has a huge effect on the number of false alarms and misses. Depending on the prevalence of the condition, and the performance of the test, the number of false alarms can exceed the number of hits. It is simple arithmetic, but a bit confusing.
If there is 0% infection in the population, any report of infection logically has to be a false alarm; there can be no hits. If there is 100% infection in the population, every report of infection logically has to be a hit; there can be no false alarms. The number of estimated false alarms drops as the percent infection rises.
If there is 0% infection in the population, logically there can be no misses. If there is 100% infection in the population, any report of no infection logically must be a miss. The number of estimated misses rises as the percent infection rises.
This is true even for a test with high sensitivity (it detects infection when infection exists, with few misses) and high selectivity/specificity (it reports no infection when no infection exists, with few false alarms).
As sensitivity increases the number of hits increases and number of misses decreases. With a sensitivity of 100%, which does not happen, there would be no misses. See the graph below for a visual explanation of this.
As selectivity/specificity increases the number of correct rejections increases and the number of false alarms decreases. With a selectivity/specificity of 100%, which does not happen, there would be no false alarms. See the graph below for a visual explanation of this.
If selectivity/specificity is low, you get more false alarms. Combine this with a low prevalence, and the false alarms outnumber the hits. This overstates the prevalence of the infection.
If the sensitivity is low, you get more misses. Combine this with a high prevalence, and the misses outnumber the correct rejections. This understates the prevalence of the infection.
Use the calculations:
- Hits = Sensitivity x Test Cases x Prevalence
- False Alarms = (1 - Selectivity) x Test Cases x (1 - Prevalence)
- Misses = (1 - Sensitivity) x Test Cases x Prevalence
- Correct Rejections = Selectivity x Test Cases x (1 - Prevalence)
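These four calculations can be sketched as a small Python function (the function name and dictionary keys are my own; the arithmetic is exactly the formulas above):

```python
def confusion_counts(test_cases, prevalence, sensitivity, selectivity):
    """Estimate expected counts from test performance and prevalence.

    All rate arguments are fractions in [0, 1]; "selectivity" here
    means selectivity/specificity.
    """
    infected = test_cases * prevalence
    not_infected = test_cases * (1 - prevalence)
    return {
        "hits": sensitivity * infected,                    # true positives
        "false_alarms": (1 - selectivity) * not_infected,  # false positives
        "misses": (1 - sensitivity) * infected,            # false negatives
        "correct_rejections": selectivity * not_infected,  # true negatives
    }

# Example: 1000 tests, 2% prevalence, 90% sensitivity and 90% selectivity
print(confusion_counts(1000, 0.02, 0.90, 0.90))
# hits 18, false alarms 98, misses 2, correct rejections 882
```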
| Test Cases | Prevalence | Hits | False Alarms | Misses | Correct Rejections |
| --- | --- | --- | --- | --- | --- |
Note the changes in 1) false alarms to hits, and 2) misses to correct rejections, as the prevalence increases.
Another Example Showing Ratios of Incorrect to Correct
For a selectivity/specificity of 99%, a sensitivity of 99%, and a prevalence of 1%, 50% of positives will be false alarms. It gets better as the prevalence increases (table below).
For a selectivity/specificity of 99%, a sensitivity of 99%, and a prevalence of 99%, 50% of negatives will be misses. It gets better as the prevalence decreases (table below).
| Prevalence | 0% | 1% | 2% | 50% | 98% | 99% | 100% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| False Positive to True Positive Ratio (FP/TP) | Infinite | 1.00 | 0.49 | 0.01 | 0.00 | 0.00 | 0.00 |
| False Negative to True Negative Ratio (FN/TN) | 0.00 | 0.00 | 0.00 | 0.01 | 0.49 | 1.00 | Infinite |
| False Positive to All Positive (FP/(TP + FP)) | 100.00% | 50.00% | 33.11% | 1.00% | 0.02% | 0.01% | 0.00% |
| False Negative to All Negative (FN/(TN + FN)) | 0.00% | 0.01% | 0.02% | 1.00% | 33.11% | 50.00% | 100.00% |
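These percentages can be recomputed directly from the rates; a minimal sketch (function name is mine, defaults are the 99%/99% test from the example):

```python
def positive_error_shares(prevalence, sensitivity=0.99, selectivity=0.99):
    """Return (FP share of all positives, FN share of all negatives)."""
    tp = sensitivity * prevalence
    fp = (1 - selectivity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = selectivity * (1 - prevalence)
    fp_share = fp / (tp + fp) if (tp + fp) else 0.0
    fn_share = fn / (tn + fn) if (tn + fn) else 0.0
    return fp_share, fn_share

for p in (0.0, 0.01, 0.02, 0.50, 0.98, 0.99, 1.0):
    fp_share, fn_share = positive_error_shares(p)
    print(f"prevalence {p:5.0%}: FP/(TP+FP) {fp_share:7.2%}, FN/(TN+FN) {fn_share:7.2%}")
```

At 1% prevalence half of all positives are false alarms, even with a 99%/99% test.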
In medical testing, we need to establish some test measure, some quantification of the condition of interest. We then need some method of assessment for determining the value of that measure. We also need a consistent and reliable protocol for administering and scoring the test.
Discriminating Signal from Noise
We can look at the measure as the signal, and at spurious readings as noise: extraneous information that makes the signal harder to detect.
We need some method to discriminate the signal from the noise. Different tests differ in their ability to separate signal from random noise; that is to say, not all tests give the same level of performance.
The Right and the Wrong of It
In signal detection theory, there are two ways to be wrong: false alarms and misses, and there are two ways to be right: hits and correct rejections.
- Hits are a measure of how many with the condition are correctly identified as having the condition. This is also called a true positive (TP).
- False alarms are a measure of how many without the condition are incorrectly identified as having the condition. This is also called a false positive (FP).
- Misses are a measure of how many with the condition are incorrectly identified as not having the condition. This is also called a false negative (FN).
- Correct rejections are a measure of how many without the condition are correctly identified as not having the condition. This is also called a true negative (TN).
Sensitivity, given as a fraction or a percent, is the ability of a test to correctly identify those who have the condition. It can only be assessed against the percentage of those who have the condition, the prevalence. It gives the hit rate, and its complement (1 - sensitivity) gives the miss rate.
Selectivity/specificity, given as a fraction or a percent, is the ability of a test to correctly identify those who do not have the condition. It can only be assessed against the percentage of those who do not have the condition, the infrequency. It gives the correct rejection rate, and its complement (1 - selectivity/specificity) gives the false alarm rate.
Setting a Threshold
For a given test method we establish a threshold, some cut-off value, for our measurement. We use this to determine if we are getting a signal, or just noise. Above the threshold a measure will be deemed to be a detected signal, below the threshold will be no detected signal. We can set the threshold to bias the detection one way or another. The resulting differences in type of error will be dependent upon the threshold setting.
The ratio of hits to misses depends on the threshold, as does the ratio of correct rejections to false alarms. A decreased threshold shifts the bias towards more hits and fewer misses. At the same time, it shifts the bias towards more false alarms and fewer correct rejections. So hits and false alarms rise and fall in the same direction according to the bias.
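The threshold trade-off can be sketched with two overlapping score distributions; everything here (the normal distributions, their means, the function name) is an illustrative assumption, not part of the text:

```python
from statistics import NormalDist

# Hypothetical score distributions: noise (no condition) vs. signal (condition).
noise = NormalDist(mu=0.0, sigma=1.0)
signal = NormalDist(mu=2.0, sigma=1.0)

def rates(threshold):
    """Hit rate and false-alarm rate for a 'positive above threshold' rule."""
    hit_rate = 1 - signal.cdf(threshold)         # sensitivity
    false_alarm_rate = 1 - noise.cdf(threshold)  # 1 - selectivity/specificity
    return hit_rate, false_alarm_rate

# Lowering the threshold raises hits and false alarms together.
for t in (0.5, 1.0, 1.5):
    h, fa = rates(t)
    print(f"threshold {t}: hit rate {h:.3f}, false alarm rate {fa:.3f}")
```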
Improving the Detector
A better test gives better discrimination of correct versus incorrect results, that is, better accuracy. This can be accomplished by changing the test, or changing the test protocols.
Looking at Counts
We may have statistics on expected performance for our test, but we also want to calculate estimated statistics for some given number of tests. We will want to count the number of independent tests performed and use those numbers in our calculations.
Prevalence and Infrequency
Prevalence is the estimated measure of the percent of the total population who have the condition.
Infrequency is the complement of prevalence, and is the estimated measure of the percent of the total population who do not have the condition.
Note that I use the word infrequency as an antonym of frequency, which is itself a synonym for prevalence. There may be another term in common use, but I did not find one.
Accuracy, Discrimination and Errors
We can compute a simple measure of accuracy by dividing the total errors by the total number of observations, correct plus erroneous; this gives the error rate, whose complement is the accuracy. With a better test, one with more discriminatory power, the error rate decreases.
With decreasing prevalence, the number of false alarms increases, and the number of misses decreases.
With an increasing prevalence, the number of false alarms decreases, and the number of misses increases.
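A minimal sketch of that error-rate calculation (the function name is mine; the formulas are the ones given earlier):

```python
def error_rate(test_cases, prevalence, sensitivity, selectivity):
    """Overall error rate: (false alarms + misses) / total tests."""
    false_alarms = (1 - selectivity) * test_cases * (1 - prevalence)
    misses = (1 - sensitivity) * test_cases * prevalence
    return (false_alarms + misses) / test_cases

# A more discriminating test (95%/95% instead of 90%/90%) halves the error rate here.
print(round(error_rate(1000, 0.02, 0.90, 0.90), 4))  # 0.1
print(round(error_rate(1000, 0.02, 0.95, 0.95), 4))  # 0.05
```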
Ratio of Errors to Correct
The ratio of false alarms to hits is the infrequency multiplied by the false alarm rate (1 - selectivity/specificity), divided by the prevalence multiplied by the hit rate (the sensitivity).
The ratio of misses to correct rejections is the prevalence multiplied by the miss rate (1 - sensitivity), divided by the infrequency multiplied by the correct rejection rate (the selectivity/specificity).
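These two ratios can be sketched directly from the definitions (function name is mine):

```python
def error_ratios(prevalence, sensitivity, selectivity):
    """Return (false alarms / hits, misses / correct rejections)."""
    infrequency = 1 - prevalence
    fa_to_hits = (infrequency * (1 - selectivity)) / (prevalence * sensitivity)
    misses_to_cr = (prevalence * (1 - sensitivity)) / (infrequency * selectivity)
    return fa_to_hits, misses_to_cr

# 2% prevalence with a 90%/90% test: false alarms dwarf hits.
fa_to_hits, misses_to_cr = error_ratios(0.02, 0.90, 0.90)
print(f"{fa_to_hits:.2f}")    # 5.44 (98 false alarms per 18 hits)
print(f"{misses_to_cr:.4f}")  # 0.0023 (2 misses per 882 correct rejections)
```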
Working through an example
Let me work through an example:
- We have an outbreak of Lurgi in Upper Middlemarsh. It is a terrible disease. 1
- We have reason to believe that 2% of the population is infected. This is the prevalence. So 98% will not be infected. This is the infrequency.
- We have a test for Lurgi, validated so that it identifies 90% of those infected and misses 10%. This is the sensitivity of the test.
- The test also identifies, gives a correct rejection, of 90% of those not infected, and gives a false alarm for 10% of those who are not infected. This is the selectivity/specificity of the test.
- We test 1000 people once. This is the test count. Of those 1000 people, 2% or 20 will be infected. It follows that 98% or 980 will not be infected.
- If we run our test on the 2% infected, it will identify correctly 90% or 18. These are the hits.
- It follows that 10% or 2 of those infected will be missed. These are the misses.
- If we run our test on that 98% not infected, it will identify correctly 90% or 882. These are the correct rejections.
- It follows that 10% or 98 of those not infected will be identified as infected. These are the false alarms.
- We have 18 hits versus 98 false alarms.
- We have 2 misses versus 882 correct rejections.
- It can be concluded that the false alarm to hit ratio of 98/18 = 544% is very bad, a result of this mediocre test accuracy and the low disease prevalence.
- It can be concluded that the miss to correct rejection ratio of 2/882 = 0.23% is far better, a result of the high disease infrequency.
- If we reverse this, changing the prevalence figure to 90% infected, the false alarms become small and the missed infections much larger.
- If we have a better test, we can reduce the error rate.
- We should try to get a better test.
I’m going to put these figures in the table below:
Testing 1000 for Lurgi with Sensitivity of 90% and Selectivity/specificity of 90%

| Was the Effect Observed? \ Does The Condition Exist? | Condition Is Present (2% Estimated Prevalence = 20) | Condition Is Absent (98% Estimated Infrequency = 980) |
| --- | --- | --- |
| Effect Observed | 90% x 20 = 18 Hits | 10% x 980 = 98 False Alarms |
| Effect Not Observed | 10% x 20 = 2 Misses | 90% x 980 = 882 Correct Rejections |

Based on Test Performance, Prevalence, and Number of Tests
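The whole worked example, including the reversal at 90% prevalence, can be sketched in a few lines (function name is mine; defaults are the 90%/90% Lurgi test):

```python
def lurgi_counts(test_cases, prevalence, sensitivity=0.90, selectivity=0.90):
    """Expected hits, misses, false alarms, and correct rejections."""
    infected = test_cases * prevalence
    clear = test_cases - infected
    return {
        "hits": sensitivity * infected,
        "misses": (1 - sensitivity) * infected,
        "false_alarms": (1 - selectivity) * clear,
        "correct_rejections": selectivity * clear,
    }

# 2% prevalence: false alarms (98) swamp hits (18).
low = lurgi_counts(1000, 0.02)
# 90% prevalence: false alarms shrink (10) and misses grow (90).
high = lurgi_counts(1000, 0.90)
for name, counts in ("2% prevalence", low), ("90% prevalence", high):
    print(name, {k: round(v) for k, v in counts.items()})
```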