COVID-19 Testing and Bayes’ Theorem

Author

Jasper Yang

Published

December 15, 2020

In the nine months since COVID-19 diagnostic tests were first made available to the public in March, they have served valiantly as one of the world’s greatest tools for tracking the sweeping spread of the disease. Widespread testing has also enabled the return of professional sports, made the partial re-opening of many college campuses possible, and provided a sense of safety for small social gatherings.

But just how reliable are they? The answer carries far more weight for the individual than for the population: a few inaccurate results out of 100 won’t change much in health officials’ overall tracking of the disease, but for the few people the errors affect, a wrong result could be the difference between life and death for a loved one. Studies suggest the true error rate is probably somewhere around this “few in a hundred” mark, though accuracy varies with factors such as the type of test and the time since exposure. For this reason, CDC guidelines clearly state that a negative test doesn’t clear someone of the possibility of being infected with COVID, yet many people are still willing to act as though it does. In some cases, this may be a reasonable risk to take, but a more thorough assessment of the risks requires an understanding of the importance of prior probability.

Bayes’ Theorem and COVID Testing

Really, a COVID test should be treated as no more than a (quite strong) piece of evidence in a larger pool of information. If person A, who had a lengthy indoor dinner with a symptomatic friend, decides to get tested after experiencing symptoms and gets a positive result, they can be almost certain that they have COVID. If person B, who follows social distancing guidelines strictly, gets a precautionary test that comes back positive, there is still a fair chance that they have COVID, but their probability of being infected is much lower than person A’s. The only difference between these cases is the “other” evidence available in the larger pool, and it matters. Bayes’ Theorem helps us understand why and allows us to quantify the difference. In its simplest form, the theorem can be written as: \[P(A|B)= \frac{P(A)*P(B|A)}{P(B)}\]

Where:

  • \(P(A|B)\) is the updated, or posterior, probability of \(A\) given the evidence \(B\).
  • \(P(A)\) is the prior probability of \(A\) before considering the evidence \(B\).
  • \(P(B|A)\) is the conditional probability of the evidence \(B\) occurring supposing that the outcome \(A\) is in fact true.
  • \(P(B)\) is the prior probability of the evidence \(B\) occurring before it actually occurred. This is the “universe” of our equation, which we are dividing to take only the portion where \(A\) is true.

Essentially, Bayes’ Theorem tells us how much a new piece of evidence alters the probability we assigned to the outcome before receiving that evidence. This altered probability is called the posterior probability, or \(P(A|B)\).
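In code, the simple form of the theorem is a one-liner. The sketch below (in Python; the function and argument names are mine, chosen to mirror \(P(A)\), \(P(B|A)\), and \(P(B)\)) just restates the formula:

```python
def bayes_posterior(prior, likelihood, evidence):
    """Return P(A|B) = P(A) * P(B|A) / P(B)."""
    return prior * likelihood / evidence

# If P(A) = 0.5, P(B|A) = 0.8, and P(B) = 0.6,
# then P(A|B) = 0.5 * 0.8 / 0.6, or about 0.667.
posterior = bayes_posterior(0.5, 0.8, 0.6)
```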

Applying this theorem to our COVID testing case, we have \[P(COVID^+|test\:result)= \frac{P(COVID^+)*P(test\:result|COVID^+)}{P(test\: result)}\]

While it may not look it, Bayes’ Theorem is actually quite intuitive in this simple case. This becomes especially apparent when we consider the denominator, \(P(test\:result)\), as the sum of two distinct parts. Because we don’t know for certain whether we have COVID, one part must account for the probability that the result occurs and we have COVID, and the other part for the probability that the result occurs and we don’t have COVID. Accordingly, we can break \(P(test\:result)\) into the sum of \(P(COVID^+)*P(test\:result|COVID^+)\) and \(P(COVID^-)*P(test\:result|COVID^-)\), giving us a new application of Bayes’ Theorem to COVID:

\[\scriptsize P(COVID^+|test\:result)= \frac{P(COVID^+)*P(test\:result|COVID^+)}{P(COVID^+)*P(test\:result|COVID^+)+P(COVID^-)*P(test\:result|COVID^-)}\]

Now it is easy to see that this theorem simply represents the proportion of all scenarios in which we receive a specific test result where we also have COVID.
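The expanded form translates just as directly into code. This hypothetical helper (a sketch; the names are mine, not from any library) builds the denominator from the two parts described above:

```python
def posterior_prob(prior, p_result_given_covid, p_result_given_no_covid):
    """P(COVID+ | test result), with P(test result) expanded
    via the law of total probability."""
    evidence = (prior * p_result_given_covid
                + (1 - prior) * p_result_given_no_covid)
    return prior * p_result_given_covid / evidence

# A made-up illustration: prior 0.5, P(result|COVID+) = 0.8,
# P(result|COVID-) = 0.1 gives 0.4 / (0.4 + 0.05), about 0.889.
print(posterior_prob(0.5, 0.8, 0.1))
```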

Interpreting COVID Test Results: An Example

Scenario One: Let’s suppose that you work from home, strictly follow social distancing guidelines, and avoid leaving the house at all costs but live with one roommate who recently traveled out-of-state to visit family. Upon their return, they start to develop a fever and decide to get a COVID test, which comes back positive. You isolate from them immediately and decide to get a Rapid Antigen test five days later (to increase the chances of an accurate result by allowing the virus to incubate). It comes back negative. What is the probability that you have COVID?

Recalling our previously outlined Bayesian approach to this question, we need to be able to estimate three probabilities:

  • \(P(COVID^+)\): The prior probability that you have COVID before taking the test. Since you never leave the house, it is nearly certain that the only way you could have been infected with the virus was from your roommate. A recent study estimated the secondary transmission rate of COVID-19 within households to be 53%. This estimate is certainly not perfect, but it doesn’t have to be, so you reason that your prior probability of having COVID before receiving the test result is 0.53. Notice that because you either have COVID or you don’t, \(P(COVID^-)\) is 1 - 0.53 = 0.47.

  • \(P(test\:result|COVID^+)\): The probability that you would test negative given that you have COVID. This probability is called the “false negative rate” of the test. Digging into the research, you find a study that estimates a false negative rate of 20% for the specific test that you received (not bad for a rapid test, given that the false negative rates of rapid antigen tests are thought to be anywhere from 10% to 50% according to a Harvard Medical School blog article). Again, this estimate is probably not perfect, but it’s the best that you have.

  • \(P(test\:result|COVID^-)\): The probability that you would test negative given that you do not have COVID. This probability is called the “specificity” of the test. The same study that you used to find the false negative rate estimates the true negative rate to be 95%.

Plugging these numbers into Bayes’ Theorem, you have:

\[\small \begin{split} P(COVID^+|test^-) & = \frac{P(COVID^+)*P(test^-|COVID^+)}{P(test^-)} \\ & = \frac{P(COVID^+)*P(test^-|COVID^+)}{P(test^-|COVID^+)*P(COVID^+) + P(test^-|COVID^-)*P(COVID^-)} \\ & = \frac{(0.53)(0.20)}{(0.20)(0.53) + (0.95)(1-0.53)} \\ & = 0.192 \end{split} \]

With this information, you determine that based on the evidence you have, there is still a 19.2% chance that you have COVID.
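As a quick check, the arithmetic can be reproduced in a few lines (a sketch; the variable names are mine, and the rates are the estimates quoted above):

```python
prior = 0.53            # P(COVID+): household secondary transmission rate
false_neg_rate = 0.20   # P(test- | COVID+): false negative rate
specificity = 0.95      # P(test- | COVID-): true negative rate

evidence = prior * false_neg_rate + (1 - prior) * specificity
posterior = prior * false_neg_rate / evidence
print(round(posterior, 3))  # 0.192
```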

Scenario Two: Now suppose that your living situation is the same as scenario one, only you do not have a roommate. You can’t imagine that you have the virus given that you never leave the house, but you want to visit your parents for Thanksgiving, so you decide to get a PCR COVID test before leaving just to be safe. To your surprise, it comes back positive. What is the probability that you have COVID?

First, we estimate the probabilities for Bayes’ Theorem:

  • \(P(COVID^+)\): The prior probability that you have COVID before taking the test. Before getting the test, you thought that it was almost impossible for you to have the virus. If you do have it, you reason that you must have picked it up through surface transmission on a grocery delivery. This type of transmission is known to be rare, and you usually wash your hands after touching any deliveries. You research to find an estimate that 2% of your county is currently infected with COVID and use that to assign your delivery driver a 2% chance of carrying COVID. You then find research suggesting that the probability of you getting COVID from touching the same surface as an infected individual is 5%. Thus, you assign a prior probability of (0.05)(0.02) = 0.001, or 0.1%.

  • \(P(test\:result|COVID^+)\): In this scenario, this value is the probability that you would test positive given that you have COVID. This probability is called the “sensitivity” of the test. Digging into the research again, you find a study that estimates a false negative rate of 2% for PCR COVID tests. If 2% of people who have COVID (falsely) test negative, the other 98% must (accurately) test positive, so the sensitivity is 98%.

  • \(P(test\:result|COVID^-)\): The probability that you would test positive given that you do not have COVID. This probability is called the “false positive rate” of the test. The same study that you used to find the test sensitivity estimates a false positive rate of 0.5% for PCR tests.

Plugging these numbers into Bayes’ Theorem, you have:

\[\small \begin{split} P(COVID^+|test^+) & = \frac{P(COVID^+)*P(test^+|COVID^+)}{P(test^+)} \\ & = \frac{P(COVID^+)*P(test^+|COVID^+)}{P(test^+|COVID^+)*P(COVID^+) + P(test^+|COVID^-)*P(COVID^-)} \\ & = \frac{(0.001)(0.98)}{(0.98)(0.001) + (0.005)(0.999)} \\ & = 0.164 \end{split} \]

With this information, you determine that based on the evidence you have, there is only a 16.4% chance that you have COVID.
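The same three-step calculation works here (again a sketch, with my variable names and the estimates quoted in the text):

```python
prior = 0.001           # P(COVID+): surface-transmission estimate
sensitivity = 0.98      # P(test+ | COVID+)
false_pos_rate = 0.005  # P(test+ | COVID-)

evidence = prior * sensitivity + (1 - prior) * false_pos_rate
posterior = prior * sensitivity / evidence
print(round(posterior, 3))  # 0.164
```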

I made a simple app using RShiny that calculates these probabilities based on adjustable inputs. Check it out here.

Another Interesting Bayesian Message

In scenario two above, you conclude that the test result is more likely a false positive than a true positive despite the 0.5% false positive rate. Applying this on a larger scale, let’s suppose that a hospital implements a large-scale testing program in which each of their 2,000 employees is tested weekly for COVID. They strike a deal with a PCR testing company that produces a test with a reported false-positive rate of 0.2% and a false-negative rate of 3% (note that we don’t have great estimates for the true rates for these tests). The COVID case rate in the area is low, and the previous COVID-monitoring program suggested that very few workers had been infected so far over the course of the pandemic. In other words, their COVID-prevention protocols seem to be working.

Assuming that each worker takes similar precautions, we can assign each of the 2,000 the same underlying (prior) probability of being infected with COVID each week. Since their protocols have been effective, and other evidence suggests the employees are much more careful than the average person, we approximate this probability to be 0.12%. We now have all the pieces of Bayes’ Theorem, which we can use to estimate that the posterior probability of COVID infection for an employee who tests positive is only about 0.37.

\[\small \begin{split} P(COVID^+|test^+) & = \frac{P(COVID^+)*P(test^+|COVID^+)}{P(test^+)} \\ & = \frac{P(COVID^+)*P(test^+|COVID^+)}{P(test^+|COVID^+)*P(COVID^+) + P(test^+|COVID^-)*P(COVID^-)} \\ & = \frac{(0.0012)(0.97)}{(0.97)(0.0012) + (0.002)(0.9988)} \\ & = 0.368 \end{split} \]

This result means that almost two thirds of employees who return positive tests each week do not actually have COVID. Since a positive test, regardless of the prior probability, likely means two weeks away from work for the potentially infected individual and their close contacts, this can have some harmful implications for the hospital’s operations. On the other hand, the increased chance of catching the roughly two true positives each week may be worth the consequences.

Regardless, this example illustrates the risk of over-testing a group of individuals that have a very low prior probability of being infected. If the underlying case rate is low, most positives are actually false, even if the false positive rate is small.
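To see the scale of the effect, we can sketch the expected weekly counts for the hospital program (assuming the figures from the worked calculation above: a 0.12% weekly infection probability, 97% sensitivity, and a 0.2% false-positive rate):

```python
employees = 2000
prior = 0.0012          # weekly probability an employee is infected
sensitivity = 0.97      # P(test+ | COVID+)
false_pos_rate = 0.002  # P(test+ | COVID-)

true_positives = employees * prior * sensitivity
false_positives = employees * (1 - prior) * false_pos_rate
share_false = false_positives / (true_positives + false_positives)
print(round(true_positives, 1))   # expected true positives per week
print(round(false_positives, 1))  # expected false positives per week
print(round(share_false, 2))      # share of positives that are false
```

With these inputs, the program flags about six employees per week, and roughly four of those six are false positives, matching the two-thirds figure above.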

Closing Remarks

In reality, every individual either does or doesn’t have COVID. The probability that we are calculating is simply our best guess at the likelihood that someone has COVID based on the evidence that has been collected on the matter*. This probability can prove to be quite valuable when it is used to inform decision making around the disease. Remember, though, that the cost of spreading COVID is extremely high, so it is best practice to err on the side of caution. If there is even a small chance that you have COVID, quarantine and follow CDC guidelines!

*If we wanted to be even more robust in our application of Bayes’ Theorem to COVID, we would use probability densities to capture the uncertainty at play in our determination of priors. Maybe in a future post.