What's the probability you have cancer if you are a smoker? Actually
nobody can answer that question exactly, but it is a special case of
the crucial general problem of how we update our beliefs when we
discover new evidence. And to do that properly you need Bayes
Theorem, which we explain in purely lay terms here and mathematically
here (we strongly recommend you read the lay introduction before
proceeding).
You have some prior belief about something (let's call it A) being
true or not. For example, A might be the statement that a randomly
selected person has cancer, or it might be the statement that Spurs
will win the FA Cup next year.
Now you find out some new piece of information (let's call it
B) which is
relevant to your belief. In the cancer example, B might be the
information that the selected person is a smoker; in the
Spurs example B might
be the information that Spurs' star player will be unavailable for the
next year due to injury.
In the cancer case you feel intuitively that you need to revise your
prior upwards (i.e. the probability should increase) and in the Spurs
case you
need to revise your prior downwards (i.e. the probability should
decrease). But by how much in each case?
Let's write the prior as P(A), meaning "the probability that A is true".
What we want is a way to calculate "the probability that A is true given
that B is true", which we write as P(A|B). This is also called the
posterior belief.
Bayes Theorem is a formula for calculating the posterior from
the prior. It involves finding the probabilities of two other
statements:
One of these
is your prior belief about B,
i.e. P(B).
In the cancer case P(B)
is simply the proportion of the relevant population who are
smokers.
The other is
what is called the likelihood,
namely the probability of observing the evidence given that the
original statement is true. In other words this is P(B|A); in the cancer
case, this is the probability that a person is a smoker if we know they
have cancer.
We can find out P(B|A) simply by
finding the
proportion of known cancer patients
who are smokers.
Now Bayes Theorem is simply the following formula:
P(A|B) = P(B|A) x P(A) / P(B)
So we get our posterior P(A|B) by multiplying the prior P(A) by the
likelihood P(B|A) and dividing by P(B).
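The rule just described (multiply the prior by the likelihood, then divide by P(B)) can be sketched as a small Python function. This is only an illustrative sketch: the function name `posterior`, its parameter names, and the input checks are assumptions made for this example and are not part of the original text or of any tool mentioned in it.

```python
def posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B).

    prior      -- P(A), belief in A before seeing B
    likelihood -- P(B|A), probability of seeing B if A is true
    evidence   -- P(B), overall probability of seeing B
    """
    for name, p in (("prior", prior), ("likelihood", likelihood), ("evidence", evidence)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {p}")
    if evidence == 0.0:
        raise ValueError("P(B) is zero, so P(A|B) is undefined")

    joint = likelihood * prior  # P(A and B)
    if joint > evidence + 1e-12:
        # P(A and B) can never be larger than P(B)
        raise ValueError("inconsistent inputs: P(B|A) * P(A) exceeds P(B)")
    return joint / evidence


# Made-up illustrative values, not taken from the text.
print(posterior(prior=0.2, likelihood=0.5, evidence=0.25))
```

The consistency check is a design choice rather than part of the theorem: it catches inputs that could not all describe the same population, which would otherwise yield a "probability" greater than 1.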
So, in the cancer case, suppose the relevant population is the set of
people coming into a chest clinic. Suppose that Ricky comes into the
clinic for the first time. Our prior that Ricky has cancer, P(A),
will typically be based on data from the clinic. If 10% of people who
registered with the clinic have been diagnosed with cancer then our
prior P(A)=0.1.
We should also know the proportion of registered patients who are
smokers. Suppose it is 50%. Then P(B)=0.5.
Finally,
suppose we know that 80% of patients diagnosed with cancer are smokers.
Then P(B|A)=0.8. So, using Bayes Theorem
we can compute the posterior P(A|B) as
P(A|B) = (0.8 x 0.1) / 0.5 = 0.16
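The arithmetic above can be cross-checked by counting heads. The sketch below assumes a hypothetical clinic of 1000 registered patients; the head counts are invented purely so that they match the 10%, 50% and 80% figures in the text, and all variable names are illustrative.

```python
# Hypothetical head counts, chosen only to match the percentages in the text.
total_patients = 1000
cancer_patients = 100        # 10% of patients          -> P(A)   = 0.1
smokers = 500                # 50% of patients          -> P(B)   = 0.5
smokers_with_cancer = 80     # 80% of cancer patients   -> P(B|A) = 0.8

prior = cancer_patients / total_patients
evidence = smokers / total_patients
likelihood = smokers_with_cancer / cancer_patients

# Route 1: Bayes Theorem.
via_bayes = likelihood * prior / evidence

# Route 2: look only at the smokers and count how many have cancer.
via_counting = smokers_with_cancer / smokers

print(round(via_bayes, 2), round(via_counting, 2))  # prints: 0.16 0.16
```

Both routes agree because Bayes Theorem is just a rearrangement of the counts: of the 500 smokers, 80 have cancer.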
Thus, if we discover that Ricky is a smoker our belief in Ricky having
cancer increases from 0.1 to 0.16. This is not a dramatic increase. If
you had to put a bet on it, you still wouldn't bet that Ricky has
cancer without any other evidence. In practice the results of
diagnostic tests will provide further evidence, and you can use Bayes
Theorem again to revise your belief.
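Revising the belief again works the same way: the posterior from the smoking evidence becomes the prior for the next piece of evidence. The sketch below is a hedged illustration only. The test characteristics (`sensitivity`, `false_positive_rate`) are invented numbers, not data from the text, and it assumes that the test result depends only on whether the person has cancer, not on whether they smoke. The helper `update` is a hypothetical name; it obtains the probability of the evidence from the two conditional probabilities rather than taking it as an input.

```python
def update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Posterior via Bayes Theorem, computing P(evidence) from both cases."""
    # P(evidence) = P(evidence | A) P(A) + P(evidence | not A) P(not A)
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1.0 - prior)
    if p_evidence == 0.0:
        raise ValueError("the evidence has probability zero under these inputs")
    return p_evidence_if_true * prior / p_evidence


belief_after_smoking = 0.16   # posterior from the worked example in the text

# Hypothetical test characteristics, invented for illustration only.
sensitivity = 0.9             # assumed P(positive test | cancer)
false_positive_rate = 0.1     # assumed P(positive test | no cancer)

belief_after_test = update(belief_after_smoking, sensitivity, false_positive_rate)
print(round(belief_after_test, 3))
```

With these made-up figures a positive test would raise the belief from 0.16 to roughly 0.63; different test characteristics would of course give a different result.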
You can download this model
(right click and save as) and then open it in the AgenaRisk tool to
see this example running.
The great thing about Bayes Theorem is that it can also be used in
those very common cases where you don't have much statistical data,
but you do have subjective judgements (possibly of experts).