In the section on confidence intervals we considered the classic problem of using sampling to make inferences about an unknown population parameter, such as the percentage of people who support a particular candidate in a forthcoming election. We saw that frequentist statisticians use the confidence interval approach, which involves statements like:
Statement A: Support for candidate Joe Bloggs now stands at 43%. The margin of error is plus or minus three percent, with confidence at the 95% level.
We explained in <link> that Statement A is NOT the same as the following statement:
Statement B: There is a 95% probability that support for candidate Joe Bloggs lies between 40 and 46%.
even though most lay people assume it is.
As we saw, the true meaning of Statement A is complicated and unnatural, and the complication arises because frequentists cannot speak about the probability of unknown parameters. So, whereas most people assume that a confidence interval is a statement about the probability that the population percentage lies within certain bounds, it is in fact something much more complex.
The notion of the probability of an unknown parameter is, however, natural in the Bayesian approach. Statement B is exactly the kind of statement the Bayesian approach yields; such statements are much more meaningful and easier to understand.
In the Bayesian approach we start with some prior assumptions about the population proportion, which we express as a probability distribution. These assumptions might range from total ignorance, where any value between 0 and 100 is equally likely (this is referred to as a uniform[0,100] distribution), to something quite specific, such as the population proportion having a mean of 30 with a variance of 50. Example assumptions are shown in FIG.
| Prior: Uniform across the whole range 0 to 100 | Prior: Normal with mean 30, variance 50 |
| Prior: Uniform across the range 20 to 80 | Prior: Beta with alpha parameter 4, beta parameter 10 |
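The four example priors can be written down numerically. The following is a minimal sketch, assuming the population proportion is restricted to the whole-number percentages 0 to 100 (a grid approximation chosen here for simplicity); the function names are illustrative and do not come from any particular Bayesian net tool.

```python
import math

# Candidate population proportions, in percent (0, 1, ..., 100).
# Restricting to whole percentages is an assumption made for this sketch.
GRID = list(range(101))

def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def uniform_prior(low=0, high=100):
    """Equal weight on every grid value from low to high inclusive."""
    return normalize([1.0 if low <= x <= high else 0.0 for x in GRID])

def normal_prior(mean, variance):
    """Normal density evaluated on the grid (so truncated to 0..100)."""
    return normalize([math.exp(-(x - mean) ** 2 / (2 * variance)) for x in GRID])

def beta_prior(alpha, beta):
    """Beta(alpha, beta) density on the proportion x/100, evaluated on the grid."""
    return normalize([(x / 100) ** (alpha - 1) * (1 - x / 100) ** (beta - 1)
                      for x in GRID])

def prior_mean(prior):
    """Mean of a distribution over the grid, in percent."""
    return sum(x * w for x, w in zip(GRID, prior))

priors = {
    "Uniform[0,100]": uniform_prior(),
    "Normal(mean 30, variance 50)": normal_prior(30, 50),
    "Uniform[20,80]": uniform_prior(20, 80),
    "Beta(4,10)": beta_prior(4, 10),
}
for name, prior in priors.items():
    print(f"{name}: prior mean = {prior_mean(prior):.1f}")
```

Each prior is simply a list of 101 probabilities, one per candidate percentage, which is all that a later update with Bayes' Theorem needs.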
In the Bayesian approach we simply revise the probability distribution of the unknown population proportion when we observe data, using Bayes' Theorem. In practice we do this without doing any calculations manually, by using a Bayesian net tool, as shown in <FIG>.

The number observed is modelled as a Binomial(n, p) distribution, where n is the sample size and p is the population proportion divided by 100.
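Written out, the model and the update can be sketched as follows. The symbols θ (the population proportion, in percent) and x (the number observed) are notation introduced here for illustration, and the sum in the denominator assumes θ takes a finite set of values; for a continuous prior it becomes an integral.

```latex
X \mid \theta \sim \mathrm{Binomial}(n,\ p), \qquad p = \frac{\theta}{100}

P(\theta \mid X = x)
  = \frac{P(X = x \mid \theta)\, P(\theta)}{\sum_{\theta'} P(X = x \mid \theta')\, P(\theta')}
  \;\propto\; P(\theta)\, \binom{n}{x}\, p^{x} (1 - p)^{n - x}
```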
When we enter data about a sample, the Bayesian net recalculates the distribution of the (unknown) population proportion, as shown here:

In this example we started with the assumption of total ignorance (uniform[0,100] prior), but note how the distribution of the population proportion changes to something that looks like a Normal distribution with a mean of 30. The lines on the graph show the 2.5 and 97.5 percentiles, so that 95% of the distribution lies between these lines. In this case the left-hand percentile is about 18 and the right-hand about 42. This means that there is a 95% probability that the population proportion lies within 30 plus or minus 12.
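The same kind of calculation can be reproduced by hand on a grid of whole-number percentages. The sketch below is not the Bayesian net tool itself, and the sample (n = 50 with 15 observed) is an assumption made purely for illustration, since the text does not state which sample was entered for this graph; the printed interval therefore need not match the figure exactly.

```python
import math

def posterior_on_grid(prior, n, observed):
    """Bayes' Theorem on a grid: posterior weight is proportional to
    prior weight times the Binomial likelihood of the observed count."""
    weights = []
    for percent, prior_weight in enumerate(prior):
        p = percent / 100  # population proportion divided by 100
        likelihood = math.comb(n, observed) * p ** observed * (1 - p) ** (n - observed)
        weights.append(prior_weight * likelihood)
    total = sum(weights)
    return [w / total for w in weights]

def percentile(dist, q):
    """Smallest grid value whose cumulative probability reaches q."""
    cumulative = 0.0
    for percent, weight in enumerate(dist):
        cumulative += weight
        if cumulative >= q:
            return percent
    return len(dist) - 1

# Total ignorance: every whole percentage from 0 to 100 is equally likely.
uniform = [1 / 101] * 101

# ASSUMPTION: sample size and count are illustrative, not taken from the figure.
post = posterior_on_grid(uniform, n=50, observed=15)

mean = sum(percent * weight for percent, weight in enumerate(post))
low, high = percentile(post, 0.025), percentile(post, 0.975)
print(f"posterior mean = {mean:.1f}, 95% interval = [{low}, {high}]")
```

Because the result is a probability distribution over the population proportion, the interval between the two percentiles can be read directly as "there is a 95% probability that the proportion lies in this range".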
The resulting distributions of the population proportion for alternative priors and sample data are shown here:
| A: n=200, observed 60 (with Uniform[0,100] prior) | B: n=1000, observed 300 (with Uniform[0,100] prior) |
| C: n=50, observed 15 (with Normal[30,50] prior) | D: n=50, observed 15 (with Beta[4,10] prior) |
| E: n=50, observed 15 (with Normal[60,50] prior) | F: n=1000, observed 300 (with Normal[60,50] prior) |
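The six scenarios can also be explored numerically. This is a rough sketch under the assumption that the population proportion is restricted to whole-number percentages; it works with log-likelihoods so that the n=1000 cases do not underflow, and all function names are illustrative rather than part of any tool.

```python
import math

GRID = list(range(101))  # candidate population proportions, in percent

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def uniform_prior():
    return normalize([1.0] * len(GRID))

def normal_prior(mean, variance):
    return normalize([math.exp(-(x - mean) ** 2 / (2 * variance)) for x in GRID])

def beta_prior(alpha, beta):
    return normalize([(x / 100) ** (alpha - 1) * (1 - x / 100) ** (beta - 1)
                      for x in GRID])

def log_likelihood(p, n, observed):
    """Log of the Binomial likelihood, dropping the constant n-choose-observed term."""
    if p == 0.0:
        return 0.0 if observed == 0 else -math.inf
    if p == 1.0:
        return 0.0 if observed == n else -math.inf
    return observed * math.log(p) + (n - observed) * math.log(1 - p)

def posterior(prior, n, observed):
    """Posterior on the grid, computed in log space and then normalized."""
    logs = [(math.log(w) if w > 0 else -math.inf) + log_likelihood(x / 100, n, observed)
            for x, w in zip(GRID, prior)]
    peak = max(logs)
    return normalize([math.exp(v - peak) for v in logs])

def summarize(dist):
    """Return (mean, 2.5 percentile, 97.5 percentile) on the grid."""
    mean = sum(x * w for x, w in zip(GRID, dist))
    cumulative, low, high = 0.0, None, None
    for x, w in zip(GRID, dist):
        cumulative += w
        if low is None and cumulative >= 0.025:
            low = x
        if high is None and cumulative >= 0.975:
            high = x
    if high is None:  # guard against rounding in the cumulative sum
        high = GRID[-1]
    return mean, low, high

cases = {
    "A": (uniform_prior(), 200, 60),
    "B": (uniform_prior(), 1000, 300),
    "C": (normal_prior(30, 50), 50, 15),
    "D": (beta_prior(4, 10), 50, 15),
    "E": (normal_prior(60, 50), 50, 15),
    "F": (normal_prior(60, 50), 1000, 300),
}
results = {name: summarize(posterior(prior, n, observed))
           for name, (prior, n, observed) in cases.items()}
for name, (mean, low, high) in results.items():
    print(f"{name}: mean = {mean:.1f}, 95% interval = [{low}, {high}]")
```

Comparing the printed rows side by side gives a quick way to see how the sample size and the choice of prior each affect the resulting distribution.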
In each case we have assumed that the sample proportion is 30%. Some key things to note about these results are: