{{{begin-summary}}}
- $P(a|b) = P(b|a)P(a)/P(b)$ is Bayes’ formula (“Bayes’ rule”, “Bayes’ theorem”); it is just a rewrite of the rules of probability. It is required that $P(b) ≠ 0$.
- Sometimes, we only want to know if $P(h_1|e) > P(h_2|e)$ (probability of hypothesis 1 is greater than probability of hypothesis 2, given the evidence). Then we only have to compare $α P(e|h_1)P(h_1)$ vs. $α P(e|h_2)P(h_2)$, where $α = 1/P(e)$, which we never need to calculate.
- $P(h)$ is the “prior” of a hypothesis (cause/explanation) $h$.
- $P(h|e)$ is the “posterior” of $h$, given evidence $e$ is observed.
{{{end-summary}}}
Imagine building an expert system for medical diagnosis. You may include a rule like,
hasToothache(X) :- hasCavity(X).
The problem is that not every toothache is caused by a cavity. You may expand it thus,
hasToothache(X) :- hasCavity(X).
hasToothache(X) :- hasGumDisease(X).
hasToothache(X) :- hasAbscess(X).
hasToothache(X) :- hadTeethDrilledByAliens(X).
...
Now there are four different possible causes of the toothache. Yet still, some are missing. And cavities do not always cause toothaches. And a person may have both a cavity and an abscess. How do we deal with all these qualifications?
One answer is to use probabilistic reasoning. We will be able to say that cavities cause toothaches only some percentage of the time, and furthermore that having both a toothache and red, swollen gums makes gum disease more likely and a cavity less likely (observing swollen gums counts against the cavity diagnosis).
Russell and Norvig (from the textbook) provide three good reasons why we might choose to use probabilistic reasoning rather than logic-based reasoning for the medical domain:
- Laziness: It is too much work to list the complete set of antecedents or consequents needed to ensure an exceptionless rule and too hard to use such rules.
- Theoretical ignorance: Medical science has no complete theory for the domain.
- Practical ignorance: Even if we know all the rules, we might be uncertain about a particular patient because not all the necessary tests have been or can be run.
We’ll use propositional logic to represent statements that can be true or false. Then, with the following notation, we can talk about how likely those statements are:
Notation | Meaning |
---|---|
$P(a)$ | The probability that $a$ is true. |
$P(a ∧ b)$ | The probability that both $a$ and $b$ are true. |
$P(a \mid b)$ | The probability that $a$ is true, given that $b$ is known to be true. |
$P(¬ a)$ | The probability that $a$ is false. |
Rule | Explanation |
---|---|
$0 ≤ P(a) ≤ 1$ | A probability is always between 0 and 1. |
$P(a) + P(¬ a) = 1$ | The probability of something being true and the probability of the opposite add up to 1. |
$P(a ∧ b) = P(a \mid b)P(b)$ | The probability of two statements being true simultaneously equals the probability that one is true, assuming the other already is known to be true, times the probability that the other is true (i.e., no longer assuming it is). |
$P(a ∨ b) = P(a) + P(b) - P(a ∧ b)$ | The probability of either of two statements being true equals the sum of the probabilities that either is true separately minus the probability they are both true simultaneously. |
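These rules can be checked mechanically. Below is a minimal Python sketch, using made-up numbers for a toy joint distribution over two propositions $a$ and $b$, that verifies each rule in the table:

```python
# A toy joint distribution over two propositions a and b (hypothetical numbers).
# Each entry is P(a=..., b=...); the four entries sum to 1.
joint = {
    (True, True): 0.20,
    (True, False): 0.30,
    (False, True): 0.10,
    (False, False): 0.40,
}

P_a       = sum(p for (a, b), p in joint.items() if a)       # P(a)
P_not_a   = sum(p for (a, b), p in joint.items() if not a)   # P(¬a)
P_b       = sum(p for (a, b), p in joint.items() if b)       # P(b)
P_a_and_b = joint[(True, True)]                              # P(a ∧ b)
P_a_given_b = P_a_and_b / P_b                                # P(a | b)
P_a_or_b  = sum(p for (a, b), p in joint.items() if a or b)  # P(a ∨ b)

assert 0.0 <= P_a <= 1.0                                     # probabilities lie in [0, 1]
assert abs(P_a + P_not_a - 1.0) < 1e-9                       # P(a) + P(¬a) = 1
assert abs(P_a_and_b - P_a_given_b * P_b) < 1e-9             # P(a ∧ b) = P(a|b) P(b)
assert abs(P_a_or_b - (P_a + P_b - P_a_and_b)) < 1e-9        # P(a ∨ b) = P(a) + P(b) - P(a ∧ b)
```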
Sometimes, like in medical diagnosis, we want to think about the propositions as events or causes. For example,
Proposition | Interpretation |
---|---|
$t$ | This person has a toothache. |
$c$ | This person has a cavity. |
$g$ | This person has gum disease. |
We can specify how diseases cause symptoms:
This graph shows us that having a cavity somehow influences the chance that a toothache is also present. This is what we expect (and that’s why I put the arrows in the graph).
This means that it should be the case that,
Suppose that… | Interpretation |
---|---|
$P(t \mid c) ≠ P(t)$ | Knowing that a person has a cavity changes the probability that the person has a toothache. |
On the other hand, consider,
Suppose that… | Interpretation |
---|---|
$P(\text{red hair} \mid c) = P(\text{red hair})$ | Knowing that a person has a cavity does not change the probability that the person has red hair. |
We will say that such propositions are independent: knowing one does not change the probability of the other.
Let’s flesh out the probabilities for the toothache:
$c$ | $g$ | $P(t \mid c ∧ g)$ | $P(¬ t \mid c ∧ g)$ (just $1$ minus the previous column) |
---|---|---|---|
true | true | 1.0 | 0.0 |
true | false | 0.6 | 0.4 |
false | true | 0.3 | 0.7 |
false | false | 0.05 | 0.95 |
We’ll also need to know the chance of having a cavity and, separately, the chance of having gum disease:
$P(c) = 0.10$ $P(g) = 0.05$
To calculate $P(t)$, the overall chance of a toothache, we sum over all four combinations of cavity and gum disease, treating $c$ and $g$ as independent (so, for example, $P(c ∧ g) = P(c)P(g)$):
\begin{equation}
\begin{aligned}
P(t) =& P(t|c ∧ g)P(c ∧ g) \\
&+ P(t|c ∧ ¬ g)P(c ∧ ¬ g) \\
&+ P(t|¬ c ∧ g)P(¬ c ∧ g) \\
&+ P(t|¬ c ∧ ¬ g)P(¬ c ∧ ¬ g) \\
=& P(t|c ∧ g)P(c)P(g) \\
&+ P(t|c ∧ ¬ g)P(c)P(¬ g) \\
&+ P(t|¬ c ∧ g)P(¬ c)P(g) \\
&+ P(t|¬ c ∧ ¬ g)P(¬ c)P(¬ g) \\
=& 1.0*0.10*0.05 \\
&+ 0.6*0.10*(1.0-0.05) \\
&+ 0.3*(1.0-0.10)*0.05 \\
&+ 0.05*(1.0-0.10)*(1.0-0.05) \\
=& 0.11825
\end{aligned}
\end{equation}
If our tables are true, then the chance of some random person having a toothache, assuming you know nothing about their dental history, is 11.8%.
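As a sanity check on the arithmetic, here is a short Python sketch that redoes the marginalization, assuming (as above) that $c$ and $g$ are independent:

```python
# Conditional probabilities P(t | c, g) from the table above.
p_t_given = {
    (True, True): 1.0,     # cavity and gum disease
    (True, False): 0.6,    # cavity only
    (False, True): 0.3,    # gum disease only
    (False, False): 0.05,  # neither
}
p_c, p_g = 0.10, 0.05      # priors P(c) and P(g)

# Marginalize over all four combinations, assuming c and g are independent.
p_t = sum(
    p_t_given[(c, g)]
    * (p_c if c else 1 - p_c)
    * (p_g if g else 1 - p_g)
    for c in (True, False)
    for g in (True, False)
)
print(p_t)  # 0.11825
```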
We can derive some other rules using just a little algebra:
Derived | Notes |
---|---|
$P(a ∧ b) = P(b ∧ a)$ | Due to normal Boolean logic rules. |
$P(a \mid b)P(b) = P(b \mid a)P(a)$ | Due to rule regarding $P(a ∧ b)$, applied to both sides of the first row. |
$P(a \mid b) = P(b \mid a)P(a)/P(b)$ | Bayes’ formula. Of course, it must be that $P(b) ≠ 0$. |
That last derivation is especially interesting to us. Somebody else thought so, too (from Wikipedia):
Why is it interesting? Think about the medical diagnosis problem again.
Probability | Interpretation |
---|---|
$P(t \mid c ∧ ¬ g) = 0.6$ | If someone has a cavity and no gum disease, there is a 60% chance they have a toothache. |
$P(t \mid ¬ c ∧ g) = 0.3$ | If someone has gum disease and no cavities, there is a 30% chance they have a toothache. |
$P(c \mid t) = ?$ | If somebody has a toothache, what is the chance they have a cavity? |
This last row in the table is a very important question. It’s asking us to determine the cause, given the effect.
Here is the calculation, given by Bayes’ formula (expanding $P(t|c)$ over $g$, again using the independence of $c$ and $g$):
\begin{equation}
\begin{aligned}
P(c|t) =& P(t|c)P(c) / P(t) \\
=& (P(t|c ∧ g)P(g) + P(t|c ∧ ¬ g)P(¬ g))P(c) / P(t) \\
=& (1.0*0.05 + 0.6*(1.0-0.05))*0.10 / 0.11825 \\
=& 0.5243
\end{aligned}
\end{equation}
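The Bayes’ calculation is just as short in code; this sketch reuses $P(t) = 0.11825$ from before:

```python
# P(t | c) = P(t | c ∧ g)P(g) + P(t | c ∧ ¬g)P(¬g), using independence of c and g.
p_t_given_c = 1.0 * 0.05 + 0.6 * (1 - 0.05)   # = 0.62
p_c = 0.10
p_t = 0.11825                                  # computed earlier

# Bayes' formula: P(c | t) = P(t | c) P(c) / P(t)
p_c_given_t = p_t_given_c * p_c / p_t
print(p_c_given_t)  # about 0.524
```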
Suppose we want to compare this probability with $P(g|t)$, the chance that the toothache is caused by gum disease instead.
Then we really want to know whether or not,
\begin{equation} P(c|t) > P(g|t) ≡ \frac{P(t|c)P(c)}{P(t)} > \frac{P(t|g)P(g)}{P(t)} ≡ P(t|c)P(c) > P(t|g)P(g) \end{equation}
Notice the common term, $P(t)$, appears on both sides, so it cancels and we never need to compute it.
Thus, we often write $α = 1/P(t)$, which we never need to calculate:
\begin{equation}
P(c|t) > P(g|t) ? \quad
\begin{aligned}
P(c|t) =& α P(t|c)P(c) \\
P(g|t) =& α P(t|g)P(g)
\end{aligned}
\end{equation}
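Since the $α$ factor is shared, a sketch can compare the two hypotheses using only the numerators, never touching $P(t)$. Note that the expansion of $P(t|g)$ over $c$ below mirrors the one for $P(t|c)$ above (it is not shown in the text, but follows the same pattern):

```python
# Compare P(c | t) vs. P(g | t) without ever computing P(t).
p_t_given_c = 1.0 * 0.05 + 0.6 * (1 - 0.05)   # P(t | c), expanding over g
p_t_given_g = 1.0 * 0.10 + 0.3 * (1 - 0.10)   # P(t | g), expanding over c
p_c, p_g = 0.10, 0.05

score_c = p_t_given_c * p_c   # proportional to P(c | t)
score_g = p_t_given_g * p_g   # proportional to P(g | t)
print(score_c > score_g)      # True: a cavity is the more likely cause
```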
Suppose we have the following causal graph: $X$ is a parent of both $Y$ and $Z$, and $Y$ is also a parent of $Z$.
We’ll need to give the conditional probabilities of all the nodes with parents ($Y$ and $Z$), and the unconditional (a priori) probability of the node without a parent ($X$).
Table for $Z$ (parents $X$ and $Y$):
$x$ | $y$ | $P(z \mid x ∧ y)$ | $P(¬ z \mid x ∧ y)$ |
---|---|---|---|
true | true | | |
true | false | 0.4 | 0.6 |
false | true | | |
false | false | 0.9 | 0.1 |
Table for $Y$ (parent $X$):
$x$ | $P(y \mid x)$ | $P(¬ y \mid x)$ |
---|---|---|
true | 0.3 | 0.7 |
false | 0.1 | 0.9 |
Table for $X$ (no parents):
$P(x)$ | $P(¬ x)$ |
---|---|
0.4 | 0.6 |
Now, what is $P(z ∧ ¬ y ∧ x)$?
\begin{equation}
\begin{aligned}
P(z ∧ ¬ y ∧ x) =& P(z | ¬ y ∧ x)P(¬ y ∧ x) \\
=& P(z | ¬ y ∧ x)P(¬ y | x)P(x) \\
=& 0.4 * (1.0-0.3) * 0.4 \\
=& 0.112
\end{aligned}
\end{equation}
What if we don’t know (or care) about the value of $X$? Then we sum over both possibilities:
\begin{equation}
\begin{aligned}
P(z ∧ ¬ y) =& P(z ∧ ¬ y | x)P(x) + P(z ∧ ¬ y | ¬ x)P(¬ x) \\
=& P(z | ¬ y ∧ x)P(¬ y | x)P(x) + P(z | ¬ y ∧ ¬ x)P(¬ y | ¬ x)P(¬ x) \\
=& 0.4 * (1.0 - 0.3) * 0.4 + 0.9 * (1.0 - 0.1) * (1.0 - 0.4) \\
=& 0.598
\end{aligned}
\end{equation}
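Both of these queries are easy to script. Here is a sketch that uses only the table entries the calculations above needed:

```python
# Probabilities read off the tables (only the entries these queries need).
p_x = 0.4                                    # P(x)
p_y_given_x = {True: 0.3, False: 0.1}        # P(y | x), keyed by the value of x
p_z_given_noty_x = {True: 0.4, False: 0.9}   # P(z | ¬y ∧ x), keyed by the value of x

# Chain rule along the graph: P(z ∧ ¬y ∧ x) = P(z | ¬y ∧ x) P(¬y | x) P(x)
joint = p_z_given_noty_x[True] * (1 - p_y_given_x[True]) * p_x
print(joint)  # 0.112

# Marginalize x out: P(z ∧ ¬y) = sum over x of P(z | ¬y ∧ x) P(¬y | x) P(x)
marginal = sum(
    p_z_given_noty_x[x] * (1 - p_y_given_x[x]) * (p_x if x else 1 - p_x)
    for x in (True, False)
)
print(marginal)  # 0.598
```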
Here’s another example. This one models the causes of a possible report of a fire alarm and a possible report of smoke.
Each node in this network has a conditional probability table giving the chance it is true for every combination of its parents’ values: tampering and fire have no parents; alarm depends on tampering and fire; smoke depends on fire; leaving depends on alarm; and report depends on leaving. The entries we will need below are: the chance of the alarm given fire and no tampering is 0.99; the chance of leaving is 0.88 given the alarm and 0.0 without it; and the chance of a report is 0.75 given leaving and 0.01 otherwise.
Now, suppose there is a fire and the alarm was not tampered with. What is the probability that somebody will report a fire? Notice that people have to leave the building before somebody will report the fire.
Let,
- $report=T ≡ r$, $report=F ≡ \bar{r}$
- $leaving=T ≡ l$, $leaving=F ≡ \bar{l}$
- $alarm=T ≡ a$, $alarm=F ≡ \bar{a}$
- $tampering=T ≡ t$, $tampering=F ≡ \bar{t}$
- $fire=T ≡ f$, $fire=F ≡ \bar{f}$
Ok, here we go! (In the middle step below, we use the network structure: report depends only on leaving, and leaving depends only on the alarm, so the other conditions drop out.)
\begin{equation}
\begin{aligned}
P(r | \bar{t} ∧ f) =& P(r | l ∧ \bar{t} ∧ f)P(l | \bar{t} ∧ f) \\
&+ P(r | \bar{l} ∧ \bar{t} ∧ f)P(\bar{l} | \bar{t} ∧ f) \\
=& P(r | l ∧ \bar{t} ∧ f)(P(l | a ∧ \bar{t} ∧ f)P(a | \bar{t} ∧ f) + P(l | \bar{a} ∧ \bar{t} ∧ f)P(\bar{a} | \bar{t} ∧ f)) \\
&+ P(r | \bar{l} ∧ \bar{t} ∧ f)(P(\bar{l} | a ∧ \bar{t} ∧ f)P(a | \bar{t} ∧ f) + P(\bar{l} | \bar{a} ∧ \bar{t} ∧ f)P(\bar{a} | \bar{t} ∧ f)) \\
=& P(r | l)(P(l | a)P(a | \bar{t} ∧ f) + P(l | \bar{a})P(\bar{a} | \bar{t} ∧ f)) \\
&+ P(r | \bar{l})(P(\bar{l} | a)P(a | \bar{t} ∧ f) + P(\bar{l} | \bar{a})P(\bar{a} | \bar{t} ∧ f)) \\
=& 0.75*(0.88*0.99 + 0.0*0.01) + 0.01*(0.12*0.99 + 1.0*0.01) \\
=& 0.655
\end{aligned}
\end{equation}
Wow, that was painful. It could be much worse. We didn’t even use Bayes’ formula in that derivation, because we never needed to “reverse” the causal arrows in the graph.
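For comparison, the same enumeration takes only a few lines of Python. This sketch uses just the five table entries the derivation needed; once we condition on tampering being false and fire being true, the rest of the network (including smoke) is irrelevant to this query:

```python
# Probabilities taken from the tables (only those needed once we condition on ¬t ∧ f).
p_a = 0.99                              # P(alarm | ¬tampering ∧ fire)
p_l_given = {True: 0.88, False: 0.0}    # P(leaving | alarm), keyed by alarm
p_r_given = {True: 0.75, False: 0.01}   # P(report | leaving), keyed by leaving

# Sum over the hidden variables alarm and leaving:
# P(r | ¬t ∧ f) = sum over a, l of P(r | l) P(l | a) P(a | ¬t ∧ f)
p_report = sum(
    p_r_given[l]
    * (p_l_given[a] if l else 1 - p_l_given[a])
    * (p_a if a else 1 - p_a)
    for a in (True, False)
    for l in (True, False)
)
print(p_report)  # about 0.655
```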
If we asked instead, “what is the chance there is a fire when somebody reports a fire?” then we would need Bayes’ formula.
Let’s use some software to perform these calculations for us. Visit AISpace, specifically the Belief and Decision Networks page. Download the Java program.
Start it up, and you see this:
Click “File > Load Sample Problem” and choose “Fire Alarm Belief Network.” Now you have this:
Click the “Solve” tab. Choose the “Make Observation” tool button. Then, click the “tampering” node and choose “F” (false):
Do the same for the “fire” node but select “T” (true).
We have set the assumed/observed events. Now we want to know: what is the chance of a report of a fire? I.e., we want to know $P(r | \bar{t} ∧ f)$.
Click the “Query” tool and then click the “report” node. Select “Brief”.
You get the answer, which agrees with our hand calculation of about 0.655.
Next, we’ll ask “how likely is the observation?” I.e., what is $P(\bar{t} ∧ f)$?
This is easy. After the observations have been set, just click the “P(e) Query” tool (which means “probability of the evidence” a.k.a. the observations).
So, it seems the evidence itself (a fire with no tampering) is quite unlikely.
Finally, we’ll ask for a Bayesian inference. Clear the observations (set the observations to “<none>” for the “tampering” and “fire” nodes).
Then, make a “T” observation for “report.” Then query the “fire” node.
It seems that the chance of a fire if somebody reports it is 23.7%.