Chapter 1: Plausible Reasoning

  • Extending Deductive Logic to Incomplete Information
    • Jaynes reinterprets probability as an extension of deductive logic to cases of incomplete information.
    • Whereas a classical syllogism (a deductive argument whose conclusion follows necessarily from its premises) runs: \(\text{if } P \text{ then } Q;\; P;\; \text{therefore } Q\)
    • In plausible reasoning we observe Q and infer that P is made more likely — but not proven — by that evidence.
    • Example:
      • A police officer sees a masked man near a broken window.
      • Although many explanations are possible, the observation increases the plausibility that the man is a thief.
      • This isn’t a strict logical deduction but a hypothesis update based on background experience.
    • This is huge because it transforms how we approach uncertainty. Instead of being stuck with a binary true/false framework of traditional deductive logic, Jaynes’s reinterpretation bridges the gap between certainty and uncertainty.
      • Traditional syllogisms guarantee a conclusion if the premises are met. However, in the real world we rarely have complete information.
      • By observing Q and inferring that P becomes more likely (though not proven), we move from rigid deduction to flexible, evidence-based reasoning.
    • Additionally, propositional logic can be used as the building block
    • And plausibility can be quantitatively represented
      • Jaynes asserts that plausibility should be represented by a real number.
      • A higher number indicates a greater degree of belief in the truth of a proposition.
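The policeman example above can be rendered as a Bayesian update. The numbers below are invented purely for illustration (the book gives no figures); the point is only that the observation raises the plausibility far above the prior without proving anything.

```python
# Hypothetical Bayesian update for the policeman example. All numbers
# are assumptions made up for illustration.
p_thief = 0.01               # prior plausibility that the man is a thief
p_obs_given_thief = 0.8      # P(masked man near broken window | thief)
p_obs_given_not = 0.001      # P(same observation | not a thief)

# Bayes' rule: posterior = P(obs | thief) P(thief) / P(obs)
p_obs = p_obs_given_thief * p_thief + p_obs_given_not * (1 - p_thief)
posterior = p_obs_given_thief * p_thief / p_obs

assert posterior > p_thief   # the observation raises the plausibility
print(round(posterior, 3))   # -> 0.89
```

The conclusion is still not certain — a different prior or likelihood would give a different posterior — which is exactly the "made more likely, but not proven" behavior described above.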
  • Desiderata for Consistent Reasoning
    • Jaynes proposes an imaginary robot whose brain is designed to follow a set of carefully chosen rules of plausible reasoning, derived from fundamental desiderata expected in human cognition.
    • Its brain is to be designed by us, so that it reasons according to certain definite rules.
    • These rules will be deduced from simple desiderata which would be desirable in human brains; i.e. a rational person, on discovering that they were violating one of these desiderata, would wish to revise their thinking.
    • Jaynes introduces a set of criteria or desiderata that any such robot must satisfy:
      1. Degrees of plausibility are represented by real numbers
        • Plausibilities are real numbers to allow a clear, unambiguous comparison (e.g., “more plausible” means numerically larger).
      2. Qualitative correspondence with common sense
        • If we update the information from \(C\) to \(C'\) in such a way that the plausibility of \(A\) is increased:
          $$ (A \mid C') > (A \mid C) $$

          but the plausibility for (B) given (A) remains unchanged:

          $$ (B \mid A C') = (B \mid A C) $$

          then this update can only produce an increase (never a decrease) in the plausibility that both (A) and (B) are true:

          $$ (AB \mid C') \geq (AB \mid C) $$

          and it must produce a decrease in the plausibility that (A) is false:

          $$ (\overline{A} \mid C') < (\overline{A} \mid C) $$
      3. Consistent Reasoning:
        1. If a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result.
        2. The robot always takes into account all of the evidence it has relevant to a question.
          • It does not arbitrarily ignore some of the information, basing its conclusions only on what remains.
          • In other words, the robot is completely nonideological.
        3. The robot always represents equivalent states of knowledge by equivalent plausibility assignments.
          • If in two problems the robot’s state of knowledge is the same (except perhaps for the labeling of the propositions), then it must assign the same plausibilities in both.
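Desideratum 2 can be sanity-checked numerically, if we take ordinary probabilities as the plausibility scale (an assumption here — Jaynes only derives that scale in Chapter 2). All numbers below are made up for illustration.

```python
# Update C -> C' raises the plausibility of A while leaving B given A unchanged.
p_A_C, p_A_C2 = 0.3, 0.5        # P(A|C) < P(A|C')
p_B_AC = p_B_AC2 = 0.7          # P(B|AC) = P(B|AC')

p_AB_C = p_A_C * p_B_AC         # product rule: P(AB|C)
p_AB_C2 = p_A_C2 * p_B_AC2      # product rule: P(AB|C')

assert p_AB_C2 >= p_AB_C                  # P(AB|C') >= P(AB|C)
assert (1 - p_A_C2) < (1 - p_A_C)         # P(not-A|C') < P(not-A|C)
```

This is only a spot check of one scale satisfying the desideratum, not a derivation; the derivation is the subject of Chapter 2.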
  • Mind Projection Fallacy
    • Jaynes warns against confusing the properties of our knowledge with properties of the world.
    • When we say “the probability of event X is p,” we are expressing our state of knowledge rather than a physical attribute of the event itself.
    • Many debates in probability stem from conflating epistemological uncertainty (our lack of complete information) with ontological indeterminacy (the inherent randomness of nature).

Chapter 2: The quantitative rules

  • Jaynes derives the product rule and the sum rule - the rules we take as axioms when we first encounter probability - from the desiderata of Chapter 1, via functional equations.
  • (Forgive me for any LaTeX/formatting crimes I commit)

Product Rule

Consider the expression \((AB \mid C).\) This can be decomposed in two natural ways:

  1. Is \(B\) true given \(C\)? (i.e., \(B \mid C\)) and then, given \(B\), ask: Is \(A\) true? (i.e., \(A \mid BC\))
  2. Alternatively, is \(A\) true given \(C\)? (i.e., \(A \mid C\)) and then, given \(A\), ask: Is \(B\) true? (i.e., \(B \mid AC\))

Thus, assume there exists a function \(F\) such that \((AB \mid C) = F\bigl[B \mid C,\, A \mid BC\bigr],\) and equivalently, \((AB \mid C) = F\bigl[A \mid C,\, B \mid AC\bigr].\)

Requiring that \(F\) give consistent results when applied to a conjunction of three propositions, \((ABC \mid D)\), decomposed in different orders leads to the Associativity Equation: \(F\bigl[F(x,y),z\bigr] = F\bigl[x,F(y,z)\bigr],\) where \(x\), \(y\), \(z\) stand for the component plausibilities.

It turns out that the general solution can be expressed in terms of a continuous, monotonic function \(w\) as \(F(x,y) = w^{-1}\bigl(w(x)\,w(y)\bigr).\) Substituting this form into the associativity condition verifies it: \(F\bigl[F(x,y),z\bigr] = w^{-1}\bigl(w(F(x,y))\,w(z)\bigr) = w^{-1}\bigl(w(x)\,w(y)\,w(z)\bigr),\) and expanding \(F\bigl[x,F(y,z)\bigr]\) yields the same product.

Thus, we arrive at the following expressions:

$$ \begin{aligned} (AB \mid C) &= w^{-1}\bigl(w(B \mid C)\, w(A \mid BC)\bigr) \quad (i)\\ (BA \mid C) &= w^{-1}\bigl(w(A \mid C)\, w(B \mid AC)\bigr) \quad (ii)\\[6pt] \end{aligned} $$

Since \(AB\) and \(BA\) are the same proposition, equating (i) and (ii) gives:

$$ \begin{aligned} w(AB \mid C) &= w(B \mid C)\, w(A \mid BC) \;=\; w(A \mid C)\, w(B \mid AC) \end{aligned} $$

This is starting to look familiar to us, except that we see \(w\) instead of \(P\) or \(p\).

Since \(w\) transforms our plausibility levels, it’s helpful to determine what the range of our scale is.

If \(A\) is certain given \(C\):

  • \[AB \mid C = B \mid C\]
  • \[A \mid BC = A \mid C\]

So, \(w(AB \mid C) = w(B \mid C)\,w(A \mid BC) \;\;\Longrightarrow\;\; w(B \mid C) = w(B \mid C)\,w(A \mid C) \;\;\Longrightarrow\;\; w(A \mid C) = 1.\) Certainty therefore corresponds to \(w = 1\).

If \(A\) is impossible:

  • \[AB \mid C = A \mid C\]
  • \[A \mid BC = A \mid C\]

Thus, \(w(AB \mid C) = w(B \mid C)\,w(A \mid BC) \;\;\Longrightarrow\;\; w(A \mid C) = w(B \mid C)\,w(A \mid C),\) which must hold for every \(B\), so \(w(A \mid C) = 0 \;\text{or}\; +\infty.\)

By convention we take impossibility to be \(0\); if a scale assigned \(+\infty\) instead, replacing \(w\) with \(1/w\) gives an equivalent scale running from \(0\) (impossible) to \(1\) (certain).
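The product-rule derivation can be spot-checked numerically. The check below picks one arbitrary continuous, monotonic \(w\) (here \(w(x) = x^2\), chosen only for illustration) and confirms that \(F(x,y) = w^{-1}(w(x)\,w(y))\) satisfies the associativity equation and behaves correctly at the boundary values of the scale.

```python
import itertools
import math

# One arbitrary continuous, monotonic w and its inverse (an illustration,
# not the unique choice -- any such w works).
def w(x):
    return x ** 2

def w_inv(x):
    return math.sqrt(x)

def F(x, y):
    return w_inv(w(x) * w(y))

# Associativity: F(F(x, y), z) = F(x, F(y, z)) on a grid of test points.
for x, y, z in itertools.product([0.2, 0.5, 0.9], repeat=3):
    assert math.isclose(F(F(x, y), z), F(x, F(y, z)))

# Boundary values: certainty is w = 1, impossibility is w = 0.
for x in [0.2, 0.5, 0.9]:
    assert math.isclose(F(x, 1.0), x)  # conjoining a certain proposition changes nothing
    assert F(x, 0.0) == 0.0            # an impossible conjunct makes the conjunction impossible
```

This is a sanity check of one instance, not a proof; the uniqueness argument is the functional-equation analysis above.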

Sum Rule

Consider \(A\) and \(\overline{A}\). Intuitively, the plausibility of \(\overline{A}\) should depend on the plausibility of \(A\). So we can guess that there is a function \(S\) such that: \(w(\overline{A} \mid C) \;=\; S\bigl(w(A \mid C)\bigr).\)

From the condition \(w(\overline{\overline{A}} \mid C) = w(A \mid C)\), we get: \(S\bigl(S(x)\bigr) = x, \quad S(0) = 1, \quad S(1) = 0.\)

Also:

$$ \begin{aligned} w(AB \mid C) &= w(A \mid C)\,w(B \mid A C) \quad (i)\\ w(A \overline{B} \mid C) &= w(A \mid C)\,w(\overline{B} \mid A C) \quad (ii)\\[6pt] \end{aligned} $$

Substituting in \((i)\) and \((ii)\),

$$ \begin{aligned} w(AB \mid C) &= w(A \mid C)\,w(B \mid A C) \quad\\ &= w(A \mid C)\, S\!\bigl[w(\overline{B} \mid A C)\bigr] \\ &= w(A \mid C)\, S\!\Bigl[\frac{w(A \overline{B} \mid C)}{w(A \mid C)}\Bigr] \quad (iii)\\ \end{aligned} $$

Equivalently,

$$ \begin{aligned} w(AB \mid C) &= w(B \mid C)\, S\!\Bigl[\frac{w(B \overline{A} \mid C)}{w(B \mid C)}\Bigr] \quad (iv)\\ \end{aligned} $$

Now, Let \(B = \overline{AD}\) for some \(D\), which gives us:

$$ \begin{aligned} A \overline{B} &= A \overline{(\overline{AD})} \\ &= A (AD) \\ &= AD \\ &= \overline{B} \quad \quad \quad \quad \quad \quad (v)\\ \end{aligned} $$

and,

$$ \begin{aligned} B \overline{A} &= (\overline{AD}) (\overline{A}) \\ &= (\overline{A} + \overline{D}) (\overline{A}) \\ &= \overline{A} + (\overline{A})(\overline{D}) \\ &= \overline{A}(1 + \overline{D}) \\ &= \overline{A} \quad \quad \quad \quad \quad \quad (vi)\\ \end{aligned} $$
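The Boolean identities \((v)\) and \((vi)\) can be verified by brute force over all truth assignments:

```python
import itertools

# With B = not(A and D): identity (v) says A and not-B equals not-B,
# and identity (vi) says B and not-A equals not-A, for every truth
# assignment of A and D.
for A, D in itertools.product([True, False], repeat=2):
    B = not (A and D)
    assert (A and not B) == (not B)   # (v)
    assert (B and not A) == (not A)   # (vi)
```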

Define \(x = w(A \mid C)\) and \(y = w(B \mid C)\). Substituting \((v)\) and \((vi)\) into \((iii)\) and \((iv)\) and equating:

\[x \, S\!\Bigl( \frac{S(y)}{x} \Bigr) \;=\; y \, S\!\Bigl( \frac{S(x)}{y} \Bigr) \quad \text{for all } x,y.\]

The general solution to this problem turns out to be \(S(x) = (1 - x^{m})^{(1/m)}\) for some constant \(m\).

Since \(w(\overline{A} \mid C) \;=\; S\bigl(w(A \mid C)\bigr)\),

$$ \begin{aligned} w(\overline{A} \mid C) &= (1-(w(A \mid C))^{m})^{(1/m)} \end{aligned} $$

which gives us,

$$ \begin{aligned} (w(A \mid C))^{m} + (w(\overline{A} \mid C))^{m} &= 1 \end{aligned} $$
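As a sanity check (not a proof), the solution \(S(x) = (1 - x^m)^{1/m}\) can be verified numerically against the constraints derived above, for a couple of arbitrary exponents \(m\):

```python
import math

def make_S(m):
    """S(x) = (1 - x**m)**(1/m) for a given exponent m."""
    return lambda x: (1 - x ** m) ** (1 / m)

for m in [1.0, 2.5]:   # two arbitrary test exponents
    S = make_S(m)
    assert math.isclose(S(S(0.3)), 0.3)       # involution: S(S(x)) = x
    assert S(0.0) == 1.0 and S(1.0) == 0.0    # S(0) = 1, S(1) = 0
    # The functional equation x*S(S(y)/x) = y*S(S(x)/y), at a point where
    # both arguments of S stay inside [0, 1]:
    x, y = 0.8, 0.9
    assert math.isclose(x * S(S(y) / x), y * S(S(x) / y))
```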

The ‘p’

Recapping the product and sum rule,

$$ \begin{aligned} w(AB \mid C)^{m} &= w(A \mid C)^{m}\, w(B \mid AC)^{m} \quad \quad \quad (i) \\ \end{aligned} $$
$$ \begin{aligned} (w(A \mid C))^{m} + (w(\overline{A} \mid C))^{m} &= 1 \quad \quad \quad \quad \quad \quad (ii) \\ \end{aligned} $$

Define \(p(X \mid C) = w(X \mid C)^{m}\) for any statement \(X\) and conditioning information \(C\), and you finally get the probability equations we’re familiar with: \(p(AB \mid C) = p(A \mid C)\,p(B \mid AC)\) and \(p(A \mid C) + p(\overline{A} \mid C) = 1.\)
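With \(p = w^m\), the two rules are the familiar probability axioms, which can be checked on a small hypothetical joint distribution (the numbers below are made up for illustration):

```python
# A made-up joint distribution over the truth values of (A, B).
joint = {(True, True): 0.2, (True, False): 0.3,
         (False, True): 0.4, (False, False): 0.1}

p_A     = sum(v for (a, b), v in joint.items() if a)      # P(A)
p_not_A = sum(v for (a, b), v in joint.items() if not a)  # P(not A)
p_AB    = joint[(True, True)]                             # P(AB)
p_B_A   = p_AB / p_A                                      # P(B | A)

assert abs(p_AB - p_A * p_B_A) < 1e-12   # product rule: P(AB) = P(A) P(B|A)
assert abs(p_A + p_not_A - 1) < 1e-12    # sum rule: P(A) + P(not A) = 1
```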