Stuart Truax, 2022-06
This repository contains a detailed explanation and dynamical simulation of the free energy principle (FEP). There are two components to the repo:
- An explanation and informal derivation of the FEP, based on [1].
- A dynamical simulation of a coupled mechanical-electrochemical system (i.e. the "primordial soup") described in [1].
The simulation is here.
Some results from the primordial soup simulation:
The free energy principle (FEP) describes how complex systems, through interaction with their surroundings, achieve non-equilibrium steady states by restricting themselves to a limited number of internal states [2]. The FEP uses the formalisms of random dynamical systems and Bayesian inference to describe how a complex system can converge to a steady state that depends on feedback from the external environment. The convergence to this steady state is an example of variational inference (i.e. converging to a probability distribution via an optimization procedure).
The FEP is useful in describing the formation of self-organizing systems. A claimed consequence of the principle is that self-organization is an emergent property of any ergodic random dynamical system that possesses a Markov blanket [1].
Given some data $\mathcal{D}$, assume the following definitions:

- $p(\mathbf{x}, \mathbf{\theta})$ - The "true" yet intractable probability distribution, with random variable $\mathbf{x}$ and parameters $\mathbf{\theta}$.
- $q(\mathbf{x}, \mathbf{\theta})$ - The variational estimate of $p(\mathbf{x}, \mathbf{\theta})$.

The Kullback-Leibler (KL) divergence provides an immediate way of evaluating the distance between the estimate $q$ and the true distribution $p$:

$$D_{KL}[q \,||\, p] = E_{q}\left[ \text{log} \, \frac{q(\mathbf{x}, \mathbf{\theta})}{p(\mathbf{x}, \mathbf{\theta} | \mathcal{D})} \right]$$

Evaluating this form of the KL divergence involves calculating the log expectation of $p(\mathbf{x}, \mathbf{\theta} | \mathcal{D})$, which is precisely the intractable distribution we wish to avoid. In variational inference, one intentionally chooses $q$ from a family of tractable distributions. The denominator of the posterior, the evidence $p(\mathcal{D}) = Z$, is a partition function whose evaluation is generally intractable.

A workaround for this is to instead use the quantity:

$$\mathcal{L}(q) = E_{q}[\text{log} \, p(\mathbf{x}, \mathbf{\theta}, \mathcal{D})] + H[q]$$

known as the evidence lower bound (ELBO) [3][4], which requires only the joint density and is therefore computable. Using the identity $\text{log} \, p(\mathcal{D}) = \mathcal{L}(q) + D_{KL}[q \,||\, p]$, and noting that $\text{log} \, p(\mathcal{D})$ does not depend on $q$, minimizing the KL divergence becomes equivalent to maximizing the ELBO.

The ultimate goal in variational inference is to find the probability distribution

$$q^{*} = \underset{q}{\text{arg min}} \; D_{KL}[q \,||\, p]$$

where the optimization is performed over the chosen family of tractable distributions.
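The identity underlying this procedure can be checked numerically. Below is a minimal sketch (illustrative toy numbers, not part of the repo's simulation) using a discrete hypothesis space, where the ELBO plus the KL divergence to the true posterior always equals the log evidence $\text{log} \, Z$:

```python
import numpy as np

# p(x): prior over three hypotheses; p(D|x): likelihood of the observed data.
prior = np.array([0.5, 0.3, 0.2])
likelihood = np.array([0.1, 0.7, 0.4])

joint = likelihood * prior        # p(x, D) as a function of x
Z = joint.sum()                   # partition function p(D) -- the quantity
                                  # that becomes intractable in large models
posterior = joint / Z             # p(x | D)

def kl(q, p):
    return float(np.sum(q * np.log(q / p)))

def elbo(q):
    # E_q[log p(x, D)] + H[q]; computable without knowing Z
    return float(np.sum(q * np.log(joint)) - np.sum(q * np.log(q)))

# For any q: log Z = ELBO(q) + KL(q || posterior), so maximizing the ELBO
# is the same as minimizing the KL divergence to the intractable posterior.
q = np.array([0.2, 0.5, 0.3])
assert np.isclose(np.log(Z), elbo(q) + kl(q, posterior))
assert np.isclose(elbo(posterior), np.log(Z))   # KL = 0 at the optimum
```

Because the discrete sums here are exact, the identity holds to machine precision; in realistic models the same identity motivates optimizing the ELBO by gradient methods.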
This section will introduce some of the theoretical concepts and structures necessary to understand the FEP and its governing dynamics. The exposition in this and the following section closely follows the exposition and notation found in [1], and uses its terminology.
We begin with a lemma [1], which is a bit informal with its notion of "structural and dynamical integrity", but it broadly states what the FEP seeks to demonstrate:
Lemma: Any ergodic random dynamical system that possesses a Markov blanket will appear to actively maintain its structural and dynamical integrity.
To add more detail to this statement, some informal definitions of ergodic systems and Markov blankets follow:
- Ergodic system: A dynamical system for which the time average of any measurable function of the system converges (almost surely) to its expectation under an invariant density as the averaging time $T \rightarrow \infty$. Call $p(\mathbf{x})$ the ergodic density of state $\mathbf{x}$, which is the probability that the ergodic system is in state $\mathbf{x}$. An ergodic system has the following property:

$$p(\mathbf{x}) = \text{average proportion of time spent in state } \mathbf{x}$$
- Markov blanket: A Markov blanket of a random variable $Y$ in a set of random variables $S = \{X_0, ..., X_n\}$ is a set $S' \subseteq S$ such that $S'$ contains at least all the information needed to infer $Y$ (i.e. the remaining random variables in $S$ are redundant for the purpose of inferring $Y$) [5]. In the context of a Bayesian network, the Markov blanket of a node $Y$ consists of:
  - The parents of $Y$.
  - $Y$ itself.
  - The children of $Y$.
  - The other parents of the children of $Y$.
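The Bayesian-network form of the definition can be sketched directly in code; the network below (nodes `A`, `B`, `C`, `D`, `Y`) is a hypothetical example, not one from [1]:

```python
# A Bayesian network given as node -> list of parents.
parents = {
    "A": [], "B": [],
    "Y": ["A"],          # A is a parent of Y
    "C": ["Y", "B"],     # C is a child of Y; B is a co-parent
    "D": [],             # D is unrelated to Y
}

def markov_blanket(node, parents):
    # Parents, the node itself, children, and the other parents of children,
    # following the convention used above.
    children = [n for n, ps in parents.items() if node in ps]
    blanket = {node} | set(parents[node]) | set(children)
    for c in children:
        blanket |= set(parents[c])
    return blanket

print(sorted(markov_blanket("Y", parents)))  # → ['A', 'B', 'C', 'Y']
```

Note that `D` is excluded: knowing it adds nothing for inferring `Y`, which is exactly the redundancy the definition describes.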
With these informal definitions in hand, one can define an ergodic random dynamical system possessing a Markov blanket.
Let the state space of the system be partitioned into external, sensory, active, and internal states. These states and their dependencies are defined below. We shall refer to an "agent" as the portion of the system constituted by the internal states and its Markov blanket.
- $\Omega$ - Fluctuation sample space: a sample space from which random fluctuations are drawn (similar to the fluctuations of Boltzmann's microcanonical ensemble).
- $\Psi : \Psi \times A \times \Omega \rightarrow \mathbb{R}$ - External (hidden) states (i.e. of the world "outside the blanket"), which cause the sensory states of the agent and depend on actions by the agent.
- $S: \Psi \times A \times \Omega \rightarrow \mathbb{R}$ - Sensory states, which are the agent's sensations and constitute a probabilistic mapping from action and external states.
- $A: S \times \Lambda \times \Omega \rightarrow \mathbb{R}$ - Active states, the agent's actions, which depend on its sensory and internal states.
- $\Lambda : \Lambda \times S \times \Omega \rightarrow \mathbb{R}$ - Internal states (i.e. the states of the agent), which cause action and depend on sensory states $S$.
- $p(\psi, s, a, \lambda | m)$ - Generative density, a probability density over external states $\psi \in \Psi$, sensory states $s \in S$, active states $a \in A$, and internal states $\lambda \in \Lambda$ for a system $m$.
- $q(\psi | \lambda)$ - Variational density, an arbitrary probability density over external states $\psi \in \Psi$ that is parameterized by internal states $\lambda \in \Lambda$.
The Markov blanket that interests us is the one defined with respect to the internal states $\Lambda$: it consists of the sensory states $S$ and the active states $A$, which separate the internal states from the external states $\Psi$.
The coupling between the states of this system can be illustrated in the causality graph of Figure 1. The blue oval in Figure 1 denotes the Markov blanket with respect to the internal states (i.e. the "agent").
Figure 1. The causality graph for the FEP. The state variables and their dynamics are represented by the nodes. The couplings between nodes are represented by the arrows. The Markov blanket of the internal state $\lambda$ is represented by the blue oval (i.e. it encompasses the sensory, active, and internal states).
The dynamics of the interaction between these states are described in the next section.
With the definition of the ergodic dynamical system in place, one can now specify its dynamics.
Let the dynamics of the system be generally defined as follows:
$$\dot{x} = \underbrace{f(x)}_{\text{flow}} + \underbrace{\omega}_{\text{random fluctuation}} \tag{1}$$
wherein the flow can be decomposed according to the state partition:

$$f(x) = \begin{pmatrix} f_{\psi}(\psi, s, a) \\ f_{s}(\psi, s, a) \\ f_{a}(s, a, \lambda) \\ f_{\lambda}(s, a, \lambda) \end{pmatrix} \tag{2}$$

with each function depending only on the states permitted by the coupling structure of the Markov blanket (e.g. the flows of the active and internal states do not depend on the external states).
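A sketch of this partitioned flow in code (the linear couplings below are arbitrary placeholders, not the equations of motion used in the repo's simulation; only the dependency structure matters here):

```python
import numpy as np

def f(psi, s, a, lam):
    # Each component depends only on the states its blanket structure allows.
    f_psi = -psi + a        # external flow:  f_psi(psi, s, a)
    f_s   = psi - s         # sensory flow:   f_s(psi, s, a)
    f_a   = s - a + lam     # active flow:    f_a(s, a, lam)
    f_lam = s - lam         # internal flow:  f_lam(s, a, lam)
    return np.array([f_psi, f_s, f_a, f_lam])

def step(x, dt=0.01, noise=0.1, rng=np.random.default_rng(0)):
    # One Euler-Maruyama step of dx = f(x) dt + dω
    return x + f(*x) * dt + noise * np.sqrt(dt) * rng.standard_normal(4)

x = np.zeros(4)
for _ in range(1000):
    x = step(x)
```

The key point is structural: `f_a` and `f_lam` never receive `psi` as an argument, mirroring the fact that internal and active states interact with the world only through the blanket.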
As stated previously, the system is ergodic, which implies that it will eventually converge to a random global attractor. Define the ergodic density $p(x | m)$ as the stationary solution of the Fokker-Planck equation governing the evolution of the system's probability density:

$$\dot{p}(x | m) = \nabla \cdot \Gamma \nabla p - \nabla \cdot (f p) = 0 \tag{3}$$

where $\Gamma$ is the amplitude (half the covariance) of the random fluctuations $\omega$.
Since the ergodic density is stationary, the solution of (3) implies that the flow can be expressed in the form:

$$f(x) = -(\Gamma + R(x)) \cdot \nabla G(x) \tag{4}$$

where:

- $R(x)$ is an antisymmetric matrix such that $R(x) = - R(x)^{T}$
- $G(x)$ is a scalar potential called the "Gibbs energy"
Inserting (4) into (3) and using some vector calculus identities, one finds that the stationarity condition is satisfied when the ergodic density has the Gibbs form $p(x | m) = \exp(-G(x))$. The flow can then be written directly in terms of the ergodic density:

$$f(x) = (\Gamma + R(x)) \cdot \nabla \, \text{log} \, p(x | m) \tag{5}$$

which is just (4) with $G(x) = -\text{log} \, p(x | m)$.
Restricting this form to the active and internal states, whose flows do not depend on the external states, yields:

$$f_{a}(s, a, \lambda) = (\Gamma + R) \cdot \nabla_{a} \, \text{log} \, p(s, a, \lambda | m) \tag{6}$$

$$f_{\lambda}(s, a, \lambda) = (\Gamma + R) \cdot \nabla_{\lambda} \, \text{log} \, p(s, a, \lambda | m) \tag{7}$$

With (6) and (7) in hand, one can observe that the flow is a gradient ascent (i.e. a positive gradient) of the log ergodic density: the active and internal states flow toward the most probable states of the system.
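This gradient-ascent picture can be checked with a one-dimensional toy system (again, not the repo's simulation). Taking a curl-free flow ($R = 0$) with a quadratic Gibbs energy, a simulated trajectory should spend time in each region in proportion to the ergodic density $p(x) \propto \exp(-G(x))$:

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, dt, n_steps = 1.0, 0.01, 200_000

# Flow f(x) = -gamma * G'(x) with Gibbs energy G(x) = x**2 / 2, so the
# ergodic density is p(x) ∝ exp(-G(x)): a standard normal.
x, traj = 0.0, np.empty(n_steps)
for i in range(n_steps):
    x += -gamma * x * dt + np.sqrt(2.0 * gamma * dt) * rng.standard_normal()
    traj[i] = x

# Ergodicity: the proportion of time spent in |x| < 1 should approach the
# ensemble probability P(|x| < 1) ≈ 0.683 for a standard normal.
time_fraction = np.mean(np.abs(traj) < 1.0)
print(time_fraction)
```

This ties together the two informal definitions above: the time average (proportion of time in a region) matches the ergodic density obtained from the Gibbs energy.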
At this point, one can invoke the ergodic theorem to deduce that, for any point of the Markov blanket, the time average of the internal and active states passing through that point converges to their expectation under the ergodic density. Here, the structure of the Markov blanket is essential: the internal states can only register the external states through the sensory states. The important result of applying the ergodic theorem is that the expectation over a posterior probability over the external states is encoded by the (time-averaged) internal states: the agent's internal states come to parameterize posterior beliefs about the world outside its blanket.
For subsequent derivations, Friston introduces a density to quantify these posterior beliefs:

Def.: Let $q(\psi | \lambda)$ be the variational density: an arbitrary probability density over the external states $\psi$ that is parameterized by the internal states $\lambda$.

At this point, the Bayesian nature of the variational density is apparent: the internal states of the agent parameterize beliefs about the external (hidden) states of the world.

Up to this point, the "Gibbs energy" $G$ has only been defined implicitly through the ergodic density. The following lemma [1] connects it to a free energy functional:

Lemma: For any Gibbs energy $G$, the flows of the active and internal states can be expressed as a gradient descent on a functional $F(s, a, \lambda)$, called the free energy:

$$f_{a}(s, a, \lambda) = -(\Gamma + R) \cdot \nabla_{a} F(s, a, \lambda) \tag{12}$$

$$f_{\lambda}(s, a, \lambda) = -(\Gamma + R) \cdot \nabla_{\lambda} F(s, a, \lambda) \tag{13}$$

and furthermore:

$$F(s, a, \lambda) = \int q(\psi | \lambda) \, G(\psi, s, a, \lambda) \, d\psi + \int q(\psi | \lambda) \, \text{log} \, q(\psi | \lambda) \, d\psi \tag{14}$$
Notice that in (12) and (13), the flow of the active and internal states performs a gradient descent on the free energy $F$: the dynamics of the agent minimize free energy. The equality in (14) is equivalent to the variational free energy employed in variational Bayesian inference [3].
The proof of (14) relies on breaking down the integral using Bayes' rule to obtain:

$$F(s, a, \lambda) = -\text{log} \, p(s, a, \lambda | m) + D_{KL}[\, q(\psi | \lambda) \,||\, p(\psi | s, a, \lambda, m) \,]$$

where the divergence term vanishes when the variational density equals the posterior density over the external states, in which case $F(s, a, \lambda) = -\text{log} \, p(s, a, \lambda | m)$ and the gradient descent on $F$ coincides with the gradient ascent on the log ergodic density derived earlier.
To conclude, due to the ergodicity of the system, the free energy can be evaluated as a time average along trajectories of the system. That is, the time average of the free energy converges to its expectation under the ergodic density, so minimizing free energy over time also minimizes its ensemble average.
Interpreting the individual terms of (14) in a variational inference context would yield:
$$F(s,a, \lambda) = \underbrace{E_{q}[G(\psi,s,a,\lambda)]}_{\text{energy (accuracy of model)}} - \underbrace{H[q(\psi | \lambda)]}_{\text{entropy (complexity of model)}} \tag{16}\label{eq:fe_expanded}$$
The first term (i.e. the "accuracy") of the equation is the cross entropy [6] of the distributions $q(\psi | \lambda)$ and $p(\psi, s, a, \lambda | m)$: it quantifies how well the agent's beliefs account for the generative density. The second term quantifies the amount of complexity (i.e. information) in the external states that can be captured by an internal state. This term is maximized, in the spirit of the maximum entropy (MaxEnt) principle [7], so that the agent's beliefs remain no more committed than the evidence warrants.
A further manipulation of (16) puts it into a more usable form. Expanding the first term and collecting terms yields:

$$F(s,a,\lambda) = E_{q}[\, \text{log} \, q(\psi | \lambda) - \text{log} \, p(\psi, s | m) \,] \tag{17}$$

which has the form of a Kullback-Leibler divergence between the variational density $q(\psi | \lambda)$ and the (unnormalized, in $\psi$) density $p(\psi, s | m)$.
Expanding further, and using the chain rule of probability, (17) can be broken down into two more terms:
$$= \underbrace{-\text{log} \, p(s | m)}_{\text{surprise}} + \underbrace{D_{KL} [\, q(\psi | \lambda) \,||\, p(\psi | s,a, m) \,]}_{\text{divergence}} \tag{18}$$
The first term (surprise) quantifies the degree to which a sensory state is unexpected under the generative model $m$.
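The decomposition in (18) can be verified numerically for a small discrete system. The sketch below uses arbitrary toy numbers (and omits the action states for brevity); it checks that the energy-minus-entropy form of the free energy equals surprise plus divergence, and therefore bounds surprise:

```python
import numpy as np

# p(psi, s | m): rows index psi, columns index s; arbitrary toy numbers.
p_joint = np.array([[0.20, 0.10],
                    [0.15, 0.25],
                    [0.05, 0.25]])
s = 0                                 # observed sensory state
q = np.array([0.5, 0.3, 0.2])        # variational density q(psi | lambda)

G = -np.log(p_joint[:, s])           # Gibbs energy G(psi, s) = -log p(psi, s | m)
energy  = np.sum(q * G)              # E_q[G]
entropy = -np.sum(q * np.log(q))     # H[q]
F = energy - entropy                 # free energy: energy minus entropy

p_s = p_joint[:, s].sum()            # evidence p(s | m)
post = p_joint[:, s] / p_s           # posterior p(psi | s, m)
surprise = -np.log(p_s)
divergence = np.sum(q * np.log(q / post))

assert np.isclose(F, surprise + divergence)   # surprise + divergence form
assert F >= surprise                          # free energy bounds surprise
```

Setting `q` equal to `post` makes the divergence vanish, at which point the free energy equals the surprise exactly.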
The final statement of the principle is that:
$$\boxed{ \underbrace{-\text{log} \, p(s,a | m)}_{\text{surprise}} + \underbrace{D_{KL} [\, q(\psi | \lambda) \,||\, p(\psi | s, a, m) \,]}_{\text{divergence}} \geq \underbrace{-\text{log} \, p(s,a | m)}_{\text{surprise}} } \tag{19}$$

Because the KL divergence is non-negative, the free energy is an upper bound on surprise: by minimizing free energy, an agent implicitly minimizes (a bound on) the surprise of its sensory and active states.
Some informal and general results of the FEP follow:
For an ergodic dynamical system possessing a Markov blanket, the FEP implies the following:
- The dynamics of the system cause the internal states to perform Bayesian inference on their surroundings [8]. This takes the form of encoding beliefs about the external states of the system.
- The entropy over $p(\lambda)$ (i.e. the complexity of the internal states) will be maximized, but limited by the action of the flow $f$. That is, an agent's internal states (and therefore beliefs) will not become disproportionately complex relative to its surroundings.
- The surprise of an agent is limited by the distance between the agent's beliefs $q(\psi|\lambda)$ and "reality" $p(\psi)$.
Friston concludes that biological systems exhibit the following universal properties as a result of the FEP [1]:

- Ergodicity
- Possession of a Markov blanket
- Engagement in active inference
- Autopoiesis (i.e. the maintenance of structural integrity through the creation and regeneration of oneself)
[1] K. Friston, "Life as we know it," Journal of the Royal Society Interface, 10(86):20130475, 2013
[2] Free Energy Principle, Wikipedia
https://en.wikipedia.org/wiki/Free_energy_principle
[3] K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012, Section 21.2
[4] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016, Chapters 18 and 19
[5] Markov blanket, Wikipedia
https://en.wikipedia.org/wiki/Markov_blanket
[6] Cross Entropy, Wikipedia
https://en.wikipedia.org/wiki/Cross_entropy
[7] Principle of maximum entropy, Wikipedia
https://en.wikipedia.org/wiki/Principle_of_maximum_entropy
[8] M. Aguilera, B. Millidge, A. Tschantz, C. L. Buckley, "How particular is the physics of the free energy principle?," Physics of Life Reviews, vol. 40, pp. 24-50, 2022
Footnotes
-
In statistical mechanics, the partition function $Z$ is often given in terms of the system Hamiltonian $H$, which for many systems can be derived analytically and often lends itself to calculable integrals. In the context of inference, the partition function instead requires integrating over all possible hypotheses $\mathbf{x}$ consistent with $\mathcal{D}$, often making the integral intractable. ↩