CPNS Lab
Computational Psychiatry & Neuropharmacological Systems (CPNS) · University of Exeter

Demo

Polyphonic Active Inference Gridworld

This is a toy environment where behaviour emerges from the negotiation of multiple concurrent objectives (“voices”) rather than a single dominating controller. Each voice runs a short-horizon rollout, proposes an action distribution, and a polyphonic coordination layer integrates them into the final move.

Tip: if the video doesn’t autoplay on mobile, tap once; it’s set to play inline (`playsinline`).

The heatmap shows the agent’s belief about hazard probability in each cell (not ground truth). Bars show each voice’s action posterior \(q_k(a)\), and the mixture weights \(\pi\) show which voice is currently most influential in the negotiated action.

World legend (how to read the map)

  • Coloured cells (heatmap): the agent’s belief about hazard probability in each location (blue ≈ low risk, yellow ≈ high risk).
  • Black squares: walls — impassable structure that both the real agent and its internal rollouts must respect.
  • Gold square: the current goal location. When reached, a new goal is spawned elsewhere.
  • Purple pentagons: charging stations that replenish the agent’s battery.
  • White circle: the agent’s current position.
  • White arrow: the action selected after polyphonic integration (the negotiated move).
  • Crosses (optional): true hazards, only shown when ground truth is revealed for debugging.

What you’re seeing (at a glance)

  • Belief map: \(p(\text{hazard}\mid \text{data})\) per cell, updated from local, noisy cues.
  • Walls: hard constraints; rollouts respect them (the agent can’t “imagine” walking through walls).
  • Goal: respawns after capture so the episode keeps generating interesting trade-offs.
  • Chargers: replenish battery; low battery increases pressure from the energy voice.
  • Action bars: one \(q_k(a)\) per voice plus the final mixture \(q(a)\).
  • \(\pi\) weights: “who’s winning the argument” at this moment — but with anti-dominance safeguards.
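The belief map above can be sketched with a simple per-cell Bayesian update. This is a hypothetical illustration, not the demo's actual code: it assumes each cell's hazard is a binary variable observed through a noisy cue with hit rate `p_hit` and false-alarm rate `p_false_alarm` (both names invented here).

```python
# Hypothetical sketch: per-cell Bayesian update of hazard belief from a
# noisy binary cue. p_hit / p_false_alarm parametrise the sensor model
# and are illustrative values, not the demo's actual parameters.
def update_belief(prior, cue, p_hit=0.8, p_false_alarm=0.2):
    """Return posterior p(hazard | cue) for one cell, given prior p(hazard)."""
    if cue:  # a hazard cue was observed
        like_hazard, like_clear = p_hit, p_false_alarm
    else:    # no cue observed
        like_hazard, like_clear = 1.0 - p_hit, 1.0 - p_false_alarm
    unnorm = prior * like_hazard
    return unnorm / (unnorm + (1.0 - prior) * like_clear)

belief = 0.5
belief = update_belief(belief, cue=True)  # a cue raises the hazard belief
```

Repeating this update as the agent moves and samples local cues is what makes the heatmap drift toward (a belief about) the true hazard layout without ever revealing ground truth.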

The four voices (conceptually)

Each voice is a small generative model + preference set. It scores short rollouts using an EFE-like objective (risk/utility, exploration, resource constraints), then converts rollout scores into an action distribution.

  1. Safety (risk-sensitive control): avoids states believed to be hazardous.
  2. Goal (pragmatic control): reduces distance-to-goal while respecting risk.
  3. Epistemic (exploration pressure): seeks uncertainty reduction and novelty.
  4. Energy (homeostatic control): maintains battery; prioritises chargers when reserves drop.

Active inference ingredients (non-domain-specific)

The agent does three things repeatedly: (i) infer hidden state (here: where hazards likely are), (ii) look ahead a few steps (rollout) using its transition model, and (iii) choose actions that minimise an expected “surprise/cost” functional.

In many active-inference demos, these ingredients are wrapped into a single objective. Here, we keep multiple objectives explicit by making them separate voices that run in parallel.

In code terms: per voice → rollout → EFE-like score → softmax → \(q_k(a)\).
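That per-voice pipeline can be sketched as follows. The function name `voice_action_posterior` and the input format are assumptions for illustration: each first action comes with a set of EFE-like scores, one per imagined rollout that begins with that action.

```python
import numpy as np

def softmax(x):
    z = np.asarray(x, dtype=float)
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical sketch of one voice's pipeline: take the best (lowest)
# rollout score reachable under each first action, then softmax the
# negated scores into an action distribution q_k(a).
def voice_action_posterior(rollout_scores_per_action, gamma=1.0):
    """rollout_scores_per_action: one list of EFE-like rollout scores
    per candidate action. Returns q_k(a) over the candidate actions."""
    best = np.array([min(scores) for scores in rollout_scores_per_action])
    return softmax(-gamma * best)  # lower score -> higher probability

q_k = voice_action_posterior([[1.0, 3.0], [2.0, 2.5]])  # favours action 0
```

Using the minimum rollout score per action is one simple choice; a softmin over rollouts would work just as well for this sketch.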

Polyphonic coordination (the key idea)

Each voice proposes a distribution over actions \(q_k(a)\). The polyphonic layer keeps a running (leaky) evidence score for each voice and converts that into mixture weights \(\pi_k\). The negotiated action posterior is:

\[ q(a) \;=\; \sum_{k=1}^{K}\pi_k\,q_k(a) \]

Intuition: “do a weighted average of proposed actions” — but the weights evolve online.
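The negotiated posterior is literally a mixture. A minimal sketch (the function name `negotiate` is invented here):

```python
import numpy as np

# Hypothetical sketch: negotiate a single action distribution from K
# voice proposals q_k(a) and mixture weights pi.
def negotiate(q_voices, pi):
    """q_voices: (K, A) array, one row per voice's action posterior;
    pi: (K,) mixture weights summing to 1. Returns q(a), shape (A,)."""
    q_voices = np.asarray(q_voices, dtype=float)
    pi = np.asarray(pi, dtype=float)
    return pi @ q_voices  # weighted average over the K voices

q = negotiate([[0.7, 0.3], [0.2, 0.8]], [0.5, 0.5])  # -> [0.45, 0.55]
```

Because each \(q_k\) is a proper distribution and \(\pi\) sums to one, the mixture \(q(a)\) is automatically a proper distribution as well.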

A minimal version of the weight dynamics is:

\[ \ell_k(t) \leftarrow \rho\,\ell_k(t-1) + \underbrace{\big(-G_k^{*}\big)}_{\text{evidence}} \;-\; \lambda\,\underbrace{\Omega_k}_{\text{anti-dominance / compatibility}} \]
\[ \pi_k \;=\; \mathrm{softmax}\!\big(\gamma\,\ell_k\big), \qquad \pi \leftarrow (1-\varepsilon)\,\pi + \varepsilon\,\pi_{\mathrm{floor}} \]

Here \(G_k^{*}\) is the best (lowest) rollout score found by voice \(k\). \(\Omega_k\) is a light penalty applied when a voice becomes inconsistent with the others (so it can’t permanently dominate), and the floor mixture \(\pi_{\mathrm{floor}}\) (e.g. uniform) keeps every voice minimally audible.
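The weight dynamics can be sketched in a few lines. The hyperparameter values below are illustrative assumptions, not the demo's actual settings, and the floor is taken to be uniform:

```python
import numpy as np

def softmax(x):
    z = np.asarray(x, dtype=float)
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical sketch of the leaky evidence update for mixture weights.
# ell: running evidence per voice; best_G: each voice's best rollout
# score G_k^*; omega: anti-dominance / compatibility penalties Omega_k.
def update_weights(ell, best_G, omega, rho=0.9, lam=0.1, gamma=2.0, eps=0.05):
    ell = (rho * np.asarray(ell, dtype=float)
           - np.asarray(best_G, dtype=float)      # evidence: -G_k^*
           - lam * np.asarray(omega, dtype=float))
    pi = softmax(gamma * ell)
    floor = np.full_like(pi, 1.0 / len(pi))       # uniform floor
    pi = (1 - eps) * pi + eps * floor             # no voice is silenced
    return ell, pi
```

The leak \(\rho < 1\) means old evidence fades, so a voice that stops performing well gradually loses influence rather than keeping it by historical accident.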

What “EFE-like” means here (simple version)

Expected free energy (EFE) is a planning functional that can trade off utility/risk against information gain. In many formulations you’ll see a decomposition into terms that look like: “seek preferred outcomes” + “reduce uncertainty”.

\[ G(\pi) \approx \underbrace{\mathbb{E}[\text{risk / cost}]}_{\text{pragmatic}} \;-\; \underbrace{\mathbb{E}[\text{information gain}]}_{\text{epistemic}} \]

In this gridworld, different voices emphasise different pieces of this trade-off.

The precise form varies by implementation; what matters for this demo is: each voice scores imagined futures in a way that encodes its “personality” (safety, goal-seeking, exploration, energy maintenance).
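As a concrete (and deliberately simplified) sketch of how different weightings produce different "personalities", consider scoring a single imagined rollout with voice-specific coefficients. The weights here are invented for illustration:

```python
# Hypothetical sketch of an EFE-like rollout score: lower is better.
# Each voice supplies its own coefficients, giving it a "personality":
# e.g. a safety voice weights cost/risk heavily, while an epistemic
# voice weights information gain heavily.
def efe_like_score(expected_cost, expected_info_gain,
                   w_pragmatic=1.0, w_epistemic=1.0):
    """Pragmatic term minus epistemic term, as in the decomposition above."""
    return w_pragmatic * expected_cost - w_epistemic * expected_info_gain

# Two voices scoring the same rollout (cost 0.4, info gain 0.3):
safety_score  = efe_like_score(0.4, 0.3, w_pragmatic=3.0, w_epistemic=0.5)
explore_score = efe_like_score(0.4, 0.3, w_pragmatic=0.5, w_epistemic=3.0)
# The epistemic voice finds this uncertain rollout far more attractive.
```

The same imagined future thus gets very different scores depending on which voice is doing the imagining, which is exactly what produces disagreement for the polyphonic layer to negotiate.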

Why polyphonic intelligence matters

Real agents rarely have one objective. They juggle constraints, drives, and competing time-scales. Polyphonic intelligence treats this not as an afterthought (“add a penalty term”), but as a first-class architectural principle: maintain multiple concurrent generative models and integrate them without forcing collapse to a single winner.

The result is behaviour that can look more robust (doesn’t fail catastrophically when one objective becomes brittle), more adaptable (weights shift with context), and more interpretable (you can inspect which voice is driving which actions).