CPNS Lab
Computational Psychiatry & Neuropharmacological Systems (CPNS) · University of Exeter

Interactive demo

Polyphonic Active Inference Gridworld

This demo shows a simple active inference agent whose behaviour emerges from the negotiation of multiple concurrent objectives rather than a single controller. Safety, goal-seeking, exploration, and energy maintenance each run in parallel, propose candidate actions, and are then integrated by a polyphonic coordination layer into a single negotiated move.

Tip: if the video doesn’t autoplay on mobile, tap it once; it is set to play inline.

The world panel shows the agent’s inferred hazard landscape rather than ground truth. The right-hand panels summarise the current balance of influence across voices, the action probabilities each voice favours, and the final negotiated action selected at that time-step.

World legend

  • Shaded grid: the agent’s current belief about local hazard probability. This is an inferred risk map, not the true world state.
  • Black squares: walls and impassable structure.
  • Gold square: the current goal location. Once reached, a new target is spawned elsewhere in the environment.
  • Purple pentagons: charging stations that replenish the battery.
  • White circle: the agent’s current position.
  • Arrow: the final negotiated action direction.
  • Crosses (optional): true hazards, shown only in debugging views.

What the dashboard shows

  • Belief map: \(p(\mathrm{hazard}\mid \mathrm{data})\) over the grid, updated from local noisy observations.
  • Voice weights: the current influence of each objective in the polyphonic mixture.
  • Action panel: the probability distribution over available actions for each voice, alongside the final mixture.
  • Diagnostics: compact state information including position, battery, score, selected action, and compatibility between voices.
  • Goal respawning: once a target is reached, a new reachable target is spawned so the task continues to generate changing trade-offs.
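As a rough illustration of the belief-map item above, the following sketch updates one cell of \(p(\mathrm{hazard}\mid \mathrm{data})\) from a single noisy binary observation using Bayes' rule. The function name and the sensor parameters `hit_rate` and `false_alarm` are assumptions made for this example; the demo's actual observation model may differ.

```python
import numpy as np

def update_hazard_belief(belief, cell, observed_hazard,
                         hit_rate=0.8, false_alarm=0.2):
    """Return a copy of `belief` with one cell updated by Bayes' rule.

    belief          : 2D array of p(hazard) per grid cell
    cell            : (row, col) index of the observed cell
    observed_hazard : True if the noisy sensor reported a hazard
    hit_rate        : assumed p(report hazard | hazard present)
    false_alarm     : assumed p(report hazard | no hazard)
    """
    prior = belief[cell]
    if observed_hazard:
        like_hazard, like_safe = hit_rate, false_alarm
    else:
        like_hazard, like_safe = 1.0 - hit_rate, 1.0 - false_alarm
    evidence = like_hazard * prior + like_safe * (1.0 - prior)
    updated = belief.copy()
    updated[cell] = like_hazard * prior / evidence
    return updated

# Start from an uninformative map and observe a hazard report at one cell.
belief = np.full((5, 5), 0.5)
belief = update_hazard_belief(belief, (2, 3), observed_hazard=True)
```

With a prior of 0.5 and these assumed sensor rates, a single hazard report raises the belief at that cell to 0.8 and leaves every other cell unchanged.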

The four voices

Each voice represents a distinct objective with its own short-horizon evaluation of possible futures. Rather than collapsing everything into a single fixed reward, the agent maintains these pressures explicitly and integrates them online.

  1. Safety: avoids locations believed to be hazardous.
  2. Goal: moves toward the current target while remaining risk-aware.
  3. Epistemic: prefers information gain, novelty, and uncertainty reduction.
  4. Energy: maintains battery and prioritises charging when reserves become low.

Active inference ingredients

At each step the agent updates beliefs about hidden structure in the world, rolls imagined futures forward using an internal transition model, and selects actions that minimise an expected cost or surprise functional.

What makes this demo different is that these computations are distributed across several concurrent objectives rather than folded into one monolithic controller.

Per voice: rollout → score imagined futures → convert to \(q_k(a)\) → integrate into the negotiated policy.
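The per-voice pipeline above can be sketched for a safety-style voice as follows. This is a minimal illustration under stated assumptions: the rollout simply repeats one move for a short horizon, walls are ignored, the cost is the summed hazard belief along the path, and a softmax with a hypothetical `temperature` parameter converts costs into \(q_k(a)\). None of these names come from the demo's source.

```python
import numpy as np

# Hypothetical action set: (row delta, col delta) per move.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1),
         "right": (0, 1), "stay": (0, 0)}

def rollout_cost(pos, move, hazard_belief, horizon=3):
    """Imagine repeating `move` for `horizon` steps and sum believed hazard."""
    rows, cols = hazard_belief.shape
    r, c = pos
    cost = 0.0
    for _ in range(horizon):
        # Clip to the grid so the imagined agent stays in bounds.
        r = min(max(r + move[0], 0), rows - 1)
        c = min(max(c + move[1], 0), cols - 1)
        cost += hazard_belief[r, c]
    return cost

def voice_action_distribution(costs, temperature=1.0):
    """Softmax over negative costs: cheaper futures get more probability."""
    logits = -np.asarray(costs, dtype=float) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

belief = np.full((3, 3), 0.1)
belief[1, 2] = 0.9  # one cell believed to be hazardous
names = list(MOVES)
costs = [rollout_cost((1, 1), MOVES[n], belief) for n in names]
q_safety = voice_action_distribution(costs, temperature=0.5)
```

Here the move to the right passes through the cell believed hazardous, so it receives the lowest probability in `q_safety`. Other voices would reuse the same softmax step with their own cost functions.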

Polyphonic coordination

Each voice proposes an action distribution \(q_k(a)\). A coordination layer then assigns mixture weights \(\pi_k\) (non-negative and summing to one, so that the mixture is itself a probability distribution) according to current context, such as local threat, distance to goal, uncertainty, and battery urgency. The final negotiated action posterior is:

\[ q(a) \;=\; \sum_{k=1}^{K}\pi_k\,q_k(a) \]

In other words, the action is chosen from a weighted mixture of the distributions proposed by several simultaneously active objectives.

The weights evolve online rather than being fixed in advance, allowing the agent to shift smoothly between caution, goal pursuit, exploration, and homeostatic regulation as the situation changes.
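One way to picture the coordination layer is the sketch below, which maps context signals to mixture weights with a softmax and then forms \(q(a)=\sum_k \pi_k\,q_k(a)\). The urgency formulas, the `gain` parameter, and the function names are assumptions made for illustration, and any temporal smoothing of the weights is left out.

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    z = np.exp(x - x.max())
    return z / z.sum()

def voice_weights(threat, goal_distance, uncertainty, battery, gain=4.0):
    """Map context to mixture weights (safety, goal, epistemic, energy).

    All inputs except goal_distance are assumed to lie in [0, 1].
    The urgency formulas below are hypothetical placeholders.
    """
    urgencies = np.array([
        threat,                       # safety: local believed hazard
        1.0 / (1.0 + goal_distance),  # goal: stronger when the goal is near
        uncertainty,                  # epistemic: local belief uncertainty
        1.0 - battery,                # energy: stronger when battery is low
    ])
    return softmax(gain * urgencies)

def negotiate(weights, voice_dists):
    """Mixture q(a) = sum_k pi_k q_k(a); voice_dists has shape (K, A)."""
    return weights @ voice_dists

# Four voices over three actions; each row is one voice's q_k(a).
voice_dists = np.array([
    [0.70, 0.20, 0.10],  # safety
    [0.10, 0.80, 0.10],  # goal
    [0.30, 0.30, 0.40],  # epistemic
    [0.05, 0.05, 0.90],  # energy
])
weights = voice_weights(threat=0.1, goal_distance=4, uncertainty=0.2, battery=0.05)
q = negotiate(weights, voice_dists)
```

With the battery nearly empty, the energy voice receives the largest weight and the mixture favours the action that voice prefers. Because the softmax output is non-negative and sums to one, `q` remains a valid distribution.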

What “EFE-like” means here

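Each voice evaluates short imagined futures using a cost functional that captures its own priorities. In active inference terms, this resembles expected free energy in that it trades off pragmatic value against information-seeking pressure. In the expression below, \(\pi\) denotes a candidate policy (a short sequence of actions), not the mixture weights \(\pi_k\) of the coordination layer.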

\[ G(\pi) \approx \underbrace{\mathbb{E}[\text{risk / cost}]}_{\text{pragmatic}} \;-\; \underbrace{\mathbb{E}[\text{information gain}]}_{\text{epistemic}} \]

Different voices place different emphasis on these terms, giving each one a distinct behavioural tendency.
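To make the trade-off concrete, here is a hedged sketch of an EFE-like score for visiting a single cell. It takes the pragmatic term to be the believed hazard probability and the epistemic term to be the expected reduction in Bernoulli entropy from one noisy observation. The sensor rates and the per-voice weights `w_pragmatic` and `w_epistemic` are assumptions for this example, not the demo's actual functional.

```python
import numpy as np

def bernoulli_entropy(p):
    """Entropy (in nats) of a Bernoulli belief with probability p."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return float(-(p * np.log(p) + (1.0 - p) * np.log(1.0 - p)))

def expected_info_gain(p, hit_rate=0.8, false_alarm=0.2):
    """Expected entropy reduction about a cell after one noisy observation."""
    p_report = hit_rate * p + false_alarm * (1.0 - p)  # p(sensor says hazard)
    post_if_report = hit_rate * p / p_report
    post_if_quiet = (1.0 - hit_rate) * p / (1.0 - p_report)
    expected_posterior_entropy = (
        p_report * bernoulli_entropy(post_if_report)
        + (1.0 - p_report) * bernoulli_entropy(post_if_quiet)
    )
    return bernoulli_entropy(p) - expected_posterior_entropy

def efe_like_score(p_hazard, w_pragmatic, w_epistemic):
    """Lower is better: weighted risk minus weighted information gain."""
    return w_pragmatic * p_hazard - w_epistemic * expected_info_gain(p_hazard)

# Compare a nearly safe cell with a maximally uncertain one.
safe_cell, uncertain_cell = 0.05, 0.5
safety_prefers_safe = (efe_like_score(safe_cell, 1.0, 0.1)
                       < efe_like_score(uncertain_cell, 1.0, 0.1))
epistemic_prefers_uncertain = (efe_like_score(uncertain_cell, 0.1, 1.0)
                               < efe_like_score(safe_cell, 0.1, 1.0))
```

Under these assumed weights, a voice that emphasises the pragmatic term scores the nearly safe cell lower, while a voice that emphasises the epistemic term scores the uncertain cell lower, which is the kind of divergence in tendency the text describes.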

The exact implementation here is intentionally simple. The point of the demo is not formal completeness, but to show how multiple active objectives can be kept explicit and negotiated online.

Why polyphonic intelligence matters

Real agents do not optimise a single objective in isolation. They balance safety, goals, uncertainty, energy, and other constraints across multiple time-scales. Polyphonic intelligence treats this multiplicity as a core architectural feature rather than an afterthought.

The result is behaviour that is often more interpretable, more adaptable to changing context, and less brittle than systems built around a single dominant controller. Because each voice remains visible, the agent’s decisions can be inspected in terms of which pressures were active and how they were resolved.