Demo
This is a toy environment where behaviour emerges from the negotiation of multiple concurrent objectives (“voices”) rather than a single dominating controller. Each voice runs a short-horizon rollout, proposes an action distribution, and a polyphonic coordination layer integrates them into the final move.
Tip: if the video doesn’t autoplay on mobile, tap it once; the video uses the playsinline attribute.
Each voice is a small generative model + preference set. It scores short rollouts using an EFE-like objective (risk/utility, exploration, resource constraints), then converts rollout scores into an action distribution.
The agent does three things repeatedly: (i) infer hidden state (here: where hazards likely are), (ii) look ahead a few steps (rollout) using its transition model, and (iii) choose actions that minimise an expected “surprise/cost” functional.
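Here is a self-contained, single-objective toy version of that loop on a 1-D strip of cells. It is only a sketch: it is not the demo’s gridworld or code, and every number and helper in it is made up for illustration.

```python
import numpy as np

N, goal, true_hazard = 7, 6, 4           # cells, goal cell, hidden hazard cell
belief = np.full(N, 1.0 / N)             # p(hazard at cell i)
pos, rng = 0, np.random.default_rng(0)

def expected_cost(pos, step, belief, horizon=3):
    # (ii) look ahead a few cells in one direction under the current belief
    cost, p = 0.0, pos
    for _ in range(horizon):
        p = int(np.clip(p + step, 0, N - 1))
        cost += 2.0 * belief[p] + 0.1 * abs(goal - p)   # hazard risk + distance to goal
    return cost

for t in range(12):
    # (i) infer hidden state: noisy "hazard nearby?" reading, then a Bayes update
    sensed = rng.random() < (0.9 if abs(true_hazard - pos) <= 1 else 0.1)
    p_sense = np.where(np.abs(np.arange(N) - pos) <= 1, 0.9, 0.1)
    belief = belief * (p_sense if sensed else 1.0 - p_sense)
    belief /= belief.sum()

    # (iii) choose the step (left / stay / right) with the lowest expected cost
    steps = [-1, 0, 1]
    G = [expected_cost(pos, s, belief) for s in steps]
    pos = int(np.clip(pos + steps[int(np.argmin(G))], 0, N - 1))
```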
In many active-inference demos, these ingredients are wrapped into a single objective. Here, we keep multiple objectives explicit by making them separate voices that run in parallel.
In code terms: per voice → rollout → EFE-like score → softmax → \(q_k(a)\).
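A minimal sketch of that per-voice pipeline (the `voice.rollout` and `voice.score` hooks and the inverse temperature `beta` are assumptions for illustration, not the demo’s actual API):

```python
import numpy as np

def voice_proposal(voice, belief, actions, horizon=4, beta=1.0):
    """One voice: rollout -> EFE-like score -> softmax -> q_k(a).

    `voice.rollout` imagines `horizon` steps from the current belief and
    `voice.score` applies that voice's risk/utility/exploration terms;
    both are assumed hooks, not the demo's real interface.
    """
    G = np.array([voice.score(voice.rollout(belief, a, horizon))
                  for a in actions])
    logits = -beta * G                     # lower score = more probable
    q_k = np.exp(logits - logits.max())
    q_k /= q_k.sum()
    return q_k, G.min()                    # distribution q_k(a) and best score G_k^*
```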
Each voice proposes a distribution over actions \(q_k(a)\). The polyphonic layer keeps a running (leaky) evidence score for each voice and converts that into mixture weights \(\pi_k\). The negotiated action posterior is:
\[ q(a) \;=\; \sum_{k=1}^{K}\pi_k\,q_k(a) \]
Intuition: “do a weighted average of proposed actions” — but the weights evolve online.
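With toy numbers (three voices, four actions), the mixing step is just a matrix–vector product; all values below are made up for illustration:

```python
import numpy as np

# q_k(a): one row per voice, one column per action (toy numbers)
q = np.array([[0.70, 0.10, 0.10, 0.10],   # safety voice
              [0.10, 0.60, 0.20, 0.10],   # goal-seeking voice
              [0.25, 0.25, 0.25, 0.25]])  # exploration voice
pi = np.array([0.5, 0.3, 0.2])            # current mixture weights

q_mix = pi @ q                            # q(a) = sum_k pi_k * q_k(a)
action = np.random.default_rng().choice(4, p=q_mix)
```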
A minimal version of the weight dynamics is:
\[ \ell_k(t) \leftarrow \rho\,\ell_k(t-1) + \underbrace{(-G_k^{\ast})}_{\text{evidence}} \;-\; \lambda\,\underbrace{\Omega_k}_{\text{anti-dominance / compatibility}} \] \[ \pi_k \;=\; \mathrm{softmax}\!\big(\gamma\,\ell_k\big), \quad \pi \leftarrow (1-\varepsilon)\pi + \varepsilon\,\pi_{\mathrm{floor}} \]
Here \(G_k^{\ast}\) is the best (lowest) rollout score found by voice \(k\). \(\Omega_k\) is a light penalty when a voice becomes inconsistent with the others (so it can’t permanently dominate).
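A direct transcription of those updates as a sketch (parameter values are illustrative, and the uniform floor is just one natural choice for \(\pi_{\mathrm{floor}}\)):

```python
import numpy as np

def update_weights(ell, best_G, omega, rho=0.9, lam=0.5, gamma=1.0, eps=0.05):
    """Leaky evidence accumulation -> softmax weights over voices.

    ell    : running evidence scores ell_k(t-1), one per voice
    best_G : best (lowest) rollout score per voice, G_k^*
    omega  : anti-dominance / compatibility penalty per voice, Omega_k
    """
    ell = rho * ell + (-best_G) - lam * omega      # ell_k(t)
    logits = gamma * ell
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()                                 # softmax(gamma * ell_k)
    floor = np.full_like(pi, 1.0 / len(pi))        # assumed: uniform pi_floor
    return (1 - eps) * pi + eps * floor, ell

# toy call with three voices
pi, ell = update_weights(ell=np.zeros(3),
                         best_G=np.array([1.2, 0.4, 2.0]),
                         omega=np.array([0.0, 0.3, 0.1]))
```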
Expected free energy (EFE) is a planning functional that can trade off utility/risk against information gain. In many formulations you’ll see a decomposition into terms that look like: “seek preferred outcomes” + “reduce uncertainty”.
\[ G(\pi) \approx \underbrace{\mathbb{E}[\text{risk / cost}]}_{\text{pragmatic}} \;-\; \underbrace{\mathbb{E}[\text{information gain}]}_{\text{epistemic}} \]
In this gridworld, different voices emphasise different pieces of this trade-off.
The precise form varies by implementation; what matters for this demo is that each voice scores imagined futures in a way that encodes its “personality” (safety, goal-seeking, exploration, energy maintenance).
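One way to picture this: assume each rollout exposes a few per-step quantities (hazard probability, distance to goal, energy use, predicted information gain) and each voice simply weights them differently. The field names and weights below are placeholders, not the demo’s:

```python
import numpy as np

def efe_like_score(rollout, w):
    """Score one imagined rollout: pragmatic cost minus epistemic bonus.

    `rollout` is assumed to expose per-step arrays (placeholder field names);
    `w` holds the voice's weights and is what gives it a "personality".
    """
    pragmatic = (w["risk"]   * rollout["p_hazard"].sum()
                 + w["goal"]   * rollout["goal_dist"].sum()
                 + w["energy"] * rollout["energy_used"].sum())
    epistemic = w["explore"] * rollout["info_gain"].sum()
    return pragmatic - epistemic          # lower is better, as for G above

# different "personalities" are just different weightings
SAFETY   = {"risk": 3.0, "goal": 0.5, "energy": 0.2, "explore": 0.1}
EXPLORER = {"risk": 0.5, "goal": 0.2, "energy": 0.2, "explore": 2.0}

rollout = {k: np.array(v) for k, v in
           {"p_hazard": [0.1, 0.3], "goal_dist": [4, 3],
            "energy_used": [1, 1], "info_gain": [0.2, 0.5]}.items()}
print(efe_like_score(rollout, SAFETY), efe_like_score(rollout, EXPLORER))
```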
Real agents rarely have one objective. They juggle constraints, drives, and competing time-scales. Polyphonic intelligence treats this not as an afterthought (“add a penalty term”), but as a first-class architectural principle: maintain multiple concurrent generative models and integrate them without forcing collapse to a single winner.
The result is behaviour that can look more robust (doesn’t fail catastrophically when one objective becomes brittle), more adaptable (weights shift with context), and more interpretable (you can inspect which voice is driving which actions).