Overview
The model places an active inference agent in a discrete two-dimensional world containing a current goal, impassable walls, charging stations, and hidden hazards. The environment is only partially observed, so the agent must maintain and update probabilistic beliefs about local threat structure as it moves. This is not simply path planning. It is a belief-guided control problem in which action depends on inferred context, uncertainty, and internal constraints as much as on the goal itself.
A central idea of the polyphonic framework is that action should not be treated as the output of a single monolithic utility function. Instead, distinct control units or voices express partially competing priorities. A higher-level latent mode then shapes how much influence each voice has at a given moment, allowing the overall policy to change as the situation changes.
World and internal architecture
Environment
The grid contains walls, one current goal, charging stations, and true hazards that are not directly visible unless sampled through noisy local observation.
State
The agent maintains a position in the grid together with a battery state that influences both preferences and transition success.
Observations
At each step the agent sees only a local patch around itself and receives noisy hazard cues, forcing it to infer danger rather than read it off directly.
Beliefs
A hazard belief map stores the posterior probability that each location is dangerous, and a separate reset memory map marks recently catastrophic outcomes as temporarily more aversive.
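The per-cell belief update can be sketched as a single Bayes step over a Bernoulli hazard variable, driven by the noisy local cue. The hit and false-alarm rates below are illustrative assumptions, not values taken from the implementation.

```python
# Minimal sketch of the hazard-belief update for one cell, assuming a
# Bernoulli cue model. The cue rates are hypothetical placeholders.
P_CUE_GIVEN_HAZARD = 0.8   # P(cue = 1 | cell is hazardous)
P_CUE_GIVEN_SAFE = 0.2     # P(cue = 1 | cell is safe)

def update_hazard_belief(prior: float, cue: int) -> float:
    """One Bayesian update of a cell's hazard probability from a binary cue."""
    like_hazard = P_CUE_GIVEN_HAZARD if cue else 1.0 - P_CUE_GIVEN_HAZARD
    like_safe = P_CUE_GIVEN_SAFE if cue else 1.0 - P_CUE_GIVEN_SAFE
    evidence = like_hazard * prior + like_safe * (1.0 - prior)
    return like_hazard * prior / evidence

# A positive cue pushes an uncertain 0.5 prior toward the hazard hypothesis.
posterior = update_hazard_belief(0.5, cue=1)  # 0.8 under these rates
```

Applied over the whole map, repeated cues sharpen the posterior in sampled regions while unvisited cells stay near the prior, which is what makes the epistemic term meaningful later.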
Modes
The agent infers a posterior over behavioural modes such as Explore, Pursue Goal, Recharge, Avoid Threat, and Verify, based on context including battery urgency, uncertainty, and local threat.
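One simple way to realise this inference is a softmax over mode evidence computed from context features. The feature weights here are hypothetical and chosen only to show the shape of the computation.

```python
import math

MODES = ["Explore", "PursueGoal", "Recharge", "AvoidThreat", "Verify"]

def mode_posterior(battery_urgency: float, local_threat: float,
                   belief_entropy: float, goal_progress: float) -> dict:
    """Softmax posterior over behavioural modes from context features.
    All weights are illustrative assumptions, not the model's values."""
    logits = [
        1.5 * belief_entropy,                        # Explore favours uncertainty
        2.0 * goal_progress,                         # PursueGoal favours headway
        3.0 * battery_urgency,                       # Recharge favours low battery
        3.0 * local_threat,                          # AvoidThreat favours danger
        1.0 * belief_entropy + 1.0 * local_threat,   # Verify: uncertain threat
    ]
    z = max(logits)                                  # stabilise the softmax
    exps = [math.exp(l - z) for l in logits]
    total = sum(exps)
    return dict(zip(MODES, [e / total for e in exps]))
```

With battery urgency dominating the context, the posterior mass shifts toward Recharge; because the output is a distribution rather than a hard switch, downstream voice weights change smoothly as context drifts.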
Voices
Safety, Goal, Epistemic, Energy, and Habit voices each evaluate the same candidate futures using a different weighting over cost and information terms.
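The "different weighting over cost and information terms" can be made concrete as a small table of per-voice weights applied to the same scored future. The weight values below are illustrative, not taken from the model.

```python
# Each voice rescores the same rolled-out future with its own weights over
# the planner's terms. Values are hypothetical placeholders.
VOICE_WEIGHTS = {
    #             risk  ambiguity  epistemic
    "Safety":    (3.0,  1.0,       0.0),
    "Goal":      (1.0,  0.2,       0.0),
    "Epistemic": (0.5,  0.0,       2.0),
    "Energy":    (2.0,  0.5,       0.0),
    "Habit":     (0.5,  0.0,       0.0),
}

def voice_score(voice: str, risk: float, ambiguity: float,
                epistemic_value: float) -> float:
    """Lower is better: weighted cost terms minus weighted information gain."""
    w_risk, w_amb, w_epi = VOICE_WEIGHTS[voice]
    return w_risk * risk + w_amb * ambiguity - w_epi * epistemic_value
```

The same risky candidate future thus looks far worse to the Safety voice than to the Goal voice, while only the Epistemic voice can find a future attractive purely for its information gain.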
Expected free energy and policy evaluation
For each voice, the agent evaluates a finite set of short-horizon policies. These candidate futures are rolled forward under approximate transition dynamics and scored using an expected-free-energy-style decomposition. In this implementation, the planner combines three broad terms: risk, ambiguity, and epistemic value.
Risk captures mismatch with preferences, including hazard exposure, distance from the current subgoal, low battery, charger viability failure, reset-memory penalties, and control costs. Ambiguity reflects uncertainty in outcomes, approximated here from the entropy of hazard beliefs at predicted locations. Epistemic value acts as an information-seeking term, favouring locations that remain uncertain and relatively unvisited.
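For a single predicted cell, these three terms can be sketched from the hazard belief alone. The cost constants and the visit-count discount are assumptions for illustration; the real risk term also covers battery, charger viability, reset memory, and control costs.

```python
import math

def bernoulli_entropy(p: float) -> float:
    """Entropy (nats) of a Bernoulli hazard belief; zero at certainty."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def efe_terms(hazard_p: float, goal_dist: float, visit_count: int,
              hazard_cost: float = 4.0, step_cost: float = 0.1):
    """Illustrative risk / ambiguity / epistemic terms for one predicted cell.
    Constants are hypothetical, not taken from the implementation."""
    risk = hazard_cost * hazard_p + goal_dist + step_cost
    ambiguity = bernoulli_entropy(hazard_p)                    # outcome uncertainty
    epistemic = bernoulli_entropy(hazard_p) / (1.0 + visit_count)  # novelty bonus
    return risk, ambiguity, epistemic
```

Note the deliberate asymmetry: ambiguity penalises uncertain cells as unpredictable, while epistemic value rewards the same uncertainty when the cell is also unvisited, which is exactly the tension the voices weight differently.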
Each voice therefore forms its own posterior over policies. These voice-wise posteriors are then mixed using a set of adaptive voice weights derived from the current mode posterior, giving a polyphonic posterior over action rather than a single-system decision rule.
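The mixing step itself is a convex combination: each voice's posterior over the shared policy set is weighted by its mode-derived influence. A minimal sketch, assuming the voice weights are already normalised:

```python
def mix_policy_posteriors(voice_posteriors: dict, voice_weights: dict) -> list:
    """Blend per-voice policy posteriors into one polyphonic posterior.
    voice_posteriors: {voice: [p(policy_0), p(policy_1), ...]} over the
    same policy set; voice_weights: {voice: weight}, summing to 1."""
    n_policies = len(next(iter(voice_posteriors.values())))
    mixed = [0.0] * n_policies
    for voice, posterior in voice_posteriors.items():
        weight = voice_weights[voice]
        for i, p in enumerate(posterior):
            mixed[i] += weight * p
    return mixed

# Two voices that disagree completely yield a genuinely mixed posterior.
blend = mix_policy_posteriors(
    {"Safety": [1.0, 0.0], "Goal": [0.0, 1.0]},
    {"Safety": 0.5, "Goal": 0.5},
)
```

Because the mixture stays a proper distribution, the agent can either sample from it or take its argmax, and shifting the mode posterior reshapes behaviour without retraining any single voice.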
Why this matters
The model sits between a simple reward-maximising controller and a full exact active inference scheme. It is more structured than conventional utility maximisation because it separates competing imperatives, explicitly represents uncertainty, uses hierarchical context inference, and supports subgoal-based route restructuring.
At the same time, it remains computationally tractable and visually interpretable, which makes it useful both as a working agent and as a conceptual platform for studying how inference, arbitration, and planning interact.
Algorithm at a glance
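Reconstructed from the sections above, one step of the agent's loop looks roughly like this. This is a sketch; details such as softmax policy selection and the exact decay of reset memory are assumptions rather than confirmed implementation choices.

```text
loop over timesteps:
    observe local patch and noisy hazard cues
    update hazard belief map (Bayesian step per observed cell)
    decay reset memory; stamp it after catastrophic outcomes
    infer mode posterior from battery urgency, uncertainty, local threat
    derive voice weights from the mode posterior
    for each voice (Safety, Goal, Epistemic, Energy, Habit):
        roll out candidate short-horizon policies under approximate dynamics
        score each with risk + ambiguity - epistemic value (voice-weighted)
        form a voice-wise posterior over policies (e.g. softmax of -G)
    mix the voice-wise posteriors with the voice weights
    select an action from the polyphonic posterior and execute it
    update position and battery; replan subgoals if needed
```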
Interpretation
One useful aspect of this framework is that failures are often informative rather than merely undesirable. Oscillations between goal pursuit and threat avoidance, deadlocks near bottlenecks, repeated visits to risky regions, or unstable subgoal switching can reveal which representational or planning ingredients are still missing from the model.
In that sense, the gridworld functions as more than a toy benchmark. It becomes a transparent setting in which theoretical assumptions about inference, arbitration, and internal control structure can be inspected directly.
Possible next steps
Natural extensions include smoother probabilistic threat fields, more expressive route-level subgoal inference, explicit belief updates over path structure, policy pruning or tree search for longer horizons, and a fuller discrete-state active inference formalism with explicit likelihood and transition matrices.
The broader motivation is to move toward richer active inference agents in which planning, uncertainty, internal drives, and hierarchical arbitration interact in a reusable and interpretable way.