This project implements a physically embodied drone agent that pursues a beacon in a cluttered 3D environment using a polyphonic active-inference-style architecture. The agent does not act directly on simulator truth. Instead, simulator state is converted into noisy observations, posterior beliefs are updated online, and action is selected by a short-horizon controller whose objectives are modulated by a slower hierarchical latent layer over visibility, progress, affordance, and behavioural context.
The result is an interpretable control system that can maintain a remembered target under occlusion, switch between search, approach, repositioning, recovery, and stabilisation, and use uncertainty and scene interpretation to arbitrate behaviour in a way that is much closer to a real inferential agent than a hand-scripted planner.
Many active inference demos either operate in very small discrete worlds or rely on hidden shortcuts that make control easier than it first appears. This system was built to push beyond that. The drone lives in a continuous 3D environment, receives imperfect local observations, and must maintain a target belief under partial occlusion while avoiding obstacles and stabilising flight. It does not simply chase ground-truth coordinates.
The more distinctive ingredient is the polyphonic control architecture. Instead of reducing behaviour to a single scalar objective, the controller combines multiple pressures: pragmatic target pursuit, safety, altitude and heading stabilisation, open-space preference, uncertainty reduction, and mode-specific contextual drives. These pressures do not remain fixed. In the final version, they are modulated by a slower latent layer that tries to infer what kind of situation the agent is currently in.
The drone moves in a PyBullet environment with obstacles and a target beacon. The simulator provides truth internally, but the controller only receives noisy derived observations.
Posterior beliefs are maintained over self state, target state, and local obstacle structure. This is the first layer that converts raw observations into hidden-state estimates.
A slower latent layer summarises what kind of situation the agent is in and feeds top-down biases back into control.
The controller does not issue commands from a single global optimiser. It arbitrates among a small family of interpretable behavioural regimes, each of which can dominate under different belief and context conditions.
The key point is that these are no longer brittle if-else switches. In the Stage B5 and Stage C variants, the controller maintains posterior-like probabilities over modes and uses those probabilities as an arbitration layer.
“Polyphonic” here means that action is not determined by one monolithic score. Several behavioural voices contribute pressure simultaneously, and their relative influence changes with the inferred situation.
The scene/context layer acts mainly by modulating the effective precision of these voices, rather than replacing the controller with a second heavy planner.
Let the fast hidden state factor as

$$
x_t = \big(x^{\mathrm{self}}_t,\; x^{\mathrm{target}}_t,\; x^{\mathrm{obs}}_t\big).
$$

In the current implementation these correspond approximately to the drone's own pose and velocity, a world-frame target estimate with an associated confidence, and a set of ray-based obstacle distances $\{d_k\}$,
where $d_k$ denotes the inferred obstacle distance along ray direction $k$.
The controller sees a noisy observation vector

$$
o_t = \big(o^{\mathrm{self}}_t,\; o^{\mathrm{target}}_t,\; o^{\mathrm{ray}}_t\big).
$$

These are generated from simulator truth but presented to the controller as noisy measurements, e.g.

$$
o^{\mathrm{target}}_t = h\big(x^{\mathrm{true}}_t\big) + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \Sigma_o),
$$

where $h(\cdot)$ returns egocentric range, bearing, and elevation when the target is visible or softly observable.
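As a concrete illustration, the observation map $h(\cdot)$ plus additive Gaussian noise can be sketched as below. The function name, argument layout, and noise scales are my assumptions for exposition, not the project's actual API:

```python
import numpy as np

def egocentric_target_obs(self_pos, self_yaw, target_pos,
                          noise_std=(0.15, 0.03, 0.03), rng=None):
    """Sketch of h(.): map true self/target states to a noisy egocentric
    (range, bearing, elevation) observation. Names and noise scales are
    illustrative stand-ins, not the project's implementation."""
    rng = np.random.default_rng() if rng is None else rng
    delta = np.asarray(target_pos, float) - np.asarray(self_pos, float)
    dist = np.linalg.norm(delta)                          # range to target
    bearing = np.arctan2(delta[1], delta[0]) - self_yaw   # heading-relative bearing
    elev = np.arcsin(delta[2] / max(dist, 1e-9))          # elevation angle
    noise = rng.normal(0.0, noise_std)                    # additive Gaussian noise
    return np.array([dist, bearing, elev]) + noise
```

Because only this noisy triple reaches the controller, the true target position never leaks into control directly.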
The fast belief layer maintains approximate Gaussian posteriors over continuous latent states,

$$
q(x_t) = \mathcal{N}\big(\mu_t, \Sigma_t\big).
$$
In the implementation, these are updated by lightweight prediction-correction steps rather than a fully general symbolic message-passing engine. The spirit is variational: beliefs are predicted forward under a transition model, corrected by precision-weighted observation errors, and retained as the sufficient statistics driving control.
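The prediction-correction step described above can be sketched as a minimal diagonal-Gaussian filter. The random-walk transition and the variable names are my simplifying assumptions; the point is the precision-weighted structure of the correction:

```python
import numpy as np

def predict_correct(mu, var, obs, obs_var, proc_var):
    """One lightweight predict-correct step on a diagonal Gaussian belief.
    Sketch under assumed random-walk dynamics, not the project's exact model."""
    # Predict: push the belief through the transition model (identity here),
    # inflating variance by process noise.
    mu_pred = mu
    var_pred = var + proc_var
    # Correct: precision-weighted blend of prediction and observation.
    gain = var_pred / (var_pred + obs_var)        # per-dimension Kalman gain
    mu_new = mu_pred + gain * (obs - mu_pred)     # precision-weighted error correction
    var_new = (1.0 - gain) * var_pred
    return mu_new, var_new
```

The retained `(mu, var)` pair is exactly the kind of sufficient statistic the text says drives control.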
For the target, egocentric cues are reconstructed into world-frame target estimates using the current self belief. Under chronic occlusion, soft cues are treated as weak evidence: confidence decays, covariance inflates, and the system is prevented from becoming overconfident under poor visibility.
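The occlusion handling can be illustrated with a small sketch: under occlusion, confidence decays geometrically and covariance inflates. The decay and inflation constants are assumptions for illustration, not tuned project values:

```python
import numpy as np

def decay_target_memory(conf, cov_diag, occluded,
                        decay=0.98, inflate=1.05, conf_floor=0.0):
    """Illustrative target-memory update under chronic occlusion:
    confidence weakens and positional uncertainty grows, so the remembered
    target stays usable without becoming overconfident. Constants assumed."""
    cov_diag = np.asarray(cov_diag, float)
    if occluded:
        conf = max(conf * decay, conf_floor)   # belief in the remembered target decays
        cov_diag = cov_diag * inflate          # covariance inflation per occluded step
    return conf, cov_diag
```

Applied once per timestep, this yields exactly the behaviour described: a slow, graceful loss of the target rather than a brittle known/lost switch.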
For each candidate short-horizon control sequence $\pi_k$, the controller rolls the dynamics forward and computes a composite score with pragmatic, safety, stability, and epistemic components. A stylised form is:

$$
G(\pi_k) = w_{\mathrm{prag}}\, G_{\mathrm{prag}}(\pi_k) + w_{\mathrm{safe}}\, G_{\mathrm{safe}}(\pi_k) + w_{\mathrm{stab}}\, G_{\mathrm{stab}}(\pi_k) + w_{\mathrm{epi}}\, G_{\mathrm{epi}}(\pi_k).
$$

Representative terms include predicted distance to the believed target (pragmatic), penalties on predicted obstacle proximity (safety), deviations from reference altitude and heading (stability), and predicted posterior uncertainty together with an occlusion proxy $C_{\mathrm{occ}}$ (epistemic). The weights $w_{(\cdot)}$ are modulated by the slow latent context state, and lower $G(\pi_k)$ is preferred.
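A minimal sketch of this scoring, assuming the rollout has already produced summary features (the feature names, penalty shapes, and weight dictionary are mine, standing in for the project's actual terms):

```python
import numpy as np

def score_rollout(pred_target_dist, pred_min_clearance, pred_alt_err,
                  pred_uncertainty, occlusion_proxy, w):
    """Stylised composite score G(pi_k) for one candidate rollout.
    Lower is better. Feature names and weights `w` are illustrative."""
    g_prag = pred_target_dist                      # pursue the believed target
    g_safe = np.exp(-pred_min_clearance)           # soft barrier near obstacles
    g_stab = pred_alt_err ** 2                     # altitude/heading stabilisation
    g_epi = pred_uncertainty + occlusion_proxy     # prefer informative, visible paths
    return (w["prag"] * g_prag + w["safe"] * g_safe
            + w["stab"] * g_stab + w["epi"] * g_epi)
```

The controller would then pick the candidate with the smallest score, e.g. `min(candidates, key=score)`.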
Behaviour is also mediated by a discrete posterior-like belief over controller modes,

$$
P(m_t \mid o_{1:t}) = \mathrm{softmax}(\ell_t),
$$
where the mode logits $\ell_t$ depend on target confidence, visibility status, occlusion duration, local free-space structure, progress, and proximity to the target. This gives a compact inferential layer over behavioural regime selection.
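The arbitration step itself is just a numerically stable softmax over the mode logits; how the logits are assembled from confidence, visibility, and progress signals is project-specific, so only the generic mapping is sketched here:

```python
import numpy as np

def mode_posterior(logits):
    """Softmax over mode logits: a posterior-like arbitration distribution
    over behavioural regimes (e.g. search, approach, recover, stabilise)."""
    z = np.asarray(logits, float)
    z = z - z.max()        # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()
```

Because the output is a distribution rather than a hard switch, several modes can exert pressure simultaneously, which is what makes the arbitration non-brittle.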
The Stage C extension introduces a slower latent state

$$
z_t = \big(z^{\mathrm{vis}}_t,\; z^{\mathrm{prog}}_t,\; z^{\mathrm{aff}}_t,\; z^{\mathrm{ctx}}_t\big),
$$
capturing visibility regime, progress regime, affordance regime, and contextual precision control. In practice this layer is updated from smoothed evidence over multiple timesteps and feeds back into control via precision-like variables such as exploit-vs-explore balance, target-memory precision, epistemic precision, safety precision, and goal precision.
Schematically, the voice weights become functions of the slow state,

$$
w_i(t) = f_i(z_t),
$$

so the higher layer changes how strongly different control voices are trusted without replacing the lower layer entirely.
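One possible shape for this feedback, assuming the slow layer exposes smoothed visibility and progress signals in $[0,1]$ (the mapping and constants are my assumptions standing in for the Stage C precision-control variables):

```python
def update_context_weights(smoothed_visibility, smoothed_progress, base_w, lam=2.0):
    """Illustrative slow-layer feedback: smoothed evidence rescales the
    precision-like weights of the control voices. Mapping and constants
    are assumptions, not the project's exact precision-control scheme."""
    w = dict(base_w)
    # Poor visibility shifts the exploit-vs-explore balance toward evidence-seeking.
    w["epi"] = base_w["epi"] * (1.0 + lam * (1.0 - smoothed_visibility))
    # Stalled progress dampens pragmatic pressure; good progress boosts goal precision.
    w["prag"] = base_w["prag"] * (0.5 + smoothed_progress)
    return w
```

Note that the lower-layer controller is untouched: the slow layer only rescales how much each voice is trusted.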
Although the implementation is deliberately lightweight, the architecture can be interpreted in standard active inference terms. The fast layer minimises a variational free energy over hidden states,

$$
F_t = \mathbb{E}_{q(x_t)}\!\big[\ln q(x_t) - \ln p(o_t, x_t)\big],
$$

while action selection approximately minimises an expected free energy surrogate over candidate policies,

$$
\pi^{*} = \arg\min_{\pi_k} G(\pi_k).
$$
In the present system, $G(\pi)$ is not derived from a fully exact deep generative model, but it is close enough in form and function to support the key active inference intuition: policies are selected not only for immediate pragmatic value, but also for their relationship to uncertainty, visibility, safety, and future evidence.
The codebase is organised around a clear separation between world, sensing, inference, and control.
The central engineering decision is that the controller never consumes simulator truth directly. Truth exists only to generate noisy observations, which are then filtered into posterior beliefs. This is what makes the project an inferential control system rather than simply a sophisticated heuristic tracker.
This separation proved crucial during development. Earlier versions became brittle when the target was treated as either fully known or too quickly “lost”. The final Stage C1.1 system works because the target can remain remembered under uncertainty without becoming unrealistically certain under chronic occlusion.
In the successful demonstration run attached to this page, the drone executes a mixture of search, rollout-based pursuit, recovery, and local stabilisation. The logs from that run matter because they show that the final controller is not doing one thing all the time: it switches intelligently between exploitative pursuit, evidence-seeking, and local stabilisation.