Thermodynamic Variational Laplace — Pong Agent

Controls: speed (× 1.0), observation noise σ (1.5), β schedule length H (temperature steps)

Readouts: free energy F, β (temperature), score

Legend: agent posterior μ, observation, predicted intercept

What is this doing?

The agent maintains Gaussian beliefs over the ball state s = [x, y, vx, vy] and the paddle action a. It performs a lightweight Thermodynamic Variational Laplace (TVL) update: an inverse temperature β is swept from 0 to 1 over H steps, and at each step the posterior mean μ is updated with a Laplace (Newton) step on a surrogate free energy objective:

F(μ) ≈ β · ||o − g(μ)||²/σ² + KL[q(μ) || p(μ)] + λ·||a||²
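
The β sweep can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the demo's actual code: the observation model g is taken to return the ball position only, the prior is a diagonal Gaussian, the KL term is reduced to its mean-dependent quadratic part, and the action cost λ·||a||² is left out because it does not depend on μ. The sweep starts at β = 1/H, since a step at β = 0 would simply return the prior mean. All function and parameter names are hypothetical.

```python
def g(mu):
    """Hypothetical observation model: the agent sees the ball position only."""
    return mu[:2]


def free_energy(mu, prior_mean, prior_var, obs, sigma, beta):
    """Surrogate F: tempered squared error plus the mean-dependent part of the KL."""
    accuracy = sum((o - p) ** 2 for o, p in zip(obs, g(mu))) / sigma ** 2
    complexity = 0.5 * sum(
        (m - m0) ** 2 / v for m, m0, v in zip(mu, prior_mean, prior_var)
    )
    return beta * accuracy + complexity


def tvl_update(prior_mean, prior_var, obs, sigma, H):
    """Sweep beta over 1/H, 2/H, ..., 1 with one Newton step on F per value."""
    mu = list(prior_mean)
    trace = []  # (beta, F) pairs, e.g. for plotting the readouts
    for k in range(1, H + 1):
        beta = k / H
        for i in range(len(mu)):
            # Prior (KL) term: gradient and curvature of 0.5 * (mu - m0)^2 / v.
            grad = (mu[i] - prior_mean[i]) / prior_var[i]
            hess = 1.0 / prior_var[i]
            if i < len(obs):
                # Likelihood term applies to the observed components (x, y) only.
                grad -= 2.0 * beta * (obs[i] - mu[i]) / sigma ** 2
                hess += 2.0 * beta / sigma ** 2
            mu[i] -= grad / hess  # Newton step with a diagonal Hessian
        trace.append((beta, free_energy(mu, prior_mean, prior_var, obs, sigma, beta)))
    return mu, trace


# Example call with made-up numbers: prior at the origin, ball observed at (2, 1).
mu, trace = tvl_update(
    prior_mean=[0.0, 0.0, 1.0, 1.0],
    prior_var=[4.0, 4.0, 1.0, 1.0],
    obs=[2.0, 1.0],
    sigma=1.5,
    H=8,
)
```

Because F is quadratic in μ under these assumptions, each Newton step lands exactly on the minimiser for the current β, so the sweep traces a path from the prior mean towards the full posterior mean. With a nonlinear g, the intermediate β values would matter more, since each step would start from the previous step's estimate.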

In the objective above, g is a simple forward model of Pong physics, o is the noisy observation with noise level σ, and λ weights the action cost. Action selection uses a one-step expected free energy proxy that favours paddle positions which reduce the predicted sensory surprise at the moment of contact.
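
The intercept prediction and the action choice might look like the following Python sketch. It rests on several assumptions that the text does not confirm: the paddle sits on a vertical line at paddle_x to the right of the ball, the walls are at y = 0 and y = height, the ball moves at constant velocity and reflects perfectly, candidate paddle positions lie on a uniform grid, and the action a in the cost λ·||a||² is read as the paddle's displacement from its current position. All names are hypothetical.

```python
def predict_intercept(state, paddle_x, height):
    """Roll constant-velocity physics forward to the paddle line, reflecting off walls."""
    x, y, vx, vy = state
    if vx <= 0:
        return None  # ball is not moving towards the paddle
    t = (paddle_x - x) / vx          # time until the ball reaches the paddle line
    y_unfolded = y + vy * t          # height if there were no walls
    period = 2.0 * height
    y_mod = y_unfolded % period      # non-negative for a positive period in Python
    # Fold the unbounded trajectory back into [0, height] (triangle wave).
    return y_mod if y_mod <= height else period - y_mod


def select_paddle_target(mu, paddle_y, paddle_x, height, sigma, lam, n_candidates=21):
    """Pick the candidate paddle position with the lowest one-step proxy score."""
    y_hit = predict_intercept(mu, paddle_x, height)
    if y_hit is None:
        return paddle_y  # nothing to intercept, so hold position

    candidates = [height * i / (n_candidates - 1) for i in range(n_candidates)]

    def efe_proxy(p):
        surprise = (p - y_hit) ** 2 / sigma ** 2   # predicted miss distance at contact
        effort = lam * (p - paddle_y) ** 2         # stand-in for the action cost
        return surprise + effort

    return min(candidates, key=efe_proxy)


# Example call with made-up numbers: the ball bounces once before reaching the paddle.
target = select_paddle_target(
    mu=(0.0, 50.0, 10.0, 8.0),
    paddle_y=50.0,
    paddle_x=100.0,
    height=100.0,
    sigma=1.5,
    lam=0.0,
)
```

Folding the trajectory with a modulo avoids simulating each bounce separately. Raising lam pulls the chosen target back towards the current paddle position, trading predicted surprise against movement.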