The agent maintains Gaussian beliefs over the ball state s = [x, y, vx, vy] and the paddle action a. It performs a lightweight Thermodynamic Variational Laplace (TVL) update: an inverse temperature β is swept from 0 to 1 over H steps, and at each step the posterior mean μ is refined with a Laplace (Newton) step on a surrogate free-energy objective:
F(μ) ≈ β · ||o − g(μ)||²/σ² + KL[q(s) || p(s)] + λ·||a||², where q(s) is the Gaussian posterior over s with mean μ and p(s) is the prior over s.
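A minimal sketch of the annealed Laplace update in Python with NumPy. Everything concrete here is an assumption, not the agent's actual code: the forward model `g` (constant-velocity motion over a hypothetical frame time `DT`, y reflected off walls at 0 and 1, only position observed), a Gaussian prior N(mu0, Sigma0), a linear schedule β_k = k/H for k = 1..H, and one Gauss-Newton step per temperature. Because the μ-gradient of the KL term is Σ0⁻¹(μ − μ0) and λ·||a||² does not depend on μ, each step is μ ← μ − Hess⁻¹ ∇F with ∇F = −(2β/σ²)Jᵀ(o − g(μ)) + Σ0⁻¹(μ − μ0) and Hess = (2β/σ²)JᵀJ + Σ0⁻¹, where J is the Jacobian of g.

```python
import numpy as np

DT = 0.05  # hypothetical time between frames (assumption)

def g(s):
    """Hypothetical Pong forward model: predicted (x, y) one frame ahead under
    constant velocity, with y reflected off walls at 0 and 1.
    Returns the prediction and its Jacobian with respect to s = [x, y, vx, vy]."""
    x, y, vx, vy = s
    pred = np.array([x + vx * DT, y + vy * DT])
    J = np.array([[1.0, 0.0, DT, 0.0],
                  [0.0, 1.0, 0.0, DT]])
    if pred[1] > 1.0:      # bounce off the top wall flips the sign of dy/ds
        pred[1] = 2.0 - pred[1]
        J[1] = -J[1]
    elif pred[1] < 0.0:    # bounce off the bottom wall
        pred[1] = -pred[1]
        J[1] = -J[1]
    return pred, J

def tvl_update(mu0, Sigma0, o, sigma=0.05, H=10):
    """Anneal beta over 1/H, 2/H, ..., 1 and take one Gauss-Newton step on
    F(mu) = beta*||o - g(mu)||^2/sigma^2 + KL[q(s) || p(s)] at each temperature.
    The lambda*||a||^2 term is constant in mu, so it drops out of this update."""
    P0 = np.linalg.inv(Sigma0)              # prior precision
    mu = np.array(mu0, dtype=float)
    for beta in np.linspace(0.0, 1.0, H + 1)[1:]:
        pred, J = g(mu)
        grad = -2.0 * beta / sigma**2 * J.T @ (o - pred) + P0 @ (mu - mu0)
        hess = 2.0 * beta / sigma**2 * J.T @ J + P0
        mu = mu - np.linalg.solve(hess, grad)
    # Laplace covariance: inverse Hessian of F at the final mean, with beta = 1
    _, J = g(mu)
    Sigma = np.linalg.inv(2.0 / sigma**2 * J.T @ J + P0)
    return mu, Sigma

# Usage with made-up numbers (illustrative only)
mu0 = np.array([0.5, 0.5, 0.2, 0.1])
Sigma0 = np.diag([0.01, 0.01, 0.1, 0.1])
o = np.array([0.52, 0.52])
mu, Sigma = tvl_update(mu0, Sigma0, o)
print(mu, np.diag(Sigma))
```

Small β keeps early steps close to the prior and lets the likelihood enter gradually, which helps when g is nonlinear (here, the wall bounce). Away from a bounce this g is linear, so F is quadratic and the final β = 1 Newton step lands exactly on the Laplace mode.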
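In F, o is the current observation, g is a simple forward model of Pong physics, σ is the observation noise scale and λ weights the action cost. Action selection uses a one-step expected free energy (EFE) proxy, which favours paddle positions that minimise the predicted sensory surprise when the ball reaches the paddle.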
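A minimal sketch of the one-step EFE proxy, again in Python with NumPy. The text does not say how contact is predicted or how surprise is scored, so all of the following are assumptions: the paddle sits at a hypothetical `paddle_x = 1.0` in a unit court, the action a is a displacement of the paddle's y, the contact y is estimated by Monte Carlo propagation of samples from q(s) = N(mu, Sigma) with wall reflections, and the surprise proxy is the expected squared miss at contact divided by σ², plus the λ·a² cost from F.

```python
import numpy as np

def reflect(y):
    """Fold unbounded y into [0, 1], emulating bounces off walls at 0 and 1."""
    y = np.mod(y, 2.0)
    return np.where(y > 1.0, 2.0 - y, y)

def predicted_contact_y(mu, Sigma, paddle_x=1.0, n_samples=1000, seed=0):
    """Monte Carlo samples of the ball's y when it reaches x = paddle_x,
    drawn from the Gaussian belief q(s) = N(mu, Sigma) (assumed)."""
    rng = np.random.default_rng(seed)
    x, y, vx, vy = rng.multivariate_normal(mu, Sigma, size=n_samples).T
    vx = np.maximum(vx, 1e-3)                  # assume the ball moves toward the paddle
    t = np.maximum(paddle_x - x, 0.0) / vx     # time until contact
    return reflect(y + vy * t)

def select_action(mu, Sigma, paddle_y, candidates, sigma=0.05, lam=0.1):
    """Hypothetical one-step EFE proxy: expected squared miss at contact,
    scaled by 1/sigma^2, plus the lambda*a^2 action cost.
    Here a is a (hypothetical) displacement of the paddle's y."""
    y_c = predicted_contact_y(mu, Sigma)
    a = np.asarray(candidates, dtype=float)
    targets = paddle_y + a                     # paddle y after each candidate action
    surprise = np.mean((y_c[None, :] - targets[:, None]) ** 2, axis=1) / sigma**2
    G = surprise + lam * a**2
    return a[np.argmin(G)], G

# Usage with made-up numbers: the ball bounces off the top wall before contact
mu = np.array([0.0, 0.8, 0.5, 0.4])
Sigma = np.diag([1e-6] * 4)
a_best, G = select_action(mu, Sigma, paddle_y=0.5,
                          candidates=np.linspace(-0.5, 0.5, 21))
```

One consequence of this particular proxy: E[(y_c − target)²] = (mean miss)² + Var[y_c], and the variance term is the same for every candidate, so belief uncertainty does not change which action wins. Making the choice uncertainty-sensitive would need an extra epistemic term, which this sketch leaves out.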