Thermodynamic Variational Laplace — Pong Agent

    [Interactive demo: Pong canvas with live readouts of free energy F, β (temperature) and score. Legend: agent posterior μ, observation, predicted intercept.]
    What is this doing?

    The agent maintains Gaussian beliefs over the ball state s = [x, y, vx, vy] and the paddle action a. It performs a lightweight Thermodynamic Variational Laplace (TVL) update: it sweeps an inverse temperature β from 0 to 1 over H annealing steps and, at each step, updates the posterior mean μ with a Laplace (Newton) step on a surrogate free-energy objective:

    F(μ) ≈ β · ||o − g(μ)||²/σ² + KL[q(μ) || p(μ)] + λ·||a||²
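A minimal sketch of the annealed update, assuming a toy linear forward model g that predicts the ball position one step ahead (DT, J, tvl_update and all parameter values are hypothetical, not taken from the demo). It also assumes a Gaussian prior p(s) = N(m0, P0⁻¹) and a fixed posterior covariance, so the μ-dependent part of the KL term is ½(μ − m0)ᵀP0(μ − m0). The Newton step then uses the gradient −2βJᵀ(o − g(μ))/σ² + P0(μ − m0) and the Hessian 2βJᵀJ/σ² + P0; the λ·||a||² term does not depend on μ and drops out of this update.

```python
import numpy as np

DT = 1.0  # hypothetical forward-model time step

# Hypothetical toy forward model: state s = [x, y, vx, vy], observation o = [x, y]
# predicted one step ahead. It is linear, so its Jacobian J is constant.
J = np.array([[1.0, 0.0, DT, 0.0],
              [0.0, 1.0, 0.0, DT]])


def g(mu: np.ndarray) -> np.ndarray:
    """Predicted ball position after DT under constant velocity (no wall bounces)."""
    return J @ mu


def tvl_update(o, prior_mean, prior_prec, sigma=1.0, H=10):
    """Sweep beta from 0 to 1 over H steps, taking one Newton step per beta on
    F(mu) = beta * ||o - g(mu)||^2 / sigma^2 + 0.5 * (mu - m0)^T P0 (mu - m0)."""
    mu = prior_mean.astype(float).copy()
    hess = prior_prec.copy()
    for beta in np.linspace(0.0, 1.0, H):
        residual = o - g(mu)
        grad = -2.0 * beta * J.T @ residual / sigma**2 + prior_prec @ (mu - prior_mean)
        hess = 2.0 * beta * J.T @ J / sigma**2 + prior_prec
        mu = mu - np.linalg.solve(hess, grad)  # Newton step
    # Under a Laplace approximation, the Hessian at the final step serves as the
    # posterior precision (with F scaled as written above).
    return mu, hess
```

Because this toy g is linear, F is quadratic in μ and each Newton step lands exactly on the minimiser for the current β, so the sweep ends at the β = 1 optimum. The gradual β schedule is mainly useful when g is nonlinear (for example, once wall bounces are modelled), where a small early β keeps μ close to the prior while the likelihood term is phased in.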

    Here, o is the observation, σ² the observation-noise variance, λ the weight of the action-cost penalty, and g a simple forward model of Pong physics. Action selection uses a one-step expected free energy proxy, which favours paddle positions that reduce predicted sensory surprise at the moment of contact.
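The forward model and action rule can be sketched as follows. This is a minimal illustration under assumptions the page does not state: court coordinates normalised to [0, 1], the paddle on the line x = paddle_x, elastic bounces off the top and bottom walls, and an expected-free-energy proxy taken to be the squared miss distance at contact plus the λ·||a||² action cost. The names predicted_intercept, select_action, paddle_x, height, sigma, lam and n_candidates are hypothetical.

```python
from typing import Optional, Sequence


def predicted_intercept(s: Sequence[float], paddle_x: float, height: float) -> Optional[float]:
    """Hypothetical Pong forward model: the y at which a ball with state
    s = [x, y, vx, vy] reaches the line x = paddle_x, reflecting elastically off
    walls at y = 0 and y = height. Returns None if the ball is moving away."""
    x, y, vx, vy = s
    if vx == 0 or (paddle_x - x) / vx < 0:
        return None
    t = (paddle_x - x) / vx          # time until the ball reaches the paddle line
    raw = y + vy * t                 # unreflected vertical position
    period = 2.0 * height            # bounce pattern repeats every 2 * height
    m = raw % period                 # Python's % is non-negative for a positive modulus
    return m if m <= height else period - m


def select_action(mu, paddle_y, paddle_x=1.0, height=1.0, sigma=0.05, lam=0.1, n_candidates=21):
    """Assumed one-step expected-free-energy proxy: score each candidate paddle
    position p by G(p) = (y_hit - p)^2 / sigma^2 + lam * (p - paddle_y)^2 and
    return the movement a = p - paddle_y for the lowest-scoring candidate."""
    y_hit = predicted_intercept(mu, paddle_x, height)
    if y_hit is None:
        y_hit = height / 2.0  # ball moving away: drift toward the centre (assumption)
    candidates = [height * i / (n_candidates - 1) for i in range(n_candidates)]

    def G(p: float) -> float:
        surprise = (y_hit - p) ** 2 / sigma**2   # squared miss distance at contact
        effort = lam * (p - paddle_y) ** 2       # lambda * ||a||^2 with a = p - paddle_y
        return surprise + effort

    best = min(candidates, key=G)
    return best - paddle_y
```

For example, a ball at (0.2, 0.5) moving with velocity (0.4, 0.3) bounces off the top wall and is predicted to reach the paddle line at y ≈ 0.9, so a paddle sitting at 0.5 is told to move by about +0.4. The grid search keeps the sketch simple; since this G is quadratic in p, its minimiser could also be computed in closed form as (y_hit/σ² + λ·paddle_y)/(1/σ² + λ).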