HJB Safety Demos — Kevin Weekly

Physical AI needs more than a confidence score

Frontier models are spectacular at perception, planning, and language. They're also opaque: their outputs come with probabilities, not bounds. For a chatbot, that's a UX problem. For a robot that has to decide whether to brake as a person steps into an aisle, it's a different category of problem entirely. "I'm 99.4% sure the person isn't there" is not what you want underneath two tons of moving steel.

Robust autonomy engineering — the discipline that has to sit between a learned policy and a physical actuator — makes a different bet. It accepts that the planner above it may be a black box, and asks a sharper question: given the physics of this system and what we know about the world, what is provably safe? Not "probably safe," not "safe most of the time" — provably, deterministically safe, under any admissible behavior of the things the robot doesn't control.

That guarantee has a name and a long pedigree. Isaacs framed it in the 1960s as a two-player differential game between you and the world;^[1] Mitchell, Bayen, and Tomlin gave it the modern formulation that the field is built on;^[2] Mitchell's level-set toolbox^[3] put it in everyone's hands; Bansal, Chen, Herbert, and Tomlin^[4] wrote the overview to read first. My own Berkeley dissertation leaned on it for an underactuated robotic fleet drifting in the Sacramento–San Joaquin delta, where the river current did more of the work than the boats.^[5]

The point of the demos below isn't the math. It's what the guarantee feels like once you have one. Three scenarios. Same underlying principle. Each one a piece of the safety layer you'd put under a learned planner before you let it run anything that moves.

1. The safety filter that doesn't over-stop

The most common failure mode of a learned planner in an industrial autonomous mobile robot (AMR) isn't running into someone; fleet operators won't even let one ship without a deterministic safety layer underneath. The failure mode is over-stopping. Every time a person walks past the end of an aisle, the robot freezes for two seconds, throughput drops, and the human operators stop trusting the system.

The robot below has a black-box patrol planner on top (picks, places, routes) and a deterministic safety filter underneath. The filter knows the warehouse geometry. It knows the worst case a person can physically do from where they are right now — not where they probably are, where they could be in the next two seconds under any motion they might choose. If that worst case stays clear of the robot's reachable set, the filter stays out of the way and lets the planner work. The moment it doesn't, the filter takes over and the robot follows a heading it can prove keeps it safe. Drag the person around. Flip them to Adversarial and watch what changes.
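To make the filter's decision rule concrete, here is a minimal Python sketch. It is not the demo's implementation: it ignores warehouse geometry and swaps the precomputed reachable sets for a crude disk bound, assuming the robot and the person can each move in any direction up to a top speed. The function name, the speeds, the horizon, and the margin are all illustrative assumptions, and "point directly away" is only a stand-in for a real precomputed escape heading.

```python
import math

def safety_filter(robot_xy, person_xy, planner_heading,
                  robot_speed=1.5, person_speed=2.0,
                  horizon=2.0, margin=0.5):
    """Pass the planner's heading through unless the worst case is too close.

    Hypothetical parameters: speeds in m/s, horizon in s, margin in m.
    Returns (heading_in_radians, overridden).
    """
    gap = math.dist(robot_xy, person_xy)
    # Worst case over the horizon: person and robot both close the gap
    # at full speed, whatever direction either one actually chooses.
    worst_case_closure = (robot_speed + person_speed) * horizon
    if gap - worst_case_closure > margin:
        return planner_heading, False  # filter stays out of the way
    # Stand-in for the precomputed escape heading: point directly away.
    escape = math.atan2(robot_xy[1] - person_xy[1],
                        robot_xy[0] - person_xy[0])
    return escape, True
```

With these assumed numbers, a person 10 m away leaves the planner in control, while a person 5 m away triggers the override. The disk bound is deliberately conservative; a real filter would use the actual reachable sets so that it over-stops less.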

[Interactive demo. Legend: Robot, Person.]

Red = collision-imminent. Yellow = robot avoiding (time-to-reach, TTR, < 1 s). The robot patrols pick-points along the shelf aisles; when a person crosses its safety threshold it switches to the precomputed escape heading. Drag the person anywhere, or flip them to Adversarial and let the lower-value player do its worst.

2. Knowing when a commit is justified

A learned pursuit policy might be very good on average — and have no idea when it has lost. The interesting question for any closing engagement isn't "what heading should I fly?" It's the meta-question: given the dynamics, the environment, and the adversary's freedom of action, is success still reachable from here at all? If the answer is no, you abort, hand back to a higher-level planner, and save the energy.

The blue region in the demo is exactly that answer. Anywhere inside it, success is guaranteed within the time horizon, even against an evader that plays optimally. Outside it, the guarantee is gone. The boundary is sharp, deterministic, and doesn't care what model you're using on top. The wind field deforms it; obstacles deform it; the relative speeds deform it. Try the four pursuer strategies against the scenarios: they all behave differently, but only one of them is operating with the guarantee in hand. This is the same machinery that Margellos & Lygeros^[6] and Fisac et al.^[7] formalized for reach-avoid problems, and that Bayen, Mitchell, Oishi, and Tomlin applied to the safety analysis of a real aircraft autolander.^[8]
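The commit/abort question has a closed-form answer in the simplest possible setting, which makes a useful sanity check. The Python sketch below assumes an open plane with no wind and no obstacles, and a pursuer and evader that can each turn instantly and move at constant top speed. Under those assumptions the evader's best play is to flee straight away, so capture time is the gap divided by the speed advantage. The function name and all parameters are hypothetical; the demo's region comes from solving the full problem numerically, not from this formula.

```python
import math

def capture_guaranteed(pursuer_xy, evader_xy, pursuer_speed, evader_speed,
                       capture_radius, horizon):
    """True if capture within `horizon` is guaranteed against any evader.

    Assumes simple motion in an open plane: no wind, no walls, no turn limits.
    """
    gap = math.dist(pursuer_xy, evader_xy) - capture_radius
    if gap <= 0:
        return True  # already within capture range
    closing_speed = pursuer_speed - evader_speed
    if closing_speed <= 0:
        return False  # an evader at least as fast can hold the gap open
    # Worst case: the evader runs directly away for the whole engagement.
    return gap / closing_speed <= horizon

# Commit only while the guarantee holds; otherwise hand back to the planner.
decision = "commit" if capture_guaranteed((0, 0), (20, 0), 5.0, 3.0, 1.0, 10.0) else "abort"
```

In this toy case the guaranteed region is just a disk around the pursuer. Wind, obstacles, and turn limits are what bend that disk into the shapes the demo draws, and they are also what make a numerical solver necessary.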

[Interactive demo. Legend: Pursuer, Evader. Controls: Scenario, Wind.]

The blue region is the capture-guaranteed backward reachable set (BRS): anywhere inside it, the HJB-optimal pursuer can force interception within the 10 s horizon, even against an optimal evader. Compare strategies: pure pursuit hugs the line-of-sight and gets out-cornered; proportional navigation is great against straight-line targets but ignores walls and wind; HJB exploits both. Drag the evader to set up your own scenario.

3. Knowing where your coverage actually is

Scale up. A network of defenders. A single fast threat that's locally faster than any of them. Which one scrambles? A learned dispatcher would learn something, and probably be right most of the time. The harder question is the inverse: where on the map can none of you reach in time? Those are the gaps, and you want them on a screen before the alarm goes off, not after.

The colored cells in the demo are each defender's capture basin under the wind field. The boundaries don't follow geometry; they follow the physics. A defender launched from upwind into a jet stream reaches further than a closer defender launched into a headwind. The dark cells are the coverage holes — threats placed there cannot be reached by any of the six, full stop. This is what deterministic coverage analysis looks like. You can build a black-box dispatcher on top of it; you cannot replace it with one.^[9]
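A rough way to see how wind bends the basins is to compute minimum arrival times on a grid. The reference list names a fast-marching Eikonal solver for this demo; the Python sketch below swaps in plain Dijkstra on an 8-connected grid, which is simpler but only approximates the true travel times because paths are limited to eight directions. It also assumes a fixed deadline per cell rather than a moving threat. The function names, the `wind_at` callback, and all parameters are hypothetical.

```python
import heapq
import math

def ground_speed(airspeed, wind, ux, uy):
    """Speed made good along unit direction (ux, uy) while crabbing into wind."""
    along = wind[0] * ux + wind[1] * uy
    cross = wind[0] * uy - wind[1] * ux
    if abs(cross) > airspeed:
        return 0.0  # crosswind too strong to hold this track
    return max(0.0, along + math.sqrt(airspeed ** 2 - cross ** 2))

def arrival_times(start, n, cell, airspeed, wind_at):
    """Dijkstra over an n x n grid; wind_at(i, j) returns (wx, wy)."""
    times = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        t, (i, j) = heapq.heappop(heap)
        if t > times.get((i, j), math.inf):
            continue  # stale queue entry
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (di, dj) == (0, 0) or not (0 <= ni < n and 0 <= nj < n):
                    continue
                step = math.hypot(di, dj)
                speed = ground_speed(airspeed, wind_at(i, j), di / step, dj / step)
                if speed <= 0.0:
                    continue
                nt = t + step * cell / speed
                if nt < times.get((ni, nj), math.inf):
                    times[(ni, nj)] = nt
                    heapq.heappush(heap, (nt, (ni, nj)))
    return times

def coverage_map(sites, n, cell, airspeed, wind_at, deadline):
    """Map each cell to the index of its fastest site, or None for a hole."""
    per_site = [arrival_times(s, n, cell, airspeed, wind_at) for s in sites]
    owner = {}
    for i in range(n):
        for j in range(n):
            best = min(range(len(sites)),
                       key=lambda k: per_site[k].get((i, j), math.inf))
            t = per_site[best].get((i, j), math.inf)
            owner[(i, j)] = best if t <= deadline else None
    return owner
```

Calling `coverage_map([(5, 5)], 11, 1.0, 1.0, lambda i, j: (0.5, 0.0), 4.0)` with a uniform tailwind along +x stretches the single site's basin downwind and leaves the upwind edge of the grid as a hole, which is the same qualitative effect the demo shows with its jet stream.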

[Interactive demo. Controls: View, Scenario.]

Six interceptor sites (130 km/h) defend against a faster cruise missile (185 km/h). The colored cells are each pursuer's capture basin under the wind field; dark cells are positions no defender can reach in time. Wind is the equalizer — read where the jet stream and vortex bend the basins. Click to place the threat, drag to set heading, then Launch.

The shape of robust autonomy

None of this replaces a learned policy. It doesn't try to. The pitch is the opposite — that the more capable the model on top, the more carefully you have to engineer what sits underneath it. A neural planner can be very smart and still wrong; the layer below has to be the one that's never wrong about what's physically possible. That's the part that scales. That's the part that certifies. That's the part you keep when the model version changes.

Three demos, three faces of the same idea: a safety filter that doesn't over-stop, a commit/abort criterion that doesn't guess, a coverage map that doesn't lie. The interesting frontier in physical AI isn't replacing this layer with a bigger network. It's building it well enough that you can let a bigger network sit on top of it.

If you're working on safe autonomy for physical systems (embodied agents, robotic fleets, aerospace) and want to compare notes, or want a version of one of these built for your own dynamics, reach me at nerd256@gmail.com.

References & further reading

[1] R. Isaacs. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. Wiley, 1965. The original framing of pursuit-evasion as a two-player optimal-control problem.
[2] I. M. Mitchell, A. M. Bayen, C. J. Tomlin. A time-dependent Hamilton–Jacobi formulation of reachable sets for continuous dynamic games. IEEE Transactions on Automatic Control, 50(7):947–957, 2005. The formulation the field is built on.
[3] I. M. Mitchell. A toolbox of level set methods (version 1.1). Tech. Rep. TR-2007-11, Dept. of Computer Science, Univ. of British Columbia, 2007. The ToolboxLS implementation that put HJ reachability in everyone's hands.
[4] S. Bansal, M. Chen, S. Herbert, C. J. Tomlin. Hamilton–Jacobi reachability: A brief overview and recent advances. IEEE Conference on Decision and Control (CDC), 2017. The current best overview; start here.
[5] K. Weekly, A. M. Bayen, et al. Autonomous river navigation using the Hamilton–Jacobi framework for underactuated vehicles. IEEE Transactions on Robotics, 30(5):1250–1255, 2014. My dissertation work: HJ reach sets for a robotic floating-sensor fleet in the Sacramento–San Joaquin delta.
[6] K. Margellos, J. Lygeros. Hamilton–Jacobi formulation for reach–avoid differential games. IEEE Transactions on Automatic Control, 56(8):1849–1861, 2011. The machinery behind "guaranteed capture / guaranteed survival."
[7] J. F. Fisac, M. Chen, C. J. Tomlin, S. S. Sastry. Reach-avoid problems with time-varying dynamics, targets and constraints. Proc. 18th International Conference on Hybrid Systems: Computation and Control (HSCC), 2015.
[8] A. M. Bayen, I. M. Mitchell, M. M. K. Oishi, C. J. Tomlin. Aircraft autolander safety analysis through optimal control-based reach set computation. AIAA Journal of Guidance, Control, and Dynamics, 30(1):68–77, 2007. The "real world" instance of what the drone demo is a toy version of.
[9] J. A. Sethian. A fast marching level set method for monotonically advancing fronts. Proceedings of the National Academy of Sciences, 93(4):1591–1595, 1996. The Eikonal solver inside the interceptor-network demo.
