Platform
A marathoner runs a sprinter's pace.
They just keep going.
A lot of hard work stalls for one reason, and it is rarely raw intelligence. It stalls because the system loses coherence before it can finish. The model is smart enough at any single step; it just loses track of some of a thousand steps of its own decisions. FROS takes that load off the model and puts it somewhere that remembers. The intelligence stays fixed. The distance it can carry that intelligence stops being the limit.
The mechanism
One loop, five beats
Commit, check, repair, admit, resume. The agent works above; the record holds below. The fifth beat kills the session mid-task, and the record holds fast. Click a chapter to jump straight to it.
The distinction everyone collapses
“Long context” and “consistent over long context” are different products
A larger window is about how much a model can take in. Coherence is about whether it stays true to what it already concluded while it keeps going. A two-million-token window reads more and still drifts. The tools people reach for solve the first problem and leave the second one open.
Recalls the past, stops there
Memory tools store what happened and fetch it back later. Useful, and a different job from making sure the next step agrees with it. They remember; they leave the model free to ignore it.
Judges after the fact
Observability and eval tools tell you something drifted once it already has, usually by asking another model to grade the output. That is a measurement, taken too late to stop the error.
Orders the flow, skips the truth
Agent frameworks route steps and tools in the right order. They keep the process moving. Step forty goes unchecked against what step three established.
Catches it before it lands
A check that runs on every step, gives the same answer every time, and is sound about what follows from what. Preventive, deterministic, and consistent. That combination is the gap the other three leave.
Why models drift
Drift is a property of the path, not the step
Any single check, on a clean snapshot, a capable model does well. The trouble shows up over a trajectory, where state accumulates faster than the model can keep faith with it.
State accumulates
Step by step, the model commits to more: definitions, constraints, partial results. The pile of things it must stay consistent with grows the whole time, while the window that holds it stays the same size.
The checker drifts too
The obvious fix, have the model review its own work, runs into the same wall. The reviewer is the same kind of system, carrying the same overloaded state. It can confidently bless a contradiction that has slipped out of its view.
One thing in the loop holds steady
The engine holds the committed state outside the model and checks against all of it, the same way, regardless of how long the task has run. It is the one part of the loop whose accuracy holds as the trajectory gets longer.
Why a model actually uses it
It returns the fix along with the verdict
A guard that only blocks
A checker that rejects and stops there becomes something the model learns to route around. It adds friction and hands back a dead end, so under pressure the system finds a path that skips it.
A tool that answers
The model can ask a different question: I need this to be true, what target keeps everything else consistent? The engine returns the smallest change that satisfies it. That is a thing worth calling on every step, because it moves the work forward as it enforces.
Evidence
Proven in the hardest domains
We built the engine against domains where a wrong call costs something and the right call has to be defensible: resolving what a word means in context (94.5% on the standard benchmark, more than 12 points clear of every published neural system), and checking patient findings against 30,981 diagnoses with a written reason for each one ruled out. Both results are the engine on its own, zero learned parameters in the part that renders the verdict.
Where we draw the line
Two honest limits
Consistent can still be wrong
The guarantee is narrow: the model stays true to what it has committed to. Whether those commitments were good ones is a separate question. Feed it wrong assumptions and you get work that is wrong and perfectly coherent. FROS removes incoherence and leaves judgment to you.
Unbounded state, bounded reasoning
What offloads to the engine is the committed record, the part that can be made precise. The in-the-moment reasoning still lives in the model and is as bounded as it ever was. We extend coherence length. The ceiling on per-step intelligence stays where it is.