A marathoner runs a sprinter's pace.
They just keep going.

A lot of hard work stalls for one reason, and it is rarely raw intelligence. It stalls because the system loses coherence before it can finish. The model is smart enough at any single step; across a thousand steps, it loses track of some of its own decisions. FROS takes that load off the model and puts it somewhere that remembers. The intelligence stays fixed. How far that intelligence can be carried stops being the limit.

One loop, five beats

Commit, check, repair, admit, resume. The agent works above; the record holds below. The fifth beat kills the session mid-task, and the record holds fast.
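The five beats can be sketched in a few lines. This is an illustration only, not FROS itself: the `Record` class, the `beat` function, and the JSON-lines file are all hypothetical names and choices made for the example. "Repair" here is the crudest version possible, falling back to the earlier commitment. What the sketch shows is that a fact admitted to disk outlives the session that wrote it.

```python
import json
import os
import tempfile


class Record:
    """Hypothetical append-only record, one JSON object per line on disk."""

    def __init__(self, path):
        self.path = path

    def load(self):
        """Rebuild the committed facts from disk; an absent file means none yet."""
        facts = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    entry = json.loads(line)
                    facts[entry["key"]] = entry["value"]
        return facts

    def admit(self, key, value):
        """Append one fact and force it to disk before returning."""
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())


def beat(record, key, proposed):
    """Run one pass of the loop and return the value the agent should use."""
    facts = record.load()                      # resume: state comes from the record
    value = proposed                           # commit: the agent proposes a fact
    if key in facts and facts[key] != value:   # check: compare with what is on record
        value = facts[key]                     # repair: keep the earlier commitment
    if key not in facts:
        record.admit(key, value)               # admit: persist the new fact
    return value


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "record.jsonl")
    first = beat(Record(path), "unit", "meters")   # session one commits a unit
    # Session one is killed here; nothing held in memory survives.
    second = beat(Record(path), "unit", "feet")    # session two proposes a conflict
    survived = Record(path).load()

print(first, second, survived)  # meters meters {'unit': 'meters'}
```

The second session starts with no memory of the first, yet it still ends up using "meters", because the commitment lives in the record rather than in the session.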

“Long context” and “consistent over long context” are different products

A larger window is about how much a model can take in. Coherence is about whether it stays true to what it already concluded while it keeps going. A two-million-token window reads more and still drifts. The tools people reach for solve the first problem and leave the second one open.

Memory

Recalls the past, stops there

Memory tools store what happened and fetch it back later. Useful, and a different job from making sure the next step agrees with it. They remember; they leave the model free to ignore it.

Eval

Judges after the fact

Observability and eval tools tell you something drifted once it already has, usually by asking another model to grade the output. That is a measurement, taken too late to stop the error.

Orchestration

Orders the flow, skips the truth

Agent frameworks route steps and tools in the right order. They keep the process moving. Step forty goes unchecked against what step three established.

FROS

Catches it before it lands

A check that runs on every step, gives the same answer every time, and is sound about what follows from what. Preventive, deterministic, and sound. That combination is the gap the other three leave.

Drift is a property of the path, not the step

A capable model handles any single check well when it is given a clean snapshot. The trouble shows up over a trajectory, where state accumulates faster than the model can keep faith with it.

01

State accumulates

Step by step, the model commits to more: definitions, constraints, partial results. The pile of things it must stay consistent with grows the whole time, while the window that holds it stays the same size.

02

The checker drifts too

The obvious fix, have the model review its own work, runs into the same wall. The reviewer is the same kind of system, carrying the same overloaded state. It can confidently bless a contradiction that has slipped out of its view.

03

One thing in the loop holds steady

The engine holds the committed state outside the model and checks against all of it, the same way, regardless of how long the task has run. It is the one part of the loop whose accuracy holds as the trajectory gets longer.
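A toy version of that idea, assuming commitments can be written as numeric ranges on named variables. That assumption is ours, made only to keep the example small; the `Bound` and `Ledger` names are hypothetical, and nothing here describes how the engine represents state. The point is that the check is plain code over the full record, so step forty is compared with step three exactly as it would be with step thirty-nine.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Bound:
    step: int      # step at which the commitment was made
    var: str       # variable the commitment is about
    low: float     # inclusive lower limit
    high: float    # inclusive upper limit


@dataclass
class Ledger:
    """Hypothetical store of committed bounds, held outside the model."""
    bounds: list = field(default_factory=list)

    def check(self, new):
        """Return every earlier bound on the same variable that new cannot overlap."""
        return [b for b in self.bounds
                if b.var == new.var and (new.low > b.high or new.high < b.low)]

    def admit(self, new):
        """Record new only if it contradicts nothing; return the conflicts found."""
        conflicts = self.check(new)
        if not conflicts:
            self.bounds.append(new)
        return conflicts


ledger = Ledger()
ledger.admit(Bound(3, "budget", 0, 100))            # step 3 fixes the budget range
for step in range(4, 40):                           # 36 unrelated commitments pile up
    ledger.admit(Bound(step, f"var_{step}", 0, 1))
conflicts = ledger.admit(Bound(40, "budget", 150, 200))
print([b.step for b in conflicts])  # [3]
```

The same inputs always give the same verdict, and the rejected bound never enters the ledger.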

It returns the fix along with the verdict

A guard that only blocks

A checker that rejects and stops there becomes something the model learns to route around. It adds friction and hands back a dead end, so under pressure the system finds a path that skips it.

A tool that answers

The model can ask a different question: I need this to be true, what target keeps everything else consistent? The engine returns the smallest change that satisfies it. That is a thing worth calling on every step, because it moves the work forward as it enforces.
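One way to picture "verdict plus fix", again under the simplifying assumption that commitments are numeric ranges. The `repair` function and its return shape are hypothetical, chosen for illustration. Given the value the model wants and the ranges already committed, it returns whether the value is acceptable and, if not, the nearest value that is.

```python
def repair(desired, bounds):
    """Return (ok, value): ok says whether desired already fits every bound;
    value is the closest number to desired that does."""
    low = max((lo for lo, _ in bounds), default=float("-inf"))
    high = min((hi for _, hi in bounds), default=float("inf"))
    if low > high:
        # The sketch cannot repair a record that already contradicts itself.
        raise ValueError("committed bounds are already inconsistent")
    fixed = min(max(desired, low), high)   # clamp into the feasible range
    return fixed == desired, fixed


committed = [(0, 100), (50, 200)]          # together these allow 50 to 100
print(repair(120, committed))  # (False, 100)
print(repair(70, committed))   # (True, 70)
print(repair(10, committed))   # (False, 50)
```

A bare rejection of 120 would leave the model guessing. Returning 100 tells it the smallest move that keeps everything else intact.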

Proven in the hardest domains

We built the engine against domains where a wrong call costs something and the right call has to be defensible: resolving what a word means in context (94.5% on the standard benchmark, more than 12 points clear of every published neural system), and checking patient findings against 30,981 diagnoses with a written reason for each one ruled out. Both results are the engine on its own, zero learned parameters in the part that renders the verdict.

Two honest limits

01

Consistent can still be wrong

The guarantee is narrow: the model stays true to what it has committed to. Whether those commitments were good ones is a separate question. Feed it wrong assumptions and you get work that is wrong and perfectly coherent. FROS removes incoherence and leaves judgment to you.

02

Unbounded state, bounded reasoning

What offloads to the engine is the committed record, the part that can be made precise. The in-the-moment reasoning still lives in the model and is as bounded as it ever was. We extend coherence length. The ceiling on per-step intelligence stays where it is.

Operator-led pilots running now

Get pilot access Email Daniel