Mathematically proven safety
FR-OS is a formally verified evaluation engine. It checks any structured input against mathematically proven rules and returns a definitive verdict. The input could be an AI response, a document, a contract clause, a medical record, or a transaction. The same guarantees hold at every scale. Shellfinity's first application: LLM governance. Every AI response verified against your policies before it reaches your users.
See it in action
A patient describes crushing chest pain, sweating, and nausea. FR-OS identifies the clinical findings, verifies 30,981 diagnoses against formally proven invariants, and returns a ranked differential with exclusion certificates. No physician input required.
Benchmark results
FR-OS resolves which meaning a word carries in context. Tested on the standard Raganato ALL benchmark (7,253 instances across 5 datasets), the engine exceeds published state of the art with zero learned parameters, zero training data, and fully deterministic evaluation.
Full WSD Report
| System | Avg F1 | Parameters | Training |
|---|---|---|---|
| FR-OS | 88.4% | 0 | None |
| DeBERTa (fine-tuned) | ~82% | 350M | SemCor |
| BEM (BERT) | ~80% | 340M | SemCor |
| GPT-4 (few-shot) | ~80% | ~1.8T | Pretraining |
| Most Frequent Sense | ~65% | 0 | Frequency counts |

| Dataset | Instances | F1 |
|---|---|---|
| Senseval-2 | 2,282 | 88.9% |
| Senseval-3 | 1,850 | 85.5% |
| SemEval-2007 | 455 | 90.3% |
| SemEval-2013 | 1,644 | 89.2% |
| SemEval-2015 | 1,022 | 90.7% |
| Average | 7,253 | 88.4% |
Across Senseval-2, Senseval-3, SemEval-2007, SemEval-2013, SemEval-2015
When the engine commits to a verdict rather than abstaining, that verdict is correct more than 95% of the time
No neural network. No training. Deterministic evaluation with self-correcting data.
The problem
Most AI guardrails use another AI model to judge the output. That second model has its own blind spots, its own failure modes, and returns vague confidence scores instead of clear answers. "The filter probably caught it" isn't good enough.
FR-OS checks AI output against your rules using mathematically proven logic. You get a clear yes/no verdict, plus a detailed report showing exactly what violated your policy and how to fix it.
How it works
Define policies in plain English: "block harmful content", "limit sensitive topics to 3", "require safety disclaimers". FR-OS compiles them into formally verified rules that evaluate the same way every time.
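FR-OS's real compiled-rule format is not public; as a purely illustrative sketch, the three example policies above might compile down to structured, machine-checkable rules along these lines (all names and fields here are hypothetical):

```python
# Hypothetical illustration only: FR-OS's actual rule format is not shown here.
# Each plain-English policy becomes a structured rule a checker can enforce.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CompiledRule:
    policy: str                         # original plain-English policy text
    category: str                       # content category the rule constrains
    min_mentions: int = 0               # 1 = "require" (e.g. a disclaimer)
    max_mentions: Optional[int] = None  # 0 = block outright; None = no upper bound

POLICIES = [
    CompiledRule("block harmful content", "harmful", max_mentions=0),
    CompiledRule("limit sensitive topics to 3", "sensitive", max_mentions=3),
    CompiledRule("require safety disclaimers", "disclaimer", min_mentions=1),
]
```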
Your AI model produces output without restriction: no prompt-engineering workarounds, no quality trade-offs, no interference with what the model does best.
FR-OS evaluates the output against your rules, then returns "pass" or "fail" with a report naming exactly what was flagged and what to fix. Deterministic, consistent, and final.
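The pass/fail-plus-report interface can be sketched with a toy checker. The real FR-OS engine is formally verified; this sketch only mirrors the shape of the result, using a hypothetical rule representation of `(policy_name, category, max_allowed)` tuples:

```python
# Toy sketch of the evaluate step: deterministic, rule-by-rule, returning a
# verdict plus a report naming every violation. Not the real FR-OS engine.
def evaluate(output_tags, rules):
    """
    output_tags: content-category labels detected in the AI response
    rules: list of (policy_name, category, max_allowed) tuples
    Returns ("pass" | "fail", list of violation messages).
    """
    violations = []
    for name, category, max_allowed in rules:
        count = output_tags.count(category)
        if count > max_allowed:
            violations.append(
                f"{name}: {count} '{category}' mention(s), limit {max_allowed}"
            )
    return ("pass" if not violations else "fail", violations)

rules = [
    ("block harmful content", "harmful", 0),
    ("limit sensitive topics to 3", "sensitive", 3),
]
verdict, report = evaluate(["sensitive", "sensitive", "harmful"], rules)
# verdict == "fail"; report contains one violation, for "block harmful content"
```

Because the checker is a pure function of the output and the rules, the same input always yields the same verdict, which is the property the surrounding text emphasizes.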
Why Shellfinity
Other tools return confidence percentages you have to interpret. FR-OS returns a definitive yes or no, with a detailed report you can audit and act on.
Keyword lists are brittle and miss context. FR-OS policies understand categories and relationships: block one term and related terms are covered automatically.
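How category coverage can subsume keyword lists is easy to see in miniature. This sketch assumes a tiny hand-written taxonomy (FR-OS's actual ontology is not public): blocking one category transitively covers everything beneath it.

```python
# Hypothetical sketch of category-based coverage. The taxonomy below is
# invented for illustration; FR-OS's real ontology is not shown here.
TAXONOMY = {
    "weapon": {"firearm", "explosive"},
    "firearm": {"pistol", "rifle"},
}

def covered_terms(category):
    """All terms reachable from a blocked category, transitively."""
    seen, stack = set(), [category]
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(TAXONOMY.get(term, ()))
    return seen

# Blocking "weapon" automatically covers pistols and rifles too,
# with no per-term keyword list to maintain.
```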
System prompts are instructions the AI can ignore or be tricked into bypassing. FR-OS verifies output after generation, not during, so there is nothing to jailbreak.
FR-OS is built on mathematical proofs verified by machine. No matter how you run it, you get the same verdict. A proof, every time.
Early access
Be the first to know when FR-OS launches. We'll notify you when API access is available.