RunReceipts
For people who run coding agents unattended or in parallel

Your agent said “done, all tests passing.” Was it?

You kicked off the runs and went to bed. The morning ritual is scrolling transcripts and hoping nothing broke. RunReceipts replaces the hope: every run leaves a receipt (the diff and the checks that actually executed, exit codes included), and the receipt gets graded before the work counts.

or see the $29 one-time pass »

Built on 12 years of refusing self-reports in threat intel, at LinkedIn, Google, Meta & ElevenLabs.

A system reporting on itself is not evidence. That rule was the day job. Now it reads your agents' homework.

Free · no account · deterministic

The agent doesn't grade its own homework. This does.

Paste a run receipt, or point at a public GitHub PR, and get the structural grade in milliseconds: what changed, what ran, what passed. Same receipt in, same grade out, forever.

Runs a pure function. Nothing is stored, nothing hits a model.

Reading transcripts is not verification.

One agent lying politely is catchable by eye. Eight in parallel are not. Most of what people call verification is scrolling until you feel OK: vibes with a progress bar. The transcript is the agent's story about what it did. The diff is what it did.

$29 one-time · not a subscription
3 free hosted verdicts to start
Exit codes: evidence, not prose
Fresh context: the verifier never sees the transcript

The grade

Every run leaves a receipt. The receipt gets graded.

Deterministic structural checks, every flag citing a principle. A success claim with no passing check that plausibly exercises it gets flagged the moment it lands, not three days later in production.

run-2041 · receipt
npm test :: 0
tsc --noEmit :: 0
npx eslint src/ :: (no exit code)
diff: 3 files · +84 / -12
grade: C

A check without an exit code is a claim that a check ran, not evidence that it did.
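
That rule can be applied mechanically. Here is a minimal sketch in Python, assuming each check sits on its own line as `command :: exit code`, as in the run-2041 receipt. The names `Check`, `parse_checks`, and `flag_checks` are hypothetical; this illustrates the idea and is not the RunReceipts grader itself, and it does not try to reproduce the letter grade.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Check:
    command: str
    exit_code: Optional[int]  # None means the receipt recorded no exit code


def parse_checks(receipt: str) -> list[Check]:
    """Parse 'command :: exit_code' lines; other lines (e.g. 'diff: ...') are skipped."""
    checks = []
    for line in receipt.splitlines():
        if " :: " not in line:
            continue
        command, _, raw = line.rpartition(" :: ")
        try:
            code: Optional[int] = int(raw.strip())
        except ValueError:
            # Anything that is not an integer counts as a missing exit code.
            code = None
        checks.append(Check(command.strip(), code))
    return checks


def flag_checks(checks: list[Check]) -> list[str]:
    """Pure function: same checks in, same flags out. No I/O, clock, or randomness."""
    flags = []
    for check in checks:
        if check.exit_code is None:
            flags.append(f"{check.command}: no exit code, so this is a claim, not evidence")
        elif check.exit_code != 0:
            flags.append(f"{check.command}: exited {check.exit_code}")
    return flags


RECEIPT = """\
npm test :: 0
tsc --noEmit :: 0
npx eslint src/ :: (no exit code)
diff: 3 files · +84 / -12
"""

for flag in flag_checks(parse_checks(RECEIPT)):
    print(flag)
```

Because `flag_checks` depends only on its argument, running it twice on the same receipt yields the same flags, which is the property the free grader advertises.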

Fresh context

The verifier never reads the transcript.

A second pass judges the receipt against your original spec, and only that. An agent reading its own transcript inherits its own optimism. This one never gets the chance.
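
A minimal sketch of what "fresh context" could look like in code, assuming the second pass is handed a single text prompt. `VerifierInput` and `build_prompt` are hypothetical names, and the prompt wording is illustrative rather than what the hosted verifier actually sends. The point is structural: the input type has no field for a transcript, so none can leak in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifierInput:
    """Everything the second pass is allowed to see. There is no transcript field."""
    spec: str
    receipt: str


def build_prompt(item: VerifierInput) -> str:
    """Assemble the verifier's entire context from the spec and the receipt only."""
    return (
        "Judge whether the receipt evidences the spec. "
        "Use only the two sections below.\n\n"
        f"## Spec\n{item.spec.strip()}\n\n"
        f"## Receipt\n{item.receipt.strip()}\n"
    )


item = VerifierInput(
    spec="401 on stale cookie, test proves it",
    receipt="npm test :: 0\ndiff: 3 files · +84 / -12",
)
print(build_prompt(item))
```

Trying to pass `transcript=...` to `VerifierInput` raises a `TypeError`, so the exclusion is enforced by the type rather than by convention.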

Fresh-context verdict

“The diff matches the spec's error-path requirement, and the passing tests exercise it. But the claim says ‘all tasks’ and the receipt only evidences one. Ask the agent for receipts on the other three.”

spec: matched · claim: overreaches

The side effect

Your task specs get sharper.

When every run needs a machine-checkable definition of done, you find out fast which of your tasks were actually vibes. Finding that out before the overnight run, not after, is a feature.

Before the run, not after
  • “Refactor auth to be cleaner”: vibes
  • “401 on stale cookie, test proves it”: checkable
  • “Improve error handling”: vibes

The hosted verifier

Stop merging on hope.

The free grader checks structure. The pass adds judgment: a fresh-context verifier reads the receipt against your spec and tells you, in plain text, whether the run earned its “done” and what to demand from the agent when it didn't.

One-time
cheaper than one silent failure reaching main
$29 one-time
  • 300 fresh-context verdicts over 30 days
  • The verifier judges receipt vs spec, never the transcript
  • Grounded in the deterministic grade, cited flag by flag
  • Follow-up questions on any verdict
  • The free structural grader stays free, forever
  • One-time payment · no subscription to cancel
or start with 3 free verdicts »

Sign in, run 3 verdicts free, pay only if the verdicts earn it.

The hosted verifier

Run the fresh-context pass.

Fail closed, sleep fine

Kill the morning transcript scroll.

Grade a receipt free right now, no account. When you want the verdicts, it's $29 once. No receipt, no pass: a run that can't show evidence fails closed and gets redone.
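
"Fails closed" can be made concrete with a small gate. This is a sketch under assumptions: the receipt uses `command :: exit code` lines, and `gate` is a hypothetical name, not part of RunReceipts. The default outcome is "redo"; a run passes only when there is positive evidence.

```python
from typing import Optional


def gate(receipt: Optional[str]) -> str:
    """Return 'pass' only on positive evidence; every other outcome is 'redo'."""
    if not receipt or not receipt.strip():
        return "redo"  # no receipt, no pass
    saw_check = False
    for line in receipt.splitlines():
        if " :: " not in line:
            continue  # not a check line (e.g. the diff summary)
        saw_check = True
        _, _, raw = line.rpartition(" :: ")
        try:
            if int(raw.strip()) != 0:
                return "redo"  # a check ran and failed
        except ValueError:
            return "redo"  # no exit code recorded: a claim, not evidence
    # A receipt that lists no checks at all is also not evidence.
    return "pass" if saw_check else "redo"


print(gate(None))
print(gate("npm test :: 0\ntsc --noEmit :: 0"))
print(gate("npm test :: 0\nnpx eslint src/ :: (no exit code)"))
```

The design choice is that every ambiguous case (missing receipt, unparseable exit code, no checks listed) lands on "redo", so a bug in the receipt pipeline cannot turn into a silent pass.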