Every run leaves a receipt. The receipt gets graded.
Deterministic structural checks, every flag citing a principle. A success claim with no passing check that plausibly exercises it gets flagged the moment it lands, not three days later in production.
A check without an exit code is a claim that a check ran, not evidence that it did.