What is measured?
TrustableClaw tests AI agents not only for output quality but also for governability and auditability: whether an agent's work produces receipts, whether those receipts verify, and whether tampering with them is detected.
These benchmarks show whether AI work can be recorded, verified, and later proven to be unaltered.
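To make the three measured properties concrete, here is a minimal sketch of a receipt that can be generated, verified, and checked for tampering. It is an illustration only: the receipt fields, the `make_receipt` and `verify_receipt` helpers, and the use of an HMAC-SHA256 signature are assumptions for this example, not TrustableClaw's actual receipt format or API.

```python
import hashlib
import hmac
import json

# Hypothetical signing key for the sketch; a real system would load it from a secret store.
SECRET_KEY = b"demo-key-not-for-production"


def _sign(body: dict) -> str:
    """Sign a canonical JSON encoding of the receipt body with HMAC-SHA256."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def make_receipt(task_id: str, output: str) -> dict:
    """Build a receipt that binds a task ID to a hash of the agent's output."""
    body = {
        "task_id": task_id,
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
    }
    return {**body, "signature": _sign(body)}


def verify_receipt(receipt: dict, output: str) -> bool:
    """Return True only if the signature is valid and the output matches the receipt."""
    body = {
        "task_id": receipt["task_id"],
        "output_sha256": receipt["output_sha256"],
    }
    signature_ok = hmac.compare_digest(_sign(body), receipt["signature"])
    output_hash = hashlib.sha256(output.encode("utf-8")).hexdigest()
    return signature_ok and output_hash == receipt["output_sha256"]


if __name__ == "__main__":
    receipt = make_receipt("task-001", "def add(a, b): return a + b")
    print(verify_receipt(receipt, "def add(a, b): return a + b"))  # untouched output
    print(verify_receipt(receipt, "def add(a, b): return a - b"))  # altered output
```

In this sketch, changing either the output or any signed field of the receipt makes verification fail, which is the behavior a tamper-detection test would look for.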
TrustableClaw reports 100% auditability coverage across OpenAI's full HumanEval benchmark: for all 164 standardized test cases, it generated a receipt, verified it, and detected tamper attempts, with no receipt-verification failures.
Does TrustableClaw record and verify AI failures, not just successes?
To find out, we ran a 20-task SWE-bench Lite pilot in which GPT-5.4 mini failed every task. Every failure was fully recorded, cryptographically receipted, and tamper-verified, with no auditability gaps. Governance that only works when the AI succeeds is not governance at all.
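As a rough illustration of what recording failures can look like, the sketch below appends every task result, pass or fail, to a hash-chained log and then computes coverage as the fraction of expected tasks that have an entry. The log layout, the helper names (`append_result`, `verify_log`, `audit_coverage`), and the coverage definition are all assumptions made for this example; the pilot's actual receipt mechanism is not described here.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(body: dict) -> str:
    """Hash a canonical JSON encoding of an entry body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_result(log: list, task_id: str, passed: bool, detail: str) -> None:
    """Append a result to the log, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = {"task_id": task_id, "passed": passed, "detail": detail, "prev_hash": prev_hash}
    log.append({**body, "hash": entry_hash(body)})


def verify_log(log: list) -> bool:
    """Recompute every hash and check that each entry links to the one before it."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev_hash or entry_hash(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


def audit_coverage(log: list, expected_task_ids: list) -> float:
    """Fraction of expected tasks with a log entry, regardless of pass or fail."""
    recorded = {entry["task_id"] for entry in log}
    return len(recorded & set(expected_task_ids)) / len(expected_task_ids)


if __name__ == "__main__":
    task_ids = [f"task-{i:02d}" for i in range(20)]
    log = []
    for task_id in task_ids:
        # Every task fails in this toy run, yet each one still gets an entry.
        append_result(log, task_id, passed=False, detail="tests did not pass")
    print(verify_log(log), audit_coverage(log, task_ids))
```

The point of the design is that coverage depends on whether a result was recorded and verifiable, not on whether the task succeeded, so a run with a 0% pass rate can still reach full coverage under this definition.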
This section reviews the technical claims in the Rational AI vs. LLM infographic. Each claim is evaluated against observed TrustableClaw architecture and runtime behavior, with verdicts showing which claims are backed, nuanced, or unsupported.
