Trust & privacy

Review agent updates without exposing what's private.

Verifiable Labs is designed for security review — redacted evidence, approval-gated exports, and private evaluation boundaries by default.

Book a demo Watch demo

0: Customer data in results
100%: Runs redacted by default
Approval: Required before any export
Every run: Produces an evidence record

The principle

Private by default. Evidence when needed.

Security review shouldn't mean handing over your data. Verifiable Labs reviews agent updates without exposing customer data, private evals, hidden cases, gold answers, raw traces, or secrets.

Redacted evidence

A reviewable record that reveals scores, not secrets

Every run produces a Generalization Card built for review — the decision, the machine reasons, and the per-suite deltas. The sensitive material that produced them never appears.

Decision, reasons, and per-suite score deltas
Engine verdicts and the policy that was applied
Redacted by default on every run

Generalization Card

decision: BLOCK
reasons: ood_regressed · hidden_regressed
public: 0.740 → 0.910
hidden: 0.732 → 0.611
ood: 0.701 → 0.488
record: redacted · reviewable

✗ candidate not promoted
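
For readers who want to see how a card like this could be derived, here is a minimal sketch in Python. It is not Verifiable Labs' gate: the `SuiteDelta` and `gate` names are hypothetical, and the rule "any suite scoring below its baseline blocks the candidate" is an assumption chosen to reproduce the sample card. The "limit" outcome is left out because this page does not say when it applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteDelta:
    """Baseline and candidate score for one evaluation suite (hypothetical type)."""
    suite: str
    baseline: float
    candidate: float

    @property
    def regressed(self) -> bool:
        # Assumption: any drop below the baseline counts as a regression.
        return self.candidate < self.baseline


def gate(deltas: list[SuiteDelta]) -> dict:
    """Return a decision plus machine-readable reasons, using scores only."""
    reasons = [f"{d.suite}_regressed" for d in deltas if d.regressed]
    return {"decision": "BLOCK" if reasons else "SHIP", "reasons": reasons}


# Scores taken from the sample card above.
card = gate([
    SuiteDelta("public", 0.740, 0.910),
    SuiteDelta("hidden", 0.732, 0.611),
    SuiteDelta("ood", 0.701, 0.488),
])
print(card)  # {'decision': 'BLOCK', 'reasons': ['hidden_regressed', 'ood_regressed']}
```

The sketch works only on per-suite scores, which is the point of the card: the decision can be reviewed without access to the cases or answers that produced those scores.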

Private boundaries

Hidden cases stay sealed

The baseline, hidden cases, gold answers, and raw traces stay inside the evaluation boundary. Candidates run against them, but only the decision, scores, and verdicts cross back out. The material itself never appears in a result.

No customer data in results or demos
Hidden cases and gold answers never exposed
Managed or bring-your-own-key / self-hosted routing

Approval-gated exports

Nothing leaves the workspace without sign-off

An evidence record can leave the workspace only after an export is approved, and every export keeps a full trail of who requested, approved, and downloaded it.

Request → approve → export, with an audit trail
Role-based approval controls
Audit-ready records for security and compliance
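
The request → approve → export flow can be pictured as a small state machine. The Python sketch below is illustrative only: the `ExportRequest` class, the `export_approver` role name, and the shape of the trail entries are assumptions, not the product's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExportRequest:
    """Hypothetical export request that records every step in an audit trail."""
    record_id: str
    requested_by: str
    state: str = "requested"
    trail: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self._log("requested", self.requested_by)

    def _log(self, action: str, actor: str) -> None:
        # Each entry: (UTC timestamp, action, actor).
        self.trail.append((datetime.now(timezone.utc).isoformat(), action, actor))

    def approve(self, approver: str, roles: set) -> None:
        # Role-based control: "export_approver" is an assumed role name.
        if "export_approver" not in roles:
            raise PermissionError(f"{approver} may not approve exports")
        if self.state != "requested":
            raise RuntimeError(f"cannot approve from state {self.state!r}")
        self.state = "approved"
        self._log("approved", approver)

    def download(self, actor: str) -> str:
        # Nothing leaves the workspace until the request has been approved.
        if self.state != "approved":
            raise PermissionError("export has not been approved")
        self._log("downloaded", actor)
        return f"export:{self.record_id}"


request = ExportRequest(record_id="card-001", requested_by="alice")
request.approve("bob", roles={"export_approver"})
request.download("alice")
print([action for _, action, _ in request.trail])
# ['requested', 'approved', 'downloaded']
```

Attempting `download` before `approve` raises `PermissionError`, so the trail always shows an approval ahead of any download.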

What a Generalization Card discloses

The gate decision — ship, block, or limit
Machine reasons behind the decision
Per-suite score deltas (scores, not answers)
Contamination & anti-hack engine verdicts
The gate policy that was applied

What it never contains

Customer data
Hidden evaluation cases
Gold answers
Raw model traces
Secrets or credentials
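
One way to enforce the two lists above is an allowlist: copy only the disclosed fields into the card and drop everything else. The Python sketch below assumes hypothetical field names (`score_deltas`, `engine_verdicts`, `gold_answers`, and so on). It illustrates the idea and is not the product's schema.

```python
# Fields a card may disclose, mirroring the "discloses" list (names are assumed).
DISCLOSED_FIELDS = {"decision", "reasons", "score_deltas", "engine_verdicts", "policy"}


def redact(run: dict) -> dict:
    """Keep only allowlisted fields; anything not listed never reaches the card."""
    return {key: value for key, value in run.items() if key in DISCLOSED_FIELDS}


# A made-up raw run record containing both reviewable and sensitive material.
raw_run = {
    "decision": "BLOCK",
    "reasons": ["ood_regressed", "hidden_regressed"],
    "score_deltas": {"hidden": (0.732, 0.611), "ood": (0.701, 0.488)},
    "policy": "example-policy",
    "gold_answers": ["(sealed)"],
    "raw_traces": ["(sealed)"],
    "api_key": "(sealed)",
}

card = redact(raw_run)
print(sorted(card))  # ['decision', 'policy', 'reasons', 'score_deltas']
```

An allowlist fails closed: a new sensitive field added to the run record stays out of the card unless someone deliberately adds it to `DISCLOSED_FIELDS`, whereas a denylist would leak it by default.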

Improve what fails. Ship what holds.

Bring a baseline and a candidate agent workflow. Verifiable Labs will show which updates should ship, which should be blocked, and which need a limited rollout.
