Private beta opening for teams shipping AI agents.

Trust & privacy

Review agent updates without exposing what's private.

Verifiable Labs is designed for security review — redacted evidence, approval-gated exports, and private evaluation boundaries by default.

  • 0 customer data in results
  • 100% of runs redacted by default
  • Approval required before any export
  • Every run produces an evidence record

The principle

Private by default. Evidence when needed.

Security review shouldn't mean handing over your data. Verifiable Labs reviews agent updates without exposing customer data, private evals, hidden cases, gold answers, raw traces, or secrets.

Redacted evidence

A reviewable record that reveals scores, not secrets

Every run produces a Generalization Card built for review — the decision, the machine reasons, and the per-suite deltas. The sensitive material that produced them never appears.

  • Decision, reasons, and per-suite score deltas
  • Engine verdicts and the policy that was applied
  • Redacted by default on every run

Generalization Card

decision: BLOCK
reasons: ood_regressed · hidden_regressed
public: 0.740 → 0.910
hidden: 0.732 → 0.611
ood: 0.701 → 0.488
record: redacted · reviewable

✗ candidate not promoted
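
As a rough illustration of how a card like the one above could be derived from per-suite scores, here is a minimal sketch. The `gate` function, the `SuiteScore` type, the 0.01 tolerance, and the rule that only hidden or OOD regressions block promotion are assumptions made for this example, not Verifiable Labs' actual policy. The "limit" outcome is left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical tolerance: a drop larger than this counts as a regression.
REGRESSION_TOLERANCE = 0.01


@dataclass(frozen=True)
class SuiteScore:
    baseline: float
    candidate: float

    @property
    def delta(self) -> float:
        return self.candidate - self.baseline


def gate(scores: dict[str, SuiteScore]) -> dict:
    """Return a card-like dict: decision, reasons, and per-suite deltas."""
    reasons = [
        f"{suite}_regressed"
        for suite in ("ood", "hidden")  # assumed: only these suites can block
        if suite in scores and scores[suite].delta < -REGRESSION_TOLERANCE
    ]
    return {
        "decision": "BLOCK" if reasons else "SHIP",
        "reasons": reasons,
        "deltas": {name: round(s.delta, 3) for name, s in scores.items()},
    }


# Scores taken from the example card above.
card = gate({
    "public": SuiteScore(0.740, 0.910),
    "hidden": SuiteScore(0.732, 0.611),
    "ood": SuiteScore(0.701, 0.488),
})
print(card["decision"], card["reasons"])
```

With the example scores, the public suite improves while the hidden and OOD suites drop, so this sketch blocks the candidate for the same two reasons the card lists.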

Private boundaries

Hidden cases stay sealed

The baseline, hidden cases, gold answers, and raw traces stay inside the evaluation boundary. Candidates run against them, but only scores cross back out: the cases, answers, and traces themselves never appear in a result.

  • No customer data in results or demos
  • Hidden cases and gold answers never exposed
  • Managed or bring-your-own-key / self-hosted routing

[Figure: Private evaluation boundary]
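
One way to picture the boundary is a function that can see the hidden cases internally but returns only an aggregate. The sketch below is a toy under stated assumptions: `_HIDDEN_CASES`, `evaluate_inside_boundary`, and `toy_candidate` are hypothetical names, and exact-match scoring is chosen only for simplicity.

```python
from typing import Callable

# Hypothetical hidden suite: (input, gold answer) pairs kept inside the boundary.
_HIDDEN_CASES = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]


def evaluate_inside_boundary(candidate: Callable[[str], str]) -> dict:
    """Run the candidate on the hidden cases and return only an aggregate score."""
    correct = sum(candidate(question) == gold for question, gold in _HIDDEN_CASES)
    # Only the suite name, case count, and score cross the boundary;
    # inputs, gold answers, and per-case outputs are never returned.
    return {
        "suite": "hidden",
        "n_cases": len(_HIDDEN_CASES),
        "score": round(correct / len(_HIDDEN_CASES), 3),
    }


def toy_candidate(question: str) -> str:
    """A stand-in agent that answers one hidden case correctly."""
    return {"2+2": "4", "3*3": "6"}.get(question, "unknown")


result = evaluate_inside_boundary(toy_candidate)
print(result)
```

The returned dictionary carries a score and nothing that would let a reader reconstruct a case or its gold answer.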

Approval-gated exports

Nothing leaves the workspace without sign-off

An evidence record cannot leave the workspace until its export is approved. Each export carries a full trail of who requested, approved, and downloaded it.

  • Request → approve → export, with an audit trail
  • Role-based approval controls
  • Audit-ready records for security and compliance

[Figure: Approval-gated export flow]
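
The request → approve → export flow can be sketched as a small state machine that appends to an audit trail at every step. Everything named here (`ExportRequest`, the `ROLES` table, the role string `"approver"`) is a hypothetical stand-in for illustration and does not describe the product's real API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role table; a real deployment would read roles from its identity system.
ROLES = {"alice": "member", "bob": "approver"}


@dataclass
class ExportRequest:
    record_id: str
    requester: str
    state: str = "requested"
    audit_trail: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self._log("requested", self.requester)

    def _log(self, action: str, actor: str) -> None:
        # Each entry records what happened, who did it, and when (UTC).
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_trail.append((action, actor, stamp))

    def approve(self, approver: str) -> None:
        if ROLES.get(approver) != "approver":
            raise PermissionError(f"{approver} may not approve exports")
        self.state = "approved"
        self._log("approved", approver)

    def download(self, actor: str) -> str:
        if self.state != "approved":
            raise PermissionError("export has not been approved")
        self._log("downloaded", actor)
        return f"evidence-record:{self.record_id}"


req = ExportRequest(record_id="run-42", requester="alice")
req.approve("bob")
req.download("alice")
print([action for action, _, _ in req.audit_trail])
```

Calling `download` before `approve` raises `PermissionError`, which is the property the gate is meant to guarantee.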

What a Generalization Card discloses

  • The gate decision — ship, block, or limit
  • Machine reasons behind the decision
  • Per-suite score deltas (scores, not answers)
  • Contamination & anti-hack engine verdicts
  • The gate policy that was applied

What it never contains

  • Customer data
  • Hidden evaluation cases
  • Gold answers
  • Raw model traces
  • Secrets or credentials
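
The two lists above amount to an allowlist: a card carries a fixed set of fields and drops everything else. A minimal sketch of that idea follows, assuming hypothetical field names (`deltas`, `engine_verdicts`, `policy`, `raw_traces`, and so on) that are not taken from the product.

```python
# Hypothetical field names for what a card may disclose.
ALLOWED_FIELDS = {"decision", "reasons", "deltas", "engine_verdicts", "policy"}


def redact(run_result: dict) -> dict:
    """Keep only allowlisted fields; anything not explicitly allowed is dropped."""
    return {key: value for key, value in run_result.items() if key in ALLOWED_FIELDS}


# A made-up raw run result that mixes disclosable and sensitive fields.
raw_run = {
    "decision": "BLOCK",
    "reasons": ["ood_regressed", "hidden_regressed"],
    "deltas": {"public": 0.17, "hidden": -0.121, "ood": -0.213},
    "policy": "default-gate",
    "raw_traces": ["..."],
    "gold_answers": ["..."],
    "api_key": "not-a-real-key",
}

card = redact(raw_run)
print(sorted(card))
```

An allowlist is used here rather than a denylist so that a newly added sensitive field is excluded by default instead of leaking until someone remembers to block it.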

Improve what fails. Ship what holds.

Bring a baseline and a candidate agent workflow. Verifiable Labs will show which updates should ship, which should be blocked, and which need a limited rollout.