Offering · AI Decision Audit

Can you prove why your AI decided?

A regulator asks your team one question: "Why did your system decide this?"

Your logs show the agent ran. They don't prove what it concluded, on what basis, or that the record wasn't changed after. A two-week audit of your production AI across the four practices of Operational Systems Engineering. Read-only, nothing installed, fixed price. You leave with findings, a costed roadmap, and a tamper-evident record.

The Gap

As AI moves from answering questions to making decisions (approving loans, clearing customers, pursuing claims), logs and traces stop being enough.

Most platforms stop at observability. Observability tells you the system ran. It does not prove why it decided.

Observability → Auditability → Decision Provenance → Governance.

Where Accountability Lives

Most tools stop at the first rung — at most, the second.

The audit goes deeper than "is it up?" to "can you prove what it decided?"

01

Observability

The system ran. Latency, errors, traces.

02

Auditability

What happened, recorded.

03

Decision Provenance

What it believed, the evidence it used, and why it acted. The gap.

04

Governance

Enforce policy, and prove it tamper-evidently.

What We Check

Four practices. One outcome.

The discipline behind it: Operational Systems Engineering.

Evaluate

Does it answer correctly?

Grounded citations, hallucination, drift on the queries that matter. Graded, not vibed.

Guard

Can it be made to misbehave?

Prompt injection, PII leakage, decision override, fired at your agent as a black box.

Observe

What does it cost and how does it behave?

Latency P50/P95/P99, token cost, error rate, and where the next failure is most likely.

Govern

Can you prove what it decided?

Decision provenance and a tamper-evident audit trail. The part a regulator actually accepts.

Deliverables

What you get on day 10.

Concrete artifacts you can act on — not a slide deck.

01

Findings Report

By severity, each with the evidence that produced it, the impact, the fix, and the regulation it touches (RBI · DPDP · FCRA · OWASP LLM Top 10).

02

Tamper-Evident Audit DB

A hash-chained record of every probe and decision we ran. Queryable, independently verifiable. Built on Fasten, our open-source audit substrate.

03

Costed Roadmap

Critical and High items only, sequenced for shipping. Effort estimates for your team, tied to the risk each one retires.

04

Written Report + Walkthrough

Everything in a written report your team can act on, plus a live 60-minute walkthrough with your engineering and product leads.

How It Works

A short, scoped, read-only engagement.

01

Scope & Access

60-minute kickoff. Read-only access to the agent endpoint. Nothing installed in your environment. Day 0–1.

02

Probe & Grade

We fire targeted probes (injection, eval, latency) at your system, grade the results, and fact-check findings with your team.

03

Report & Walkthrough

Written report, costed roadmap, the tamper-evident audit DB, and a live walkthrough. You leave with a clear next step. No commitment beyond that.

Why Us

We've shipped this before, so you're not paying us to learn.

We run our own production AI and edge systems (ContinuumState, EdgeBits, fasten). The patterns we audit against are the ones we've already passed real-world audits on, and the toolchain is open: Fasten is Apache-2.0.

  • Read-only: no write access, no production credentials, no install.
  • Senior engineer does the read. No junior handoff.
  • Findings tied to fix cost and the regulation they touch.
  • Tamper-evident deliverable: proof, not a slide.
Engagement at a Glance
Team1 senior engineer
Timeline2 weeks
AccessRead-only · nothing installed
OutputReport · roadmap · audit DB · walkthrough
Audit$5,000 fixed · 2 weeks
Quarterly$2,500 check-in
Annual$10,000/yr — one quarter free vs $12,500

What Comes After

The audit grades. Agent Engineering builds →

Each of the four practices the audit grades has a build-side counterpart: Policy Enforcement, Zero-Trust Identity, Sandbox Evaluation, Reliability Engineering. We deliver them on Agent Engineering. Same toolchain throughout (Fasten for audit, Langfuse for eval, MCP for tools); same discipline above both (Operational Systems Engineering).

Audit with us, then build with us, or hand the costed roadmap to your team. Your call.

Start Free

The free mini-scan.

The lowest-risk first step. Minutes, not weeks.

We fire a curated set of known prompt-injection patterns at your agent (read-only, on your own machine if you prefer), and send back a one-page result: which attacks got through, what they exposed, and the regulation each touches.

It's the Guard pillar of the full audit. No commitment. If it surfaces something (and it usually does), the 2-week AI Decision Audit covers the other three pillars and the tamper-evident record.

Eventually someone will ask why your AI decided.

By then, the evidence either exists or it doesn't. Start with a free scan, or scope the full audit. In two weeks you'll have a clear, costed, provable path forward.