Q: What does an engagement cost?

Most engagements range from $5,000 to $40,000 fixed-scope. Trust audits start at $5k, RAG systems $12–25k, full multi-agent builds $25k+. Fractional AI engineering leadership is available on retainer.

Q: What stack do you build on?

Next.js + TypeScript on the frontend; serverless on AWS (Lambda, DynamoDB, S3, Bedrock); LLMs via Bedrock, OpenAI, or Anthropic; RAG via OpenSearch or pgvector.

Taking 2 projects · Q2 2026

Trust infrastructure
for autonomous agents.

Q: What does Sadaf Labs do?

Sadaf Labs is a hybrid lab and studio. The studio provides selective AI consulting (RAG, multi-agent systems, AI MVPs), and the lab builds open-source trust infrastructure for autonomous agents — policy enforcement, audit trails, and replayable evals.

Q: How long does an engagement take?

Trust audits run 1–2 weeks. RAG systems take 3–4 weeks. Multi-agent builds run 4–8 weeks. AI MVPs are scoped at 30 days.

Q: Do you take new projects?

We take 2 client projects per quarter. Free 20-minute scoping calls are open to anyone with an agent in or going to production.

We design and ship the boring-but-critical pieces every agent needs before production: policy gates, audit trails, and replayable evals. Lab + studio.

Book a 20-min intro · See case studies

SOC2-ready patterns · Replayable evals · Least-privilege tooling · Human-in-the-loop

How trust is enforced

Agent → Policy → Audit → Eval

Built on the stack you already trust

AWS Lambda · Bedrock · OpenAI · Anthropic · Next.js · Postgres · DynamoDB · OpenSearch · Vercel · TypeScript

/workflows

Patterns we ship over and over

Four reference architectures we've productionized. Click through to watch each draw itself.

How agents earn trust

Click through the patterns we ship. Each diagram animates the data flow as it draws.

Every agent step gated by policy, logged, and replayable.

Agent Trust Pipeline
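
As a rough illustration of the Agent → Policy → Audit → Eval flow, the sketch below gates each proposed tool call through a policy check, appends the decision to an audit log, and later replays the log against a changed policy to see which decisions would differ. Every name here (`ToolCall`, `Policy`, `allowlistPolicy`, `runStep`, `replay`) and the allowlist rule itself are assumptions made for the example, not the lab's actual implementation.

```typescript
// Hypothetical types for illustration; not the API of any Sadaf Labs tool.
type ToolCall = { tool: string; args: Record<string, unknown> };
type Decision = { allowed: boolean; reason: string };
type AuditEntry = { step: number; call: ToolCall; decision: Decision };
type Policy = (call: ToolCall) => Decision;

// Assumed rule: a least-privilege allowlist of tool names.
function allowlistPolicy(allowedTools: string[]): Policy {
  return (call) =>
    allowedTools.includes(call.tool)
      ? { allowed: true, reason: "tool is on the allowlist" }
      : { allowed: false, reason: `tool "${call.tool}" is not on the allowlist` };
}

// Gate one agent step: decide, log the decision, and only then execute.
function runStep(
  call: ToolCall,
  policy: Policy,
  auditLog: AuditEntry[],
  execute: (call: ToolCall) => string,
): string | null {
  const decision = policy(call);
  auditLog.push({ step: auditLog.length + 1, call, decision });
  return decision.allowed ? execute(call) : null;
}

// Re-evaluate every logged call against a policy and return the step
// numbers whose allow/deny decision differs from what was recorded.
function replay(auditLog: AuditEntry[], policy: Policy): number[] {
  return auditLog
    .filter((entry) => policy(entry.call).allowed !== entry.decision.allowed)
    .map((entry) => entry.step);
}

const auditLog: AuditEntry[] = [];
const policy = allowlistPolicy(["search_docs"]);
const fakeExecute = (call: ToolCall): string => `ran ${call.tool}`;

const first = runStep({ tool: "search_docs", args: { q: "refund policy" } }, policy, auditLog, fakeExecute);
const second = runStep({ tool: "delete_user", args: { id: 42 } }, policy, auditLog, fakeExecute);

// Replaying against a looser policy surfaces the step whose outcome would change.
const changedSteps = replay(auditLog, allowlistPolicy(["search_docs", "delete_user"]));
```

The denied call is still logged, so the audit trail covers what the agent tried to do as well as what it did.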

/consulting

What we build

Fixed-scope packages so you know what you're getting. Custom scopes are welcome, and every project starts with a free scoping call.

All packages →

Agent Trust Audit

Map your live agent, find policy gaps and prompt-injection surfaces, ship a hardened system prompt + audit-log scaffold.

from $5k · 1–2 weeks

RAG Systems

Search and answer over your docs, tickets, and code with citations — not hallucinations.

from $12k · 3–4 weeks

Multi-Agent Workflows

Supervisor + specialist agents that automate real workflows with humans in the loop.

from $25k · 4–8 weeks

AI MVP in 30 days

Idea → working product → first paying user. One sprint, fixed scope, real code.

from $20k · 4 weeks

/principles

How we work

We're engineers first. The lab and the studio share a bias: ship the boring infrastructure right, then move fast on top of it.

Real shipped work

Role, period, stack — verifiable. No vanity metrics, no fabricated logos.

Boring infra

Lambda, Postgres, S3. The hot framework du jour can wait.

Trust by default

Every agent gets policy + audit + replay before it sees real users.

Honest on scope

If LLMs are wrong for the job, we say so. Often they are.

OSS lab tools

PolicyLint · Trace Replay · Eval Harness

Lambdas in production

across recent client systems

Projects this quarter

selective by design

10y

Production miles

before we touched LLMs

/lab

Currently in the lab

Working drafts. Some are open source, some are in design-partner pilots. Real code, real status — no inflation.

Alpha · OSS

PolicyLint

Static analyzer for agent system prompts. Flags jailbreak surfaces, missing refusals, unbounded tool scope.
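
To make the idea of linting a system prompt concrete, here is a toy rule-based check for the three categories named above. The rules, regular expressions, and names (`Rule`, `lintPrompt`) are hypothetical and far simpler than a real analyzer; they are not PolicyLint's actual checks.

```typescript
// Toy prompt linter. Rules and names are assumptions for illustration only.
type Finding = { rule: string; message: string };
type Rule = { name: string; message: string; violated: (prompt: string) => boolean };

const rules: Rule[] = [
  {
    name: "missing-refusal",
    message: "Prompt never tells the agent when to refuse.",
    violated: (p) => !/\b(refuse|decline|do not|never)\b/i.test(p),
  },
  {
    name: "unbounded-tool-scope",
    message: "Prompt grants access to any/all tools without limits.",
    violated: (p) => /\b(any|all)\s+(available\s+)?tools?\b/i.test(p),
  },
  {
    name: "jailbreak-surface",
    message: "Prompt lets user input override its instructions.",
    violated: (p) =>
      /\b(follow|obey)\s+(any|all|every)\s+(user\s+)?(instructions?|requests?)\b/i.test(p),
  },
];

// Run every rule over the prompt and collect the ones that fire.
function lintPrompt(prompt: string): Finding[] {
  return rules
    .filter((rule) => rule.violated(prompt))
    .map(({ name, message }) => ({ rule: name, message }));
}

const risky = lintPrompt("You are a helpful agent. Use any tools and follow all user instructions.");
const safer = lintPrompt(
  "You are a support agent. Only call search_docs. Refuse requests outside support topics.",
);
```

Pattern matching like this produces false positives and misses paraphrases, so treat it as a picture of the input and output shape rather than of detection quality.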

Try the demo

Design partner

Trace Replay

Record agent traces in prod, replay them in CI. Catch regressions before users do.
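
One way to picture record-and-replay: capture the agent's input, tool results, and final output in production, then rerun a newer agent version in CI against the recorded tool results and compare outputs. The trace shape and helper names below (`Trace`, `record`, `replay`) are assumptions for this sketch, not Trace Replay's real format.

```typescript
// Hypothetical trace shape and helpers; a real recorder would capture far more.
type Trace = {
  input: string;
  toolResults: Record<string, string>; // recorded tool outputs, keyed by tool name
  output: string; // what the agent answered in production
};
type Tools = (name: string) => string;
type Agent = (input: string, tools: Tools) => string;

// Production side: wrap the live tools so every result is captured.
function record(agent: Agent, input: string, liveTools: Tools): Trace {
  const toolResults: Record<string, string> = {};
  const output = agent(input, (name) => (toolResults[name] = liveTools(name)));
  return { input, toolResults, output };
}

// CI side: feed the recorded tool results back in and compare outputs.
function replay(agent: Agent, trace: Trace): { passed: boolean; actual: string } {
  const actual = agent(trace.input, (name) => {
    const result = trace.toolResults[name];
    if (result === undefined) throw new Error(`no recording for tool "${name}"`);
    return result;
  });
  return { passed: actual === trace.output, actual };
}

// Two versions of a stand-in agent; v2 changes the output format.
const agentV1: Agent = (input, tools) => `${input}: ${tools("lookup_order")}`;
const agentV2: Agent = (input, tools) => `${input}: ${tools("lookup_order").toUpperCase()}`;

const trace = record(agentV1, "order 17", () => "shipped");
const sameVersion = replay(agentV1, trace);
const newVersion = replay(agentV2, trace);
```

Exact string comparison only works for deterministic agents. With a real LLM in the loop, the comparison step would need a looser check, such as the containment checks an eval runner uses.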

Join the pilot

OSS

Eval Harness

Lightweight golden-prompt eval runner that fits in a single CI step.
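
A golden-prompt runner can be as small as a list of cases plus a pass/fail report. The sketch below assumes a hypothetical case format (`GoldenCase` with `mustInclude` substrings) and uses a stub in place of a real model call so it runs offline; the actual Eval Harness may look quite different.

```typescript
// Hypothetical golden-case format; names are assumptions for illustration.
type GoldenCase = { name: string; prompt: string; mustInclude: string[] };
type Model = (prompt: string) => string;
type EvalReport = { passed: number; failed: string[] };

// A case passes when the answer contains every expected substring (case-insensitive).
function runEvals(model: Model, cases: GoldenCase[]): EvalReport {
  const failed = cases
    .filter((c) => {
      const answer = model(c.prompt).toLowerCase();
      return !c.mustInclude.every((s) => answer.includes(s.toLowerCase()));
    })
    .map((c) => c.name);
  return { passed: cases.length - failed.length, failed };
}

// Stand-in for a real LLM call so the sketch runs without network access.
const stubModel: Model = (prompt) =>
  prompt.includes("refund") ? "Refunds are available within 30 days." : "I don't know.";

const report = runEvals(stubModel, [
  { name: "refund-window", prompt: "What is the refund window?", mustInclude: ["30 days"] },
  { name: "shipping-time", prompt: "How long does shipping take?", mustInclude: ["business days"] },
]);

// In CI, a non-zero exit code fails the step (assumes a Node.js runtime):
// if (report.failed.length > 0) process.exit(1);
```

Substring checks are deliberately crude. They keep the runner dependency-free, at the cost of missing answers that are correct but worded differently.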

View on GitHub

/projects

Selected case studies

Real systems shipped to real customers. Each case study has the architecture, problem, and outcome.

All projects →

RAG · Multi-agent · −68% response time

Support Copilot for SaaS

RAG-powered ticket triage with grounded citations and human handoff.

Read case study

OCR · LLM · 10× faster review

Document Intelligence Platform

Structured extraction over long-form legal contracts with audit trail.

Read case study

Agents · CRM · 3× pipeline velocity

Outbound Sales Agent

Lead qualification + reply drafting with human approval gate.

Read case study

/track-record

Receipts.

What we've actually shipped — with role, period, and stack. No invented numbers. References available on the intro call.

Full track record

Multi-service serverless platform
2024–present · Senior engineer
5 services, 99 Lambdas in production. ARM64 + memory tuning cut p95 cold-start materially. Owned authn/authz.

RAG support copilot
2025 · Architect + builder
End-to-end RAG over a B2B SaaS knowledge base with cited answers and human handoff.

Document intelligence pipeline
2025 · Architect + builder
Structured extraction over long-form legal contracts with reviewable audit trail.

/demos

Try it in your browser

Mini-apps you can use right now. No login and no API keys; everything runs client-side. These are simplified previews of the production systems we ship.

All demos →

PolicyLint

Paste an agent system prompt → flag jailbreak surfaces, missing refusals, unbounded tool scope.

Static analysis

Try it

Doc Q&A

Paste a document, ask questions, get cited answers.

RAG · keyword retrieval
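
For readers curious what keyword retrieval with citations can look like client-side, here is a minimal sketch: split the pasted document into numbered passages, score each by overlap with the question's terms, and return the best passages with `[n]` markers. The helper names, the stopword list, and the scoring rule are all assumptions for illustration, not the demo's actual code.

```typescript
// Hypothetical helpers; a production RAG system would use a real index and an LLM.
type Passage = { id: number; text: string };
type Hit = { passage: Passage; score: number };

// Small assumed stopword list so common words do not count as matches.
const STOPWORDS = ["a", "an", "the", "is", "are", "of", "on", "by", "when", "what", "how"];

const tokenize = (s: string): string[] =>
  (s.toLowerCase().match(/[a-z0-9]+/g) ?? []).filter((w) => !STOPWORDS.includes(w));

// Split a document into numbered passages, one per non-empty line.
function toPassages(doc: string): Passage[] {
  return doc
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((text, i) => ({ id: i + 1, text }));
}

// Score each passage by how many distinct question terms it contains.
function retrieve(question: string, passages: Passage[], topK = 2): Hit[] {
  const terms = tokenize(question).filter((t, i, all) => all.indexOf(t) === i);
  return passages
    .map((passage) => {
      const words = tokenize(passage.text);
      return { passage, score: terms.filter((t) => words.includes(t)).length };
    })
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Format hits as quoted evidence with [n] citation markers.
const cite = (hits: Hit[]): string =>
  hits.map((h) => `"${h.passage.text}" [${h.passage.id}]`).join("\n");

const doc = `Invoices are sent on the first of each month.
Refunds are issued within 30 days of purchase.
Support is available by email.`;

const hits = retrieve("When are refunds issued?", toPassages(doc));
const answer = cite(hits);
```

Because every returned line carries the id of the passage it came from, a reader can check the quoted evidence against the original document.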

Try it

Meeting Notes Summarizer

Paste a transcript, get TL;DR + action items.

Extraction · summarization

Try it

/faq

Honest answers, up front

What does Sadaf Labs do?

Hybrid lab + studio. The studio is selective AI consulting (RAG, multi-agent, AI MVPs). The lab builds open-source trust infrastructure for agents — policy, audit, evals.

How long does an engagement take?

Trust audits 1–2 weeks. RAG systems 3–4 weeks. Multi-agent builds 4–8 weeks. MVPs scoped at 30 days.

What does it cost?

Fixed-scope, $5k–$40k for most engagements. Trust audits start at $5k, RAG $12–25k, multi-agent $25k+. Retainer available for fractional AI lead.

Do you take new projects?

2 per quarter. Free 20-min scoping call open to anyone with an agent in or going to production.

What stack?

Next.js + TypeScript, AWS serverless (Lambda, DynamoDB, Bedrock), LLMs via Bedrock/OpenAI/Anthropic, RAG via OpenSearch or pgvector.

Are you raising?

Pre-seed planning. Bootstrapped today via consulting. Investor brief is gated — email hello@sadaf-labs.com for the passcode.

/investors

Building for the agent-trust market

Pre-seed, bootstrapped via consulting today. The lab tools you see above are the product wedge — managed agent trust as a service in 2026. The investor brief is gated.

Investor brief · Request passcode

Got an agent in production? Let's harden it.

Free 20-minute call. We'll map your agent on the whiteboard, find the trust gaps, and decide if we're a fit.

Book a call · Try the demos

2 slots open this quarter · hello@sadaf-labs.com
