Now in private beta

A few engineers can now
operate a serious AI lab

Autonomous agents plan and execute your research. You set the direction and make the calls. Every claim is backed by evidence: code, data, metrics, artifacts. Human judgment stays in control.

Read the Manifesto

Already approved? Sign in to your lab shell

proof foundry - training session
claude-4-opus | 8x H100 | live
artifacts: 7 | tokens: 47,291 | cost: $847.20
building evidence chain...

The problem

Your best result is
somewhere in there.

Experiments in Weights & Biases. Code in GitHub. Data in S3. Decisions buried in Slack threads. Configs in files named config_final_v2_USE_THIS.yaml.

When reviewers ask for a reproduction, you spend weeks digging through commit history, chasing teammates who have already left, and praying the checkpoint still exists.

“Which config gave us the 94.2% result?”

A question that costs a week every single time.

~/ml-research — the reality
Evidence Chain #qwen72b-ft-0847 (Verified)
Verified Claim: Model achieves 94.2% on HumanEval+
Evaluation: humaneval_plus_v2.json
Model: qwen72b-ft.safetensors (142GB)
Training: finetune.py @ 9a2f7c1
Config: lr=2e-5, epochs=3, qlora
Dataset: evol-codealpaca-v2 (sha: 8f3a...)
Approval: researcher@lab.ai • $847.20

Immutable links: Modify any artifact and the entire chain requires re-verification. No more “which config was that?”

The solution

Every claim traces
to its source.

The evidence record links claims to experiments, experiments to code, code to configs, configs to data. Change any piece and the chain breaks, requiring re-verification.

  • Results link to exact code commits
  • Configs captured at execution time
  • Human approvals logged in the chain
  • Orphaned artifacts flagged automatically
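
As a rough illustration of how a chain like this can detect changes, here is a minimal Python sketch in which each link's hash covers the link before it. It is not the product's implementation; the function names (link_digest, seal, stale_links) and the record layout are assumptions made for this example, and the sample values are taken from the evidence chain shown earlier on this page.

```python
import hashlib
import json


def link_digest(kind, payload, parent):
    """Hash one link together with the digest of the link before it."""
    body = json.dumps({"kind": kind, "payload": payload, "parent": parent}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def seal(steps):
    """Return one digest per (kind, payload) step, each covering its predecessor."""
    digests, parent = [], ""
    for kind, payload in steps:
        parent = link_digest(kind, payload, parent)
        digests.append(parent)
    return digests


def stale_links(steps, recorded):
    """Indices of links whose current digest no longer matches the recorded one."""
    return [i for i, (now, then) in enumerate(zip(seal(steps), recorded)) if now != then]


# Hypothetical record layout, ordered from data up to the claim.
steps = [
    ("dataset", {"name": "evol-codealpaca-v2"}),
    ("config", {"lr": 2e-5, "epochs": 3, "method": "qlora"}),
    ("training", {"script": "finetune.py", "commit": "9a2f7c1"}),
    ("claim", {"metric": "HumanEval+", "score": 94.2}),
]
recorded = seal(steps)

# Changing the config invalidates that link and every link after it.
edited = list(steps)
edited[1] = ("config", {"lr": 1e-5, "epochs": 3, "method": "qlora"})
print(stale_links(edited, recorded))  # [1, 2, 3]
```

Because every digest includes its parent's digest, editing one artifact changes all downstream digests, which is one way "change any piece and the chain breaks" could be enforced.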

“Show me the exact config for our best result.”

One click. Always. Forever.

Human governance

Agents run fast.
You control what matters.

Most operations happen autonomously. But expensive compute, published claims, and evaluation changes require your sign-off. You stay in control without slowing things down.

Configurable approval gates

expensive_compute: > $100
publish_claim: external
modify_eval: benchmarks
data_mutation: datasets
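
To make the gate table concrete, here is a small Python sketch of how such rules might be evaluated before an agent acts. The Action class, the needs_approval function, and the constant names are hypothetical and exist only for this example; the gate names and the $100 threshold come from the list above.

```python
from dataclasses import dataclass

# Hypothetical gate table mirroring the four gates listed above.
COMPUTE_LIMIT_USD = 100.0
GATED_TARGETS = {
    "publish_claim": "external",
    "modify_eval": "benchmarks",
    "data_mutation": "datasets",
}


@dataclass
class Action:
    kind: str              # gate name, e.g. "expensive_compute"
    target: str = ""       # what the action touches, e.g. "benchmarks"
    cost_usd: float = 0.0  # estimated compute cost of the action


def needs_approval(action: Action) -> bool:
    """Return True when the action must wait for a human sign-off."""
    if action.kind == "expensive_compute":
        return action.cost_usd > COMPUTE_LIMIT_USD
    # Any other kind is gated only when it touches its configured target.
    return GATED_TARGETS.get(action.kind) == action.target


print(needs_approval(Action("expensive_compute", cost_usd=847.20)))  # True
print(needs_approval(Action("expensive_compute", cost_usd=12.50)))   # False
print(needs_approval(Action("publish_claim", target="external")))    # True
```

Anything that does not match a gate runs without waiting, which is the split between autonomous work and human sign-off described above.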

“95% autonomous, 5% human judgment.”

Agents handle the routine. You decide what truly matters.


The vision

The future lab is small

Building a frontier research lab used to require hundreds of researchers and dedicated infrastructure teams. A handful of exceptional engineers should have the same leverage.

3 engineers
N AI agents
100x research output

From the community

Fine-tuned a domain-specific LLM in 3 days. My PhD lab took 2 months on similar work. The evidence chain meant my advisor actually trusted the results.

Sarah Chen

PhD Student, Stanford AI Lab

We shipped a paper-quality ablation study with 2 engineers. Our 15-person competitor took 6 months on the same problem. Research memory is unreal.

Marcus Rodriguez

Technical Founder, Stealth AI Startup

Finally stopped losing experiments to entropy. Six months of research history, fully queryable. I can ask why we abandoned an approach and get the real answer.

Dr. James Park

ML Research Lead, Frontier Labs

“One to three cracked engineers. Autonomous research agents. Strong human judgment. A system that plans, runs, verifies, and remembers.”

That is the lab we are building.

Early Access

Join the waitlist

We are onboarding teams in private beta. Tell us what you are working on and we will reach out with a private signup link or access code when a spot opens.

We respect your privacy. No spam, ever.