Now in private beta

A few engineers can now
operate a serious AI lab

Autonomous agents plan and execute your research. You set the direction and make the calls. Every claim is backed by evidence: code, data, metrics, artifacts. Human judgment stays in control.

Read the Manifesto

proof foundry - training session

claude-4-opus | 8x H100 | live

artifacts: 7 | tokens: 47,291 | cost: $847.20

building evidence chain...

The problem

Your best result is
somewhere in there.

Experiments in Weights & Biases. Code in GitHub. Data in S3. Decisions buried in Slack threads. Configs in files named config_final_v2_USE_THIS.yaml.

When reviewers ask for reproduction, you spend weeks digging through commit history, asking teammates who have already left, praying the checkpoint still exists.

“Which config gave us the 94.2% result?”

A question that costs a week every single time.

~/ml-research — the reality

Evidence Chain #qwen72b-ft-0847 (Verified)

◆ Verified Claim: Model achieves 94.2% on HumanEval+
◇ Evaluation: humaneval_plus_v2.json
◇ Model: qwen72b-ft.safetensors (142GB)
◇ Training: finetune.py @ 9a2f7c1
◇ Config: lr=2e-5, epochs=3, qlora
◇ Dataset: evol-codealpaca-v2 (sha: 8f3a...)
◇ Approval: researcher@lab.ai • $847.20

Immutable links: Modify any artifact and the entire chain requires re-verification. No more “which config was that?”

The solution

Every claim traces
to its source.

The evidence record links claims to experiments, experiments to code, code to configs, configs to data. Change any piece and the chain breaks, requiring re-verification.

Results link to exact code commits
Configs captured at execution time
Human approvals logged in the chain
Orphaned artifacts flagged automatically
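
To make the idea concrete, here is a minimal Python sketch of a hash-linked evidence chain. It is an illustration under stated assumptions, not the product's actual implementation: the `Link` class, `build_chain`, and `first_broken_link` are hypothetical names, and the payload fields simply reuse the example values shown earlier on this page. Each link stores the digest of the link it depends on, so editing any artifact invalidates everything downstream of it.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    """One artifact in the chain (hypothetical structure, for illustration only)."""
    kind: str               # e.g. "dataset", "config", "training", "claim"
    payload: dict           # metadata describing the artifact
    parent: Optional[str]   # digest of the link this one depends on

    def digest(self) -> str:
        # Serialize deterministically so the same content always hashes the same way.
        body = json.dumps(
            {"kind": self.kind, "payload": self.payload, "parent": self.parent},
            sort_keys=True,
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(steps):
    """Link each (kind, payload) step to the digest of the previous one."""
    chain, parent = [], None
    for kind, payload in steps:
        link = Link(kind, payload, parent)
        chain.append(link)
        parent = link.digest()
    return chain


def first_broken_link(chain):
    """Return the index of the first link whose recorded parent no longer matches, else None."""
    parent = None
    for i, link in enumerate(chain):
        if link.parent != parent:
            return i
        parent = link.digest()
    return None


chain = build_chain([
    ("dataset", {"name": "evol-codealpaca-v2"}),
    ("config", {"lr": 2e-5, "epochs": 3, "method": "qlora"}),
    ("training", {"script": "finetune.py", "commit": "9a2f7c1"}),
    ("claim", {"text": "94.2% on HumanEval+"}),
])
assert first_broken_link(chain) is None  # untouched chain verifies end to end

# Edit the config after the fact: its digest changes, so the training link
# that recorded the old digest no longer matches.
chain[1] = Link("config", {"lr": 3e-5, "epochs": 3, "method": "qlora"}, chain[1].parent)
print(first_broken_link(chain))  # 2: the training link and everything after it need re-verification
```

In this sketch the break shows up at the first link that depends on the edited artifact, which is where re-verification would have to start.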

“Show me the exact config for our best result.”

One click. Always. Forever.

Human governance

Agents run fast.
You control what matters.

Most operations happen autonomously. But expensive compute, published claims, and evaluation changes require your sign-off. You stay in control without slowing things down.

Configurable approval gates

expensive_compute: > $100

publish_claim: external

modify_eval: benchmarks

data_mutation: datasets
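
As a rough sketch of how gates like these could be evaluated, the Python below checks a proposed action against a small rule table. Everything here is an assumption made for illustration: the `Action` fields, the `GATES` table, and `needs_approval` are hypothetical names, and the reading of each gate (spend above $100, externally published claims, changes to benchmarks, changes to datasets) is inferred from the four gates listed above rather than taken from the product's configuration format.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """A step an agent wants to take (hypothetical shape, for illustration only)."""
    kind: str               # e.g. "expensive_compute", "publish_claim"
    cost_usd: float = 0.0   # estimated spend, if any
    target: str = ""        # what the action touches, e.g. "external", "datasets"


# Hypothetical gate table mirroring the four gates listed above.
# Each rule returns True when a human sign-off is required.
GATES = {
    "expensive_compute": lambda a: a.cost_usd > 100,
    "publish_claim": lambda a: a.target == "external",
    "modify_eval": lambda a: a.target == "benchmarks",
    "data_mutation": lambda a: a.target == "datasets",
}


def needs_approval(action: Action) -> bool:
    """Actions with no matching gate, or that fall under a gate's threshold, run autonomously."""
    gate = GATES.get(action.kind)
    return bool(gate and gate(action))


print(needs_approval(Action("expensive_compute", cost_usd=847.20)))  # True
print(needs_approval(Action("expensive_compute", cost_usd=40.0)))    # False
print(needs_approval(Action("run_unit_tests")))                      # False
```

Keeping the rules in a plain table means routine actions fall through without a prompt, and only the gated cases pause for a person.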

“95% autonomous, 5% human judgment.”

Agents handle the routine. You decide what truly matters.

The vision

The future lab is small

Building a frontier research lab used to require hundreds of researchers and dedicated infrastructure teams. A handful of exceptional engineers should have the same leverage.

Engineers and AI agents: 100x research output

From the community

“Fine-tuned a domain-specific LLM in 3 days. My PhD lab took 2 months on similar work. The evidence chain meant my advisor actually trusted the results.”

Sarah Chen

PhD Student, Stanford AI Lab

“We shipped a paper-quality ablation study with 2 engineers. Our 15-person competitor took 6 months on the same problem. Research memory is unreal.”

Marcus Rodriguez

Technical Founder, Stealth AI Startup

“Finally stopped losing experiments to entropy. Six months of research history, fully queryable. I can ask why we abandoned an approach and get the real answer.”

Dr. James Park

ML Research Lead, Frontier Labs

“One to three cracked engineers. Autonomous research agents. Strong human judgment. A system that plans, runs, verifies, and remembers.”

That is the lab we are building.

Early Access

Join the waitlist

We are onboarding teams in private beta. Tell us what you are working on and we will reach out with a private signup link or access code when a spot opens.
