Graded by a real deployment, not a judge model.
Write-isolated verification infrastructure for cloud-infrastructure agents. The reward is earned against live infrastructure, and it cannot be faked.
Frontier coding models look nearly solved on short, single-file benchmarks, and still struggle with the long-horizon infrastructure work that runs real production systems. We believe that gap is a data problem, not a capability ceiling. There is very little long-horizon training signal a lab can actually trust.
Assay exists to build that signal. We turn real cloud-engineering work into training environments where an agent provisions, migrates, and repairs actual infrastructure, and where the grade comes from the deployed system itself, not from a model's opinion of the work.
The verifier runs where the agent cannot reach it. The record it writes cannot be edited, replayed, or forged. Trust is the product; everything else follows from it.
Sam Silva
Founder, Assay
Two accounts. One wall.
The agent works and deploys in its own cloud account. The verifier observes from a separate trust domain, grades what actually exists, and signs a record the agent can never touch.
Trust domain A agent
Live infrastructure
The deployed system, observable state
Agent
Writes infrastructure as code
Trust domain B verifier
Verifier
Grades deployed state deterministically
Signed record
Immutable, independently verifiable
Deploy-graded.
Success is observed from the state of live infrastructure. Deploys succeed or fail; resources exist or they do not.
Write-isolated.
Grading runs in a separate cloud account. No shared credentials, no network path, no way for the agent to reach its own reward.
Independently verifiable.
Every grade is a signed, write-once record. A lab can verify each one with a public key, without trusting our word.
We are small on purpose.
Assay is a founding team being assembled around one uncompromising design. If verification is your research problem, we should talk.