The Artificial General Auditor
Last week I wrote six paragraphs about Jeff Bezos. By Sunday the comment thread under them had drawn out something more interesting than the post: a parade of founders, each arriving to announce that they had already built the thing the post said was underfunded.
One had solved hallucination, enterprise governance, everything; the first big company to ask could have it, he said. Another had a deterministic runtime floor that issued cryptographic receipts. A third had stripped the probabilistic guessing out entirely and named his system after a vegetable. A lawyer offered four words: generation is not license. A CIO offered a lovely sentence: every breakthrough in engineering ends with a signature, not a prediction.
I did not reply to most of them. I did not need to. Forty strangers announcing they hold the missing piece is the market confessing, in public, that the piece is missing and that no one agrees on who holds it. That confession is the subject of this essay.
What Bezos conceded
Prometheus is Bezos’s new company, valued at twenty-nine billion dollars on the day it was announced, racing toward a hundred-billion-dollar fund, built to produce what he calls an “artificial general engineer”: a machine that designs jet engines, spacecraft, automobiles, the physical apparatus of the world. Let us give him all of that vision. Grant the plow and the steam engine and the invention loop. The interesting sentence is not his. It belongs to his co-chief executive, Vik Bajaj, who told the New York Times that you cannot build something like a jet engine with words alone, not even the words of mathematical equations.
Read that again, this time with the price tag attached. The co-chief executive of the best-funded engineering venture in the history of the category has stated, on the record, that text prediction, even theirs, does not produce engineering. The generative AI field has spent three years insisting on the opposite: that scale was the only ingredient and everything else was detail. The bill for believing it has now arrived, priced in dollars. The man who paid it agrees with the people who doubted it.
I have spent enough time in rooms from Davos to Riyadh to Washington and along the Bay Area 101 to know that a concession funded at twelve billion dollars travels further than any white paper. Bajaj did not write an essay against the prevailing GenAI scaling consensus. He placed a (very, very big) wager against it. Acta non verba, as the old motto has it: deeds, not words. He is building the deeds.
The half nobody funded
A jet engine does not enter service because it was designed. It enters service because it was certified. Every part traces to a requirement. Every failure mode is bounded. Every design review is signed, and each signature moves responsibility along a chain a regulator can follow, until at the end of that chain stands a person who answers for the machine. Bajaj says a thousand human minds design an engine. Those minds carry something besides ingenuity. They carry liability, and they carry it in their own names.
Now let us hold the two halves side by side. Generation: the design, funded this year at twelve billion dollars. Certification: the signature, funded at approximately zero. The richest engineer in the world is pouring capital into the first column and not one dollar, that anyone can point to, into the second. This is the gap the founders in my comment thread were all circling. They felt the vacuum and rushed to fill it, which is the surest evidence a vacuum exists.
When the designer is a probability distribution, who is accountable for the design? That question is the whole essay, and the industry has no funded answer.
Why you cannot average your way out
There is a defense, and it is the one Mustafa Suleyman has been making in the language of cognitive abundance: the models keep getting better, the error rates keep falling, soon the machine will be more reliable than the humans it replaces, so ship it. Let us play that out as if it were all true. Grant, for the sake of argument, that a generative system will one day design a better average jet engine than a tired team on a deadline.
It does not matter, and here is the precise reason. Safety is not an average. Safety is a property that must hold for every case, which a logician would call universally quantified and an engineer would call the worst case. A jet engine that fails one time in a thousand is not a slightly-less-safe jet engine. It is a crash. The aggregate accuracy that the scaling argument cares about is exactly the wrong measurement, because catastrophe lives in the tail, and an average is precisely the number that hides the tail.
My co-founder, Dr. Wisnesky, put it in the plainest English anyone offered all week: it takes one bad apple to spoil the barrel. The phrase is older than computing and it is, underneath, a statement about logic. A single counterexample falsifies a claim that says for all. One unbounded failure is a counterexample to a safety proof. You can scale a model until its average is superhuman and you will not have touched the thing that ends careers and lives, which is the one design in ten thousand that should never have been allowed to leave the building.
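For readers who think in code, the difference between a good average and a universal guarantee fits in a few lines. What follows is a minimal sketch, not anyone's production method: the safety margins are invented numbers, and the names average_margin, all_safe, and counterexamples are hypothetical, chosen only for this illustration.

```python
from statistics import mean

# Invented data: safety margins for ten thousand generated designs.
# A margin below zero means the design fails under its worst-case load.
margins = [2.0] * 9_999 + [-0.5]  # one bad apple in the barrel


def average_margin(ms):
    """The number the scaling argument cares about."""
    return mean(ms)


def all_safe(ms, floor=0.0):
    """The universally quantified claim: every design clears the floor."""
    return all(m >= floor for m in ms)


def counterexamples(ms, floor=0.0):
    """Indices of the designs that falsify the 'for all' claim."""
    return [i for i, m in enumerate(ms) if m < floor]


print(average_margin(margins))   # just under 2.0, which looks excellent
print(all_safe(margins))         # False
print(counterexamples(margins))  # [9999]
```

The average barely moves when the bad design is added. The universal claim flips from true to false on that one entry, and that entry is the one that would have left the building.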
This is also why the auditor cannot be the same kind of system as the engineer. A probabilistic generator cannot certify its own output, for the same reason you cannot proofread your own typo: the process that produced the error is blind to it in exactly the way that caused it. The verifier must be independent of the designer. In aviation it is law. In banking it is law. In pharmaceuticals it is law. It will be law here, and the only question is whether we write that law before or after the first machine-designed component fails in a way no one signed for.
The most serious objection, and the engineers who answered it
The sharpest pushback in the thread came from a former military commander who argued that the post defeated itself. Engineering documentation, he wrote, is made entirely of words: specifications, torque values, failure analyses. If words cannot carry engineering, how does the certification chain, itself made of words, carry it? He thought he was refuting me. He was completing the argument.
The certification chain is not made of words in the sense Bajaj meant. It is made of constrained language: a controlled vocabulary, a fixed structure, every statement carrying an obligation that someone can check and someone must sign. Bajaj’s “words alone” names the other kind, prose that describes a design beautifully and attaches no obligation to anything. The line between those two registers is the entire game, and my critic drew it for me before concluding, wrongly, that I had not.
Then the engineers arrived, and they were better than the objection. Amanda DeSantis, who builds in the physical-AI world, named the real limit with a precision I have not improved on: formal verification proves the design meets the spec; it says nothing about whether the spec was right, and it carries no liability on its own. She is correct, and her correction is the most important sentence in this essay. Proof does not abolish the named human. It relocates what the human must answer for. Instead of signing a hundred-page output that no one reads, the engineer signs the spec, a smaller and far more defensible thing. You do not remove the person from the chain of accountability. You shrink what they have to vouch for down to something they can actually stand behind in a deposition.
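DeSantis's limit can be made concrete with a toy. The sketch below is an illustration under loud assumptions: the controller, the spec, and the "true" limit of 80 are all invented, and an exhaustive check over a small finite domain stands in for a real formal proof.

```python
def design(load: int) -> int:
    """Hypothetical controller: caps the commanded output at 100."""
    return min(load, 100)


def spec(output: int) -> bool:
    """The spec as written and signed: output never exceeds 100."""
    return output <= 100


def true_requirement(output: int) -> bool:
    """What the hardware actually tolerates (assumed for this example): 80."""
    return output <= 80


INPUTS = range(256)  # small finite domain, so every case can be checked

# The design provably meets the spec on every input...
meets_spec = all(spec(design(x)) for x in INPUTS)

# ...and still violates the requirement the spec failed to capture.
meets_requirement = all(true_requirement(design(x)) for x in INPUTS)

print(meets_spec)         # True
print(meets_requirement)  # False
```

The check is airtight and the outcome is still wrong, because the error lives in the three lines of spec, not in the design. Those three lines are what the named engineer signs.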
Pieter van Schalkwyk, who wrote the Digital Twin Consortium’s governance laws for industrial agents, supplied the construction. The verifier, he wrote, is not a second machine watching the first. The decision itself becomes a bounded function whose unsafe states are unreachable by design. That is the flight envelope, the region an aircraft is built so it physically cannot leave, moved from the airframe onto the choice that drives the machine. And he added the sequencing the engineering world has not yet absorbed: in a running plant, the proof has to sit before the act, because an executed action cannot be recalled the way a drawing can be sent back for revision. A generated design still enjoys review time. Prometheus is in the business of deleting that luxury.
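A rough sketch of that construction, for the programmers in the audience. Everything here is hypothetical: Envelope, CheckedAction, OutOfEnvelope, and execute are names invented for the illustration, and the limits are made up. Python can only enforce the boundary when the object is built, at run time; a statically typed language or a proof assistant could make the guarantee stronger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical operating limits for one actuator, e.g. a valve opening in percent."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


class OutOfEnvelope(Exception):
    """Raised when a proposed action falls outside the envelope."""


@dataclass(frozen=True)
class CheckedAction:
    """An action that cannot be constructed unless it lies inside its envelope."""
    value: float
    envelope: Envelope

    def __post_init__(self):
        if not self.envelope.contains(self.value):
            raise OutOfEnvelope(
                f"{self.value} outside [{self.envelope.low}, {self.envelope.high}]"
            )


def execute(action: CheckedAction, log: list) -> None:
    """Stand-in for the irreversible act; it only ever receives checked actions."""
    log.append(action.value)


envelope = Envelope(low=0.0, high=80.0)
log = []

execute(CheckedAction(42.0, envelope), log)      # inside the envelope: runs
try:
    execute(CheckedAction(95.0, envelope), log)  # never reaches execute()
except OutOfEnvelope:
    pass

print(log)  # [42.0]
```

The point of the design is the ordering. The unsafe action fails at construction, before execute is ever called, so there is nothing to recall afterward.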
The precipice
Here is where the metaphor stops being a metaphor. A high-rise is the cleanest case I know. An artificial general engineer can generate a hundred designs for a tower before lunch, a thousand by dinner, each one rendered and plausible and beautiful. Engineering is not the generation of those designs. Engineering is knowing which single one of them stands, accounting for the bedrock it sits on, the way the structure carries its own weight, and the precise force of the wind against the seventy-fifth floor on the worst day of the worst decade. The generator gives you the candidates. Only the proof tells you which candidate you can live inside.
We are about to manufacture candidates at a rate no certification regime on earth was built to absorb. Machine-designed components will arrive in regulated products, in aircraft and reactors and medical devices and the financial plumbing under all of it, years before any regulator has a method for them. The binding constraint on this entire industry will turn out to be not how fast we can design but how fast we can prove, and proof does not run on the hardware that generation runs on. It runs on mathematics, and it runs at the speed of a person willing to put a name on a spec.
Somebody will build the artificial general auditor. The forty founders in my comment thread are the first wave of people who understand that the second column is where the value moved. Most of them are wrong about the details, and several of them named their systems after their pets, but they are right about the shape of the thing. The auditor is coming because the alternative is a world of expensive, unsigned, unprovable designs entering the physical world on the strength of a good average, and that world is a precipice with a beautiful view.
It cannot be probabilistic. The future is formal.
Eric Daimler is CEO and co-founder of Conexus AI, an MIT spinout, and a former Presidential Innovation Fellow for AI and Robotics in the Executive Office of the President. He is writing a book about the verification gap.
