Is AI reliable for due diligence?

Generic large language models are not reliable for due diligence. Independent reporting on LLM performance against complex legal queries has found hallucination rates ranging from roughly 69% to 88%, and studies of financial LLM output report error rates of 8–15%. In private-equity and venture decisioning, where a single misread covenant or inverted unit-economic assumption can underwrite a nine-figure loss, an error rate above roughly 2% is dangerous. A fund that points a ChatGPT wrapper at a cap table inherits the 8–15% error rate. A deterministic compiler that extracts claims and then evaluates them against fixed logic does not. Reliability is not a function of how fluent the answer sounds; it is a function of whether the answer is auditable.

What is Semantic Sycophancy?

Semantic Sycophancy is the foundational flaw inherent to large language models. The network is weighted to prioritize confident, fluent narrative over mathematical truth, so it appeases the reader rather than auditing the claim. When you ask an LLM to assess a deck, it tells you what reads well, not what survives the math. Semantic Sycophancy is the underlying cause of Narrative Masking, the operational state in which presentation polish inflates while structural integrity degrades.

How does askOdin avoid AI hallucination?

askOdin does not use a probabilistic summarizer to form a verdict. The RUNE Protocol is a deterministic compiler: it extracts claims from unstructured deal material, then a Go evaluation engine runs those claims against fixed logic and 40+ forensic dimensions calibrated on a corpus of 100,000+ benchmarked scores built on public deal data. There is no token-prediction step between the claim and the verdict, so there is no surface for hallucination to enter.

What is the difference between persuasion and verification in AI diligence?

Persuasion measures how convincing a narrative reads. Verification measures whether the underlying claims hold against market reality. An LLM optimizes for the former because that is what its loss function rewards. askOdin compiles for the latter. The Clarity Score and the Presentation Score are reported separately precisely so the gap between persuasion and verification — the Delta — becomes the signal you allocate against.
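
The relationship between the two scores can be sketched in a few lines. This is a minimal illustration, not askOdin's scoring code: the `DealScores` type, the sign convention (Presentation minus Clarity), the assumption that both scores share a 0–100 scale, and the 20-point threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DealScores:
    """Two separately reported scores (both assumed to be on a 0-100 scale)."""
    presentation: float  # how persuasive the material reads
    clarity: float       # how well the underlying claims verify

    def delta(self) -> float:
        # Assumed sign convention: positive means polish exceeds substance.
        return self.presentation - self.clarity


def is_masking_candidate(scores: DealScores, threshold: float = 20.0) -> bool:
    # Hypothetical cutoff: a large positive Delta suggests the narrative
    # is outrunning what the claims support.
    return scores.delta() >= threshold


polished_deck = DealScores(presentation=90.0, clarity=55.0)
plain_deck = DealScores(presentation=60.0, clarity=58.0)
print(polished_deck.delta(), is_masking_candidate(polished_deck))
print(plain_deck.delta(), is_masking_candidate(plain_deck))
```

Keeping the two inputs as separate fields, rather than blending them into one number, is what lets the gap itself be read as a signal.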

The AI Hallucination Crisis in VC Due Diligence

Information is no longer the edge. Any analyst can now generate a flawless market map, a comparable-company table, and a confident investment memo before lunch. The supply of plausible-sounding analysis has gone to infinity, and its price has gone to zero.

That is the trap. When the cost of producing a persuasive narrative collapses, the value of verifying one goes up. Information is a commodity. Verification is the premium. And the tool the market is reaching for to close the diligence gap — the general-purpose large language model — is structurally incapable of supplying that premium.

Here is the uncomfortable part. The LLM is not failing at diligence because it is immature, or because the prompt was wrong, or because the next model will fix it. It is failing because it was built to do something else entirely. It was built to persuade.

The flaw is in the loss function, not the prompt

A large language model is a probabilistic engine. It predicts the next token that is most likely given its training data, and preference tuning then rewards the outputs human raters like best. Mathematically, it is rewarded for producing text that sounds right: fluent, confident, well-structured. It is not rewarded for being correct, because correctness is not what its loss function optimizes.

This is the foundational flaw we call Semantic Sycophancy: the neural network is weighted to prioritize confident, fluent narrative over mathematical truth. It appeases the reader rather than auditing the claim. Ask it whether a Series B unit-economic model holds, and it will give you a measured, articulate answer that reads exactly like the answer a competent analyst would give — whether or not the math survives contact with reality.

When Semantic Sycophancy meets a real deal, it produces a specific, dangerous output state: Narrative Masking. This is the operational symptom — the point at which presentation polish inflates while structural integrity quietly degrades. The model does not flag the contradiction in the data room. It smooths it over, because a smooth narrative scores higher than a jagged one. The fatal flaw gets absorbed into a clean paragraph, and the clean paragraph is what lands in the IC memo.

The doctrine here is plain: LLMs optimize for persuasion. askOdin compiles for physics.

The hallucination tax, in numbers

This is not a stylistic objection. It is a measured failure rate, and the numbers are worse than most allocators assume.

Independent reporting on LLM performance against complex legal queries — the kind of dense, contingent, cross-referenced reasoning that resembles diligence — has documented hallucination rates ranging from roughly 69% to 88%. These are not edge cases. On hard questions, fabrication is the base rate, not the exception.

Narrow to the financial domain and the rate falls, but not far enough. Studies of LLM performance on complex financial queries report error rates of 8–15%. That sounds tolerable until you do the math on what it means in a portfolio.

Here is the math. In private-equity and venture decisioning, the cost function is asymmetric and brutal. A single inverted assumption in a unit-economic model, a single misread liquidation preference, a single hallucinated covenant — any one of these can underwrite a nine-figure loss or, worse, the omission of a paradigm-shift deal that would have returned the fund. At that asymmetry, an error rate above roughly 2% is not a quality issue. It is an existential one.
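
To make the asymmetry concrete, here is a rough back-of-the-envelope sketch. It assumes every claim in a deal is evaluated independently with the same per-claim error rate, and it uses a hypothetical count of 20 material claims per deal. Neither assumption comes from the studies referenced above.

```python
def prob_at_least_one_error(per_claim_error_rate: float, num_claims: int) -> float:
    """Chance that at least one of num_claims independent claims is wrong."""
    return 1.0 - (1.0 - per_claim_error_rate) ** num_claims


# Hypothetical deal with 20 material claims, evaluated at the error rates
# discussed in the text (the ~2% threshold and the 8-15% reported range).
for rate in (0.02, 0.08, 0.15):
    p = prob_at_least_one_error(rate, num_claims=20)
    print(f"per-claim error {rate:.0%}: P(at least one error) = {p:.1%}")
```

Under these assumptions, the chance that a deal contains at least one wrong claim is roughly 81% at an 8% per-claim error rate and roughly 96% at 15%.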

So consider the danger of pointing a generic ChatGPT wrapper at a cap table or a unit-economics model. You are not deploying a diligence tool. You are inheriting an 8–15% error rate — on a good day — into the single most consequential decision your fund makes, and you are doing it through an interface that is engineered to make the error sound authoritative. That is not a feature with rough edges. It is a physics problem, and you cannot prompt your way out of physics.

The cure is architectural, not incremental

You do not fix a probabilistic engine by asking it to try harder. You fix the problem by removing the probabilistic step from the place where the verdict is formed.

That is the design principle behind askOdin’s judgment infrastructure. The verdict is not generated by a model that predicts the most satisfying answer. It is compiled.

The RUNE Protocol™ is the first stage: a deterministic compiler. It does not summarize. It extracts — stripping narrative polish and persuasive rhetoric out of unstructured deal material and translating the underlying assertions into strict, structured claims. Once a claim is extracted, a Go evaluation engine runs it against fixed logic and 40+ forensic dimensions, calibrated against a corpus of 100,000+ benchmarked scores built on public deal data and mapped to 7 structural archetypes of venture failure. There is no token-prediction step between the claim and the verdict. With no probabilistic surface in the evaluation path, there is no place for a hallucination to enter.
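
The extract-then-evaluate flow can be sketched as follows. This is an illustrative sketch written in Python for brevity, not the RUNE Protocol or its Go engine: the `Claim`, `Rule`, and `Finding` types, the metric names, the rule IDs, and the two sample rules (including the 3.0 LTV-to-CAC threshold) are hypothetical. What it shows is that each finding comes from a fixed rule applied to a structured claim, so the same input always yields the same output, and every finding carries its source document and rule ID.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Claim:
    metric: str      # hypothetical metric name, e.g. "ltv_to_cac"
    value: float
    source_doc: str  # document the claim was extracted from


@dataclass(frozen=True)
class Rule:
    rule_id: str
    metric: str
    check: Callable[[float], bool]  # pure function: no randomness, no model call


@dataclass(frozen=True)
class Finding:
    rule_id: str
    source_doc: str
    metric: str
    value: float
    passed: bool


# Hypothetical fixed rules; the real forensic dimensions are not public.
RULES = (
    Rule("UE-001", "ltv_to_cac", lambda v: v >= 3.0),
    Rule("CH-001", "monthly_churn", lambda v: 0.0 <= v <= 1.0),
)


def evaluate(claims: Sequence[Claim], rules: Sequence[Rule] = RULES) -> List[Finding]:
    """Apply every matching rule to every claim, in a stable order."""
    findings = []
    for claim in claims:
        for rule in rules:
            if rule.metric == claim.metric:
                findings.append(Finding(rule.rule_id, claim.source_doc,
                                        claim.metric, claim.value,
                                        rule.check(claim.value)))
    return findings


claims = [
    Claim("ltv_to_cac", 1.4, "financial_model.xlsx"),  # fails the sample rule
    Claim("monthly_churn", 0.03, "cohort_export.csv"),
]
for finding in evaluate(claims):
    print(finding)
```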

The second stage is where most tools quietly cheat. When you give a multi-agent system a heterogeneous data room (the deck, the financial model, the data-room exports, the founder’s prior statements), the convenient thing to do is reconcile the contradictions into one tidy story. That is Narrative Masking automated at scale. It is the worst possible behavior for diligence, because the contradiction was the signal.

The RAVEN Protocol™ does the opposite. It performs cross-document, adversarial triangulation across heterogeneous data rooms, and it is built to preserve contradictions rather than reconcile them away. When the deck claims one churn number and the cohort export implies another, RAVEN surfaces the delta and holds it open for the investment committee to adjudicate. It does not vote on which document is right. It refuses to let the disagreement disappear.
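
The idea of preserving a contradiction rather than reconciling it can be illustrated generically. The mechanics of RAVEN are not publicly disclosed, so this is not its method: the `Claim` type, the document names, and the numbers are hypothetical. The sketch groups claims by metric and, where sources disagree, returns every competing value with its source instead of picking a winner or averaging.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Claim:
    metric: str
    value: float
    source_doc: str


def surface_contradictions(claims: List[Claim]) -> Dict[str, List[Claim]]:
    """Return metrics whose sources disagree, keeping every competing claim."""
    by_metric: Dict[str, List[Claim]] = defaultdict(list)
    for claim in claims:
        by_metric[claim.metric].append(claim)
    # No averaging and no tie-breaking: disagreement is reported as-is.
    # Values are compared exactly here for simplicity; real data would
    # likely need a tolerance.
    return {
        metric: group
        for metric, group in by_metric.items()
        if len({c.value for c in group}) > 1
    }


claims = [
    Claim("monthly_churn", 0.02, "deck.pdf"),
    Claim("monthly_churn", 0.05, "cohort_export.csv"),
    Claim("gross_margin", 0.71, "deck.pdf"),
    Claim("gross_margin", 0.71, "financial_model.xlsx"),
]
open_items = surface_contradictions(claims)
for metric, group in open_items.items():
    print(metric, [(c.source_doc, c.value) for c in group])
```

The return value deliberately contains no "resolved" field. Deciding which document is right is left to whoever reads the output.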

The architectural mechanics of RAVEN’s triangulation engine are the subject of U.S. Provisional Patent Application No. 63/994,876 and are not publicly disclosed.

Sitting above both is the JUDGE Protocol™ — the runtime circuit breaker. When an extracted claim violates basic business math or a model assumption crosses a terminal threshold, JUDGE intercepts the output before it can be laundered into a confident narrative. It is the mechanism that converts a detected impossibility into an explicit verdict instead of a smoothed-over sentence.
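
A circuit breaker of this kind can be sketched as a guard that runs before any narrative is produced. This is a hypothetical illustration, not the JUDGE Protocol: the invariant list, the `Verdict` type, and the `write_memo` stand-in are assumptions. If any hard invariant fails, the downstream step is never called and the violation itself becomes the output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical hard invariants: (description, check over the claim values).
Invariant = Tuple[str, Callable[[Dict[str, float]], bool]]

INVARIANTS: List[Invariant] = [
    ("gross margin cannot exceed 100%",
     lambda c: c.get("gross_margin", 0.0) <= 1.0),
    ("customer count cannot be negative",
     lambda c: c.get("customers", 0.0) >= 0.0),
]


@dataclass(frozen=True)
class Verdict:
    halted: bool
    reasons: Tuple[str, ...]
    memo: str = ""


def run_with_breaker(claims: Dict[str, float],
                     downstream: Callable[[Dict[str, float]], str]) -> Verdict:
    violations = tuple(desc for desc, check in INVARIANTS if not check(claims))
    if violations:
        # Trip the breaker: downstream is never called on impossible inputs.
        return Verdict(halted=True, reasons=violations)
    return Verdict(halted=False, reasons=(), memo=downstream(claims))


def write_memo(claims: Dict[str, float]) -> str:
    # Stand-in for whatever step would turn claims into prose.
    return f"Memo covering {len(claims)} claims"


print(run_with_breaker({"gross_margin": 1.12, "customers": 40.0}, write_memo))
print(run_with_breaker({"gross_margin": 0.71, "customers": 40.0}, write_memo))
```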

U.S. Prov. Patent No. 64/017,488 | IPOS §34 National Security Clearance (Issued 2026-03-26).

Why this matters to the allocator

The market is being sold a comforting story: that the same class of model which writes your emails can also underwrite your deals, if you just wrap it in the right interface. The error rates say otherwise. A persuasion engine pointed at a verification problem does not become a verification engine. It becomes a faster, more confident way to be wrong.

The premium is not in generating another memo. It is in producing a verdict you can audit — one where every claim traces back to a source document and a deterministic rule, and where a contradiction is held open rather than dissolved. That is the difference between an answer that sounds right and an answer you can defend in front of your LPs three years later.

Venture capital is the last unaudited asset class. askOdin provides the infrastructure to close the gap.

Related Reading

The Clarity Score: how structural integrity is measured, covering the 0–100 forensic metric and what the Delta against the Presentation Score reveals.
Deterministic vs. Probabilistic AI: why the architecture of the engine, not the size of the model, determines whether output is auditable.
Terminal Audit: Theranos, a forensic walkthrough of what a persuasion-optimized narrative looks like when run against deterministic logic.

A Dialogue on Institutional Judgment

The Judgment Gap is an existential threat to funds facing the mathematical crisis of scaling capital and deal flow. In the AI era, running on artisanal, unscalable judgment processes is no longer a viable strategy. We are building the infrastructure to solve this.

If you are a partner or principal at a growing venture capital fund and are committed to building a more scalable, defensible, and rigorous investment process, we invite you to a confidential discussion.

