Databases in the Age of AI

Chapter I

The Reader Changes

For fifty years, every database ever built shared one design assumption so universal that nobody bothered to write it down: the queries come from people. A person fills a screen, an application translates it to SQL, the database answers, a person reads the result. Every layer of the stack — the connection pool sized for hundreds of clients, the optimizer tuned for a known workload, the BI dashboard polled each Monday morning — encodes the rhythm of human attention. Humans query twice a day. The systems were built accordingly.

That assumption just expired. In May 2025, when Databricks paid roughly a billion dollars for Neon, a serverless Postgres startup, the acquisition came with a statistic that deserves to be remembered as the moment the era turned: more than 80 percent of the databases provisioned on Neon were created by AI agents, not by humans. Not queried by agents — created by them, as casually as a function allocates a variable.

CNBC, May 14 2025: Databricks acquires Neon for ~$1B; Ali Ghodsi cites the 80% agent-provisioned figure. cnbc.com

Just over a year later, Anthropic published a blog post titled “How Anthropic enables self-service data analytics with Claude,” reporting that 95 percent of business analytics queries inside the company are now answered by an agent rather than a human analyst. A widely shared LinkedIn-style reading of that post concluded, with the confidence native to the genre, that this is why Databricks will “absolutely explode” over the next decade: agents need huge, analytic-ready, well-defined tables, Spark makes those tables cheaply, Unity Catalog governs their definitions, therefore Databricks is where the agents will get their data.

Anthropic, June 3 2026. claude.com/blog

This essay takes that argument seriously — more seriously, in fact, than the post takes itself. Because hidden inside the cheerful vendor syllogism is a genuinely deep question, the kind Alex Petrov spent three hundred pages of Database Internals teaching us how to ask: when the client changes, which parts of the database are physics, and which parts are merely habit?

Petrov’s book — still the best single map of how data systems actually work — has a quiet thesis running beneath the B-trees and the Paxos proofs: there is no perfect database, only frozen arguments about trade-offs. Every storage engine is a bet about who will read, who will write, how often, and how patiently. For half a century, all of those bets priced in a human on the other end of the wire. The agentic era doesn’t end the argument. It changes who’s arguing — from clients that query twice a day to clients that query twice a second, spawn a thousand siblings, and never get tired of checking their own work.

Every storage engine is a frozen argument about trade-offs. AI doesn’t end the argument — it changes who’s arguing.

What follows is a tour of the stack in the order Petrov would take it: first the evidence from the one company that has actually run analytics agents at scale, then the storage physics underneath, then the new agent-native database species, then the war over meaning itself — and finally an honest accounting of who captures the value, which is where the LinkedIn syllogism gets interesting, and wrong.

Chapter II

One Correct Answer

The Anthropic post is the most important document yet published on analytics agents, not because of what it promises but because of what it measures. The headline — 95 percent of business analytics queries automated, at roughly 95 percent aggregate accuracy — is less interesting than the ablations underneath it.

21%

accuracy of the raw agent on Anthropic’s internal evals — frontier model, full data access, no curated context

95%+

accuracy with “skills”: curated markdown encoding definitions and analyst procedure

→65%

where accuracy decayed after one month without skill maintenance — context rots at the speed of schema change

+6%

accuracy from an adversarial-review subagent, bought with 32% more tokens and 72% more latency

Read those numbers again. The model was never the bottleneck. A frontier model with full access to the warehouse scored 21 percent. The same model, handed a few dozen curated reference files and a runbook describing how a senior analyst actually works, scored above 95 — and in some domains 99. The gap between a useless analytics agent and a production one is not intelligence. It is context engineering over governed definitions — which is to say, it is data modeling, the least glamorous discipline in software, suddenly load-bearing for the most glamorous one.

The post’s central diagnosis deserves quoting verbatim, because it is the thesis of the entire agentic-analytics era:

“Coding is an open-ended solution space that rewards the models’ creativity… In contrast, for analytics use cases, there’s often only a single correct answer using a single correct source, in which there’s no deterministic way of proving the correctness.”

“By far the most common failure is that the agent can’t map a concept (‘revenue for product X’) to the single correct table, column, and metric definition, usually because there are multiple plausible candidates with subtly different implementations… If revenue resolves to one governed dataset instead of forty plausible candidates, the problem largely disappears before the agent ever has to search.”
— Anthropic, “How Anthropic enables self-service data analytics with Claude,” June 2026

Wrong code fails a test. A wrong metric gets pasted into a board deck and silently corrodes a decision. This asymmetry is why the post insists that “the most important aspect of ensuring analytics agents are accurate is via strong data foundations” — dimensional models, transforms, tests, metadata — and why it notes, against the fashionable view that compute solves everything, that “physical rollups and caches still matter for cost and performance,” so long as they derive mechanically from canonical models rather than competing with them.

Trust hierarchy in the post: semantic layer › lineage › query corpus › business context. The semantic layer is “the mandatory default path for every data question.”

Two of the ablations are stranger and more instructive. First: giving the agent grep access to thousands of historical SQL queries — a corpus that contained correct answers for roughly 80 percent of the questions it got wrong — moved accuracy by less than one point. The answers existed; retrieval over an ungoverned pile couldn’t surface them. Search is not a substitute for curation. Second: Anthropic tried having the model bootstrap the semantic layer itself from raw tables and query logs. The result “produced plausible-looking definitions that encoded the very ambiguities we were trying to eliminate.” The machine, asked to invent meaning, faithfully reproduced the organization’s confusion. Their conclusion: let Claude draft the documentation, but a human owns the definition.

Independent evidence agrees with embarrassing consistency. The Spider 2.0 benchmark — 632 real enterprise text-to-SQL problems over thousand-column BigQuery and Snowflake schemas — launched with frontier models solving 17 percent, against 86.6 on the toy academic predecessor. Cube’s paired benchmark found that adding a 4-kilobyte semantic-layer document — one afternoon of analyst work — improved three different frontier models by 17 to 23 points, and that “the presence or absence of the semantic-layer document accounts for essentially all of the significant variance… model choice does not.” dbt Labs measured the same shape and supplied the best one-line summary in the literature: “With text-to-SQL, failure looks like a plausible but incorrect answer. With the semantic layer, failure looks like an error message.”

Spider 2.0: arXiv 2411.07763 (ICLR ’25 oral). Cube paired benchmark: cube.dev. dbt 2026 benchmark: getdbt.com.
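The dbt contrast between a plausible wrong answer and an error message can be made concrete. The following is a minimal sketch, not the API of dbt, Cube, or any real semantic layer: the `GOVERNED_METRICS` registry, the table and column names, and `compile_metric` are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metric:
    name: str
    table: str        # the single governed source table (hypothetical name)
    expression: str   # the single governed aggregation (hypothetical)
    owner: str        # the human team that owns the definition


# Hypothetical registry: exactly one governed definition per business concept.
GOVERNED_METRICS = {
    "revenue": Metric("revenue", "finance.recognized_revenue",
                      "SUM(amount_usd)", "finance"),
}


class UnknownMetricError(LookupError):
    """Raised instead of guessing among plausible candidate tables."""


def compile_metric(name: str, group_by: Optional[str] = None) -> str:
    """Resolve a concept to its governed definition, or fail loudly."""
    metric = GOVERNED_METRICS.get(name.strip().lower())
    if metric is None:
        raise UnknownMetricError(f"no governed definition for {name!r}")
    select = f"{metric.expression} AS {metric.name}"
    if group_by:
        return (f"SELECT {group_by}, {select} FROM {metric.table} "
                f"GROUP BY {group_by}")
    return f"SELECT {select} FROM {metric.table}"


print(compile_metric("Revenue", group_by="region"))
```

The design choice worth noticing is the exception: an ungoverned concept such as "bookings" never reaches the warehouse, so the failure mode is visible to the agent and to the human reading its trace.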

So the LinkedIn post’s first premise survives scrutiny, strengthened: agents do need governed, analytic-ready, well-defined tables, and the companies that have them get order-of-magnitude better agents than the companies that don’t. Hold that thought. The question of who profits from it is four chapters away, and the answer is not a syllogism.


Chapter III

The Physics Doesn’t Care

Descend now from semantics to storage, into Petrov country. Database Internals is organized as a long negotiation with the memory hierarchy: RAM at a hundred nanoseconds, NVMe at a hundred microseconds, object storage at a hundred milliseconds — six orders of magnitude between the top and the bottom, every structure in the book a scheme for avoiding random I/O on slow media. B-trees exist because binary trees have catastrophically low fanout on disk. LSM-trees exist because sequential writes are cheap and random ones are not. The RUM conjecture — you may optimize read, update, and memory overheads, pick two — governs everything like a conservation law.

Petrov, Database Internals (O’Reilly, 2019). RUM conjecture: Athanassoulis et al., EDBT 2016.
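The fanout argument is easy to check with arithmetic. The sketch below is a simplified model, assuming every level of the tree costs one random page read and that a B-tree page holds 256 keys; both the key count and the fanout of 256 are illustrative assumptions, not figures from Petrov.

```python
def tree_height(num_keys: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= num_keys."""
    if num_keys < 1 or fanout < 2:
        raise ValueError("need num_keys >= 1 and fanout >= 2")
    height, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return height


KEYS = 1_000_000_000                            # assumed dataset size
binary_levels = tree_height(KEYS, fanout=2)     # two children per node
btree_levels = tree_height(KEYS, fanout=256)    # assumed keys per page

print(binary_levels, btree_levels)  # 30 4
```

Thirty dependent random reads against four is the whole case for wide nodes on slow media: the deeper the tree, the more times a lookup pays the latency of the tier it lives on.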

None of that is repealed by intelligence. An agent issuing ten thousand writes per second through Raft pays the same quorum round-trip a human application pays. S3 is exactly as far from RAM no matter how eloquent the client. What the AI era does instead is more interesting: it adds new access methods to Petrov’s taxonomy, promotes old footnotes to headlines, and shifts the workload’s center of gravity.

The new access method. Petrov’s index chapters cover B-trees, LSM-trees, and hashing. The AI era adds a fourth first-class citizen: the approximate nearest-neighbor index — HNSW’s navigable small-world graphs, IVF with product quantization, and DiskANN, whose entire contribution is a deeply Petrovian move: re-imposing disk-friendly layout on a graph structure so a billion vectors fit on one node’s SSD. Vector indexes obey the old trade-off triangle conspicuously — HNSW is read-fast and memory-gluttonous, IVF trades recall for space — but they add something genuinely new: a recall axis. For the first time, correctness itself became a tunable parameter of the access method. And in production, vector indexes are converging on LSM-like segment-and-merge architectures with tombstoned deletes. It is Chapter 7 of Petrov, with cosine distance.

HNSW: Malkov & Yashunin, 2016. DiskANN: Subramanya et al., NeurIPS 2019. Now native in SQL Server 2025, Oracle 23ai, MongoDB Atlas, Elastic, pgvector.
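The recall axis can be seen in a toy inverted-file (IVF) index. This is a sketch under loud assumptions: the centroids are a random sample rather than trained k-means, there is no product quantization, and the data is synthetic Gaussian noise. It is not how pgvector, DiskANN, or any production engine is implemented; it only shows that `nprobe` buys recall with scan cost.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query, vectors, k):
    """Ground truth: brute-force scan over every vector id."""
    order = sorted(range(len(vectors)), key=lambda i: dist2(query, vectors[i]))
    return order[:k]


def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: dist2(v, centroids[c]))
        lists[nearest].append(i)
    return lists


def ivf_top_k(query, vectors, centroids, lists, k, nprobe):
    """Approximate search: scan only the nprobe closest inverted lists."""
    by_distance = sorted(range(len(centroids)),
                         key=lambda c: dist2(query, centroids[c]))
    candidates = [i for c in by_distance[:nprobe] for i in lists[c]]
    return sorted(candidates, key=lambda i: dist2(query, vectors[i]))[:k]


def recall_at_k(approx, exact):
    """Fraction of the true top-k that the approximate search returned."""
    return len(set(approx) & set(exact)) / len(exact)


rng = random.Random(7)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(2000)]
centroids = rng.sample(vectors, 16)    # crude stand-in for k-means training
lists = build_ivf(vectors, centroids)
query = [rng.gauss(0, 1) for _ in range(8)]
truth = exact_top_k(query, vectors, k=10)

recalls = {}
for nprobe in (1, 4, 16):
    found = ivf_top_k(query, vectors, centroids, lists, k=10, nprobe=nprobe)
    recalls[nprobe] = recall_at_k(found, truth)
    print(nprobe, recalls[nprobe])
```

Probing all 16 lists degenerates to the brute-force scan, so recall reaches 1.0 there; every smaller `nprobe` is a deliberate decision to be wrong some of the time in exchange for reading less.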

The confession. Learned indexes — Kraska, Beutel, Dean and Polyzotis’s 2018 “The Case for Learned Index Structures,” refined by ALEX and the PGM-index — replace B-tree internals with a model of the key distribution’s CDF. Their practical wins remain narrow: read-mostly, in-memory, sorted numeric data. But their conceptual contribution is permanent. A B-tree was always an implicit statistical model of your data; learned indexes merely make it confess. The deeper agents reach into databases, the more the index, the optimizer, and the cache reveal themselves as what they always were — machine learning with worse marketing.
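What "a model of the key distribution's CDF" means in practice can be sketched in a few lines. The class below is a deliberately minimal, single-segment illustration: one least-squares line plus a worst-case error bound. ALEX and the PGM-index use many segments and support updates, and the class name `LinearLearnedIndex` is an invention for this example.

```python
import bisect
import random


class LinearLearnedIndex:
    """One linear model of the key CDF plus a worst-case error bound."""

    def __init__(self, sorted_keys):
        self.keys = sorted_keys          # must be sorted and distinct
        n = len(sorted_keys)
        mean_k = sum(sorted_keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in sorted_keys)
        cov = sum((k - mean_k) * (p - mean_p)
                  for p, k in enumerate(sorted_keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Largest gap between predicted and true position, found at build time.
        self.max_err = max(abs(self._predict(k) - p)
                           for p, k in enumerate(sorted_keys))

    def _predict(self, key):
        return round(self.slope * key + self.intercept)

    def lookup(self, key):
        """Return the position of key, or None if it is absent."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        # The model narrows the search; binary search finishes it.
        pos = bisect.bisect_left(self.keys, key, lo, hi)
        if pos < n and self.keys[pos] == key:
            return pos
        return None


rng = random.Random(1)
keys = sorted(rng.sample(range(1_000_000), 10_000))
index = LinearLearnedIndex(keys)
print(index.max_err, index.lookup(keys[1234]))
```

The error bound is what makes the structure an index rather than a guess: the model may be wrong, but never by more than `max_err` positions, so the final search window is small and the answer is exact.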

The buffer pool ate the database. The largest structural shift since Petrov’s book went to press is disaggregation. Aurora’s slogan — the log is the database — became Neon’s architecture: compute ships WAL to a Paxos quorum of safekeepers while pageservers materialize pages onto S3; the “database server” dissolves into a stateless cache hierarchy over an object store. WarpStream did the same to Kafka. In Petrov’s terms, Chapter 5’s humble page cache, scaled to a datacenter, became the storage engine — and replication-for-durability quietly delegated itself to S3’s internal redundancy, with consensus shrinking to a small metadata plane. The WAL used to be the database’s private diary. Aurora and Neon published it.
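"The log is the database" has a literal reading: a page is whatever you get by replaying the log up to some position. The toy below illustrates that idea only. The record layout, the 16-byte page size, and the `materialize` function are assumptions made for readability; real pageservers keep layered images and deltas rather than replaying from zero, and nothing here reflects Neon's or Aurora's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalRecord:
    lsn: int        # log sequence number, strictly increasing
    page_id: int
    offset: int     # byte offset inside the page
    data: bytes     # bytes to write at that offset


PAGE_SIZE = 16  # tiny pages keep the example readable


def materialize(wal, page_id, at_lsn):
    """Rebuild one page by replaying every record for it up to at_lsn."""
    page = bytearray(PAGE_SIZE)
    for rec in wal:
        if rec.lsn > at_lsn:
            break                      # the log is ordered by LSN
        if rec.page_id == page_id:
            page[rec.offset:rec.offset + len(rec.data)] = rec.data
    return bytes(page)


wal = [
    WalRecord(1, page_id=0, offset=0, data=b"alice"),
    WalRecord(2, page_id=1, offset=0, data=b"bob"),
    WalRecord(3, page_id=0, offset=0, data=b"carol"),
]

print(materialize(wal, page_id=0, at_lsn=2))
print(materialize(wal, page_id=0, at_lsn=3))
```

Because `at_lsn` is just an argument, any historical version of a page is recoverable from the same log, which is the property that makes point-in-time reads cheap to offer.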

Agents change the demand curve, not the physics. A quorum still costs a round trip, no matter how eloquent the client.

And the workload inverts. When the client is an agent, the query stream changes shape in ways every layer of the engine feels. Agents fan out speculatively — bursts of schema introspection, sampling, trial aggregations, most results discarded — which makes information_schema a hot path and scale-to-zero compute an architectural requirement rather than a billing gimmick. Agents re-ask semantically identical questions across thousands of sessions, which promotes result caches, query fingerprinting, and incrementally maintained materialized views from optimizations to core product. Agents tolerate staleness when exploring and demand freshness when concluding, which turns Petrov’s consistency spectrum — linearizable, causal, session, eventual — from an architect’s dilemma into a price menu the client shops from per query. And agents think slowly between statements: a pessimistic lock held across an LLM’s multi-second pause is poison, so optimistic MVCC validation wins by default.
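Query fingerprinting, at its simplest, means hashing the shape of a statement with its literals removed. The sketch below uses two regular expressions and is an assumption-laden approximation: it ignores comments, quoted identifiers, and dialect quirks that a real parser-based normalizer would handle, and the example queries are invented.

```python
import hashlib
import re
from collections import Counter

_STRING = re.compile(r"'(?:[^']|'')*'")      # single-quoted SQL strings
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")   # standalone numeric literals
_SPACE = re.compile(r"\s+")


def fingerprint(sql: str) -> str:
    """Hash a query's shape: literals become '?', case and spacing vanish."""
    shape = _STRING.sub("?", sql)
    shape = _NUMBER.sub("?", shape)
    shape = _SPACE.sub(" ", shape).strip().lower()
    return hashlib.sha256(shape.encode("utf-8")).hexdigest()[:16]


# Invented stream of agent queries: three sessions, two distinct shapes.
stream = [
    "SELECT region, SUM(amount) FROM orders WHERE day = '2026-06-01' GROUP BY region",
    "select region, sum(amount)  from orders\nwhere day = '2026-06-02' group by region",
    "SELECT COUNT(*) FROM users WHERE id = 42",
]
hot_shapes = Counter(fingerprint(q) for q in stream)
print(hot_shapes.most_common(1))
```

A fingerprint groups queries, it does not make them interchangeable: two statements with the same shape and different literals return different rows, so the fingerprint is a signal for which materialized view to build, not a safe result-cache key on its own.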

The physics holds. The demand curve flips. What the demand curve now wants most of all is the subject of the next chapter: it wants the database to behave like git.

Chapter IV

Git for Data

An agent’s dream database looks like git: branch instantly, write freely, diff the result, merge or discard without ceremony. Which is to say — copy-on-write, one of the oldest tricks in Petrov’s Chapter 6, where it lived for decades as an implementation detail of LMDB’s B-trees. The agentic era’s most consistent product pattern is the promotion of that page-level trick to the product’s entire surface.

Neon branches a Postgres database at a WAL position in milliseconds, and markets checkpoints and instant rollback explicitly “for agents” — a fork per experiment, validated, then merged or thrown away. Databricks bought it and productized it as Lakebase, governed under Unity Catalog and reportedly growing twice as fast as the company’s warehouse product. Tiger Data answered with “Agentic Postgres” and zero-copy forks, then — on June 9, 2026, the day before this essay — launched Ghost, “a database service designed and built specifically for AI agents”: unlimited databases, hard spend caps, native MCP. AgentDB lets an agent create a database by generating a UUID, on the thesis that “agents create 1000× more databases than humans.” Turso rewrote SQLite in Rust so that a database is a file, not a process — no cold start, a database per agent per tenant. Supabase, where AI builders auto-provision a backend per workspace, reports that over 60 percent of new databases are spun up by AI coding tools, and raised at $10.5 billion on the strength of it.

Lakebase: databricks.com. Tiger Ghost: June 9 2026. AgentDB: agentdb.dev. Supabase $10.5B round: June 4 2026, prnewswire.com.
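Copy-on-write branching is simple enough to sketch in full. The `Branch` class here is hypothetical and far cruder than any product named above: it copies a table of page pointers on fork, where a WAL-position branch avoids even that. What it does show is the essential property, that a fork shares every page with its parent until one of them writes.

```python
class Branch:
    """A branch is a page table; pages are immutable and shared until written."""

    def __init__(self, pages=None):
        self._pages = dict(pages or {})   # page_id -> immutable bytes

    def fork(self):
        """Copy the pointer table only; no page data is duplicated."""
        return Branch(self._pages)

    def read(self, page_id):
        return self._pages[page_id]

    def write(self, page_id, offset, data):
        """Copy only the touched page, then repoint this branch at the copy."""
        page = bytearray(self._pages[page_id])
        page[offset:offset + len(data)] = data
        self._pages[page_id] = bytes(page)

    def diff(self, other):
        """Page ids whose contents differ between two branches."""
        ids = set(self._pages) | set(other._pages)
        return sorted(i for i in ids
                      if self._pages.get(i) != other._pages.get(i))


main = Branch({0: b"balance=100", 1: b"owner=alice"})
experiment = main.fork()                 # an agent's scratch copy
experiment.write(0, 8, b"250")           # touches page 0 only

print(main.read(0), experiment.read(0), main.diff(experiment))
```

Discarding the experiment is dropping a reference; merging it is applying the pages named by `diff`. That is the whole git analogy, expressed at page granularity.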

80%

of Neon databases provisioned by agents at acquisition — the database as disposable artifact, not infrastructure

60%+

of new Supabase databases created by AI coding tools, Claude Code the largest single contributor

1000×

AgentDB’s thesis: how many more databases agents create than humans (directional, vendor-claimed — and directionally believable)

The interface changed with the lifecycle. The agent’s port of entry into a database is no longer a driver or a BI tool but MCP — the Model Context Protocol — and every serious vendor now ships an official server: Snowflake exposing Cortex Analyst as tools, Supabase splitting admin from data planes, Tiger shipping “master prompts” that amount to a packaged DBA in a system message. The cautionary tale arrived on schedule: Anthropic’s own reference Postgres MCP server was deprecated after a SQL-injection flaw bypassed its read-only restrictions. The emerging consensus — read-only credentials for production, session sandboxes, and branch-per-experiment as the safe write path — is the transaction-isolation debate of the 1980s replayed at the agent layer.

Datadog Security Labs on the Postgres MCP injection: securitylabs.datadoghq.com.

Meanwhile a genuinely new workload arrived: the agent’s own memory. Zep’s Graphiti stores knowledge as a bi-temporal graph — every edge carries the time it became true and the time it stopped being true; contradictions invalidate rather than delete. Mem0 runs an extraction-and-consolidation pipeline over a vector store and claims 90-percent token savings against stuffing full history into context. Strip the branding and the requirements are a database researcher’s wish list: hybrid retrieval (vector plus BM25 plus graph traversal) in one engine, temporal validity intervals, upsert-with-invalidate semantics, namespace-per-agent isolation. Postgres — pgvector, full-text, JSONB — and the graph databases are both claiming the workload, and the safest prediction in this essay is that “memory store” follows “vector database”: a feature, not a category.
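Upsert-with-invalidate is the least familiar item on that wish list, so here is a minimal sketch of it. It models only the validity interval described above (when a fact became true, when it stopped); a full bi-temporal store would also record when each fact was ingested. `TemporalGraph`, its method names, and the integer timestamps are assumptions for illustration and are not Graphiti's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Edge:
    subject: str
    relation: str
    obj: str
    valid_from: int                  # when the fact became true
    valid_to: Optional[int] = None   # when it stopped being true; None = current


class TemporalGraph:
    """Assumes each (subject, relation) holds one value at a time."""

    def __init__(self):
        self.edges = []

    def assert_fact(self, subject, relation, obj, at):
        """Close the contradicted edge instead of deleting it, then append."""
        for edge in self.edges:
            if (edge.subject, edge.relation) != (subject, relation):
                continue
            if edge.valid_to is None:
                if edge.obj == obj:
                    return               # already believed; nothing to do
                edge.valid_to = at       # invalidate, never delete
        self.edges.append(Edge(subject, relation, obj, valid_from=at))

    def as_of(self, subject, relation, at):
        """What was true for (subject, relation) at time `at`?"""
        for edge in self.edges:
            if (edge.subject, edge.relation) != (subject, relation):
                continue
            end = edge.valid_to if edge.valid_to is not None else float("inf")
            if edge.valid_from <= at < end:
                return edge.obj
        return None


memory = TemporalGraph()
memory.assert_fact("acme", "plan", "free", at=10)
memory.assert_fact("acme", "plan", "enterprise", at=20)

print(memory.as_of("acme", "plan", 15), memory.as_of("acme", "plan", 25))
```

Keeping the superseded edge is what lets an agent answer "what did we believe last quarter" without a separate audit log.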

One more pattern, easy to miss and historically resonant. OtterTune — Andy Pavlo’s CMU spinout that put machine learning inside the database to tune its knobs — shut down in 2024. The capability that survived is its mirror image: the LLM-DBA sitting outside the database, reaching in through MCP. AI inside the engine lost to the engine inside the AI’s toolbelt. There is a lesson there about where intelligence wants to live in the stack, and it foreshadows the value-capture argument of Chapter VI.

An agent’s dream database looks like git — which is to say, copy-on-write: the oldest trick in Petrov’s Chapter 6, promoted to the whole product.

Chapter V

The Meaning Moat

Now the war the LinkedIn post thinks Databricks has already won. Its second premise: Unity Catalog shipped first and manages “all the definitions needed to manage analytics and agents at scale” better than anyone. The premise is half-true, and the half that’s true is the half that’s being commoditized fastest.

What’s true: Databricks saw earliest that the catalog — not the table format, not the engine — is where agentic gravity concentrates, and it has executed ferociously. Unity Catalog was open-sourced in June 2024 with the hyperscalers as launch partners. Metric Views (“define once, trust everywhere”) put governed measures and dimensions in the catalog itself, queryable from SQL, Genie, and external BI. The Tabular acquisition bought the creators of Iceberg; the Neon acquisition bought the agent-native OLTP layer; the company crossed a $5.4 billion run-rate growing 65 percent, was valued at $134 billion in December 2025, and by mid-2026 was reportedly raising at $165–175 billion. As execution, it is close to flawless.

Run-rate and valuation: Databricks press releases; CNBC Dec 16 2025; The Information/Reuters on the 2026 round (talks, not closed). Snowflake, for scale: ~$4.3B product revenue FY2026, +25%.

What’s also true: everyone else read the same memo. Snowflake donated Polaris to Apache, where it graduated to top-level in February 2026 and became the neutral Iceberg REST standard; Horizon now reads and writes Snowflake-managed Iceberg from external engines. dbt and Fivetran merged into a ~$600M-ARR company positioning dbt as “the standard context layer for agentic analytics,” with MetricFlow going Apache 2.0. Salesforce paid $8 billion for Informatica to give Agentforce a governed data foundation. Microsoft wires Purview through OneLake; AWS exposes Glue as an Iceberg REST catalog so even Databricks can read it. Cube, AtScale, and ThoughtSpot sell the semantic layer as a portable, engine-agnostic asset — AtScale’s pitch is explicitly that because Iceberg commoditized compute, the semantic layer must not be owned by any one engine. The catalog wars of 2024–2026 are the SQL-dialect wars replayed at the metadata layer, and they are trending the same direction: toward open interfaces and away from moats.

Polaris TLP: Feb 18 2026. dbt–Fivetran merger: Oct 13 2025. Salesforce–Informatica closed Nov 18 2025. Catalog landscape: State of Iceberg Catalogs, June 2026.

The evidence that semantics is where accuracy lives, however, keeps strengthening. Every vendor now publishes the same experiment with the same shape: AtScale measured Gemini at 20 percent accuracy on raw TPC-DS schema and 92.5 percent with the semantic layer. Snowflake claims Cortex Analyst doubles raw GPT-4o. Google claims LookML grounding cuts errors by two-thirds. Discount each as marketing and the independent replications still stand — Cube’s paired benchmark, dbt’s, Spider 2.0’s brutal floor. The pattern is not in dispute. The dispute is over who owns the layer that produces it.

And here is the inconvenient subtlety the bull case skips: the semantic layer is an organizational achievement, not a product feature. “Revenue” has eleven definitions at your company not because you lack a catalog but because Finance, Sales Ops, and Product genuinely disagree — bookings versus billings versus recognized — often for defensible reasons. A catalog records the disagreement; it cannot resolve it. Buying Unity Catalog and expecting clean semantics is buying Confluence and expecting clean documentation. Anthropic’s post says this plainly, twice: governance without enforcement “decays back to the multiple candidates problem,” and the definitions must be owned by humans because the alternative — letting the model invent them — reproduced “the very ambiguities we were trying to eliminate.” Governed garbage is still garbage. The vendor sells the substrate. The moat is dug by the customer, slowly, in meetings.

A catalog records the disagreement about what “revenue” means. It cannot resolve it. That work is organizational — and vendor-neutral.

Chapter VI

Stress-Testing the Bull Case

Grant everything granted so far: agents are becoming the dominant consumers of data; their accuracy depends on governed semantics; lakehouse platforms with strong catalogs are best positioned to supply them. The progression the optimists describe is real, and it is worth laying out precisely, because each stage raises the bar on the platform:

Stage A

Humans ask questions

“What was revenue across product lines last year?” Text-to-SQL with a human checking each answer. Needs: discoverability, metadata, certified-table tiers. Most enterprises are here.

Stage B

Humans ask for judgment

“What should we do to grow this product line?” The agent composes dozens of queries; humans review the conclusion, not the intermediate steps. Needs: trustworthy primitives — semantic layers, blessed rollups, lineage — because nobody audits query #37 of 60.

Stage C

Agents patrol the estate

Thousands of standing agents continuously scanning, pushing findings before anyone asks. Needs: machine-scale workload isolation, non-human identity and audit, cost governance — and a feedback loop where confirmed and rejected findings train the next pass.

Each stage shifts value from compute over data to governed meaning over data. That is the steelman, and it is a real thesis, not vibes. Now the stress test — six complications, each sufficient to bend the conclusion:

“One correct answer” cuts both ways. Analytics punishes error more than coding — but it also makes verification cheap and mechanical. An agent can compute revenue three ways, reconcile against the GL, and flag divergence; numerical reconciliation is a crisper signal than any unit test. Anthropic already buys 6 points of accuracy with an adversarial-review subagent. Curated catalogs are how you make today’s models accurate. Tomorrow’s may simply afford to cross-examine the data — and the cheaper verification gets, the less a pre-blessed single path is worth.
The GPT-6-class question. A company’s metric definitions are encoded — redundantly, messily, but completely — in its query logs, dbt repos, dashboards, and Slack threads. The curated catalog is a compressed cache of that knowledge. Anthropic’s bootstrap experiment failed with 2026 models; the moat assumes it keeps failing. If a GPT-6/Claude-N-class model can decompress semantics from lineage and usage directly, the moat shrinks from “we hold the meaning” to “we hold the access logs” — weaker, and contestable.
Stage C is a CFO horror story. Thousands of always-on agents speculatively scanning a metered lakehouse is a cloud-bill catastrophe by design. The economically rational architecture pushes speculation onto cheap embedded engines — DuckDB-class compute over extracted slices — and touches the metered platform only for blessed reads. The thesis predicts an explosion of query volume on the lakehouse; the unit economics predict the explosion happens mostly off it.
Value pools at the scarcest layer. The stack is model → agent harness → semantic layer → catalog → storage. If frontier-vendor analyst agents work equally well across Databricks, Snowflake, BigQuery and Postgres — and their makers are highly motivated to ensure they do — the data platform becomes excellent, fungible plumbing under someone else’s relationship with the CFO. The Windows/Intel question, replayed: Databricks can win the workload and still not capture the increment.
Open formats dissolve the bind. Iceberg ended the table-format war; Polaris is standardizing the catalog interface; metrics specs are following (MetricFlow back to Apache 2.0). Every layer Databricks would monetize is being wrapped in a neutral API within eighteen months of shipping — frequently with Databricks’ own participation, because openness is its sales argument against Snowflake. You cannot both commoditize a layer and own it.
Hyperscaler bundling. Microsoft folds Fabric and Copilot agents into E5 agreements; Google folds BigQuery and Gemini into GCP commits; AWS does AWS things. “Explode over the decade” requires out-executing three distribution machines that give the competing product away — the classic mid-cap squeeze, survivable (Snowflake reaccelerated to 30 percent growth in the teeth of exactly this bundling) but not a tailwind.
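The first complication, that numerical reconciliation is a crisp verification signal, can be sketched directly. The function below compares independent computations of one metric and flags divergence. The path labels, the dollar figures, and the 0.5 percent tolerance are all invented for the example, and the sketch assumes positive values.

```python
from decimal import Decimal


def reconcile(estimates, tolerance=Decimal("0.005")):
    """Compare independent computations of one metric; flag divergence.

    estimates maps a label for each computation path to its result.
    Returns (agreed, spread), where spread is (max - min) / max.
    """
    values = list(estimates.values())
    high, low = max(values), min(values)
    spread = (high - low) / high if high else Decimal(0)
    return spread <= tolerance, spread


# Hypothetical: the same quarter's revenue computed three different ways.
paths = {
    "orders_rollup": Decimal("1204500.00"),
    "invoice_lines": Decimal("1204500.00"),
    "general_ledger": Decimal("1187300.00"),
}
agreed, spread = reconcile(paths)
print(agreed, round(spread, 4))
```

Agreement does not prove the number is right, since three paths can share one upstream error, but disagreement proves something is wrong, and that is a signal no curated definition is needed to compute.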

The honest synthesis: the post is directionally right that governed semantics appreciate as agents proliferate, and wrong to assume the catalog vendor automatically captures the appreciation. Semantics are an achievement vendors can host but cannot sell. “Databricks explodes” is one scenario. “Databricks becomes the superbly paid plumbing under someone else’s agent” is at least as likely — and both are fully consistent with every fact in the Anthropic post.

Chapter VII

The Last Human Query

What does the decade actually look like? Some second-order effects are already legible. Machine-issued queries will exceed human-issued ones on enterprise warehouses — the only question is the multiple, and whether vendors publish the telemetry. The analytics engineer becomes a semantic engineer: less SQL authorship, more curating definitions, adjudicating metric disputes, and maintaining eval suites — Anthropic already gates each domain’s agent on a ~90 percent eval threshold and finds that 90 percent of data-model PRs ship with corresponding context changes in the same diff. CI for meaning, not just for code.
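"CI for meaning" can be as small as a gate over a golden-question suite. The sketch below assumes a suite of (question, expected, actual) triples and a 0.90 threshold echoing the figure above; `eval_gate` and the sample data are hypothetical, and real suites would compare result sets with tolerances rather than by strict equality.

```python
def eval_gate(results, threshold=0.90):
    """Gate a domain's agent on golden-question accuracy.

    results is a list of (question, expected, actual) triples.
    Returns (passed, accuracy, failed_questions).
    """
    if not results:
        raise ValueError("an empty eval suite cannot certify anything")
    failures = [q for q, expected, actual in results if actual != expected]
    accuracy = (len(results) - len(failures)) / len(results)
    return accuracy >= threshold, accuracy, failures


# Hypothetical suite: ten golden questions, one wrong answer.
golden = [(f"q{i}", i, i) for i in range(9)] + [("q9", 9, -1)]
passed, accuracy, failures = eval_gate(golden)
print(passed, accuracy, failures)
```

Run on every data-model change, a gate like this turns a silent semantic regression into a failed check with the offending questions named.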

The dashboard — a human-polling interface — peaks and declines, replaced by conversation and push: findings delivered with provenance footers and reproducible queries attached. Natural language becomes the authoring interface while SQL persists as the verification substrate: rarely written, always inspectable. NL is the new REPL; SQL is the new assembly. And when frontier-model capability converges, the durable corporate moat is not intelligence but agent-legible data plus the accumulated eval corpus — every question asked, every answer confirmed or corrected: company-specific ground truth that no one can buy and no competitor can scrape.

Predictions, dated and falsifiable, so this essay can be graded:

2027

High confidence

Every major platform ships a first-party catalog-grounded analyst agent; standalone text-to-SQL startups are effectively dead as a category.

2028

High confidence

Machine-initiated queries exceed human-initiated queries on at least one major cloud warehouse, per vendor-published telemetry.

2028

High confidence

“Semantic engineer” / agent-data-steward appears as a distinct title at >100 enterprises; dbt-or-successor ships agent-eval CI as a core feature.

2029

Medium

A platform offers a contractual correctness SLA on a metrics API — versioned definitions, signed lineage: the “agent contract.”

2029

Medium

Catalog interoperability is table stakes; Unity Catalog’s proprietary surface is mostly open-API-wrapped, neutralized as lock-in.

2030

Medium

Net-new dashboard creation declines year-over-year at major BI vendors; push and conversation are the default delivery surface.

2030

Medium

>20% of agent analytical compute runs on embedded engines (DuckDB-class) over extracted slices, not metered warehouse compute — driven purely by cost.

2031

Speculative

A frontier model infers a company’s metric semantics from lineage + query logs alone, matching the curated catalog on a golden-question eval — published as a benchmark.

2033

Speculative

The value-capture verdict is visible in revenue: data layer vs. model/agent layer. This essay puts 55/45 on the model layer taking the larger share of the increment.

2034

Speculative

A Fortune 500 incident — a material decision traced to an autonomous agent’s silently wrong metric — triggers the first regulatory guidance on agent-generated financial analysis.

Petrov closed his book on distributed consensus — the machinery by which unreliable nodes agree on a single truth despite failure, delay, and contradiction. It is hard to imagine a better metaphor for what just became the most valuable problem in enterprise software. The agents are ready. The models are nearly ready. What stands between a company and the future where its data estate answers questions, recommends actions, and patrols itself is not intelligence and was never intelligence. It is agreement — on what “revenue” means, on which table is true, on who owns the definition. The database of the AI era is, in the end, a consensus protocol that includes the humans.

The last human query will not be a SELECT statement. It will be the question we keep asking each other in the meeting after the agent presents its findings: do we trust this number? Everything in this essay — the catalogs, the semantic layers, the eval suites, the branches, the provenance footers — is infrastructure for answering yes.

Continued in Essay № 02: The 2027 Syllabus — the graduate course this decade needs and no flagship university teaches yet, with field notes on what MIT, Harvard, CMU, Stanford, and Berkeley actually teach in 2026. And the syllabus is no longer hypothetical: the full course site has all fourteen weeks of lecture notes and four lab handouts, free and open.

❦

Appendix

Sources & Provenance

[01] Anthropic, “How Anthropic enables self-service data analytics with Claude”, June 3 2026. All accuracy, decay, and ablation figures.
[02] Petrov, A., Database Internals: A Deep Dive into How Distributed Data Systems Work, O’Reilly, 2019.
[03] Lei et al., “Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows”, ICLR 2025; leaderboard at spider2-sql.github.io.
[04] Cube, “Why Semantic Layers Make LLM Analytics Reliable”, paired benchmark, 2026.
[05] dbt Labs, “Semantic Layer vs Text-to-SQL” benchmark, 2026.
[06] CNBC, Databricks–Neon acquisition, May 14 2025.
[07] Databricks, Lakebase launch, June 2025; Unity Catalog open-sourcing, June 2024; Metric Views docs.
[08] Kraska, Beutel, Dean, Polyzotis, “The Case for Learned Index Structures”, SIGMOD 2018; Ding et al., ALEX, SIGMOD 2020; Ferragina & Vinciguerra, PGM-index, VLDB 2020.
[09] Malkov & Yashunin, HNSW, 2016; Subramanya et al., DiskANN, NeurIPS 2019; Athanassoulis et al., RUM conjecture, EDBT 2016.
[10] Snowflake, Apache Polaris engineering blog; Polaris graduated to Apache TLP Feb 18 2026; Cortex Analyst accuracy (vendor-internal).
[11] Fivetran ✕ dbt Labs merger, Oct 13 2025; Salesforce–Informatica close, Nov 18 2025.
[12] Supabase $500M at $10.5B, June 4 2026; Tiger Data Ghost GA, June 9 2026; AgentDB; Turso multitenant architecture.
[13] Datadog Security Labs, Postgres MCP SQL-injection case study.
[14] Rasmussen et al., Zep/Graphiti, arXiv 2501.13956; Mem0, arXiv 2504.19413.
[15] AtScale TPC-DS semantic-layer study, BigDATAwire (vendor study, public methodology).
[16] Market figures: Databricks press releases ($5.4B run-rate, Feb 2026), CNBC ($134B, Dec 2025), PYMNTS/Reuters ($165–175B talks, unconfirmed); Snowflake FY2026 8-K; State of Iceberg Catalogs, June 2026.
[17] Counter-perspectives: Benn Stancil, “The context layer”; MotherDuck, “Your data model is the semantic layer”; OtterTune postmortem via dbtune.com.
[18] All vendor-published accuracy claims (Cortex ~90%, AtScale 92.5%, Looker “two-thirds”) are self-reported and flagged as such in the text. Databricks publishes no headline Genie accuracy figure as of June 2026.