ML inside the engine mostly lost; ML advising the engine quietly won — and the LLM agent outside the engine is the strangest winner of all.
Lecture 1 — The case for (and against) learned indexes · Lecture 2 — Bao: ML steering the optimizer
What if an index is just a model — and the model class is up for debate?
k is k − 1: one subtraction, O(1).pos = F(key) · N.err_min, err_max.p, binary-search [p − err_min, p + err_max].less memory than the B-tree’s index structure — an RMI over 200M doubles fits in a few megabytes, i.e., in L2.
Replace the optimizer, or advise it? The answer decided what shipped.
PostgreSQL’s cardinality errors on Join Order Benchmark multi-joins (Leis et al., VLDB 2015) — independence and uniformity assumptions collapse under correlated predicates.
ANALYZE rebuilds a histogram cheaply.enable_hashjoin, enable_nestloop, …| System | Where the ML sits | Worst case | Fate |
|---|---|---|---|
| Neo (VLDB ’19) | Replaces the optimizer | Arbitrarily bad plan | Influential; never deployed as-is |
| Bao (SIGMOD ’21) | Advises: 1 of 48 hint sets | A valid classical plan | Pattern adopted (MSFT steered optimizer, Meta AutoSteer & kin) |
| OtterTune (’17–’24) | Outside, tuning knobs | Bad config (recoverable) | Good tech, dead company (2024) |
| LLM-DBA via MCP | Outside, as a client | Whatever you let it execute | The current frontier |
SQL Server 7.0 ships the Index Tuning Wizard. IBM’s LEO was Bao’s feedback loop twenty years early — turned off because customers feared plan instability. The binding constraint is operator trust, not model accuracy.
EXPLAIN ANALYZE, pg_stat_statements, schema in context.