DATA 2027 · Week 13 · Part IV — Frontier & Futures

Self-Driving, Self-Assembling, Self-Designing

Twenty years of databases that promise to tune themselves — and the autonomous DBA that actually shipped is an agent with a master prompt and a connection string.

Lecture 1 — From Auto-Tuning to Self-Design · Lecture 2 — The Agent as DBA

Lecture 1 · Tuesday

From Auto-Tuning to Self-Design

OtterTune’s Gaussian processes, the Data Calculator’s design continuum, and why the company died while the capability didn’t.

L1 · The 2017 claim

Automated is not autonomous

Pavlo et al., CIDR 2017: advisors recommend; a human applies the change and absorbs the blame.
A self-driving DBMS must close the loop itself.
Predict the workload, act without a maintenance window, learn.
Built into Peloton, later reborn as NoisePage.

L1 · Forecast first

Autonomy is a bet on the future workload

Architecture ordering is the argument: forecast first, then plan.
A 40-minute index build only pays if those queries still arrive.
Bet quality is bounded by forecast quality, not ML cleverness.

L1 · Forecasting

QueryBot 5000: template, cluster, predict

Raw logs: millions of distinct statements per day — too many to forecast.
Template queries (strip constants), then cluster by arrival-rate correlation.
Ensemble: linear regression for short horizons, a recurrent net for longer ones.
Predict an hour to a week out — schedule builds at the 3 a.m. trough.
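
The template-then-cluster step can be sketched in a few lines. This is a minimal illustration, not QueryBot 5000's code: the regex templating, the cosine-similarity measure, the greedy clustering, and the 0.8 threshold are all assumptions chosen for brevity, and every function name here is hypothetical.

```python
import math
import re
from collections import defaultdict

def template_of(sql: str) -> str:
    """Replace string and numeric literals with '?' and normalize case/whitespace."""
    sql = re.sub(r"'(?:[^']|'')*'", "?", sql)      # string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)   # standalone numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()

def arrival_histories(log, n_buckets, bucket_seconds):
    """Count arrivals per template per time bucket. log holds (timestamp_seconds, sql) pairs."""
    hist = defaultdict(lambda: [0] * n_buckets)
    for ts, sql in log:
        bucket = int(ts // bucket_seconds)
        if 0 <= bucket < n_buckets:
            hist[template_of(sql)][bucket] += 1
    return dict(hist)

def cosine(a, b):
    """Cosine similarity of two arrival-rate vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster(histories, threshold=0.8):
    """Greedy clustering: a template joins the first cluster whose summed history is similar enough."""
    clusters = []
    for tmpl, h in sorted(histories.items()):
        for c in clusters:
            if cosine(c["center"], h) >= threshold:
                c["members"].append(tmpl)
                c["center"] = [x + y for x, y in zip(c["center"], h)]
                break
        else:
            clusters.append({"center": list(h), "members": [tmpl]})
    return clusters
```

A forecaster then models one series per cluster (the `center` vector) instead of one per statement, which is what makes the problem tractable.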

L1 · Forecasting

A few clusters carry the workload

95%

of query volume covered by the top 5 template clusters, across the workloads QueryBot 5000 studied (SIGMOD 2018).

L1 · Forecasting

Week 13’s client doesn’t sleep

Agent-generated SQL templates poorly — every prompt shifts the join order.
Bursts track upstream model behavior, not human circadian rhythm.
Forecasting research assumed the diurnal human; agentic clients break it.

L1 · Planning

A control problem in an ML costume

Receding-horizon control: actions with benefit, apply-cost, confidence.
Re-plan as observations arrive over the forecast window.
Constraint 1: every action needs an undo.
Constraint 2: never explore on the critical path.
Both return Thursday — with an LLM in the planner’s seat.
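
One planning step under these two constraints might look like the sketch below. The `Action` fields, the cost units (CPU-minutes), and the scoring rule (confidence times forecast benefit over the horizon, minus apply-cost) are illustrative assumptions, not the CIDR 2017 planner.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Action:
    name: str
    benefit_per_hour: float   # forecast saving while the action is in effect (CPU-minutes per hour)
    apply_cost: float         # one-off cost of applying it (CPU-minutes)
    confidence: float         # 0..1, trust in the forecast behind the benefit
    undo: Optional[str]       # statement that reverses the action, or None

def expected_net(a: Action, horizon_hours: float) -> float:
    """Confidence-weighted benefit over the forecast window, minus the cost of acting."""
    return a.confidence * a.benefit_per_hour * horizon_hours - a.apply_cost

def plan_step(candidates: Sequence[Action], horizon_hours: float) -> Optional[Action]:
    """Pick at most one action; the caller applies it, observes, and re-plans."""
    eligible = [a for a in candidates if a.undo is not None]   # constraint 1: needs an undo
    best = max(eligible, key=lambda a: expected_net(a, horizon_hours), default=None)
    if best is None or expected_net(best, horizon_hours) <= 0:
        return None                                            # doing nothing is a valid plan
    return best
```

With a 40-unit index build that saves 2 units per hour at confidence 0.8, a 24-hour horizon does not pay back (0.8 × 2 × 24 < 40), while a 48-hour horizon does. The decision flips on forecast length and confidence, not on model sophistication.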

L1 · The loop

The loop is the product

Fig. — Pavlo et al., CIDR 2017. OtterTune swapped PLAN for a Gaussian process; the Data Calculator widened ACT; the 2026 LLM-DBA replaces the planner. The guardrail line never changed.

L1 · OtterTune

The knob problem

~350

configuration knobs in PostgreSQL (MySQL: 500+). They interact nonlinearly, and the defaults are tuned for a machine from 2008.

L1 · OtterTune

Tuning as black-box optimization

SIGMOD 2017: Gaussian-process regression over configs, pick by expected improvement.
Factor analysis + k-means prune redundant metrics; LASSO ranks knobs.
On the order of 10 knobs capture most of the achievable gain.
TPC-C: 58–94% lower latency than default configs.
No model of the database needed — it’s a function; sample it.
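
A toy version of "it's a function; sample it": Gaussian-process regression over a single normalized knob, with the next sample picked by expected improvement. This is a sketch under simplifying assumptions (one knob, a fixed RBF kernel with a hypothetical length scale of 0.2, latency minimization, pure-Python linear algebra); it is not OtterTune's implementation.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit variance."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of latency at x_new."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    mean_y = sum(ys) / n
    alpha = solve_upper_t(L, solve_lower(L, [y - mean_y for y in ys]))
    k_star = [rbf(x, x_new) for x in xs]
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(t * t for t in v), 0.0)
    return mu, math.sqrt(var)

def expected_improvement(mu, sigma, best):
    """Expected improvement for minimization (lower latency is better)."""
    if sigma == 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (best - mu) * cdf + sigma * pdf

def suggest(xs, ys, grid):
    """Next knob value to try: the grid point with the highest expected improvement."""
    best = min(ys)
    return max(grid, key=lambda x: expected_improvement(*gp_posterior(xs, ys, x), best))
```

Each call to `suggest` costs one benchmark run on the real system, which is why sample efficiency matters more than model capacity here.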

L1 · OtterTune

Postmortem: structural, not scientific

Founded 2020, ~$14.5M raised; shut down 2024.
Tuning is episodic — what does month nine’s subscription buy?

The channel belonged to the clouds: knob access needs provider cooperation.
By 2024 the capability had dissolved into the substrate.

L1 · History

The advisor-mode plateau

AutoAdmin index wizard: SQL Server 7.0, 1998. DB2 SMART. Oracle’s advisor, 2004.
All plateaued at: system recommends, human approves.
The approval gap was never technical: it was about who gets paged at 3 a.m. when the advice is wrong.

L1 · Data Calculator

Stop tuning structures, derive them

10³²

valid two-node-type designs from ~50 layout primitives (Idreos et al., SIGMOD 2018). B-trees, LSM-trees, tries, hash tables: just the famous coordinates.

L1 · Data Calculator

Cost synthesis: design as navigation

Compose per-primitive cost models learned from micro-benchmarks on target hardware.
What-if answers in seconds, not person-months of implementation.
RUM tradeoff: read, update, and memory overheads; a design can minimize two at the expense of the third.
RAG pipelines are also design-space points: chunking, index family, reranker depth.
A “Calculator for serving pipelines” is this week’s open problem.
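
The idea of composing per-primitive cost models into a what-if answer can be shown in miniature. Everything concrete below is hypothetical: the three primitives, their nanosecond coefficients (which in the real system would be learned from micro-benchmarks on the target hardware), and the `Level` type. The Data Calculator models far more than a point lookup.

```python
import math
from dataclasses import dataclass
from typing import Sequence

# Hypothetical learned cost models, in nanoseconds, as a function of entries examined.
PRIMITIVE_COST = {
    "binary_search": lambda n: 12.0 * math.log2(max(n, 2)),
    "linear_scan": lambda n: 1.5 * n,
    "hash_probe": lambda n: 40.0,
}

@dataclass(frozen=True)
class Level:
    primitive: str   # how a node at this level is searched
    fanout: int      # entries examined inside one node at this level

def lookup_cost_ns(design: Sequence[Level]) -> float:
    """Point-lookup cost of a design: the sum of the per-level primitive costs."""
    return sum(PRIMITIVE_COST[lvl.primitive](lvl.fanout) for lvl in design)

# Two points in the design space, compared without implementing either one.
btree_like = [Level("binary_search", 256)] * 3
hash_like = [Level("hash_probe", 1), Level("linear_scan", 8)]
```

Comparing `lookup_cost_ns(btree_like)` with `lookup_cost_ns(hash_like)` is the what-if question in its smallest form. The same shape would apply to a serving pipeline if chunking, index family, and reranker depth each had a cost model.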

Lecture 2 · Thursday

The Agent as DBA

The first broadly deployed autonomous DBA is not inside the engine — it holds a connection string and a master prompt.

L2 · The twist

The 2017 roadmap didn’t predict this

A general LLM agent outside the engine: connection string + MCP server.
Every serious Postgres platform ships MCP tools: schema, EXPLAIN, stats, migrations.
Master prompts: multi-thousand-token operational playbooks in prose.
OtterTune’s expertise, re-encoded — executed by a model that reads the manual.

L2 · What changes

1 — The tuner can read

OtterTune’s black box was the only honest contract for narrow ML.
The agent knows checkpoint_completion_target spreads checkpoint I/O.
It knows Postgres 17 changed vacuum memory accounting.
It spots your ORM’s N+1 pattern in the log.
It debugs causally where the GP could only regress.

L2 · What changes

2 — Experiments on branches

Copy-on-write branching: a writable replica in seconds, for cents.
The missing substrate for “never explore on the critical path.”
2017 learned from cautious production nudges;
2026 forks reality, replays a captured workload, measures.

L2 · What changes

Fork reality, then ask permission

Fig. — The 2026 experiment loop: the agent never touches production directly; it forks, replays, gates, and opens a PR a human can argue with.

L2 · What changes

Tuning run #47, verbatim

-- branch: tune/checkpoint-2026-06-09 (fork of prod@LSN 0/8A3F1C40)
-- hypothesis: p99 spikes align with checkpoints (~every 140s)
ALTER SYSTEM SET max_wal_size = '8GB';                 -- was 1GB
ALTER SYSTEM SET checkpoint_completion_target = 0.9;   -- was 0.5
SELECT pg_reload_conf();                               -- both settings take effect on reload, no restart
-- replay: 30 min, 14,212 statements, agent-traffic mix 71%
-- result: p99 412ms → 287ms · p50 38ms → 37ms
--         WAL volume +9% · recovery-time est. +6.2 min
-- gate:   p99 −20%, WAL ≤ +15%, recovery ≤ +10 min  → PASS
-- action: open PR with diff + transcript; do NOT touch prod
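
The gate line in the transcript is a pure function of the replay measurements, so it can be checked independently of the agent. A minimal sketch, assuming the three thresholds shown in run #47; the `Replay` type and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Replay:
    p99_before_ms: float
    p99_after_ms: float
    wal_delta_pct: float        # change in WAL volume, percent
    recovery_delta_min: float   # change in estimated recovery time, minutes

def gate(r: Replay, min_p99_drop_pct=20.0, max_wal_pct=15.0, max_recovery_min=10.0):
    """Return (passed, per-check results) for one replayed experiment."""
    p99_drop_pct = 100.0 * (r.p99_before_ms - r.p99_after_ms) / r.p99_before_ms
    checks = {
        "p99": p99_drop_pct >= min_p99_drop_pct,
        "wal": r.wal_delta_pct <= max_wal_pct,
        "recovery": r.recovery_delta_min <= max_recovery_min,
    }
    return all(checks.values()), checks

run_47 = Replay(p99_before_ms=412, p99_after_ms=287, wal_delta_pct=9, recovery_delta_min=6.2)
passed, checks = gate(run_47)
```

For run #47 the p99 drop is 125/412, roughly 30%, so all three checks hold and `passed` is true. Keeping the gate outside the model means a prompt change cannot loosen it.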

L2 · What changes

3 — It explains itself

A GP emits a configuration vector.
The agent emits hypothesis, experiment, result, pull request.
The approval gap was always a trust gap.
Legible reasoning is the first technology that narrows it.

L2 · What doesn’t change

Now the cold water

Hallucinated knob semantics — settings renamed two releases ago.
Over-indexing: twelve locally-good indexes quietly halve write throughput.
The lethal trifecta, highest-privilege form: private data + untrusted content + DDL.
Plus every analytics-agent failure mode from weeks 10–12.

L2 · The harness

Guardrails, evals, rollback

Guardrails: allowlisted tools; DDL only on branches; SLO tripwires that auto-revert.
Evals: replayed incidents with known-good outcomes — a prompt edit is a deploy.
Rollback: every change carries a down-migration and its branch evidence.
Tuesday’s loop survives intact; only the PLAN box got smarter.
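
An SLO tripwire that auto-reverts can be as small as the sketch below. The rule used here (revert after a fixed number of consecutive p99 samples above the SLO) and all names are assumptions for illustration; a real harness would also log the evidence and page a human.

```python
from typing import Callable, Iterable

def watch_and_revert(p99_samples_ms: Iterable[float], slo_ms: float,
                     max_breaches: int, revert: Callable[[], None]) -> bool:
    """Call revert() once p99 exceeds the SLO for max_breaches consecutive samples.

    Returns True if the change was reverted, False if the samples ran out first.
    """
    breaches = 0
    for sample in p99_samples_ms:
        breaches = breaches + 1 if sample > slo_ms else 0   # any healthy sample resets the streak
        if breaches >= max_breaches:
            revert()
            return True
    return False
```

The `revert` callable is where the down-migration from the rollback bullet plugs in: if a change ships without one, there is nothing to pass here, and the change should not have been eligible.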

L2 · Field note

The 40-minute lock

Agent applies ALTER TABLE … ADD COLUMN … DEFAULT — “safe in Postgres 11+,” correctly cited.
Production was Postgres 10: full rewrite, 40 min of ACCESS EXCLUSIVE on orders.
Fix: a version-pinned eval case + migrations through the human CI gate.
Right in general, wrong here — LLM operations in one incident.
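
The version-pinned eval case reduces to a check the agent's general knowledge skipped: ask the target server which version it is before trusting a "safe since" claim. A minimal sketch, assuming the caller has already read `server_version_num` from the target (for example via `SHOW server_version_num`); the function name and the volatility flag are hypothetical.

```python
def add_column_default_rewrites_table(server_version_num: int, default_is_volatile: bool) -> bool:
    """True if ALTER TABLE ... ADD COLUMN ... DEFAULT is expected to rewrite the table.

    PostgreSQL 11 and later store a non-volatile default in the catalog and skip
    the rewrite; earlier versions, and volatile defaults on any version, rewrite.
    """
    if server_version_num < 110000:   # server_version_num for 11.0 is 110000
        return True
    return default_is_volatile

def allowed_without_human_gate(server_version_num: int, default_is_volatile: bool) -> bool:
    """Route anything that rewrites the table through the human CI gate."""
    return not add_column_default_rewrites_table(server_version_num, default_is_volatile)
```

Run against the incident's inputs (a Postgres 10 server and a constant default), the check returns "rewrites the table", and the migration goes to the human gate instead of to production.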

L2 · Honest ledger

Learned components, 2026 scorecard

Component	Status 2026	Why
Automatic indexing	Production, fleet-scale	Azure SQL since 2019; verify + auto-revert
Knob tuning	Absorbed into platforms	OtterTune dead 2024; lives as defaults
Optimizer steering	Production, narrow	Bao-style hints; classical optimizer as floor
Learned cardinality	Advisor-mode / lab	Wins benchmarks; loses on drift, tail risk
Learned indexes	Niche	Absorbed into LSM parts; B-tree undefeated
LLM agent as DBA	Early production, gated	Branch-only DDL, evals, human-merged PRs

L2 · The pattern

The cost of being wrong

Learned components ship in proportion to how cheap mistakes are to detect and undo.
Auto-indexing shipped: an index is verifiable and reversible.
Learned cardinalities stall: a bad estimate hides in a slow plan, no alarm attached.
The physics was never ML capability. Nothing about 2026 is new on that axis.

Autonomy never shipped as a product you buy. It shipped as a layer you stop noticing.

— Week 13 lecture notes

Checkpoint · Discussion

Before you leave

Two knobs jointly beat what each predicts alone — name the phenomenon. Why does it force GP-style sample-efficient methods?
Design an agent’s promotion gate: four falsifiable metrics, including one that punishes over-indexing.
What is the RUM conjecture’s analogue for RAG pipelines — which three desiderata can’t all win?

Readings · Week 13

Read before Thursday

Self-Driving Database Management Systems — Pavlo et al., CIDR 2017. Which rung does an LLM agent occupy?
The Data Calculator — Idreos et al., SIGMOD 2018. Read for the design-space framing.
OtterTune postmortem — A. Pavlo, blog, 2024. Why the science worked and the business didn’t.