DATA 2027 · Week 01 · Part I — Foundations Under New Workloads

The Client Has Changed

Fifty years of database architecture assumed a human on the other end of the socket — this week we learn the classical machine, then measure what happens when the client becomes a language model.

Lecture 1 — Anatomy of a DBMS: the five components · Lecture 2 — The agentic workload, measured

Lecture 1 · Tuesday

Anatomy of a DBMS: The Five Components

The canonical map of the machine — and what each box assumes about you.

L1 · The canonical map

One paper, five boxes

Hellerstein, Stonebraker & Hamilton, FnT Databases 2007.
Postgres, MySQL, Oracle, SQL Server, DB2 all converged here.
Each component encapsulates one hard systems problem.
Draw this diagram from memory by Friday.
The semester: what breaks when the human client disappears.

L1 · The five boxes

The five components

Client communications manager: connection state, wire protocol, result delivery.
Process manager: admission control, connection-to-worker mapping.
Query processor: parser → rewriter → optimizer → executor.
Transactional storage manager: access methods, buffer pool, locks, log.
Shared components & utilities: catalog, memory allocator, replication, administration & monitoring.

L1 · Process manager

Deciding when work enters

Process-per-connection: classic Postgres.
Thread-per-connection: MySQL.
Thread pool with worker queue: SQL Server.
Admit too much → thrash the buffer pool.
Admit too little → waste the hardware.
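Admission control can be pictured as a gate in front of the worker pool. This is a minimal sketch that assumes a single fixed cap on concurrently executing queries. `AdmissionController`, `try_admit`, and `max_active` are hypothetical names. Production engines often weigh memory and query cost rather than a bare count.

```python
import threading


class AdmissionController:
    """Hypothetical admission gate: at most `max_active` queries execute at once.

    Set the cap too high and concurrent queries evict each other's pages from the
    buffer pool; set it too low and CPUs and disks sit idle.
    """

    def __init__(self, max_active: int) -> None:
        self.max_active = max_active
        self.active = 0
        self._lock = threading.Lock()

    def try_admit(self) -> bool:
        # Admit only if a worker slot is free; otherwise the caller must wait in a queue.
        with self._lock:
            if self.active < self.max_active:
                self.active += 1
                return True
            return False

    def release(self) -> None:
        # Called when a query finishes, freeing its slot for the next arrival.
        with self._lock:
            if self.active == 0:
                raise RuntimeError("release() without a matching admit")
            self.active -= 1


gate = AdmissionController(max_active=2)
decisions = [gate.try_admit() for _ in range(3)]
print(decisions)         # [True, True, False]
gate.release()
print(gate.try_admit())  # True: the freed slot is reused
```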

L1 · Query processor

SQL string → result

Fig. 1 — Parser → rewriter → optimizer → executor. Even parsing requires catalog reads: a shared, cached, contended structure.

L1 · Storage manager

The deepest layer

Access methods: heaps and B+-trees.
Buffer pool, lock manager, log manager.
Welded together so ACID emerges from their cooperation.
Home of ARIES write-ahead logging and two-phase locking.
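The write-ahead rule ties the log manager to the buffer pool, and it fits in a few lines: a dirty page may reach disk only after the log is durable up to that page's LSN. This is a minimal sketch. `LogManager`, `BufferPool`, and their methods are hypothetical, and ARIES proper adds checkpoints, compensation log records, and the redo/undo passes on top.

```python
from dataclasses import dataclass, field


@dataclass
class LogManager:
    records: list = field(default_factory=list)  # log records, in order
    flushed_lsn: int = -1                         # highest LSN known to be on stable storage

    def append(self, payload: str) -> int:
        self.records.append(payload)
        return len(self.records) - 1              # LSN = position in the log

    def flush(self, up_to_lsn: int) -> None:
        # Stand-in for writing the log tail and calling fsync().
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)


@dataclass
class Page:
    page_id: int
    page_lsn: int = -1                            # LSN of the last record that modified this page


class BufferPool:
    def __init__(self, log: LogManager) -> None:
        self.log = log
        self.disk_writes: list[int] = []          # page IDs written to the data file

    def update(self, page: Page, change: str) -> None:
        # Describe the change in the log first, then stamp the page with that record's LSN.
        page.page_lsn = self.log.append(change)

    def write_back(self, page: Page) -> None:
        # Write-ahead rule: force the log up to page_lsn before the page itself is written.
        if page.page_lsn > self.log.flushed_lsn:
            self.log.flush(page.page_lsn)
        self.disk_writes.append(page.page_id)


wal = LogManager()
buffers = BufferPool(wal)
page = Page(page_id=7841)
buffers.update(page, "orders row 48121: status -> 'shipped'")  # hypothetical change
buffers.write_back(page)
print(wal.flushed_lsn, buffers.disk_writes)  # 0 [7841]
```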

L1 · Ownership boundaries

Why it survived five decades

The optimizer never touches a disk page.
The buffer pool never sees SQL.
The lock manager doesn’t know what a join is.
Separation of concerns survived five decades of hardware churn.

L1 · One SELECT, end to end

The most boring query imaginable

SELECT o.status, o.total_cents
FROM   orders o
WHERE  o.id = 48121;

Sent from an application in the same availability zone.

L1 · One SELECT, end to end

Life of the query

Parser builds AST, resolving orders against the catalog.
Optimizer picks index lookup: ~4 page reads (3 B+-tree levels + heap).
Executor pulls through the iterator tree; buffer pool has page 7841 resident.
MVCC snapshot check confirms row visibility.
One tuple serialized, back over TCP.
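The executor's pull model can be sketched with Python generators, where each operator asks its child for the next row. This toy rests on several loud assumptions:

- The index is a dict from `id` to a (page, slot) pair.
- The heap is a dict of pages.
- The MVCC check is reduced to "created by a transaction before my snapshot".

The stored values and `snapshot_xmax` are hypothetical, and none of this mirrors Postgres internals.

```python
from typing import Iterator

# Hypothetical storage: heap pages hold (xmin, row) pairs; the index maps id -> (page, slot).
HEAP = {7841: [(90, {"id": 48121, "status": "shipped", "total_cents": 12999})]}
INDEX = {48121: (7841, 0)}


def index_lookup(key: int, snapshot_xmax: int) -> Iterator[dict]:
    """Leaf operator: probe the index, fetch the heap tuple, check visibility."""
    rid = INDEX.get(key)
    if rid is None:
        return
    page_id, slot = rid
    xmin, row = HEAP[page_id][slot]
    # Simplified MVCC: visible only if the creating transaction precedes our snapshot.
    if xmin < snapshot_xmax:
        yield row


def project(child: Iterator[dict], columns: list[str]) -> Iterator[tuple]:
    """Parent operator: pulls rows from its child one at a time."""
    for row in child:
        yield tuple(row[c] for c in columns)


plan = project(index_lookup(48121, snapshot_xmax=100), ["status", "total_cents"])
print(list(plan))  # [('shipped', 12999)]
```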

L1 · The latency budget

Numbers, not vibes (~1.1 ms total)

Stage	Typical cost (warm)	Share
Network (RTT, same AZ)	~500 µs	~45%
Parse + catalog lookup	~50 µs	~5%
Plan (optimize)	100–300 µs	~15%
Execute (buffer-pool hit)	50–150 µs	~10%
Serialize + return	~100 µs	~10%
Connection setup (amortized)	~150 µs	~15%
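Summing the table at the middle of each range reproduces the ~1.1 ms headline. The quick check below assumes midpoints for the two ranged rows: 200 µs to plan and 100 µs to execute.

```python
# Stage costs in microseconds; ranged rows use the midpoint of their range (an assumption).
budget_us = {
    "network_rtt": 500,
    "parse_catalog": 50,
    "plan": (100 + 300) / 2,
    "execute": (50 + 150) / 2,
    "serialize_return": 100,
    "connection_setup_amortized": 150,
}

total_us = sum(budget_us.values())
network_share = budget_us["network_rtt"] / total_us
parse_plan_us = budget_us["parse_catalog"] + budget_us["plan"]

print(f"total ≈ {total_us / 1000:.1f} ms")     # total ≈ 1.1 ms
print(f"network share ≈ {network_share:.0%}")  # network share ≈ 45%
print(parse_plan_us > budget_us["execute"])    # True: 250 µs of parse + plan vs 100 µs of execution
```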

L1 · The latency budget

Two lessons hide in that table

For point queries, the wire is the bottleneck, not the database.
100× more round trips multiplies exactly the term the DBMS can’t optimize.
Parse + plan together cost more than execution.
At thousands of similar queries, parse/plan becomes the hottest server-side path.

L1 · Field note

“Relatively few, relatively long-lived connections”

100

Postgres ships with max_connections = 100, a default that encodes exactly the assumption the 2007 paper describes. In 2024–2026, agents opened hundreds of short-lived connections per task; platforms bolted poolers (PgBouncer, RDS Proxy, Neon’s proxy) in front of every database they sold.
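A pooler such as PgBouncer, reduced to its core idea, lets many short-lived client sessions borrow a small set of long-lived server connections. This is a minimal sketch. `ConnectionPool` and the `open_server_connection` factory are hypothetical stand-ins, not the API of PgBouncer or of any driver. The sketch returns a backend when the client's session ends, which is roughly PgBouncer's session mode; its transaction mode returns the backend after each transaction instead.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical pooler: many client sessions share at most `max_size` backends."""

    def __init__(self, open_server_connection, max_size: int) -> None:
        self._open = open_server_connection    # factory for a real backend connection
        self._idle = queue.LifoQueue()          # idle backends, most recently used first
        self._max_size = max_size
        self._created = 0
        self._lock = threading.Lock()

    @contextmanager
    def session(self):
        conn = self._acquire()
        try:
            yield conn
        finally:
            self._idle.put(conn)                # hand the backend back instead of closing it

    def _acquire(self):
        try:
            return self._idle.get_nowait()      # reuse an idle backend if one exists
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._max_size:  # open new backends lazily, up to the cap
                conn = self._open()
                self._created += 1
                return conn
        return self._idle.get()                 # at the cap: block until a session ends

    @property
    def server_connections(self) -> int:
        return self._created


opened = []


def open_fake_backend():
    conn = object()                             # stand-in for a real server connection
    opened.append(conn)
    return conn


pooler = ConnectionPool(open_fake_backend, max_size=10)
for _ in range(1000):                           # 1,000 short-lived sessions, one after another
    with pooler.session() as conn:
        pass                                    # one query would run on `conn` here
print(pooler.server_connections)                # 1
```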

L1 · Where the pressure lands

Four pressure points, in oxblood

Fig. 2 — The five-component architecture (HS&H, FnT 2007). Nothing in the black ink changes this semester; everything in red does.

L1 · The implicit contract

What the architecture assumes about you

Connections are long-lived and few.
Queries arrive at human cadence; parsing is noise.
A stable application shapes the working set; LRU converges.
Transactions are short — think-time is microseconds of app code.
Metadata queries are rare — the schema was read once.
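A toy buffer pool makes the "LRU converges" assumption easy to see, and easy to break. This minimal sketch assumes pure LRU replacement over page IDs; real engines often use approximations such as clock sweep. The pool capacity, pass count, and working-set sizes are arbitrary illustrative values.

```python
from collections import OrderedDict


class LRUBufferPool:
    """Toy buffer pool: tracks which page IDs are resident, evicting least-recently-used."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.frames: OrderedDict[int, None] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, page_id: int) -> None:
        if page_id in self.frames:
            self.hits += 1
            self.frames.move_to_end(page_id)   # mark as most recently used
            return
        self.misses += 1                       # would be a disk read
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)    # evict the least recently used page
        self.frames[page_id] = None

    @property
    def hit_ratio(self) -> float:
        return self.hits / (self.hits + self.misses)


def run(working_set: int, capacity: int = 100, passes: int = 20) -> float:
    pool = LRUBufferPool(capacity)
    for _ in range(passes):
        for page_id in range(working_set):     # cyclic scan over the working set
            pool.access(page_id)
    return pool.hit_ratio


print(run(working_set=50))   # 0.95: 50 cold misses, then every access hits
print(run(working_set=200))  # 0.0: each page is evicted just before it is reused
```

The second case is the classic sequential-flooding failure of pure LRU. A working set that drifts past the pool's capacity is punished far harder than its size alone suggests.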

Lecture 2 · Thursday

The Agentic Workload, Measured

Strip away the hype: what does the workload look like on the wire?

L2 · The client changed

Who creates databases now?

80%

of new databases on Neon’s platform were created by agents, not humans (reported 2025). The dominant client of the managed-Postgres business changed in under two years.

L2 · Session shape

Tens become thousands

Human analyst: tens of queries — write, stare, sip coffee, refine.
Agent on one task: hundreds to thousands.
Introspect schema → sample rows → try → read error → retry.
Cross-check with a second formulation, then run the final query.

L2 · Session shape

The cost model inverts

40 queries × 300 µs plan = 12 ms — irrelevant.
4,000 queries = 1.2 s of pure planning CPU, per session, per agent.
Fixed per-query overheads become first-order terms.
Agents don’t change the physics; they change which terms dominate.
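The planning arithmetic on this slide works out as follows. The sketch reuses the 300 µs figure, which is the top of the 100–300 µs planning range in the latency table. It counts planner CPU only and ignores network and execution.

```python
PLAN_US = 300  # per-query planning cost, upper end of the 100–300 µs range


def planning_ms(queries: int, plan_us: float = PLAN_US) -> float:
    """Total planner CPU for one session, in milliseconds."""
    return queries * plan_us / 1000


human = planning_ms(40)     # 12.0 ms: lost in the noise of human think-time
agent = planning_ms(4_000)  # 1200.0 ms = 1.2 s of planning for a single session
print(human, agent, agent / human)  # 12.0 1200.0 100.0
```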

L2 · Six measurable axes

Human (2015) vs. agent (2026)

Dimension	Human client	Agent client
Queries per session	~10–50	~500–5,000
Inter-statement gap	seconds–minutes	50 ms–10 s (token gen)
Schema introspection	rare	every session
Semantic duplicates	low	high — k textual variants
Speculation	none	3–10 parallel probes
Staleness tolerance	implicit, unstated	often explicit and large

L2 · Catalog pressure

`information_schema` as a hot path

Agents have no persistent memory of your schema.
Every session: column lists, foreign-key crawls, LIMIT 5 samples.
Catalog views were designed as cold administrative paths.
Many are unindexed joins over a dozen system tables.
Now run at the top of every session, concurrently.

L2 · Curated context

The fix isn’t faster catalogs

21% → 95%

Anthropic’s self-service analytics (June 2026): agents succeeded on ~21% of warehouse questions with raw schema access, 95% with curated semantic context. A 4.5× improvement — not one byte from the storage engine.

L2 · Speculative fan-out

Agents hedge

“Why did revenue dip in March?” → five parallel probes.
By region, product, channel, cohort, data-quality check.
It will use one or two; the rest is wasted work.
Admission control protects against too many users…
…not too many hypotheses from one user.

L2 · Near-duplicates

A thesis-shaped hole

>= '2026-03-01' vs. > '2026-02-28': same intent, and identical rows on a DATE column.
Aliases renamed, column order shuffled: textual caches miss.
The optimizer replans from scratch each time.
A cache keyed on normalized plan structure would absorb most of it.
No mainstream engine ships one yet.
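Here is what "keyed on normalized structure" could mean, in miniature. This toy sketch makes several assumptions:

- Queries are already parsed into a hypothetical `Query` record: table, projected columns with aliases stripped, and one range predicate on a DATE column.
- It rewrites `> d` as `>= d + 1 day`, which is valid only for DATE columns.
- It ignores column order.

The column names are illustrative. A real plan cache would normalize the parsed or planned tree, not a hand-built record.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Query:
    table: str
    columns: tuple[str, ...]  # projected columns, aliases already stripped
    date_column: str
    op: str                   # ">" or ">="
    bound: date


def cache_key(q: Query) -> tuple:
    # Column order does not change the plan, so sort it away.
    cols = tuple(sorted(q.columns))
    # On a DATE column, "> d" selects exactly the same rows as ">= d + 1 day".
    if q.op == ">":
        op, bound = ">=", q.bound + timedelta(days=1)
    elif q.op == ">=":
        op, bound = ">=", q.bound
    else:
        raise ValueError(f"unsupported operator {q.op!r}")
    return (q.table, cols, q.date_column, op, bound)


plan_cache: dict[tuple, str] = {}

a = Query("orders", ("region", "revenue"), "order_date", ">=", date(2026, 3, 1))
b = Query("orders", ("revenue", "region"), "order_date", ">", date(2026, 2, 28))

plan_cache[cache_key(a)] = "plan compiled once"  # placeholder for a real plan
print(cache_key(b) in plan_cache)                # True: variant b reuses a's plan
```

Variant `b` differs from `a` in operator, literal, and column order, yet it lands on the same key, so the planner would run only once.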

L2 · The lock-manager poison

Think-time inside transactions

Fig. 3 — Under two-phase locking, expected queue depth grows like λ·H (λ = lock-request arrival rate, H = lock-hold time). The same arithmetic hits MVCC more gently: long snapshots keep vacuum from reclaiming old versions, so version chains bloat.

L2 · The lock-manager poison

Do the arithmetic

2,000×

Raise lock-hold time H from 2 ms to 4 s. Expected queue depth grows like λ·H, so it rises 2,000×. At low load the conflict rate scales the same way: a lock that conflicted once a day now conflicts roughly every 43 seconds.
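Here is a back-of-envelope version of this arithmetic. The sketch uses the standard low-load approximation and assumes Poisson lock requests at a hypothetical rate λ = 0.05/s. Under that model the lock is held a fraction λ·H of the time. A request conflicts when it arrives while the lock is held, so conflicts occur at roughly λ·(λ·H) per second. Both quantities are linear in H.

```python
def lock_load(lam_per_s: float, hold_s: float) -> tuple[float, float]:
    """Return (utilization, conflicts per second) for one hot lock.

    Low-load approximation (utilization well below 1): the lock is busy a
    fraction lam*H of the time, and each arrival conflicts with that probability.
    """
    utilization = lam_per_s * hold_s
    conflicts_per_s = lam_per_s * utilization
    return utilization, conflicts_per_s


LAM = 0.05  # hypothetical: one request to this lock every 20 s on average

u_fast, c_fast = lock_load(LAM, hold_s=0.002)  # H = 2 ms: app code, no think-time
u_slow, c_slow = lock_load(LAM, hold_s=4.0)    # H = 4 s: model output generated mid-transaction

print(f"utilization {u_fast:.4f} -> {u_slow:.2f}")                       # 0.0001 -> 0.20
print(f"one conflict every {1 / c_fast:,.0f} s -> {1 / c_slow:,.0f} s")  # 200,000 s -> 100 s
print(round(c_slow / c_fast))                                            # 2000
```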

L2 · Remedies

Old ideas, new urgency

Autocommit by default.
Optimistic concurrency with retry.
Multi-statement logic moved into stored procedures, invoked atomically.
Session-level staleness contracts — yesterday’s data needs no locks.
Weeks 5 and 6 take these apart properly.
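Optimistic concurrency with retry works like this in miniature: read a version, compute without holding any lock, and commit only if the version is unchanged. This is a minimal single-process sketch. `VersionedStore`, `compare_and_set`, and `occ_update` are hypothetical names. A real engine would validate whole read and write sets at commit time, not a single key.

```python
import threading
from typing import Callable


class VersionedStore:
    """Toy key-value store where every value carries a version number."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[int, int]] = {}  # key -> (version, value)
        self._latch = threading.Lock()                # held only for the instant of a read or commit

    def read(self, key: str) -> tuple[int, int]:
        with self._latch:
            return self._data.get(key, (0, 0))

    def compare_and_set(self, key: str, expected_version: int, value: int) -> bool:
        with self._latch:
            version, _ = self._data.get(key, (0, 0))
            if version != expected_version:
                return False                         # another session committed first: abort
            self._data[key] = (version + 1, value)
            return True


def occ_update(store: VersionedStore, key: str, fn: Callable[[int], int], max_retries: int = 5) -> int:
    """Apply fn to the current value; on conflict, retry from a fresh read. Returns attempts used."""
    for attempt in range(1, max_retries + 1):
        version, value = store.read(key)
        new_value = fn(value)                        # slow "think-time" happens here, no lock held
        if store.compare_and_set(key, version, new_value):
            return attempt
    raise RuntimeError(f"gave up on {key!r} after {max_retries} attempts")


store = VersionedStore()
store.compare_and_set("balance", 0, 100)  # first commit: version 0 -> 1
state = {"interfered": False}


def add_ten(value: int) -> int:
    if not state["interfered"]:           # simulate another session committing mid-"think"
        state["interfered"] = True
        version, current = store.read("balance")
        store.compare_and_set("balance", version, current + 1)
    return value + 10


attempts = occ_update(store, "balance", add_ten)
print(attempts, store.read("balance"))    # 2 (3, 111)
```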

L2 · What does not change

Agents don’t repeal physics

Unchanged: fsync costs what it costs.
B+-tree lookup is still O(log_B N) pages.
Buffer pool lives or dies by hit ratio; ARIES still recovers.
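Here is O(log_B N) in concrete numbers. The sketch assumes a hypothetical fanout of 300 keys per page and completely full pages. It counts index levels; a point query like the one in Lecture 1 then adds one heap page.

```python
def btree_levels(n_keys: int, fanout: int) -> int:
    """Levels a lookup touches (root through leaf), assuming every page is full."""
    if n_keys < 1 or fanout < 2:
        raise ValueError("need n_keys >= 1 and fanout >= 2")
    levels, reach = 1, fanout  # a single leaf page holds up to `fanout` keys
    while reach < n_keys:
        levels += 1
        reach *= fanout        # each extra level multiplies the keys reachable by `fanout`
    return levels


FANOUT = 300  # hypothetical keys per page
for n in (10_000, 10_000_000, 10_000_000_000):
    print(f"{n:>14,} rows -> {btree_levels(n, FANOUT)} index pages + 1 heap page")
```

Under these assumptions the three sizes need 2, 3, and 5 index levels. A thousandfold jump from 10 million to 10 billion rows costs two extra page reads.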

Two bins. Workload problems: caching, scheduling, context, API design.
Physics problems: already solved; don’t unsolve them with enthusiasm.
The interesting research lives at the boundary.

The database didn’t slow down. The client started thinking out loud while holding the lock.

— Week 1 lecture notes, DATA 2027

Checkpoint · Discussion

Before you leave

40 vs. 4,000 point queries: where does wall-clock time go?
Why do prepared statements help agents far more than humans?
Speculative fan-out: admit all k probes, serialize by p_i, or budget?

Readings · Due Thursday

Read before Thursday

Architecture of a Database System — Hellerstein, Stonebraker & Hamilton, 2007. Sections 1–4.
What Goes Around Comes Around — Stonebraker & Hellerstein, Readings in Database Systems, 4th ed., 2005.
Self-Service Analytics with Claude — Anthropic engineering blog, June 2026.