DATA 2027 · Week 01 · Part I — Foundations Under New Workloads

The Client Has Changed

Fifty years of database architecture assumed a human on the other end of the socket — this week we learn the classical machine, then measure what happens when the client becomes a language model.

Lecture 1 — Anatomy of a DBMS: the five components · Lecture 2 — The agentic workload, measured

Lecture 1 · Tuesday

Anatomy of a DBMS: The Five Components

The canonical map of the machine — and what each box assumes about you.

L1 · The canonical map

One paper, five boxes

L1 · The five boxes

The five components

L1 · Process manager

Deciding when work enters

L1 · Query processor

SQL string → result

[Figure 1: query-processor pipeline. SQL in; parser (tokens → AST); rewriter (views, folding); optimizer (enumerate plans); executor (pull iterators). Side boxes: catalog (even parsing reads it); buffer pool (page requests).]
Fig. 1 — Parser → rewriter → optimizer → executor. Even parsing requires catalog reads: a shared, cached, contended structure.
L1 · Storage manager

The deepest layer

L1 · Ownership boundaries

Why it survived five decades

L1 · One SELECT, end to end

The most boring query imaginable

SELECT o.status, o.total_cents
FROM   orders o
WHERE  o.id = 48121;

Sent from an application in the same availability zone.

L1 · One SELECT, end to end

Life of the query

L1 · The latency budget

Numbers, not vibes (~1.1 ms total)

Stage                        | Typical cost (warm) | Share
Network (RTT, same AZ)       | ~500 µs             | ~45%
Parse + catalog lookup       | ~50 µs              | ~5%
Plan (optimize)              | 100–300 µs          | ~15%
Execute (buffer-pool hit)    | 50–150 µs           | ~10%
Serialize + return           | ~100 µs             | ~10%
Connection setup (amortized) | ~150 µs             | ~15%
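
The total and the shares can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the midpoint of each range stands in for its typical cost; the midpoints and the stage keys are illustrative choices, not figures from the lecture.

```python
# Stage costs in microseconds, copied from the latency table.
# Ranges are (low, high); taking the midpoint is an assumption.
BUDGET_US = {
    "network_rtt": (500, 500),
    "parse_catalog": (50, 50),
    "plan": (100, 300),
    "execute": (50, 150),
    "serialize_return": (100, 100),
    "connection_setup": (150, 150),
}


def midpoint(lo_hi):
    lo, hi = lo_hi
    return (lo + hi) / 2


def total_us(budget):
    """Sum of the midpoint costs across all stages."""
    return sum(midpoint(r) for r in budget.values())


def shares(budget):
    """Fraction of the total spent in each stage."""
    total = total_us(budget)
    return {stage: midpoint(r) / total for stage, r in budget.items()}


if __name__ == "__main__":
    print(f"total ≈ {total_us(BUDGET_US) / 1000:.1f} ms")
    for stage, share in shares(BUDGET_US).items():
        print(f"{stage:18s} {share:6.1%}")
```

Under these midpoints, the network round trip and connection setup together account for about 59% of the total, while planning and execution together account for about 27%.
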
L1 · The latency budget

Two lessons hide in that table

L1 · Field note

“Relatively few, relatively long-lived connections”

100

Postgres ships with max_connections = 100 — every sizing default descends from the 2007 assumption. In 2024–2026, agents opened hundreds of short-lived connections per task; platforms bolted poolers (PgBouncer, RDS Proxy, Neon’s proxy) in front of every database they sold.
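
What a pooler buys can be sketched in a few lines. The following is an illustrative toy, not how PgBouncer, RDS Proxy, or Neon's proxy is implemented: `FakeConnection`, `ConnectionPool`, `run_agent_task`, and the pool size of 4 are hypothetical, and the "connection" is a stand-in object so the example runs without a database.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real backend connection (hypothetical)."""

    opened = 0  # how many backend connections were ever created

    def __init__(self):
        FakeConnection.opened += 1

    def execute(self, sql):
        return f"ok: {sql}"


class ConnectionPool:
    """Fixed-size pool: many short client sessions share few backend connections."""

    def __init__(self, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self, timeout=1.0):
        # Blocks while all connections are checked out; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand back to the pool instead of closing


def run_agent_task(pool, n_sessions):
    """Simulate one agent task that opens n_sessions short-lived sessions."""
    results = []
    for i in range(n_sessions):
        with pool.connection() as conn:
            results.append(conn.execute(f"SELECT {i}"))
    return results


if __name__ == "__main__":
    pool = ConnectionPool(size=4)
    run_agent_task(pool, n_sessions=300)
    print(FakeConnection.opened)  # backend connections actually created
```

The point of the sketch is the ratio: the number of client sessions is decoupled from the number of backend connections, so the database's connection limit bounds the latter only.
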

L1 · Where the pressure lands

Four pressure points, in oxblood

[Figure 2: block diagram. Boxes: CLIENT (now an agent); PROCESS MANAGER (admission control, dispatch); QUERY PROCESSOR (parser, rewriter, optimizer, executor); TXN STORAGE MANAGER (access methods, buffer pool, lock mgr, log mgr); SHARED COMPONENTS (catalog, memory mgr, replication, admin).]
Pressure points:
  1. Admission control: speculative fan-out
  2. Optimizer: near-dup plans
  3. Catalog: info_schema hot path
  4. Lock manager: LLM think-time holds locks for seconds
Fig. 2 — The five-component architecture (HS&H, FnT 2007). Nothing in the black ink changes this semester; everything in red does.
L1 · The implicit contract

What the architecture assumes about you

Lecture 2 · Thursday

The Agentic Workload, Measured

Strip away the hype: what does the workload look like on the wire?

L2 · The client changed

Who creates databases now?

80%

of new databases on Neon’s platform were created by agents, not humans (reported 2025). The dominant client of the managed-Postgres business changed in under two years.

L2 · Session shape

Tens become thousands

L2 · Session shape

The cost model inverts

L2 · Six measurable axes

Human (2015) vs. agent (2026)

Dimension            | Human client      | Agent client
Queries per session  | ~10–50            | ~500–5,000
Inter-statement gap  | seconds–minutes   | 50 ms–10 s (token gen)
Schema introspection | rare              | every session
Semantic duplicates  | low               | high (k textual variants)
Speculation          | none              | 3–10 parallel probes
Staleness tolerance  | implicit, unstated | often explicit and large
L2 · Catalog pressure

information_schema as a hot path

L2 · Curated context

The fix isn’t faster catalogs

21% → 95%

Anthropic’s self-service analytics (June 2026): agents succeeded on ~21% of warehouse questions with raw schema access, 95% with curated semantic context. A 4.5× improvement — not one byte from the storage engine.

L2 · Speculative fan-out

Agents hedge

L2 · Near-duplicates

A thesis-shaped hole
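
A cheap first step against "k textual variants" is purely textual. The sketch below is a minimal illustration under stated assumptions: `normalize`, `fingerprint`, and the regexes are hypothetical helpers, not any real system's normalizer. It collapses case, whitespace, and literal values, so it catches only trivial variants; semantically equivalent rewrites still get different fingerprints.

```python
import hashlib
import re

_STRING_LIT = re.compile(r"'(?:[^']|'')*'")      # SQL string literal, '' as escape
_NUMBER_LIT = re.compile(r"\b\d+(?:\.\d+)?\b")   # standalone numeric literal
_WHITESPACE = re.compile(r"\s+")


def normalize(sql):
    """Strip literals, whitespace runs, a trailing semicolon, and case."""
    s = _STRING_LIT.sub("?", sql)   # string literals first, so digits inside them vanish too
    s = _NUMBER_LIT.sub("?", s)
    s = _WHITESPACE.sub(" ", s).strip().rstrip(";").strip()
    return s.lower()


def fingerprint(sql):
    """Short, stable key for a normalized statement."""
    return hashlib.sha256(normalize(sql).encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    a = "SELECT o.status, o.total_cents\nFROM   orders o\nWHERE  o.id = 48121;"
    b = "select o.status, o.total_cents from orders o where o.id = 7"
    c = "SELECT o.status, o.total_cents FROM orders AS o WHERE 48121 = o.id"
    print(fingerprint(a) == fingerprint(b))  # textual variants collapse
    print(fingerprint(a) == fingerprint(c))  # an equivalent rewrite does not
```

Variants `a` and `b` share a fingerprint; `c` returns the same rows but does not, which is the part a textual key cannot reach.
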

L2 · The lock-manager poison

Think-time inside transactions

[Figure 3: two transaction timelines.
Human-era txn, think-time = app code (µs): BEGIN, UPDATE, COMMIT; lock held H ≈ 2 ms.
Agent txn, think-time = token generation (s): BEGIN, UPDATE, model decides what to do next (500 ms – 10 s), COMMIT; lock held H ≈ 4 s, a 2,000× increase. Queue depth grows like λ·H → a thousand waiters.]
Fig. 3 — Under two-phase locking, expected queue depth grows like λ·H. The same arithmetic hits MVCC more gently: long-lived snapshots block vacuum and bloat version chains.
L2 · The lock-manager poison

Do the arithmetic

2,000×

Raise lock-hold time H from 2 ms to 4 s. A lock that conflicted once a day now backs up a thousand waiters — expected queue depth grows like λ·H.
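
The arithmetic can be written out directly. λ·H is the expected number of conflicting requests that arrive while one holder keeps the lock. In this minimal sketch the two hold times come from the lecture; the arrival rate of 250 requests per second is a hypothetical value chosen so the result lands on "a thousand waiters", not a measurement.

```python
HUMAN_HOLD_S = 0.002   # H ≈ 2 ms (from the lecture)
AGENT_HOLD_S = 4.0     # H ≈ 4 s (from the lecture)
ASSUMED_RATE = 250.0   # λ in conflicting requests/s: hypothetical


def arrivals_during_hold(arrival_rate_per_s, hold_time_s):
    """Expected conflicting requests arriving during one lock hold (λ·H)."""
    return arrival_rate_per_s * hold_time_s


if __name__ == "__main__":
    human = arrivals_during_hold(ASSUMED_RATE, HUMAN_HOLD_S)
    agent = arrivals_during_hold(ASSUMED_RATE, AGENT_HOLD_S)
    print(f"human-era: {human:.1f} waiters per hold")
    print(f"agent:     {agent:.0f} waiters per hold")
    print(f"ratio:     {AGENT_HOLD_S / HUMAN_HOLD_S:.0f}x")
```

At the assumed rate, a 2 ms hold sees about half a conflicting request on average, while a 4 s hold sees about a thousand. The ratio depends only on H, so it stays at 2,000× for any λ.
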

L2 · Remedies

Old ideas, new urgency

L2 · What does not change

Agents don’t repeal physics

  • Unchanged: fsync costs what it costs.
  • B+-tree lookup is still O(log_B N) pages.
  • Buffer pool lives or dies by hit ratio; ARIES still recovers.
  • Two bins: workload problems — caching, scheduling, context, API design.
  • Physics problems — already solved; don’t unsolve them with enthusiasm.
  • The interesting research lives at the boundary.
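
The logarithmic bound on B+-tree lookups in the list above is easy to make concrete. This is a minimal sketch for an idealized, fully packed tree; the fanout of 256 and the key counts are assumptions picked for illustration, and `btree_height` is a hypothetical helper.

```python
def btree_height(n_keys, fanout):
    """Smallest h >= 1 with fanout**h >= n_keys: pages touched by one lookup
    in an idealized, fully packed B+-tree."""
    if n_keys < 1 or fanout < 2:
        raise ValueError("need n_keys >= 1 and fanout >= 2")
    height, capacity = 0, 1
    while capacity < n_keys:   # integer arithmetic avoids float log rounding
        capacity *= fanout
        height += 1
    return max(height, 1)      # even a one-key tree has a root page


if __name__ == "__main__":
    for n in (10**6, 10**9):
        print(n, btree_height(n, fanout=256))
```

With the assumed fanout of 256, a million keys need 3 levels and a billion keys need 4. A thousandfold growth in N adds one page read per lookup, whoever the client is.
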
The database didn’t slow down. The client started thinking out loud while holding the lock.
— Week 1 lecture notes, DATA 2027
Checkpoint · Discussion

Before you leave

Readings · Due Thursday

Read before Thursday