- Name the five architectural components of a classical DBMS (Hellerstein, Stonebraker & Hamilton, FnT 2007) and state what each owns and what each assumes about the client.
- Trace a single
SELECTend-to-end — parser, rewriter, optimizer, executor, buffer pool — and assign a defensible latency budget to each stage. - Characterize the agentic workload quantitatively: session length, fan-out, duplicate rate, introspection frequency, and inter-statement think-time.
- Explain why LLM think-time inside open transactions poisons lock-based concurrency control, with a back-of-envelope lock-hold-time argument.
- Identify which of the classical assumptions break under agents (and which don’t — the physics of B-trees and fsync survive unchanged).
Anatomy of a DBMS: The Five Components
Every serious relational engine — Postgres, MySQL, Oracle, SQL Server, DB2 — converged on essentially the same internal organization, and the canonical map of it is Hellerstein, Stonebraker & Hamilton’s Architecture of a Database System (Foundations and Trends in Databases, 2007). The paper divides the system into five interacting components, and the division is not arbitrary: each component encapsulates one hard systems problem. You should be able to draw this diagram from memory by Friday, because the rest of the semester is a guided tour of what happens to each box when its oldest assumption — a human client — quietly stops being true.
The five boxes
First, the process manager handles admission control and the mapping of client connections to workers — process-per-connection (classic Postgres), thread-per-connection (MySQL), or a thread pool with a worker queue (SQL Server). Its core job is to decide when work enters the system: admit too much and you thrash the buffer pool; admit too little and you waste the hardware. Second, the query processor turns SQL strings into results: parser → rewriter → optimizer → executor. Third, the transactional storage manager is the deepest layer: access methods (heaps, B+-trees), the buffer pool, the lock manager, and the log manager, welded together so that ACID emerges from their cooperation — this is where ARIES-style write-ahead logging and two-phase locking live. Fourth and fifth, the paper groups shared components (the catalog, the memory allocator, replication and loading utilities) and administration and monitoring. Memorize the ownership boundaries: the optimizer never touches a disk page; the buffer pool never sees SQL; the lock manager doesn’t know what a join is. That separation of concerns is the reason these systems survived five decades of hardware churn.
One SELECT, end to end
Take the most boring query imaginable, sent from an application in the same availability zone:
SELECT o.status, o.total_cents
FROM orders o
WHERE o.id = 48121;
The bytes arrive over TCP and the process manager hands them to a worker. The parser tokenizes and builds an AST, resolving orders and its columns against the catalog — note this carefully: even parsing requires catalog reads, which is why the catalog is a shared, cached, contended structure. The rewriter expands views and applies constant folding. The optimizer enumerates plans — here trivially: a unique index on id exists, so it’s an index lookup, selectivity 1/N, cost roughly 4 page reads (3 B+-tree levels plus one heap page). The executor runs the plan as a pull-based iterator tree: IndexScan calls into the access method, the access method asks the buffer pool for page 7841, and — if we’re lucky — the page is already resident and pinned in a frame, so no I/O occurs. A shared lock (or in Postgres’s MVCC world, a snapshot visibility check) confirms the row version is visible to our transaction. One tuple flows up the iterator tree, gets serialized into the wire protocol, and goes back over TCP.
The latency budget
Numbers, not vibes. For the warm-cache point lookup above, against a well-tuned Postgres in the same AZ:
| Stage | Typical cost (warm) | Share of ~1.1 ms total | Notes |
|---|---|---|---|
| Network (RTT, same AZ) | ~500 µs | ~45% | Cross-region: 30–80 ms — swamps everything |
| Parse + catalog lookup | ~50 µs | ~5% | Avoidable via prepared statements |
| Plan (optimize) | 100–300 µs | ~15% | Grows superlinearly with join count |
| Execute (buffer-pool hit) | 50–150 µs | ~10% | A cold read adds 100 µs (NVMe) to 10 ms (network storage) |
| Serialize + return | ~100 µs | ~10% | Per-row cost; large result sets shift the budget here |
| Connection setup (amortized) | ~150 µs | ~15% | Un-amortized: a fresh Postgres connection costs 10–50 ms |
Two lessons hide in this table. First, for point queries the database is not the bottleneck — the wire is, which means a client that issues 100× more queries multiplies the part of the budget the DBMS can’t optimize away. Second, parse and plan together cost more than execution. For a human issuing twelve queries an hour, who cares. For a client issuing thousands of structurally-similar queries, the parse/plan column suddenly becomes the system’s hottest code path — hold that thought until Thursday.
max_connections = 100. In 2024–2026, serverless platforms watched agents open hundreds of short-lived connections per task and had to bolt poolers (PgBouncer, RDS Proxy, Neon’s proxy) in front of every database they sold. The architecture didn’t bend; the deployment did.
Where the pressure lands
Here is the classical diagram, with the four places the agentic workload presses hardest marked in oxblood. Keep it in view during Thursday’s lecture.
What the architecture assumes about you
Close Tuesday by making the implicit explicit. The classical design assumes: (a) connections are long-lived and few; (b) queries arrive at human cadence, so per-query overheads like parsing are noise; (c) the working set is shaped by a stable application, so the buffer pool’s LRU-ish policies converge; (d) transactions are short because application code between statements executes in microseconds; (e) metadata queries are rare — the developer read the schema once, in a design doc. Every one of these is a statement about client behavior, not about storage physics. Which is exactly why they’re the assumptions that can break.
The Agentic Workload, Measured
Strip away the hype and ask the only question a systems person should ask: what does the workload look like on the wire? When the client is an LLM-driven agent — a coding assistant exploring a schema, an analytics agent answering a business question, a provisioning agent standing up infrastructure — the traces look unlike anything the 2007 architecture was tuned for. Neon reported in 2025 that roughly 80% of new databases on its platform were created by agents rather than humans; the dominant client of the managed-Postgres business changed in under two years. The workload changed with it, along five measurable axes.
Session shape: tens become thousands
A human analyst at a SQL prompt issues on the order of tens of queries per session — write, stare, sip coffee, refine. An agent solving one task routinely issues hundreds to thousands: it introspects the schema, samples rows from candidate tables, tries a query, reads the error, retries, cross-checks its answer with a second formulation, then runs the final query. The per-query cost model inverts. At 40 queries/session, a 300 µs plan cost is 12 ms of total optimizer time — irrelevant. At 4,000 queries, it’s 1.2 seconds of pure CPU per session, multiplied by every concurrent agent. Fixed per-query overheads that were engineering rounding errors become first-order terms. This is the central quantitative claim of the course: agents don’t change the physics; they change which terms in the cost model dominate.
| Dimension | Human client (circa 2015) | Agent client (circa 2026) |
|---|---|---|
| Queries per session | ~10–50 | ~500–5,000 |
| Inter-statement gap | seconds–minutes (thinking, typing) | 50 ms–10 s (token generation) |
| Schema introspection | rare; humans read docs once | every session; information_schema is the hot path |
| Semantic duplicates | low; humans reuse their last query | high; same intent, k textual variants |
| Speculation | none; one query at a time | fan-out of 3–10 parallel probes, most discarded |
| Error handling | stop, read, ask a colleague | instant retry, often with the same mistake rephrased |
| Staleness tolerance | implicit, unstated | often explicit and large — “yesterday’s data is fine” |
information_schema as a hot path
An agent has no persistent memory of your schema. Every session begins with discovery: SELECT table_name, column_name, data_type FROM information_schema.columns …, foreign-key crawls over pg_constraint, a few SELECT * … LIMIT 5 samples to learn value distributions. Catalog views were designed as cold administrative paths — many are implemented as unindexed joins over a dozen system tables — and now they’re executed at the top of every agent session, by every agent, concurrently. The fix is not faster catalogs; it’s curated context. Anthropic’s self-service analytics work (June 2026) is the canonical measurement: agents answering business questions over the warehouse succeeded on roughly 21% of questions with raw schema access, and 95% once given curated semantic context — table descriptions, certified metric definitions, join-path documentation. A 4.5× improvement, and not one byte of it came from the storage engine. The leverage moved up the stack, into metadata. Week 9 is entirely about this.
Speculative fan-out and near-duplicate queries
Agents hedge. Asked “why did revenue dip in March?”, a planner-style agent may dispatch five probe queries in parallel — by region, by product, by channel, by cohort, by data-quality check — knowing it will use one or two. From the database’s perspective this is a burst of concurrent, expensive, mostly wasted analytical work, which classical admission control (built to protect against too many users, not too many hypotheses from one user) handles badly. Relatedly, agents generate near-duplicate semantic queries: WHERE created_at >= '2026-03-01' versus WHERE created_at > '2026-02-28', aliases renamed, column order shuffled. Textual plan caches miss; the optimizer replans from scratch each time. A cache keyed on normalized plan structure — or a result cache keyed on semantic equivalence — would absorb most of it. No mainstream engine ships one yet. This is a thesis-shaped hole.
Think-time inside transactions: the lock-manager poison
The sharpest failure is also the simplest. Interactive transactions assume the client’s “think time” between statements is application code: microseconds. An agent’s think time between statements is token generation: 500 ms to 10 s while the model decides what to do next. Now do the arithmetic for a two-phase-locking system. If a transaction holds a row lock for duration H and conflicting transactions arrive at rate λ, expected queue depth grows like λ·H. Raise H from 2 ms to 4 s — a 2,000× increase — and a lock that conflicted once a day now backs up a thousand waiters. The same arithmetic applies, gentler but real, to MVCC: long snapshots block vacuum/garbage collection and bloat version chains. The remedies are old ideas with new urgency: autocommit by default, optimistic concurrency with retry, moving multi-statement logic into the database (stored procedures the agent invokes atomically), or — most interesting — session-level staleness contracts, since the agent that tolerates yesterday’s data needs no locks at all. Weeks 5 and 6 take these apart properly.
What does not change
End with discipline. The agent client does not repeal physics. fsync still costs what it costs; a B+-tree point lookup is still O(logB N) page accesses; the buffer pool still lives or dies by its hit ratio; ARIES still recovers correctly. Serializability is still defined by conflict graphs, and an agent that needs read-modify-write isolation needs it exactly as much as a Perl script did in 1999. The discipline of this course is to sort every observed “AI database problem” into one of two bins: workload problems, which we solve with caching, scheduling, admission control, context curation, and API design — and physics problems, which were already solved and must not be unsolved by enthusiasm. Most of what you’ll read on vendor blogs belongs in bin one. The interesting research lives at the boundary.
Read Before Thursday
This Week’s Problems
Using the Tuesday latency-budget table, compute total wall-clock time spent in (a) network, (b) parse+plan, and (c) execute, for a human session of 40 point queries and an agent session of 4,000. Repeat with cross-region networking (RTT 60 ms). For each of the four scenarios, name the single optimization with the highest payoff, and state in one sentence why prepared statements help the agent far more than they ever helped the human.
Build a synthetic agent against local Postgres. Your script must: (1) introspect information_schema.columns and sample 5 rows from each table; (2) issue 500 near-duplicate queries — same semantics, varied literals, aliases, and whitespace; (3) open a transaction, update one row, sleep a “think-time” T drawn uniformly from [0.5 s, 5 s], then commit, while 8 concurrent workers contend for the same row. Using pg_stat_statements and pg_locks, report: parse+plan time as a fraction of total query time in phase 2 (with and without prepared statements), and p50/p99 lock-wait time in phase 3 as a function of T. Two plots, one paragraph of interpretation each.
Design an admission-control policy for speculative fan-out. An agent dispatches k parallel probe queries with estimated costs c₁…c_k, will ultimately consume the results of exactly one (unknown a priori, with prior probabilities p_i), and may cancel the rest the moment it commits to a branch. Formalize the scheduler’s decision: under what conditions on k, the p_i, system load ρ, and cancellation latency does admitting all probes minimize the agent’s expected end-to-end latency, versus serializing them in descending p_i order, versus admitting a budgeted subset? Propose a concrete protocol extension (wire-level) by which an agent could declare “speculative, priority p_i, cancellable” per statement, and argue what the executor should do differently with that hint. There is no known complete answer; a rigorous partial model with honest stated assumptions earns full credit.