DATA 2027 DATA SYSTEMS IN THE AGENTIC ERA
Reference · Resources & Reading List

Everything Worth Reading

All forty-two weekly readings in one place, annotated; the two books that hold the course together; the benchmarks, the tools, and where to look after Week 14 ends.

Scope: Weeks 1–14, Labs 1–4  ·  Format: citations + one-line reading guidance  ·  Duplicates: two readings are assigned twice, on purpose

§ 1 · The Canon

The 42 Weekly Readings, Annotated

Three readings a week for fourteen weeks. The list below is the complete canon, organized by part, with each paper's one-line reading instruction carried over from its week page. Two readings appear in two different weeks each — the 2005 What Goes Around Comes Around and the Anthropic self-service analytics post — so the deduplicated list runs to forty entries. Where a reading is assigned twice, both week numbers are shown: the repetition is the assignment.

Part I — Foundations Under New Workloads (Weeks 1–4)

1.1
Architecture of a Database System — Hellerstein, Stonebraker & Hamilton, Foundations and Trends in Databases, 2007. Sections 1–4 only.
The canonical map of the machine. Focus on the process model (§2) and the life of a query (§1.1) — and, on every page, ask which stated assumption is about client behavior.
1.2
What Goes Around Comes Around — Stonebraker & Hellerstein, in Readings in Database Systems, 4th ed., 2005. Assigned twice: Weeks 1 & 14.
WEEK 1: thirty-five years of data-model fashion cycles in twenty pages — read it as inoculation against rebranded old ideas.
WEEK 14: skim the XML chapter and grade its 2005 predictions against the 2024 scorecard — a rare controlled experiment in technological forecasting.
1.3
Self-Service Analytics with Claude — Anthropic engineering blog, June 2026. Assigned twice: Weeks 1 & 9.
The 21%→95% result.
WEEK 1: notice that none of the gains required touching the database engine.
WEEK 9: focus on what the curated skills actually contain (metric definitions, join guidance, pitfalls) and who maintains them — it is a semantic layer in everything but name.
2.1
Database Internals, ch. 2–4 & 7 — Alex Petrov, O’Reilly, 2019.
Ch. 2–4 give you the B-tree at implementation depth (cell layouts, splits, B-link); ch. 7 is the cleanest LSM treatment in print. Read with a pencil; redo the fanout math for 16 KB pages.
2.2
Designing Access Methods: The RUM Conjecture — Athanassoulis, Kester, Maas, Stoica, Idreos, Ailamaki & Callaghan, EDBT 2016.
Short and sharp. Focus on the overhead definitions in §2 and the design-space figure; come to class able to say which corner the last system you used had silently chosen.
2.3
The Log-Structured Merge-Tree (LSM-Tree) — O’Neil, Cheng, Gawlick & O’Neil, Acta Informatica, 1996.
Read §1–3 for the rolling-merge idea; skim the rest. The cost model is HDD-era — your job is to notice exactly which assumptions flash and NVMe broke, and which survived.
3.1
“One Size Fits All”: An Idea Whose Time Has Come and Gone — Stonebraker & Çetintemel, ICDE 2005.
Read for the method, not the predictions: how to argue from workload characteristics to architecture. Note which of its bets aged well (warehousing, streams) and which didn’t.
3.2
C-Store: A Column-oriented DBMS — Stonebraker et al., VLDB 2005.
Focus on §3–4: projections, sort-order-dependent compression, and the WS/RS split. Ask yourself at every design choice: which corner of the RUM triangle is being traded away?
3.3
MonetDB/X100: Hyper-Pipelining Query Execution — Boncz, Zukowski & Nes, CIDR 2005.
Focus on §2 (why TPC-H Q1 gets <10% of hand-coded performance in tuple-at-a-time engines) and the vector-size experiment — the U-shaped curve is the whole lecture in one figure.
4.1
Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases — Verbitski et al., SIGMOD 2017.
The canonical “ship only the log” paper. Focus on §2–3: the write amplification accounting in Figure 2, and derive the 4/6–3/6 quorum from the AZ+1 model yourself before reading their derivation.
4.2
The Snowflake Elastic Data Warehouse — Dageville et al., SIGMOD 2016.
Read for the three-layer architecture and §3.3 on micro-partitions and pruning. Ask at every section: which property here is enabled by immutability alone?
4.3
Building an Elastic Query Engine on Disaggregated Storage — Vuppalapati et al., NSDI 2020.
The production retrospective: what the 2016 bet got right, measured. Focus on the workload skew and cache hit-rate data, and the open problem of intermediate (shuffle/spill) data.
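
Entry 4.1 asks you to derive the quorum yourself; a brute-force search is one way to check your answer afterwards. The sketch below assumes the textbook quorum rules (a read set must overlap every write set, and any two write sets must overlap) plus the two AZ+1 goals as the Aurora paper states them: writes survive the loss of one AZ, and reads survive the loss of one AZ plus one more node. The function name and its parameters are illustrative, not taken from the paper.

```python
def az_plus_one_quorums(azs=3, copies_per_az=2):
    """Enumerate (write_quorum, read_quorum) pairs that meet the AZ+1 goals.

    Assumption: copies are spread evenly, `copies_per_az` in each of `azs` zones.
    """
    v = azs * copies_per_az                # total copies of each data item
    left_after_az = v - copies_per_az      # copies remaining after losing one AZ
    left_after_az_plus_one = left_after_az - 1  # ...and one additional node

    feasible = []
    for vw in range(1, v + 1):
        for vr in range(1, v + 1):
            # Quorum consistency: reads see the latest write, writes do not conflict.
            consistent = (vr + vw > v) and (vw > v / 2)
            # Availability goals from the AZ+1 failure model.
            writes_survive_az = vw <= left_after_az
            reads_survive_az_plus_one = vr <= left_after_az_plus_one
            if consistent and writes_survive_az and reads_survive_az_plus_one:
                feasible.append((vw, vr))
    return v, feasible


if __name__ == "__main__":
    print(az_plus_one_quorums(3, 2))  # six copies across three AZs
    print(az_plus_one_quorums(3, 1))  # three copies, one per AZ
```

Under these assumptions, six copies leave exactly one feasible pair (write 4, read 3), while one copy per AZ leaves none, which is the argument against a 2-of-3 quorum.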

Part II — New Access Methods & Engines (Weeks 5–8)

5.1
The Case for Learned Index Structures — Kraska, Beutel, Chi, Dean & Polyzotis, SIGMOD 2018.
Read §1–3 closely for the CDF framing and the RMI; treat the eval skeptically and bring one benchmark objection (the SOSD authors found several).
5.2
Bao: Making Learned Query Optimization Practical — Marcus, Negi, Mao, Tatbul, Alizadeh & Kraska, SIGMOD 2021.
Focus on §2’s design constraints and the Thompson-sampling loop — note how every choice traces back to a specific deployment failure of Neo.
5.3
SageDB: A Learned Database System — Kraska, Alizadeh, Beutel, Chi, Ding, Kristo, Leclerc, Madden, Mao & Nathan, CIDR 2019.
Read as a manifesto, not a system paper; as you read, mark each proposed component as policy or mechanism, and check your marks against what Redshift shipped.
6.1
Product Quantization for Nearest Neighbor Search — Jégou, Douze & Schmid, IEEE TPAMI, 2011.
The codebook math behind every compressed vector index. Work through §III until the 256^16-codebook trick and the asymmetric distance tables feel obvious; skim the GIST results.
6.2
Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs — Malkov & Yashunin, IEEE TPAMI, 2018.
Focus on the neighbor-selection heuristic (Alg. 4) and the layer assignment — the skip-list analogy is in the paper. Note carefully what the memory model assumes.
6.3
DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node — Subramanya, Devvrit, Simhadri, Krishnaswamy & Kadekodi, NeurIPS 2019.
Read for the systems decisions, not the graph theory: why α > 1, why PQ steers while SSD stores, why node + adjacency share a block. Compare its cost model against HNSW’s before class.
7.1
Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics — Armbrust, Ghodsi, Xin & Zaharia, CIDR 2021.
Read it as an argument, not an ad: extract the three technical bets in §3 and decide which one carries the most risk. Note what the paper says two-tier architectures cost in staleness and copies.
7.2
Apache Iceberg Table Spec, v2 — Apache Software Foundation; pair with Ryan Blue’s “Iceberg: a fast table format for S3” talk, Netflix, 2018.
Focus on the manifest and snapshot sections: find where per-column bounds live and convince yourself the commit protocol needs nothing stronger than one CAS. The talk supplies the Hive war stories the spec politely omits.
7.3
Photon: A Fast Query Engine for Lakehouse Systems — Behm, Palkar, Agarwal et al., SIGMOD 2022.
Focus on §3’s decision to vectorize-and-interpret rather than code-generate (and the JVM pathologies in §2), plus the adaptive per-batch kernels. Skim the eval with Week 2’s benchmark skepticism.
8.1
Calvin: Fast Distributed Transactions for Partitioned Database Systems — Thomson, Diamond, Weng, Ren, Shao & Abadi, SIGMOD 2012.
Focus on §3 (the sequencer) and §3.2.1 (dependent transactions / OLLP) — convince yourself why determinism really removes 2PC, and what it costs.
8.2
Spanner: Google’s Globally-Distributed Database — Corbett et al., OSDI 2012.
Read §3 (TrueTime) and §4.1.2 (commit wait) closely; skim the rest. Work the invariant: why must locks be held through the wait?
8.3
Neon architecture posts: “Architecture decisions in Neon” & the pageserver/branching deep-dives — Neon engineering blog, 2022–24.
Focus on GetPage@LSN, the layer-file map, and what branch creation actually writes — verify Thursday’s O(metadata) claim against their design.
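
Entry 7.2 claims that a table-format commit needs nothing stronger than one compare-and-swap. The sketch below is a toy in-memory model of that idea, not the Iceberg API: `Metadata`, `ToyCatalog`, and `commit_append` are hypothetical names, and a `threading.Lock` stands in for whatever atomic swap a real catalog provides. Writers build a new immutable metadata version off the one they read, then try to swap the catalog pointer; a writer that loses the race rebases and retries.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    """Immutable table metadata: a version number plus the data files it lists."""
    version: int
    files: tuple


class ToyCatalog:
    """Hypothetical catalog holding a single pointer to the current metadata."""

    def __init__(self):
        self._current = Metadata(version=0, files=())
        self._lock = threading.Lock()  # stands in for the store's atomic swap

    def current(self):
        return self._current

    def compare_and_swap(self, expected, new):
        """Point at `new` only if the pointer still refers to `expected`'s version."""
        with self._lock:
            if self._current.version != expected.version:
                return False
            self._current = new
            return True


def commit_append(catalog, new_files, max_retries=10):
    """Optimistic commit: derive new metadata from the current base, then CAS."""
    for _ in range(max_retries):
        base = catalog.current()
        candidate = Metadata(base.version + 1, base.files + tuple(new_files))
        if catalog.compare_and_swap(base, candidate):
            return candidate
        # Another writer committed first; loop to rebase on the new current version.
    raise RuntimeError("commit failed after retries")
```

Everything a writer produces before the swap is invisible to readers, so the swap is the only step that has to be atomic.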

Part III — Semantics, Agents, Governance (Weeks 9–12)

9.1
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows — Lei et al., ICLR 2025.
The 86→17 cliff, quantified. Focus on §3’s task construction and the error analysis: count how many failures are schema-scale or business-logic problems rather than SQL-skill problems.
9.2
Can LLM Already Serve as a Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs (BIRD) — Li et al., NeurIPS 2023.
Read for the “external knowledge” design decision — the first benchmark to admit questions underdetermine SQL — and bring a skeptical eye to the gold annotations; we discuss the audit findings in class.
9.3
How Anthropic Enables Self-Service Data Analytics with Claude — Anthropic engineering blog, June 2026. See entry 1.3 above — assigned in both Weeks 1 and 9; the Week-9 reading instruction is folded into that entry.
Reread it after Spider 2.0 and BIRD: the same post reads completely differently once you know what the benchmarks can’t measure.
10.1
Semantic Operators: A Declarative Model for Rich, AI-Based Data Processing (LOTUS) — Patel, Jha, Asawa, Pan, Guestrin & Zaharia, VLDB 2025.
The algebra itself. Focus on the formal semantics of accuracy targets and the cascade algorithms for sem_filter/sem_join — especially how thresholds are calibrated to give statistical guarantees, not vibes.
10.2
Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing — Liu, Russo, Cafarella et al., CIDR 2025.
The optimizer story. Focus on the physical plan space (model tiers, fusion, code synthesis) and how plan search handles the (runtime, cost, quality) Pareto frontier; skim the implementation section.
10.3
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing — Shankar, Chambers, Shah, Parameswaran & Wu, VLDB 2025.
Optimization beyond physical operator choice. Focus on the rewrite directives and on how LLM-as-judge validates rewrites — ask yourself where this validation loop could be fooled.
11.1
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory — Chhikara, Khant, Aryan, Singh & Yadav, arXiv:2504.19413, 2025.
Read §3 as a write-path spec: extraction then ADD/UPDATE/DELETE/NOOP consolidation. Focus on the latency/token tables and ask where an audit log would have to live.
11.2
Zep: A Temporal Knowledge Graph Architecture for Agent Memory — Rasmussen, Paliychuk, Beauvais, Ryan & Chalef, arXiv:2501.13956, 2025.
The Graphiti bi-temporal edge model is the heart. Map t_valid/t_invalid onto SQL:2011 periods and Type-2 SCDs as you read — the correspondence is nearly exact.
11.3
MemGPT: Towards LLMs as Operating Systems — Packer, Wooders, Lin, Fang, Patil, Stoica & Gonzalez, arXiv:2310.08560, 2023.
Read the OS metaphor adversarially: main/external context is a buffer pool with the application as its own buffer manager. List the failure modes that DBMSs solved by NOT doing this.
12.1
Model Context Protocol — Specification — Anthropic & the MCP community, 2024–2025.
Read the architecture and the tools/resources/prompts sections. Focus on the trust boundaries and what the spec explicitly leaves to server authors — that list is the attack surface.
12.2
The Lethal Trifecta & The Supabase MCP can leak your entire SQL database — Simon Willison, 2025.
The trifecta post is the mental model; the Supabase walkthrough is the proof. Trace each leg of the trifecta onto each step of the exploit as you read.
12.3
OWASP Top 10 for LLM Applications — OWASP, 2025.
Read LLM01 (Prompt Injection), LLM02 (Sensitive Information Disclosure), and the excessive-agency entry. Map each to a database control from Thursday’s defenses table.
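
Entry 11.2 asks you to map t_valid/t_invalid onto Type-2 slowly changing dimensions. A minimal sketch of that mapping, using SQLite from the Python standard library: only the two column names come from the entry above, while the table name, the sample rows, and the `as_of` helper are invented for illustration. It covers the valid-time axis only, not the transaction-time axis that makes the model bi-temporal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fact_edge (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL,
        t_valid   TEXT NOT NULL,  -- start of validity, inclusive (ISO date)
        t_invalid TEXT            -- end of validity, exclusive; NULL = still valid
    )
    """
)
# Two versions of one fact, laid out exactly like Type-2 SCD rows.
conn.executemany(
    "INSERT INTO fact_edge VALUES (?, ?, ?, ?, ?)",
    [
        ("alice", "works_at", "Acme", "2021-01-01", "2023-06-01"),
        ("alice", "works_at", "Globex", "2023-06-01", None),
    ],
)


def as_of(conn, subject, predicate, when):
    """Return the objects valid at `when`, treating each row as [t_valid, t_invalid)."""
    rows = conn.execute(
        """
        SELECT object FROM fact_edge
        WHERE subject = ? AND predicate = ?
          AND t_valid <= ?
          AND (t_invalid IS NULL OR ? < t_invalid)
        ORDER BY t_valid
        """,
        (subject, predicate, when, when),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(as_of(conn, "alice", "works_at", "2022-03-15"))
    print(as_of(conn, "alice", "works_at", "2023-06-01"))
```

The half-open interval means the changeover date belongs to the newer row only, so an as-of query never returns both versions of the same fact.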

Part IV — Frontier & Futures (Weeks 13–14)

13.1
Self-Driving Database Management Systems — Pavlo et al., CIDR 2017.
The manifesto. Focus on the forecast→plan→act architecture and the explicit analogy ladder from “advisor” to “autonomous” — then ask which rung an LLM agent occupies.
13.2
The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models — Idreos et al., SIGMOD 2018.
Read for the design-space framing, not the implementation: layout primitives, the 10^32 continuum, and how learned micro-benchmark models compose into whole-structure cost predictions.
13.3
OtterTune postmortem — A. Pavlo, blog post, 2024.
A rare honest startup autopsy by its own founder. Focus on why the science worked and the business didn’t: episodic value, and platforms absorbing the feature.
14.1
What Goes Around Comes Around… And Around… — M. Stonebraker & A. Pavlo, SIGMOD Record 53(2), 2024.
The lecture’s backbone. Focus on the scoring of NoSQL and graph, and on which “lessons” the authors say never change — you will cite both sides of it in the debate.
14.2
The Seattle Report on Database Research — D. Abadi et al., CACM 65(8), 2022.
The field’s last pre-agentic self-portrait. Focus on what the community ranked urgent in 2022 versus what this course argued matters now — the gaps are your debate ammunition.
14.3
What Goes Around Comes Around — Stonebraker & Hellerstein, 2005. See entry 1.2 above — assigned in both Weeks 1 and 14; the Week-14 reading instruction is folded into that entry.
The pairing with 14.1 is the point: the same authorship lineage grading its own twenty-year-old predictions.

A note on links
Where a reading lists only a venue, find it through the venue’s proceedings or the authors’ pages — every paper above is freely available from at least one of those. We deliberately don’t pin URLs for papers: they rot faster than citations do.

§ 2 · The Two Books

Petrov, and the Alternate On-Ramp

Only one book is required: Alex Petrov’s Database Internals (O’Reilly, 2019; databass.dev). It is the course’s implementation-depth backstop — when a lecture asserts something about page layouts, recovery, or consensus and you want to see the actual mechanics, Petrov is where you go. Only chapters 2–4 and 7 are formally assigned (Week 2), but the rest of the book shadows the syllabus:

Chapters | Topic | Where it lands in DATA 2027
ch. 1 | Introduction; storage-engine taxonomy | Background for Week 1
ch. 2–4 | B-tree basics, file formats, implementing B-trees | Assigned, Week 2; foundation for Lab 1
ch. 5 | Transaction processing & recovery (WAL, ARIES-style thinking) | Weeks 2 & 8; Lab 2’s WAL design
ch. 6 | B-tree variants (B-link, copy-on-write trees) | Week 2 skim; copy-on-write returns in Week 8
ch. 7 | Log-structured storage (LSM-trees, compaction) | Assigned, Week 2; the spec for Lab 1
ch. 8–10 | Distributed systems primer; failure detection; leader election | Background for Week 4
ch. 11–12 | Replication, consistency, anti-entropy | Weeks 4 & 8
ch. 13–14 | Distributed transactions; consensus | Week 8, alongside Calvin and Spanner
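
The Week 2 instruction for chapters 2–4 is to redo the fanout math for 16 KB pages. A minimal sketch of that arithmetic follows, under loudly simplified assumptions: fixed-size keys and child pointers (8 bytes each in the example), an optional fixed page header, every node completely full, and leaves holding as many keys as an internal node has children. None of these sizes come from Petrov; substitute the ones your engine uses.

```python
def btree_fanout(page_bytes, key_bytes, pointer_bytes, header_bytes=0):
    """Max children of an internal node: n keys plus n + 1 pointers must fit."""
    usable = page_bytes - header_bytes
    n_keys = (usable - pointer_bytes) // (key_bytes + pointer_bytes)
    return n_keys + 1


def min_height(num_keys, fanout):
    """Fewest levels h such that fanout ** h >= num_keys (all nodes full)."""
    height, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return height


if __name__ == "__main__":
    for page in (4096, 16384):
        f = btree_fanout(page, key_bytes=8, pointer_bytes=8)
        print(page, f, min_height(10**9, f))
```

With these assumed sizes, quadrupling the page from 4 KB to 16 KB quadruples the fanout and removes one level from a billion-key tree; real layouts with headers, variable-length keys, and partly filled nodes will land lower.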

The alternate on-ramp is Martin Kleppmann’s Designing Data-Intensive Applications (O’Reilly, 2017; dataintensive.net). It covers much of the same territory one level of abstraction up — system properties and trade-offs rather than page formats and cell layouts. If Petrov’s chapter 4 feels like reading a disassembly, start with DDIA’s chapters 3 (storage and retrieval) and 5–9 (replication through consistency), then come back. Students who arrive from an applications background consistently report that DDIA-first, Petrov-second is the gentler path; students who have written a storage engine before can skip DDIA entirely.

§ 3 · Benchmarks & Leaderboards

How Progress Gets Measured — and Mismeasured

Week 9 and Lab 3 lean on two text-to-SQL benchmarks, Spider 2.0 and BIRD (entries 9.1 and 9.2), and you should know both as artifacts, not just as numbers.

What to know about eval quality

Treat every leaderboard number as a measurement made with an imperfect instrument. Independent audits of BIRD’s gold annotations have found a substantial fraction of reference queries that are arguably wrong — ambiguous questions, gold SQL that doesn’t match the stated intent, schema values that contradict the “external knowledge.” This matters in both directions: systems get penalized for correct answers and rewarded for reproducing annotation mistakes. When a model “beats human performance” on a benchmark whose human-written gold labels contain errors, ask what is actually being measured. Lab 3 makes this concrete: part of your grade is auditing your own eval set and reporting the annotation defects you find. The habit generalizes — Week 2’s benchmark skepticism (RUM corners, hardware assumptions) and Week 9’s annotation skepticism are the same skill applied at different layers.
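
The two-directional distortion can be made concrete with a few lines of arithmetic. The sketch below assumes a deliberately crude model: a fixed fraction of gold labels is wrong, a system's true accuracy is the same on every question, and on questions with wrong gold the system is marked correct only when it happens to reproduce the annotator's mistake. The function name and all numbers are illustrative, not audit findings.

```python
def measured_accuracy(true_accuracy, gold_error_rate, mistake_match_rate=0.0):
    """Leaderboard score under noisy gold labels.

    true_accuracy      -- fraction of questions the system really answers correctly
    gold_error_rate    -- fraction of gold labels that are wrong
    mistake_match_rate -- on wrong-gold questions, how often the system's output
                          matches the wrong gold and is therefore scored correct
    """
    scored_on_clean_gold = (1.0 - gold_error_rate) * true_accuracy
    scored_on_wrong_gold = gold_error_rate * mistake_match_rate
    return scored_on_clean_gold + scored_on_wrong_gold


if __name__ == "__main__":
    # A perfect system that never reproduces annotation mistakes is capped below 100%.
    print(round(measured_accuracy(1.0, 0.20), 3))
    # A weaker system that matches half of the wrong labels gains points for it.
    print(round(measured_accuracy(0.6, 0.20, mistake_match_rate=0.5), 3))
```

Under this model the ceiling for an honest system is one minus the gold error rate, so a score above that ceiling is evidence of fitting the annotations rather than the task.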

§ 4 · Tools You’ll Touch

The Lab Toolchain

Four labs, four primary tools. Install all of them in Week 1; nothing here takes more than a few minutes to set up, and Lab 1’s toolchain check is due before Week 2.

Tool | Lab | Role | Link
Rust + cargo | Lab 1 (Weeks 2–4) | Implementation language for VLSM, your LSM-tree with vector segments — you’ll own memtables, SSTables, compaction, and a vector access method. | rust-lang.org
MinIO | Lab 2 (Weeks 5–8) | S3-compatible object store run locally; the disaggregated storage layer under Mini-Neon’s copy-on-write pages and branches. | min.io
DuckDB | Lab 3 (Weeks 9–10) | The analytical engine your text-to-SQL agent targets and your eval harness queries — in-process, zero-ops, full SQL. | duckdb.org
Python 3.12+ | Lab 4 (Weeks 10–12) | Host language for the semantic-operator optimizer: logical plans, model-tier physical operators, and cascade calibration. | python.org

Secondary dependencies (an LLM API key for Lab 3 — Lab 4’s simulator makes no real API calls — and plotting libraries for lab reports) are listed on each lab page: Lab 1, Lab 2, Lab 3, Lab 4.

§ 5 · Staying Current

After Week 14

This course will be stale in places within a year — Part III especially. The fix is a short list of feeds that have stayed reliable while everything else churned.