DATA 2027 · DATA SYSTEMS IN THE AGENTIC ERA
Instructor's Guide

Teaching a Course That Doesn't Exist Yet

Everything on this site is real coursework; only the registrar entry is fictional. This page is for the person who wants to fix that — what to keep, what to cut, what to strip from the instructor materials before students see them, and what it costs you in grading hours.

Audience: faculty & TAs  ·  License: CC BY 4.0 — adapt freely, attribute somewhere  ·  Prereq for teaching: read both companion essays and run all four labs' starter code yourself first
This guide covers
  • Why the course is sequenced stack-upward, and how the central thesis doubles as a grading instrument.
  • Three pacing variants: 14-week semester, 10-week quarter, undergraduate adaptation.
  • Lab logistics — what ships ready-made, what you must prepare, and the expected grading load.
  • Enforcing "accountability, not abstinence" when frontier models are allowed everywhere.
  • Re-skinning the Vantage Retail Group trap schema for your institution's domain.
1 · Philosophy

Why Stack-Upward, and Why Frontiers

The course climbs from storage engines to agent governance because that is the order in which the thesis can be tested rather than asserted. The claim — the client changed, the physics didn't — is empty until students know what the physics are. So Part I re-reads B-trees, LSM-trees, and column stores as workload bets; Part II shows which bets the new client violates (recall as a first-class axis, branchable storage, learned components); Part III climbs to where accuracy actually lives; Part IV asks whether any of it is genuinely new. Teach it top-down and every claim about vector indexes or semantic layers becomes vendor folklore. Teach it bottom-up and Week 6's "a vector index is an access method, not a product" lands as an observation students can verify with amplification arithmetic they did in Week 2.

The thesis is also your grading instrument. Almost every assessment in the course reduces to one move: identify the classical assumption, show what the agentic workload does to it, measure the consequence. When you grade an exam answer or a project paper and can't find that move, the answer is wrong even when it is fluent — and fluency is now free, which is the point.

This is why everything is graded on Pareto frontiers and ablation tables rather than point estimates. A single headline number ("95% accuracy," "2 ms p99") is exactly the artifact a frontier model can produce on request, and exactly the artifact vendors lead with. A frontier requires the student to have actually swept the trade-off; an ablation requires them to have actually built each configuration. Both are cheap to verify (re-run the harness) and expensive to fake (you'd have to build the system anyway). The grading scheme is the LLM policy, expressed in points.
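Once the harness has been re-run, checking a frontier is mechanical. Here is a minimal sketch. It assumes each swept configuration reduces to one (recall, latency) pair, where higher recall and lower latency are better. `Point`, `pareto_frontier`, and the sample numbers are illustrative and not part of the shipped harness.

```python
from typing import NamedTuple


class Point(NamedTuple):
    """One configuration from a student's sweep (field names are illustrative)."""
    config: str
    recall: float       # higher is better
    latency_ms: float   # lower is better


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Return the points no other point beats on both axes, ordered by latency."""
    frontier: list[Point] = []
    best_recall = float("-inf")
    # Cheapest first; among equal latencies, consider the highest recall first.
    for p in sorted(points, key=lambda p: (p.latency_ms, -p.recall)):
        # A point survives only if it buys strictly more recall than every
        # point that is at least as fast.
        if p.recall > best_recall:
            frontier.append(p)
            best_recall = p.recall
    return frontier


sweep = [
    Point("ef=16", 0.81, 0.9),
    Point("ef=32", 0.90, 1.4),
    Point("ef=64", 0.95, 2.6),
    Point("ef=64, no quantization", 0.94, 3.1),
]
print([p.config for p in pareto_frontier(sweep)])  # ['ef=16', 'ef=32', 'ef=64']
```

The dominated fourth point drops out. A report whose every submitted point sits on the frontier, with few points in total, is the case worth a second look for cherry-picking.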

2 · Pacing

Three Ways to Schedule It

The 14-week semester is the design target: four parts, four labs, a project with real incubation time. The compressions below have been thought through; cutting anything else starts to sever load-bearing dependencies (Week 2's amplification arithmetic feeds Lab 1; Week 9's curve feeds Lab 3 directly).

Semester (14 weeks)
  • Schedule: As published. Labs in weeks 2–4, 5–8, 9–10, and 10–12; project in weeks 9–14; exam after Week 14.
  • What changes: Nothing.
  • What it costs: Nothing. This is the course.
Quarter (10 weeks)
  • Schedule: Cut Week 3 (column stores / vectorized execution) and Week 13 (self-driving systems). Merge Weeks 11 and 12 into one "memory + protocols" week, with Mem0/Graphiti on Tuesday and MCP and the lethal trifecta on Thursday.
  • What changes: Assign Week 3's two readings as background for Lab 1's quantization milestone, and fold Week 13's self-design question into the Week 14 debate. Keep all four labs, but overlap Labs 3 and 4 fully.
  • What it costs: Students meet vectorized execution as a reading, not a lecture, and the memory week loses its second lecture's depth. Acceptable. Cutting any Part I week other than Week 3 is not.
Undergrad (14 weeks)
  • Schedule: Same lecture sequence, slower lab ramp.
  • What changes: Drop Lab 1 Milestones 3–4 (graph-merging compaction and the full frontier harness). Students stop at a working SSTable format plus per-segment index, graded on correctness and a single measured trade-off. Replace the formal parts of exam Q1 (RUM-R) and Q4 (the optimizer's new objective) with structured essays: students argue the four-axis conjecture and the multi-objective optimizer in prose, with one worked numeric example instead of a formalization. Reweight Lab 1 to 10% and the project to 40%, or add a fifth problem-set week. Keep Labs 3 and 4 intact, because they are the most undergrad-accessible and the most employable.
  • What it costs: The course loses its hardest systems-building rite of passage. The thesis survives; the calluses don't.
3 · Labs

Running the Labs

Every lab ships with machine-tested starter materials under materials/ — Rust skeleton for Lab 1, Go pageserver for Lab 2, the full warehouse DDL and reference semantic layer for Lab 3, the deterministic simulator for Lab 4. What ships is the instructor edition. Four preparation tasks are yours, and two of them are about keeping answer keys out of student hands.

1 · VLSM (Rust, weeks 2–4)
  • Provided: Cargo skeleton (memtable.rs, sstable.rs, workload.rs), a deterministic workload generator, and a bench.rs harness stub. cargo test passes on the skeleton.
  • You must: Distribute the grading suite (the hidden workload traces and recall ground truth) at the end of Week 2, after Milestone 1 is in. Releasing it with the skeleton invites overfitting the file format to the trace. Withholding it past Week 2 makes Milestone 4's frontier unbuildable.
  • Grading load: ~45 min per pair. Re-run their harness, eyeball the frontier for swept (not cherry-picked) points, and read the ablation.
2 · Mini-Neon (any language, weeks 5–8)
  • Provided: Go reference types, a WAL generator (cmd/walgen), and pageserver and branch skeletons. The reference is go vet clean.
  • You must: Stand up MinIO (or any S3-compatible store) before Week 5. One shared instance with per-team buckets is fine. The lab's GetPage@LSN latencies assume object storage with real network round-trips, so do not let teams substitute the local filesystem. Budget an hour for credentials and a smoke test with walgen.
  • Grading load: ~60 min per pair. The 50-branch demo plus GC must run from their tag against your MinIO, not theirs.
3 · Text-to-SQL (Python, weeks 9–10)
  • Provided: The full 126-table Vantage DDL, 50 gold questions, a reference semantic layer, and a grader stub.
  • You must: Treat the shipped schema.sql as the instructor edition. It carries 40 -- TRAP comments annotating every salted dysfunction. Strip them (one grep -v) to produce schema_student.sql before distribution, build the DuckDB snapshot, and pin its SHA-256 in MANIFEST. Hold back questions.jsonl and the reference semantic_layer.yaml entirely, because the gold questions are your held-out set and your leakage detector.
  • Grading load: ~90 min per pair. Re-run the ablation grid, run the leakage diff, and score their grader against your near-miss cases.
4 · Semantic optimizer (Python, weeks 10–12)
  • Provided: simulator.py, make_data.py (seeded, byte-identical corpus), operator stubs, and 500 sanctioned calibration labels.
  • You must: Keep the grading salt and the re-labeled held-out configuration to yourself. The simulator students receive necessarily contains ground_truth_filter. Generate a fresh salt per cohort, never commit it, and cap teams at five graded submissions so the autograder can't be used as an oracle. The grader re-labels under your salt, so gold-mined plans collapse on contact.
  • Grading load: ~45 min per pair, mostly automated. The human time goes to reading the cascade-guarantee argument in the report.
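The two answer-key tasks for Labs 3 and 4 are easy to script and easy to get subtly wrong. Here is a minimal sketch. It assumes each TRAP annotation sits on its own line and begins with the literal text `-- TRAP`. Every name below is hypothetical and not part of the shipped materials.

The sketch adds two guards that a bare grep -v lacks:
  • It refuses an inline TRAP comment, because grep -v would silently delete the DDL on that line.
  • It checks the count against the 40 annotations the instructor edition carries.

```python
import hashlib
import secrets
from pathlib import Path

TRAP_MARKER = "-- TRAP"  # assumed exact spelling of the annotation


def strip_traps(schema_sql: str, expected: int = 40) -> str:
    """Drop whole-line TRAP comments, the lines `grep -v -e '-- TRAP'` would drop.

    Unlike a bare grep -v, refuses a line that also carries DDL, and checks
    the count so a missed or doubled annotation is caught.
    """
    kept, traps = [], 0
    for line in schema_sql.splitlines(keepends=True):
        if TRAP_MARKER not in line:
            kept.append(line)
            continue
        traps += 1
        if not line.lstrip().startswith(TRAP_MARKER):
            raise ValueError(f"inline TRAP comment; grep -v would delete DDL: {line.strip()!r}")
    if traps != expected:
        raise ValueError(f"found {traps} TRAP lines, expected {expected}")
    return "".join(kept)


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large snapshots need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def new_cohort_salt() -> str:
    """256 random bits for the Lab 4 grader; keep the result out of every repo."""
    return secrets.token_hex(32)


class SubmissionLedger:
    """Counts graded submissions per team and refuses them past the cap."""

    def __init__(self, cap: int = 5) -> None:
        self.cap = cap
        self.used: dict[str, int] = {}

    def record(self, team: str) -> int:
        """Register one graded submission and return how many the team has left."""
        used = self.used.get(team, 0)
        if used >= self.cap:
            raise PermissionError(f"{team} has used all {self.cap} graded submissions")
        self.used[team] = used + 1
        return self.cap - self.used[team]
```

Follow these steps:
  • Run strip_traps over schema.sql and write the result to schema_student.sql.
  • Build the DuckDB snapshot from that file.
  • Record sha256_file(snapshot) in MANIFEST, using whatever line format your MANIFEST already uses.

If a re-skin changes the number of traps, pass the new count as expected. How the salt reaches the grader is not specified here. An environment variable or secrets store outside version control is the safe default.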

Total lab grading per cohort of 15 pairs: roughly 60 TA-hours across the term, concentrated around the lab deadlines. The deterministic harnesses are what make this tractable. Every number in every report regenerates from a Git tag, so grading is re-running, not believing.
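The 60-hour figure is the per-team estimates from the lab table summed (240 minutes) and scaled by cohort size. That makes it easy to re-budget. A small sketch follows; the names are illustrative, and the figure covers lab grading only.

```python
# Per-team grading estimates from the lab table, in minutes (teams are pairs).
GRADING_MINUTES = {"Lab 1": 45, "Lab 2": 60, "Lab 3": 90, "Lab 4": 45}


def lab_grading_hours(teams: int, minutes: dict[str, int] = GRADING_MINUTES) -> float:
    """TA-hours for lab grading across the term (project and exam not included)."""
    return teams * sum(minutes.values()) / 60


print(lab_grading_hours(15))  # 15 teams x 240 min = 60.0 hours
print(lab_grading_hours(24))  # 96.0 hours for a 24-pair cohort
```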

4 · The Model Policy

Accountability, Not Abstinence, in Practice

The policy — frontier models allowed everywhere, including the exam — reads as permissive. Enforced properly, it is the opposite: it relocates all integrity weight onto artifacts that models cannot fake, then makes faking them deterministically detectable.

Require methods notes. Every lab report and the exam carry a mandatory model-usage section: which models, for what, what they got wrong, what was kept. Grade it for specificity, not contrition — "Claude wrote the first compaction loop; it deadlocked on concurrent flush; here is the fix" is an A-grade sentence. Students learn quickly that the note is free marks if honest and a liability if vague, which is exactly the incentive you want them to carry into industry.

Know what cheating looks like when models are legal. It is never "used an LLM." It is:

Both detections work only because the harnesses are deterministic — pinned model via the course proxy, temperature 0, fixed seeds, pinned snapshots. Preserve that determinism above all else when you adapt the course; it is the entire enforcement mechanism. A policy of "use anything, but every number must reproduce from your tag on our hardware" needs no honor code. The honor code is make reproduce.
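Here is a minimal sketch of the comparison at the heart of make reproduce. It rests on three assumptions:
  • The student's tag commits the harness's machine-written metrics, not hand-typed numbers.
  • Those metrics form a flat name-to-value mapping.
  • Latency is compared with a tolerance, because wall-clock timings vary slightly run to run even on pinned hardware.

Treat the function name, the metric keys, and the 10% tolerance as assumptions to tune, not as course policy.

```python
import math


def check_reproduction(
    reported: dict[str, float],
    rerun: dict[str, float],
    timing_keys: frozenset[str] = frozenset(),
    rel_tol: float = 0.10,
) -> list[str]:
    """List every reported number the re-run failed to reproduce.

    Deterministic metrics (recall, accuracy, counts) must match exactly,
    which only works because seeds, snapshots, and temperature are pinned.
    Keys in timing_keys need only agree within rel_tol.
    """
    problems = []
    for key, value in reported.items():
        if key not in rerun:
            problems.append(f"{key}: reported, but the re-run did not produce it")
        elif key in timing_keys:
            if not math.isclose(value, rerun[key], rel_tol=rel_tol):
                problems.append(f"{key}: reported {value}, re-run {rerun[key]} (beyond {rel_tol:.0%})")
        elif value != rerun[key]:
            problems.append(f"{key}: reported {value}, re-run {rerun[key]}")
    return problems


reported = {"recall@10": 0.953, "p99_ms": 2.0}
rerun = {"recall@10": 0.953, "p99_ms": 2.1}
print(check_reproduction(reported, rerun, timing_keys=frozenset({"p99_ms"})))  # []
```

An empty list means the report stands. Any entry is a conversation to have with the team before a grade is assigned.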

"What is graded is what models can't fake: Pareto frontiers from your own benchmarks, ablation tables, and designs you can defend in a hallway argument." (the course LLM policy, index page)
5 · Adaptation

Re-Skinning the Trap Schema

Vantage Retail Group is a synthetic retailer because retail is universally legible, but the schema works better when it smells like your institution's domain — a hospital network, a university ERP, a logistics firm. Re-skinning is safe if you understand what the 50 gold queries actually depend on: identifiers and data values, not semantics. The rules:

Budget half a day. The renaming is an hour; convincing yourself the 50-for-50 diff is genuinely clean is the rest, and it is not optional.
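The 50-for-50 diff can be scripted. Here is a minimal sketch. It assumes the renames are a flat old-to-new mapping applied identically to the schema-plus-data script and to every gold query. It uses Python's built-in sqlite3 so it runs anywhere. The course snapshot is DuckDB, so treat this as the shape of the check rather than a drop-in. `reskin`, `run_gold`, and `gold_diff` are hypothetical names.

```python
import re
import sqlite3


def reskin(sql: str, renames: dict[str, str]) -> str:
    """Apply every rename in one pass on word boundaries, so a->b, b->c never chains."""
    if not renames:
        return sql
    names = sorted(renames, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: renames[m.group(1)], sql)


def run_gold(setup_sql: str, queries: list[str]) -> list[list[tuple]]:
    """Load schema and data into a fresh in-memory database and run each query."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(setup_sql)
        # Compare as multisets: sort rows by repr so NULLs and mixed types sort safely.
        return [sorted(con.execute(q).fetchall(), key=repr) for q in queries]
    finally:
        con.close()


def gold_diff(setup_sql: str, queries: list[str], renames: dict[str, str]) -> list[int]:
    """Indices of gold queries whose results change after re-skinning."""
    before = run_gold(setup_sql, queries)
    after = run_gold(reskin(setup_sql, renames), [reskin(q, renames) for q in queries])
    return [i for i, (b, a) in enumerate(zip(before, after)) if b != a]


setup = """
CREATE TABLE store (store_id INTEGER, region TEXT);
INSERT INTO store VALUES (1, 'west'), (2, 'east'), (3, 'west');
"""
gold = ["SELECT region, COUNT(*) FROM store GROUP BY region"]
print(gold_diff(setup, gold, {"store": "clinic", "store_id": "clinic_id", "region": "district"}))  # []
print(gold_diff(setup, gold, {"west": "north"}))  # [0]
```

An empty list means every gold query returns the same rows after the rename. A non-empty list means something changed, and so does an SQL error raised by a renamed query. The second call shows how renaming a data value that a gold query returns surfaces as a mismatch.

The sketch has two limits:
  • Rows are compared as sorted multisets, so it will not catch an ORDER BY whose meaning changed.
  • It matches names case-sensitively, so normalize identifier casing first if your DDL mixes it.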

6 · Provenance

Where This Courseware Came From

Honest provenance note

This courseware was authored by AI agents working from a verified research base: the site's two companion essays, Essay № 01 and Essay № 02, whose claims and citations were checked against primary sources. Every code artifact was machine-tested before publication:
  • The Lab 1 skeleton passes cargo test.
  • The Lab 2 reference passes go vet and builds.
  • The Lab 3 DDL and gold queries load and execute in DuckDB.
  • The Lab 4 simulator is deterministic by construction and verified seed-for-seed.

None of that substitutes for your judgment. Machine-tested means the code runs, not that the pedagogy is right for your students. Verified-base means the claims trace to sources, not that they will age well. Read every page you assign, run every lab you grade, and disagree where you disagree, then send the disagreement back. Corrections, adaptations, and new trap schemas are welcome as pull requests. The license is CC BY 4.0 precisely so this can become a course maintained by the people who teach it.