Why Open-Model
You may use anything: every paper, every system, and every frontier model you can reach. We mean this literally, and not as a trap. A closed-book exam in 2027 would certify a skill — unaided recall under time pressure — that no working systems researcher has needed since the field began carrying its literature in its pocket. Pretending otherwise teaches the wrong lesson twice: that the models don’t exist, and that your value lies in what they already do well.
So the exam is calibrated against the models. Each question was given, in draft form, to the strongest frontier models available, and revised until their unaided answers were fluent, confident, and wrong in instructive ways — missing the load-bearing assumption, citing the right paper for the wrong claim, formalizing the easy half of the problem. The questions reward exactly what the models still struggle to fake: holding a storage-layer invariant and a model-layer behavior in your head simultaneously, and noticing where they contradict each other. If you can prompt your way to a strong answer, you will have done the synthesis yourself in the prompting.
Practical advice from previous drafts of this exam: budget your 72 hours for thinking, not typing. A two-page answer that nails the central tension beats a ten-page survey every time, and the rubric below is built to guarantee it.
Grading rubric
| Dimension | Weight | What it measures |
|---|---|---|
| Rigor | 40 | Are the claims precise, the derivations correct, the assumptions stated? A wrong number with shown work outscores a right number asserted. |
| Synthesis | 30 | Does the answer genuinely connect the two layers in tension, or summarize each side and gesture at the gap? |
| Formalization | 20 | Definitions that could survive a hostile reviewer: trade-off spaces with named axes, protocols with failure cases, conjectures stated tightly enough to refute. |
| Writing | 10 | Structure, economy, and the honesty to flag what you don’t know. |
Five Tensions, Seventy-Two Hours
The RUM conjecture says an access method can optimize two of read overhead, update overhead, and memory overhead only at the expense of the third. Approximate access methods break the conjecture’s framing: they buy performance with a currency RUM doesn’t price — recall. Formalize a fourth axis, Recall, for approximate access methods. Define the four-dimensional trade-off space precisely: what does each axis measure, against what workload, and what does it mean for an index to be “optimal” on an axis? Then prove or refute a four-way impossibility result analogous to the original conjecture — does fixing any three axes bound the fourth? Finally, place HNSW, DiskANN, and your Lab 1 engine as points (or curves) in the space you defined, and defend the placement with measurements or derivations.
What a strong answer engages with. The original RUM framing (Athanassoulis et al.) and what its proof technique actually depends on, since “analogous” is doing heavy lifting in this question. The structural reasons HNSW (TPAMI 2018) and DiskANN (NeurIPS 2019) sit at genuinely different points — memory-resident graph navigability versus SSD-aware layouts — rather than at different parameter settings of the same point. And a clear-eyed treatment of whether recall is an axis like the others or a constraint of a different type, which is where most fluent-but-wrong answers quietly collapse.
Aurora’s claim is “the log is the database”: redo processing is pushed to a distributed storage tier and replicas hang off the shared log. Neon’s claim is “the log is the database, and the database is a tree of branches”: copy-on-write pages over a shared WAL make a branch nearly free at creation. Now impose an agentic workload: 10,000 speculative branches per hour, each receiving a small number of writes, most abandoned within minutes, a few merged or promoted. Derive the write-amplification and storage-cost consequences of this workload under each architecture — state your model of page sizes, branch lifetimes, and write skew explicitly — and identify the workload crossover point: at what branch creation rate, lifetime distribution, or write profile does each architecture dominate?
What a strong answer engages with. The actual mechanisms in the Aurora paper (SIGMOD 2017) — what “the log is the database” commits you to when a branch is not a replica — versus Neon’s page-server/safekeeper split and where copy-on-write defers cost rather than eliminating it. The derivation must separate creation cost, carrying cost, and abandonment cost, because the architectures differ on different terms. The best answers notice that “crossover point” is a surface in several variables, not a number, and choose which slice of it to compute honestly.
What Goes Around Comes Around… And Around (Stonebraker & Pavlo, 2024) has a fixed argumentative structure: a data-model insurgency arises, wins mindshare, and is then absorbed by the relational incumbent, which adopts the feature and keeps the market. Write the section of the 2034 sequel that covers semantic operators. Argue one side: either LOTUS-style operators get absorbed into SQL — in which case cite the specific historical precedent whose dynamics you are claiming repeat, and show the absorption mechanism — or the accuracy/cost dimension makes semantic operators the first true escape from the relational gravity well, in which case explain precisely which property of the relational contract they break that XML, objects, and documents did not. Match the essay’s voice: name winners, assign blame, commit to predictions.
What a strong answer engages with. The mechanics of the precedents as the 2024 essay actually tells them — why objects lost, why JSON was absorbed rather than victorious — not folklore versions of them. The LOTUS (VLDB 2025) and Palimpzest (CIDR 2025) treatment of accuracy as an explicit optimizer objective, which is the property any escape argument must rest on and any absorption argument must explain away. Strong answers also confront the strongest counter-case to their chosen side; the essay being imitated never pretends its opponents had no point.
System R’s dynamic programming rests on a principle of optimality: the best plan for a query contains the best plans for its sub-expressions, so the optimizer can prune any sub-plan dominated on cost (modulo interesting orders). Semantic-operator optimizers cannot rank plans by cost alone, because plans now carry a (cost, accuracy) pair and the objective is a trade-off, not a total order. Show formally why System R’s pruning breaks under a (cost, accuracy) partial order: construct a query and a pair of sub-plans where the dominated-on-cost sub-plan participates in the globally preferable complete plan. Then propose a plan-enumeration strategy that does not break — define what your optimizer keeps per equivalence class, bound how much it keeps, and state the conditions under which it remains exact versus heuristic.
What a strong answer engages with. The precise statement of the optimality principle in Selinger et al. (1979), including the interesting-orders mechanism — which is the historical precedent for “keep more than one plan per class” and which weaker answers fail to notice cuts both ways. How accuracy composes across operators (it is neither additive like cost nor preserved like sortedness), drawing on how LOTUS and Palimpzest actually model it. And the size of the Pareto frontier the proposed strategy must carry, with an argument about when it stays tractable.
Two agents run under read-committed. Each reads a table, summarizes it into its own memory store, and later acts on that summary. Between read and action, the table changes; each agent’s behavior is now driven by a belief that was never false at any single instant but is false now — a semantic phantom. Note that no classical anomaly has occurred: every read was committed data. Define an isolation level for derived-belief consistency: state what it guarantees about the relationship between a memory entry and the base data it summarizes, give the anomaly it forbids as precisely as the ANSI levels give theirs, and specify a versioning protocol that enforces it (what gets timestamped, what gets invalidated, who is notified). Then argue where the mechanism belongs — in the DBMS, in the memory layer, or in the protocol between them — and what each placement costs.
What a strong answer engages with. The formal style of the isolation literature — Berenson et al.’s critique of the ANSI levels is the model for how to define an anomaly without hand-waving — extended to reads that pass through a lossy summarization function, which is what makes this not just a stale-materialized-view problem. The actual write paths of Mem0 (arXiv 2504.19413) and Graphiti (arXiv 2501.13956), since a protocol that cannot be retrofitted onto at least one real memory system is a protocol for a system that doesn’t exist. And an honest cost model for the placement argument: invalidation traffic, false invalidations under summarization, and who pays at scale.