DATA 2027 · Week 08 · Part II — New Access Methods & Engines

Transactions & Branching for Agent Swarms

When ten thousand agents want to try something before they mean it, the transaction abstraction stretches in two directions at once.

Lecture 1 — Distributed Transactions: Calvin vs Spanner · Lecture 2 — Copy-on-Write Branching as Product

Lecture 1 · Tuesday

Distributed Transactions: Calvin vs Spanner

Two poles, both published in 2012, still defining the trade space.

L1 · Why agents stress transactions

Agents are greedy transactional clients

L1 · Sixty seconds of serializability

Two promises, one crucial gap

  • Serializable: effects equal some serial order.
  • Existential quantifier — the system picks any order.
  • Strict serializable (external consistency): adds real time.
  • T₁ commits before T₂ begins → T₁ ordered first.
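
The gap between the two promises can be made concrete with a small check. The sketch below is illustrative only: the `Txn` record and the `respects_real_time` helper are hypothetical names, and the timestamps are invented. It enumerates every serial order of three transactions and keeps those that also honor the real-time rule from the last bullet.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Txn:
    name: str
    begin: float   # real time at which the client issued the transaction
    commit: float  # real time at which the commit was acknowledged


def respects_real_time(order, txns):
    """True if no transaction is placed before one that committed before it began."""
    pos = {name: i for i, name in enumerate(order)}
    for a in txns:
        for b in txns:
            # a finished before b started, so a must come first in the order
            if a.commit < b.begin and pos[a.name] > pos[b.name]:
                return False
    return True


# Invented example: T2 begins after T1 commits; T3 overlaps both.
txns = [
    Txn("T1", begin=0.0, commit=1.0),
    Txn("T2", begin=2.0, commit=3.0),
    Txn("T3", begin=0.5, commit=2.5),
]
names = [t.name for t in txns]

# Plain serializability may pick any of these orders (subject to the data).
all_orders = list(permutations(names))
# Strict serializability keeps only orders with T1 before T2.
strict_orders = [o for o in all_orders if respects_real_time(o, txns)]

print(len(all_orders), len(strict_orders))
```

Only the T1/T2 pair is constrained, because T3 overlaps both; the overlapping transaction may land anywhere in the order under either promise.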

L1 · Sixty seconds of serializability

Why the gap bites agents

L1 · Calvin

Calvin: order first, execute later

L1 · Calvin

Determinism scales

500,000

txns/sec on TPC-C-like workloads, 100 commodity machines — in 2012, while Spanner-style designs paid cross-datacenter round trips per commit.
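
Why determinism removes the commit protocol can be shown in a few lines. This is a toy sketch, not Calvin's architecture: it omits the sequencer, partitioning, and lock manager, and the `transfer` procedure, log entries, and balances are all invented for illustration. The point is only that replicas replaying the same agreed log with deterministic logic reach the same state, including the same abort decisions, without talking to each other.

```python
def transfer(state, src, dst, amount):
    """Deterministic stored procedure: the outcome depends only on state and arguments."""
    if state.get(src, 0) < amount:
        return  # deterministic abort: every replica makes the same decision
    state[src] -= amount
    state[dst] = state.get(dst, 0) + amount


PROCEDURES = {"transfer": transfer}

# The pre-agreed global log: the order is fixed before anything executes.
LOG = [
    ("transfer", "a", "b", 30),
    ("transfer", "b", "c", 50),
    ("transfer", "a", "c", 100),  # aborts in this order: a holds only 70 by then
]


def replay(log, initial):
    """Apply the log in order to a private copy of the initial state."""
    state = dict(initial)
    for op, *args in log:
        PROCEDURES[op](state, *args)
    return state


initial = {"a": 100, "b": 40, "c": 0}
replica_1 = replay(LOG, initial)
replica_2 = replay(LOG, initial)
reordered = replay(list(reversed(LOG)), initial)

print(replica_1, replica_1 == replica_2, reordered)
```

Replaying the reversed log gives a different final state, which is why the order has to be agreed before execution rather than discovered during it.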

L1 · Calvin

Calvin’s price is structural

L1 · Spanner

Spanner: trust the clocks

L1 · Spanner

The cost of external consistency

≈ 2ε

expected commit-wait stall — about 5–10 ms per commit, paid so non-overlapping transactions need no communication at all.
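
Where the 2ε comes from can be sketched with a simulated clock. `SimulatedTrueTime` below is a hypothetical stand-in, not Google's API; the uncertainty of 4 ms and the 0.5 ms polling step are assumed values. The commit rule follows the Spanner paper: pick a timestamp no earlier than the latest possible current time, then hold the commit until that timestamp is guaranteed to be in the past.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Hypothetical clock whose now() returns an interval containing true time."""

    def __init__(self, epsilon_ms):
        self.epsilon_ms = epsilon_ms
        self.true_ms = 0.0

    def now(self):
        return TTInterval(self.true_ms - self.epsilon_ms,
                          self.true_ms + self.epsilon_ms)

    def advance(self, ms):
        self.true_ms += ms


def commit_with_wait(tt, step_ms=0.5):
    """Choose commit timestamp s, then stall until s has definitely passed."""
    s = tt.now().latest              # s is at least the true time everywhere
    waited = 0.0
    while tt.now().earliest <= s:    # s might still be in the future somewhere
        tt.advance(step_ms)
        waited += step_ms
    return s, waited


tt = SimulatedTrueTime(epsilon_ms=4.0)
commit_ts, waited_ms = commit_with_wait(tt)
print(commit_ts, waited_ms)
```

The stall is the distance from `latest` to `earliest`, so it is about twice the uncertainty, plus whatever granularity the polling adds.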

L1 · Spanner

What Spanner pays — and buys

  • Pays: 2PC prepare + commit across Paxos groups.
  • Cross-region read-write txns: 10–100 ms.
  • Buys: lock-free snapshot reads at any past timestamp.
  • For read-heavy agent fleets, cheap snapshot reads outweigh the added write latency.
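
The snapshot-read bullet can be illustrated with a minimal multi-version store. This is a generic MVCC sketch under assumed names (`VersionedStore`, `read_at`), not Spanner's storage layer: each key keeps its committed versions sorted by timestamp, and a read at a past timestamp picks the newest version at or before it without taking any lock.

```python
import bisect


class VersionedStore:
    """Toy multi-version store: every write adds a version, nothing is overwritten."""

    def __init__(self):
        self._timestamps = {}  # key -> sorted list of commit timestamps
        self._values = {}      # key -> values aligned with _timestamps

    def write(self, key, commit_ts, value):
        ts = self._timestamps.setdefault(key, [])
        vals = self._values.setdefault(key, [])
        i = bisect.bisect_right(ts, commit_ts)
        ts.insert(i, commit_ts)
        vals.insert(i, value)

    def read_at(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts, else None."""
        ts = self._timestamps.get(key, [])
        i = bisect.bisect_right(ts, snapshot_ts)
        return self._values[key][i - 1] if i else None


store = VersionedStore()
store.write("x", 10, "a")
store.write("x", 20, "b")

print(store.read_at("x", 15), store.read_at("x", 20), store.read_at("x", 5))
```

Because a reader never modifies or blocks a version, any number of agents can read the same past timestamp while writers keep committing.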

L1 · The poles, side by side

Calvin vs Spanner

Dimension          | Calvin (SIGMOD ’12)             | Spanner (OSDI ’12)
-------------------|---------------------------------|------------------------------
Ordering           | Pre-agreed global log           | TrueTime + 2PL
Distributed commit | None (determinism)              | 2PC over Paxos
Interactive txns   | No; recon + re-submit           | Yes, full BEGIN…COMMIT
Consistency        | Serializable                    | Externally consistent
Latency tax        | ~10 ms epochs + recon           | 2ε wait + 2PC rounds
Best agent fit     | Declared high-throughput writes | Read-mostly global snapshots

L1 · MVCC, the bridge to Thursday

Your engine already keeps many realities

Lecture 2 · Thursday

Copy-on-Write Branching as Product

The mechanism from one B-tree page up to a cloud product.

L2 · The agentic workload

Speculation is the defining workload

L2 · CoW from one page

LMDB: every commit is a tiny fork

L2 · CoW at cloud scale

Neon: time travel as a query parameter

L2 · Branch = a new timeline

Forking the WAL at an LSN

[Figure 8.1: branch timelines]
  • main (T1): WAL, marked at LSN 0/52A0 and LSN 0/9F10.
  • agent-7f/try-migration (T2): own WAL, dirty pages only.
  • agent-7f/plan-B (T3): abandoned; GC reclaims its pages.
  • Pages unwritten since the fork are resolved from the parent at the fork LSN.
  • Branch create = 1 metadata record (child, parent, fork LSN): O(metadata), ~10s of ms.

Fig. 8.1 — Branches as timelines forking from the parent’s WAL. Solid child: live delta. Dashed child: abandoned, GC’d; parent history stays pinned to the oldest live fork point.

L2 · Branch = a new timeline

O(metadata), not O(data)

~50 ms

to create a branch — one metadata record (child, parent, fork LSN), no pages copied, whether the parent holds 1 GB or 10 TB. The parent can’t even tell it has children.
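
A toy model shows why creation cost is independent of parent size. This sketch is not Neon's storage format: the `Branch` class, integer LSNs, and page contents are assumptions chosen for illustration. Creating a branch records only (child, parent, fork LSN); a read falls through to the parent's history as of the fork LSN unless the child has written its own version of the page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Branch:
    name: str
    parent: Optional["Branch"] = None
    fork_lsn: int = 0
    # page_id -> list of (lsn, contents), appended in increasing LSN order
    versions: dict = field(default_factory=dict)

    def write(self, page_id, lsn, contents):
        self.versions.setdefault(page_id, []).append((lsn, contents))

    def read(self, page_id, at_lsn=None):
        """Newest version on this timeline at or before at_lsn, else ask the parent."""
        for lsn, contents in reversed(self.versions.get(page_id, [])):
            if at_lsn is None or lsn <= at_lsn:
                return contents
        if self.parent is not None:
            # Unwritten since the fork: resolve from the parent at the fork LSN.
            return self.parent.read(page_id, at_lsn=self.fork_lsn)
        return None


def create_branch(name, parent, fork_lsn):
    """O(metadata): one record, no pages copied, parent untouched."""
    return Branch(name=name, parent=parent, fork_lsn=fork_lsn)


main = Branch("main")
main.write("p1", 0x1000, "v1")
main.write("p2", 0x2000, "a")

child = create_branch("agent-7f/try-migration", main, fork_lsn=0x52A0)
main.write("p1", 0x9F10, "v2-main")    # parent moves on after the fork
child.write("p2", 0x6000, "b-child")   # child dirties one page

print(child.read("p1"), child.read("p2"), main.read("p1"), main.read("p2"))
```

The child sees the parent's `p1` as it stood at the fork, not the later write, and the parent never sees the child's `p2`. Note that `create_branch` does not modify `main` at all, which matches the slide's remark that the parent cannot tell it has children.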

L2 · The arithmetic

Ten thousand branches an hour

L2 · The arithmetic

What is not trivial

L2 · Worked example

50 branches, 1% divergence each

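Strategy                         | Formula           | Total storage | Create time
---------------------------------|-------------------|---------------|------------
50 full copies (pg_dump/restore) | 50 × 64 GB        | 3,200 GB      | ~40 min each
50 CoW branches @ 1% divergence  | 64 + 50 × 0.64 GB | 96 GB         | ~50 ms each
Break-even divergence            | shared wins until ≈ 98% dirty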

Dirty pages are private — divergent versions can’t be shared, so deltas add linearly.
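
The table's arithmetic can be reproduced directly. The function names below are illustrative; the inputs (64 GB base, 50 branches, 1% divergence) are the ones in the table. Break-even is where the shared base plus linear deltas equals the cost of full copies: n·B = B + n·B·d, so d = (n − 1)/n.

```python
def full_copy_gb(branches, base_gb):
    """Every copy stores the whole database."""
    return branches * base_gb


def cow_gb(branches, base_gb, divergence):
    """One shared base plus a private delta per branch (deltas add linearly)."""
    return base_gb + branches * base_gb * divergence


def break_even_divergence(branches):
    """Divergence at which CoW storage equals full-copy storage."""
    return (branches - 1) / branches


BASE_GB, BRANCHES, DIVERGENCE = 64, 50, 0.01

full = full_copy_gb(BRANCHES, BASE_GB)
shared = cow_gb(BRANCHES, BASE_GB, DIVERGENCE)
ratio = full / shared
break_even = break_even_divergence(BRANCHES)

print(f"{full} GB vs {shared:.0f} GB, ratio {ratio:.1f}x, break-even {break_even:.0%}")
```

The break-even depends only on the number of branches, not on the database size, and it moves closer to 100% as the branch count grows.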

L2 · Worked example

The headline ratio

33×

storage reduction vs full copies. Caveat: at 1% divergence per day, a branch parked three months pins a quarter-year of parent history — branches are a speculation primitive, not an archival one.

L2 · The agent transaction shape

Long think-time, optimistic wins

L2 · Field note, 2025

Prompts are not idempotency keys

L2 · The safe write path

Branch → validate → merge

[Figure 8.2: the safe write path. Agent writes never touch main directly.]
  1 · BRANCH: fork at LSN L; zero blast radius.
  2 · VALIDATE: deterministic gate; invariants + tests.
  3 · MERGE: replay onto main; OCC check vs L.
  Conflict → re-fork from new head; agent reconciles.
  Interactive exploration → declared, replayable change set.

Fig. 8.2 — Speculation on cheap CoW timelines; commitment through a narrow deterministic gate.
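
The three steps can be sketched as a small optimistic-concurrency loop. Everything here is a hypothetical illustration: `Main`, `ChangeSet`, the key-level conflict rule, and the single non-negative-balance invariant are assumptions, and a real system would track conflicts at page or row granularity. The merge succeeds only if nothing the change set read or wrote has changed on main since fork LSN L.

```python
from dataclasses import dataclass, field


@dataclass
class Main:
    lsn: int = 0
    data: dict = field(default_factory=dict)
    last_write_lsn: dict = field(default_factory=dict)

    def apply(self, writes):
        """Replay a change set onto main as one new LSN."""
        self.lsn += 1
        for key, value in writes.items():
            self.data[key] = value
            self.last_write_lsn[key] = self.lsn


@dataclass
class ChangeSet:
    fork_lsn: int   # step 1: the LSN the branch forked at
    snapshot: dict  # main's data as of the fork
    reads: set      # keys the agent read on the branch
    writes: dict    # declared, replayable writes


def branch(main, reads, writes):
    return ChangeSet(main.lsn, dict(main.data), set(reads), dict(writes))


def validate(cs, invariants):
    """Step 2: deterministic gate over the branch's resulting state."""
    candidate = {**cs.snapshot, **cs.writes}
    return all(check(candidate) for check in invariants)


def merge(main, cs):
    """Step 3: OCC check against the fork LSN, then replay onto main."""
    touched = cs.reads | set(cs.writes)
    if any(main.last_write_lsn.get(k, 0) > cs.fork_lsn for k in touched):
        return "conflict"
    main.apply(cs.writes)
    return "merged"


invariants = [lambda state: state.get("balance", 0) >= 0]

main = Main()
main.apply({"balance": 100})

a = branch(main, reads={"balance"}, writes={"balance": 90})
b = branch(main, reads={"balance"}, writes={"balance": 80})

result_a = merge(main, a) if validate(a, invariants) else "rejected"
result_b = merge(main, b) if validate(b, invariants) else "rejected"

# Conflict: re-fork from the new head and reconcile.
b2 = branch(main, reads={"balance"}, writes={"balance": 70})
result_b2 = merge(main, b2) if validate(b2, invariants) else "rejected"

bad = branch(main, reads={"balance"}, writes={"balance": -5})
result_bad = merge(main, bad) if validate(bad, invariants) else "rejected"

print(result_a, result_b, result_b2, result_bad, main.data)
```

Both agents pass validation on their own branches; only the merge step discovers that the second one is stale, so the cost of a conflict is a re-fork rather than a corrupted main.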

L2 · The safe write path

Calvin, rediscovered at the workflow layer

A branch is a transaction that lived long enough to get a name.
— Week 8 lecture notes

Checkpoint · Discussion

Before you leave

Readings · Week 08

Read before Thursday