Two storage architectures, three amplifications, and one triangle nobody has escaped.
Lecture 1 — B-trees: the disk made me do it · Lecture 2 — LSM-trees and the amplification triangle
The least clever data structure that takes the page seriously.
Fan-out ≈ 250 buys four levels instead of thirty: levels 1–3 total 1 + 250 + 62,500 ≈ 63 k pages ≈ 250 MB at 4 KB per page, small enough to pin in memory. A point lookup then costs one device read: the leaf.
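The height and pinned-memory arithmetic can be reproduced in a few lines. This is a sketch under stated assumptions: fan-out 250 (implied by 1 + 250 + 62,500), 4 KB pages, 10^9 keys for the "thirty" comparison, and a simplified height model in which every node, leaves included, holds `fanout` entries. The function names are hypothetical.

```python
def levels_needed(n_keys: int, fanout: int) -> int:
    """Smallest height h such that fanout**h >= n_keys (simplified model)."""
    levels, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels


def pinned_bytes(fanout: int, pinned_levels: int, page_bytes: int = 4096) -> int:
    """Memory needed to cache the top `pinned_levels` levels of the tree."""
    pages = sum(fanout ** i for i in range(pinned_levels))  # 1 + 250 + 62,500
    return pages * page_bytes


N_KEYS = 10**9   # assumed key count, for illustration only
FANOUT = 250     # implied by the slide's 1 + 250 + 62,500

print(levels_needed(N_KEYS, 2))       # binary tree: 30
print(levels_needed(N_KEYS, FANOUT))  # B+-tree: 4
print(f"{pinned_bytes(FANOUT, 3) / 1e6:.0f} MB")  # 257 MB, the slide's ≈ 250 MB
```

Only the fourth level, the leaves, stays on the device, which is why a lookup costs exactly one read.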
Under random inserts, steady-state page occupancy converges to ln 2 ≈ 69%: a permanent ~1.44× (= 1/ln 2) space tax for being update-friendly. Monotonic keys plus split-at-insertion-point cheat their way to ~100%-full pages.
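A small simulation makes the ln 2 figure concrete. It is a sketch, not an engine: it models only the leaf level (inner nodes collapse into a sorted list of separators), uses a hypothetical capacity of 64 keys per leaf, and implements split-at-insertion-point only for the append case that monotonic keys produce.

```python
import bisect
import math
import random


def leaf_occupancy(keys, capacity=64, split_at_insert_point=False):
    """Insert keys into a leaf-only B+-tree model and return the mean fill factor."""
    leaves = [[]]   # each leaf is a sorted list of keys
    seps = []       # seps[j] is the smallest key in leaves[j + 1]
    for k in keys:
        i = bisect.bisect_right(seps, k)      # route the key to its leaf
        leaf = leaves[i]
        bisect.insort(leaf, k)
        if len(leaf) > capacity:              # overflow: split the leaf
            if split_at_insert_point and k == leaf[-1]:
                mid = capacity                # append case: left leaf stays full
            else:
                mid = len(leaf) // 2          # classic 50/50 split
            leaves[i], right = leaf[:mid], leaf[mid:]
            leaves.insert(i + 1, right)
            seps.insert(i, right[0])
    total = sum(len(leaf) for leaf in leaves)
    return total / (len(leaves) * capacity)


rng = random.Random(42)
n = 100_000
random_keys = rng.sample(range(10**9), n)
sequential_keys = range(n)

print(f"random, 50/50 splits:     {leaf_occupancy(random_keys):.2f}  (ln 2 = {math.log(2):.2f})")
print(f"sequential, 50/50 splits: {leaf_occupancy(sequential_keys):.2f}")
print(f"sequential, split at end: {leaf_occupancy(sequential_keys, split_at_insert_point=True):.2f}")
```

The three lines should land near 0.69, 0.5, and 1.0: random keys reproduce the ln 2 tax, naive 50/50 splits on sorted input are even worse (every left half is abandoned at 50%), and splitting at the insertion point recovers nearly full pages.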
VACUUM, OPTIMIZE TABLE.
What if we never update in place at all?
Leveled: write amplification ≈ 1 (WAL) + 1 (flush) + 10 × 4 (descents) = 42; the folklore “~10× per level, 40–50× overall” is just this sum. Size-tiered, 3 tiers: WA ≈ 1 + 1 + 3 = 5.
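The two sums above fit in a pair of back-of-envelope functions. This is a sketch of the slide's model only, assuming a size ratio of 10 per leveled descent and one rewrite per size-tiered tier; real engines deviate from both (the function names are hypothetical).

```python
def leveled_wa(levels: int, size_ratio: int = 10) -> int:
    # 1 for the WAL, 1 for the memtable flush, then each entry is rewritten
    # roughly `size_ratio` times on every descent into the next level.
    return 1 + 1 + size_ratio * levels


def size_tiered_wa(tiers: int) -> int:
    # 1 for the WAL, 1 for the flush, then each entry is rewritten once per tier.
    return 1 + 1 + tiers


print(leveled_wa(levels=4))      # 42, inside the folklore 40–50x band
print(size_tiered_wa(tiers=3))   # 5
```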
B-tree WA for a 128-byte update: a dirtied 4 KB leaf plus its full-page image in the WAL = 8 KB written for 128 bytes, or 64×. The LSM often amplifies less, and sequentially; its sin is that the cost is deferred, bursty, and eats your p99.
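The per-update B-tree figure can be sketched the same way. The assumptions are loud: exactly one update lands on the dirtied page before write-back, the WAL record itself is ignored, and `full_page_write` is a hypothetical switch for whether a full-page image is logged.

```python
def btree_update_wa(update_bytes: int, page_bytes: int = 4096,
                    full_page_write: bool = True) -> float:
    # The dirty leaf is eventually written back whole...
    written = page_bytes
    # ...and a full-page image of it may also land in the WAL.
    if full_page_write:
        written += page_bytes
    return written / update_bytes


print(btree_update_wa(128))                         # 64.0
print(btree_update_wa(128, full_page_write=False))  # 32.0
```

The two cases bracket the ~30–60× range in the comparison table; batching several updates into the same leaf before write-back pushes the real number lower.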
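** Compaction Stats [default] **
| Level | Files | Size (GB) | Read (GB) | Write (GB) | W-Amp |
|---|---|---|---|---|---|
| L0 | 4/0 | 0.25 | 0.0 | 62.1 | 1.0 |
| L1 | 10/1 | 0.62 | 601.7 | 598.9 | 9.6 |
| L2 | 98/3 | 6.21 | 580.4 | 577.8 | 9.3 |
| L3 | 940/8 | 62.05 | 551.2 | 549.0 | 8.9 |
| Sum | | | 1733.3 | 1787.8 | 28.8 |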
Friday’s lab: predict the W-Amp column before you run it.
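One way to check a prediction against the Sum row, assuming (as these numbers suggest) that the overall W-Amp is total bytes written by flush and compaction divided by bytes flushed into L0. The stats exclude the WAL and show three descents, so the comparable back-of-envelope figure is roughly 1 + 10 × 3 = 31.

```python
# Per-level rows copied from the compaction stats above: (level, write_gb, w_amp)
STATS = [
    ("L0", 62.1, 1.0),
    ("L1", 598.9, 9.6),
    ("L2", 577.8, 9.3),
    ("L3", 549.0, 8.9),
]

total_written = sum(write for _, write, _ in STATS)
ingested = STATS[0][1]                  # bytes flushed into L0
overall_wa = total_written / ingested
per_level_sum = sum(amp for _, _, amp in STATS)

print(f"total written: {total_written:.1f} GB")          # 1787.8 GB
print(f"overall W-Amp: {overall_wa:.1f}")                 # 28.8
print(f"sum of per-level W-Amp: {per_level_sum:.1f}")     # 28.8
```

Each descent costs a little under the idealized 10×, which is why the measured total sits below 31.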
| Dimension | B+-tree | LSM leveled | LSM size-tiered |
|---|---|---|---|
| Write amp | ~30–60×, random | ≈ 40–50×, sequential | ≈ 5×, sequential |
| Point reads | 1 I/O | ≈ 1 I/O w/ Bloom | ≈ 1.1 I/O w/ Bloom |
| Range scans | excellent | good, no Bloom help | poor, no Bloom help |
| Space amp | ~1.44× | ≈ 1.1× | up to ~2× transient |
| Concurrency | latching, lock coupling | immutable SSTables; cost moves to compaction stalls | same as leveled |
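The Bloom-filter entries in the point-read row follow from a simple expected-cost model, sketched below. The run counts (4 for leveled, 10 for size-tiered) and the 1% false-positive rate are hypothetical, and filter and index blocks are assumed cached; size-tiered layouts keep more overlapping runs, which is roughly where the extra ~0.1 I/O comes from.

```python
import math


def bloom_fpr(bits_per_key: float) -> float:
    # Standard approximation with the optimal number of hash functions:
    # fpr ≈ exp(-bits_per_key * (ln 2)^2), about 0.8% at 10 bits/key.
    return math.exp(-bits_per_key * math.log(2) ** 2)


def expected_point_read_ios(runs: int, fpr: float) -> float:
    # Worst case: the key sits in the last run probed. Each earlier run costs
    # a wasted data-block read only when its Bloom filter gives a false positive.
    return 1 + (runs - 1) * fpr


print(f"{bloom_fpr(10):.4f}")                               # 0.0082
print(f"{expected_point_read_ios(runs=4, fpr=0.01):.2f}")   # 1.03
print(f"{expected_point_read_ios(runs=10, fpr=0.01):.2f}")  # 1.09
```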