The industry sold approximate nearest-neighbor search as a new category of database. It isn’t one — it’s an index, and we treat it with B-tree rigor.
Lecture 1 — ANN Search: IVF, PQ, HNSW · Lecture 2 — DiskANN and the Recall Axis
Three ideas underlie essentially every ANN index in production — and each trades something away.
About 99.96% of a unit ball’s volume at d = 768 lies in the outer 1% of radius (1 − 0.99⁷⁶⁸ ≈ 0.9996). Everything is “far,” by roughly the same amount, so pruning indexes (kd-trees, R-trees) can’t exclude anything and degenerate to a scan.
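The outer-shell claim above can be checked directly, because the volume of a d-dimensional ball scales as r^d. The snippet below is a minimal sketch; the function name `outer_shell_fraction` is made up for illustration and is not from the lecture.

```python
def outer_shell_fraction(dim: int, shell: float = 0.01) -> float:
    """Fraction of a unit ball's volume within `shell` of its surface.

    Volume scales as r**dim, so the inner ball of radius (1 - shell)
    holds (1 - shell)**dim of the total; the shell holds the rest.
    """
    return 1.0 - (1.0 - shell) ** dim


if __name__ == "__main__":
    # Watch the shell swallow the ball as the dimension grows.
    for d in (2, 3, 16, 128, 768):
        print(f"d = {d:>3}: {outer_shell_fraction(d):.4%} of the volume")
```

In two dimensions the outer 1% of radius holds about 2% of the area; by d = 768 it holds nearly all of the volume, which is why distance-based pruning has nothing left to prune.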
- nlist ≈ √N centroids (~10,000 for 100M).
- At query time, only the nearest nprobe lists are scanned.
- nprobe = 10 of 10,000 → scan ~0.1%: a 1000× reduction.
- nprobe = 1, 95% at nprobe = 64.

A 128-dim float vector (512 bytes) becomes 16 bytes: m = 16 subvectors of 8 dims, each encoded by an 8-bit codebook of 256 k-means centroids. The Cartesian-product trick: 256¹⁶ = 2¹²⁸ representable points from ≈131 KB of stored centroids. Jégou, Douze & Schmid, TPAMI 2011.
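The IVF and PQ figures above are plain arithmetic, and the sketch below reproduces them. The helper names `ivf_plan` and `pq_footprint` are hypothetical. The IVF estimate assumes inverted lists of roughly equal size, and the PQ estimate assumes 4-byte floats.

```python
import math


def ivf_plan(n_vectors: int, nprobe: int) -> dict:
    """Scan cost of an IVF index with nlist ~ sqrt(N), assuming equal-size lists."""
    nlist = round(math.sqrt(n_vectors))
    return {
        "nlist": nlist,
        "scan_fraction": nprobe / nlist,  # share of the dataset touched per query
        "speedup": nlist / nprobe,        # reduction versus a full scan
    }


def pq_footprint(dim: int = 128, m: int = 16, bits: int = 8, float_bytes: int = 4) -> dict:
    """Storage arithmetic for product quantization with m sub-quantizers."""
    ksub = 2 ** bits        # centroids per sub-codebook
    sub_dim = dim // m      # dimensions per subvector
    return {
        "raw_bytes": dim * float_bytes,                      # uncompressed vector
        "code_bytes": m * bits // 8,                         # compressed code
        "codebook_bytes": m * ksub * sub_dim * float_bytes,  # all stored centroids
        "log2_points": m * bits,                             # ksub**m == 2**(m * bits)
    }


if __name__ == "__main__":
    print(ivf_plan(100_000_000, nprobe=10))
    print(pq_footprint())
```

With the defaults, the codebooks occupy 131,072 bytes while addressing 2¹²⁸ distinct reconstruction points, which is the Cartesian-product trick stated in numbers.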
- efConstruction (100–500): build-time beam.
- efSearch: query-time beam on layer 0. This is nprobe in graph clothes: every ANN family has one such dial.

DRAM for one replica of 100M × 768d: 307 GB of full-precision vectors + ~13 GB of graph edges + allocator overhead. DRAM costs ~20× NVMe per byte, and that gap is why Thursday’s lecture exists.
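To make the query-time beam concrete, here is a minimal sketch of a best-first search over a single graph layer, with `ef` as the beam width. It is not a full HNSW implementation: there is no layer hierarchy and no index construction. The function name `beam_search`, the toy graph, and the choice of squared Euclidean distance are all assumptions made for illustration.

```python
import heapq


def beam_search(graph, vectors, query, entry, ef, k=1):
    """Best-first search on one layer, keeping at most `ef` results."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(vectors[i], query))

    visited = {entry}
    d0 = dist(entry)
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the beam is full and the best candidate is worse than
        # the worst result we are keeping.
        if len(results) >= ef and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst
    ranked = sorted((-neg_d, i) for neg_d, i in results)
    return [i for _, i in ranked[:k]]


# Toy 1-D graph: node 1 is a local minimum; the true neighbor (node 3)
# is reachable only through node 2, which is farther from the query.
vectors = [(0.0,), (5.0,), (11.0,), (6.2,)]
graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
query = (6.0,)

print(beam_search(graph, vectors, query, entry=0, ef=1))  # -> [1]
print(beam_search(graph, vectors, query, entry=0, ef=2))  # -> [3]
```

With `ef=1` the search is purely greedy and stops at the local minimum; widening the beam to 2 keeps the detour through node 2 alive long enough to find the true nearest neighbor. That is the recall-for-work trade the dial controls.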
The impolite question of 2019: what if the graph lived on SSD?
DiskANN (Microsoft Research, 2019): a single 64 GB-RAM machine with an NVMe drive serves a billion vectors at 95%+ recall. The 2019 baseline was a DRAM cluster an order of magnitude more expensive.
temperature. min_recall is the same kind of dial.

| Property | IVF-PQ | HNSW | DiskANN |
|---|---|---|---|
| Memory, 100M × 768d | ~3–6 GB codes | ~350 GB DRAM | ~6–10 GB RAM + ~340 GB SSD |
| QPS / latency | High QPS, ~1–10 ms | 10⁴–10⁵ QPS, 0.1–1 ms | ~10³ QPS, 2–10 ms |
| Recall | Needs re-rank above ~0.95 | 0.99+ via efSearch | 0.95–0.99 with re-rank |
| Updates | Appends easy; retrain on drift | Deletes poison → segment + rebuild | RAM delta + tombstones + merge |
| Use when | Huge N, tight memory | Latency is king, DRAM paid for | Billion-scale; cost is king |
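The IVF-PQ and HNSW cells of the memory row can be reproduced with back-of-envelope arithmetic. The sketch below assumes 4-byte floats, 4-byte neighbor ids, 32 layer-0 neighbors per node, and PQ codes of 32 to 64 bytes per vector. None of these parameters is stated in the table; they were chosen because they land on its figures. The function names are hypothetical, and the DiskANN cell is not modeled.

```python
GB = 1e9  # decimal gigabytes


def hnsw_dram_gb(n: int, dim: int, degree: int = 32,
                 float_bytes: int = 4, id_bytes: int = 4) -> tuple:
    """(vector GB, edge GB) for an in-memory graph index.

    `degree`, `float_bytes`, and `id_bytes` are illustrative assumptions.
    """
    vectors = n * dim * float_bytes
    edges = n * degree * id_bytes
    return vectors / GB, edges / GB


def pq_codes_gb(n: int, code_bytes: int) -> float:
    """GB of PQ codes at `code_bytes` per vector (coarse centroids ignored)."""
    return n * code_bytes / GB


if __name__ == "__main__":
    n, dim = 100_000_000, 768
    vec_gb, edge_gb = hnsw_dram_gb(n, dim)
    print(f"HNSW:   {vec_gb:.1f} GB vectors + {edge_gb:.1f} GB edges")
    print(f"IVF-PQ: {pq_codes_gb(n, 32):.1f} to {pq_codes_gb(n, 64):.1f} GB of codes")
```

Under these assumptions the gap between the two columns is roughly two orders of magnitude, and it comes almost entirely from storing full-precision vectors rather than from the graph edges.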