DATA 2027 · Week 06 · Part II — New Access Methods & Engines

Vector Indexes Are Access Methods, Not Products

The industry sold approximate nearest-neighbor search as a new category of database. It isn’t one — it’s an index, and we treat it with B-tree rigor.

Lecture 1 — ANN Search: IVF, PQ, HNSW · Lecture 2 — DiskANN and the Recall Axis

Lecture 1 · Tuesday

ANN Search: IVF, PQ, HNSW

Three ideas underlie essentially every ANN index in production — and each trades something away.

L1 · The Query

Every agent issues the same query

L1 · Brute Force

The exact-kNN invoice

L1 · Curse of Dimensionality

Concentration of distances

99%

of a unit ball’s volume at d = 768 lies in the outer 1% of radius. Everything is “far,” by roughly the same amount — so pruning indexes (kd-trees, R-trees) can’t exclude anything and degenerate to a scan.
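
The shell figure follows from volume scaling as r^d: the inner ball of radius 0.99 holds 0.99^d of the total. A minimal check of that arithmetic (the function name is illustrative, not from the lecture):

```python
def outer_shell_fraction(d: int, shell: float = 0.01) -> float:
    """Fraction of a d-dimensional unit ball's volume within `shell` of the surface."""
    # Volume scales as r**d, so the inner ball of radius (1 - shell)
    # holds (1 - shell)**d of the total volume.
    return 1.0 - (1.0 - shell) ** d

for d in (2, 16, 128, 768):
    print(f"d = {d:>3}: {outer_shell_fraction(d):.4%} of volume in the outer 1%")
```

At d = 2 the outer shell holds about 2% of the volume; at d = 768 it holds more than 99.9%, so the slide's 99% is, if anything, conservative.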

L1 · The Unlock

The great surrender

L1 · IVF

IVF: k-means wearing a database hat

L1 · IVF

The edge problem is the dial

  • Queries near Voronoi boundaries miss neighbors in unprobed cells.
  • Recall@10: 40% at nprobe = 1, 95% at nprobe = 64.
  • Monotone: more probes, more recall, more time.
  • What IVF does not fix: lists hold raw floats.
  • 100M × 768d is still 307 GB somewhere.
  • That’s the problem PQ attacks.
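
The nprobe dial described in the bullets above can be sketched in a few lines of NumPy. This is a toy under stated assumptions, not a production index: the data is random Gaussian, the k-means is plain Lloyd's, and every name (kmeans, build_ivf, search_ivf, recall_at_k) is illustrative. The recall values it prints will not match the slide's numbers; only the monotone shape carries over.

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns (centroids, final assignment)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centroid.
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        for c in range(k):
            members = x[assign == c]
            if len(members):              # leave an empty cluster where it is
                centroids[c] = members.mean(0)
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, d2.argmin(1)

def build_ivf(x, nlist):
    """Coarse quantizer plus one inverted list of row ids per Voronoi cell."""
    centroids, assign = kmeans(x, nlist)
    lists = [np.flatnonzero(assign == c) for c in range(nlist)]
    return centroids, lists

def search_ivf(x, centroids, lists, q, k, nprobe):
    # Rank cells by centroid distance and scan only the nprobe nearest.
    cells = ((centroids - q) ** 2).sum(1).argsort()[:nprobe]
    cand = np.concatenate([lists[c] for c in cells])
    d2 = ((x[cand] - q) ** 2).sum(1)
    return cand[d2.argsort()[:k]]

def recall_at_k(x, centroids, lists, queries, k, nprobe):
    hits = 0
    for q in queries:
        truth = ((x - q) ** 2).sum(1).argsort()[:k]   # exact kNN by full scan
        found = search_ivf(x, centroids, lists, q, k, nprobe)
        hits += len(np.intersect1d(truth, found))
    return hits / (k * len(queries))

rng = np.random.default_rng(0)
data = rng.standard_normal((2000, 32))
queries = rng.standard_normal((50, 32))
centroids, lists = build_ivf(data, nlist=16)
for nprobe in (1, 2, 4, 8, 16):
    r = recall_at_k(data, centroids, lists, queries, k=10, nprobe=nprobe)
    print(f"nprobe = {nprobe:>2}: recall@10 = {r:.2f}")
```

Because the probed cell sets are nested, recall cannot fall as nprobe grows, and probing every cell degenerates to the exact scan. The lists here still hold row ids into raw float vectors, which is the memory problem the last three bullets point at.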

L1 · Product Quantization

PQ compression

32×

A 128-dim float vector (512 bytes) becomes 16 bytes: m = 16 subvectors of 8 dims, each encoded by an 8-bit codebook of 256 k-means centroids. The Cartesian-product trick: 256¹⁶ = 2¹²⁸ representable points from ≈131 KB of stored centroids. Jégou, Douze & Schmid, TPAMI 2011.
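
A minimal sketch of the encoding described above, assuming random Gaussian training data and plain Lloyd's k-means per subspace. The helper names (train_codebooks, encode, adc_distances) are illustrative. The last function shows the asymmetric distance computation from the cited paper: one lookup table per subvector is built per query, after which each database distance is 16 table lookups and adds.

```python
import numpy as np

D, M, KS = 128, 16, 256          # dims, subvectors, centroids per codebook
DSUB = D // M                    # 8 dims per subvector

def train_codebooks(x, iters=5, seed=0):
    """One k-means codebook per subspace; returns shape (M, KS, DSUB)."""
    rng = np.random.default_rng(seed)
    books = np.empty((M, KS, DSUB), dtype=np.float32)
    for m in range(M):
        sub = x[:, m * DSUB:(m + 1) * DSUB]
        cent = sub[rng.choice(len(sub), size=KS, replace=False)].copy()
        for _ in range(iters):
            d2 = ((sub[:, None, :] - cent[None, :, :]) ** 2).sum(-1)
            assign = d2.argmin(1)
            for c in range(KS):
                members = sub[assign == c]
                if len(members):
                    cent[c] = members.mean(0)
        books[m] = cent
    return books

def encode(x, books):
    """Map each vector to M one-byte centroid ids."""
    codes = np.empty((len(x), M), dtype=np.uint8)
    for m in range(M):
        sub = x[:, m * DSUB:(m + 1) * DSUB]
        d2 = ((sub[:, None, :] - books[m][None, :, :]) ** 2).sum(-1)
        codes[:, m] = d2.argmin(1)
    return codes

def adc_distances(q, codes, books):
    """Approximate squared distances from q to every encoded vector."""
    # tables[m, c] = squared distance from q's m-th subvector to centroid c.
    tables = np.stack([
        ((books[m] - q[m * DSUB:(m + 1) * DSUB]) ** 2).sum(1) for m in range(M)
    ])
    # Per database vector: M lookups and M - 1 adds, no multiplies.
    return tables[np.arange(M), codes].sum(1)

rng = np.random.default_rng(0)
train = rng.standard_normal((2000, D)).astype(np.float32)
books = train_codebooks(train)
codes = encode(train, books)
query = rng.standard_normal(D).astype(np.float32)
approx = adc_distances(query, codes, books)

print(f"raw vector: {D * 4} bytes, code: {codes.shape[1]} bytes")
print(f"stored centroids: {books.nbytes} bytes")
```

The stored centroids come to 16 × 256 × 8 × 4 = 131,072 bytes, the ≈131 KB quoted above. The distances are approximate because each database vector is replaced by its nearest centroid combination; the query itself is never quantized.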

L1 · Product Quantization

Distance without multiplies

L1 · Field Note

What an unindexed scan costs

  • 400M × 1536-dim raw embeddings, brute-forced on GPUs.
  • The monthly GPU bill exceeded the salary of the engineer who eventually replaced the system.
  • Nobody asked “what is the access method here?”
  • After IVF-PQ: 2.4 TB → 38 GB of codes.
  • p99: 1.8 s → 19 ms.
  • At recall@10 = 0.96.
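
The field-note figures can be sanity-checked with back-of-envelope arithmetic. The only assumption added here is that the raw embeddings were stored as 4-byte floats; the 38 GB and latency figures are taken from the bullets above.

```python
n, dim = 400_000_000, 1536
raw_bytes = n * dim * 4            # assumption: float32 embeddings
codes_bytes = 38e9                 # code size quoted in the field note

print(f"raw embeddings: {raw_bytes / 1e12:.2f} TB")
print(f"code size:      {codes_bytes / n:.0f} bytes per vector")
print(f"compression:    {raw_bytes / codes_bytes:.0f}x")
print(f"p99 speedup:    {1.8 / 0.019:.0f}x")
```

The raw size lands between 2.4 and 2.5 TB, and the codes work out to roughly 95 bytes per vector, a plausible PQ code length for 1536 dimensions.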

L1 · HNSW

A skip list, generalized to a metric space

[Figure 6.1 labels: layer 2 (sparse highways); layer 1; layer 0 (all N nodes, efSearch beam); entry point; greedy descent; ★ query’s true neighbor]
Fig. 6.1 — HNSW search: enter sparse top layer, greedy-walk highways, drop a layer when stalled, finish with an efSearch-wide beam on layer 0.

L1 · HNSW

Why the walk doesn’t get stuck

L1 · HNSW

Two knobs govern everything

  • efConstruction (100–500): build-time beam.
  • Bigger → better edges, slower builds.
  • efSearch: query-time beam on layer 0.
  • 10 → recall 0.80 in 30 µs; 200 → 0.99 in 400 µs.
  • It’s nprobe in graph clothes — every ANN family has one such dial.
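
To make the efSearch knob concrete, here is a sketch of the bounded best-first search that runs on layer 0. It is deliberately not HNSW construction: the graph is an exact kNN graph built by brute force, with a ring edge added so every node is reachable. All names (build_knn_graph, beam_search, entry) are illustrative.

```python
import heapq
import numpy as np

def build_knn_graph(x, degree=8):
    """Toy stand-in for a built index: exact kNN edges plus a ring edge."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = d2.argsort(1)[:, :degree]
    n = len(x)
    return [[int(j) for j in nbrs[i]] + [(i + 1) % n] for i in range(n)]

def beam_search(x, graph, q, entry, k, ef):
    """Best-first search that keeps at most `ef` results; returns k nearest found."""
    def dist(i):
        return float(((x[i] - q) ** 2).sum())

    visited = {entry}
    cand = [(dist(entry), entry)]      # min-heap: closest unexplored node first
    best = [(-dist(entry), entry)]     # max-heap via negation: current beam
    while cand:
        d, node = heapq.heappop(cand)
        if len(best) >= ef and d > -best[0][0]:
            break                      # closest candidate cannot improve the beam
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(cand, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)    # evict the current worst
    ranked = sorted((-neg_d, i) for neg_d, i in best)
    return [i for _, i in ranked[:k]]

rng = np.random.default_rng(0)
points = rng.standard_normal((300, 16))
graph = build_knn_graph(points)
query = rng.standard_normal(16)
truth = {int(i) for i in ((points - query) ** 2).sum(1).argsort()[:5]}
for ef in (5, 20, 300):
    found = beam_search(points, graph, query, entry=0, k=5, ef=ef)
    print(f"ef = {ef:>3}: {len(truth & set(found))}/5 true neighbors found")
```

A wider beam explores more of the graph before the stopping rule fires, which is the same cost-for-recall exchange nprobe makes in IVF. With ef equal to the number of nodes on this connected toy graph, the search visits everything and returns the exact answer.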

L1 · HNSW’s Sin

Everything lives in RAM

350 GB

DRAM for one replica of 100M × 768d: 307 GB of full-precision vectors + ~13 GB of graph edges + allocator overhead. DRAM costs ~20× NVMe per byte — that gap is why Thursday’s lecture exists.
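
The DRAM figure decomposes as follows. The vector term is exact arithmetic; the edge term assumes about 32 layer-0 neighbors per node stored as 4-byte ids, which is one setting that reproduces the ~13 GB quoted above and is not stated in the lecture. The function name is illustrative.

```python
def hnsw_dram_bytes(n, dim, neighbors=32, id_bytes=4, float_bytes=4):
    """Return (vector_bytes, edge_bytes) for an in-memory HNSW replica."""
    vector_bytes = n * dim * float_bytes     # full-precision vectors
    edge_bytes = n * neighbors * id_bytes    # assumption: layer-0 adjacency only
    return vector_bytes, edge_bytes

vec, edges = hnsw_dram_bytes(100_000_000, 768)
print(f"vectors: {vec / 1e9:.1f} GB")
print(f"edges:   {edges / 1e9:.1f} GB")
print(f"subtotal before allocator overhead: {(vec + edges) / 1e9:.0f} GB")
```

The subtotal is 320 GB; the remaining gap to 350 GB is the allocator overhead the slide mentions, plus the sparse upper layers this sketch ignores.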

Lecture 2 · Thursday

DiskANN and the Recall Axis

The impolite question of 2019: what if the graph lived on SSD?

L2 · DiskANN

One node, one billion vectors

~5 ms

DiskANN (Microsoft Research, 2019): a single 64 GB-RAM machine with an NVMe drive serves a billion vectors at 95%+ recall. The 2019 baseline was a DRAM cluster an order of magnitude more expensive.

L2 · Vamana

A flatter graph, built for blocks

L2 · DiskANN

PQ steers in RAM, SSD stores the truth

[Figure 6.2 labels: RAM (~32 GB for 1B vectors): PQ codes, ~32 B each; cheap, slightly wrong distances; navigation runs here. SSD (graph + full vectors): each 4 KB block holds a full vector and its adjacency list, so one read returns geometry and next hops. Along the path: beam W = 4–8 reads; final candidates get an exact re-rank. ~10 rounds × 4 parallel 4 KB reads ≈ 40 I/Os ≈ 1–4 ms.]
Fig. 6.2 — DiskANN’s split: PQ demoted from index to steering mechanism; the exact re-rank at the end repairs its distortion.
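
The numbers in Fig. 6.2 fit together as simple arithmetic. In the sketch below, the per-round read latency of 100–400 µs is an assumption chosen to reproduce the figure's 1–4 ms range, and the function name is illustrative.

```python
def diskann_budget(n=1_000_000_000, pq_bytes=32, rounds=10, beam=4,
                   read_latency_us=(100, 400)):
    """Return (RAM in GB, total I/Os, (low, high) latency in ms) for one query."""
    ram_gb = n * pq_bytes / 1e9
    ios = rounds * beam
    # Reads within a round are issued in parallel, so latency scales with
    # the number of rounds rather than the number of I/Os.
    lo_ms, hi_ms = (rounds * t / 1000 for t in read_latency_us)
    return ram_gb, ios, (lo_ms, hi_ms)

ram_gb, ios, (lo_ms, hi_ms) = diskann_budget()
print(f"PQ codes in RAM: {ram_gb:.0f} GB")
print(f"I/Os per query:  {ios}")
print(f"SSD time:        {lo_ms:.0f}-{hi_ms:.0f} ms")
```

Widening the beam raises the I/O count without adding rounds, which is why the parallel reads buy recall at little latency cost as long as the drive has queue depth to spare.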
L2 · Beam Search

Hiding SSD latency

L2 · Historical Aside

The arc rhymes with 1970s hashing

L2 · Updates

Graphs hate deletes

L2 · Convergence

Read it back slowly: that’s an LSM-tree

L2 · The Recall Axis

RUM grows a fourth axis

L2 · The Recall Axis

Recall is a budget, per query

  • Agent re-ranking 50 candidates via LLM: buy cheap recall-0.85, let the model absorb noise.
  • Exact-match memory lookup (“already filed this ticket?”): pay for 0.99+.
  • Per-query targets are an API surface.
  • Agents already set temperature.
  • min_recall is the same kind of dial.
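
One way a min_recall parameter could be served, sketched under assumptions: the engine keeps an offline calibration of efSearch against measured recall and picks the cheapest setting that meets the target. The two calibration points reuse the efSearch figures quoted in the HNSW knobs slide; a real sweep would have many more. The names CALIBRATION and ef_for_recall are hypothetical, not an existing API.

```python
# (efSearch, measured recall) pairs from an offline sweep.
CALIBRATION = [(10, 0.80), (200, 0.99)]

def ef_for_recall(min_recall, calibration=CALIBRATION):
    """Smallest calibrated efSearch whose measured recall meets the target."""
    for ef, recall in sorted(calibration):
        if recall >= min_recall:
            return ef
    raise ValueError(f"no calibrated setting reaches recall {min_recall}")

print(ef_for_recall(0.80))   # a noise-tolerant re-ranking agent
print(ef_for_recall(0.99))   # an exact-match memory lookup
```

Raising an error when the target is out of reach, rather than silently clamping, keeps the contract honest: the caller asked for a recall floor and should learn when the index cannot deliver it.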

L2 · Three Families

Pick your trade

Property | IVF-PQ | HNSW | DiskANN
Memory, 100M × 768d | ~3–6 GB codes | ~350 GB DRAM | ~6–10 GB RAM + ~340 GB SSD
QPS / latency | High QPS, ~1–10 ms | 10⁴–10⁵ QPS, 0.1–1 ms | ~10³ QPS, 2–10 ms
Recall | Needs re-rank above ~0.95 | 0.99+ via efSearch | 0.95–0.99 with re-rank
Updates | Appends easy; retrain on drift | Deletes poison → segment + rebuild | RAM delta + tombstones + merge
Use when | Huge N, tight memory | Latency is king, DRAM paid for | Billion-scale; cost is king

L2 · The Agent Angle

Vector-as-feature beat vector-as-product

Recall is the fourth resource. You don’t maximize it — you budget it, query by query, like CPU.
— Week 6 lecture notes, DATA 2027

L2 · Checkpoint

Discussion questions

L2 · Readings

Read before Thursday