Snowflake said the warehouse is rented compute over object storage; Amazon said the log is the database — this week we take both literally.
Lecture 1 — Aurora: the Log Is the Database · Lecture 2 — Snowflake and the Elastic Warehouse
Verbitski et al., SIGMOD 2017 — ship only the redo log, and let storage do the rest.
reduction in write traffic when only redo records cross the network — a few hundred bytes of intent, not megabytes of pages.
more transactions than mirrored MySQL over a 30-minute SysBench write-only run — with 7.7× fewer I/Os per transaction.
to re-replicate a lost 10 GB segment over a 10 Gbps link. A second failure matters only if it lands inside that repair window and in the same protection group. Much larger segments would stretch the window to hours.
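A back-of-the-envelope check of the repair window, as a sketch rather than a measurement. It assumes decimal units, a link running at full line rate, and no protocol or disk overhead, so it gives a lower bound. The 10 TB segment size is a hypothetical chosen only to show how the window grows.

```python
def repair_seconds(segment_bytes: float, link_bits_per_sec: float) -> float:
    """Lower bound on the time to copy one segment at full line rate."""
    return segment_bytes * 8 / link_bits_per_sec


GB = 10**9
TB = 10**12
LINK = 10 * 10**9  # 10 Gbps, as on the slide

small = repair_seconds(10 * GB, LINK)  # the slide's 10 GB segment
large = repair_seconds(10 * TB, LINK)  # hypothetical 10 TB segment

print(f"10 GB segment: {small:.0f} s")
print(f"10 TB segment: {large / 3600:.1f} h")
```

The window scales linearly with segment size, which is why small segments keep the exposure to a correlated second failure short.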
Dageville et al., SIGMOD 2016 — what if compute owned nothing at all?
GetPage@LSN(tenant, timeline, rel, blkno, lsn)
→ 8 KB page image
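One way to picture what a GetPage@LSN call has to do: find the newest full page image at or below the requested LSN, then replay the later log records up to that LSN. The toy model below is an assumption-laden sketch, not Neon's implementation. `PageHistory`, the byte-patch record format, and the in-memory storage are all hypothetical; real WAL records are not simple byte patches.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 8192  # 8 KB pages, as in the signature above


@dataclass
class PageHistory:
    """History of one page: full images plus byte-patch deltas, keyed by LSN."""
    images: dict[int, bytes] = field(default_factory=dict)  # lsn -> full page
    deltas: list[tuple[int, int, bytes]] = field(default_factory=list)  # (lsn, offset, data)

    def get_page_at_lsn(self, lsn: int) -> bytes:
        # Start from the newest full image that is not newer than the request.
        base_lsns = [l for l in self.images if l <= lsn]
        if not base_lsns:
            raise KeyError(f"no page image at or before LSN {lsn}")
        base = max(base_lsns)
        page = bytearray(self.images[base])
        # Replay, in LSN order, every delta after the image and up to the request.
        for rec_lsn, offset, data in sorted(self.deltas):
            if base < rec_lsn <= lsn:
                page[offset:offset + len(data)] = data
        return bytes(page)


history = PageHistory()
history.images[100] = bytes(PAGE_SIZE)   # empty page written at LSN 100
history.deltas.append((110, 0, b"AAAA"))
history.deltas.append((120, 2, b"BB"))

print(history.get_page_at_lsn(115)[:4])
print(history.get_page_at_lsn(120)[:4])
```

Because the LSN is part of the request, two readers can ask for the same block at different points in history and each get a consistent answer.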
to branch a 2 TB database: a branch is just (parent timeline, LSN) plus the WAL it writes after that point. A hundred branches that each diverge by a few megabytes cost a few hundred megabytes in total.
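A minimal sketch of why a branch is cheap, assuming a simplified in-memory model. The `Timeline` class and its methods are hypothetical names for illustration. A branch stores only its parent, its branch LSN, and its own writes; a read that misses locally falls through to the parent, capped at the branch LSN.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Timeline:
    parent: Optional["Timeline"] = None
    branch_lsn: int = 0
    # blkno -> list of (lsn, value) written on this timeline only
    writes: dict[int, list[tuple[int, bytes]]] = field(default_factory=dict)

    def write(self, blkno: int, lsn: int, value: bytes) -> None:
        self.writes.setdefault(blkno, []).append((lsn, value))

    def read(self, blkno: int, lsn: int) -> Optional[bytes]:
        # Prefer the newest local version that is visible at this LSN.
        visible = [(l, v) for l, v in self.writes.get(blkno, []) if l <= lsn]
        if visible:
            return max(visible, key=lambda lv: lv[0])[1]
        # Otherwise ask the parent, but never look past the branch point.
        if self.parent is not None:
            return self.parent.read(blkno, min(lsn, self.branch_lsn))
        return None

    def branch(self, lsn: int) -> "Timeline":
        # Creating a branch copies no data: it records (parent, LSN).
        return Timeline(parent=self, branch_lsn=lsn)


main = Timeline()
main.write(1, 10, b"v1")
main.write(2, 20, b"x")
dev = main.branch(25)
main.write(1, 30, b"v2-main")   # happens after the branch point
dev.write(1, 30, b"v2-dev")     # diverges only on block 1

print(dev.read(1, 40), dev.read(2, 40), main.read(1, 40))
```

In this model the branch's storage cost is proportional to what it has written since the branch point, not to the size of the parent.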
| Axis | Aurora (2017) | Snowflake (2016) | Neon (2020s) |
|---|---|---|---|
| Crosses network on write | Redo records only | New micro-partition files | WAL stream |
| Durability | 4/6 quorum, 3 AZs | S3 + replicated metadata | 2/3 Paxos, then S3 |
| Unit of elasticity | Read replicas | Warehouses, per second | Compute per branch, scales to zero |
| Materialization | Storage coalesces redo | Compute writes files | Delta + image layers |
| Cost of a full copy | Volume clone (CoW) | Zero-copy clone | Branch = (parent, LSN) |
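The durability row can be checked with the usual quorum rules, sketched below under stated assumptions. The table gives only the write quorums (4 of 6, 2 of 3). The read quorum of 3 for Aurora comes from the paper, and the read quorum of 2 for the three-node case is an assumption for illustration. Function names are hypothetical.

```python
def quorum_ok(n: int, w: int, r: int) -> bool:
    """Reads overlap the latest write (w + r > n) and two writes overlap (2w > n)."""
    return w + r > n and 2 * w > n


def can_write(n: int, w: int, failed: int) -> bool:
    return n - failed >= w


def can_read(n: int, r: int, failed: int) -> bool:
    return n - failed >= r


# Aurora-style layout: 6 copies, 2 per AZ across 3 AZs, write 4, read 3.
print(quorum_ok(6, 4, 3))          # quorum rules hold
print(can_write(6, 4, failed=2))   # a whole AZ down: writes continue
print(can_write(6, 4, failed=3))   # AZ plus one more node: writes stop
print(can_read(6, 3, failed=3))    # but reads, and therefore repair, still work

# Three-node log quorum, write 2 (read 2 assumed).
print(quorum_ok(3, 2, 2))
```

With six copies, losing one AZ plus one extra node still leaves a read quorum, which is the case a three-copy 2/3 scheme spread over three AZs cannot cover.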
ALTER TABLE on the production timeline.