DATA 2027 · Week 11 · Part III — Semantics, Agents, Governance

Memory Is a Database Problem

Every agent memory system shipped since 2023 is a storage engine wearing a trench coat. This week we take the coat off and grade what’s underneath.

Lecture 1 — Agent memory systems, read as database designs · Lecture 2 — Temporal knowledge, consolidation, and forgetting

Lecture 1 · Tuesday

Agent Memory Systems, Read as Database Designs

Strip the branding off any memory product and you find a workload spec.

L1 · The Workload

Four requirements, no branding

L1 · The Workload

You learned these in week two

  • Episodic append → sequential append
  • Semantic recall → secondary indexes
  • Namespace isolation → multi-tenancy
  • Point reads → primary-key lookup
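The mapping above can be sketched with nothing but SQLite from the Python standard library. This is a minimal illustration, not any product's schema: the table name `memory`, the `topic` column (a crude stand-in for a semantic key; real systems would index embeddings), and the helper names are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memory (
    id        INTEGER PRIMARY KEY,   -- point reads: primary-key lookup
    namespace TEXT NOT NULL,         -- namespace isolation: tenant column
    topic     TEXT NOT NULL,         -- hypothetical stand-in for a semantic key
    body      TEXT NOT NULL
);
CREATE INDEX memory_by_topic ON memory (namespace, topic);  -- secondary index
""")

def append(namespace, topic, body):
    """Episodic append: new rows only ever go on the end of the table."""
    cur = conn.execute(
        "INSERT INTO memory (namespace, topic, body) VALUES (?, ?, ?)",
        (namespace, topic, body),
    )
    return cur.lastrowid

def recall(namespace, topic):
    """Recall via the secondary index, scoped to a single namespace."""
    rows = conn.execute(
        "SELECT body FROM memory WHERE namespace = ? AND topic = ? ORDER BY id",
        (namespace, topic),
    )
    return [r[0] for r in rows]

def point_read(memory_id):
    """Primary-key lookup of one memory row."""
    row = conn.execute(
        "SELECT body FROM memory WHERE id = ?", (memory_id,)
    ).fetchone()
    return row[0] if row else None

first = append("tenant_a", "billing", "Customer asked about invoice 1")
append("tenant_b", "billing", "Note that belongs to another tenant")
append("tenant_a", "billing", "Customer confirmed payment")
```

Every query carries the namespace predicate, which is the whole of multi-tenancy in this sketch; nothing here is specific to agents.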

The physics didn’t change. The client did.

L1 · The Workload

Writes are not the hard part

~200k

episodic appends/day for a support agent at 10k conversations × 20 turns — trivial; a single Postgres instance yawns at it. The hard part is recall on every turn.
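The arithmetic behind that number, worked through. The only assumption added here is that load is spread evenly over 24 hours, which understates the peak rate but shows the order of magnitude.

```python
conversations_per_day = 10_000
turns_per_conversation = 20
seconds_per_day = 24 * 60 * 60

appends_per_day = conversations_per_day * turns_per_conversation
# Assumes writes are spread uniformly across the day (peaks will be higher).
average_appends_per_second = appends_per_day / seconds_per_day

print(appends_per_day)                       # 200000
print(round(average_appends_per_second, 2))  # 2.31
```

A couple of inserts per second on average is why the write path is not where the engineering effort goes.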

L1 · The Workload

Why you must index

17s → 1.4s

p95 end-to-end latency: full-history replay vs. retrieval over distilled memories on LOCOMO-length conversations (Mem0 paper) — with >90% fewer tokens billed. Not an optimization; the difference between a product and a demo.

L1 · Semantic Recall

Recall means hybrid retrieval

L1 · Mem0

Mem0: an LLM-driven upsert pipeline

  • Input: new turn + rolling summary
  • 1 · EXTRACT: LLM emits short candidate facts
  • 2 · CONSOLIDATE: retrieve similar, LLM picks an op: ADD, UPDATE, DELETE, or NOOP
  • = a CDC upsert resolver, with an LLM as the comparator
  • Index over extracted facts, not raw transcript
Mem0 (arXiv 2504.19413) write path: extraction, then consolidation choosing ADD / UPDATE / DELETE / NOOP.
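To make the upsert-resolver reading concrete, here is a toy version of the two stages. It is not Mem0's code or API: `extract` and `FactStore` are hypothetical names, the extractor is a string parser standing in for the LLM, retrieval is an exact key match rather than vector similarity, and the comparator is plain equality rather than an LLM judgment.

```python
from dataclasses import dataclass, field

def extract(turn):
    """Stand-in for the LLM extractor: parses 'key = value' clauses split by ';'."""
    candidates = []
    for clause in turn.split(";"):
        if "=" in clause:
            key, value = (part.strip() for part in clause.split("=", 1))
            candidates.append((key, value or None))  # empty value means retraction
    return candidates

@dataclass
class FactStore:
    facts: dict = field(default_factory=dict)  # key -> current value
    log: list = field(default_factory=list)    # (op, key, value) decisions

    def consolidate(self, key, value):
        """Pick ADD / UPDATE / DELETE / NOOP for one candidate fact, then apply it."""
        existing = self.facts.get(key)
        if value is None:
            op = "DELETE" if key in self.facts else "NOOP"
        elif existing is None:
            op = "ADD"
        elif existing == value:   # equality stands in for the LLM comparator
            op = "NOOP"
        else:
            op = "UPDATE"
        if op in ("ADD", "UPDATE"):
            self.facts[key] = value
        elif op == "DELETE":
            del self.facts[key]
        self.log.append((op, key, value))
        return op

store = FactStore()
turns = ["contact = Marcus; plan = pro", "contact = Dana; plan = pro", "plan ="]
ops = [store.consolidate(k, v) for turn in turns for k, v in extract(turn)]
print(ops)          # ['ADD', 'ADD', 'UPDATE', 'NOOP', 'DELETE']
print(store.facts)  # {'contact': 'Dana'}
```

Note what UPDATE and DELETE do to `facts`: the old value is gone. Only the side log remembers that Marcus was ever the contact, which is the property the report card grades under Audit.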
L1 · Mem0

The payoff is real

+26%

relative improvement over OpenAI’s built-in memory on LOCOMO question answering (LLM-as-judge). The graph variant Mem0ᵍ adds a couple more points on temporal and multi-hop questions.

L1 · Mem0

Now grade it as a database

L1 · MemGPT

MemGPT: the buffer pool, rediscovered

  • Main context = RAM: scarce, fast, fixed-size.
  • External context: recall + archival storage, unbounded.
  • Reachable only via model-issued function calls.
  • Main context = buffer pool.
  • Eviction-with-summary = lossy page replacement.
  • Warning at ~70% occupancy = high-water mark flush.
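The buffer-pool analogy can be sketched as a fixed-size context with a pressure warning and lossy eviction. This is an illustration of the analogy, not MemGPT's implementation: `MainContext`, the whitespace token count, and the "keep the first word" summarizer are all assumptions chosen to keep the example small.

```python
class MainContext:
    def __init__(self, capacity_tokens, warn_fraction=0.7):
        self.capacity = capacity_tokens
        self.warn_at = warn_fraction * capacity_tokens  # high-water mark
        self.messages = []        # resident "pages"
        self.summary = ""         # lossy residue of evicted messages
        self.recall_storage = []  # external context: unbounded, full text

    @staticmethod
    def tokens(text):
        return len(text.split())  # crude whitespace token count

    def used(self):
        return self.tokens(self.summary) + sum(self.tokens(m) for m in self.messages)

    def append(self, message):
        """Add a message; return True if occupancy reached the warning mark."""
        self.messages.append(message)
        self.recall_storage.append(message)  # the full text survives outside
        pressure = self.used() >= self.warn_at
        while self.used() > self.capacity and len(self.messages) > 1:
            self._evict_oldest()
        return pressure

    def _evict_oldest(self):
        victim = self.messages.pop(0)
        words = victim.split()
        if words:
            # Lossy page replacement: only the first word survives in the summary.
            self.summary = (self.summary + " " + words[0]).strip()

ctx = MainContext(capacity_tokens=10)
warnings = [
    ctx.append("alpha beta gamma delta"),
    ctx.append("epsilon zeta eta theta"),
    ctx.append("iota kappa lambda mu"),
]
print(warnings)     # [False, True, True]
print(ctx.summary)  # alpha
```

After the third append the oldest message has been replaced in context by a one-word summary, while `recall_storage` still holds all three messages in full.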
L1 · MemGPT

The application as its own buffer manager

L1 · Field Note

The eight-month primary key

L1 · Report Card

Three systems, one rubric

Property    | Mem0                       | MemGPT                   | Zep / Graphiti            | A DBMS would say
Write path  | LLM extract → LLM upsert   | Self-directed tool calls | Edges into temporal graph | Log first, derive later
Read path   | Vector top-k over facts    | Model-issued search      | Cosine + BM25 + graph     | Optimizer picks the path
Durability  | Lossy at extraction        | Lossy at eviction        | Edges invalidated, kept   | WAL or it didn’t happen
Audit       | Overwrites destroy history | Edits unversioned        | Bi-temporal lineage       | Every version AS OF
Consistency | Async → stale reads        | Serial until you shard   | Summaries can lag         | Define isolation, enforce it
Lecture 2 · Thursday

Temporal Knowledge, Consolidation, and Forgetting

Mutable key–value memory is the wrong data model. The right one is from the 1990s.

L2 · Temporal Model

Facts are intervals, not values

L2 · Graphiti

Bi-temporal edges: two timelines per fact

  • Valid time (t_valid / t_invalid): when the fact held in the world.
  • Transaction time (learned / superseded): the database’s own epistemic history.

Textbook bi-temporality: Snodgrass’s TSQL2 (1995); SQL:2011 later standardized application-time and system-versioned tables.
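A small sketch of what two timelines per fact buy you. The `Edge` class and `believed` function are hypothetical names, not Graphiti's schema; the field names follow the slide. The dates, including the year and the Mar 20 "learned" date, are invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Edge:
    fact: str
    t_valid: date                # valid time: fact starts holding in the world
    t_invalid: Optional[date]    # valid time: fact stops holding (None = open)
    learned: date                # transaction time: store recorded this row
    superseded: Optional[date]   # transaction time: store replaced this row

def believed(edges, valid_at, known_at):
    """Facts the store, as of `known_at`, believed to hold at `valid_at`."""
    return [
        e.fact for e in edges
        if e.t_valid <= valid_at
        and (e.t_invalid is None or valid_at < e.t_invalid)
        and e.learned <= known_at
        and (e.superseded is None or known_at < e.superseded)
    ]

# Illustrative history: Marcus left on Mar 17, but the store only heard on Mar 20.
edges = [
    Edge("contact=Marcus", date(2025, 3, 3), None,
         learned=date(2025, 3, 3), superseded=date(2025, 3, 20)),
    Edge("contact=Marcus", date(2025, 3, 3), date(2025, 3, 17),
         learned=date(2025, 3, 20), superseded=None),
    Edge("contact=Dana", date(2025, 3, 17), None,
         learned=date(2025, 3, 20), superseded=None),
]

then = believed(edges, valid_at=date(2025, 3, 18), known_at=date(2025, 3, 18))
now = believed(edges, valid_at=date(2025, 3, 18), known_at=date(2025, 3, 21))
print(then)  # ['contact=Marcus']
print(now)   # ['contact=Dana']
```

Both queries ask about the same day in the world. They differ only in what the store knew at the time, which is exactly the question a single-timeline store cannot answer.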

L2 · Graphiti

Contradiction as invalidation

Timeline: Mar 03 → Mar 17
  • contact_for(acme) = Marcus · t_valid: Mar 03 · t_invalid: Mar 17 (closed, not deleted)
  • contact_for(acme) = Dana · t_valid: Mar 17 · t_invalid: ∅ (new edge invalidates old)
  • AS OF Mar 10 → Marcus
  • AS OF now → Dana
The Dana edge closes the Marcus edge’s interval (t_invalid := Mar 17) but preserves the row; “current truth” is t_invalid IS NULL.
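The same example as runnable SQL, using SQLite from the Python standard library. This is a sketch of the invalidation pattern only: the `edge` table and helper names are assumptions, the year is arbitrary, and any new object for the same (subject, relation) is treated as a contradiction, where a real system would need to decide that with a model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE edge (
        subject   TEXT NOT NULL,
        relation  TEXT NOT NULL,
        object    TEXT NOT NULL,
        t_valid   TEXT NOT NULL,   -- ISO dates compare correctly as text
        t_invalid TEXT             -- NULL while the fact is current
    )
""")

def assert_fact(subject, relation, obj, at):
    """Close any open edge for (subject, relation), then open the new one."""
    conn.execute(
        "UPDATE edge SET t_invalid = ? "
        "WHERE subject = ? AND relation = ? AND t_invalid IS NULL",
        (at, subject, relation),
    )
    conn.execute(
        "INSERT INTO edge VALUES (?, ?, ?, ?, NULL)",
        (subject, relation, obj, at),
    )

def as_of(subject, relation, at):
    """The object whose validity interval contains `at`, if any."""
    row = conn.execute(
        "SELECT object FROM edge WHERE subject = ? AND relation = ? "
        "AND t_valid <= ? AND (t_invalid IS NULL OR ? < t_invalid)",
        (subject, relation, at, at),
    ).fetchone()
    return row[0] if row else None

def current(subject, relation):
    """Current truth is simply the edge with t_invalid IS NULL."""
    row = conn.execute(
        "SELECT object FROM edge WHERE subject = ? AND relation = ? "
        "AND t_invalid IS NULL",
        (subject, relation),
    ).fetchone()
    return row[0] if row else None

assert_fact("acme", "contact_for", "Marcus", "2025-03-03")
assert_fact("acme", "contact_for", "Dana", "2025-03-17")

print(as_of("acme", "contact_for", "2025-03-10"))  # Marcus
print(current("acme", "contact_for"))              # Dana
```

The write path never issues a DELETE. The Marcus row is still in the table after the second call, with its interval closed.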
L2 · Graphiti

Three queries overwrite stores can’t answer

L2 · Graphiti

Temporal structure pays rent

94.8%

Zep on Deep Memory Retrieval vs. MemGPT’s 93.4% — plus up to 18.5% accuracy gains on LongMemEval at ~90% lower latency than full-context baselines. It retrieves the right version of a fact.

L2 · Prior Art

The field did this in the 90s

L2 · Consolidation

Consolidation is LSM compaction

L2 · Forgetting

Forgetting is garbage collection

L2 · Open Problem

The semantic phantom

  • 09:00 summary: “contact is Marcus, renewal likely”
  • 09:05 episode: Marcus left. Graph edge invalidated ✓
  • 09:12 agent reads STALE summary → emails Marcus
  • 09:30 lazy consolidation regenerates summary
  • Every base read was consistent; the action was still wrong.
A belief matching no committed base fact stays readable because it lives in a derived representation — a phantom-read analogue.
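The timeline can be replayed in a few lines. This sketch only reproduces the anomaly; it is not a description of how any of the systems above work, and it does not resolve the open problem. The `Memory` class and the `derived_from` version counter are hypothetical, added so the staleness is at least observable.

```python
from dataclasses import dataclass

@dataclass
class Summary:
    text: str
    derived_from: int   # base-store version the summary was generated at

class Memory:
    def __init__(self):
        self.version = 0
        self.contact = None
        self.summary = Summary("", 0)

    def commit_contact(self, name):
        """Base write: committed immediately; the summary is left untouched."""
        self.version += 1
        self.contact = name

    def consolidate(self):
        """Lazy job: regenerate the summary from the current base facts."""
        text = f"contact is {self.contact}" if self.contact else "contact unknown"
        self.summary = Summary(text, self.version)

    def summary_is_stale(self):
        return self.summary.derived_from < self.version

mem = Memory()
mem.commit_contact("Marcus")
mem.consolidate()                      # 09:00 summary mentions Marcus
mem.commit_contact(None)               # 09:05 Marcus left; base fact invalidated
belief_at_0912 = mem.summary.text      # 09:12 agent reads the summary
base_at_0912 = mem.contact
stale_at_0912 = mem.summary_is_stale()
mem.consolidate()                      # 09:30 lazy consolidation catches up

print(belief_at_0912)    # contact is Marcus
print(base_at_0912)      # None
print(stale_at_0912)     # True
print(mem.summary.text)  # contact unknown
```

At 09:12 the base store and the summary disagree, and the agent acted on the summary. The version check shows the gap exists but says nothing about which parts of a free-text summary a given base write invalidates.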
L2 · Open Problem

Why no classical answer fits

An agent that cannot say why it believes something is just a cache with opinions.
— Week 11 lecture notes, DATA 2027
Checkpoint · Discussion

Before you leave

Readings

Read before Thursday