Models write fluent SQL against toy schemas and confident nonsense against real warehouses — the gap is a missing contract, not missing intelligence.
Lecture 1 — Why Enterprise Text-to-SQL Fails · Lecture 2 — The Semantic Layer as Schema-for-Models
For three years the problem looked solved. Then the benchmark stopped lying.
Spider 1.0 execution accuracy vs. Spider 2.0 at launch (late 2024). Nothing about the models got worse — the benchmark stopped hiding the task.
student.age.amount, amount_usd, amt_net, total_amount_v2.strftime where FORMAT_TIMESTAMP is needed.INFORMATION_SCHEMA, sample rows, build intermediates.price * quantity”.Accuracy of state-of-the-art text-to-SQL on MIT’s own warehouse, with realistic questions from its actual users. Not adversarial — merely ordinary.
churn-adjacent columns, three analyst generations.The meaning contract already exists. BI invented it to keep dashboards consistent.
# semantic_layer/metrics/net_revenue.yml
metric: net_revenue
model: fct_order_lines # grain: one row per order line
expr: sum(amount_usd) - sum(refund_amount_usd)
filters:
- field: account_type
operator: not_in
values: [internal, test]
dimensions: [fiscal_month, region, plan_tier]
joins:
- { to: dim_customers, type: many_to_one, on: account_id }
owner: finance-analytics # a human team, not a model
Anthropic’s self-service analytics: raw schema access vs. curated “skills” encoding metric definitions, join guidance, and pitfalls. Same model, 4.5× the accuracy — the expensive ingredient was analyst time, not GPUs.
Cube’s paired benchmark: one ~4 KB document of metric and join definitions moved accuracy +17 to +23 percentage points — while a schema dump alone can be 100K tokens of mostly noise.
owner: finance-analytics — the entire governance model in seventeen characters.| Failure mode | Mitigation | Failure becomes… |
|---|---|---|
| Wrong join | Governed join graph + declared grain | Compile-time error (loud) |
| Wrong metric variant | Named, owned metric definitions | A clarifying question (“net or gross?”) |
| Hallucinated column | ≈4 KB curated retrieval surface | Rejected before SQL exists |
| Stale schema | Versioned layer, CI-tested against warehouse | A failed build, pre-production |