Lumen
Natural-language analytics that runs against real warehouse schemas, not toy databases.
The problem
Most NL-to-SQL demos look fine on a five-table sample. Production warehouses
do not cooperate. Table names repeat across schemas. Join paths that worked in
a tutorial fail when the catalog has fifty tables and the model invents
Customer.Email because it
sounds right. The failure mode is not always a loud syntax error — sometimes
the query runs and returns the wrong slice of data.
Ambiguity shows up the same way. "Top customers in California" could mean revenue, order count, or recency. A system that picks one interpretation silently and executes is dangerous. The gap between a slick chat UI and something you would attach to a real catalog is mostly engineering around the model: validation, semantics, and surfacing what was assumed.
The approach
Semantic layer
Business entities, metrics, and relationships live in YAML, declared separately from physical foreign keys. The generator reasons about revenue by region against named metrics, not against whatever join the model guessed from column names alone. The semantic directory is validated against the live warehouse before a question is answered.
Schema-aware validation
Generated SQL is parsed with sqlglot and checked against the introspected
catalog. Unknown tables and columns are collected into a structured issue
list — not a single database error at execution time. When validation fails,
the model gets a correction prompt with the rejected SQL and available column
names. Up to two retry rounds after the first generation. Read-only:
INSERT,
UPDATE, and DDL are
rejected before they reach the warehouse.
Explain-back
Before SQL is drafted, an interpreter returns structured JSON: a one-line
restatement, entities and metrics in play, filters, sort, limit. Underspecified
questions produce discrete options with a suggested default; the CLI can walk
through choices or apply defaults in scripts. The user sees the interpretation
under --- interpretation ---
every time, not only the final result set.
How it works
A request moves through a fixed pipeline. Each stage has a contract; later stages do not paper over failures from earlier ones.
- Question and semantic + physical schema context are sent to the interpreter.
- Structured interpretation is returned; ambiguities are resolved if needed.
- SQL is generated against the same schema block.
- sqlglot validates identifiers against the live catalog.
- On validation failure, a corrective generation runs (bounded retries).
- Valid
SELECTstatements execute; rows return to the CLI or API.
Benchmarks
Lumen is evaluated against Spider, BIRD, and a custom NYC Open Data benchmark, compared to a GPT-4 baseline and Vanna AI.
| System | Spider | BIRD | NYC Open Data |
|---|---|---|---|
| GPT-4 baseline | pending | pending | pending |
| Vanna AI | pending | pending | pending |
| Lumen | pending | pending | pending |
Benchmark runs are being finalized. Numbers and full methodology will be published here and in the repo's BENCHMARKS.md.
Demo
A short walkthrough video is being recorded and will be embedded here.
Demo video coming soon.
Stack
Python, FastAPI, Angular, Postgres, DuckDB, Claude, sqlglot, Pydantic. Backed by a Pytest suite; the Angular UI uses Tailwind.