Lumen

Natural-language analytics that runs against real warehouse schemas, not toy databases.

The problem

Most NL-to-SQL demos look fine on a five-table sample. Production warehouses do not cooperate. Table names repeat across schemas. Join paths that worked in a tutorial fail when the catalog has fifty tables and the model invents Customer.Email because it sounds right. The failure mode is not always a loud syntax error — sometimes the query runs and returns the wrong slice of data.

Ambiguity shows up the same way. "Top customers in California" could mean revenue, order count, or recency. A system that picks one interpretation silently and executes is dangerous. The gap between a slick chat UI and something you would attach to a real catalog is mostly engineering around the model: validation, semantics, and surfacing what was assumed.

The approach

Semantic layer

Business entities, metrics, and relationships live in YAML, declared separately from physical foreign keys. The generator reasons about revenue by region against named metrics, not against whatever join the model guessed from column names alone. The semantic directory is validated against the live warehouse before a question is answered.

Schema-aware validation

Generated SQL is parsed with sqlglot and checked against the introspected catalog. Unknown tables and columns are collected into a structured issue list — not a single database error at execution time. When validation fails, the model gets a correction prompt with the rejected SQL and available column names. Up to two retry rounds after the first generation. Read-only: INSERT, UPDATE, and DDL are rejected before they reach the warehouse.

Explain-back

Before SQL is drafted, an interpreter returns structured JSON: a one-line restatement, entities and metrics in play, filters, sort, limit. Underspecified questions produce discrete options with a suggested default; the CLI can walk through choices or apply defaults in scripts. The user sees the interpretation under --- interpretation --- every time, not only the final result set.

How it works

A request moves through a fixed pipeline. Each stage has a contract; later stages do not paper over failures from earlier ones.

Question and semantic + physical schema context are sent to the interpreter.
Structured interpretation is returned; ambiguities are resolved if needed.
SQL is generated against the same schema block.
sqlglot validates identifiers against the live catalog.
On validation failure, a corrective generation runs (bounded retries).
Valid SELECT statements execute; rows return to the CLI or API.

Benchmarks

Lumen is evaluated against Spider, BIRD, and a custom NYC Open Data benchmark, compared to a GPT-4 baseline and Vanna AI.

Benchmark results for Lumen versus baselines; values not yet published
System	Spider	BIRD	NYC Open Data
GPT-4 baseline	pending	pending	pending
Vanna AI	pending	pending	pending
Lumen	pending	pending	pending

Benchmark runs are being finalized. Numbers and full methodology will be published here and in the repo's BENCHMARKS.md.

Demo

A short walkthrough video is being recorded and will be embedded here.

Demo video coming soon.

Stack

Python, FastAPI, Angular, Postgres, DuckDB, Claude, sqlglot, Pydantic. Backed by a Pytest suite; the Angular UI uses Tailwind.