The router intercepts every AI request and decides: run locally (free) or send to frontier (paid). Here's exactly how it makes that decision.
The router runs three checks in order. First match wins.
Signal matching → Frontier
Scans for high-complexity keywords: architect, design system, security audit, trade-off, compare and contrast, multi-tenant, compliance. If any match → route to frontier.
Signal matching → Local
Scans for routine coding keywords: write a function, fix this, create a, implement, write sql, write a test, dockerfile, bash script. If any match → route to local Brewmode model.
Length fallback
If no keyword matched: prompts under 500 chars go local (most short prompts are routine); prompts of 500 chars or more go frontier (longer prompts usually need deeper reasoning). Separately, any prompt over 1,000 chars always goes frontier, even if a local keyword matched in steps 1-2.
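The three checks above can be sketched as a single function. This is a minimal illustration, not the shipped implementation: the keyword lists are copied from the description, and the return shape (`route` plus `reason`) is an assumption modeled on the response fields shown later.

```typescript
// Keyword lists taken verbatim from the routing rules above.
const FRONTIER_SIGNALS = [
  "architect", "design system", "security audit", "trade-off",
  "compare and contrast", "multi-tenant", "compliance",
];
const LOCAL_SIGNALS = [
  "write a function", "fix this", "create a", "implement",
  "write sql", "write a test", "dockerfile", "bash script",
];

type Route = "frontier" | "local";

function classifyComplexity(prompt: string): { route: Route; reason: string } {
  const p = prompt.toLowerCase();
  // Hard override: very long prompts always go frontier, even if a
  // local keyword would have matched.
  if (prompt.length > 1000) return { route: "frontier", reason: "length>1000" };
  // Check 1: frontier signals. First match wins.
  if (FRONTIER_SIGNALS.some((s) => p.includes(s)))
    return { route: "frontier", reason: "frontier-signal" };
  // Check 2: local signals.
  if (LOCAL_SIGNALS.some((s) => p.includes(s)))
    return { route: "local", reason: "local-signal" };
  // Check 3: length fallback.
  return prompt.length < 500
    ? { route: "local", reason: "short-prompt" }
    : { route: "frontier", reason: "long-prompt" };
}
```

Checking the override before the keyword scans is what makes "over 1,000 chars always goes frontier" hold unconditionally.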
Your app → POST /api/router { prompt, max_tokens }
↓
classifyComplexity(prompt)
├─ frontier signals? → Claude Sonnet 4 via OpenRouter
├─ local signals? → Brewmode Qwen3-8B on Modal (free)
└─ length fallback → under 500 chars local, 500+ chars frontier
↓
Response: { text, route, reason, model, time_ms, tokens, cost }
Pass model: "local" or model: "frontier" to bypass the classifier and force a specific route. Default is model: "auto".
$10,000/mo
10 devs sending 100% of requests to frontier at $0.015-$0.06 per 1K tokens
$1,000-2,000/mo
80-90% of requests handled by Brewmode at $0/token. Only the remaining 10-20% hits the paid API.
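The back-of-envelope math behind those cards, using the figures quoted above (the 80-90% local share and the $10,000/mo all-frontier baseline):

```typescript
// Annual savings for a team whose all-frontier bill is frontierMonthly,
// with localShare of requests routed to the free local model.
function annualSavings(frontierMonthly: number, localShare: number): number {
  const paidMonthly = frontierMonthly * (1 - localShare); // only frontier calls cost money
  return (frontierMonthly - paidMonthly) * 12;
}

console.log(annualSavings(10_000, 0.8)); // 96000  ($10k/mo -> $2k/mo paid)
console.log(annualSavings(10_000, 0.9)); // 108000 ($10k/mo -> $1k/mo paid)
```

So at the quoted routing rates, a 10-dev team lands roughly in the $96k-$108k/yr savings range.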
Run the demo to see routing decisions in real time.
Requests: 0
Routed Local: 0%
Frontier Calls: 0
Cost Saved: $0.0000
Annual Savings (10 devs): $0