Intelligent routing

ML-powered routing that matches prompt complexity to the right model tier.

Intelligent routing analyzes each request and automatically routes it to the most appropriate model based on prompt complexity. Gateway embeds the prompt, scores its complexity from 0 (simple) to 1 (complex), and maps the score to a model tier based on your chosen strategy. This adds ~1-4ms of latency — negligible compared to LLM inference time.

Capability tiers

Models are automatically classified into five tiers based on output token cost:

| Tier | Output Cost (per 1M tokens) | Example Models |
|---|---|---|
| Frontier | >= $5.00 | Claude Opus 4, GPT-4.5 |
| Advanced | $2.00 – $5.00 | Claude Sonnet 4, GPT-4 Turbo |
| Standard | $1.50 – $2.00 | Claude 3.5 Sonnet, GPT-4o |
| Efficient | $0.10 – $1.50 | Claude Haiku 3.5, GPT-4o-mini |
| Basic | < $0.10 | Older / small models |

You don’t need to manually classify models — Gateway infers tiers from provider pricing data. Any model works, including new releases.
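The inference described above can be sketched as a simple threshold lookup. This is an illustrative reconstruction of the table's boundaries, not Gateway's actual implementation; the function name and lowercase tier labels are assumptions.

```python
def infer_tier(output_cost_per_1m_tokens: float) -> str:
    """Map a model's output cost (USD per 1M output tokens) to a capability tier,
    using the price boundaries from the table above."""
    if output_cost_per_1m_tokens >= 5.00:
        return "frontier"
    if output_cost_per_1m_tokens >= 2.00:
        return "advanced"
    if output_cost_per_1m_tokens >= 1.50:
        return "standard"
    if output_cost_per_1m_tokens >= 0.10:
        return "efficient"
    return "basic"
```

Because classification keys off pricing rather than a hand-maintained model list, a newly released model is tiered correctly as soon as its pricing is known.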

Cost optimized

Maximizes cost savings while maintaining quality for complex tasks. The complexity threshold is set low, so ~70% of traffic routes to cheaper models.

Best for: Customer support chatbots, general-purpose assistants, mixed-complexity workloads. Expected savings: 40-60%.

```json
{
  "name": "Cost Optimized Chat",
  "default_strategy": {
    "type": "intelligent",
    "axis": "cost",
    "providers": [
      { "provider": "openai", "model": "gpt-5-mini" },
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
      { "provider": "openai", "model": "gpt-5.2" }
    ]
  }
}
```
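The "low threshold" behavior can be pictured as a score-to-tier cutoff. The cutoff values below are hypothetical, chosen only to illustrate how setting them low keeps roughly 70% of traffic on cheaper models; they are not Gateway's real parameters.

```python
def pick_cost_optimized(score: float) -> str:
    """Map a complexity score in [0, 1] to a tier under a cost-optimized policy.
    Cutoffs are illustrative assumptions: anything scoring below 0.70 stays on
    a cheaper tier, so most mixed-complexity traffic avoids frontier pricing."""
    if score < 0.35:
        return "efficient"
    if score < 0.70:
        return "standard"
    return "frontier"
```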

Balanced

Equal consideration of cost and quality. Complexity scores map linearly to model tiers — roughly a 50/50 split between cheaper and more capable models.

Best for: General production workloads where quality and cost are equally important. Expected savings: 20-35%.

```json
{
  "name": "Balanced Production",
  "default_strategy": {
    "type": "intelligent",
    "axis": "performance",
    "providers": [
      { "provider": "openai", "model": "gpt-5-mini" },
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
      { "provider": "openai", "model": "gpt-5.2" }
    ]
  }
}
```

Quality first

Prioritizes response quality with cost savings as secondary. Most traffic goes to capable models — only clearly simple prompts route to cheaper tiers.

Best for: Enterprise applications, professional/technical use cases, domains where output quality is critical. Expected savings: 10-20%.

```json
{
  "name": "Enterprise Quality",
  "default_strategy": {
    "type": "intelligent",
    "axis": "intelligence",
    "providers": [
      { "provider": "openai", "model": "gpt-5-mini" },
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
      { "provider": "openai", "model": "gpt-5.2" }
    ]
  }
}
```

If the complexity scorer fails for any reason, Gateway falls back to the most capable model in your policy — quality is never compromised by a scorer failure.
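The fallback path can be sketched as follows. This is a hypothetical model of the behavior described above, not Gateway source: the `route` function, the `tier` field on providers, and the linear score-to-provider mapping are all assumptions for illustration.

```python
# Assumed tier ordering, least to most capable.
TIER_RANK = {"basic": 0, "efficient": 1, "standard": 2, "advanced": 3, "frontier": 4}

def route(prompt: str, providers: list[dict], score_complexity) -> dict:
    """Pick a provider for the prompt; on scorer failure, choose the most
    capable model in the policy so quality is never compromised."""
    try:
        score = score_complexity(prompt)  # 0.0 (simple) .. 1.0 (complex)
    except Exception:
        # Scorer failed: fall back to the highest-tier model in the policy.
        return max(providers, key=lambda p: TIER_RANK[p["tier"]])
    # Otherwise map the score linearly onto the tier-sorted provider list.
    ranked = sorted(providers, key=lambda p: TIER_RANK[p["tier"]])
    index = min(int(score * len(ranked)), len(ranked) - 1)
    return ranked[index]
```

The key design point is that the `except` branch selects by capability, not by cost: a broken scorer degrades your savings, never your output quality.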