Cost governance and savings
Gateway gives you a single place to manage AI costs across every provider. Set budgets at the project level, track spend in a unified dashboard, and use intelligent routing and context compression to automatically reduce costs — without changing application code.
Unified billing
Gateway aggregates spend from all providers into one dashboard. At a glance you can see:
- Total spend across all providers and models
- Managed spend and BYOK (bring-your-own-key) spend, tracked separately
- API call count with breakdowns
- Breakdowns by model, by project, and by tag via dashboard tabs
Use Projects to segment spend by team, product, or environment. Use tags for finer-grained tracking within a project.
Project budgets
Budgets are configured per project. Each budget has three components:
- Amount — dollar limit in USD
- Period — daily, weekly, monthly, quarterly, or yearly
- Enforcement mode — what happens at the limit
Alert thresholds
Alert thresholds are configurable percentages of the budget amount (defaults: 50%, 80%, 90%). Each threshold triggers a notification when spend crosses it.
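The threshold behavior can be pictured with a small sketch. This is an illustration only, not Gateway's implementation: the function name `crossed_thresholds` and its parameters are hypothetical, and it assumes a threshold counts as crossed when spend moves from below it to at or above it.

```python
from typing import Iterable, List

# Default percentages taken from the text above.
DEFAULT_THRESHOLDS = (50, 80, 90)


def crossed_thresholds(
    previous_spend: float,
    current_spend: float,
    budget_amount: float,
    thresholds: Iterable[float] = DEFAULT_THRESHOLDS,
) -> List[float]:
    """Return the thresholds (in percent) crossed by the latest spend update.

    Hypothetical helper: a threshold is treated as crossed when spend was
    below it before the update and is at or above it afterwards.
    """
    if budget_amount <= 0:
        raise ValueError("budget_amount must be positive")
    crossed = []
    for pct in sorted(thresholds):
        limit = budget_amount * pct / 100  # threshold expressed in USD
        if previous_spend < limit <= current_spend:
            crossed.append(pct)
    return crossed
```

For a $100 budget, a jump in spend from $40 to $85 would cross the 50% and 80% thresholds in a single update, so two notifications would be expected under this model.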
How budget enforcement works
Gateway sums daily spend for the current budget period and compares it to the budget amount. When a hard limit is reached, all requests to that project return HTTP 402 until the next period starts or the budget is increased. Soft limits send alerts but never block requests.
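As a rough sketch of that decision, assuming the two enforcement modes are labeled "hard" and "soft" (the `Budget` class and `enforce` function below are hypothetical names, not part of Gateway's API):

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Budget:
    amount_usd: float
    enforcement: str  # "hard" or "soft": hypothetical labels for the two modes


def enforce(daily_spend_usd: Sequence[float], budget: Budget) -> Optional[int]:
    """Return 402 if the request would be blocked, or None if it passes through.

    Mirrors the description above: sum the daily spend for the current
    period, compare it to the budget amount, and block only in hard mode.
    """
    period_spend = sum(daily_spend_usd)
    if budget.enforcement == "hard" and period_spend >= budget.amount_usd:
        return 402  # HTTP 402 until the period resets or the budget is raised
    return None  # soft limits alert elsewhere but never block
```

With a $100 hard budget and daily spend of $40 and $60 in the current period, `enforce` returns 402. The same spend under a soft budget returns `None`.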
Budget progress in the dashboard
The “By Project” tab shows a color-coded progress bar for each project: blue (under 80%), yellow (80% to under 100%), red (100% or more). Spend is displayed as $X.XX / $Y.YY.
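The color bands and the spend label can be expressed as a short sketch. The function names are hypothetical, and the boundary handling is an assumption: exactly 80% is treated as yellow and exactly 100% as red.

```python
def progress_color(spend_usd: float, budget_usd: float) -> str:
    """Map budget usage to a progress-bar color (assumed boundary handling)."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    if spend_usd >= budget_usd:
        return "red"  # 100% or more of the budget
    if spend_usd >= 0.8 * budget_usd:
        return "yellow"  # at least 80% but under 100%
    return "blue"  # under 80%


def progress_label(spend_usd: float, budget_usd: float) -> str:
    """Format spend in the $X.XX / $Y.YY style shown in the dashboard."""
    return f"${spend_usd:.2f} / ${budget_usd:.2f}"
```

For example, `progress_label(12.5, 100)` yields `$12.50 / $100.00`, and `progress_color(12.5, 100)` yields `blue`.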
How routing policies save money
Routing directs traffic to lower-cost providers and models without compromising quality. The savings depend on the strategy:
- Cost Optimized (Intelligent) — ML complexity scoring routes ~70% of traffic to cheaper models. Expected savings: 40–60%. See Intelligent Routing for details.
- Balanced (Intelligent) — even cost/quality split. Expected savings: 20–35%. See Intelligent Routing for details.
- Lowest Cost (Performance) — always picks the cheapest provider for the requested model. See Performance for details.
- Tag-based routing — route different request types to different cost tiers. For example, internal requests → cost-optimized policy, customer-facing requests → quality-first policy. See Routing Policies for configuration.
Routing policies can be set at the org level or overridden per project.
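One way to picture how these layers might combine is the sketch below. It is not Gateway's resolution logic: the policy names, the `TAG_POLICIES` mapping, and the precedence order (tag rule first, then project override, then org default) are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional

# Hypothetical tag rules matching the example in the list above.
TAG_POLICIES: Dict[str, str] = {
    "internal": "cost-optimized",
    "customer-facing": "quality-first",
}


def policy_for_request(
    tags: Iterable[str],
    org_policy: str,
    project_policy: Optional[str] = None,
    tag_policies: Dict[str, str] = TAG_POLICIES,
) -> str:
    """Pick a routing policy for one request.

    Assumed precedence: first matching tag rule, then the project-level
    override, then the org-level default.
    """
    for tag in tags:
        if tag in tag_policies:
            return tag_policies[tag]
    return project_policy if project_policy is not None else org_policy
```

Under these assumptions, a request tagged `internal` resolves to `cost-optimized`, while an untagged request in a project with no override falls back to the org-level policy.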
How context compression saves money
Compression reduces tokens sent to providers, directly lowering per-request costs. Gateway applies two techniques:
- Lossless compression — minifies JSON in tool schemas, arguments, and results. Achieves 30–60% reduction on JSON-heavy requests with zero quality impact.
- Message trimming — removes middle messages from long conversations, preserving system messages and the most recent messages.
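Both techniques can be approximated in a few lines. This is a minimal sketch under stated assumptions, not Gateway's compressor: the function names are hypothetical, messages are assumed to be dicts with a `role` key, and trimming is simplified to "keep every system message plus the last N other messages".

```python
import json
from typing import Dict, List


def minify_json(text: str) -> str:
    """Re-serialize JSON without insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def trim_middle(messages: List[Dict], keep_recent: int) -> List[Dict]:
    """Drop older non-system messages, keeping original order.

    Keeps all system messages and the `keep_recent` most recent
    non-system messages (a simplified stand-in for message trimming).
    """
    non_system = [i for i, m in enumerate(messages) if m.get("role") != "system"]
    keep = set(non_system[-keep_recent:]) if keep_recent > 0 else set()
    return [
        m for i, m in enumerate(messages)
        if m.get("role") == "system" or i in keep
    ]
```

Minifying only removes whitespace between tokens, which is why it has no effect on the meaning of the payload. Trimming, by contrast, discards content, so the number of recent messages to keep is a quality trade-off.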
Two modes are available for cost savings:
- Cost Optimization — proactively compresses at a target ratio (default 70%). Reduces costs even when the request fits within the context window.
- Context Window Only — compresses only to prevent context window errors. Acts as a safety net, not proactive savings.
Combine Cost Optimization compression with Cost Optimized routing for maximum savings. The two features are independent and stack. See Context Compression for configuration.
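Because the two features act on different factors (tokens sent versus price per token), their savings multiply rather than add. The sketch below shows the arithmetic under the simplifying assumption that cost scales linearly with token count. The function name and the percentages in the example are illustrative, not measured results.

```python
def combined_savings(*fractions: float) -> float:
    """Combine independent savings fractions (0.3 means 30% saved).

    Each feature leaves (1 - s) of the cost it receives, so the
    remaining cost is the product of those factors.
    """
    remaining = 1.0
    for s in fractions:
        if not 0.0 <= s <= 1.0:
            raise ValueError("each savings fraction must be between 0 and 1")
        remaining *= 1.0 - s
    return 1.0 - remaining
```

For instance, if compression removed 30% of tokens and routing cut the per-token price by 40%, the remaining cost would be 0.7 × 0.6 = 0.42 of the original, or about 58% saved rather than 70%.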
Org-level spend controls
Org-level guardrails sit above project budgets and protect the organization as a whole:
- Free tier default — $15 spend cap for new organizations
- Spend caps — configurable org-level caps for Pro/Enterprise tiers
- Velocity alerts — detects unusual spend spikes (10x daily average)
- Payment failure handling — dunning process with notifications and eventual access restriction
Org-level controls are managed from organization billing settings, separate from project-level budgets.
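The velocity alert in the list above can be illustrated with a simple check. How Gateway computes the daily average is not specified here, so this sketch assumes a plain mean over a trailing window of daily totals. The function name and parameters are hypothetical.

```python
from statistics import mean
from typing import Sequence


def is_velocity_spike(
    daily_history_usd: Sequence[float],
    today_usd: float,
    multiplier: float = 10.0,
) -> bool:
    """Flag today's spend if it reaches `multiplier` times the daily average.

    Assumption: the baseline is the mean of prior daily totals. With no
    history, or a zero baseline, no spike is reported.
    """
    if not daily_history_usd:
        return False
    baseline = mean(daily_history_usd)
    if baseline <= 0:
        return False
    return today_usd >= multiplier * baseline
```

With prior daily totals of $10, $12, and $8 (a $10 average), a day reaching $100 would be flagged under this model, while $99 would not.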
FAQ
How is spend calculated?
Spend is tracked per request based on provider per-token pricing. Managed and BYOK spend are tracked separately. Totals are updated atomically after each request completes.
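A per-request cost calculation might look like the following sketch. It assumes separate input and output prices quoted per million tokens, which is a common provider convention but not confirmed here. The function name and the prices in the example are hypothetical.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: Decimal,
    output_price_per_million: Decimal,
) -> Decimal:
    """Cost of one request in USD from per-token pricing.

    Decimal avoids the rounding drift that floats accumulate when many
    small per-request amounts are summed into a running total.
    """
    return (
        Decimal(input_tokens) * input_price_per_million
        + Decimal(output_tokens) * output_price_per_million
    ) / MILLION
```

At illustrative prices of $3.00 and $15.00 per million input and output tokens, a request with 1,000 input tokens and 500 output tokens would cost $0.0105.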
Can I set a budget without a project?
No. Budgets are per-project. Create a project first, then configure its budget.
What happens when a hard limit is reached?
All requests to that project return HTTP 402 until the next budget period starts or the budget amount is increased.
Do routing policies and compression interact?
They are independent. Compression reduces the number of tokens in a request; routing picks which provider handles it. Using both compounds the savings.
How do I see savings from routing or compression?
The spend dashboard shows actual spend. To estimate savings, compare that figure with what the same traffic would have cost at single-provider list pricing. Intelligent routing strategies also include expected savings estimates in the dashboard.
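The comparison against a baseline is simple arithmetic. In the sketch below, `baseline_spend_usd` is a figure you compute yourself from list prices. It is not a value Gateway is stated to provide, and the function name is hypothetical.

```python
def estimated_savings(actual_spend_usd: float, baseline_spend_usd: float) -> float:
    """Fraction saved relative to a self-computed list-price baseline."""
    if baseline_spend_usd <= 0:
        raise ValueError("baseline_spend_usd must be positive")
    return (baseline_spend_usd - actual_spend_usd) / baseline_spend_usd
```

If the dashboard shows $60 of actual spend and the same traffic would have cost $100 at list pricing, the estimate is 0.4, or 40% saved.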