Provider routing without client rewrites.
Applications call one stable endpoint. FerroGate maps logical models to OpenAI, Anthropic, Gemini, Grok, Azure, or any OpenAI-compatible backend.
- Logical model names stay stable while provider contracts change.
- Fallback can be priority-based or weighted across providers.
- Streaming and non-streaming chat completions share one gateway path.
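A routing table for this could be sketched in a Caddyfile-style config. The block and directive names below are illustrative assumptions, not FerroGate's documented syntax; only the idea (one logical name, prioritized provider fallbacks) comes from the text above.

```
# Hypothetical syntax: map a stable logical model to providers,
# tried in priority order when the preferred one fails.
model "chat-default" {
    route priority {
        openai    model=gpt-4o-mini
        anthropic model=claude-3-5-haiku
    }
}
```

Clients keep requesting `chat-default`; swapping or reordering providers is a config change, not a client rewrite.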
Virtual keys become the enforcement layer.
FerroGate turns client keys into scoped identities. Tenants, models, providers, rate limits, and monthly token budgets are checked before provider dispatch.
- Scopes separate model listing, chat completions, and admin access.
- Allow and deny rules constrain models and providers per key.
- Token reservations, taken before dispatch, keep concurrent in-flight requests from overspending a budget.
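The reservation idea can be sketched as a small budget guard. This is an assumed shape, not FerroGate's implementation: hold an estimate before dispatch, then settle with the actual token count once the provider responds, so concurrent requests cannot jointly exceed the monthly limit.

```python
import threading


class TokenBudget:
    """Monthly token budget with pre-dispatch reservations (illustrative)."""

    def __init__(self, monthly_limit: int):
        self.limit = monthly_limit
        self.settled = 0    # tokens confirmed by completed requests
        self.reserved = 0   # tokens held by in-flight requests
        self._lock = threading.Lock()

    def reserve(self, estimate: int) -> bool:
        """Hold `estimate` tokens; refuse if the budget would be exceeded."""
        with self._lock:
            if self.settled + self.reserved + estimate > self.limit:
                return False
            self.reserved += estimate
            return True

    def settle(self, estimate: int, actual: int) -> None:
        """Release the reservation and record the real usage."""
        with self._lock:
            self.reserved -= estimate
            self.settled += actual


budget = TokenBudget(monthly_limit=1000)
assert budget.reserve(800)        # in-flight request holds 800 tokens
assert not budget.reserve(300)    # 800 + 300 > 1000, rejected up front
budget.settle(800, actual=650)    # provider reported 650 tokens used
assert budget.reserve(300)        # 650 + 300 <= 1000, accepted
```

The key property is that the check and the hold happen atomically under one lock, so two requests cannot both pass the check against the same remaining budget.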
Every request leaves an operator trail.
Request logs, usage aggregates, billing events, Prometheus metrics, and OTLP exports let teams operate AI traffic like infrastructure, not application glue.
- Request IDs and trace IDs cross gateway and provider boundaries.
- Usage settlement uses provider-reported tokens or gateway estimates.
- Admin endpoints expose status, models, providers, logs, and metrics.
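The settlement rule above (provider-reported tokens when available, gateway estimate otherwise) can be sketched as a single function. The function name and the 4-characters-per-token heuristic are assumptions for illustration, not FerroGate's actual accounting code.

```python
from typing import Optional


def settle_usage(provider_usage: Optional[dict], prompt: str, completion: str) -> int:
    """Return the token count to bill: provider-reported if present,
    else a gateway-side estimate (illustrative logic)."""
    if provider_usage and "total_tokens" in provider_usage:
        # Trust the provider's own usage block when it exists.
        return provider_usage["total_tokens"]
    # Fallback estimate: ~4 characters per token, a common rough heuristic.
    return (len(prompt) + len(completion)) // 4


assert settle_usage({"total_tokens": 42}, "ignored", "ignored") == 42
assert settle_usage(None, "a" * 40, "b" * 40) == 20
```

Keeping both paths behind one function means billing events and usage aggregates see a single settled number regardless of which source produced it.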
Run the gateway inside your boundary.
Start locally with a Caddyfile-style config, validate every candidate config before it goes live, then deploy with hashed keys, TLS, metrics, and reload discipline.
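Hashed-key deployment can be sketched as follows, assuming a SHA-256-at-rest scheme (the scheme and key format here are illustrative, not FerroGate's documented storage format): the plaintext key never sits in config, and verification compares digests in constant time.

```python
import hashlib
import hmac


def hash_key(raw_key: str) -> str:
    """Digest stored in config/database instead of the plaintext key."""
    return hashlib.sha256(raw_key.encode()).hexdigest()


def verify_key(presented: str, stored_hash: str) -> bool:
    """Constant-time comparison to avoid timing side channels."""
    return hmac.compare_digest(hash_key(presented), stored_hash)


stored = hash_key("fg-live-example-key")   # hypothetical key format
assert verify_key("fg-live-example-key", stored)
assert not verify_key("fg-live-wrong-key", stored)
```

A config reload then only ever ships digests, so a leaked config file does not leak usable client keys.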