Whitepaper v0.1 / Self-hosted AI gateway

Route every model through one control plane.

FerroGate is a self-hosted enforcement boundary for AI traffic: one OpenAI-compatible surface in front of providers, keys, policy, budgets, metrics, and TLS.

CONTROL
one surface, many providers
01 / ROUTING

Provider routing without client rewrites.

Applications call one stable endpoint. FerroGate maps logical models to OpenAI, Anthropic, Gemini, Grok, Azure, or any OpenAI-compatible backend.

  • Logical model names stay stable while provider contracts change.
  • Fallback can be priority-based or weighted across providers.
  • Streaming and non-streaming chat completions share one gateway path.
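The priority-based fallback above can be sketched as a small routing table. This is a minimal illustration, not FerroGate's implementation: the logical model name, provider identifiers, and the `call` callback are all assumed for the example.

```python
# Sketch of priority-based provider fallback. Provider names and the
# `call` callback are illustrative assumptions, not FerroGate internals.

ROUTES = {
    # logical model -> provider candidates, ordered by priority
    "chat-default": ["openai:gpt-4o", "anthropic:claude-sonnet", "gemini:flash"],
}

def dispatch(logical_model, call):
    """Try each candidate in priority order; return the first success.

    `call(target)` returns a response, or None on provider failure.
    """
    for target in ROUTES[logical_model]:
        result = call(target)
        if result is not None:
            return target, result
    raise RuntimeError(f"all providers failed for {logical_model}")
```

Because clients only ever see the logical name, the candidate list can be reordered or reweighted without any client rewrite.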
[Diagram: logical model fallback graph (MODEL → P01 / P02 / P03)]
02 / POLICY

Virtual keys become the enforcement layer.

FerroGate turns client keys into scoped identities. Tenants, models, providers, rate limits, and monthly token budgets are checked before provider dispatch.

  • Scopes separate model listing, chat completions, and admin access.
  • Allow and deny rules constrain models and providers per key.
  • Token reservations prevent budget overspend during requests.
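The reservation mechanic in the last bullet can be sketched as a budget that holds an estimate before dispatch and settles the actual count afterward. This is a single-process sketch under assumed field names, not FerroGate's data model.

```python
# Sketch of a monthly token budget with in-flight reservations.
# Field names and the single-process design are illustrative assumptions.

class TokenBudget:
    def __init__(self, monthly_limit):
        self.monthly_limit = monthly_limit
        self.settled = 0   # tokens already counted this month
        self.reserved = 0  # tokens held by in-flight requests

    def reserve(self, estimate):
        """Hold tokens before dispatch; reject if the budget would overrun."""
        if self.settled + self.reserved + estimate > self.monthly_limit:
            return False
        self.reserved += estimate
        return True

    def settle(self, estimate, actual):
        """Release the hold and record actual usage once the response lands."""
        self.reserved -= estimate
        self.settled += actual
```

Reserving before dispatch is what prevents two concurrent requests from each passing a naive "remaining budget" check and jointly overspending.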
[Diagram: identity and budget gate (KEY → SCOPE → MODEL → RATE → BUDGET)]
03 / SIGNALS

Every request leaves an operator trail.

Request logs, usage aggregates, billing events, Prometheus metrics, and OTLP exports let teams operate AI traffic like infrastructure, not application glue.

  • Request IDs and trace IDs cross gateway and provider boundaries.
  • Usage settlement prefers provider-reported token counts and falls back to gateway estimates.
  • Admin endpoints expose status, models, providers, logs, and metrics.
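The settlement rule above (provider-reported tokens first, gateway estimate as fallback) can be sketched as follows. The roughly-4-characters-per-token heuristic is an assumption for illustration only, not FerroGate's actual estimator.

```python
# Sketch of usage settlement: prefer provider-reported counts, otherwise
# estimate gateway-side. The chars//4 heuristic is an illustrative assumption.

def settle_usage(provider_usage, prompt_text, completion_text):
    """Return (prompt_tokens, completion_tokens, source)."""
    if provider_usage is not None:
        return (provider_usage["prompt_tokens"],
                provider_usage["completion_tokens"],
                "provider")
    estimate = lambda text: max(1, len(text) // 4)
    return estimate(prompt_text), estimate(completion_text), "estimated"
```

Tagging each record with its source ("provider" vs "estimated") keeps billing events auditable when the two paths disagree.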
[Diagram: trace IDs and usage settlement (TRACE / USAGE / BILLING)]
04 / DEPLOY

Run the gateway inside your boundary.

Start locally with a Caddyfile-style config, validate every candidate config before it takes effect, then deploy with hashed keys, TLS, metrics, and disciplined reloads.
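Storing keys hashed means a leaked config or database never exposes a usable client key. A minimal sketch, assuming SHA-256 at rest and a constant-time comparison on lookup; FerroGate's actual scheme may differ:

```python
# Sketch of hashed virtual-key storage and checking at the edge.
# SHA-256 and the key format are illustrative assumptions.
import hashlib
import hmac

def hash_key(raw_key):
    """Digest stored at rest instead of the plaintext key."""
    return hashlib.sha256(raw_key.encode()).hexdigest()

def check_key(raw_key, stored_hash):
    """Constant-time compare to avoid leaking prefix matches via timing."""
    return hmac.compare_digest(hash_key(raw_key), stored_hash)
```

On this model, reload discipline follows naturally: a candidate config carries only hashes, so it can be validated and swapped in without plaintext keys ever touching disk.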

[Diagram: validated config and TLS at the self-hosted edge]