Whitepaper v0.1 / Self-hosted AI gateway

Route every model through one control plane.

FerroGate is a self-hosted enforcement boundary for AI traffic: one OpenAI-compatible surface in front of providers, keys, policy, budgets, metrics, and TLS.

CONTROL
one surface, many providers
01 / ROUTING

Provider routing without client rewrites.

Applications call one stable endpoint. FerroGate maps logical models to OpenAI, Anthropic, Gemini, Grok, Azure, or any OpenAI-compatible backend.

  • Logical model names stay stable while provider contracts change.
  • Fallback can be priority-based or weighted across providers.
  • Streaming and non-streaming chat completions share one gateway path.
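The priority-based fallback above can be sketched as a small routing table. This is a minimal illustration, not FerroGate's implementation: the logical model name, provider identifiers, and the `call` callback are all assumed for the example.

```python
# Sketch of priority-based provider fallback. Provider names and the
# `call` callback are illustrative assumptions, not FerroGate internals.

ROUTES = {
    # logical model -> provider candidates, ordered by priority
    "chat-default": ["openai:gpt-4o", "anthropic:claude-sonnet", "gemini:flash"],
}

def dispatch(logical_model, call):
    """Try each candidate in priority order; return the first success.

    `call(target)` returns a response, or None on provider failure.
    """
    for target in ROUTES[logical_model]:
        result = call(target)
        if result is not None:
            return target, result
    raise RuntimeError(f"all providers failed for {logical_model}")
```

Because clients only ever see the logical name, the candidate list can be reordered or reweighted without any client rewrite.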
[Diagram: logical model fallback graph (MODEL → P01 / P02 / P03)]
02 / POLICY

Virtual keys become the enforcement layer.

FerroGate turns client keys into scoped identities. Tenants, models, providers, rate limits, and monthly token budgets are checked before provider dispatch.

  • Scopes separate model listing, chat completions, and admin access.
  • Allow and deny rules constrain models and providers per key.
  • Token reservations prevent budget overspend during requests.
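The reservation mechanic in the last bullet can be sketched as a budget that holds an estimate before dispatch and settles the actual count afterward. This is a single-process sketch under assumed field names, not FerroGate's data model.

```python
# Sketch of a monthly token budget with in-flight reservations.
# Field names and the single-process design are illustrative assumptions.

class TokenBudget:
    def __init__(self, monthly_limit):
        self.monthly_limit = monthly_limit
        self.settled = 0   # tokens already counted this month
        self.reserved = 0  # tokens held by in-flight requests

    def reserve(self, estimate):
        """Hold tokens before dispatch; reject if the budget would overrun."""
        if self.settled + self.reserved + estimate > self.monthly_limit:
            return False
        self.reserved += estimate
        return True

    def settle(self, estimate, actual):
        """Release the hold and record actual usage once the response lands."""
        self.reserved -= estimate
        self.settled += actual
```

Reserving before dispatch is what prevents two concurrent requests from each passing a naive "remaining budget" check and jointly overspending.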
[Diagram: identity and budget gate (KEY → SCOPE → MODEL → RATE → BUDGET)]
03 / SIGNALS

Every request leaves an operator trail.

Request logs, usage aggregates, billing events, Prometheus metrics, and OTLP exports let teams operate AI traffic like infrastructure, not application glue.

  • Request IDs and trace IDs cross gateway and provider boundaries.
  • Usage settlement prefers provider-reported token counts and falls back to gateway estimates.
  • Admin endpoints expose status, models, providers, logs, and metrics.
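The settlement rule above (provider-reported tokens first, gateway estimate as fallback) can be sketched as follows. The roughly-4-characters-per-token heuristic is an assumption for illustration only, not FerroGate's actual estimator.

```python
# Sketch of usage settlement: prefer provider-reported counts, otherwise
# estimate gateway-side. The chars//4 heuristic is an illustrative assumption.

def settle_usage(provider_usage, prompt_text, completion_text):
    """Return (prompt_tokens, completion_tokens, source)."""
    if provider_usage is not None:
        return (provider_usage["prompt_tokens"],
                provider_usage["completion_tokens"],
                "provider")
    estimate = lambda text: max(1, len(text) // 4)
    return estimate(prompt_text), estimate(completion_text), "estimated"
```

Tagging each record with its source ("provider" vs "estimated") keeps billing events auditable when the two paths disagree.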
[Diagram: trace IDs and usage settlement (TRACE / USAGE / BILLING)]
04 / DEPLOY

Run the gateway inside your boundary.

Start locally with a Caddyfile-style config, validate every candidate config before it takes effect, then deploy with hashed keys, TLS, metrics, and disciplined reloads.
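Storing keys hashed means a leaked config or database never exposes a usable client key. A minimal sketch, assuming SHA-256 at rest and a constant-time comparison on lookup; FerroGate's actual scheme may differ:

```python
# Sketch of hashed virtual-key storage and checking at the edge.
# SHA-256 and the key format are illustrative assumptions.
import hashlib
import hmac

def hash_key(raw_key):
    """Digest stored at rest instead of the plaintext key."""
    return hashlib.sha256(raw_key.encode()).hexdigest()

def check_key(raw_key, stored_hash):
    """Constant-time compare to avoid leaking prefix matches via timing."""
    return hmac.compare_digest(hash_key(raw_key), stored_hash)
```

On this model, reload discipline follows naturally: a candidate config carries only hashes, so it can be validated and swapped in without plaintext keys ever touching disk.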

[Diagram: validated config and TLS at the self-hosted edge]