FerroGate

Whitepaper notes / v0.1

Minimal operating notes for a self-hosted AI gateway.

FerroGate is a protocol-like control surface for AI traffic. The gateway keeps model access, provider routing, key policy, token budgets, telemetry, and TLS under your infrastructure boundary.

00 / ABSTRACT

Gateway Thesis

AI infrastructure should not require every application to know every provider, key format, model name, billing convention, and failure mode. FerroGate centralizes that boundary behind one OpenAI-compatible interface.

PRIMITIVE Virtual key

Client identity, tenant context, scopes, rate limits, and budget.

PRIMITIVE Logical model

Stable public model name mapped onto one or more providers.

PRIMITIVE Settlement event

Token usage, estimate fallback, request log, and billing trace.
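
The three primitives can be pictured as plain records. The sketch below is illustrative Python, not FerroGate's actual types: every field name is an assumption drawn from the descriptions above, and the fallback rule (first target whose provider is not marked unavailable) is one plausible reading of "one or more providers".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualKey:
    """Client identity plus the policy attached to it (hypothetical fields)."""
    key_id: str
    tenant: str
    scopes: frozenset
    request_limit_per_minute: int
    monthly_token_budget: int


@dataclass(frozen=True)
class LogicalModel:
    """Stable public name mapped onto an ordered tuple of provider targets."""
    name: str
    targets: tuple  # e.g. ("openai:gpt-4o-mini",); the first entry is preferred

    def resolve(self, unavailable: frozenset = frozenset()) -> str:
        """Return the first target whose provider is not marked unavailable."""
        for target in self.targets:
            provider, _, _model = target.partition(":")
            if provider not in unavailable:
                return target
        raise LookupError(f"no available provider for {self.name}")


@dataclass(frozen=True)
class SettlementEvent:
    """What gets recorded once a request finishes (hypothetical fields)."""
    request_id: str
    key_id: str
    logical_model: str
    target: str
    tokens: int
    estimated: bool  # True when provider usage was missing and an estimate was used
```

With a second, hypothetical target such as backup:small-chat, calling resolve(frozenset({"openai"})) would skip the preferred provider and return the backup target, while the application keeps asking for fast-chat.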

01 / QUICK START

Run the Gateway

Start the gateway with the default Caddyfile-style config, confirm it is healthy, and list the available models. Chat completions (section 03) go through the same local OpenAI-compatible endpoint.

cargo run -- run --config Ferrogate/Caddyfile
curl http://127.0.0.1:8080/healthz
curl -H 'Authorization: Bearer dev-secret' http://127.0.0.1:8080/v1/models
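
The same calls can be made from application code. This is a minimal Python sketch using only the standard library; it assumes the gateway answers /v1/models with a JSON body, and the helper names build_request and list_models are hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:8080"  # local listener from the quick start


def build_request(path: str, api_key: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the gateway."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        # A JSON body turns the request into a POST.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


def list_models(api_key: str) -> dict:
    """GET /v1/models and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(build_request("/v1/models", api_key), timeout=10) as resp:
        return json.load(resp)
```

Against a running gateway, list_models("dev-secret") mirrors the second curl command above. Passing a body to build_request("/v1/chat/completions", ...) produces the POST used for chat completions.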

02 / CONFIGURATION

Declare Providers, Models, and Keys

Use Caddyfile syntax for a compact operator experience, or TOML for explicit production configuration. Validate every candidate before running or reloading.

:8080 {
    ai_gateway {
        provider openai {
            kind openai-compatible
            base_url https://api.openai.com/v1
            api_key {env.OPENAI_API_KEY}
        }

        model fast-chat -> openai:gpt-4o-mini {
            capabilities chat streaming
        }

        api_key key_dev {
            key {$FERROGATE_DEV_KEY}
            scopes models.read chat.completions admin.read
            allowed_models fast-chat
            request_limit_per_minute 60
            monthly_token_budget 1000000
        }
    }
}

cargo run -- validate --config Ferrogate/Caddyfile
cargo run -- validate --config config/ferrogate.example.toml
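
To make the api_key block concrete, here is a rough Python sketch of how scopes, allowed_models, and request_limit_per_minute could be enforced per request. The sliding 60-second window and the returned reason strings are assumptions; the notes do not say which rate-limiting algorithm or error codes the gateway uses.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KeyPolicy:
    """Mirrors the policy fields of the api_key block in the sample config."""
    scopes: frozenset
    allowed_models: frozenset
    request_limit_per_minute: int
    _recent: deque = field(default_factory=deque)  # timestamps of admitted requests

    def check(self, scope: str, model: str, now: Optional[float] = None) -> str:
        """Return "ok" or a (hypothetical) denial reason for one request."""
        now = time.monotonic() if now is None else now
        if scope not in self.scopes:
            return "missing_scope"
        if model not in self.allowed_models:
            return "model_not_allowed"
        # Forget requests that have left the 60-second window.
        while self._recent and now - self._recent[0] >= 60.0:
            self._recent.popleft()
        if len(self._recent) >= self.request_limit_per_minute:
            return "rate_limited"
        self._recent.append(now)
        return "ok"
```

With the sample values (request_limit_per_minute 60, allowed_models fast-chat), the 61st chat request inside one minute would be denied under this sketch, and a request for any model other than fast-chat would be rejected before the rate limit is consulted.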

03 / API USAGE

Call Through the Control Plane

Applications send standard chat-completion calls. FerroGate authenticates, checks policy, reserves token budget, dispatches to the provider, and settles usage.

curl -X POST http://127.0.0.1:8080/v1/chat/completions \
  -H 'Authorization: Bearer dev-secret' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "fast-chat",
    "messages": [{"role": "user", "content": "hello"}]
  }'

Keep application code stable while moving provider routing, fallback, and budget enforcement into the gateway.
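
The reserve-then-settle step in the request flow can be sketched as follows. This is an illustrative Python model, not FerroGate's implementation: the class and method names are hypothetical, and it assumes the estimate is charged in full when the provider reports no usage (the "estimate fallback" named under the settlement primitive).

```python
from dataclasses import dataclass
from typing import Optional


class BudgetExceeded(Exception):
    """Raised when a reservation would push the key past its monthly budget."""


@dataclass
class TokenBudget:
    monthly_limit: int
    settled: int = 0   # tokens confirmed by finished requests
    reserved: int = 0  # tokens held for in-flight requests

    def reserve(self, estimate: int) -> int:
        """Hold an estimated token count before dispatching to the provider."""
        if self.settled + self.reserved + estimate > self.monthly_limit:
            raise BudgetExceeded(f"{estimate} tokens would exceed {self.monthly_limit}")
        self.reserved += estimate
        return estimate

    def settle(self, reservation: int, reported_usage: Optional[int]) -> int:
        """Release the hold and charge actual usage, or the estimate if none was reported."""
        self.reserved -= reservation
        actual = reservation if reported_usage is None else reported_usage
        self.settled += actual
        return actual
```

Reserving before dispatch keeps concurrent in-flight requests from jointly overshooting the budget. Settling afterwards returns any unused part of the hold.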

04 / OPERATIONS

Observe and Control Runtime State

  • Use /healthz for gateway readiness checks.
  • Use /admin for the built-in operator dashboard.
  • Use /metrics for Prometheus-compatible telemetry.
  • Use request IDs and trace IDs to correlate gateway requests with provider behavior.
  • Use manual TLS, ACME HTTP-01, or ACME DNS-01 for public listeners.
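
A deploy script can gate traffic on the readiness endpoint listed above. The Python sketch below assumes /healthz answers with a 2xx status once the gateway is ready; the notes do not specify the status code or body. The function names are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx HTTP error.
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 10,
                     delay: float = 0.5, sleep=time.sleep) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For the local listener this would be called as wait_until_ready(lambda: http_probe("http://127.0.0.1:8080/healthz")). The probe is injected so the retry loop can be exercised without a running gateway.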

05 / DEPLOYMENT CHECKLIST

Deployment Checklist

  1. Move provider secrets into environment variables or a secret store.
  2. Generate hashed client keys with ferrogate hash-key.
  3. Define tenant scopes, model allowlists, provider allowlists, and budgets.
  4. Validate Caddyfile and TOML candidates in CI.
  5. Export metrics and logs before routing production traffic.
  6. Publish the static whitepaper site from ferrogate-homepage.
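
Item 4 can be wired into CI with a small wrapper around the validate commands shown in section 02. This Python sketch assumes validate exits non-zero for an invalid config, which the notes do not state explicitly. The helper names are hypothetical.

```python
import subprocess
import sys

# Config candidates taken from the validate commands in section 02.
CANDIDATES = ["Ferrogate/Caddyfile", "config/ferrogate.example.toml"]


def validate_command(config_path: str) -> list:
    """Build the validate invocation for one config file."""
    return ["cargo", "run", "--", "validate", "--config", config_path]


def validate_all(paths, run=subprocess.run) -> list:
    """Run validation for each path and return those that exited non-zero."""
    failed = []
    for path in paths:
        result = run(validate_command(path))
        if result.returncode != 0:
            failed.append(path)
    return failed


def main() -> int:
    """Return a process exit code: 1 if any candidate failed, else 0."""
    failures = validate_all(CANDIDATES)
    for path in failures:
        print(f"invalid config: {path}", file=sys.stderr)
    return 1 if failures else 0
```

A CI job would end the script with sys.exit(main()), so a failing candidate blocks the pipeline before a run or reload.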