The self-hosted HTTP gateway that sits in front of every OpenAI, Anthropic, Google, and Grok call. Enforce budgets, cache prompts, route intelligently, forecast spend, and give your agents real-time cost awareness.
Works with OpenAI • Anthropic • Google • Grok
340% YoY growth in token spend is silently destroying development budgets. Month-end surprises, no governance, and zero control over which model is being used.
You only find out how much you spent after the bill arrives. No forecasting. No caps.
Autonomous loops keep burning tokens with no circuit breaker or budget awareness.
Switching providers means rewriting code. No unified interface. No portability.
AI Token Guardian is a production-ready, self-hosted HTTP proxy that sits between your code and every LLM provider. It makes every token-spending action visible, governed, and materially cheaper.
Change one baseURL in your OpenAI-compatible SDK and instantly get governance, savings, and observability.
Declarative rules per user, team, or project. Pre-flight checks return HTTP 429 before tokens are spent.
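A minimal sketch of how a pre-flight check like this could work. The `BudgetPolicy` shape and the `estimateCostUsd` and `preflight` helpers are hypothetical names for illustration, not taken from the project's source. The idea is to compare the estimated cost of a request with the remaining budget and return 429 before any provider is called.

```typescript
// Hypothetical policy shape; the real schema may differ.
interface BudgetPolicy {
  scope: "user" | "team" | "project";
  id: string;
  monthlyLimitUsd: number;
}

interface PreflightResult {
  allowed: boolean;
  status: 200 | 429;
  remainingUsd: number;
}

// Estimate the worst-case cost of a request from token counts and
// per-million-token prices (prices are supplied by the caller).
function estimateCostUsd(
  inputTokens: number,
  maxOutputTokens: number,
  inputPricePerMTok: number,
  outputPricePerMTok: number,
): number {
  return (inputTokens * inputPricePerMTok + maxOutputTokens * outputPricePerMTok) / 1_000_000;
}

// Reject the request before any tokens are spent if it would exceed the limit.
function preflight(
  policy: BudgetPolicy,
  spentUsd: number,
  estimatedCostUsd: number,
): PreflightResult {
  const remainingUsd = Math.max(0, policy.monthlyLimitUsd - spentUsd);
  const allowed = estimatedCostUsd <= remainingUsd;
  return { allowed, status: allowed ? 200 : 429, remainingUsd };
}
```

Estimating against the maximum output length is a conservative choice: it can reject a request that would have fit, but it never lets one overshoot the cap.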
Redis-backed caching with LLMLingua-2 compression. Targets a cache-hit rate above 40% and lower input-token costs on long contexts.
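As a rough illustration of prompt-hash caching, the sketch below derives a deterministic key from the model and prompt and looks it up before calling the provider. An in-memory `Map` stands in for Redis so the example runs without a server. The `cacheKey` and `cachedCompletion` names and the whitespace normalization are assumptions, and LLMLingua-2 compression is not shown.

```typescript
import { createHash } from "node:crypto";

// In-memory Map standing in for Redis so the sketch runs without a server.
const cache = new Map<string, string>();

// Build a deterministic key from the model and a whitespace-normalized prompt.
// Normalizing whitespace is an assumption; skip it if whitespace is significant.
function cacheKey(model: string, prompt: string): string {
  const normalized = prompt.trim().replace(/\s+/g, " ");
  return createHash("sha256").update(`${model}\n${normalized}`).digest("hex");
}

// Return the cached completion if present; otherwise call the provider and store the result.
async function cachedCompletion(
  model: string,
  prompt: string,
  callProvider: (model: string, prompt: string) => Promise<string>,
): Promise<{ text: string; hit: boolean }> {
  const key = cacheKey(model, prompt);
  const existing = cache.get(key);
  if (existing !== undefined) return { text: existing, hit: true };
  const text = await callProvider(model, prompt);
  cache.set(key, text);
  return { text, hit: false };
}
```

Including the model in the key keeps responses from different models from colliding under the same prompt.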
Routes to the cheapest capable model (Haiku, GPT-4o-mini, etc.) based on task complexity and confidence thresholds.
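One way to picture this routing rule is the sketch below: filter a model catalog by a capability threshold, then take the cheapest survivor. The `ModelOption` shape, the `routeCheapest` helper, and the catalog entries (names, prices, scores) are all illustrative assumptions. How the gateway actually scores task complexity is not described here.

```typescript
// Hypothetical model catalog entry; names, prices, and scores are illustrative only.
interface ModelOption {
  name: string;
  costPerMTokUsd: number; // blended price per million tokens
  capability: number; // 0..1, higher means more capable
}

// Pick the cheapest model whose capability meets the required threshold.
// Returns undefined when no model qualifies so the caller can choose a fallback.
function routeCheapest(
  models: ModelOption[],
  requiredCapability: number,
): ModelOption | undefined {
  return models
    .filter((m) => m.capability >= requiredCapability) // filter() copies, so the input is not mutated
    .sort((a, b) => a.costPerMTokUsd - b.costPerMTokUsd)[0];
}

const catalog: ModelOption[] = [
  { name: "small-model", costPerMTokUsd: 0.5, capability: 0.6 },
  { name: "medium-model", costPerMTokUsd: 3, capability: 0.8 },
  { name: "large-model", costPerMTokUsd: 15, capability: 0.95 },
];
```

With this catalog, a threshold of 0.7 skips the small model and selects the medium one, since it is the cheapest option that clears the bar.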
Holt-Winters forecasting for 7- and 30-day spend. Live usage graphs, exports, and instant Slack or PagerDuty alerts.
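To give a feel for the forecasting step, here is a simplified sketch using Holt's linear-trend smoothing, which is the level-and-trend part of Holt-Winters with the seasonal component left out for brevity. The `holtForecast` name and the default smoothing parameters are assumptions, not the gateway's actual settings.

```typescript
// Holt's linear-trend (double exponential) smoothing.
// This is Holt-Winters without the seasonal term, kept short for illustration.
function holtForecast(
  series: number[],
  horizon: number,
  alpha = 0.5, // level smoothing factor, 0..1
  beta = 0.3, // trend smoothing factor, 0..1
): number[] {
  if (series.length < 2) throw new Error("need at least two observations");
  let level = series[0];
  let trend = series[1] - series[0];
  for (let t = 1; t < series.length; t++) {
    const prevLevel = level;
    // Blend the new observation with the previous one-step-ahead prediction.
    level = alpha * series[t] + (1 - alpha) * (prevLevel + trend);
    // Blend the newly observed slope with the previous trend estimate.
    trend = beta * (level - prevLevel) + (1 - beta) * trend;
  }
  // The h-step-ahead forecast is level + h * trend.
  return Array.from({ length: horizon }, (_, i) => level + (i + 1) * trend);
}
```

Calling `holtForecast(dailySpendUsd, 7)` on a hypothetical array of daily spend figures returns seven projected daily values; summing them gives a 7-day spend estimate. A full Holt-Winters model would add a seasonal term, for example a weekly cycle.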
JSON-Schema tools so autonomous agents can discover budgets, check forecasts, and self-regulate spend in real time.
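For illustration, a budget tool exposed to agents might look like the sketch below. It uses the `name`, `description`, and `inputSchema` fields that MCP tool listings carry, but the `check_budget` tool, its parameters, and the `handleCheckBudget` handler are hypothetical and not taken from the project.

```typescript
// Hypothetical tool descriptor in the shape MCP uses for tool listings.
// The tool name and its parameters are illustrative only.
const checkBudgetTool = {
  name: "check_budget",
  description: "Return the remaining budget for a project and whether a planned spend fits.",
  inputSchema: {
    type: "object",
    properties: {
      projectId: { type: "string" },
      plannedSpendUsd: { type: "number", minimum: 0 },
    },
    required: ["projectId", "plannedSpendUsd"],
  },
} as const;

// Hypothetical handler: an agent would call this before starting an expensive step.
function handleCheckBudget(
  args: { projectId: string; plannedSpendUsd: number },
  remainingByProject: Record<string, number>,
): { remainingUsd: number; withinBudget: boolean } {
  // Unknown projects are treated as having no budget.
  const remainingUsd = remainingByProject[args.projectId] ?? 0;
  return { remainingUsd, withinBudget: args.plannedSpendUsd <= remainingUsd };
}
```

An agent that receives `withinBudget: false` can shrink its plan or stop, rather than discovering the limit through a rejected request.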
JWT authentication, Argon2id-hashed API keys, row-level isolation, multi-tenancy, and full audit logs. Deploy on Vercel, Cloudflare Workers, or Docker.
Hono.js proxy intercepts OpenAI-compatible call
Token count calculated. Budget policy validated with Zod
Redis lookup with prompt hash + LLMLingua-2 compression
Chooses cheapest model that meets confidence threshold
Vercel AI SDK dispatches the request; usage is logged to Postgres
Cached result returned. Forecast updated. Alerts fired. MCP tools available to agents.
Works with any OpenAI-compatible SDK. Just change the base URL. Deploy anywhere: Vercel, Cloudflare Workers, AWS, or Docker.
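import OpenAI from "openai";

// Before: requests go directly to the provider.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// After: the same client, pointed at your gateway.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://your-guardian.mytech.today/v1",
});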
Supports Anthropic, Google Gemini, Grok, and any OpenAI-compatible endpoint.
Node, Bun, Cloudflare Workers, Vercel Edge Functions — same code everywhere.
Built with Hono.js + Drizzle ORM + ioredis. Full source available on GitHub.
Extend with custom policies, exporters, or MCP tools in minutes.
Average monthly inference savings
Caching + smart routing + compression deliver immediate ROI
Surprise bills
Hard budget caps stop runaway spend before it happens
Provider portability
Swap models or providers instantly — no code changes required
"Cut our monthly token bill by 47% in the first week. The dashboard alone is worth its weight in gold. Agents now self-regulate spend."
"Finally, governance for autonomous agents. The MCP tools let our swarm respect budgets without human oversight. Zero runaway loops since day one."
"Switching between providers is now trivial. The forecasting and alerts saved us from a $12k surprise last month. Best dev tool we’ve adopted this year."
Is AI Token Guardian fully self-hosted?
How much engineering time does it take to deploy?
Does it support my favorite model provider?
Can autonomous agents use it?
Is there a free tier?
Join the growing community of AI teams who finally have full control over their LLM spend.
No credit card required • Open source core • Production-ready in minutes