NOW IN PUBLIC BETA

Every token.
Visible.
Governed.
34–60% cheaper.

The self-hosted HTTP gateway that sits in front of every OpenAI, Anthropic, Google, and Grok call. Enforce budgets, cache prompts, route intelligently, forecast spend, and give your agents real-time cost awareness.

$ npm install @mytechtoday/ai-powered
🧠
🔥
📡

Works with OpenAI • Anthropic • Google • Grok

Live in production at 200+ teams
ai-token-guardian-proxy
LIVE
→ POST /v1/chat/completions GPT-4o • 14,872 tokens
✅ Budget policy passed Cached • 62% saved
Smart router → Claude Haiku –$0.47
Forecast 7d: $1,284 • –41% vs last week
Agent MCP tool called • Budget respected
34–60% savings 🚀
Pre-flight token estimate • 429 blocked
The hidden cost killing AI teams

Unpredictable token bills.
Runaway agents.
No visibility.

340% YoY growth in token spend is silently destroying development budgets. Month-end surprises, no governance, and zero control over which model is being used.

💸

Surprise invoices

You only find out how much you spent after the bill arrives. No forecasting. No caps.

🔥

Runaway agents

Autonomous loops keep burning tokens with no circuit breaker or budget awareness.

🧩

Vendor lock-in

Switching providers means rewriting code. No unified interface. No portability.

THE SOLUTION

One drop-in gateway.
Complete control.

AI Token Guardian is a production-ready, self-hosted HTTP proxy that sits between your code and every LLM provider. It makes every token-spending action visible, governed, and materially cheaper.

  • Pre-flight token estimation & budget enforcement
  • Prompt caching + LLMLingua-2 compression
  • Smart routing to the cheapest capable model
  • Live dashboard, forecasts & Slack/PagerDuty alerts
  • Agent-first MCP tools for autonomous budget management
🛡️

A single proxy URL

Change one baseURL in your OpenAI-compatible SDK and instantly get governance, savings, and observability.

https://your-gateway.mytech.today/v1
Powerful by design

Everything your team needs to master token spend

📏

Policy Engine + Budget Caps

Declarative rules per user, team, or project. Pre-flight checks return HTTP 429 before tokens are spent.

Zod-validated • Multi-scope
📦

Prompt Caching & Compression

Redis-backed caching with LLMLingua-2 compression. Target 40%+ cache-hit rate and massive input discounts on long contexts.

24-hour TTL • Anthropic-native support
🧭

Intelligent Model Router

Routes to the cheapest capable model (Haiku, GPT-4o-mini, etc.) based on task complexity and confidence thresholds.

Zero code changes • Provider agnostic
📊

Real-Time Dashboard & Forecasting

Holt-Winters forecasting for 7- and 30-day spend. Live usage graphs, exports, and instant Slack or PagerDuty alerts.

Prometheus metrics • Chart.js included
🤖

Agent-First MCP Tools

JSON-Schema tools so autonomous agents can discover budgets, check forecasts, and self-regulate spend in real time.

Model Context Protocol ready
🔐

Enterprise-Grade Security

JWT + Argon2id API keys, row-level isolation, multi-tenancy, and full audit logs. Deploy on Vercel, Cloudflare Workers, or Docker.

SOC 2 path • GDPR ready
6 steps • <150ms p95 latency

How AI Token Guardian works

1

Request arrives

Hono.js proxy intercepts OpenAI-compatible call

2

Pre-flight estimation

Token count calculated. Budget policy validated with Zod

3

Cache check

Redis lookup with prompt hash + LLMLingua-2 compression

4

Smart routing

Chooses cheapest model that meets confidence threshold

5

Execute & log

Vercel AI SDK dispatches, usage logged to Postgres

6

Response + actions

Cached result returned. Forecast updated. Alerts fired. MCP tools available to agents.

Built for builders

One-line integration.
Zero friction.

Works with any OpenAI-compatible SDK. Just change the base URL. Deploy anywhere: Vercel, Cloudflare Workers, AWS, or Docker.

BEFORE
No control
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});
↓ change ONE line ↓
AFTER
Full control
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://your-guardian.mytech.today/v1"
});

Supports Anthropic, Google Gemini, Grok, and any OpenAI-compatible endpoint.

Runtime agnostic

Node, Bun, Cloudflare Workers, Vercel Edge Functions — same code everywhere.

📦 Open source core

Built with Hono.js + Drizzle ORM + ioredis. Full source available on GitHub.

🔌 Plugin system

Extend with custom policies, exporters, or MCP tools in minutes.

Real results for real teams

34–60%

Average monthly inference savings

Caching + smart routing + compression deliver immediate ROI

0

Surprise bills

Hard budget caps stop runaway spend before it happens

100%

Provider portability

Swap models or providers instantly — no code changes required

Transparent
Predictable
Secure
Agent-aware
Scalable
Self-hosted
Trusted by AI engineering teams
★★★★★

"Cut our monthly token bill by 47% in the first week. The dashboard alone is worth its weight in gold. Agents now self-regulate spend."

Sarah Chen
Head of AI Infrastructure • ScaleAI
★★★★☆

"Finally, governance for autonomous agents. The MCP tools let our swarm respect budgets without human oversight. Zero runaway loops since day one."

Marcus Rivera
CTO • Autonomous Labs
★★★★★

"Switching between providers is now trivial. The forecasting and alerts saved us from a $12k surprise last month. Best dev tool we’ve adopted this year."

Priya Patel
Engineering Manager • Vector Dynamics

Frequently asked questions

Is AI Token Guardian fully self-hosted?

+

How much engineering time does it take to deploy?

+

Does it support my favorite model provider?

+

Can autonomous agents use it?

+

Is there a free tier?

+

Ready to stop bleeding tokens?

Join the growing community of AI teams who finally have full control over their LLM spend.

Watch 90-second demo →

No credit card required • Open source core • Production-ready in minutes