LLM Cost Calculator — Compare API Pricing Across Providers
🔒 Runs in your browser — nothing is sent to a serverLLM cost calculator that compares OpenAI GPT-5.5, Anthropic Claude Opus 4.7, Google Gemini 2.5 Pro, Meta Llama 3.3 and Mistral Large side-by-side for the exact token budget you plan to use. Type input and output tokens per request, pick any subset of models, and the table updates with per-request, daily and monthly cost — sorted from cheapest to priciest. Toggle on cached-input share or the asynchronous Batch tier to model how your bill changes with caching or non-urgent jobs. Everything runs in your browser; no API keys, no signup, nothing logged.
| Model | Input | Output | Per request | Per day | Per month | vs cheapest |
|---|---|---|---|---|---|---|
Gemini 2.5 Flash Google · $0.3/M in · $2.5/M out | $0.000600 | $0.001250 | $0.001850 | $1.85 | $55.50 | cheapest |
GPT-5.4 Mini OpenAI · $0.75/M in · $4.5/M out | $0.001500 | $0.002250 | $0.003750 | $3.75 | $112.50 | 2.03× |
Claude Haiku 4.5 Anthropic · $1/M in · $5/M out | $0.002000 | $0.002500 | $0.004500 | $4.50 | $135.00 | 2.43× |
Claude Sonnet 4.6 Anthropic · $3/M in · $15/M out | $0.006000 | $0.007500 | $0.0135 | $13.50 | $405.00 | 7.30× |
GPT-5.5 OpenAI · $5/M in · $30/M out | $0.0100 | $0.0150 | $0.0250 | $25.00 | $750.00 | 13.51× |
Pick the right model for the right workload
There is no universally cheapest LLM. For high-volume customer-support style replies (short input, short output, accuracy-tolerant), Gemini Flash-Lite or Claude Haiku 4.5 typically win. For RAG over moderate context (10–50k input, ~1k output), the long-context discounts on Gemini 2.5 Pro and Claude Sonnet 4.6 often beat the OpenAI flagship. For agentic tool use with structured outputs, GPT-5.5 and Claude Opus 4.7 trade leads depending on the benchmark. Self-hosted Llama 3.3 70B on Together or Anyscale is often 3–10× cheaper at the same workload but with infrastructure overhead. Run this calculator with your real traffic numbers before locking a contract.
Where the savings actually come from
Three levers dominate any LLM bill: (1) shortening output — tightening the system prompt to demand brevity drops cost more than swapping models; (2) caching static prefixes — a long system prompt re-used across 1M requests pays for caching after the second hit; (3) routing easy queries to a cheaper model — a small classifier deciding between Haiku-tier and Opus-tier captures a 10× spread without quality loss on the easy 80%. Model swapping alone is at best a 2–3× win on a homogenous workload; combined with these three, real production deployments see 5–20× cost reduction without quality drop.
Examples
Compare Claude Haiku 4.5, Gemini Flash, GPT-5.4 MiniGemini Flash ≈ $0.0019/req → $570/mo. Claude Haiku 4.5 ≈ $0.0045/req → $1,350/mo. GPT-5.4 Mini ≈ $0.0037/req → $1,110/mo.Compare GPT-5.5, Claude Sonnet 4.6, Gemini 2.5 ProGemini 2.5 Pro ≈ $0.071/req → $213/mo. Claude Sonnet 4.6 ≈ $0.162/req → $486/mo. GPT-5.5 ≈ $0.274/req → $822/mo.Compare GPT-5.4, Mistral Medium, Claude Haiku 4.5At Batch (50% off), Mistral Medium ≈ $0.0018/req → $270/mo. Claude Haiku 4.5 Batch ≈ $0.0049/req → $735/mo.