Prompt Caching Savings Calculator
๐ Runs in your browser โ nothing is sent to a serverPrompt caching savings calculator for Claude, GPT and Gemini. Feed in the size of the cacheable prefix (system prompt, RAG context, few-shot demos), the size of the per-request dynamic input, the cache hit rate you expect, and the daily request volume. The calculator returns per-request, per-day, per-month and per-year savings against the uncached baseline. Cache reads are billed at 10% of standard on Anthropic, 50% on OpenAI, 25% on Google โ this calculator picks the right rate automatically for each model. Useful when deciding whether the engineering effort of structuring a cache-friendly prompt pays off at your traffic volume.
At a 80% cache hit rate, 20.0K static tokens are billed at $0.30/M instead of $3.00/M for 80% of requests. The dynamic 500 tokens per request keep paying the standard input rate. That nets $0.0402 per request, or $6,030.00 per month at 5,000 req/day.
Provider-by-provider cheat sheet
Anthropic Claude: 90% discount on cache reads, 25% premium on cache writes, 5-minute default TTL, breakpoints declared explicitly with `cache_control: { type: "ephemeral" }` markers. OpenAI: automatic caching for prompts โฅ1,024 tokens, 50% discount on cache reads, no write premium, no manual control โ the provider hashes the prefix automatically. Google Gemini: explicit "context caching" API where you create a cached content object and reuse it across requests, 75% discount, configurable TTL from minutes to hours, useful for shared corpora rather than per-user prompts.
When the savings model breaks
Three failure modes worth knowing. First, if your dynamic content sits at the start of the prompt instead of the end, the cache never hits โ fix the prompt structure. Second, if the model context window is large but most prompts in production are short, caching cannot help: there is no shared prefix to amortise. Third, if your prompts include timestamps, request IDs or user IDs at the top of the system message (a common debugging habit), every request looks unique to the cache. Move all such fields to the end of the request before measuring caching ROI.
Examples
Static 8,000 tokens, dynamic 500 tokens, output 600 tokens, hit rate 90%Cache saves ~$0.022/request, ~$1,080/day, ~$32,400/month vs. uncached. Break-even on cache writes: after the second call.Static 30,000 tokens, dynamic 200 tokens, output 500 tokens, hit rate 70%OpenAI cache discount is 90% off โ saves ~$0.047/request, ~$33,800/month at 24K requests/day. Without the cache, same workload costs ~$36,000/mo.Static 4,000 tokens, dynamic 300 tokens, output 200 tokens, hit rate 95%Saves ~$0.0034/request. At 100K req/day โ ~$340/day, ~$10,200/month โ caching is worth it even on the cheapest tier when the prefix is reused this heavily.