RAG Chunk Estimator — Plan Retrieval-Augmented Generation Costs
Runs in your browser; nothing is sent to a server.
A RAG chunk estimator that walks a retrieval-augmented generation pipeline end-to-end: source document count and size, chunk size with overlap, embedding model, query volume, top-K retrieval, and LLM answer model. It returns the total number of chunks, the total tokens to embed, the one-time index-build bill, and the per-query cost broken down into query embedding plus LLM call. A context-fit check verifies that the system prompt plus the top-K chunks plus the answer reservation actually fits the chosen LLM's context window; overflow is the most common bug when scaling retrieval. Useful for sizing a RAG project before committing to a vector store, picking a chunk size that balances retrieval quality against LLM input cost, and estimating monthly burn under realistic traffic.
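The context-fit check and the per-query breakdown come down to a few additions. The following is a minimal Python sketch, not the calculator's own code. It assumes that the answer reservation is also the billed output length, which makes the LLM figure an upper bound. `QueryPlan`, `per_query_cost`, and every price used here are hypothetical illustrations, not any model's current rates.

```python
from dataclasses import dataclass


@dataclass
class QueryPlan:
    system_tokens: int   # system prompt length
    query_tokens: int    # user question length
    top_k: int           # chunks retrieved per query
    chunk_size: int      # tokens per chunk
    answer_reserve: int  # tokens held back for the model's answer
    context_window: int  # answer model's total context limit

    def prompt_tokens(self) -> int:
        # Everything sent to the LLM: system prompt + question + retrieved chunks.
        return self.system_tokens + self.query_tokens + self.top_k * self.chunk_size

    def fits(self) -> bool:
        # Prompt and answer share one window, so the reservation must fit too.
        return self.prompt_tokens() + self.answer_reserve <= self.context_window


def per_query_cost(plan: QueryPlan, embed_usd_per_m: float,
                   llm_in_usd_per_m: float, llm_out_usd_per_m: float) -> tuple[float, float]:
    """Return (query-embedding cost, LLM call cost) in USD for one query.

    Output is billed at the full answer reservation, so the LLM cost is an upper bound.
    """
    embed = plan.query_tokens * embed_usd_per_m / 1e6
    llm = (plan.prompt_tokens() * llm_in_usd_per_m
           + plan.answer_reserve * llm_out_usd_per_m) / 1e6
    return embed, llm


plan = QueryPlan(system_tokens=500, query_tokens=40, top_k=5, chunk_size=500,
                 answer_reserve=1_000, context_window=8_192)
print(plan.prompt_tokens(), plan.fits())  # 3040 True

# Hypothetical prices in USD per 1M tokens; substitute your models' current rates.
embed, llm = per_query_cost(plan, embed_usd_per_m=0.02,
                            llm_in_usd_per_m=2.50, llm_out_usd_per_m=10.00)
print(f"embed ${embed:.7f}  llm ${llm:.4f}")
```

Raising `top_k` to 8 with 1,000-token chunks pushes the prompt to 8,540 tokens, so the same 8,192-token window fails the check even before the answer reservation is added.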
Pricing snapshot: 2026-05-16
How chunk size moves the cost curve
Two regimes. Small chunks (200–500 tokens) give a high chunk count and more vectors to store. They also raise the embedding bill somewhat, because a fixed overlap is a larger share of each chunk. In exchange, the per-query LLM input stays small. This regime is best when retrieval precision matters and you can afford to re-rank. Large chunks (1500–5000 tokens) give few chunks and a small index. However, each query pulls a lot of text into the LLM context, which makes the LLM call expensive and risks context overflow. This regime is best when documents are coherent and the answer must reference whole sections. Most production systems land at 500–1000 tokens with 50–100 tokens of overlap and a top-K of 4–6. Use the calculator to verify that your choice fits the context budget of your answer model.
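A sliding-window model makes the trade-off concrete. This is a minimal Python sketch that assumes the corpus is chunked as one continuous token stream with a fixed overlap, which is consistent with the ~14,800-chunk figure in the first example. `chunk_count` and `embedded_tokens` are hypothetical helper names.

```python
import math


def chunk_count(corpus_tokens: int, chunk_size: int, overlap: int) -> int:
    """Sliding-window chunk count over the corpus treated as one token stream."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    if corpus_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap  # each new chunk starts this many tokens later
    return 1 + math.ceil((corpus_tokens - chunk_size) / stride)


def embedded_tokens(corpus_tokens: int, chunk_size: int, overlap: int) -> int:
    """Tokens sent to the embedding model: each overlap region is embedded twice."""
    n = chunk_count(corpus_tokens, chunk_size, overlap)
    return corpus_tokens + (n - 1) * overlap


corpus = 6_670_000  # e.g. 1,000 docs x ~6,670 tokens
top_k, overlap = 5, 50
for size in (256, 500, 1000, 2000, 5000):
    n = chunk_count(corpus, size, overlap)
    extra = embedded_tokens(corpus, size, overlap) / corpus - 1
    print(f"chunk {size:>5}: {n:>6} chunks, +{extra:.1%} embed tokens, "
          f"{top_k * size:>6} retrieved tokens/query")
```

With a fixed 50-token overlap, the duplicated share of embedding tokens shrinks as chunks grow. Meanwhile, the retrieved tokens per query grow linearly with chunk size, and that linear term is what the LLM bills on every query.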
The hidden cost: query growth
The biggest source of RAG bill surprises is query growth, not corpus growth. Doubling the corpus doubles the one-time index cost, which runs from a few dollars to a few hundred and is paid once. Doubling daily queries doubles the recurring LLM bill, and that bill dominates the budget. When sizing a RAG project, project query volume 12–24 months out and pick the LLM and chunking strategy for that scale, not for launch traffic. The cost per query from this calculator, multiplied by your projected query volume per day or per month, is the number that drives the architecture, not the chunk count.
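Projecting cumulative spend month by month shows why query growth dominates. This is a minimal Python sketch that assumes 30-day months and a constant month-over-month growth rate. `project_costs` and the 15% growth figure are hypothetical. The $0.15 index build and the $0.012/query inputs come from the first example in the Examples section.

```python
def project_costs(index_usd: float, cost_per_query: float, queries_per_day: float,
                  monthly_growth: float, months: int) -> list[tuple[int, float, float, float]]:
    """Month-by-month rows of (month, queries/day, query bill, cumulative spend).

    The index build is paid once up front; the query bill compounds with traffic.
    Assumes 30-day months and a constant monthly growth rate.
    """
    rows = []
    total = index_usd
    qpd = queries_per_day
    for month in range(1, months + 1):
        bill = cost_per_query * qpd * 30
        total += bill
        rows.append((month, qpd, bill, total))
        qpd *= 1 + monthly_growth  # traffic for the next month
    return rows


# First example's figures, with a hypothetical 15% month-over-month traffic growth.
for month, qpd, bill, total in project_costs(0.15, 0.012, 100, 0.15, 24):
    if month in (1, 12, 24):
        print(f"month {month:>2}: {qpd:>7.0f} queries/day, ${bill:>8.2f}/month, "
              f"${total:>9.2f} cumulative")
```

The index term stays at $0.15 throughout, while the monthly query bill scales directly with traffic.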
Examples
1,000 docs × 10 pages (~6,670 tokens per doc), chunk 500, overlap 50, top-K 5: ~14,800 chunks. Index build ~$0.15 on text-embedding-3-small. Per query: ~$0.012 LLM + ~$0.0000008 embed → $36/month at 100 queries/day.
500 docs × 5 pages, chunk 5,000, overlap 500, top-K 3: ~370 chunks. The chunks are very large, but the one-time index build is cheap. Each query injects 15K input tokens (3 × 5,000) → ~$0.020/query on Gemini 2.5 Pro.
100,000 files × 200 tokens, chunk 256, overlap 64, top-K 8: ~104K chunks. Index build ~$0.53 on text-embedding-3-small. ~104K embeddings × 6 KB (1,536 dims × 4 bytes) ≈ 640 MB raw fp32; quantize before storing (int8 is 4× smaller).
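The chunk counts, index-build costs, and raw vector storage in these examples can be recomputed in a few lines. This is a minimal Python sketch with several assumptions. It uses a price of $0.02 per 1M tokens and 1,536-dimension fp32 vectors for text-embedding-3-small; check both against the pricing snapshot date. It also chunks each corpus as one continuous token stream. `summarize` is a hypothetical helper.

```python
import math

EMBED_USD_PER_M = 0.02  # assumed text-embedding-3-small price per 1M tokens
DIMS = 1536             # text-embedding-3-small default output dimensions
BYTES_PER_FP32 = 4


def chunk_count(corpus_tokens: int, chunk_size: int, overlap: int) -> int:
    # Sliding window over one continuous token stream.
    if corpus_tokens <= chunk_size:
        return 1
    return 1 + math.ceil((corpus_tokens - chunk_size) / (chunk_size - overlap))


def summarize(corpus_tokens: int, chunk_size: int, overlap: int) -> tuple[int, float, int]:
    """Return (chunks, one-time index build USD, raw fp32 vector bytes)."""
    n = chunk_count(corpus_tokens, chunk_size, overlap)
    embedded = corpus_tokens + (n - 1) * overlap  # overlaps are embedded twice
    index_usd = embedded * EMBED_USD_PER_M / 1e6
    raw_bytes = n * DIMS * BYTES_PER_FP32  # 6,144 bytes per vector
    return n, index_usd, raw_bytes


for label, corpus, size, overlap in [
    ("docs", 1_000 * 6_670, 500, 50),
    ("files", 100_000 * 200, 256, 64),
]:
    n, usd, raw = summarize(corpus, size, overlap)
    print(f"{label}: {n:,} chunks, index ${usd:.2f}, {raw / 1e6:,.0f} MB fp32, "
          f"{raw / 4 / 1e6:,.0f} MB int8")
```

If the third corpus were chunked per file instead, each 200-token file would fit in a single 256-token chunk, giving exactly 100,000 chunks.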