Embeddings Cost Calculator — Plan Your Vector Index Bill
🔒 Runs in your browser; nothing is sent to a server.
Embeddings cost calculator for OpenAI text-embedding-3-small/large, Anthropic-recommended Voyage 3, Google Gemini Embedding and Mistral Embed. Enter the number of documents (or chunks) and their average size in tokens, words or characters, and the calculator shows the total token volume, the one-time cost to build the index, and the recurring monthly cost when a share of the corpus is re-embedded. Check the Batch box to apply the Batch tier rate. Useful for sizing a RAG project before you commit to a vector store contract, or for deciding whether the quality jump from 3-small to 3-large is worth the 6.5× price.
Monthly re-embed share: 0.1 means 10% of the corpus is updated and re-embedded each month.
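The arithmetic behind these inputs can be sketched as follows. This is a minimal sketch, not the page's actual code: the function names, the word and character conversion ratios, and the 50% batch discount are all assumptions (the discount is inferred from the examples on this page, where the batch cost is half the standard cost).

```python
from dataclasses import dataclass

# Assumed rough ratios for English text; real tokenizers vary by model.
TOKENS_PER_WORD = 4 / 3   # assumption: about 0.75 words per token
TOKENS_PER_CHAR = 1 / 4   # assumption: about 4 characters per token


@dataclass
class EmbeddingCost:
    total_tokens: float
    one_time_usd: float
    monthly_usd: float


def to_tokens(size, unit):
    """Convert an average document size in the given unit to tokens."""
    if unit == "tokens":
        return size
    if unit == "words":
        return size * TOKENS_PER_WORD
    if unit == "characters":
        return size * TOKENS_PER_CHAR
    raise ValueError(f"unknown unit: {unit}")


def estimate_cost(documents, avg_size, unit, usd_per_million_tokens,
                  reembed_share=0.1, batch=False, batch_discount=0.5):
    """Estimate token volume, one-time build cost and monthly re-embed cost."""
    total_tokens = documents * to_tokens(avg_size, unit)
    # Assumption: the batch tier is a flat discount on the standard rate.
    rate = usd_per_million_tokens * ((1 - batch_discount) if batch else 1)
    one_time = total_tokens / 1_000_000 * rate
    # Assumption: re-embeds are billed at the same rate as the initial build.
    return EmbeddingCost(total_tokens, one_time, one_time * reembed_share)
```

For 100,000 chunks of 500 tokens at $0.02 per million tokens, `estimate_cost(100_000, 500, "tokens", 0.02)` gives 50M tokens, about $1.00 one-time and about $0.10 per month.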
fp32 is the worst case; production stores use fp16, int8 or product quantization to cut storage 2–32×. A dedicated Vector Storage Calculator is on the MVP-2 roadmap.
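Raw vector storage is simply vectors × dimensions × bytes per value. The sketch below assumes that formula and ignores index overhead, metadata and product quantization; the function and table names are hypothetical.

```python
# Bytes per stored value for common precisions.
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2, "int8": 1}


def raw_vector_bytes(num_vectors, dimensions, dtype="fp32"):
    """Raw size of the vectors alone, without index or metadata overhead."""
    return num_vectors * dimensions * BYTES_PER_VALUE[dtype]


# 100,000 vectors at 3072 dimensions versus 1536 dimensions, both fp32.
large_gb = raw_vector_bytes(100_000, 3072) / 1e9
small_gb = raw_vector_bytes(100_000, 1536) / 1e9
print(f"3072-dim: {large_gb:.2f} GB, 1536-dim: {small_gb:.2f} GB")
```

Switching `dtype` to `"fp16"` or `"int8"` halves or quarters the result, which is the low end of the 2–32× range mentioned above.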
Picking the right embedding model
Three criteria dominate the choice: quality (MTEB benchmark score on the task type closest to yours), price per million tokens, and dimensions (storage and search compute both scale linearly with the dimension count). For most teams, OpenAI text-embedding-3-small is the right default: cheap enough to be irrelevant in the budget, 1536 dimensions that any vector DB can store, and within 1–3% of best-in-class on retrieval benchmarks. Reach for text-embedding-3-large or Voyage 3 Large only when retrieval quality is mission-critical and you have an evaluation pipeline that can prove a lift. Gemini Embedding is the natural pick on Google Cloud setups; Mistral Embed is the natural pick when European data residency is a requirement.
Where embedding cost actually shows up
Three line items: (1) one-time index build, usually a single-digit-dollar bill for a small corpus and a few hundred dollars for millions of chunks; (2) periodic re-embeds, typically 5–15% of the index per month, with cost scaling linearly with the share re-embedded; (3) per-query embedding, since every search call embeds the user query once, which is negligible at small volume but adds up to a few dollars per million queries on text-embedding-3-small. Total embedding spend in production RAG is almost always under 5% of the total LLM bill, which is why most teams can safely ignore it. The real saving comes from chunking smarter, not embedding cheaper.
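The two recurring line items can be estimated together. This is a rough sketch under stated assumptions: the function name is hypothetical, the average query length of 100 tokens is an illustrative guess, and $0.02 per million tokens is used as the text-embedding-3-small rate.

```python
def monthly_embedding_spend(index_tokens, reembed_share, queries_per_month,
                            avg_query_tokens, usd_per_million_tokens):
    """Recurring monthly embedding cost: re-embeds plus per-query embedding."""
    rate = usd_per_million_tokens / 1_000_000  # USD per single token
    reembed_usd = index_tokens * reembed_share * rate
    query_usd = queries_per_month * avg_query_tokens * rate
    return {
        "reembed_usd": reembed_usd,
        "query_usd": query_usd,
        "total_usd": reembed_usd + query_usd,
    }


# Assumed workload: 50M-token index, 10% re-embedded per month,
# 1M queries per month at an assumed 100 tokens per query.
spend = monthly_embedding_spend(50_000_000, 0.10, 1_000_000, 100, 0.02)
print({name: round(usd, 2) for name, usd in spend.items()})
```

Under these assumptions the query side comes to about $2 per million queries and the re-embed side to about $0.10, so shorter queries or a smaller re-embed share shrink the bill proportionally.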
Examples
100,000 chunks × 500 tokens
Total 50M tokens. One-time cost: $1.00 standard, $0.50 batch. Re-embedding 10%/month: $0.10/mo.

100,000 chunks × 500 tokens
One-time cost: $6.50 standard, $3.25 batch. 3072-dim vectors at fp32 = 1.2 GB raw vs 600 MB for 3-small.

1,000,000 chunks × 400 tokens
Total 400M tokens. One-time cost: $72 (no batch tier yet). Re-embedding 5%/month: $3.60/mo.
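As a sanity check, the per-million-token rate each example implies can be recovered by dividing the one-time standard cost by the token volume. A small sketch, with a hypothetical helper name and the figures copied from the examples above:

```python
def implied_rate(one_time_usd, chunks, tokens_per_chunk):
    """USD per million tokens implied by a one-time cost and corpus size."""
    return one_time_usd / (chunks * tokens_per_chunk / 1_000_000)


# (one-time standard cost in USD, chunks, tokens per chunk) from the examples.
examples = [
    (1.00, 100_000, 500),
    (6.50, 100_000, 500),
    (72.00, 1_000_000, 400),
]

for cost, chunks, tokens in examples:
    rate = implied_rate(cost, chunks, tokens)
    print(f"${cost:.2f} for {chunks * tokens / 1e6:.0f}M tokens -> ${rate:.2f}/M")
```

The three examples work out to $0.02, $0.13 and $0.18 per million tokens; the ratio of the first two is the 6.5× price gap between 3-small and 3-large.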