LLM Token Counter — Count Tokens for Any Model

🔒 Runs in your browser — nothing is sent to a server

Paste a prompt, document or transcript to see its token count for OpenAI GPT, Anthropic Claude, Google Gemini, Meta Llama and Mistral side by side, plus the live input and output cost for the selected model. Pick a model, enter the expected output length, and the page updates instantly without sending a single character to a server. This is useful for budgeting an API call, checking whether a long prompt will fit a context window, or comparing how the same text tokenizes across providers before switching stacks.

Pricing snapshot: May 16, 2026

Costs use standard-tier rates as of the 2026-05-16 pricing snapshot; cached and batch discounts are not included.

Same input text counted under each tokenizer family. Useful when switching providers — a prompt that fits GPT-4o may overflow Llama, or cost less under Gemini.

Families compared: OpenAI (GPT), Anthropic (Claude), Google (Gemini), Meta (Llama), Mistral.

Typical sizes (approximate tokens):
Tweet / SMS: ~35
1-page document: ~500
10-min transcript: ~2,000
Novel (100k words): ~135,000

How to read the token counter

The top panel shows the count for the model you selected; this is the number to plug into cost estimates. The cross-tokenizer panel shows the same text under every other family, so you can quickly see whether migrating to another provider would change your bill. The cost panel multiplies your input tokens by the model's standard input rate and your estimated output tokens by the standard output rate, then sums the two. Cached and batch discounts are not applied here; those have dedicated calculators in the AI Tools section.
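The cost arithmetic described above can be sketched in a few lines. This is a minimal illustration, not the page's actual code: the `Rates` class and `request_cost` function are hypothetical names, and the $30/M output rate is a placeholder assumption (only the $5/M input figure is quoted on this page).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-million-token prices in USD (hypothetical container, not a site API)."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    """Standard-tier cost: input and output are priced separately, then summed."""
    input_cost = input_tokens * rates.input_per_million / 1_000_000
    output_cost = output_tokens * rates.output_per_million / 1_000_000
    return input_cost + output_cost


# $5/M input is the GPT-5.5 figure quoted on this page; the $30/M output
# rate is an assumption chosen only to make the example concrete.
example_rates = Rates(input_per_million=5.0, output_per_million=30.0)

# 750 input tokens and an estimated 500 output tokens.
print(f"${request_cost(750, 500, example_rates):.5f}")
```

Keeping the two rates separate matters because output tokens are usually priced higher than input tokens, so a short prompt with a long answer can cost more than a long prompt with a short one.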

When the heuristic breaks down

The MVP-1 counter assumes English-heavy prose. Three cases produce noticeably wrong numbers: minified code or compact JSON (the counter under-estimates by 10–20%); CJK scripts such as Chinese, Japanese or Korean (each character is 1.5–2 tokens, not 1/4 of a token); and prompts with many escape sequences, emoji or Unicode combining characters (under-estimates by up to 30%). For exact OpenAI counts, the planned MVP-2 release will ship the real `gpt-tokenizer` BPE encoder as a lazy-loaded chunk on this page only.
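One way to soften the CJK failure mode is to count CJK characters separately instead of dividing the whole string by four. The sketch below is an assumption about how such a correction could look, not something this page implements: `is_cjk` and `estimate_tokens_mixed` are hypothetical helpers, and 1.75 tokens per CJK character is simply the midpoint of the 1.5–2 range quoted above.

```python
import math
import unicodedata

# Unicode character-name prefixes treated as CJK for this sketch.
_CJK_PREFIXES = ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA", "HANGUL")


def is_cjk(char: str) -> bool:
    """True for Chinese ideographs, Japanese kana and Korean hangul."""
    return unicodedata.name(char, "").startswith(_CJK_PREFIXES)


def estimate_tokens_mixed(
    text: str,
    chars_per_token: float = 4.0,       # English-prose heuristic
    tokens_per_cjk_char: float = 1.75,  # assumed midpoint of 1.5-2
) -> int:
    """Estimate tokens, pricing CJK characters separately from the rest."""
    cjk_chars = sum(1 for c in text if is_cjk(c))
    other_chars = len(text) - cjk_chars
    estimate = cjk_chars * tokens_per_cjk_char + other_chars / chars_per_token
    return math.ceil(estimate)


print(estimate_tokens_mixed("What is the capital of France?"))
print(estimate_tokens_mixed("法国的首都是哪里"))
```

This remains a heuristic: it does not address minified code or emoji sequences, which would need their own adjustments or a real tokenizer.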

Examples

Short system prompt (fits any model)
Input: You are a helpful assistant. Answer in one sentence. What is the capital of France?
Output: ~22 tokens (GPT), ~23 tokens (Claude). At the GPT-5.5 standard rate of $5/M input this is ≈ $0.00011 per request.

A 1-page README (typical chat input)
Input: ~3,000 characters of Markdown (about 500 words)
Output: ~750 tokens across all major tokenizers. GPT-5.5 input cost: $0.00375. Claude Sonnet 4.6 input cost: $0.00225.

Heavy RAG prompt (50,000 tokens of retrieved context)
Input: ~200,000 characters from chunked documents + a short user question
Output: Fills 5% of Gemini 2.5 Pro 1M context, 25% of Claude Haiku 4.5 200K context, overflows any 32K model.

FAQ

What is a token in an LLM?

A token is the smallest unit a language model reads or writes. Modern LLMs use sub-word tokenizers, most commonly byte-pair encoding (BPE), often built with toolkits such as SentencePiece; these split text into pieces that typically range from a single character to a whole word. In English, a token averages about four characters or three quarters of a word, so 1,000 tokens is roughly 750 words or a five-paragraph email. The exact count depends on the tokenizer; OpenAI, Anthropic, Google and Meta all ship slightly different vocabularies.

Why does the same text produce different token counts for GPT, Claude and Gemini?

Each provider trains its own tokenizer on a different corpus and merges different sub-words. A name like "free-converter" splits into 4 tokens under GPT-4o's o200k vocabulary but 3 under Llama 3's tiktoken-based variant. Code, JSON and non-English text amplify the divergence — a Python snippet can be 30% more expensive on one tokenizer than another. Always count under the tokenizer of the model you actually plan to call.

How accurate is this LLM token counter?

Within ~5–10% of the provider's real billing for English text. The MVP uses a chars-per-token heuristic calibrated per tokenizer family (~4 chars for GPT/Gemini/Mistral, ~3.8 for Claude, ~3.5 for Llama). Real BPE tokenizers such as `gpt-tokenizer` will ship in a follow-up release for exact OpenAI counts. Anthropic, Google and Meta do not publish official JS tokenizers, so approximation is the most honest answer for those providers.
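The chars-per-token heuristic is simple enough to sketch directly. The ratios below are the ones quoted in this answer; everything else (the `CHARS_PER_TOKEN` table name, the function names, and rounding up rather than to nearest) is an assumption for illustration, not the page's actual implementation.

```python
import math

# Characters per token for each tokenizer family, as quoted on this page.
CHARS_PER_TOKEN = {
    "gpt": 4.0,
    "gemini": 4.0,
    "mistral": 4.0,
    "claude": 3.8,
    "llama": 3.5,
}


def estimate_tokens(text: str, family: str) -> int:
    """Approximate token count; rounding up is an assumption of this sketch."""
    return math.ceil(len(text) / CHARS_PER_TOKEN[family])


def estimate_all(text: str) -> dict[str, int]:
    """Count the same text under every family, like the cross-tokenizer panel."""
    return {family: estimate_tokens(text, family) for family in CHARS_PER_TOKEN}


# A stand-in for ~3,000 characters of Markdown.
sample = "x" * 3000
for family, tokens in estimate_all(sample).items():
    print(f"{family}: ~{tokens}")
```

Because the estimate depends only on string length, it cannot tell prose from code or CJK text, which is why the error grows outside English-heavy input.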

Does the token count include the system prompt or chat history?

Only the text you paste is counted. In a real chat completion request, your billed input is the sum of the system prompt, all prior conversation turns, the new user message, plus 3–5 control tokens of formatting overhead per message. Paste each component into the counter and add up the numbers — or use the Context Window Fit Checker on this site, which has separate slots for system, history and prompt.
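Adding up the components by hand can be sketched as follows. This is a rough illustration under stated assumptions: `approx_count` is a placeholder 4-characters-per-token counter, `billed_input_tokens` is a hypothetical helper, and the overhead of 4 tokens per message is just a midpoint of the 3–5 range mentioned above.

```python
import math
from typing import Callable, Sequence


def approx_count(text: str) -> int:
    """Placeholder counter: ~4 characters per token, rounded up (an assumption)."""
    return math.ceil(len(text) / 4)


def billed_input_tokens(
    system: str,
    history: Sequence[str],
    user_message: str,
    count: Callable[[str], int] = approx_count,
    overhead_per_message: int = 4,  # the page quotes 3-5; 4 is a midpoint guess
) -> int:
    """Sum content tokens plus per-message formatting overhead."""
    # Empty components (for example, no system prompt) are skipped entirely.
    messages = [m for m in (system, *history, user_message) if m]
    return sum(count(m) + overhead_per_message for m in messages)


total = billed_input_tokens(
    system="You are a helpful assistant.",
    history=["What is BPE?", "BPE is a sub-word tokenization algorithm."],
    user_message="How does it affect my bill?",
)
print(total)
```

Passing the counter in as a parameter means the same sum works unchanged if the approximate counter is later swapped for an exact tokenizer.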

How are output tokens counted before I make the call?

You can't know exactly — the model decides how long to write. Use empirical defaults: a one-line answer is ~30 tokens, a paragraph ~150, a thorough explanation 500–1,000, structured JSON with 10 fields ~250. The counter accepts an estimated output length so you can see the worst case. For reasoning models like GPT-5.5 Pro or o-series, also add hidden "thinking" tokens, which can be several thousand for hard problems.

Why is my code or non-English text more expensive than I expected?

BPE tokenizers are trained mostly on English prose, so they split code, URLs and non-Latin scripts into more tokens per character. Chinese characters often map to 1.5–2 tokens each; Cyrillic Russian text is roughly 2–3× more tokens per character than English; minified JSON packs more tokens per character than prose, while pretty-printed JSON adds whitespace tokens on top of the same data. A 1,000-character JSON blob and a 1,000-character paragraph rarely cost the same.

Is my pasted text sent to a server?

No. The token counter runs entirely in your browser tab. Nothing is uploaded, logged or cached on a server. It is safe to paste API responses with secrets, internal prompts, JWT payloads or proprietary documents; they never leave your machine. The page is static HTML plus client-side JavaScript; there is no backend that could see your input.

Glossary

Token

A token is the atomic unit a language model reads or generates. Tokens are produced by a tokenizer that splits text into sub-word pieces. In English the average token is about four characters; in code, JSON or non-Latin scripts a single character can take multiple tokens. Billing for LLM APIs is always per-token.

BPE (Byte-Pair Encoding)

Byte-pair encoding is the tokenization algorithm used by OpenAI GPT, Anthropic Claude and Meta Llama. BPE starts from individual bytes and iteratively merges the most frequent adjacent pairs into longer tokens, producing a vocabulary of 50k–200k entries. The result balances "one token per common word" with "always representable for any input".
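The merge loop is easier to follow in code. The following is a toy sketch of byte-level BPE training, not any provider's production tokenizer: `merge` and `train_bpe` are illustrative names, and real implementations add pre-tokenization rules and far larger training corpora.

```python
from collections import Counter


def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int) -> tuple[list[tuple[int, int]], list[int]]:
    """Learn up to `num_merges` merges; return the merge list and encoded ids."""
    ids = list(text.encode("utf-8"))  # start from raw bytes (ids 0-255)
    merges = []
    next_id = 256  # new tokens are numbered after the 256 byte values
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats, so merging would not shorten the text
        merges.append(pair)
        ids = merge(ids, pair, next_id)
        next_id += 1
    return merges, ids


merges, ids = train_bpe("aaabdaaabac", num_merges=3)
print(len("aaabdaaabac"), "bytes ->", len(ids), "tokens after", len(merges), "merges")
```

Because every byte value keeps its own id, any input remains representable even when none of the learned merges apply.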

SentencePiece

SentencePiece is the tokenizer toolkit used by Google Gemini and many open-source models. It implements both BPE and unigram language-model segmentation, and unlike tokenizers that expect whitespace-split input it operates on raw Unicode text, which makes it more robust for non-English languages, especially those written without spaces between words. The downside for cost prediction is that there is no official JavaScript port, so client-side counters must approximate.

Context window

The context window is the maximum number of tokens — input plus output combined — a model can process in one request. GPT-5.5 and Claude Opus 4.7 expose 1,000,000 tokens; legacy models like GPT-4o sit at 128,000. Going over the limit returns an HTTP 400; sitting close to the limit costs you headroom for the response.
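Since input and output share the same window, a fit check should add the expected output to the input before comparing. The sketch below assumes a hypothetical `context_usage` helper; the window sizes are the ones used in the examples on this page.

```python
def context_usage(
    input_tokens: int, expected_output_tokens: int, context_window: int
) -> dict:
    """Report how much of the window a request would consume."""
    total = input_tokens + expected_output_tokens
    return {
        "used_fraction": total / context_window,
        "fits": total <= context_window,
        "headroom": context_window - total,  # negative when over the limit
    }


# 50,000 tokens of retrieved context, checked against three window sizes.
for window in (1_000_000, 200_000, 32_000):
    usage = context_usage(50_000, 0, window)
    print(f"{window:>9,}: {usage['used_fraction']:.0%} used, fits={usage['fits']}")
```

In practice the expected output length should be non-zero, so that a prompt sitting just under the limit is flagged before the response is truncated.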

Tokenizer family

A tokenizer family is a group of models that share a vocabulary. All OpenAI o200k models tokenize identically; all Llama 3 variants share their BPE vocabulary. When you switch within a family the token count stays the same; switching across families (GPT → Claude → Gemini) can change the count by 5–30%.
