Claude Pricing Calculator for Anthropic API Costs
Estimate Claude API spend by model, input tokens, output tokens, request volume, prompt caching, and batch processing. Use the live AI Pricing Hub model table for planning, then verify final rates on Anthropic's official pricing page.
Quick answer · Pricing data refreshed 2026-03-13 12:45:29
Claude API cost depends on model tier, input and output length, and how much context you can reuse.
Start with the base formula: input tokens times input price plus output tokens times output price. Then adjust for prompt caching when you reuse system prompts, documents, tool definitions, or long conversation history. Batch jobs can reduce token cost when delayed processing is acceptable.
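As a minimal sketch of the base formula above, assuming prices are quoted in USD per 1M tokens (the function name `base_cost_usd` is illustrative, and the $3.00 / $15.00 rates are taken from the Claude Sonnet 4.5 row of the table on this page):

```python
def base_cost_usd(input_tokens: int, output_tokens: int,
                  input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Token cost in USD for one request, with prices quoted per 1M tokens."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


# Example: 2,000 input and 800 output tokens at $3.00 / $15.00 per 1M tokens.
per_request = base_cost_usd(2_000, 800, 3.00, 15.00)
print(f"${per_request:.4f} per request")  # $0.0180 per request
```

Caching and batch adjustments are applied on top of this number; they are not part of the base formula.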
Anthropic Claude API cost estimator
Model a monthly workload with base token pricing, optional cache reads, optional 5-minute cache writes, and batch discounts. This estimates model token cost only.
How to use the Claude pricing calculator
1. Pick a Claude model
Choose Sonnet, Haiku, Opus, or another Claude row from the pricing database. The calculator loads that row's input, output, cached input, and batch prices when available.
2. Enter token counts
Use a typical prompt and response from your app. Include retrieved documents, system prompts, tool definitions, and conversation history in the input token estimate.
3. Add monthly volume
Estimate expected requests per month. For agents, count each model call in a multi-step workflow instead of only counting user sessions.
4. Model discounts
Set cache read, cache write, and batch shares only for the traffic that actually uses those features. Values are clamped to 0-100% in the calculator logic.
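The four steps above can be sketched as one function. This is not the calculator's actual code: the names `Workload`, `clamp_share`, and `monthly_cost_usd` are hypothetical, and the default multipliers (cache reads at 0.10x input, 5-minute cache writes at 1.25x input, batch at 0.50x) are assumptions to check against Anthropic's pricing page. It also assumes cache shares apply to input tokens and the batch share applies to whole requests.

```python
from dataclasses import dataclass


def clamp_share(percent: float) -> float:
    """Clamp a percentage to 0-100 and return it as a fraction."""
    return min(max(percent, 0.0), 100.0) / 100.0


@dataclass
class Workload:
    input_tokens: int             # per request
    output_tokens: int            # per request
    requests_per_month: int
    cache_read_pct: float = 0.0   # % of input tokens read from cache
    cache_write_pct: float = 0.0  # % of input tokens written to cache
    batch_pct: float = 0.0        # % of requests sent through batch


def monthly_cost_usd(w: Workload, input_price: float, output_price: float,
                     cache_read_mult: float = 0.10,   # assumed multiplier
                     cache_write_mult: float = 1.25,  # assumed multiplier
                     batch_mult: float = 0.50) -> float:  # assumed discount
    """Estimated monthly token cost in USD; prices are per 1M tokens."""
    read = clamp_share(w.cache_read_pct)
    # Assumption: a token is either read from or written to cache, never both.
    write = min(clamp_share(w.cache_write_pct), 1.0 - read)
    plain = 1.0 - read - write
    input_rate = input_price * (plain + read * cache_read_mult
                                + write * cache_write_mult)
    per_request = (w.input_tokens * input_rate
                   + w.output_tokens * output_price) / 1_000_000
    batch = clamp_share(w.batch_pct)
    blended = per_request * ((1.0 - batch) + batch * batch_mult)
    return blended * w.requests_per_month


w = Workload(2_000, 800, 50_000, cache_read_pct=30)
print(f"${monthly_cost_usd(w, 1.00, 5.00):,.2f} per month")  # $273.00 per month
```

Keeping the multipliers as parameters makes it easy to swap in the rates for a specific model row once they are confirmed.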
Claude model pricing table
These rows come from the local AI Pricing Hub database and are normalized to USD per 1M tokens. Use the official Anthropic page for the final current quote.
| Model | Provider | Input (USD/1M) | Output (USD/1M) | Cached input | Batch | Context | Best fit |
|---|---|---|---|---|---|---|---|
| Claude 3 Sonnet | Anthropic | Free | Free | - | Estimate 50% discount | 200k | Balanced coding, writing, agents, and production assistants |
| Claude 3.5 Sonnet (2024-06-20) | Anthropic | Free | Free | - | Estimate 50% discount | 200k | Balanced coding, writing, agents, and production assistants |
| Claude 3.7 Sonnet | Anthropic | $3.00 | $15.00 | - | Estimate 50% discount | 200k | Balanced coding, writing, agents, and production assistants |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | - | Estimate 50% discount | 200k | Balanced coding, writing, agents, and production assistants |
| Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | - | Estimate 50% discount | 1.0M | Balanced coding, writing, agents, and production assistants |
| Claude 3.5 Sonnet | Anthropic | $6.00 | $30.00 | - | Estimate 50% discount | 200k | Balanced coding, writing, agents, and production assistants |
| Claude 3.5 Haiku (2024-10-22) | Anthropic | Free | Free | - | Estimate 50% discount | 200k | High-volume support, classification, and low-latency tasks |
| Claude 3 Haiku | Anthropic | $0.25 | $1.25 | - | Estimate 50% discount | 200k | High-volume support, classification, and low-latency tasks |
| Claude 3.5 Haiku (2024-10-22) | Anthropic | $0.80 | $4.00 | - | Estimate 50% discount | 200k | High-volume support, classification, and low-latency tasks |
| Claude 3.5 Haiku | Anthropic | $0.80 | $4.00 | - | Estimate 50% discount | 200k | High-volume support, classification, and low-latency tasks |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | - | Estimate 50% discount | 200k | High-volume support, classification, and low-latency tasks |
| Claude 3 Opus | Anthropic | Free | Free | - | Estimate 50% discount | 200k | Complex reasoning, analysis, and high-value review workflows |
Claude API cost examples
Use these examples to choose realistic calculator inputs before estimating your own workload.
Support chatbot
Typically 1,000-3,000 input tokens and 300-900 output tokens per request, spread over many short requests. Cache shared policy and FAQ context when possible.
Claude Code or agent workflow
Typically 2,000-10,000 input tokens and 1,000-6,000 output tokens per model call. Output length, tool loops, and retries often dominate cost.
Document analysis
Typically 20,000-150,000 input tokens and 500-2,000 output tokens per request. Context length and prompt caching matter more than raw output price.
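To see why tool loops raise cost in the agent example above, here is a rough sketch of one multi-step session. It assumes every step re-sends the full history, so each step's output becomes input for the next step; tool results and caching are ignored for simplicity. The function name and the 5-step, $3.00 / $15.00 figures are illustrative only.

```python
def agent_session_cost_usd(base_input_tokens: int, output_tokens_per_step: int,
                           steps: int, input_price: float, output_price: float,
                           retry_rate: float = 0.0) -> float:
    """Cost of one agent session; prices are USD per 1M tokens."""
    total_input = 0
    total_output = 0
    context = base_input_tokens
    for _ in range(steps):
        total_input += context                 # whole history is sent as input
        total_output += output_tokens_per_step
        context += output_tokens_per_step      # this step's output joins the history
    cost = (total_input * input_price + total_output * output_price) / 1_000_000
    # Assumption: a retry repeats the whole session at the same cost.
    return cost * (1.0 + retry_rate)


single_call = (2_000 * 3.00 + 1_000 * 15.00) / 1_000_000
session = agent_session_cost_usd(2_000, 1_000, steps=5,
                                 input_price=3.00, output_price=15.00)
print(f"5 x single call: ${5 * single_call:.3f}")  # 5 x single call: $0.105
print(f"5-step session:  ${session:.3f}")          # 5-step session:  $0.135
```

The gap between the two numbers grows with the number of steps, which is why counting model calls rather than user sessions gives a more realistic estimate.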
Example input and output
| Scenario | Calculator input | Result to read |
|---|---|---|
| Monthly support bot | 2,000 input tokens, 800 output tokens, 50,000 requests, 30% cached input | Monthly token cost, per-request cost, and whether input or output drives the bill. |
| Offline document extraction | 60,000 input tokens, 1,200 output tokens, 8,000 requests, 80% batch share | Estimated batch-adjusted cost before adding storage, review, retry, or queue costs. |
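The two scenarios in the table can be worked through by hand or with a short script. The sketch below is illustrative: it pairs the support bot with the Claude Haiku 4.5 row ($1.00 / $5.00) and the extraction job with the Claude Sonnet 4.5 row ($3.00 / $15.00), and it assumes cache reads cost 0.10x input and batch traffic costs 0.50x. The helper name `breakdown` is hypothetical.

```python
def breakdown(input_tokens, output_tokens, requests, input_price, output_price,
              cached_share=0.0, batch_share=0.0,
              cache_read_mult=0.10, batch_mult=0.50):
    """Return per-request cost, monthly cost, and which side drives the bill."""
    input_rate = input_price * ((1 - cached_share) + cached_share * cache_read_mult)
    scale = (1 - batch_share) + batch_share * batch_mult  # blended batch factor
    input_cost = input_tokens * input_rate / 1_000_000 * scale
    output_cost = output_tokens * output_price / 1_000_000 * scale
    per_request = input_cost + output_cost
    return {
        "per_request": per_request,
        "monthly": per_request * requests,
        "driver": "input" if input_cost > output_cost else "output",
    }


support = breakdown(2_000, 800, 50_000, 1.00, 5.00, cached_share=0.30)
extraction = breakdown(60_000, 1_200, 8_000, 3.00, 15.00, batch_share=0.80)

print(f"Support bot: ${support['monthly']:,.2f}/month, driven by {support['driver']}")
# Support bot: $273.00/month, driven by output
print(f"Extraction:  ${extraction['monthly']:,.2f}/month, driven by {extraction['driver']}")
# Extraction:  $950.40/month, driven by input
```

In the first case shortening responses saves the most; in the second, trimming or caching the documents does.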
Prompt caching, batch API, and real Claude costs
| Pricing lever | Use when | Cost effect to model |
|---|---|---|
| Prompt caching | You reuse system prompts, examples, long documents, tool schemas, or conversation context. | Cache reads are billed at a fraction of the base input price (Anthropic has listed 0.1x); cache writes cost more than standard input (1.25x for the 5-minute cache). |
| Batch processing | Classification, enrichment, evals, extraction, and other offline jobs can wait for asynchronous processing. | Batch calls are discounted (Anthropic has listed 50% off input and output tokens), but they are not a fit for real-time chat. |
| Extended thinking | Reasoning quality matters more than a short answer. | Thinking tokens are billed as output tokens, so high reasoning budgets increase spend. |
Official references: Anthropic pricing, prompt caching, batch processing, and token counting.
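Because cache writes cost more than standard input, caching only pays off once later requests actually hit the cache. The sketch below compares a shared prefix with and without caching, assuming a 1.25x write multiplier, a 0.10x read multiplier, and that every request after the first hits the cache before it expires. Real hit rates are usually lower, and the function names are illustrative.

```python
def uncached_prefix_cost_usd(prefix_tokens: int, requests: int,
                             input_price: float) -> float:
    """Cost of sending the same prefix at the base input price every time."""
    return prefix_tokens * requests * input_price / 1_000_000


def cached_prefix_cost_usd(prefix_tokens: int, requests: int, input_price: float,
                           write_mult: float = 1.25,  # assumed multiplier
                           read_mult: float = 0.10) -> float:  # assumed multiplier
    """Cost when the first request writes the cache and the rest read it."""
    if requests <= 0:
        return 0.0
    per_token = input_price / 1_000_000
    return prefix_tokens * per_token * (write_mult + (requests - 1) * read_mult)


# A 20,000-token shared prefix at $3.00 per 1M input tokens.
for n in (1, 2, 10):
    plain = uncached_prefix_cost_usd(20_000, n, 3.00)
    cached = cached_prefix_cost_usd(20_000, n, 3.00)
    print(f"{n:>2} requests: uncached ${plain:.3f}, cached ${cached:.3f}")
```

Under these assumptions a single request is more expensive with caching, and the saving starts with the second request that hits the cache.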
Claude pricing vs other LLM APIs
Claude is often chosen for quality, context handling, and coding or analysis workflows. Use cross-provider pricing only after the capability fit is clear.
When Claude can be cheaper
A stronger Claude model can reduce retries, manual review, prompt length, or tool loops. For quality-sensitive work, measure successful completion cost rather than token sticker price alone.
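One way to measure successful completion cost is to divide the cost of one attempt by the success rate, which assumes failed attempts are retried independently until one succeeds. The per-attempt costs and success rates below are hypothetical numbers chosen for illustration, not measurements.

```python
def cost_per_success_usd(cost_per_attempt: float, success_rate: float) -> float:
    """Expected cost of one successful completion with independent retries."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    # Expected number of attempts per success is 1 / success_rate.
    return cost_per_attempt / success_rate


# Hypothetical comparison: a cheaper model that fails often vs. a pricier one.
budget = cost_per_success_usd(0.004, success_rate=0.40)
stronger = cost_per_success_usd(0.009, success_rate=0.95)
print(f"budget model:   ${budget:.4f} per success")
print(f"stronger model: ${stronger:.4f} per success")
```

Here the model with the higher sticker price ends up cheaper per successful result, before counting any manual review time.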
When to compare alternatives
For simple classification, high-volume chat, or draft generation, compare Claude with budget models from OpenAI, Google, DeepSeek, and Mistral in the model comparison tool.
What this Claude pricing calculator does not include
This page estimates Anthropic API token cost. It does not include application hosting, vector search, logging, evaluation runs, failed retries, human review, data transfer, taxes, enterprise discounts, regional cloud-provider terms, or subscription plan limits. For production budgeting, export a sample of real prompts and responses, count tokens, then model retry rate and response-length limits.
For critical pricing decisions, use this calculator as a planning tool and verify the final model version, current price, feature availability, and billing terms with Anthropic's official pricing documentation.
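The production budgeting step described above (sample real traffic, count tokens, then model retries and response-length limits) could look like the following sketch. It assumes you already have `(input_tokens, output_tokens)` pairs from your own logs or a token counter; the function name, the retry model (a fraction of requests repeated in full), and the sample values are all illustrative.

```python
from statistics import mean


def budget_from_samples(samples, requests_per_month, input_price, output_price,
                        retry_rate=0.0, max_output_tokens=None):
    """Monthly USD estimate from sampled (input_tokens, output_tokens) pairs."""
    if not samples:
        raise ValueError("need at least one sample")
    avg_in = mean(i for i, _ in samples)
    # Apply the response-length limit to each sample before averaging.
    avg_out = mean(o if max_output_tokens is None else min(o, max_output_tokens)
                   for _, o in samples)
    per_request = (avg_in * input_price + avg_out * output_price) / 1_000_000
    return per_request * requests_per_month * (1.0 + retry_rate)


samples = [(1_800, 600), (2_200, 900), (2_000, 1_500)]
estimate = budget_from_samples(samples, 50_000, 1.00, 5.00,
                               retry_rate=0.05, max_output_tokens=1_200)
print(f"${estimate:,.2f} per month")  # $341.25 per month
```

A larger, more representative sample matters more than the exact formula, since a few very long prompts or responses can shift the average noticeably.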
Claude pricing calculator FAQ
How do I calculate Claude API cost?
Multiply input tokens by the model input price and output tokens by the model output price, then divide by 1,000,000 when prices are expressed per 1M tokens. Add prompt caching, batch discounts, retries, and extended thinking tokens when they apply.
Does prompt caching reduce Claude API cost?
Yes, when you reuse long context. Cache reads are much cheaper than standard input tokens, while cache writes cost more than regular input. The savings depend on how often later requests hit the same cached content.
Which Claude model is cheapest?
Haiku-style models are usually cheapest for high-volume simple tasks. Sonnet is often the balanced choice for coding and assistants. Opus-style models are for high-value reasoning and review workflows where quality can reduce retries.
When should I use batch processing?
Use batch processing for delayed jobs such as enrichment, extraction, evaluation, and classification. Avoid it for real-time chat or workflows that need immediate responses.
Why is my Claude API bill higher than expected?
Common causes include long conversation history, retries, extended thinking tokens, tool loops, large retrieved context, development testing, and workloads that produce longer outputs than expected.
Can I compare Claude pricing with other providers?
This page focuses on Claude. Use the comparison page and general calculator to compare the same workload across OpenAI, Google, DeepSeek, Mistral, and other providers.