Claude Pricing Calculator for Anthropic API Costs

Estimate Claude API spend by model, input tokens, output tokens, request volume, prompt caching, and batch processing. Use the live AI Pricing Hub model table for planning, then verify final rates on Anthropic's official pricing page.

Quick answer · Pricing data refreshed 2026-03-13 12:45:29

Claude API cost depends on model tier, output length, and reusable context.

Start with the base formula: input tokens times input price plus output tokens times output price. Then adjust for prompt caching when you reuse system prompts, documents, tool definitions, or long conversation history. Batch jobs can reduce token cost when delayed processing is acceptable.
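A minimal sketch of the base formula, assuming prices are quoted in USD per 1M tokens as in the pricing table on this page. The function name `base_cost` is illustrative, and the $3.00 / $15.00 rates are the Claude 3.7 Sonnet row listed here, not a verified current quote.

```python
def base_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one request before caching or batch adjustments."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Example: 2,000 input and 800 output tokens at $3.00 / $15.00 per 1M tokens.
cost = base_cost(2_000, 800, 3.00, 15.00)
print(f"${cost:.6f} per request")
```

Multiply the per-request figure by monthly request volume to get a first-pass monthly estimate.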

Tracked Claude rows

Anthropic models in the local pricing database

Default calculator model: Claude 3.7 Sonnet, $3.00 input / $15.00 output per 1M tokens

Lowest blended Claude row: Claude 3 Haiku, $1.50 combined per 1M tokens

Anthropic Claude API cost estimator

Model a monthly workload with base token pricing, optional cache reads, optional 5-minute cache writes, and batch discounts. This estimates model token cost only.

Claude model

Input tokens/request

Output tokens/request

Requests/month

Cached input share: percent of input tokens charged as cache reads.

Cache write share: percent of input tokens written to the 5-minute cache.

Batch share: percent of requests eligible for batch pricing.
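One plausible way to combine these inputs is sketched below. It is not this page's actual calculator logic. The `Workload` class, the function name, and the three multipliers (cache reads at 0.10x base input, 5-minute cache writes at 1.25x, batch at 0.50x) are assumptions to confirm against Anthropic's pricing page, and the sketch further assumes the batch discount applies to the whole request.

```python
from dataclasses import dataclass

# Assumed multipliers; confirm against Anthropic's pricing page before relying on them.
CACHE_READ_MULT = 0.10   # cache reads relative to base input price
CACHE_WRITE_MULT = 1.25  # 5-minute cache writes relative to base input price
BATCH_MULT = 0.50        # batch requests relative to standard requests


@dataclass
class Workload:
    input_tokens: int         # input tokens per request
    output_tokens: int        # output tokens per request
    requests: int             # requests per month
    cache_read_share: float   # 0-1, share of input billed as cache reads
    cache_write_share: float  # 0-1, share of input billed as cache writes
    batch_share: float        # 0-1, share of requests billed at the batch rate


def monthly_cost(w: Workload, input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate monthly USD token cost for one model row."""
    # Whatever is neither read from nor written to cache is billed at the base rate.
    uncached_share = max(0.0, 1.0 - w.cache_read_share - w.cache_write_share)
    input_multiplier = (uncached_share
                        + w.cache_read_share * CACHE_READ_MULT
                        + w.cache_write_share * CACHE_WRITE_MULT)
    per_request = (w.input_tokens * input_price_per_m * input_multiplier
                   + w.output_tokens * output_price_per_m) / 1_000_000
    # Blend standard-priced and batch-priced requests.
    request_multiplier = (1.0 - w.batch_share) + w.batch_share * BATCH_MULT
    return per_request * request_multiplier * w.requests


workload = Workload(2_000, 800, 50_000, cache_read_share=0.30,
                    cache_write_share=0.0, batch_share=0.0)
print(f"${monthly_cost(workload, 3.00, 15.00):,.2f} per month")
```

As in the calculator, this covers model token cost only.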

Calculator outputs: per-request cost, monthly cost, monthly tokens, and cost driver.

How to use the Claude pricing calculator

1. Pick a Claude model

Choose Sonnet, Haiku, Opus, or another Claude row from the pricing database. The calculator loads that row's input, output, cached input, and batch prices when available.

2. Enter token counts

Use a typical prompt and response from your app. Include retrieved documents, system prompts, tool definitions, and conversation history in the input token estimate.

3. Add monthly volume

Estimate expected requests per month. For agents, count each model call in a multi-step workflow instead of only counting user sessions.

4. Model discounts

Set cache read, cache write, and batch shares only for the traffic that actually uses those features. Values are clamped to 0-100% in the calculator logic.
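Two of the steps above lend themselves to a short sketch: clamping share values to 0-100%, and counting model calls rather than user sessions for agent workloads. Both function names are hypothetical, and the page's own clamping code is not shown here, so treat this as one reasonable reading of it.

```python
def clamp_share(percent: float) -> float:
    """Clamp a percentage to 0-100 and return it as a 0-1 fraction."""
    return min(100.0, max(0.0, percent)) / 100.0


def monthly_model_calls(sessions_per_month: int, calls_per_session: float) -> int:
    """Count model calls, not user sessions, for multi-step agent workflows."""
    return round(sessions_per_month * calls_per_session)


print(clamp_share(130))                # out-of-range input is capped at 100%
print(monthly_model_calls(10_000, 6))  # 10,000 sessions with 6 model calls each
```

Use the call count, not the session count, as the requests/month input.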

Claude model pricing table

These rows come from the local AI Pricing Hub database and are normalized to USD per 1M tokens. Use the official Anthropic page for the final current quote.

Model	Input	Output	Cached input	Batch	Context	Best fit
Claude 3 Sonnet Anthropic	Free	Free	-	Estimate 50% discount	200k	Balanced coding, writing, agents, and production assistants
Claude 3.5 Sonnet (2024-06-20) Anthropic	Free	Free	-	Estimate 50% discount	200k	Balanced coding, writing, agents, and production assistants
Claude 3.7 Sonnet Anthropic	$3.00	$15.00	-	Estimate 50% discount	200k	Balanced coding, writing, agents, and production assistants
Claude Sonnet 4 Anthropic	$3.00	$15.00	-	Estimate 50% discount	200k	Balanced coding, writing, agents, and production assistants
Claude Sonnet 4.5 Anthropic	$3.00	$15.00	-	Estimate 50% discount	1.0M	Balanced coding, writing, agents, and production assistants
Claude 3.5 Sonnet Anthropic	$6.00	$30.00	-	Estimate 50% discount	200k	Balanced coding, writing, agents, and production assistants
Claude 3.5 Haiku (2024-10-22) Anthropic	Free	Free	-	Estimate 50% discount	200k	High-volume support, classification, and low-latency tasks
Claude 3 Haiku Anthropic	$0.25	$1.25	-	Estimate 50% discount	200k	High-volume support, classification, and low-latency tasks
Claude 3.5 Haiku (2024-10-22) Anthropic	$0.80	$4.00	-	Estimate 50% discount	200k	High-volume support, classification, and low-latency tasks
Claude 3.5 Haiku Anthropic	$0.80	$4.00	-	Estimate 50% discount	200k	High-volume support, classification, and low-latency tasks
Claude Haiku 4.5 Anthropic	$1.00	$5.00	-	Estimate 50% discount	200k	High-volume support, classification, and low-latency tasks
Claude 3 Opus Anthropic	Free	Free	-	Estimate 50% discount	200k	Complex reasoning, analysis, and high-value review workflows
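The "lowest blended row" figure quoted earlier on this page ($1.50 for Claude 3 Haiku) appears to be input price plus output price per 1M tokens. The sketch below recomputes it from a few rows of the table above. The helper names are illustrative, and skipping rows listed as "Free" or "-" is an assumption that those cells carry no usable paid rate.

```python
from typing import Optional

# A few rows copied from the table above: (model, input, output) in USD per 1M tokens.
ROWS = [
    ("Claude 3.7 Sonnet", "$3.00", "$15.00"),
    ("Claude 3 Haiku", "$0.25", "$1.25"),
    ("Claude Haiku 4.5", "$1.00", "$5.00"),
    ("Claude 3 Opus", "Free", "Free"),
]


def parse_price(cell: str) -> Optional[float]:
    """Convert a '$x.xx' cell to a float; return None for 'Free', '-', or blanks."""
    cell = cell.strip()
    if not cell.startswith("$"):
        return None
    return float(cell[1:].replace(",", ""))


def combined_price(row) -> Optional[float]:
    """Input plus output price per 1M tokens, or None when either price is missing."""
    _, input_cell, output_cell = row
    i, o = parse_price(input_cell), parse_price(output_cell)
    return None if i is None or o is None else i + o


priced = [(row[0], combined_price(row)) for row in ROWS
          if combined_price(row) is not None]
cheapest = min(priced, key=lambda pair: pair[1])
print(cheapest)
```

A combined figure is only a rough ranking signal; real workloads rarely use input and output tokens in equal amounts.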

Claude API cost examples

Use these examples to choose realistic calculator inputs before estimating your own workload.

Support chatbot

1,000-3,000 input tokens, 300-900 output tokens, many short requests. Cache shared policy and FAQ context when possible.

Claude Code or agent workflow

2,000-10,000 input tokens, 1,000-6,000 output tokens. Output length, tool loops, and retries often dominate cost.

Document analysis

20,000-150,000 input tokens, 500-2,000 output tokens. Context length and prompt caching are more important than raw output price.
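To turn the token ranges above into rough per-request cost ranges, a sketch like the following can help. It assumes the Claude 3.7 Sonnet row from this page ($3.00 input / $15.00 output per 1M tokens) and ignores caching and batch pricing. The names are illustrative.

```python
# Assumed rates: the Claude 3.7 Sonnet row on this page, in USD per 1M tokens.
INPUT_PRICE, OUTPUT_PRICE = 3.00, 15.00

# ((input low, input high), (output low, output high)) tokens per request.
PROFILES = {
    "Support chatbot": ((1_000, 3_000), (300, 900)),
    "Claude Code or agent workflow": ((2_000, 10_000), (1_000, 6_000)),
    "Document analysis": ((20_000, 150_000), (500, 2_000)),
}


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the assumed base rates."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


def cost_range(profile: str) -> tuple[float, float]:
    """Return (low, high) per-request cost for a named profile."""
    (in_lo, in_hi), (out_lo, out_hi) = PROFILES[profile]
    return request_cost(in_lo, out_lo), request_cost(in_hi, out_hi)


for name in PROFILES:
    low, high = cost_range(name)
    print(f"{name}: ${low:.4f} to ${high:.4f} per request")
```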

Example input and output

Scenario	Calculator input	Result to read
Monthly support bot	2,000 input tokens, 800 output tokens, 50,000 requests, 30% cached input	Monthly token cost, per-request cost, and whether input or output drives the bill.
Offline document extraction	60,000 input tokens, 1,200 output tokens, 8,000 requests, 80% batch share	Estimated batch-adjusted cost before adding storage, review, retry, or queue costs.
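The two scenarios in the table above can be worked through by splitting the bill into an input part and an output part. This is a sketch under stated assumptions: Claude 3.7 Sonnet rates from this page, cache reads at 0.10x base input, and batch requests at 0.50x. The function name and multipliers are illustrative and are not taken from the calculator itself.

```python
def breakdown(input_tokens, output_tokens, requests,
              cached_share=0.0, batch_share=0.0,
              input_price=3.00, output_price=15.00,
              cache_read_mult=0.10, batch_mult=0.50):
    """Return monthly input cost, output cost, total, and the larger cost driver."""
    # Requests per month, adjusted for the share billed at the batch rate.
    scale = requests * ((1 - batch_share) + batch_share * batch_mult) / 1_000_000
    # Input tokens are split between base-rate and cache-read pricing.
    input_mult = (1 - cached_share) + cached_share * cache_read_mult
    input_cost = input_tokens * input_price * input_mult * scale
    output_cost = output_tokens * output_price * scale
    driver = "input" if input_cost > output_cost else "output"
    return {"input": input_cost, "output": output_cost,
            "total": input_cost + output_cost, "driver": driver}


support_bot = breakdown(2_000, 800, 50_000, cached_share=0.30)
extraction = breakdown(60_000, 1_200, 8_000, batch_share=0.80)
for name, result in (("Monthly support bot", support_bot),
                     ("Offline document extraction", extraction)):
    print(f"{name}: ${result['total']:,.2f}/month, driven by {result['driver']} tokens")
```

Under these assumptions the support bot is output-driven and the extraction job is input-driven, so trimming response length helps the first and caching or batching helps the second.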

Prompt caching, batch API, and real Claude costs

Pricing lever	Use when	Cost effect to model
Prompt caching	You reuse system prompts, examples, long documents, tool schemas, or conversation context.	Cache reads are commonly far cheaper than base input tokens; cache writes cost more than standard input.
Batch processing	Classification, enrichment, evals, extraction, and other offline jobs can wait for asynchronous processing.	Batch calls are typically discounted, but they are not a fit for real-time chat.
Extended thinking	Reasoning quality matters more than keeping answers short.	Thinking tokens are billed as output tokens and can raise spend when reasoning budgets are high.
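Because cache writes cost more than standard input while cache reads cost less, caching only pays off once a prefix is reused. The sketch below compares the two options in units of one standard-priced send. The 1.25x write and 0.10x read multipliers are assumptions to verify on Anthropic's pricing page, and the model assumes every later request hits the cache before it expires.

```python
def cached_vs_uncached(sends: int, write_mult: float = 1.25, read_mult: float = 0.10):
    """Relative input cost of sending the same prefix `sends` times.

    Returns (uncached, cached), measured in units of one standard-priced send.
    """
    if sends < 1:
        raise ValueError("sends must be at least 1")
    uncached = float(sends)
    cached = write_mult + read_mult * (sends - 1)  # one write, then cache reads
    return uncached, cached


for n in (1, 2, 10):
    uncached, cached = cached_vs_uncached(n)
    print(f"{n} sends: uncached {uncached:.2f}, cached {cached:.2f}")
```

With these multipliers a prefix sent only once costs more when cached, and caching is cheaper from the second send onward.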

Official references: Anthropic pricing, prompt caching, batch processing, and token counting.

Claude pricing vs other LLM APIs

Claude is often chosen for quality, context handling, and coding or analysis workflows. Compare prices across providers only after the capability fit is clear.

When Claude can be cheaper

A stronger Claude model can reduce retries, manual review, prompt length, or tool loops. For quality-sensitive work, measure successful completion cost rather than token sticker price alone.
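Successful completion cost can be approximated by dividing cost per attempt by the success rate. This is a sketch that assumes failed attempts are simply retried and that each attempt succeeds independently at a constant rate. The function name and every number below are hypothetical.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful result when failed attempts are retried.

    With independent attempts and a constant success rate, the expected
    number of attempts per success is 1 / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical numbers for illustration only.
cheaper_model = cost_per_success(0.004, 0.50)
stronger_model = cost_per_success(0.006, 0.95)
print(f"cheaper model:  ${cheaper_model:.4f} per success")
print(f"stronger model: ${stronger_model:.4f} per success")
```

In this made-up case the model with the higher sticker price is cheaper per successful result, before counting any manual review saved.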

When to compare alternatives

For simple classification, high-volume chat, or draft generation, compare Claude with budget models from OpenAI, Google, DeepSeek, and Mistral in the model comparison tool.

What this Claude pricing calculator does not include

This page estimates Anthropic API token cost. It does not include application hosting, vector search, logging, evaluation runs, failed retries, human review, data transfer, taxes, enterprise discounts, regional cloud-provider terms, or subscription plan limits. For production budgeting, export a sample of real prompts and responses, count tokens, then model retry rate and response-length limits.
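The production budgeting steps described above could be sketched as follows, assuming the sampled prompts and responses have already been token-counted. The function name, the sample values, and the simple "each retry costs one extra full request" model are all assumptions for illustration.

```python
from statistics import mean


def budget_from_sample(samples, requests_per_month, input_price, output_price,
                       retry_rate=0.0, max_output_tokens=None):
    """Project monthly USD token cost from (input_tokens, output_tokens) samples.

    retry_rate is the fraction of requests sent a second time; max_output_tokens
    caps each sampled response to model a response-length limit.
    """
    if not samples:
        raise ValueError("need at least one sample")
    outputs = [o if max_output_tokens is None else min(o, max_output_tokens)
               for _, o in samples]
    avg_in = mean(i for i, _ in samples)
    avg_out = mean(outputs)
    per_request = (avg_in * input_price + avg_out * output_price) / 1_000_000
    return per_request * requests_per_month * (1.0 + retry_rate)


# Hypothetical token counts from two logged requests.
sample = [(1_800, 600), (2_200, 1_000)]
print(budget_from_sample(sample, 50_000, 3.00, 15.00, retry_rate=0.05))
print(budget_from_sample(sample, 50_000, 3.00, 15.00, retry_rate=0.05,
                         max_output_tokens=800))
```

A larger sample and a measured retry rate matter more than the exact formula; the costs this page excludes still need to be added separately.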

For critical pricing decisions, use this calculator as a planning tool and verify the final model version, current price, feature availability, and billing terms with Anthropic's official pricing documentation.

Claude pricing calculator FAQ

How do I calculate Claude API cost?

Multiply input tokens by the model input price and output tokens by the model output price, then divide by 1,000,000 when prices are expressed per 1M tokens. Add prompt caching, batch discounts, retries, and extended thinking tokens when they apply.

Does prompt caching reduce Claude API cost?

Yes, when you reuse long context. Cache reads are much cheaper than standard input tokens, while cache writes cost more than regular input. The savings depend on how often later requests hit the same cached content.

Which Claude model is cheapest?

Haiku-style models are usually cheapest for high-volume simple tasks. Sonnet is often the balanced choice for coding and assistants. Opus-style models are for high-value reasoning and review workflows where quality can reduce retries.

When should I use batch processing?

Use batch processing for delayed jobs such as enrichment, extraction, evaluation, and classification. Avoid it for real-time chat or workflows that need immediate responses.

Why is my Claude API bill higher than expected?

Common causes include long conversation history, retries, extended thinking tokens, tool loops, large retrieved context, development testing, and workloads that produce longer outputs than expected.

Does this page compare Claude with other providers?

This page focuses on Claude. Use the comparison page and general calculator to compare the same workload across OpenAI, Google, DeepSeek, Mistral, and other providers.