OpenAI API Pricing Calculator for GPT Costs

Estimate OpenAI API spend by model, input tokens, output tokens, request volume, cached input, batch usage, and optional Responses API tool calls. Use live AI Pricing Hub rows for planning, then verify final billing on OpenAI's official API pricing page.

Quick answer · Pricing data refreshed 2026-03-13 12:45:29

OpenAI API cost is driven by output length, model tier, caching, batch eligibility, and tool calls.

The base estimate is (input tokens × the selected model's input price) + (output tokens × its output price), with both prices quoted per 1M tokens. For real products, also track these line items separately, because each can change the final bill: cached context, batch jobs, web search calls, file search calls, Code Interpreter or hosted shell sessions, image generation, transcription, and regional processing rules.
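As a minimal sketch of the base estimate, the formula can be written as a small Python function. The function name and the sample token counts are illustrative assumptions; the prices are the $0.15 input / $0.60 output figures listed on this page for the default calculator row.

```python
def base_token_cost(input_tokens, output_tokens, input_price_per_1m, output_price_per_1m):
    """Return the base cost in USD for one request.

    Prices are quoted per 1M tokens, so the weighted sum is divided by 1,000,000.
    """
    weighted = input_tokens * input_price_per_1m + output_tokens * output_price_per_1m
    return weighted / 1_000_000


# Illustrative request: 2,000 input tokens and 1,000 output tokens
# at $0.15 input / $0.60 output per 1M tokens.
cost = base_token_cost(2_000, 1_000, 0.15, 0.60)
print(f"Base cost per request: ${cost:.6f}")
```

This covers tokens only; cached input, batch pricing, and tool charges are separate line items.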

Tracked OpenAI rows: 36 (19 rows look reasoning-capable or o-series related)
Default calculator model: GPT-4o-mini Search Preview ($0.15 input / $0.60 output per 1M tokens)
Lowest paid blended row: Text Embedding 3 Small ($0.02 combined per 1M)

OpenAI API cost estimator

Model a monthly GPT workload with base token pricing, optional cached input, optional batch pricing, and optional tool-call charges. This is a planning estimate, not an invoice.

Calculator inputs:

  • Cached input share: percent of input tokens billed at the cached input price, when the row has one.
  • Batch share: percent of requests eligible for batch prices, when available.
  • Tool calls: use for web search, file search, or similar billable calls.
  • Tool call rate: OpenAI pricing lists several tool call rates; edit this per feature.
  • Container sessions: use for Code Interpreter or hosted-shell-style sessions.
  • Session price: set a per-session planning value based on memory and duration.

Calculator outputs: monthly estimate, per-request cost, token cost, and tools and sessions cost.

How to use the OpenAI API pricing calculator

1. Pick the model row

Choose the GPT, o-series, embedding, or provider row that matches your deployment path. Direct OpenAI, Azure OpenAI, and aggregator rows can have different pricing and limits.

2. Use real token logs

Estimate with prompt samples first, then replace averages with logged input and output tokens. Output tokens should be tracked separately because they often dominate spend.
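One way to replace guessed averages with logged values is to aggregate usage records by use case. The sketch below assumes a hypothetical log format in which each record is a dict with `use_case`, `input_tokens`, and `output_tokens` keys; adapt the field names to whatever your own logging emits.

```python
from collections import defaultdict


def average_tokens_by_use_case(records):
    """Return average input and output tokens per use case from logged requests."""
    totals = defaultdict(lambda: {"input": 0, "output": 0, "requests": 0})
    for rec in records:
        bucket = totals[rec["use_case"]]
        bucket["input"] += rec["input_tokens"]
        bucket["output"] += rec["output_tokens"]
        bucket["requests"] += 1
    return {
        use_case: {
            "avg_input": b["input"] / b["requests"],
            "avg_output": b["output"] / b["requests"],
            "requests": b["requests"],
        }
        for use_case, b in totals.items()
    }


# Hypothetical log sample; real values would come from your request logs.
sample_logs = [
    {"use_case": "chat", "input_tokens": 1_400, "output_tokens": 450},
    {"use_case": "chat", "input_tokens": 1_600, "output_tokens": 550},
    {"use_case": "extraction", "input_tokens": 20_000, "output_tokens": 300},
]
averages = average_tokens_by_use_case(sample_logs)
print(averages)
```

Keeping input and output averages separate per use case makes it easier to see where output tokens dominate spend.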

3. Add modifiers only when used

Set cached input, batch share, tool calls, and container sessions only for workloads that actually use them. Leave those fields at zero for a plain chat estimate.
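The three steps above can be sketched as a single planning function. This is not the calculator's actual implementation: the `Workload` fields and the way modifiers combine are assumptions made for illustration. In particular, the sketch assumes the cached share applies only to non-batch requests, and that batch requests pay batch prices for all tokens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workload:
    requests_per_month: int
    input_tokens: int                 # average per request
    output_tokens: int                # average per request
    input_price: float                # USD per 1M tokens
    output_price: float               # USD per 1M tokens
    cached_input_price: Optional[float] = None
    cached_share: float = 0.0         # fraction of input tokens, 0.0 to 1.0
    batch_input_price: Optional[float] = None
    batch_output_price: Optional[float] = None
    batch_share: float = 0.0          # fraction of requests, 0.0 to 1.0
    tool_calls_per_month: int = 0
    tool_call_price: float = 0.0      # USD per call, set per feature
    sessions_per_month: int = 0
    session_price: float = 0.0        # USD per session, planning value


def monthly_estimate(w: Workload) -> dict:
    per_m = 1_000_000
    # Cached share only counts when the row exposes a cached input price.
    has_cache = w.cached_input_price is not None
    cached = w.cached_share if has_cache else 0.0
    cached_price = w.cached_input_price if has_cache else w.input_price
    blended_input = (1 - cached) * w.input_price + cached * cached_price
    standard = (w.input_tokens * blended_input + w.output_tokens * w.output_price) / per_m

    # Batch share only counts when the row exposes both batch prices.
    has_batch = w.batch_input_price is not None and w.batch_output_price is not None
    batch = w.batch_share if has_batch else 0.0
    batched = standard
    if has_batch:
        batched = (w.input_tokens * w.batch_input_price
                   + w.output_tokens * w.batch_output_price) / per_m

    token_cost = w.requests_per_month * ((1 - batch) * standard + batch * batched)
    tools = (w.tool_calls_per_month * w.tool_call_price
             + w.sessions_per_month * w.session_price)
    total = token_cost + tools
    per_request = total / w.requests_per_month if w.requests_per_month else 0.0
    return {"monthly": total, "per_request": per_request,
            "token_cost": token_cost, "tools_and_sessions": tools}


# Plain chat estimate: all modifiers left at zero.
plain = monthly_estimate(Workload(100_000, 1_500, 500, 0.15, 0.60))
print(plain)
```

Leaving the optional fields at their defaults reproduces a plain token estimate, which mirrors the advice to add modifiers only for workloads that actually use them.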

Live OpenAI API pricing rows

Rows come from the AI Pricing Hub model database. Use official OpenAI pricing as the final billing reference before launch.

Model | Provider | Input / 1M | Output / 1M | Cached input | Batch | Context | Capabilities · Input types · Output types
GPT-4o-mini Search Preview | OpenAI | $0.15 | $0.60 | - | - | 128k | chat,tool_use · text · text
GPT-4o-mini | OpenAI | $0.15 | $0.60 | - | - | 128k | chat,vision,tool_use · text,image,file · text
GPT-4o-mini (2024-07-18) | OpenAI | $0.15 | $0.60 | - | - | 128k | chat,vision,tool_use · text,image,file · text
GPT-4.1 Mini | OpenAI | $0.40 | $1.60 | - | - | 1.0M | chat,vision,tool_use · image,text,file · text
GPT-4o Audio | OpenAI | $2.50 | $10.00 | - | - | 128k | chat,audio,tts,tool_use · audio,text · text,audio
GPT-4o Search Preview | OpenAI | $2.50 | $10.00 | - | - | 128k | chat,tool_use · text · text
GPT-4o (2024-11-20) | OpenAI | $2.50 | $10.00 | - | - | 128k | chat,vision,tool_use · text,image,file · text
GPT-4o (2024-08-06) | Azure | $2.50 | $10.00 | - | - | 128k | chat,vision,tool_use · text,image,file · text
ChatGPT-4o Free row | OpenAI | $5.00 | $15.00 | - | - | 128k | chat,vision · text,image · text
GPT-4o (2024-05-13) | OpenAI | $5.00 | $15.00 | - | - | 128k | chat,vision,tool_use · text,image,file · text
GPT-4o (extended) | OpenAI | $6.00 | $18.00 | - | - | 128k | chat,vision,tool_use · text,image,file · text
ChatGPT-4o Free row | OpenRouter | Free | Free | - | - | 128k | chat,vision · text,image · text
GPT-4.1 Nano | OpenAI | $0.10 | $0.40 | - | - | 1.0M | chat,vision,tool_use · image,text,file · text
GPT-4.1 | OpenAI | $2.00 | $8.00 | - | - | 1.0M | chat,vision,tool_use · image,text,file · text
o4 Mini High | OpenAI | $1.10 | $4.40 | - | - | 200k | chat,vision,reasoning,tool_use · image,text,file · text
o4 Mini | OpenAI | $1.10 | $4.40 | - | - | 200k | chat,vision,reasoning,tool_use · image,text,file · text
o3 Mini High | OpenAI | $1.10 | $4.40 | - | - | 200k | chat,reasoning,tool_use · text,file · text
o3 Mini | OpenAI | $1.10 | $4.40 | - | - | 200k | chat,tool_use,reasoning · text,file · text
o4 Mini Deep Research | OpenAI | $2.00 | $8.00 | - | - | 200k | chat,vision,reasoning,tool_use · file,image,text · text
o3 | OpenAI | $2.00 | $8.00 | - | - | 200k | chat,vision,reasoning,tool_use · image,text,file · text
o3 Deep Research | OpenAI | $10.00 | $40.00 | - | - | 200k | chat,vision,reasoning,tool_use · image,text,file · text
o1 | OpenAI | $15.00 | $60.00 | - | - | 200k | chat,vision,tool_use,reasoning · text,image,file · text
o3 Pro | OpenAI | $20.00 | $80.00 | - | - | 200k | chat,vision,reasoning,tool_use · text,file,image · text
o1-pro | OpenAI | $150.00 | $600.00 | - | - | 200k | chat,vision,reasoning,tool_use · text,image,file · text
o1-mini Free row | OpenRouter | Free | Free | - | - | 128k | chat,reasoning · text · text
o1-mini Free row | OpenRouter | Free | Free | - | - | 128k | chat,reasoning · text · text
o1-preview Free row | OpenRouter | Free | Free | - | - | 128k | chat,reasoning · text · text
o1-preview Free row | OpenRouter | Free | Free | - | - | 128k | chat,reasoning · text · text
o1-mini (2024-09-12) Free row | OpenRouter | Free | Free | - | - | 128k | chat,reasoning · text · text
o1-mini (2024-09-12) Free row | OpenRouter | Free | Free | - | - | 128k | chat,reasoning · text · text
o1-preview (2024-09-12) Free row | OpenRouter | Free | Free | - | - | 128k | chat,reasoning · text · text
o1-preview (2024-09-12) Free row | OpenRouter | Free | Free | - | - | 128k | chat,reasoning · text · text
Text Embedding 3 Small | OpenAI | $0.02 | Free | - | - | 8k | chat,tool_use · text · embeddings
Text Embedding Ada 002 | OpenAI | $0.10 | Free | - | - | 8k | chat,tool_use · text · embeddings
gpt-oss-20b | Chutes | $0.02 | $0.10 | - | - | 131k | chat,reasoning,tool_use · text · text
Text Embedding 3 Large | OpenAI | $0.13 | Free | - | - | 8k | chat,tool_use · text · embeddings

OpenAI pricing factors to check before production

Cost factor | Why it matters | Planning rule
Output tokens | Generated answers, code, and structured JSON can be longer and more expensive than prompts. | Track output length by use case instead of applying one average to chat, agents, and extraction.
Cached input | Reusable system prompts, policy text, schemas, tool definitions, and long files may qualify for lower cached input rates. | Cache stable context with high reuse; do not assume one-off prompts receive cache savings.
Batch API | Offline jobs can often use lower rates, but they are not suitable for latency-sensitive product flows. | Separate nightly extraction, evaluation, tagging, and backfills from real-time chat traffic.
Responses API tools | Web search, file search, and code execution can add per-call, storage, or session charges beyond model tokens. | Budget tool calls separately and cap them by route, user tier, or task type.
Provider path | Direct OpenAI, Azure OpenAI, OpenRouter, and enterprise agreements can expose different rows, limits, and billing terms. | Use this page for OpenAI model planning, then compare deployment paths before committing traffic.

Example OpenAI API cost scenarios

Support chatbot

A support bot using GPT-4o-mini Search Preview might average 1,500 input tokens and 500 output tokens per turn. The biggest budget risk is long conversation history, retries, and tool calls on every answer.
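To see why long conversation history is a budget risk, the sketch below totals the tokens for a multi-turn conversation in which every turn resends the earlier context. The 1,500 and 500 token figures and the $0.15 / $0.60 prices come from this page; the 100-token follow-up message and the resend-everything behavior are assumptions for illustration.

```python
INPUT_PRICE = 0.15   # USD per 1M input tokens (GPT-4o-mini Search Preview row)
OUTPUT_PRICE = 0.60  # USD per 1M output tokens


def conversation_cost(turns, first_input=1_500, output_per_turn=500, new_user_tokens=100):
    """Return (total_input, total_output, cost_usd) for one conversation.

    Assumes each turn resends the full prior context, so the input grows by the
    previous answer plus a new user message (hypothetical 100 tokens) each turn.
    """
    total_input = 0
    total_output = 0
    context = first_input
    for _ in range(turns):
        total_input += context
        total_output += output_per_turn
        context += output_per_turn + new_user_tokens
    cost = (total_input * INPUT_PRICE + total_output * OUTPUT_PRICE) / 1_000_000
    return total_input, total_output, cost


one_turn = conversation_cost(1)
ten_turns = conversation_cost(10)
print(f"1 turn:   {one_turn[0]} input tokens, ${one_turn[2]:.6f}")
print(f"10 turns: {ten_turns[0]} input tokens, ${ten_turns[2]:.6f}")
```

Under these assumptions a ten-turn conversation bills 42,000 input tokens rather than the 15,000 that ten independent turns would, so trimming or summarizing history has a direct effect on spend.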

Coding assistant

Coding workflows often generate longer outputs than normal chat. Track code output separately and compare GPT rows against Claude, Gemini, DeepSeek, and other code-capable models.
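A rough way to compare rows for an output-heavy coding workload is to rank them by per-request cost and check how much of that cost comes from output. The prices below are copied from the rows on this page; the 3,000 input / 2,000 output token workload is a hypothetical example, and rows from other providers would need their own verified prices.

```python
# (input price, output price) in USD per 1M tokens, from the rows on this page.
ROWS = {
    "GPT-4.1 Nano": (0.10, 0.40),
    "GPT-4.1 Mini": (0.40, 1.60),
    "o4 Mini": (1.10, 4.40),
    "GPT-4.1": (2.00, 8.00),
}


def rank_rows(input_tokens, output_tokens, rows=ROWS):
    """Return (name, cost_per_request, output_share) tuples, cheapest first."""
    ranked = []
    for name, (input_price, output_price) in rows.items():
        input_cost = input_tokens * input_price / 1_000_000
        output_cost = output_tokens * output_price / 1_000_000
        total = input_cost + output_cost
        ranked.append((name, total, output_cost / total))
    return sorted(ranked, key=lambda row: row[1])


# Hypothetical coding request: 3,000 input tokens, 2,000 output tokens.
ranking = rank_rows(3_000, 2_000)
for name, cost, share in ranking:
    print(f"{name}: ${cost:.6f} per request, {share:.0%} from output")
```

Price ranking alone says nothing about code quality or retries, so treat the result as a starting shortlist.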

Document extraction

Extraction jobs may send large input documents and short JSON output. Cached input and batch processing usually matter more than the headline chat price.
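The sketch below illustrates the point with a hypothetical extraction job of 20,000 input tokens and 300 output tokens per document on the GPT-4.1 Mini row ($0.40 / $1.60). The 50% discount is a placeholder, not a published rate: the rows on this page list no cached or batch prices, so substitute the official figure for your model.

```python
def per_document_cost(input_tokens, output_tokens, input_price, output_price, discount=0.0):
    """Cost in USD for one document; `discount` is a hypothetical fractional saving."""
    base = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return base * (1 - discount)


def input_share(input_tokens, output_tokens, input_price, output_price):
    """Fraction of the token cost that comes from input tokens."""
    input_cost = input_tokens * input_price
    return input_cost / (input_cost + output_tokens * output_price)


standard = per_document_cost(20_000, 300, 0.40, 1.60)
discounted = per_document_cost(20_000, 300, 0.40, 1.60, discount=0.5)  # placeholder discount
share = input_share(20_000, 300, 0.40, 1.60)
print(f"Standard:   ${standard:.5f} per document")
print(f"Discounted: ${discounted:.5f} per document")
print(f"Input share of cost: {share:.0%}")
```

Because input makes up most of the cost in this shape of workload, any discount that applies to input tokens moves the total far more than a change in output price would.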

Agent with tools

Agents that call search, file retrieval, or code execution need a separate tool budget. A cheap model can still become expensive if every step triggers paid tools.
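A separate tool budget can be sketched as follows. The step counts, token sizes, and the $0.01 per-call tool price are hypothetical planning values, not OpenAI rates; the token prices are the GPT-4.1 Nano row from this page. The optional `max_tool_calls` argument models a per-task cap.

```python
def agent_task_cost(steps, input_tokens_per_step, output_tokens_per_step,
                    input_price, output_price,
                    tool_calls_per_step, tool_call_price, max_tool_calls=None):
    """Return (token_cost, tool_cost) in USD for one agent task."""
    tokens = steps * (input_tokens_per_step * input_price
                      + output_tokens_per_step * output_price)
    token_cost = tokens / 1_000_000
    tool_calls = steps * tool_calls_per_step
    if max_tool_calls is not None:
        # Cap paid tool calls per task, for example by route or user tier.
        tool_calls = min(tool_calls, max_tool_calls)
    return token_cost, tool_calls * tool_call_price


# Hypothetical task: 8 steps, 2,000 input and 300 output tokens per step,
# one paid tool call per step at an assumed $0.01 per call.
uncapped = agent_task_cost(8, 2_000, 300, 0.10, 0.40, 1, 0.01)
capped = agent_task_cost(8, 2_000, 300, 0.10, 0.40, 1, 0.01, max_tool_calls=3)
print(f"Uncapped: tokens ${uncapped[0]:.5f}, tools ${uncapped[1]:.2f}")
print(f"Capped:   tokens ${capped[0]:.5f}, tools ${capped[1]:.2f}")
```

With these placeholder numbers the tool charges are many times the token cost, which is the situation the paragraph above warns about.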

Alternatives to compare with OpenAI pricing

OpenAI has broad API coverage, but the cheapest production choice depends on quality, latency, context length, output size, tool support, and provider routing.

Alternative | Brand | Input / 1M | Output / 1M | Best comparison use
GTE-Base | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
E5-Base-v2 | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
paraphrase-MiniLM-L6-v2 | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
all-MiniLM-L12-v2 | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
bge-base-en-v1.5 | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
multi-qa-mpnet-base-dot-v1 | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
all-mpnet-base-v2 | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
all-MiniLM-L6-v2 | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
Qwen3 Embedding 8B | Alibaba Qwen | $0.01 | Free | Coding, agentic development, or output-heavy workloads.

Limitations and billing notes

  • The calculator uses per-1M token rows from this site's database. Official OpenAI API pricing remains the final billing reference.
  • Tool calls, file storage, code execution sessions, image generation, transcription, video generation, fine-tuning, and enterprise commitments may use separate pricing rules.
  • Cached input savings depend on the exact model, endpoint, provider row, and whether the reused context qualifies for caching.
  • Batch pricing is useful for delayed work, not for interactive chat where the user expects an immediate answer.
  • Regional processing, data residency, Azure OpenAI, OpenRouter, and negotiated enterprise terms can change the effective price.

Official references to verify

Use these sources before deploying a budget-sensitive OpenAI workload.

OpenAI API pricing FAQ

How do I calculate OpenAI API cost?

Multiply input tokens by the selected model's input price and output tokens by its output price, then add any cached input, batch, tool-call, storage, or code execution costs that your workflow actually uses. For monthly spend, multiply the per-request token estimate by expected request volume.

Is OpenAI API pricing the same as ChatGPT pricing?

No. ChatGPT plans are subscription products, while OpenAI API billing is generally usage-based. Do not use a ChatGPT Plus, Business, or Enterprise seat price to estimate API traffic.

When does cached input reduce cost?

Cached input helps when many requests reuse the same prompt, long context, tool definitions, or files, and the selected model or provider row exposes a cached input rate. Do not assume cache savings for one-off prompts.

When should I use batch pricing?

Use batch for delayed jobs such as offline extraction, tagging, evaluation, and backfills. Keep real-time user chat, streaming UX, and latency-sensitive agents on normal endpoints unless delayed responses are acceptable.

Do tools add cost beyond tokens?

Some tools can add call, storage, or session charges in addition to model tokens. Budget web search, file search, and code execution separately instead of hiding them inside an average token cost.

Is OpenAI the cheapest option?

It depends on the workload. OpenAI can be strong for broad ecosystem support, tool use, and reasoning models, while Claude, Gemini, GPT-OSS, or other providers may be cheaper once output length, quality, retries, and context needs are included.