GPT-OSS Pricing Calculator for API and Self-Hosted Costs

Estimate gpt-oss token spend across hosted providers and free routes, with self-hosted deployment kept as a separate planning item. Compare gpt-oss-120b and gpt-oss-20b rows from the AI Pricing Hub database, then model monthly input, output, cache, and batch usage.

Quick answer · Pricing data refreshed 2026-03-13 12:45:29

GPT-OSS weights are free, but API and infrastructure costs depend on where you run the model.

OpenAI describes gpt-oss as open-weight reasoning models available in 120B and 20B sizes. The weights can be downloaded and used under the model license, but hosted API usage, GPU time, storage, monitoring, and third-party routing still create real operating cost. This calculator focuses on the hosted token-price side and keeps self-hosted costs visible as a separate planning note.

Tracked gpt-oss rows: 7 (2 free or zero-price rows found in the pricing database)
Default calculator model: gpt-oss-120b (exacto), free input and free output
Lowest paid blended row: gpt-oss-20b, $0.12 combined per 1M tokens

GPT-OSS API cost estimator

Model token spend for a monthly workload. Free provider routes are shown as $0 token cost, but they can still have rate limits, queueing, usage caps, or data-handling tradeoffs.

  • Cache share: percent of input tokens charged at a cache-read price when the row has one.
  • Batch share: percent of traffic eligible for batch input/output prices.
  • Operational buffer: a percentage added for retries, eval runs, logs, and prompt experiments.

When this calculator helps

  • Compare gpt-oss-120b and gpt-oss-20b token prices before choosing a hosted provider.
  • Estimate whether a free route is enough for prototypes or whether paid capacity is required.
  • Plan monthly spend for coding assistants, agent workflows, document processing, or reasoning-heavy chat.
  • Separate token billing from self-hosted costs such as GPUs, storage, autoscaling, observability, and engineering time.

Important: token prices are not the same as total cost of ownership. A self-hosted gpt-oss deployment can have $0 model-license cost and still be more expensive than a hosted API if utilization is low or operations work is heavy.

How to calculate GPT-OSS API cost

  1. Choose the gpt-oss row that matches your provider, model size, and pricing source.
  2. Enter average input and output tokens per request. Include system prompts, tool schemas, retrieved context, and hidden reasoning tokens only when your provider bills those tokens.
  3. Enter the expected monthly request count. For agent workloads, count every model call in a multi-step run, not just each user-facing task.
  4. Add cache or batch percentages only when your selected provider exposes those discounts for the specific gpt-oss route.
  5. Use the operational buffer to cover retries, evaluation runs, prompt experiments, and failed calls that may still consume quota or engineering time.
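
The steps above can be sketched in Python. This is a minimal sketch, not the calculator's actual implementation: the function name `monthly_token_cost`, its parameter names, and the way the cache, batch, and buffer percentages combine are assumptions made for illustration. Prices are in USD per 1M tokens, and a discount percentage has no effect when the row exposes no matching price.

```python
from typing import Optional


def monthly_token_cost(
    requests: int,
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    cache_price: Optional[float] = None,
    cache_pct: float = 0.0,
    batch_input_price: Optional[float] = None,
    batch_output_price: Optional[float] = None,
    batch_pct: float = 0.0,
    buffer_pct: float = 0.0,
) -> dict:
    """Estimate monthly token spend for one pricing row (hypothetical helper)."""
    # Fall back to the standard price when the row has no cache or batch price,
    # so the corresponding percentage becomes a no-op.
    eff_cache = input_price if cache_price is None else cache_price
    eff_batch_in = input_price if batch_input_price is None else batch_input_price
    eff_batch_out = output_price if batch_output_price is None else batch_output_price

    b = batch_pct / 100.0   # share of traffic billed at batch prices
    c = cache_pct / 100.0   # share of non-batch input billed at the cache price

    total_in = requests * input_tokens
    total_out = requests * output_tokens

    standard_in = total_in * (1 - b)
    input_cost = (
        standard_in * (1 - c) * input_price
        + standard_in * c * eff_cache
        + total_in * b * eff_batch_in
    ) / 1_000_000
    output_cost = (
        total_out * (1 - b) * output_price + total_out * b * eff_batch_out
    ) / 1_000_000

    token_cost = input_cost + output_cost
    return {
        "token_cost": token_cost,
        "cost_with_buffer": token_cost * (1 + buffer_pct / 100.0),
        "per_request": token_cost / requests if requests else 0.0,
        "monthly_tokens": total_in + total_out,
    }


# Example: 50,000 requests at $0.04 input / $0.19 output with a 10% buffer.
estimate = monthly_token_cost(50_000, 1_800, 700, 0.04, 0.19, buffer_pct=10)
print(estimate)
```

For agent workloads, pass the number of model calls as `requests`, not the number of user-facing tasks.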

GPT-OSS pricing table

Rows come from the AI Pricing Hub database. Always verify final production pricing with the provider before committing budget.

Model                  Provider       Input / 1M  Output / 1M  Cache  Batch  Context
gpt-oss-120b (exacto)  OpenInference  Free        Free         -      -      131k
gpt-oss-120b (exacto)  DeepInfra      $0.04       $0.19        -      -      131k
gpt-oss-120b (exacto)  DeepInfra      $0.04       $0.19        -      -      131k
gpt-oss-20b            OpenInference  Free        Free         -      -      131k
gpt-oss-20b            Chutes         $0.02       $0.10        -      -      131k
gpt-oss-20b            DeepInfra      $0.03       $0.14        -      -      131k
gpt-oss-safeguard-20b  Groq           $0.07       $0.30        -      -      131k
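
As an illustration of how the "lowest paid blended row" figure can be read off the table, the sketch below treats the combined price as input price plus output price per 1M tokens. That interpretation is an assumption, chosen because it matches the $0.12 figure reported for gpt-oss-20b. The `rows` list is hand-copied from the table above, with free rows entered as 0.

```python
# (model, provider, input USD per 1M, output USD per 1M), copied from the table.
rows = [
    ("gpt-oss-120b (exacto)", "OpenInference", 0.00, 0.00),
    ("gpt-oss-120b (exacto)", "DeepInfra", 0.04, 0.19),
    ("gpt-oss-120b (exacto)", "DeepInfra", 0.04, 0.19),
    ("gpt-oss-20b", "OpenInference", 0.00, 0.00),
    ("gpt-oss-20b", "Chutes", 0.02, 0.10),
    ("gpt-oss-20b", "DeepInfra", 0.03, 0.14),
    ("gpt-oss-safeguard-20b", "Groq", 0.07, 0.30),
]


def combined(row):
    """Assumed blended metric: input price plus output price per 1M tokens."""
    return row[2] + row[3]


free_rows = [r for r in rows if combined(r) == 0]
paid_rows = [r for r in rows if combined(r) > 0]
cheapest = min(paid_rows, key=combined)

print(f"{len(rows)} rows tracked, {len(free_rows)} free")
print(f"Lowest paid blended row: {cheapest[0]} via {cheapest[1]}, "
      f"${combined(cheapest):.2f} combined per 1M")
```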

Hosted API vs self-hosted GPT-OSS cost

Hosted API pricing

Hosted gpt-oss routes usually bill per million input and output tokens. This is easiest to budget when traffic is spiky, when you need rapid experimentation, or when your team does not want to operate inference infrastructure. The tradeoff is provider-specific rate limits, queueing, routing behavior, data-retention policy, and a price that can change outside your own deployment.

For a hosted API, the main cost levers are model size, output length, prompt reuse, batch eligibility, and failure/retry rate. Output tokens often dominate the invoice, so reducing verbose responses can matter more than shaving a few tokens from the prompt.
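
To see why output length can matter more than prompt length, here is a small sketch using the DeepInfra gpt-oss-120b (exacto) prices from the table ($0.04 input, $0.19 output per 1M tokens). The 1,800 / 700 token split and the 20% trims are illustrative assumptions, and `request_cost` is a hypothetical helper.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost of one request in USD, with prices given per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


IN_PRICE, OUT_PRICE = 0.04, 0.19  # USD per 1M tokens, from the pricing table

base = request_cost(1_800, 700, IN_PRICE, OUT_PRICE)
output_share = (700 * OUT_PRICE / 1_000_000) / base

# Compare trimming the prompt by 20% with trimming the response by 20%.
shorter_prompt = request_cost(1_800 * 0.8, 700, IN_PRICE, OUT_PRICE)
shorter_output = request_cost(1_800, 700 * 0.8, IN_PRICE, OUT_PRICE)

print(f"Output share of cost: {output_share:.1%}")
print(f"Saving from 20% shorter prompt: {1 - shorter_prompt / base:.1%}")
print(f"Saving from 20% shorter output: {1 - shorter_output / base:.1%}")
```

With these numbers, output tokens account for roughly two thirds of the per-request cost even though they are well under half of the tokens, so the same percentage trim saves more on the output side.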

Self-hosted pricing

Open-weight does not mean free production inference. Self-hosting shifts cost from token billing to GPUs, memory, disk, deployment engineering, monitoring, scaling, security review, and utilization risk. It can be attractive for steady high-volume workloads, data-control requirements, edge deployments, or customized fine-tuning workflows.

A practical decision rule is to compare your predicted hosted monthly spend with the fully loaded monthly cost of the servers and engineering hours needed to meet the same latency and reliability target.
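
The decision rule can be sketched as a simple comparison. Everything in this example is a placeholder assumption: the GPU hourly rate, the engineering hours and rate, and the function names are hypothetical and are not quotes from any provider. Replace them with figures for hardware that actually meets your latency and reliability target.

```python
def self_hosted_monthly_cost(
    gpu_hourly_usd: float,
    gpu_count: int,
    hours_per_month: float,
    engineering_hours: float,
    engineering_hourly_usd: float,
    other_monthly_usd: float = 0.0,
) -> float:
    """Fully loaded monthly cost: GPUs, engineering time, and everything else."""
    gpu_cost = gpu_hourly_usd * gpu_count * hours_per_month
    engineering_cost = engineering_hours * engineering_hourly_usd
    return gpu_cost + engineering_cost + other_monthly_usd


def cheaper_option(hosted_monthly_usd: float, self_hosted_monthly_usd: float) -> str:
    """Return which side of the comparison costs less per month."""
    return "hosted" if hosted_monthly_usd <= self_hosted_monthly_usd else "self-hosted"


# Placeholder inputs: one GPU at $2.00/hour running all month (730 hours),
# 10 engineering hours at $100/hour, and $50 for storage and monitoring.
self_hosted = self_hosted_monthly_cost(2.0, 1, 730, 10, 100.0, 50.0)
hosted = 35.88  # predicted hosted monthly spend from a token-cost estimate

print(f"Self-hosted: ${self_hosted:,.2f} / month")
print(f"Hosted:      ${hosted:,.2f} / month")
print(f"Cheaper:     {cheaper_option(hosted, self_hosted)}")
```

The comparison only holds if both sides meet the same latency and reliability target, so the self-hosted figure should include redundancy and on-call time where those apply.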

Example GPT-OSS monthly workloads

Developer coding assistant

50,000 requests · 1,800 input · 700 output tokens

Use this as a pattern, not a benchmark. Real cost changes with provider price, tool calls, retries, and context length.

Document reasoning workflow

12,000 requests · 9,000 input · 1,400 output tokens

Use this as a pattern, not a benchmark. Real cost changes with provider price, tool calls, retries, and context length.

High-volume support triage

350,000 requests · 900 input · 350 output tokens

Use this as a pattern, not a benchmark. Real cost changes with provider price, tool calls, retries, and context length.
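
As a worked illustration, the three workloads above can be priced against one paid row. The sketch below uses the DeepInfra gpt-oss-120b (exacto) prices from the pricing table ($0.04 input, $0.19 output per 1M tokens), with no cache, batch, or buffer applied. The choice of row and the helper name `workload_cost` are assumptions for this example.

```python
IN_PRICE, OUT_PRICE = 0.04, 0.19  # USD per 1M tokens, DeepInfra gpt-oss-120b (exacto)

# (name, monthly requests, input tokens per request, output tokens per request)
workloads = [
    ("Developer coding assistant", 50_000, 1_800, 700),
    ("Document reasoning workflow", 12_000, 9_000, 1_400),
    ("High-volume support triage", 350_000, 900, 350),
]


def workload_cost(requests, input_tokens, output_tokens):
    """Monthly token cost in USD at the prices above."""
    input_cost = requests * input_tokens * IN_PRICE / 1_000_000
    output_cost = requests * output_tokens * OUT_PRICE / 1_000_000
    return input_cost + output_cost


costs = {name: workload_cost(r, i, o) for name, r, i, o in workloads}
for name, cost in costs.items():
    print(f"{name}: ${cost:.3f} / month")
```

At these prices the three patterns come to about $10.25, $7.51, and $35.88 per month respectively. On a free row the token cost is $0, subject to that route's limits.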

Alternatives to compare before choosing GPT-OSS

GPT-OSS is a strong fit when open weights, local deployment options, or provider flexibility matter. If your main priority is lowest hosted token price, compare it against other budget chat, coding, and reasoning rows before finalizing a provider.

Model                       Provider  Input    Output  Context
GTE-Base                    Other     $0.0050  Free    512
E5-Base-v2                  Other     $0.0050  Free    512
paraphrase-MiniLM-L6-v2     Other     $0.0050  Free    512
all-MiniLM-L12-v2           Other     $0.0050  Free    512
bge-base-en-v1.5            Other     $0.0050  Free    512
multi-qa-mpnet-base-dot-v1  Other     $0.0050  Free    512
all-mpnet-base-v2           Other     $0.0050  Free    512
all-MiniLM-L6-v2            Other     $0.0050  Free    512

For broader model selection, open the AI API price comparison or use the cheapest LLM API guide.

Limitations and edge cases

  • Provider rows can differ for the same gpt-oss model because routing, free tiers, latency targets, and enterprise plans are different.
  • Free routes can be useful for tests, but production systems should plan for rate limits, availability, and fallback behavior.
  • Self-hosted estimates require hardware-specific throughput, utilization, and operations assumptions that are outside token-price billing.
  • Some providers bill cache reads, batch jobs, failed attempts, or long-context requests differently. Use the provider's official pricing page for final procurement.
  • Prompt caching only reduces cost when repeated prompt segments are actually reused and the selected provider exposes a cache price for the gpt-oss route.

GPT-OSS pricing FAQ

Is GPT-OSS free to use?
The model weights are open and can be downloaded, but running them still costs money through hosted API billing or infrastructure. Hosted providers may also offer free routes with rate limits or usage policies.

What is the difference between gpt-oss-120b and gpt-oss-20b?
The 120B model is usually more capable and more expensive to host, while the 20B model is easier to run on smaller hardware and often cheaper through API providers. The exact token price depends on the selected provider row.

Should I use a hosted API or self-host GPT-OSS?
Use hosted API access when you need quick deployment, low operational work, or bursty traffic. Consider self-hosting when you have steady volume, data-control needs, GPU capacity, or a team that can operate inference reliably.

What are the limits of free GPT-OSS routes?
Free routes can have request caps, slower queues, data-policy tradeoffs, or availability limits. Production teams often keep a paid route or fallback model for reliability.

Does this calculator include self-hosting costs?
No. It calculates hosted token cost from the pricing database. For self-hosting, compare the result with your GPU, storage, monitoring, deployment, and engineering costs.