Cheapest LLM API in 2026: Live Picks by Workload

The cheapest LLM API depends on your input/output ratio, quality bar, context window, and whether you can use caching or batch jobs. This guide ranks low-cost options from the live AI Pricing Hub database and explains how to choose without underestimating real usage cost.

Quick answer · Pricing data refreshed 2026-03-13 12:45:29

The cheapest LLM API is not one universal model.

For balanced text workloads, start with the lowest combined input plus output price. For code generation, output price matters more because generated code is usually longer than the prompt. For document analysis, input price and context length are the main constraints. For reasoning, a slightly higher per-token price can still be cheaper if it solves the task in fewer retries.

Balanced cost

GTE-Base
Other · $0.0050 input / Free output

Best first check for general chat and support bots.

Chat workloads

GTE-Base
Other · $0.0050 input / Free output

Use when requests are short and response length is predictable.

Code output

Qwen2.5 Coder 7B Instruct
Alibaba Qwen · $0.03 input / $0.09 output

Output-heavy tasks should prioritize low output token cost.

Reasoning

Llama 3 8B Lunaris
Other · $0.04 input / $0.05 output

Filter for reasoning capability before comparing price.

[Figure: Cheapest LLM API cost decision flow showing input tokens, output tokens, context window, retries, and quality fit. The cheapest API choice changes when the workload shifts from chat to code, reasoning, or long-document processing.]

Live cheapest LLM APIs by blended price

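Sorted by input + output price per 1M tokens. Use this list for balanced chat, support, and lightweight content workflows. The lowest-priced rows are embedding models, and their "Free" output reflects that they return vectors rather than generated text, so confirm that a model can generate responses before shortlisting it for chat.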

| Model | Capabilities | Developer | Provider | Input ($/1M) | Output ($/1M) | Total ($/1M) | Context (tokens) | Best fit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GTE-Base | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| E5-Base-v2 | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| paraphrase-MiniLM-L6-v2 | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| all-MiniLM-L12-v2 | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| bge-base-en-v1.5 | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| multi-qa-mpnet-base-dot-v1 | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| all-mpnet-base-v2 | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| all-MiniLM-L6-v2 | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| Qwen3 Embedding 8B | chat, tool_use, code | Alibaba Qwen | DeepInfra | $0.01 | Free | $0.01 | 32k | Code, agents, developer tools |
| Qwen3 Embedding 8B | chat, tool_use, code | Alibaba Qwen | Nebius | $0.01 | Free | $0.01 | 32k | Code, agents, developer tools |
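
The blended ranking above can be reproduced in a few lines. The sketch below is illustrative only: the `PriceRow` class and `rank_by_blended_price` helper are hypothetical names, the four rows are copied from this page's tables, and "Free" output is treated as 0.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceRow:
    model: str
    provider: str
    input_usd: float   # USD per 1M input tokens
    output_usd: float  # USD per 1M output tokens (0.0 where the table says "Free")

    @property
    def blended_usd(self) -> float:
        # Blended price as used on this page: input price plus output price.
        return self.input_usd + self.output_usd


# A few rows copied from the tables on this page (not a live feed).
ROWS = [
    PriceRow("Qwen2.5 Coder 7B Instruct", "Nebius", 0.03, 0.09),
    PriceRow("GTE-Base", "DeepInfra", 0.005, 0.0),
    PriceRow("Llama 3 8B Lunaris", "DeepInfra", 0.04, 0.05),
    PriceRow("Qwen3 Embedding 8B", "Nebius", 0.01, 0.0),
]


def rank_by_blended_price(rows):
    """Return rows sorted from cheapest to most expensive blended price."""
    return sorted(rows, key=lambda row: row.blended_usd)


for row in rank_by_blended_price(ROWS):
    print(f"{row.model:<28} {row.provider:<10} ${row.blended_usd:.4f}")
```

A blended sort weights input and output equally, which is why it suits balanced workloads and can mislead for output-heavy or input-heavy ones.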

Cheapest LLM API for chat and support

Choose low total price, predictable latency, and enough context for conversation history.

| Model | Capabilities | Developer | Provider | Input ($/1M) | Output ($/1M) | Total ($/1M) | Context (tokens) | Best fit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GTE-Base | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| E5-Base-v2 | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| paraphrase-MiniLM-L6-v2 | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| all-MiniLM-L12-v2 | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| bge-base-en-v1.5 | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| multi-qa-mpnet-base-dot-v1 | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| all-mpnet-base-v2 | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| all-MiniLM-L6-v2 | chat, tool_use | Other | DeepInfra | $0.0050 | Free | $0.0050 | 512 | Chat, classification, content ops |
| Qwen3 Embedding 8B | chat, tool_use, code | Alibaba Qwen | DeepInfra | $0.01 | Free | $0.01 | 32k | Code, agents, developer tools |
| Qwen3 Embedding 8B | chat, tool_use, code | Alibaba Qwen | Nebius | $0.01 | Free | $0.01 | 32k | Code, agents, developer tools |

Cheapest LLM API for code generation

Code tasks are output-heavy, so this table prioritizes models with low output price and code capability.

| Model | Capabilities | Developer | Provider | Input ($/1M) | Output ($/1M) | Total ($/1M) | Context (tokens) | Best fit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen2.5 Coder 7B Instruct | chat, tool_use, code | Alibaba Qwen | Nebius | $0.03 | $0.09 | $0.12 | 33k | Code, agents, developer tools |
| Qwen2.5 7B Instruct | chat, tool_use, code | Alibaba Qwen | Phala | $0.04 | $0.10 | $0.14 | 33k | Code, agents, developer tools |
| Qwen3 235B A22B Instruct 2507 | chat, tool_use, code | Alibaba Qwen | DeepInfra | $0.07 | $0.10 | $0.17 | 262k | Code, agents, developer tools |
| Qwen2.5 Coder 32B Instruct | chat, tool_use, code | Alibaba Qwen | Chutes | $0.03 | $0.11 | $0.14 | 33k | Code, agents, developer tools |
| Qwen-Turbo | chat, tool_use, code | Alibaba Qwen | Alibaba | $0.03 | $0.13 | $0.16 | 131k | Code, agents, developer tools |
| Qwen3 8B | chat, reasoning, tool_use, code | Alibaba Qwen | Novita | $0.04 | $0.14 | $0.17 | 41k | Code, agents, developer tools |
| Qwen2.5 Coder 32B Instruct | chat, tool_use, code | Alibaba Qwen | Hyperbolic | $0.20 | $0.20 | $0.40 | 33k | Code, agents, developer tools |
| Qwen2.5-VL 7B Instruct | chat, vision, tool_use, code | Alibaba Qwen | Hyperbolic | $0.20 | $0.20 | $0.40 | 33k | Code, agents, developer tools |
| Qwen2.5 VL 32B Instruct | chat, vision, tool_use, code | Alibaba Qwen | Chutes | $0.05 | $0.22 | $0.27 | 128k | Code, agents, developer tools |
| Qwen3 14B | chat, reasoning, tool_use, code | Alibaba Qwen | Chutes | $0.05 | $0.22 | $0.27 | 41k | Code, agents, developer tools |
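
To see how much the output share matters, one option is to weight each price by the share of tokens on that side of the request. This is a sketch under stated assumptions: `weighted_price` and `rank` are hypothetical helpers, the 80% and 20% output shares are illustrative guesses rather than measurements, and the three rows are copied from the code table.

```python
def weighted_price(input_usd, output_usd, output_share):
    """USD per 1M total tokens when `output_share` of all tokens are output tokens."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return (1.0 - output_share) * input_usd + output_share * output_usd


# (input, output) prices in USD per 1M tokens, copied from the code table.
CODE_MODELS = {
    "Qwen2.5 Coder 7B Instruct (Nebius)": (0.03, 0.09),
    "Qwen3 235B A22B Instruct 2507 (DeepInfra)": (0.07, 0.10),
    "Qwen-Turbo (Alibaba)": (0.03, 0.13),
}


def rank(models, output_share):
    """Model names sorted by weighted price for the given output share."""
    return sorted(
        models,
        key=lambda name: weighted_price(*models[name], output_share),
    )


for share in (0.8, 0.2):
    print(f"output share {share:.0%}:")
    for name in rank(CODE_MODELS, share):
        price = weighted_price(*CODE_MODELS[name], share)
        print(f"  {name:<44} ${price:.3f}")
```

With these rows, the second and third places swap between the output-heavy and input-heavy cases, which is the reason to estimate your own ratio before trusting any single ranking.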

Cheapest reasoning-capable LLM APIs

Reasoning models can be worth a premium when they reduce retries, tool calls, or manual review.

| Model | Capabilities | Developer | Provider | Input ($/1M) | Output ($/1M) | Total ($/1M) | Context (tokens) | Best fit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama 3 8B Lunaris | chat, tool_use, reasoning | Other | DeepInfra | $0.04 | $0.05 | $0.09 | 8k | Reasoning and multi-step tasks |
| gpt-oss-20b | chat, reasoning, tool_use | OpenAI | Chutes | $0.02 | $0.10 | $0.12 | 131k | Reasoning and multi-step tasks |
| R1 Distill Llama 70B | chat, reasoning, tool_use | DeepSeek | Chutes | $0.03 | $0.11 | $0.14 | 131k | Reasoning and multi-step tasks |
| gpt-oss-20b | chat, reasoning, tool_use | OpenAI | DeepInfra | $0.03 | $0.14 | $0.17 | 131k | Reasoning and multi-step tasks |
| Qwen3 8B | chat, reasoning, tool_use, code | Alibaba Qwen | Novita | $0.04 | $0.14 | $0.17 | 41k | Code, agents, developer tools |
| Trinity Mini | chat, reasoning, tool_use | Other | Clarifai | $0.04 | $0.15 | $0.20 | 131k | Reasoning and multi-step tasks |
| Nemotron Nano 9B V2 | chat, reasoning, tool_use | NVIDIA | DeepInfra | $0.04 | $0.16 | $0.20 | 131k | Reasoning and multi-step tasks |
| gpt-oss-120b (exacto) | chat, reasoning, tool_use | OpenAI | DeepInfra | $0.04 | $0.19 | $0.23 | 131k | Reasoning and multi-step tasks |
| Nemotron 3 Nano 30B A3B | chat, reasoning, tool_use | NVIDIA | DeepInfra | $0.05 | $0.20 | $0.25 | 262k | Reasoning and multi-step tasks |

Budget long-context LLM APIs

For document analysis, compare input price and context length before looking at output price.

| Model | Capabilities | Developer | Provider | Input ($/1M) | Output ($/1M) | Total ($/1M) | Context (tokens) | Best fit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Granite 4.0 Micro | chat, tool_use | IBM Granite | Cloudflare | $0.02 | $0.11 | $0.13 | 131k | Long context analysis |
| Gemma 3 4B | chat, vision, tool_use | Google | Chutes | $0.02 | $0.07 | $0.09 | 131k | Text plus image analysis |
| Mistral Nemo | chat, tool_use | Mistral | DeepInfra | $0.02 | $0.04 | $0.06 | 131k | Long context analysis |
| Llama Guard 3 8B | chat, tool_use | Meta | Nebius | $0.02 | $0.06 | $0.08 | 131k | Long context analysis |
| gpt-oss-20b | chat, reasoning, tool_use | OpenAI | Chutes | $0.02 | $0.10 | $0.12 | 131k | Reasoning and multi-step tasks |
| Gemma 3 12B | chat, vision, tool_use | Google | Chutes | $0.03 | $0.10 | $0.13 | 131k | Text plus image analysis |
| R1 Distill Llama 70B | chat, reasoning, tool_use | DeepSeek | Chutes | $0.03 | $0.11 | $0.14 | 131k | Reasoning and multi-step tasks |
| Mistral Small 3.1 24B | chat, vision, tool_use | Mistral | Chutes | $0.03 | $0.11 | $0.14 | 128k | Text plus image analysis |
| Gemma 3 27B | chat, vision, tool_use | Google | Chutes | $0.03 | $0.11 | $0.14 | 128k | Text plus image analysis |
| gpt-oss-20b | chat, reasoning, tool_use | OpenAI | DeepInfra | $0.03 | $0.14 | $0.17 | 131k | Reasoning and multi-step tasks |
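
A quick way to apply the "context first, then input price" order is to drop every model that cannot hold the document in one call and only then sort by input price. The sketch below makes several assumptions: `calls_needed` and `single_call_candidates` are hypothetical helpers, "8k", "128k", and "131k" are read as 8,000, 128,000, and 131,000 tokens, and the 2,000-token reserve for instructions and the reply is an illustrative figure.

```python
import math


def calls_needed(document_tokens, context_tokens, reserved_tokens=2_000):
    """Calls required if the document must be split to fit the context window.

    reserved_tokens is an assumed allowance for instructions and the reply.
    """
    usable = context_tokens - reserved_tokens
    if usable <= 0:
        raise ValueError("context window is smaller than the reserved allowance")
    return math.ceil(document_tokens / usable)


# (name, input USD per 1M tokens, context tokens), copied from this page's tables.
CANDIDATES = [
    ("Llama 3 8B Lunaris (DeepInfra)", 0.04, 8_000),
    ("Mistral Nemo (DeepInfra)", 0.02, 131_000),
    ("Gemma 3 27B (Chutes)", 0.03, 128_000),
]


def single_call_candidates(document_tokens, candidates):
    """Keep models that fit the document in one call, cheapest input price first."""
    fits = [c for c in candidates if calls_needed(document_tokens, c[2]) == 1]
    return sorted(fits, key=lambda c: c[1])


doc_tokens = 80_000
for name, _, context in CANDIDATES:
    print(f"{name}: {calls_needed(doc_tokens, context)} call(s)")
print([name for name, _, _ in single_call_candidates(doc_tokens, CANDIDATES)])
```

Under these assumptions an 80,000-token document needs 14 calls on the 8k model and a single call on either long-context model, so the 8k model drops out before price is even compared.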

Free or free-tier LLM API candidates

Free models are useful for prototypes, tests, and low-risk workflows. Check rate limits before production use.

| Model | Capabilities | Developer | Provider | Input ($/1M) | Output ($/1M) | Total ($/1M) | Context (tokens) | Best fit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sonoma Sky Alpha | chat, vision, reasoning | OpenRouter | OpenRouter | Free | Free | Free | 2.0M | Reasoning and multi-step tasks |
| Sonoma Dusk Alpha | chat, vision | OpenRouter | OpenRouter | Free | Free | Free | 2.0M | Text plus image analysis |
| Gemini 1.5 Pro | chat, vision | Google | OpenRouter | Free | Free | Free | 2.0M | Text plus image analysis |
| Auto Router | chat, vision, video, audio, image_gen | OpenRouter | OpenRouter | Free | Free | Free | 2.0M | Text plus image analysis |

How to calculate the real cheapest LLM API

The lowest sticker price can still be expensive if your workload has long outputs, repeated context, retries, or non-token fees. Normalize every provider to the same unit before you decide:

Request cost = (input tokens / 1,000,000 × input price) + (output tokens / 1,000,000 × output price), with both prices in USD per 1M tokens

Then add request fees, image fees, cache discounts, batch discounts, and retry rates when they apply. A support bot with 300 input tokens and 120 output tokens is usually driven by combined price. A coding assistant that emits 2,000 output tokens is driven by output price. A document pipeline that reads 80,000 tokens per request is driven by input price, context length, and caching support.
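
The formula translates directly into code. In this minimal sketch, `request_cost` is a hypothetical helper, the prices are taken from rows in the tables on this page, and the 500-token figures (input for the coding assistant, output for the document pipeline) are assumptions added only to complete the three scenarios.

```python
def request_cost(input_tokens, output_tokens, input_usd_per_m, output_usd_per_m):
    """Token cost of one request in USD; prices are USD per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_usd_per_m
    output_cost = output_tokens / 1_000_000 * output_usd_per_m
    return input_cost + output_cost


# (input tokens, output tokens, input price, output price)
SCENARIOS = {
    # Llama 3 8B Lunaris prices from the reasoning table.
    "support bot": (300, 120, 0.04, 0.05),
    # Qwen2.5 Coder 7B Instruct prices; 500 input tokens is an assumption.
    "coding assistant": (500, 2_000, 0.03, 0.09),
    # Mistral Nemo prices; 500 output tokens is an assumption.
    "document pipeline": (80_000, 500, 0.02, 0.04),
}

for name, args in SCENARIOS.items():
    per_thousand = 1_000 * request_cost(*args)
    print(f"{name}: ${per_thousand:.4f} per 1,000 requests")
```

Reporting cost per 1,000 requests keeps the numbers readable, since a single short request at these prices costs a small fraction of a cent. Request fees, image fees, and discounts are deliberately left out of this base calculation.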

Input-heavy

Summarization, RAG, legal review, and long document analysis. Prioritize low input price, context length, and prompt caching.

Output-heavy

Code generation, report writing, synthetic data, and multi-step agents. Output price usually dominates monthly spend.

Quality-sensitive

Reasoning, production support, and compliance reviews. Count retries and human escalation, not just token price.

Selection rules before you switch providers

Use these checks to avoid choosing a cheap model that increases your total cost later.

  1. Match capability first. Do not compare a tiny chat model to a reasoning or vision workload if it cannot complete the task.
  2. Estimate input/output ratio. Chat is often balanced, code is output-heavy, and retrieval workflows are input-heavy.
  3. Check context length. A cheap 8K model is not cheap if you must chunk every request into ten calls.
  4. Use caching for repeated prompts. System prompts, policy text, and shared documents can make cached input pricing more important than base input pricing.
  5. Use batch pricing for offline jobs. Classification, enrichment, extraction, and evaluation pipelines often tolerate delayed execution.
  6. Measure retries. If a model needs two retries to match another model's first answer, it makes three calls per task, so its effective price can triple.
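
Rules 4 to 6 can be folded into one adjustment on top of the base token cost. This is a rough sketch, not any provider's billing logic: `effective_cost` is a hypothetical helper, and the discount values in the usage lines are placeholders, because cache and batch discounts differ by provider and must be read from their pricing pages.

```python
def effective_cost(
    base_input_cost,
    base_output_cost,
    *,
    cached_share=0.0,    # fraction of input tokens served from cache
    cache_discount=0.0,  # discount on cached input, e.g. 0.5 means half price
    batch_discount=0.0,  # discount applied to the whole request in batch mode
    retries=0.0,         # average extra attempts per task
):
    """Adjust a base request cost for caching, batch pricing, and retries."""
    for value in (cached_share, cache_discount, batch_discount):
        if not 0.0 <= value <= 1.0:
            raise ValueError("shares and discounts must be between 0 and 1")
    if retries < 0:
        raise ValueError("retries cannot be negative")
    input_cost = base_input_cost * (1.0 - cached_share * cache_discount)
    per_attempt = (input_cost + base_output_cost) * (1.0 - batch_discount)
    return per_attempt * (1.0 + retries)


base_in, base_out = 0.000012, 0.000006  # illustrative per-request costs in USD

print(effective_cost(base_in, base_out))             # no adjustments
print(effective_cost(base_in, base_out, retries=2))  # two retries: three attempts
print(effective_cost(base_in, base_out, cached_share=0.8, cache_discount=0.5))
print(effective_cost(base_in, base_out, batch_discount=0.5))
```

Treating retries as a multiplier assumes every attempt costs about the same, which is a simplification: a retry that resends a longer prompt or produces a longer answer will cost more than the first attempt.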

Practical cheapest LLM API examples

These scenarios show why the same price table leads to different choices.

| Workload | Cost driver | What to optimize | Next step |
| --- | --- | --- | --- |
| Customer support chatbot | Many short requests | Total input + output price, latency, safe fallback behavior | Estimate monthly support cost |
| Code assistant | Longer output tokens | Output price, code capability, retry rate | Compare code-capable models |
| Document summarization | Large input context | Input price, context length, prompt caching | Browse text models |
| Offline enrichment pipeline | High volume, flexible timing | Batch discounts and deterministic output limits | Check recent pricing updates |

When the cheapest LLM API is the wrong choice

Do not optimize only for cents per million tokens when the model is part of a paid product workflow. A very cheap model can be the wrong pick when it has weak instruction following, insufficient context, missing tool use, poor structured output, unstable latency, or limited availability in your region. For production, run a small evaluation set before switching traffic.

A practical evaluation should include successful completion rate, average retries, output length, refusal rate, latency, and human review cost. If the cheapest candidate has a 15% lower completion rate, a slightly more expensive model may still reduce the total cost of the workflow.
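
One way to turn those evaluation metrics into a single comparable number is an expected cost per task. The sketch below is a simplified model with assumed inputs: `expected_task_cost` is a hypothetical helper, every figure in the two profiles is invented for illustration, and failed tasks are assumed to go to human review at a flat cost.

```python
def expected_task_cost(attempt_cost, completion_rate, avg_retries, review_cost):
    """Expected USD per task: token spend plus human review for failed tasks."""
    if not 0.0 <= completion_rate <= 1.0:
        raise ValueError("completion_rate must be between 0 and 1")
    token_spend = attempt_cost * (1.0 + avg_retries)
    escalation = (1.0 - completion_rate) * review_cost
    return token_spend + escalation


# Invented evaluation results for two candidates; review_cost is USD per escalation.
cheap = expected_task_cost(
    attempt_cost=0.0002, completion_rate=0.80, avg_retries=0.5, review_cost=0.05
)
stronger = expected_task_cost(
    attempt_cost=0.0004, completion_rate=0.95, avg_retries=0.1, review_cost=0.05
)

print(f"cheap model:    ${cheap:.5f} per task")
print(f"stronger model: ${stronger:.5f} per task")
```

In this made-up comparison the model with twice the token cost ends up cheaper per task, because the escalation term dominates once completion rates differ by 15 percentage points. Real results depend entirely on your own evaluation set and review costs.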

Authoritative pricing references

Verify mission-critical decisions against the official provider pricing pages for OpenAI, Anthropic, Google Gemini, and DeepSeek, and against OpenRouter's notes on reasoning tokens.

Cheapest LLM API FAQ

For a balanced workload, start with the lowest combined input plus output price in the live table above. For real workloads, verify the winner against your input/output ratio, context length, retries, and quality threshold.

Free tiers are useful for prototypes and tests, but they often have rate limits, model restrictions, or availability constraints. Production systems should compare paid fallback options before relying on a free tier.

Optimize for the larger side of your workload. RAG and summarization are input-heavy. Code generation, agent steps, and long reports are output-heavy. Chat support often needs a balanced total price.

If a stronger model reduces retries, tool calls, hallucination review, or prompt length, its total workflow cost can be lower even when token price is higher.

Re-check before major usage increases, new model launches, and provider migrations. Prices, free tiers, context windows, and cache discounts change often.

The tables normalize prices to USD per 1M tokens and link to provider/model pages so you can compare across major API providers.