Cheapest LLM API in 2026: Live Picks by Workload

The cheapest LLM API depends on your input/output ratio, quality bar, context window, and whether you can use caching or batch jobs. This guide ranks low-cost options from the live AI Pricing Hub database and explains how to choose without underestimating real usage cost.


Quick answer · Pricing data refreshed 2026-03-13 12:45:29 · Prices in USD per 1M tokens

The cheapest LLM API is not one universal model.

For balanced text workloads, start with the lowest combined input plus output price. For code generation, output price matters more because generated code is usually longer than the prompt. For document analysis, input price and context length are the main constraints. For reasoning, a slightly higher per-token price can still be cheaper if it solves the task in fewer retries.

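Balanced cost

Mistral Nemo

Mistral · $0.02 input / $0.04 output

Best first check for general chat and support bots. It is the lowest-priced paid text-generation model in the tables below; the cheaper rows at the top of the blended-price table are embedding models, which cannot generate replies.

Chat workloads

Mistral Nemo

Mistral · $0.02 input / $0.04 output

Use when requests are short and response length is predictable.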

Code output

Qwen2.5 Coder 7B Instruct

Alibaba Qwen · $0.03 input / $0.09 output

Output-heavy tasks should prioritize low output token cost.

Reasoning

Llama 3 8B Lunaris

Other · $0.04 input / $0.05 output

Filter for reasoning capability before comparing price.

Figure: Cost decision flow covering input tokens, output tokens, context window, retries, and quality fit. The cheapest API choice changes when the workload shifts from chat to code, reasoning, or long-document processing.

Live cheapest LLM APIs by blended price

Sorted by input + output price per 1M tokens. The cheapest rows are currently text-embedding models: they return vectors rather than generated text, which is why output is free, so they suit search, retrieval, and classification pipelines rather than chat. For balanced chat, support, and lightweight content workflows, compare chat-capable models instead.

Model (capabilities)	Developer / host	Input ($/1M)	Output ($/1M)	Total ($/1M)	Context (tokens)	Best fit
GTE-Base embedding	Other DeepInfra	$0.0050	Free	$0.0050	512	Embeddings, search, classification
E5-Base-v2 embedding	Other DeepInfra	$0.0050	Free	$0.0050	512	Embeddings, search, classification
paraphrase-MiniLM-L6-v2 embedding	Other DeepInfra	$0.0050	Free	$0.0050	512	Embeddings, search, classification
all-MiniLM-L12-v2 embedding	Other DeepInfra	$0.0050	Free	$0.0050	512	Embeddings, search, classification
bge-base-en-v1.5 embedding	Other DeepInfra	$0.0050	Free	$0.0050	512	Embeddings, search, classification
multi-qa-mpnet-base-dot-v1 embedding	Other DeepInfra	$0.0050	Free	$0.0050	512	Embeddings, search, classification
all-mpnet-base-v2 embedding	Other DeepInfra	$0.0050	Free	$0.0050	512	Embeddings, search, classification
all-MiniLM-L6-v2 embedding	Other DeepInfra	$0.0050	Free	$0.0050	512	Embeddings, search, classification
Qwen3 Embedding 8B embedding	Alibaba Qwen DeepInfra	$0.01	Free	$0.01	32k	Embeddings, search, classification
Qwen3 Embedding 8B embedding	Alibaba Qwen Nebius	$0.01	Free	$0.01	32k	Embeddings, search, classification

Cheapest LLM API for chat and support

Choose low total price, predictable latency, and enough context for conversation history.

The live ranking for this category currently returns the same ten embedding-model rows as the blended-price table above. Those models return vectors rather than replies and have at most a 32k window, so they cannot run a chatbot. Among the chat-capable models listed elsewhere on this page, the lowest totals are:

Model (capabilities)	Developer / host	Input ($/1M)	Output ($/1M)	Total ($/1M)	Context (tokens)	Best fit
Mistral Nemo chat,tool_use	Mistral DeepInfra	$0.02	$0.04	$0.06	131k	Long context analysis
Gemma 3 4B chat,vision,tool_use	Google Chutes	$0.02	$0.07	$0.09	131k	Text plus image analysis
Llama 3 8B Lunaris chat,tool_use,reasoning	Other DeepInfra	$0.04	$0.05	$0.09	8k	Reasoning and multi-step tasks

Cheapest LLM API for code generation

Code tasks are output-heavy, so this table prioritizes models with low output price and code capability.

Model (capabilities)	Developer / host	Input ($/1M)	Output ($/1M)	Total ($/1M)	Context (tokens)	Best fit
Qwen2.5 Coder 7B Instruct chat,tool_use,code	Alibaba Qwen Nebius	$0.03	$0.09	$0.12	33k	Code, agents, developer tools
Qwen2.5 7B Instruct chat,tool_use,code	Alibaba Qwen Phala	$0.04	$0.10	$0.14	33k	Code, agents, developer tools
Qwen3 235B A22B Instruct 2507 chat,tool_use,code	Alibaba Qwen DeepInfra	$0.07	$0.10	$0.17	262k	Code, agents, developer tools
Qwen2.5 Coder 32B Instruct chat,tool_use,code	Alibaba Qwen Chutes	$0.03	$0.11	$0.14	33k	Code, agents, developer tools
Qwen-Turbo chat,tool_use,code	Alibaba Qwen Alibaba	$0.03	$0.13	$0.16	131k	Code, agents, developer tools
Qwen3 8B chat,reasoning,tool_use,code	Alibaba Qwen Novita	$0.04	$0.14	$0.17	41k	Code, agents, developer tools
Qwen2.5 Coder 32B Instruct chat,tool_use,code	Alibaba Qwen Hyperbolic	$0.20	$0.20	$0.40	33k	Code, agents, developer tools
Qwen2.5-VL 7B Instruct chat,vision,tool_use,code	Alibaba Qwen Hyperbolic	$0.20	$0.20	$0.40	33k	Code, agents, developer tools
Qwen2.5 VL 32B Instruct chat,vision,tool_use,code	Alibaba Qwen Chutes	$0.05	$0.22	$0.27	128k	Code, agents, developer tools
Qwen3 14B chat,reasoning,tool_use,code	Alibaba Qwen Chutes	$0.05	$0.22	$0.27	41k	Code, agents, developer tools
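The ordering in this table amounts to a capability filter followed by a sort on output price. A minimal sketch of that step, assuming a hypothetical `Model` record; the rows are copied from the tables on this page rather than fetched from any pricing API, and the tie-break on input price is an assumption that happens to match the order shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical record for one pricing-table row (prices in USD per 1M tokens)."""
    name: str
    host: str
    capabilities: frozenset
    input_price: float
    output_price: float


def rank_for_code(models: list[Model]) -> list[Model]:
    """Drop models without the "code" tag, then sort by output price, then input price."""
    code_models = [m for m in models if "code" in m.capabilities]
    return sorted(code_models, key=lambda m: (m.output_price, m.input_price))


rows = [
    Model("Qwen2.5 Coder 32B Instruct", "Chutes", frozenset({"chat", "tool_use", "code"}), 0.03, 0.11),
    Model("Mistral Nemo", "DeepInfra", frozenset({"chat", "tool_use"}), 0.02, 0.04),
    Model("Qwen2.5 Coder 7B Instruct", "Nebius", frozenset({"chat", "tool_use", "code"}), 0.03, 0.09),
]

for m in rank_for_code(rows):
    print(f"{m.name} via {m.host}: ${m.output_price:.2f} output per 1M tokens")
```

Mistral Nemo has the lowest output price of the three but is dropped before sorting because it carries no code tag: capability is checked first, price second.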

Cheapest reasoning-capable LLM APIs

Reasoning models can be worth a premium when they reduce retries, tool calls, or manual review.

Model (capabilities)	Developer / host	Input ($/1M)	Output ($/1M)	Total ($/1M)	Context (tokens)	Best fit
Llama 3 8B Lunaris chat,tool_use,reasoning	Other DeepInfra	$0.04	$0.05	$0.09	8k	Reasoning and multi-step tasks
gpt-oss-20b chat,reasoning,tool_use	OpenAI Chutes	$0.02	$0.10	$0.12	131k	Reasoning and multi-step tasks
R1 Distill Llama 70B chat,reasoning,tool_use	DeepSeek Chutes	$0.03	$0.11	$0.14	131k	Reasoning and multi-step tasks
gpt-oss-20b chat,reasoning,tool_use	OpenAI DeepInfra	$0.03	$0.14	$0.17	131k	Reasoning and multi-step tasks
Qwen3 8B chat,reasoning,tool_use,code	Alibaba Qwen Novita	$0.04	$0.14	$0.17	41k	Code, agents, developer tools
Trinity Mini chat,reasoning,tool_use	Other Clarifai	$0.04	$0.15	$0.20	131k	Reasoning and multi-step tasks
Nemotron Nano 9B V2 chat,reasoning,tool_use	NVIDIA DeepInfra	$0.04	$0.16	$0.20	131k	Reasoning and multi-step tasks
gpt-oss-120b (exacto) chat,reasoning,tool_use	OpenAI DeepInfra	$0.04	$0.19	$0.23	131k	Reasoning and multi-step tasks
Nemotron 3 Nano 30B A3B chat,reasoning,tool_use	NVIDIA DeepInfra	$0.05	$0.20	$0.25	262k	Reasoning and multi-step tasks

Budget long-context LLM APIs

For document analysis, compare input price and context length before looking at output price.

Model (capabilities)	Developer / host	Input ($/1M)	Output ($/1M)	Total ($/1M)	Context (tokens)	Best fit
Granite 4.0 Micro chat,tool_use	IBM Granite Cloudflare	$0.02	$0.11	$0.13	131k	Long context analysis
Gemma 3 4B chat,vision,tool_use	Google Chutes	$0.02	$0.07	$0.09	131k	Text plus image analysis
Mistral Nemo chat,tool_use	Mistral DeepInfra	$0.02	$0.04	$0.06	131k	Long context analysis
Llama Guard 3 8B chat,tool_use	Meta Nebius	$0.02	$0.06	$0.08	131k	Content safety classification
gpt-oss-20b chat,reasoning,tool_use	OpenAI Chutes	$0.02	$0.10	$0.12	131k	Reasoning and multi-step tasks
Gemma 3 12B chat,vision,tool_use	Google Chutes	$0.03	$0.10	$0.13	131k	Text plus image analysis
R1 Distill Llama 70B chat,reasoning,tool_use	DeepSeek Chutes	$0.03	$0.11	$0.14	131k	Reasoning and multi-step tasks
Mistral Small 3.1 24B chat,vision,tool_use	Mistral Chutes	$0.03	$0.11	$0.14	128k	Text plus image analysis
Gemma 3 27B chat,vision,tool_use	Google Chutes	$0.03	$0.11	$0.14	128k	Text plus image analysis
gpt-oss-20b chat,reasoning,tool_use	OpenAI DeepInfra	$0.03	$0.14	$0.17	131k	Reasoning and multi-step tasks
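A wider window only pays off if it saves calls. A minimal sketch of that trade-off, assuming a fixed per-call prompt overhead (instructions, output schema) that is re-sent with every chunk and a reserved output budget per call. `document_cost` and its parameters are illustrative names; 131,072 and 8,192 tokens are assumed values for the "131k" and "8k" windows; chunk overlap and the final merge step are ignored.

```python
import math


def document_cost(
    doc_tokens: int,
    context_tokens: int,
    prompt_overhead: int,
    output_per_call: int,
    input_price: float,
    output_price: float,
) -> tuple[int, float]:
    """Return (number of calls, total USD) to process one document.

    Prices are USD per 1M tokens. Each call must fit the prompt overhead,
    one chunk of the document, and the reserved output inside the window.
    """
    chunk_budget = context_tokens - prompt_overhead - output_per_call
    if chunk_budget <= 0:
        raise ValueError("context window too small for prompt overhead and output")
    calls = math.ceil(doc_tokens / chunk_budget)
    input_tokens = doc_tokens + calls * prompt_overhead  # overhead repeats per chunk
    output_tokens = calls * output_per_call
    cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return calls, cost


# 80,000-token document, 1,000-token prompt overhead, 1,000 output tokens per call.
nemo_calls, nemo_cost = document_cost(80_000, 131_072, 1_000, 1_000, 0.02, 0.04)
lunaris_calls, lunaris_cost = document_cost(80_000, 8_192, 1_000, 1_000, 0.04, 0.05)
print(f"131k window: {nemo_calls} call(s), ${nemo_cost:.4f}")
print(f"8k window: {lunaris_calls} call(s), ${lunaris_cost:.4f}")
```

With Mistral Nemo prices and a 131k window the document takes one call at roughly $0.0017; with Llama 3 8B Lunaris prices and an 8k window it takes 13 calls at roughly $0.0044, before counting the extra work of merging 13 partial answers.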

Free or free-tier LLM API candidates

Free models are useful for prototypes, tests, and low-risk workflows. Check rate limits before production use.

Model (capabilities)	Developer / host	Input ($/1M)	Output ($/1M)	Total ($/1M)	Context (tokens)	Best fit
Sonoma Sky Alpha chat,vision,reasoning	OpenRouter OpenRouter	Free	Free	Free	2.0M	Reasoning and multi-step tasks
Sonoma Dusk Alpha chat,vision	OpenRouter OpenRouter	Free	Free	Free	2.0M	Text plus image analysis
Gemini 1.5 Pro chat,vision	Google OpenRouter	Free	Free	Free	2.0M	Text plus image analysis
Auto Router chat,vision,video,audio,image_gen	OpenRouter OpenRouter	Free	Free	Free	2.0M	Routes each request to another model; billed at that model's rate

How to calculate the real cheapest LLM API

The lowest sticker price can still be expensive if your workload has long outputs, repeated context, retries, or non-token fees. Normalize every provider to the same unit before you decide:

Request cost = input tokens / 1,000,000 × input price + output tokens / 1,000,000 × output price

Then adjust for request fees, image fees, cache and batch discounts, and retry rates where they apply. A support bot with 300 input tokens and 120 output tokens is usually driven by combined price. A coding assistant that emits 2,000 output tokens is driven by output price. A document pipeline that reads 80,000 tokens per request is driven by input price, context length, and caching support.
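The request-cost formula translates directly into code. A minimal sketch: the function names are illustrative, `request_fee` stands in for any flat per-call charge, and the example prices are the Mistral Nemo and Qwen2.5 Coder 7B rows from the tables on this page. The 500-token prompt for the coding case is an assumption, since the text only fixes its output length.

```python
TOKENS_PER_PRICE_UNIT = 1_000_000  # prices on this page are quoted in USD per 1M tokens


def per_million(price: float, per_tokens: int) -> float:
    """Convert a price quoted per `per_tokens` tokens (for example per 1K) to USD per 1M tokens."""
    return price * TOKENS_PER_PRICE_UNIT / per_tokens


def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    request_fee: float = 0.0,
) -> float:
    """USD cost of one request: token charges at per-1M prices plus an optional flat fee."""
    return (
        input_tokens / TOKENS_PER_PRICE_UNIT * input_price
        + output_tokens / TOKENS_PER_PRICE_UNIT * output_price
        + request_fee
    )


# Support bot: 300 input + 120 output tokens at Mistral Nemo prices ($0.02 / $0.04).
support = request_cost(300, 120, 0.02, 0.04)
# Coding assistant: 2,000 output tokens at Qwen2.5 Coder 7B prices ($0.03 / $0.09); 500 input tokens assumed.
coding = request_cost(500, 2_000, 0.03, 0.09)
print(f"support: ${support:.7f} per request, coding: ${coding:.7f} per request")
```

The support request works out to about $0.0000108 and the coding request to about $0.000195, with over 90% of the coding cost coming from output tokens. Use `per_million` first whenever a provider quotes prices per 1K tokens, so every candidate is on the same unit.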

Input-heavy

Summarization, RAG, legal review, and long document analysis. Prioritize low input price, context length, and prompt caching.

Output-heavy

Code generation, report writing, synthetic data, and multi-step agents. Output price usually dominates monthly spend.

Quality-sensitive

Reasoning, production support, and compliance reviews. Count retries and human escalation, not just token price.
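One way to decide which of the first two buckets a workload falls into is to compare the input and output shares of its token bill. A minimal sketch; the 70% threshold is an arbitrary assumption rather than an industry cutoff, and the 500-token output for the document pipeline is assumed.

```python
def cost_driver(
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    threshold: float = 0.7,  # assumed cutoff: one side must be >= 70% of token spend
) -> str:
    """Return "input-heavy", "output-heavy", or "balanced" for one request shape.

    The per-1M divisor cancels out of the ratio, so raw token x price products are enough.
    """
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    total = input_cost + output_cost
    if total == 0:
        return "balanced"
    if input_cost / total >= threshold:
        return "input-heavy"
    if output_cost / total >= threshold:
        return "output-heavy"
    return "balanced"


print(cost_driver(80_000, 500, 0.02, 0.04))  # document pipeline -> input-heavy
print(cost_driver(300, 120, 0.02, 0.04))     # support bot -> balanced
print(cost_driver(500, 2_000, 0.03, 0.09))   # coding assistant -> output-heavy
```

Quality-sensitive workloads are not captured by this split; they need the retry and review accounting described in the next sections.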

Selection rules before you switch providers

Use these checks to avoid choosing a cheap model that increases your total cost later.

Match capability first. Do not compare a tiny chat model to a reasoning or vision workload if it cannot complete the task.
Estimate input/output ratio. Chat is often balanced, code is output-heavy, and retrieval workflows are input-heavy.
Check context length. A cheap 8K model is not cheap if you must chunk every request into ten calls.
Use caching for repeated prompts. System prompts, policy text, and shared documents can make cached input pricing more important than base input pricing.
Use batch pricing for offline jobs. Classification, enrichment, extraction, and evaluation pipelines often tolerate delayed execution.
Measure retries. If a model needs two retries to match another model's first answer, its effective price can triple.
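The retry and caching rules can be checked with a few lines of arithmetic. A minimal sketch: `expected_attempts` assumes retries succeed independently with the same probability (a geometric distribution), and the $0.10 base and $0.025 cached input prices are hypothetical, not taken from any provider's price list.

```python
def expected_attempts(success_rate: float) -> float:
    """Mean attempts until the first success, assuming independent retries (geometric distribution)."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


def effective_cost(attempt_cost: float, attempts: float) -> float:
    """USD per usable answer when every attempt costs the same."""
    return attempt_cost * attempts


def cached_input_cost(prefix_tokens: int, fresh_tokens: int, input_price: float, cached_price: float) -> float:
    """Input cost in USD when a shared prefix (system prompt, policy text) is billed at a cached rate.

    Prices are USD per 1M tokens; `cached_price` is whatever the provider charges for cache hits.
    """
    return (prefix_tokens * cached_price + fresh_tokens * input_price) / 1_000_000


# Two retries means three attempts, so the per-answer price triples.
print(effective_cost(0.0001, 3))
# A 50% first-try success rate averages two attempts per usable answer.
print(expected_attempts(0.5))
# 4,000-token shared system prompt + 500 fresh tokens, cached vs uncached (hypothetical prices).
print(cached_input_cost(4_000, 500, 0.10, 0.025), cached_input_cost(0, 4_500, 0.10, 0.025))
```

In the caching example the same 4,500 input tokens cost about $0.00015 with the prefix cached versus $0.00045 without it, which is why cached input pricing can matter more than the base rate for prompt-heavy workloads.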

Practical cheapest LLM API examples

These scenarios show why the same price table leads to different choices.

Workload	Cost driver	What to optimize	Next step
Customer support chatbot	Many short requests	Total input + output price, latency, safe fallback behavior	Estimate monthly support cost
Code assistant	Longer output tokens	Output price, code capability, retry rate	Compare code-capable models
Document summarization	Large input context	Input price, context length, prompt caching	Browse text models
Offline enrichment pipeline	High volume, flexible timing	Batch discounts and deterministic output limits	Check recent pricing updates
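A monthly estimate for any of these rows is request volume times per-request cost, less any batch discount. A minimal sketch: the volumes, the enrichment request shape (800 input, 200 output tokens), and the 50% batch discount are hypothetical, and batch tiers are not offered by every provider; the prices are the Mistral Nemo row.

```python
def monthly_cost(
    requests_per_month: int,
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    batch_discount: float = 0.0,
) -> float:
    """Monthly USD spend for one workload; prices are USD per 1M tokens.

    `batch_discount` is the fractional discount for a provider's batch tier
    (0.5 means half price); leave it at 0 for real-time traffic.
    """
    if not 0.0 <= batch_discount < 1.0:
        raise ValueError("batch_discount must be in [0, 1)")
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return requests_per_month * per_request * (1.0 - batch_discount)


# Hypothetical volumes; prices are the Mistral Nemo row ($0.02 input / $0.04 output).
support_bot = monthly_cost(1_000_000, 300, 120, 0.02, 0.04)
enrichment = monthly_cost(5_000_000, 800, 200, 0.02, 0.04, batch_discount=0.5)
print(f"support bot: ${support_bot:.2f}/month, enrichment with batch: ${enrichment:.2f}/month")
```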

When the cheapest LLM API is the wrong choice

Do not optimize only for cents per million tokens when the model is part of a paid product workflow. A very cheap model can be the wrong pick when it has weak instruction following, insufficient context, missing tool use, poor structured output, unstable latency, or limited availability in your region. For production, run a small evaluation set before switching traffic.

A practical evaluation should include successful completion rate, average retries, output length, refusal rate, latency, and human review cost. If the cheapest candidate has a 15% lower completion rate, a slightly more expensive model may still reduce the total cost of the workflow.
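Those evaluation metrics can be folded into a single cost per successful task. A simplified sketch: every task pays for its attempts and its human review, but only the completed share yields usable output, so total spend is divided by the completion rate. All numbers in the example are hypothetical evaluation results, not measurements of the models on this page.

```python
def cost_per_success(
    cost_per_attempt: float,
    avg_attempts: float,
    completion_rate: float,
    review_cost_per_task: float = 0.0,
) -> float:
    """USD spent per successfully completed task.

    Every task pays for its attempts and its human review, but only
    `completion_rate` of tasks end up usable, so the spend is divided by it.
    """
    if not 0.0 < completion_rate <= 1.0:
        raise ValueError("completion_rate must be in (0, 1]")
    spend_per_task = cost_per_attempt * avg_attempts + review_cost_per_task
    return spend_per_task / completion_rate


# Hypothetical evaluation results: the cheaper model completes 15 points fewer tasks.
cheap = cost_per_success(0.0002, 1.6, 0.80, review_cost_per_task=0.01)
stronger = cost_per_success(0.0006, 1.1, 0.95, review_cost_per_task=0.01)
print(f"cheap: ${cheap:.4f}, stronger: ${stronger:.4f} per successful task")
```

With a $0.01 review cost the stronger model comes out cheaper per success (about $0.0112 vs $0.0129). Set the review cost to zero and the ordering flips (about $0.0007 vs $0.0004), so review cost belongs in the evaluation alongside completion rate.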

Authoritative pricing references

Verify mission-critical decisions against official provider pricing pages: OpenAI, Anthropic, Google Gemini, DeepSeek, and OpenRouter reasoning token notes.

Cheapest LLM API FAQ

What is the cheapest LLM API right now?

For a balanced workload, start with the lowest combined input plus output price in the live table above. For real workloads, verify the winner against your input/output ratio, context length, retries, and quality threshold.

Are free LLM APIs good enough for production?

Free tiers are useful for prototypes and tests, but they often have rate limits, model restrictions, or availability constraints. Production systems should compare paid fallback options before relying on a free tier.

Should I optimize for input price or output price?

Optimize for the larger side of your workload. RAG and summarization are input-heavy. Code generation, agent steps, and long reports are output-heavy. Chat support often needs a balanced total price.

Can a more expensive model be cheaper overall?

Yes. If a stronger model reduces retries, tool calls, hallucination review, or prompt length, its total workflow cost can be lower even when its token price is higher.

How often should I re-check LLM API prices?

Re-check before major usage increases, new model launches, and provider migrations. Prices, free tiers, context windows, and cache discounts change often.

Are prices comparable across providers?

Yes. The tables normalize prices to USD per 1M tokens and link to provider/model pages so you can compare across major API providers.