Gemini API Pricing Calculator for Google AI Costs
Estimate Gemini API spend by model, input tokens, output tokens, request volume, cache usage, batch processing, audio share, and optional Google Search grounding. Use live AI Pricing Hub rows for planning, then verify production rates in Google's official Gemini API pricing table.
Quick answer · Pricing data refreshed 2026-03-13 12:45:29
Gemini API cost is usually driven by output length, audio input, cache strategy, and grounding.
For a text chat or extraction workload, the basic estimate is input tokens multiplied by the model input price plus output tokens multiplied by output price. Gemini workloads become harder to budget when you add audio, long context, Search grounding, or cached documents. This calculator keeps those knobs visible so you can model a realistic monthly bill before moving traffic to Google AI Studio, Vertex AI, OpenRouter, or another provider row.
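The basic estimate described above can be written as a short function. This is a minimal sketch, not the calculator's actual implementation: the function names and the 50,000-request workload are illustrative assumptions, and the $0.10 / $0.40 prices are taken from a Flash Lite row in the table further down.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """USD cost of one request, given prices quoted per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def monthly_cost(requests: int, input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """USD cost of a month of identical requests (no cache, batch, or audio)."""
    return requests * request_cost(input_tokens, output_tokens,
                                   input_price_per_m, output_price_per_m)


# Hypothetical workload: 50,000 requests, 1,000 input and 300 output tokens each.
estimate = monthly_cost(50_000, 1_000, 300, 0.10, 0.40)
print(f"${estimate:.2f}")  # $11.00
```

Audio, caching, batch, and grounding are deliberately left out here so the plain text case stays easy to check by hand.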
Gemini API cost estimator
Model token spend for a monthly workload. The estimate covers model token cost plus optional grounding prompts; it does not include Cloud logging, storage, network, fine-tuning, or enterprise support charges.
How to use the Gemini API pricing calculator
1. Pick the model row
Choose the Gemini model and provider row that matches your deployment path, such as Google AI Studio, Vertex AI, OpenRouter, or a zero-price testing route.
2. Estimate tokens
Use request logs, the Gemini countTokens API, or a rough text estimate. Google's token guide puts one Gemini token at roughly four characters, but production requests should be measured rather than estimated.
3. Add real modifiers
Set cache share, batch share, audio share, and grounded prompt count only when your application actually uses those features. Leave them at zero for a plain chat estimate.
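For step 2, a rough text estimate can be sketched from the four-characters-per-token rule of thumb. The helper name `estimate_tokens` is an illustrative assumption, and the ratio is only a planning figure; measured counts from logs or countTokens should replace it before launch.

```python
import math

CHARS_PER_TOKEN = 4  # rough planning ratio, not an exact tokenizer rule


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "Summarize the attached support ticket in two sentences."
print(estimate_tokens(prompt))
```

Rounding up keeps the estimate slightly conservative, which is usually the safer direction for a budget.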
Live Gemini API pricing rows
Rows come from the AI Pricing Hub model database. Use official Google pricing as the source of truth before launch.
| Model | Provider | Input / 1M | Output / 1M | Cached input | Batch | Context | Modalities |
|---|---|---|---|---|---|---|---|
| Gemini 2.5 Flash Preview 09-2025 Free row | OpenRouter | Free | Free | - | - | 1.0M | image,file,text,audio,video → text |
| Gemini 2.5 Flash Image Preview (Nano Banana) Free row | OpenRouter | Free | Free | - | - | 33k | image,text → image,text |
| Gemini 2.5 Flash Lite Preview 09-2025 | Google AI Studio | $0.10 | $0.40 | - | - | 1.0M | text,image,file,audio,video → text |
| Gemini 2.5 Flash Lite | - | $0.10 | $0.40 | - | - | 1.0M | text,image,file,audio,video → text |
| Gemini 2.5 Flash Preview 09-2025 | Google AI Studio | $0.30 | $2.50 | - | - | 1.0M | image,file,text,audio,video → text |
| Gemini 2.5 Flash | - | $0.30 | $2.50 | - | - | 1.0M | file,image,text,audio,video → text |
| Gemini 2.5 Flash Image Preview (Nano Banana) | Google AI Studio | $0.30 | $2.50 | - | - | 33k | image,text → image,text |
| Gemini 2.5 Flash Image (Nano Banana) | Google AI Studio | $0.30 | $2.50 | - | - | 33k | image,text → image,text |
| Gemini 2.0 Flash Experimental (free) Free row | - | Free | Free | - | - | 1.0M | text,image → text |
| Gemini 2.0 Flash Experimental (free) Free row | OpenRouter | Free | Free | - | - | 1.0M | text,image → text |
| Gemini 1.5 Flash 8B Free row | OpenRouter | Free | Free | - | - | 1.0M | text,image → text |
| Gemini 1.5 Flash Experimental Free row | OpenRouter | Free | Free | - | - | 1.0M | text,image → text |
| Gemini 1.5 Flash Free row | OpenRouter | Free | Free | - | - | 1.0M | text,image → text |
| Gemini 2.0 Flash Lite | - | $0.07 | $0.30 | - | - | 1.0M | text,image,file,audio,video → text |
| Gemini 2.0 Flash | Google AI Studio | $0.10 | $0.40 | - | - | 1.0M | text,image,file,audio,video → text |
| Gemini 3 Flash Preview | - | $0.50 | $3.00 | - | - | 1.0M | text,image,file,audio,video → text |
| Gemini 3 Flash Preview | Google AI Studio | $0.50 | $3.00 | - | - | 1.0M | text,image,file,audio,video → text |
| Gemini 1.5 Pro Free row | OpenRouter | Free | Free | - | - | 2.0M | text,image → text |
| Gemini 2.5 Pro Experimental Free row | OpenRouter | Free | Free | - | - | 1.0M | text,image,file → text |
| Gemini 1.5 Pro Experimental Free row | OpenRouter | Free | Free | - | - | 1.0M | text,image → text |
| Gemini 2.5 Pro | - | $1.25 | $10.00 | - | - | 1.0M | text,image,file,audio,video → text |
| Gemini 2.5 Pro Preview 06-05 | - | $1.25 | $10.00 | - | - | 1.0M | file,image,text,audio → text |
| Gemini 2.5 Pro Preview 05-06 | - | $1.25 | $10.00 | - | - | 1.0M | text,image,file,audio,video → text |
| Gemini 3 Pro Preview | - | $2.00 | $12.00 | - | - | 1.0M | text,image,file,audio,video → text |
| Nano Banana Pro (Gemini 3 Pro Image Preview) | - | $2.00 | $12.00 | - | - | 66k | image,text → image,text |
| Nano Banana Pro (Gemini 3 Pro Image Preview) | Google AI Studio | $2.00 | $12.00 | - | - | 66k | image,text → image,text |
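To compare rows for one workload, the per-1M prices can be fed into the same token formula and sorted. The sketch below copies a handful of prices from the table above; the `rank_rows` helper and the workload figures (2,000 input tokens, 500 output tokens, 100,000 requests) are illustrative assumptions.

```python
# (input price, output price) in USD per 1M tokens, copied from the table above.
ROWS = {
    "Gemini 2.5 Flash Lite": (0.10, 0.40),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 3 Flash Preview": (0.50, 3.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 3 Pro Preview": (2.00, 12.00),
}


def rank_rows(input_tokens: int, output_tokens: int, requests: int):
    """Return (model, monthly USD cost) pairs, cheapest first."""
    costs = {
        name: requests * (input_tokens * inp + output_tokens * out) / 1_000_000
        for name, (inp, out) in ROWS.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


for name, cost in rank_rows(2_000, 500, 100_000):
    print(f"{name:<24} ${cost:,.2f}")
```

The ranking only reflects token price; quality, latency, and rate limits still need a separate check.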
Gemini pricing factors to check before production
| Cost factor | Why it matters | Planning rule |
|---|---|---|
| Output tokens | Generated text can be longer than the prompt and is often priced higher than input. | Track output length separately for chat, coding, extraction, and summarization workloads. |
| Audio input | Google's public Gemini tables commonly distinguish text/image/video input from audio input. | Use a separate audio share estimate instead of applying a pure text token budget to voice products. |
| Context caching | Caching can reduce repeated-context token cost, but explicit cache storage duration can add another line item. | Cache stable system prompts, manuals, policies, and tool definitions only when reuse is high enough. |
| Batch processing | Delayed workloads may qualify for lower batch token prices on supported providers. | Route offline evaluation, tagging, and nightly extraction jobs to batch instead of real-time endpoints. |
| Grounding | Google Search grounding can be priced per grounded prompt or search query after a free allowance. | Only ground prompts that need fresh web context; avoid grounding every chat turn by default. |
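The cache and batch factors in the table can be folded into the token formula as shares. This is one possible way to model them, not the calculator's confirmed logic: `blended_cost`, the cached input price of $0.025 per 1M, and the 0.5 batch multiplier are all hypothetical placeholders, since the rows above list no cached or batch rates.

```python
def blended_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float,
                 cache_share: float = 0.0, cached_input_price: float = 0.0,
                 batch_share: float = 0.0, batch_multiplier: float = 1.0) -> float:
    """USD cost with a share of input read from cache and a share run in batch.

    Prices are per 1M tokens. cache_share is the fraction of input tokens
    billed at cached_input_price; batch_share is the fraction of traffic
    billed at batch_multiplier times the standard rate.
    """
    for share in (cache_share, batch_share):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must be between 0 and 1")
    input_rate = (1 - cache_share) * input_price + cache_share * cached_input_price
    standard = (input_tokens * input_rate + output_tokens * output_price) / 1_000_000
    return standard * ((1 - batch_share) + batch_share * batch_multiplier)


# 100M input and 20M output tokens per month at $0.10 / $0.40 per 1M.
plain = blended_cost(100_000_000, 20_000_000, 0.10, 0.40)
tuned = blended_cost(100_000_000, 20_000_000, 0.10, 0.40,
                     cache_share=0.5, cached_input_price=0.025,  # hypothetical rate
                     batch_share=0.2, batch_multiplier=0.5)      # hypothetical discount
print(f"plain ${plain:.2f}, with cache and batch ${tuned:.2f}")
```

Cache storage fees are not modeled here; if explicit caching bills by storage duration, that belongs in a separate line item.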
Example Gemini API cost scenarios
Support chatbot
A support bot using Gemini 2.5 Flash Lite Preview 09-2025 might average 1,200 input tokens and 450 output tokens per turn. The biggest budget risk is not the prompt; it is retries, long conversation history, and grounding every answer when only a fraction of turns need fresh web context.
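As a worked sketch of this scenario, the per-turn figures above can be combined with the $0.10 / $0.40 per-1M prices listed for that row. The 200,000 monthly turns, the 10% retry rate, and the tripled history size are hypothetical values chosen only to show how the risks move the total.

```python
def chatbot_monthly_cost(turns: int, input_tokens: int = 1_200,
                         output_tokens: int = 450,
                         input_price: float = 0.10, output_price: float = 0.40,
                         retry_rate: float = 0.0) -> float:
    """Monthly USD cost; retry_rate is the fraction of turns sent a second time."""
    per_turn = (input_tokens * input_price
                + output_tokens * output_price) / 1_000_000
    return turns * (1 + retry_rate) * per_turn


base = chatbot_monthly_cost(200_000)                            # hypothetical volume
with_retries = chatbot_monthly_cost(200_000, retry_rate=0.10)   # hypothetical retries
long_history = chatbot_monthly_cost(200_000, input_tokens=3_600)
print(f"base ${base:.2f}, retries ${with_retries:.2f}, long history ${long_history:.2f}")
```

Under these assumptions, letting conversation history triple the input size costs more than a 10% retry rate, which is why history trimming is worth tracking separately.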
Document extraction
A document extraction workflow may send 20,000 input tokens and only 700 output tokens per request. For this pattern, input price, context window, cache hits, and batch eligibility usually matter more than headline output price.
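A quick breakdown shows why input price dominates this pattern. The scenario does not name a model, so the sketch assumes the $0.30 / $2.50 per-1M prices from a Gemini 2.5 Flash row; `cost_breakdown` is an illustrative helper.

```python
def cost_breakdown(input_tokens: int, output_tokens: int,
                   input_price: float, output_price: float) -> dict:
    """Per-request USD cost split into input and output, plus the input share."""
    input_cost = input_tokens * input_price / 1_000_000
    output_cost = output_tokens * output_price / 1_000_000
    total = input_cost + output_cost
    return {
        "input": input_cost,
        "output": output_cost,
        "total": total,
        "input_share": input_cost / total if total else 0.0,
    }


# 20,000 input and 700 output tokens per request, assumed Flash prices.
result = cost_breakdown(20_000, 700, 0.30, 2.50)
print(f"total ${result['total']:.5f}, input share {result['input_share']:.0%}")
```

Even with an output price more than eight times the input price, roughly three quarters of the per-request cost comes from input under these assumptions.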
Voice or meeting analysis
Voice workloads should not copy a text-only estimate. Audio tokens, transcript length, summarization output, and whether you keep cached meeting context can move the bill even when request count is stable.
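A voice estimate can start from audio duration instead of text length. The sketch below assumes 32 tokens per second of audio, a conversion to confirm in Google's documentation, and uses a hypothetical audio input price of $0.30 per 1M tokens; neither figure comes from the rows on this page.

```python
AUDIO_TOKENS_PER_SECOND = 32  # assumption: verify against Google's token guide


def meeting_cost(audio_seconds: float, output_tokens: int,
                 audio_price_per_m: float, output_price_per_m: float,
                 tokens_per_second: int = AUDIO_TOKENS_PER_SECOND) -> float:
    """USD cost of analysing one recording and generating a summary."""
    audio_tokens = audio_seconds * tokens_per_second
    return (audio_tokens * audio_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# One-hour meeting with a 1,500-token summary.
# audio_price_per_m=0.30 is a hypothetical placeholder rate.
cost = meeting_cost(3_600, 1_500, audio_price_per_m=0.30, output_price_per_m=0.40)
print(f"${cost:.5f} per meeting")
```

Keeping the audio rate as its own parameter avoids silently pricing audio at the text input rate.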
Grounded research assistant
Research assistants often need Search grounding, but not every step needs it. Split grounded prompts from normal reasoning prompts and give each one a separate usage target.
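Splitting grounded from ungrounded prompts makes the grounding line item easy to isolate. In this sketch the free allowance of 10,000 grounded prompts and the $35 per 1,000 rate are hypothetical placeholders, and `grounding_cost` is an illustrative helper; the real allowance and rate should come from Google's pricing table.

```python
def grounding_cost(total_prompts: int, grounded_share: float,
                   free_allowance: int, price_per_1000: float) -> float:
    """USD charge for grounded prompts beyond a free allowance."""
    if not 0.0 <= grounded_share <= 1.0:
        raise ValueError("grounded_share must be between 0 and 1")
    grounded = round(total_prompts * grounded_share)
    billable = max(0, grounded - free_allowance)
    return billable / 1_000 * price_per_1000


# 60,000 prompts per month; hypothetical allowance and rate.
selective = grounding_cost(60_000, 0.2, free_allowance=10_000, price_per_1000=35.0)
everything = grounding_cost(60_000, 1.0, free_allowance=10_000, price_per_1000=35.0)
print(f"20% grounded ${selective:.2f}, all grounded ${everything:.2f}")
```

With these placeholder numbers, grounding every prompt costs 25 times more than grounding one in five, which is the gap a per-type usage target is meant to expose.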
Alternatives to compare with Gemini pricing
Gemini is often strong for multimodal and long-context work, but the cheapest production choice depends on quality, latency, output length, and provider routing.
| Alternative | Brand | Input / 1M | Output / 1M | Best comparison use |
|---|---|---|---|---|
| GTE-Base | Other | $0.0050 | Free | Embedding model; compare for retrieval or similarity steps, not for chat or text generation. |
| E5-Base-v2 | Other | $0.0050 | Free | Embedding model; compare for retrieval or similarity steps, not for chat or text generation. |
| paraphrase-MiniLM-L6-v2 | Other | $0.0050 | Free | Embedding model; compare for retrieval or similarity steps, not for chat or text generation. |
| all-MiniLM-L12-v2 | Other | $0.0050 | Free | Embedding model; compare for retrieval or similarity steps, not for chat or text generation. |
| bge-base-en-v1.5 | Other | $0.0050 | Free | Embedding model; compare for retrieval or similarity steps, not for chat or text generation. |
| multi-qa-mpnet-base-dot-v1 | Other | $0.0050 | Free | Embedding model; compare for retrieval or similarity steps, not for chat or text generation. |
| all-mpnet-base-v2 | Other | $0.0050 | Free | Embedding model; compare for retrieval or similarity steps, not for chat or text generation. |
| all-MiniLM-L6-v2 | Other | $0.0050 | Free | Embedding model; compare for retrieval or similarity steps, not for chat or text generation. |
| Qwen3 Embedding 8B | Alibaba Qwen | $0.01 | Free | Embedding model; compare for retrieval or similarity steps, not for chat or text generation. |
Limitations and billing notes
- The calculator uses per-1M token rows from this site's database. Official Google Gemini pricing remains the final billing reference.
- Free tier availability is not the same as unlimited free usage. Rate limits, regional availability, paid-tier setup, and feature restrictions can apply.
- Search grounding, image generation, live audio, embeddings, fine-tuning, cache storage, and cloud infrastructure may have separate pricing rules.
- Token estimates based on characters are useful for planning, but production budgets should use actual token counts from request logs or the API.
- Provider rows can differ. A Gemini model through Google AI Studio, Vertex AI, OpenRouter, or another provider may have different limits, prices, and terms.
Official references to verify
Use these sources before deploying a budget-sensitive Gemini workload.
- Gemini Developer API pricing for current model rates, free tier notes, caching, and grounding charges.
- Gemini token counting guide for token estimation and countTokens guidance.
- Gemini API rate limits for quota and traffic planning.