Gemini API Pricing Calculator for Google AI Costs

Estimate Gemini API spend by model, input tokens, output tokens, request volume, cache usage, batch processing, audio share, and optional Google Search grounding. Use live AI Pricing Hub rows for planning, then verify production rates in Google's official Gemini API pricing table.

Quick answer · Pricing data refreshed 2026-03-13 12:45:29

Gemini API cost is usually driven by output length, audio input, cache strategy, and grounding.

For a text chat or extraction workload, the basic estimate is input tokens multiplied by the model input price plus output tokens multiplied by output price. Gemini workloads become harder to budget when you add audio, long context, Search grounding, or cached documents. This calculator keeps those knobs visible so you can model a realistic monthly bill before moving traffic to Google AI Studio, Vertex AI, OpenRouter, or another provider row.
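
The basic estimate above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function names are hypothetical, the prices are the $0.10 input / $0.40 output per 1M tokens shown for the default calculator row, and the token counts and request volume are made-up planning values.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """USD cost of one request; prices are USD per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def monthly_cost(requests_per_month, input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Per-request cost scaled by expected monthly request volume."""
    return requests_per_month * request_cost(
        input_tokens, output_tokens, input_price_per_m, output_price_per_m)


# Assumed workload: 1,000 input and 500 output tokens, 100,000 requests per month.
per_request = request_cost(1_000, 500, 0.10, 0.40)
per_month = monthly_cost(100_000, 1_000, 500, 0.10, 0.40)
print(f"per request: ${per_request:.6f}, per month: ${per_month:.2f}")
```

Audio, caching, batch, and grounding each add a term to this formula, which is why they are kept as separate inputs.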

Tracked Gemini rows: 32 (18 free or zero-price rows found in the pricing database)
Default calculator model: Gemini 2.5 Flash Lite Preview 09-2025 ($0.10 input / $0.40 output per 1M tokens)
Lowest paid blended row: Gemini 2.0 Flash Lite ($0.38 combined per 1M)

Gemini API cost estimator

Model token spend for a monthly workload. The estimate covers model token cost plus optional grounding prompts; it does not include Cloud logging, storage, network, fine-tuning, or enterprise support charges.

Calculator inputs:

  • Cache share: percent of input tokens charged at a cache-read price when the row has one.
  • Batch share: percent of requests eligible for batch prices when available.
  • Audio share: use when audio tokens are priced above text/image/video input.
  • Audio input price: official Gemini rows often price audio input higher than text input; adjust per model.
  • Grounded prompts: only count prompts using Google Search grounding.
  • Grounding price: Google's public tables vary by model family; edit this value.

Calculator outputs: monthly estimate, per-request cost, token cost, and grounding cost.
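
One way these inputs could combine into the four outputs is sketched below. How the page's calculator actually combines them is not documented here, so treat every modelling choice as an assumption: `PricingRow` and `estimate_monthly` are hypothetical names, shares are fractions from 0 to 1 rather than percents, batch is modelled as a fractional discount on token cost, and the cache-read and grounding prices in the example are placeholders rather than Google rates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PricingRow:
    """One pricing row; all prices are USD per 1M tokens."""
    input_price: float
    output_price: float
    cache_read_price: Optional[float] = None   # None: row has no cache-read price
    audio_input_price: Optional[float] = None  # None: audio billed like other input
    batch_discount: float = 0.0                # assumed fraction off token cost in batch


def estimate_monthly(row, requests, input_tokens, output_tokens,
                     cache_share=0.0, batch_share=0.0, audio_share=0.0,
                     grounded_prompts=0, grounding_price_per_1k=0.0):
    # Fall back to the plain input price when the row lacks a special price.
    cache_price = row.input_price if row.cache_read_price is None else row.cache_read_price
    audio_price = row.input_price if row.audio_input_price is None else row.audio_input_price

    # Split input tokens: cached first, then audio vs. other among the rest.
    cached = input_tokens * cache_share
    uncached = input_tokens - cached
    audio = uncached * audio_share
    other = uncached - audio

    per_request_tokens = (cached * cache_price + audio * audio_price
                          + other * row.input_price
                          + output_tokens * row.output_price) / 1_000_000
    # Only the batch-eligible share of requests receives the discount.
    token_cost = requests * per_request_tokens * (1 - batch_share * row.batch_discount)
    grounding_cost = grounded_prompts / 1000 * grounding_price_per_1k
    monthly = token_cost + grounding_cost
    return {
        "monthly": monthly,
        "per_request": monthly / requests if requests else 0.0,
        "token_cost": token_cost,
        "grounding_cost": grounding_cost,
    }


# Default row prices from this page; the workload numbers are assumptions.
plain = estimate_monthly(PricingRow(0.10, 0.40), 100_000, 1_200, 450)

# Hypothetical cache-read price and grounding price, for illustration only.
tuned = estimate_monthly(PricingRow(0.10, 0.40, cache_read_price=0.025),
                         100_000, 1_200, 450, cache_share=0.5,
                         grounded_prompts=2_000, grounding_price_per_1k=10.0)
print(plain["monthly"], tuned["monthly"])
```

With every modifier left at zero the function reduces to the plain input-plus-output formula, which matches the advice to leave unused features at zero.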

How to use the Gemini API pricing calculator

1. Pick the model row

Choose the Gemini model and provider row that matches your deployment path, such as Google AI Studio, Vertex AI, OpenRouter, or a zero-price testing route.

2. Estimate tokens

Use request logs, the Gemini countTokens API, or a rough text estimate. Google's token guide says one Gemini token is roughly four characters, but production requests should be measured rather than estimated.
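
For the rough text estimate, a sketch of the four-characters-per-token rule might look like this. The function name and sample prompt are illustrative, and the ratio is only the approximation quoted above, so expect real counts to differ, especially for non-English text, code, or multimodal input.

```python
import math

CHARS_PER_TOKEN = 4  # rough ratio quoted from Google's token guide


def estimate_tokens(text: str) -> int:
    """Planning-only token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "Summarize the attached support ticket in two sentences."
print(estimate_tokens(prompt))
```

Use this only to size a first budget; replace it with measured counts from request logs or the countTokens API before committing to a production number.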

3. Add real modifiers

Set cache share, batch share, audio share, and grounded prompt count only when your application actually uses those features. Leave them at zero for a plain chat estimate.

Live Gemini API pricing rows

Rows come from the AI Pricing Hub model database. Use official Google pricing as the source of truth before launch.

Model | Provider | Input / 1M | Output / 1M | Cached input | Batch | Context | Modalities
Gemini 2.5 Flash Preview 09-2025 (Free row) | OpenRouter | Free | Free | - | - | 1.0M | image,file,text,audio,video → text
Gemini 2.5 Flash Image Preview (Nano Banana) (Free row) | OpenRouter | Free | Free | - | - | 33k | image,text → image,text
Gemini 2.5 Flash Lite Preview 09-2025 | Google AI Studio | $0.10 | $0.40 | - | - | 1.0M | text,image,file,audio,video → text
Gemini 2.5 Flash Lite | Google | $0.10 | $0.40 | - | - | 1.0M | text,image,file,audio,video → text
Gemini 2.5 Flash Preview 09-2025 (Free row) | Google AI Studio | $0.30 | $2.50 | - | - | 1.0M | image,file,text,audio,video → text
Gemini 2.5 Flash | Google | $0.30 | $2.50 | - | - | 1.0M | file,image,text,audio,video → text
Gemini 2.5 Flash Image Preview (Nano Banana) (Free row) | Google AI Studio | $0.30 | $2.50 | - | - | 33k | image,text → image,text
Gemini 2.5 Flash Image (Nano Banana) | Google AI Studio | $0.30 | $2.50 | - | - | 33k | image,text → image,text
Gemini 2.0 Flash Experimental (free) (Free row) | Google | Free | Free | - | - | 1.0M | text,image → text
Gemini 2.0 Flash Experimental (free) (Free row) | OpenRouter | Free | Free | - | - | 1.0M | text,image → text
Gemini 1.5 Flash 8B (Free row) | OpenRouter | Free | Free | - | - | 1.0M | text,image → text
Gemini 1.5 Flash 8B (Free row) | OpenRouter | Free | Free | - | - | 1.0M | text,image → text
Gemini 1.5 Flash Experimental (Free row) | OpenRouter | Free | Free | - | - | 1.0M | text,image → text
Gemini 1.5 Flash Experimental (Free row) | OpenRouter | Free | Free | - | - | 1.0M | text,image → text
Gemini 1.5 Flash (Free row) | OpenRouter | Free | Free | - | - | 1.0M | text,image → text
Gemini 1.5 Flash (Free row) | OpenRouter | Free | Free | - | - | 1.0M | text,image → text
Gemini 2.0 Flash Lite | Google | $0.07 | $0.30 | - | - | 1.0M | text,image,file,audio,video → text
Gemini 2.0 Flash | Google AI Studio | $0.10 | $0.40 | - | - | 1.0M | text,image,file,audio,video → text
Gemini 3 Flash Preview | Google | $0.50 | $3.00 | - | - | 1.0M | text,image,file,audio,video → text
Gemini 3 Flash Preview | Google AI Studio | $0.50 | $3.00 | - | - | 1.0M | text,image,file,audio,video → text
Gemini 1.5 Pro (Free row) | OpenRouter | Free | Free | - | - | 2.0M | text,image → text
Gemini 1.5 Pro (Free row) | OpenRouter | Free | Free | - | - | 2.0M | text,image → text
Gemini 2.5 Pro Experimental (Free row) | OpenRouter | Free | Free | - | - | 1.0M | text,image,file → text
Gemini 2.5 Pro Experimental (Free row) | OpenRouter | Free | Free | - | - | 1.0M | text,image,file → text
Gemini 1.5 Pro Experimental (Free row) | OpenRouter | Free | Free | - | - | 1.0M | text,image → text
Gemini 1.5 Pro Experimental (Free row) | OpenRouter | Free | Free | - | - | 1.0M | text,image → text
Gemini 2.5 Pro | Google | $1.25 | $10.00 | - | - | 1.0M | text,image,file,audio,video → text
Gemini 2.5 Pro Preview 06-05 | Google | $1.25 | $10.00 | - | - | 1.0M | file,image,text,audio → text
Gemini 2.5 Pro Preview 05-06 | Google | $1.25 | $10.00 | - | - | 1.0M | text,image,file,audio,video → text
Gemini 3 Pro Preview | Google | $2.00 | $12.00 | - | - | 1.0M | text,image,file,audio,video → text
Nano Banana Pro (Gemini 3 Pro Image Preview) | Google | $2.00 | $12.00 | - | - | 66k | image,text → image,text
Nano Banana Pro (Gemini 3 Pro Image Preview) | Google AI Studio | $2.00 | $12.00 | - | - | 66k | image,text → image,text

Gemini pricing factors to check before production

Cost factor | Why it matters | Planning rule
Output tokens | Generated text can be longer than the prompt and is often priced higher than input. | Track output length separately for chat, coding, extraction, and summarization workloads.
Audio input | Google's public Gemini tables commonly distinguish text/image/video input from audio input. | Use a separate audio share estimate instead of applying a pure text token budget to voice products.
Context caching | Caching can reduce repeated-context token cost, but explicit cache storage duration can add another line item. | Cache stable system prompts, manuals, policies, and tool definitions only when reuse is high enough.
Batch processing | Delayed workloads may qualify for lower batch token prices on supported providers. | Route offline evaluation, tagging, and nightly extraction jobs to batch instead of real-time endpoints.
Grounding | Google Search grounding can be priced per grounded prompt or search query after a free allowance. | Only ground prompts that need fresh web context; avoid grounding every chat turn by default.
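
To make "when reuse is high enough" concrete for context caching, a break-even sketch is shown below. It assumes the cached context is stored for a fixed number of hours, that storage is billed per 1M tokens per hour, and that the initial cache write costs the same as a normal input read; the function name, the cache-read price, and the storage price are all hypothetical placeholders. Only the $0.30 input price comes from the Gemini 2.5 Flash row on this page.

```python
import math


def cache_break_even_reuses(input_price, cache_read_price,
                            storage_price_per_m_hour, hours_stored):
    """Reuses of the cached context needed before caching saves money.

    Prices are USD per 1M tokens. Context size cancels out because both the
    storage cost and the per-reuse saving scale with the same token count.
    """
    saving_per_reuse = input_price - cache_read_price
    if saving_per_reuse <= 0:
        raise ValueError("cache-read price must be below the input price")
    storage_cost = storage_price_per_m_hour * hours_stored
    return math.ceil(storage_cost / saving_per_reuse)


# $0.30 input is from the table; 0.075 and 1.00 are placeholder prices.
reuses = cache_break_even_reuses(0.30, 0.075, 1.00, 24)
print(reuses)
```

If the expected number of requests that reuse the context during the storage window is below this figure, sending the context uncached is likely cheaper under these assumptions.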

Example Gemini API cost scenarios

Support chatbot

A support bot using Gemini 2.5 Flash Lite Preview 09-2025 might average 1,200 input tokens and 450 output tokens per turn. The biggest budget risk is not the prompt; it is retries, long conversation history, and grounding every answer when only a fraction of turns need fresh web context.
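
The effect of conversation history and retries can be sketched as follows. The prices are the $0.10 / $0.40 per 1M tokens listed for this row; everything else is an assumption made for illustration: the 1,200 input tokens are treated as the first turn of a conversation, each later turn resends the full history plus a hypothetical 150-token user message, and the 10% retry rate is invented.

```python
INPUT_PRICE = 0.10   # USD per 1M input tokens (row price from this page)
OUTPUT_PRICE = 0.40  # USD per 1M output tokens (row price from this page)


def turn_cost(input_tokens, output_tokens):
    """USD cost of a single chat turn."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


def conversation_cost(turns, first_input=1_200, output=450, user_msg=150):
    """Cost when every turn resends the whole history as input."""
    total = 0.0
    input_tokens = first_input
    for _ in range(turns):
        total += turn_cost(input_tokens, output)
        # Next turn carries the previous reply plus a new user message.
        input_tokens += output + user_msg
    return total


def with_retries(cost, retry_rate):
    """Scale a cost by the share of requests that are sent again."""
    return cost * (1 + retry_rate)


flat = 10 * turn_cost(1_200, 450)   # ten independent turns
growing = conversation_cost(10)     # ten turns with accumulated history
print(f"flat ${flat:.4f}, with history ${growing:.4f}, "
      f"with 10% retries ${with_retries(growing, 0.10):.4f}")
```

Under these assumptions a ten-turn conversation costs nearly twice as much as ten independent turns, which is why history trimming or summarization often matters more than prompt tuning.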

Document extraction

A document extraction workflow may send 20,000 input tokens and only 700 output tokens per request. For this pattern, input price, context window, cache hits, and batch eligibility usually matter more than headline output price.
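
A quick check of that claim, using the 20,000 / 700 token split above and four paid rows from this page's pricing table, is sketched below. The row selection and the `input_share` helper are illustrative; cache and batch effects are left out because the table lists no cached or batch prices for these rows.

```python
# (input, output) prices in USD per 1M tokens, copied from this page's table.
ROWS = {
    "Gemini 2.0 Flash Lite": (0.07, 0.30),
    "Gemini 2.5 Flash Lite": (0.10, 0.40),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 2.5 Pro": (1.25, 10.00),
}


def input_share(input_tokens, output_tokens, input_price, output_price):
    """Return (total USD cost per request, fraction of it due to input)."""
    input_cost = input_tokens * input_price / 1_000_000
    output_cost = output_tokens * output_price / 1_000_000
    total = input_cost + output_cost
    return total, input_cost / total


for name, (in_price, out_price) in ROWS.items():
    total, share = input_share(20_000, 700, in_price, out_price)
    print(f"{name}: ${total:.5f} per request, {share:.0%} from input")
```

For every row listed, input accounts for more than three quarters of the per-request cost, so discounts that apply to input tokens have the most leverage for this workload shape.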

Voice or meeting analysis

Voice workloads should not copy a text-only estimate. Audio tokens, transcript length, summarization output, and whether you keep cached meeting context can move the bill even when request count is stable.

Grounded research assistant

Research assistants often need Search grounding, but not every step needs it. Split grounded prompts from normal reasoning prompts and give each one a separate usage target.
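
A sketch of that split, with a free allowance applied before any grounding charge, might look like this. The function name, the 20% grounded share, the 1,500-prompt allowance, and the $10 per 1,000 grounded prompts are all hypothetical; real allowances and prices vary by model family and may be counted per day or per search query rather than per month.

```python
def grounding_cost(grounded_prompts, free_allowance, price_per_1k):
    """USD cost of grounded prompts beyond the free allowance."""
    billable = max(0, grounded_prompts - free_allowance)
    return billable / 1000 * price_per_1k


total_steps = 50_000      # assumed monthly agent steps
grounded_fraction = 0.20  # assumed share of steps that need fresh web context
grounded = int(total_steps * grounded_fraction)

selective = grounding_cost(grounded, free_allowance=1_500, price_per_1k=10.0)
everything = grounding_cost(total_steps, free_allowance=1_500, price_per_1k=10.0)
print(f"selective: ${selective:.2f}, ground every step: ${everything:.2f}")
```

Tracking the grounded fraction as its own usage target makes it visible when a prompt change causes far more steps to trigger Search than planned.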

Alternatives to compare with Gemini pricing

Gemini is often strong for multimodal and long-context work, but the cheapest production choice depends on quality, latency, output length, and provider routing.

Alternative | Brand | Input / 1M | Output / 1M | Best comparison use
GTE-Base | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
E5-Base-v2 | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
paraphrase-MiniLM-L6-v2 | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
all-MiniLM-L12-v2 | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
bge-base-en-v1.5 | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
multi-qa-mpnet-base-dot-v1 | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
all-mpnet-base-v2 | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
all-MiniLM-L6-v2 | Other | $0.0050 | Free | Baseline price and latency comparison for chat or extraction workloads.
Qwen3 Embedding 8B | Alibaba Qwen | $0.01 | Free | Baseline price and latency comparison for chat or extraction workloads.
Qwen3 Embedding 8B | Alibaba Qwen | $0.01 | Free | Baseline price and latency comparison for chat or extraction workloads.

Limitations and billing notes

  • The calculator uses per-1M token rows from this site's database. Official Google Gemini pricing remains the final billing reference.
  • Free tier availability is not the same as unlimited free usage. Rate limits, regional availability, paid-tier setup, and feature restrictions can apply.
  • Search grounding, image generation, live audio, embeddings, fine-tuning, cache storage, and cloud infrastructure may have separate pricing rules.
  • Token estimates based on characters are useful for planning, but production budgets should use actual token counts from request logs or the API.
  • Provider rows can differ. A Gemini model through Google AI Studio, Vertex AI, OpenRouter, or another provider may have different limits, prices, and terms.

Gemini API pricing FAQ

How do I calculate Gemini API cost?

Multiply input tokens by the model input price, output tokens by the output price, and then add feature-specific costs such as Search grounding or cache storage when used. For monthly spend, multiply the per-request estimate by expected request volume.

Is the Gemini API free?

Some Gemini models or tiers may include free usage, but free tier access is constrained by rate limits, region, product tier, and feature availability. Treat free rows as testing capacity, not as a production budget guarantee.

Why does audio input need a separate estimate?

Google pricing tables often separate text/image/video input from audio input. If your app processes calls, meetings, or voice notes, add an audio input share instead of using a text-only estimate.

When does context caching reduce cost?

Caching helps when many requests reuse the same long prompt, document, policy, or tool context. It is less useful for one-off prompts, and explicit cache storage can add cost if the cached context is large or kept for too long.

When should I use batch pricing?

Use batch pricing for offline work that can wait, such as nightly extraction, evaluation, tagging, or backfills. Do not route latency-sensitive chat turns through batch just to reduce token cost.

Is Gemini cheaper than other models?

It depends on the task. Gemini Flash rows can be cost-effective for high-volume and multimodal workloads, but OpenAI, Claude, GPT-OSS, or other models may be cheaper after quality, retries, output length, and routing constraints are included.