Major LLM API Pricing Comparison [July 2026]
We reference each provider's official pricing page directly and list only figures that could be re-verified from that page at the time of writing.
Models or figures we could not verify are omitted from the tables; please refer to the official pricing page instead.
All prices are in USD / 1M tokens. Use our Token Cost Calculator to estimate costs from your own token counts.
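Since every figure on this page is quoted per 1M tokens, the cost of a request is its token count divided by 1,000,000 and multiplied by the listed rate, summed over input and output. The sketch below shows that arithmetic. The function name `request_cost` and the token counts are illustrative assumptions; the rates are the GPT-5.4 figures from the OpenAI table.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Return the USD cost of one request, given prices in USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request: 12,000 input tokens and 800 output tokens on GPT-5.4
# ($2.50 input / $15.00 output per 1M tokens, per the OpenAI table).
cost = request_cost(12_000, 800, input_price=2.50, output_price=15.00)
print(f"${cost:.4f}")
```

Cached input, batch discounts and long-context tiers change the rate that is plugged in, not the formula itself.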
OpenAI
📋 Conditions: Standard processing rate for context length under 270K tokens
| Model | Input | Cached Input | Output | Notes |
|---|---|---|---|---|
| GPT-5.5 New | $5.00 | $0.50 | $30.00 | Flagship model. Standard rate for context length under 270K |
| GPT-5.4 | $2.50 | $0.25 | $15.00 | Balanced model. Standard rate for context length under 270K |
| GPT-5.4 mini | $0.75 | $0.075 | $4.50 | Lightweight, low-cost tier. Standard rate for context length under 270K |
For other models (GPT-5 nano, GPT-5.4 Pro, etc.), see the official pricing page.
Primary source: developers.openai.com/api/docs/pricing (screenshot captured and retrieved 2026-07-03 JST). Web-search re-verification was not possible because of bot protection, so these figures rest on the screenshot under the screenshot-fact exception rule approved by okamo.
Anthropic (Claude)
📋 Conditions: Standard API (direct first-party API) / text input / standard rate without cache
| Model | Input (Base Input) | Cache Write (5m) | Cache Hit | Output | Notes |
|---|---|---|---|---|---|
| Claude Fable 5 New | $10.00 | $12.50 | $1.00 | $50.00 | Released 2026-06-09. Uses a new tokenizer (see note below). Flat standard rate across the full 1M-token context (no long-context premium) |
| Claude Sonnet 5 New | $2.00 (until 2026-08-31); $3.00 (from 2026-09-01) | $2.50 (until 2026-08-31); $3.75 (from 2026-09-01) | $0.20 (until 2026-08-31); $0.30 (from 2026-09-01) | $10.00 (until 2026-08-31); $15.00 (from 2026-09-01) | Released 2026-06-30. Opus models from 4.7 onward, Fable 5, Mythos 5, Mythos Preview and Sonnet 5 use a new tokenizer. The same text can yield roughly 1.0–1.35x the token count depending on content type |
| Claude Opus 4.8 | $5.00 | $6.25 | $0.50 | $25.00 | Top-tier flagship model |
| Claude Sonnet 4.6 | $3.00 | $3.75 | $0.30 | $15.00 | Balanced model. Flat rate across the full 1M context |
| Claude Haiku 4.5 | $1.00 | $1.25 | $0.10 | $5.00 | For high-volume processing and classification tasks |
For other models (Opus 4.7 and earlier, Haiku 3.5, Mythos 5, etc.), see the official pricing page.
Primary source: platform.claude.com/docs/en/about-claude/pricing (retrieved 2026-07-03 JST)
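The Claude Sonnet 5 row carries two rate cards split by date, so a cost estimate needs to pick the card in effect on the billing date. The following is a minimal sketch of that lookup. The names `sonnet5_rates` and `estimate_cost` are hypothetical, and the exact cutover time and time zone are an assumption, since the table only gives calendar dates.

```python
from datetime import date

# Claude Sonnet 5 rates in USD per 1M tokens, copied from the table above.
INTRO_RATES = {"input": 2.00, "cache_write_5m": 2.50, "cache_hit": 0.20, "output": 10.00}
STANDARD_RATES = {"input": 3.00, "cache_write_5m": 3.75, "cache_hit": 0.30, "output": 15.00}
STANDARD_FROM = date(2026, 9, 1)  # first day the higher rates apply


def sonnet5_rates(on: date) -> dict[str, float]:
    """Pick the rate card in effect on a given calendar date."""
    return STANDARD_RATES if on >= STANDARD_FROM else INTRO_RATES


def estimate_cost(on: date, input_tokens: int, output_tokens: int) -> float:
    """USD cost of uncached input plus output at the rates in effect on `on`."""
    rates = sonnet5_rates(on)
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


# Hypothetical workload: 100M input tokens and 20M output tokens.
print(estimate_cost(date(2026, 8, 31), 100_000_000, 20_000_000))
print(estimate_cost(date(2026, 9, 1), 100_000_000, 20_000_000))
```

For this workload the same traffic moves from $400 to $600 once the standard rates apply, which is worth factoring into budgets that span the cutover.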
Note: Fable 5 cache pricing matches Anthropic's published multipliers relative to base input (5-minute write 1.25x, cache hit 0.1x).
Official notes we could directly confirm
- Batch API (Message Batches API) gives a 50% discount across all models
- Cache write (5 min) = 1.25x standard input; cache write (1 hour) = 2x
- Cache hit = 0.1x standard input (90% discount)
- Fable 5, Mythos 5, Opus 4.8, Sonnet 5 and Sonnet 4.6 use a flat rate across the full 1M-token context (no long-context premium)
- Opus models from 4.7 onward, Fable 5, Mythos 5, Mythos Preview and Sonnet 5 use a new tokenizer. The same text can yield roughly 1.0–1.35x the token count depending on content type (per platform.claude.com/docs/en/about-claude/pricing and anthropic.com/news/claude-opus-4-7)
- US-only data residency adds a 1.1x multiplier to standard pricing
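The multipliers in the notes above are enough to derive the input-side rate card from a model's base input price, and to bracket the effect of the new tokenizer. The sketch below does both. The function names are hypothetical, and two points are assumptions rather than confirmed facts: that the US-only multiplier applies uniformly to every input category, and that the 1.0–1.35x range is measured against the token count the same text produced on the older tokenizer.

```python
CACHE_WRITE_5M_MULT = 1.25   # 5-minute cache write, relative to base input
CACHE_WRITE_1H_MULT = 2.0    # 1-hour cache write, relative to base input
CACHE_HIT_MULT = 0.10        # cache hit, relative to base input
US_ONLY_MULT = 1.1           # US-only data residency (assumed to apply uniformly)
TOKENIZER_RANGE = (1.0, 1.35)  # token-count multiplier range for the new tokenizer


def rate_card(base_input: float, us_only: bool = False) -> dict[str, float]:
    """Derive input-side prices (USD per 1M tokens) from the base input price."""
    residency = US_ONLY_MULT if us_only else 1.0
    multipliers = {
        "input": 1.0,
        "cache_write_5m": CACHE_WRITE_5M_MULT,
        "cache_write_1h": CACHE_WRITE_1H_MULT,
        "cache_hit": CACHE_HIT_MULT,
    }
    # Round to 4 decimals to hide floating-point noise such as 0.30000000000000004.
    return {name: round(base_input * m * residency, 4) for name, m in multipliers.items()}


def tokenizer_cost_range(old_token_count: int, price_per_mtok: float) -> tuple[float, float]:
    """Return (low, high) USD cost for text that used `old_token_count` tokens before."""
    low, high = (old_token_count * m * price_per_mtok / 1_000_000 for m in TOKENIZER_RANGE)
    return low, high


print(rate_card(10.00))                       # Claude Fable 5 base input
print(tokenizer_cost_range(2_000_000, 5.00))  # 2M old-tokenizer tokens at $5.00 input
```

Running `rate_card` on the base input prices in the table reproduces the Cache Write (5m) and Cache Hit columns, which is a quick consistency check when the table is updated.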
Google (Gemini Developer API)
📋 Conditions: Gemini Developer API paid tier / Standard tier / text, image and video input / standard API
| Model | Input (text/image/video) | Input (prompt >200K) | Output | Output (prompt >200K) | Notes |
|---|---|---|---|---|---|
| Gemini 3.5 Flash New | $1.50 | — | $9.00 | — | GA since May 2026. Built for agentic/coding use cases. Context cache read $0.15 |
| Gemini 3.1 Pro Preview | $2.00 | $4.00 | $12.00 | $18.00 | Preview pricing, subject to change |
| Gemini 3.1 Flash-Lite | $0.25 | — | $1.50 | — | Audio input $0.50/1M. Cost-focused fast model |
| Gemini 2.5 Pro | $1.25 | $2.50 | $10.00 | $15.00 | |
| Gemini 2.5 Flash | $0.30 | — | $2.50 | — | Audio input $1.00/1M |
| Gemini 2.5 Flash-Lite | $0.10 | — | $0.40 | — | Audio input $0.30/1M |
For other models (Gemini 3 Flash, etc.), see the official pricing page.
Primary source: ai.google.dev/gemini-api/docs/pricing / Screenshot: captured 2026-07-03 (retrieved 2026-07-04 JST). Preview pricing, free-tier terms and per-modality rates may change, so always check the official page before contracting.
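For the models with a ">200K" column, the applicable rate depends on prompt length. A rough sketch of that tier selection follows. It assumes that once the prompt exceeds 200K tokens the higher rate applies to the whole request (input and output), not only to the tokens above the threshold; confirm this against the official page before relying on it. The names `GEMINI_RATES` and `gemini_cost` are hypothetical.

```python
LONG_PROMPT_THRESHOLD = 200_000  # tokens; tier boundary from the table header

# (input, input >200K, output, output >200K) in USD per 1M tokens, from the table above.
GEMINI_RATES = {
    "Gemini 3.1 Pro Preview": (2.00, 4.00, 12.00, 18.00),
    "Gemini 2.5 Pro": (1.25, 2.50, 10.00, 15.00),
}


def gemini_cost(model: str, prompt_tokens: int, output_tokens: int) -> float:
    """USD cost of one request, choosing the tier from the prompt length."""
    in_std, in_long, out_std, out_long = GEMINI_RATES[model]
    # Assumption: the long-prompt tier reprices the entire request.
    long_prompt = prompt_tokens > LONG_PROMPT_THRESHOLD
    in_rate = in_long if long_prompt else in_std
    out_rate = out_long if long_prompt else out_std
    return (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# The same model just below and just above the threshold.
print(gemini_cost("Gemini 2.5 Pro", 200_000, 1_000))
print(gemini_cost("Gemini 2.5 Pro", 200_001, 1_000))
```

Under this assumption a single extra prompt token roughly doubles the request cost, so trimming prompts that sit near the boundary can matter more than the per-token rate suggests.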
Official notes we could directly confirm
- Gemini 3.5 Flash reached GA in May 2026. Reported to outperform 3.1 Pro on some coding/agentic benchmarks
- Gemini 3.1 Pro Preview is a preview model; pricing and specs may change
- Batch API usage gives roughly a 50% discount (all models)
- Context caching has separate read pricing per model (see official pricing page)
- Audio input is priced higher than text/image/video input
- On paid tiers, prompts are not used to improve Google's products (free tier prompts may be used)
- Gemini 2.0 Flash-Lite was retired on June 1, 2026 (per official page)
DeepSeek
📋 Conditions: DeepSeek API standard pricing (current promotional pricing). Input rate differs between cache miss (first pass) and cache hit (reuse)
| Model | Input (cache miss) | Input (cache hit) | Output | Notes |
|---|---|---|---|---|
| DeepSeek V4 Pro New | $0.435 | $0.003625 | $0.87 | 1M context, max output 384K. Current promotional pricing (standard pricing $1.74 / $0.0145 / $3.48) |
| DeepSeek V4 Flash | $0.14 | $0.0028 | $0.28 | 1M context, max output 384K. Lightweight, high-speed tier |
For setup steps, real-world costs and a Claude Sonnet comparison in VS Code GitHub Copilot Chat, see our DeepSeek V4 Pro practical guide (Japanese).
Primary source: api-docs.deepseek.com/quick_start/pricing (re-verified via web search, retrieved 2026-07-05 JST)
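Because DeepSeek bills input at two rates, the effective input cost depends on how much of the prompt is served from cache. The sketch below blends the two rates by a cache-hit ratio. The ratio is a workload property you would need to measure yourself, and the names `DEEPSEEK_RATES` and `deepseek_cost` are hypothetical.

```python
# USD per 1M tokens: (cache-miss input, cache-hit input, output), from the table above.
DEEPSEEK_RATES = {
    "DeepSeek V4 Pro (promo)": (0.435, 0.003625, 0.87),
    "DeepSeek V4 Pro (standard)": (1.74, 0.0145, 3.48),
    "DeepSeek V4 Flash": (0.14, 0.0028, 0.28),
}


def deepseek_cost(model: str, input_tokens: int, output_tokens: int,
                  cache_hit_ratio: float) -> float:
    """USD cost, where cache_hit_ratio is the fraction of input tokens served from cache."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0.0 and 1.0")
    miss_price, hit_price, output_price = DEEPSEEK_RATES[model]
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens
    total = miss_tokens * miss_price + hit_tokens * hit_price + output_tokens * output_price
    return total / 1_000_000


# Hypothetical workload: 1M input tokens (half cached) and 1M output tokens.
for name in DEEPSEEK_RATES:
    print(name, round(deepseek_cost(name, 1_000_000, 1_000_000, 0.5), 6))
```

Comparing the promo and standard entries for the same workload gives a sense of how much the bill would change if the promotional pricing ends.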
DeepSeek V4 Pro — Sonnet-tier benchmarks at a fraction of the price
Per the official pricing page (api-docs.deepseek.com/quick_start/pricing, retrieved 2026-07-05 JST), input costs $0.435 and output costs $0.87 per 1M tokens under the current promotional pricing (standard pricing is $1.74 / $3.48). Our operator (okamoちゃんねる) has used it directly in VS Code GitHub Copilot Chat and reports that, for text-centric coding tasks, it feels close to Claude Sonnet in practice at a much lower cost. It does not support Vision (image input), where Sonnet still has the edge. See our DeepSeek V4 Pro practical guide (Japanese) for setup steps and measured data, or use the Token Calculator to estimate cost differences against other models.
💡 Cost-Saving Basics
Three basic strategies that can substantially reduce API costs
⚡ Batch API (asynchronous)
Up to 50% off
For tasks that don't need a real-time response (batch evaluation, data processing, summarization), the Batch API gives roughly 50% off across all models. Confirmed on official pages for both Anthropic and Google.
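As a rough illustration of what the batch discount means for a job, the sketch below compares standard and batch cost for the same token volume. It assumes an exact 50% discount applied to both input and output (the Google figure is described only as "roughly" 50%). The function names and the workload are hypothetical; the rates are the Claude Haiku 4.5 figures from the Anthropic table.

```python
BATCH_DISCOUNT = 0.50  # fraction taken off the standard rate (assumed exact)


def batch_price(standard_price: float, discount: float = BATCH_DISCOUNT) -> float:
    """Per-1M-token price after the batch discount."""
    return standard_price * (1.0 - discount)


def job_savings(input_tokens: int, output_tokens: int,
                input_price: float, output_price: float) -> float:
    """USD saved by running a job through the batch endpoint instead of standard."""
    standard = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    batch = (input_tokens * batch_price(input_price)
             + output_tokens * batch_price(output_price)) / 1_000_000
    return standard - batch


# Hypothetical nightly job on Claude Haiku 4.5 ($1.00 input / $5.00 output):
# 50M input tokens and 5M output tokens.
print(job_savings(50_000_000, 5_000_000, 1.00, 5.00))
```

The trade-off is latency: batch results arrive asynchronously, so this only applies to work that can wait.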
🗃️ Prompt caching
Up to 90% off cached input reads
Putting shared system prompts or documents into cache substantially discounts input cost on repeated calls. Anthropic offers up to 90% off (confirmed on official page). Especially effective for use cases with a long fixed prefix.
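Caching is not free on the first call, because the write is billed above the base input rate. A short sketch of the break-even point follows, using Anthropic's multipliers listed earlier on this page (5-minute write 1.25x, hit 0.1x). It assumes every call after the first lands while the cache entry is still alive; the function names and the example workload are hypothetical.

```python
from typing import Optional

WRITE_MULT = 1.25  # 5-minute cache write, relative to base input
HIT_MULT = 0.10    # cache hit, relative to base input


def prefix_cost(prefix_tokens: int, base_input_price: float,
                calls: int, cached: bool) -> float:
    """USD cost of sending the same prefix `calls` times, with or without caching."""
    if calls <= 0:
        return 0.0
    per_call = prefix_tokens * base_input_price / 1_000_000
    if not cached:
        return per_call * calls
    # One write on the first call, then cache hits (assumes no expiry in between).
    return per_call * (WRITE_MULT + HIT_MULT * (calls - 1))


def break_even_calls(max_calls: int = 100) -> Optional[int]:
    """Smallest number of calls at which caching is cheaper than not caching."""
    for n in range(1, max_calls + 1):
        if prefix_cost(1_000_000, 1.0, n, cached=True) < prefix_cost(1_000_000, 1.0, n, cached=False):
            return n
    return None


print(break_even_calls())
# Hypothetical: a 50K-token prefix on Claude Sonnet 4.6 ($3.00 input), reused 100 times.
print(prefix_cost(50_000, 3.00, 100, cached=False))
print(prefix_cost(50_000, 3.00, 100, cached=True))
```

With these multipliers a single call costs more with caching than without, and the saving starts from the second call. If calls are spaced far enough apart that the entry expires, each one pays the write rate again and the advantage disappears.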
🔀 Model routing
Match model tier to task difficulty
Route simple classification/extraction tasks to lightweight models (Haiku 4.5, Flash-Lite, GPT-5.4 mini) and complex reasoning to flagship models to keep quality high at lower cost. Use the Token Calculator to estimate the cost difference.
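To get a feel for what routing is worth, the sketch below computes the blended cost when a share of requests goes to a lightweight model and the rest to a flagship. The share of "easy" requests, the request sizes and the function names are all hypothetical, and the sketch says nothing about quality; the rates are the Claude Haiku 4.5 and Claude Opus 4.8 figures from the Anthropic table.

```python
# (input, output) in USD per 1M tokens, from the Anthropic table.
HAIKU_4_5 = (1.00, 5.00)
OPUS_4_8 = (5.00, 25.00)


def per_request_cost(tokens_in: int, tokens_out: int,
                     input_price: float, output_price: float) -> float:
    """USD cost of a single request."""
    return (tokens_in * input_price + tokens_out * output_price) / 1_000_000


def routed_cost(requests: int, easy_share: float, tokens_in: int, tokens_out: int,
                light: tuple[float, float], flagship: tuple[float, float]) -> float:
    """Total USD cost when `easy_share` of requests go to the lightweight model."""
    easy = requests * easy_share
    hard = requests - easy
    return (easy * per_request_cost(tokens_in, tokens_out, *light)
            + hard * per_request_cost(tokens_in, tokens_out, *flagship))


# Hypothetical: 10,000 requests of 2,000 input / 500 output tokens each.
all_flagship = routed_cost(10_000, 0.0, 2_000, 500, HAIKU_4_5, OPUS_4_8)
routed = routed_cost(10_000, 0.7, 2_000, 500, HAIKU_4_5, OPUS_4_8)
print(all_flagship, routed)
```

In this example, sending 70% of traffic to the lightweight tier cuts the bill from about $225 to about $99. The real saving depends on how reliably easy requests can be identified.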
⚠️ Disclaimer
- Pricing on this page references each provider's official page directly and lists only figures we could re-verify (retrieved: Anthropic 2026-07-03 JST / Google 2026-07-04 JST / OpenAI 2026-07-03 JST via screenshot / DeepSeek 2026-07-05 JST).
- Pricing may change without notice. Always check the official pricing page before contracting or production use.
- Preview, limited-time pricing and beta features may change or be discontinued.
- This site is for informational purposes only and does not endorse or represent any specific provider.