TL;DR: OpenAI now offers 12+ API models ranging from $0.10 to $168 per million tokens. GPT-5.4 (the newest, March 2026) costs $2.50/$15 per 1M tokens with a 1.05M context window. GPT-5 remains the best-value flagship at $1.25/$10. GPT-4.1 Nano is the cheapest capable model at $0.10/$0.40. The Batch API saves 50%, prompt caching saves 75-90%, and combining both can reduce your total bill by over 90%. Full pricing table and cost optimization strategies below.
OpenAI’s API pricing has changed significantly since 2025. With GPT-5.4 launching on March 5, 2026, the model lineup now spans four families, multiple context window sizes, and several discount tiers that most developers don’t fully use.
This guide covers every current model’s pricing, the discount mechanisms that actually save money in production, and a decision framework for picking the right model for your use case.
All prices are verified against OpenAI’s official pricing page as of March 2026.
Complete OpenAI API Pricing Table (March 2026)
Flagship Models
| Model | Input (per 1M tokens) | Cached Input | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| GPT-5.4 | $2.50 | $0.25 | $15.00 | 1.05M (922K in / 128K out) |
| GPT-5.4 Pro | $15.00 | $1.50 | $60.00 | 1.05M |
| GPT-5.2 | $1.75 | $0.175 | $14.00 | 1.05M |
| GPT-5.2 Pro | $21.00 | $2.10 | $168.00 | 1.05M |
| GPT-5 | $1.25 | $0.125 | $10.00 | 400K |
Mid-Tier Models
| Model | Input (per 1M tokens) | Cached Input | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| GPT-4.1 | $2.00 | $0.50 | $8.00 | 1M |
| GPT-4o | $2.50 | $1.25 | $10.00 | 128K |
| o3 | $2.00 | $0.50 | $8.00 | 200K |
| o4-mini | $1.10 | $0.275 | $4.40 | 200K |
Budget Models
| Model | Input (per 1M tokens) | Cached Input | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| GPT-5 Mini | $0.25 | $0.025 | $2.00 | 128K |
| GPT-4.1 Mini | $0.40 | $0.10 | $1.60 | 1M |
| GPT-4o Mini | $0.15 | $0.075 | $0.60 | 128K |
| GPT-4.1 Nano | $0.10 | $0.025 | $0.40 | 1M |
Specialized Models
| Model | Pricing | Notes |
|---|---|---|
| gpt-image-1.5 | ~$0.01-$0.17 per image | Varies by quality (low/medium/high) and size |
| Whisper (speech-to-text) | $0.006 per minute | |
| TTS (text-to-speech) | $15.00 per 1M characters | |
| TTS HD | $30.00 per 1M characters | |
| text-embedding-3-small | $0.02 per 1M tokens | |
| text-embedding-3-large | $0.13 per 1M tokens | |
Important pricing note for GPT-5.4 and GPT-5.2: Prompts exceeding 272K input tokens are billed at 2x input and 1.5x output for the entire session. Keep your prompts under this threshold to avoid the surcharge.
GPT-5.4: What’s New and What It Costs
GPT-5.4 launched March 5, 2026 and is now OpenAI’s most capable model. At $2.50 per million input tokens and $15.00 per million output tokens, it’s more expensive than GPT-5 but brings several features that justify the premium for specific use cases.
Key capabilities:
- Native computer use (browse, click, type), built into the API via the `computer_use_preview` tool
- 1.05M token context window (the largest OpenAI has offered commercially)
- Tool Search for agent-heavy workflows (47% fewer tokens when using many tools)
- Reasoning effort control (`none`, `low`, `medium`, `high`, `xhigh`)
- 33% fewer false claims compared to GPT-5.2
When to use GPT-5.4 vs GPT-5:
Use GPT-5.4 when you need computer use, the 1M+ context window, or tool search in agentic workflows. Use GPT-5 ($1.25/$10) for everything else — it’s 50% cheaper on input and 33% cheaper on output, and still handles coding, generation, and reasoning extremely well.
Model strings for API calls:

```text
gpt-5.4       # standard
gpt-5.4-pro   # maximum capability
```
How OpenAI Discount Tiers Work
Most developers leave money on the table by not using OpenAI’s built-in discount mechanisms. Here’s how each one works:
1. Prompt Caching (75-90% Off Input Tokens)
When you send the same prompt prefix across multiple requests, OpenAI automatically caches it and charges a reduced rate on subsequent calls.
| Model Family | Cache Discount |
|---|---|
| GPT-5 family | 90% off cached input |
| GPT-4.1 family | 75% off cached input |
| GPT-4o family | 50% off cached input |
This is automatic — you don’t need to enable it. If your application uses a consistent system prompt, few-shot examples, or document context, you’re already saving on repeat calls.
Real example: GPT-5 drops from $1.25 to $0.125 per million cached tokens, which is nearly as cheap as GPT-4.1 Nano's standard input rate of $0.10.
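In practice only part of your input hits the cache, so the useful number is the blended rate. A small sketch, using the GPT-5 rates from the tables above (the `blended_input_cost` helper is illustrative, not part of any SDK):

```python
def blended_input_cost(tokens_m: float, rate: float,
                       cached_rate: float, hit_rate: float) -> float:
    """Blended input cost in USD for tokens_m million tokens.
    hit_rate is the fraction of input tokens served from the prompt cache;
    rates are $ per 1M tokens."""
    return tokens_m * (hit_rate * cached_rate + (1 - hit_rate) * rate)

# GPT-5: $1.25 standard, $0.125 cached. 100M input tokens, 80% cache hits:
print(blended_input_cost(100, 1.25, 0.125, 0.8))  # ≈ $35 instead of $125
```

Note that a 100% hit rate is an idealization: at minimum, the first request for each prompt prefix is billed at the standard rate.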
2. Batch API (50% Off Everything)
The Batch API processes requests asynchronously and returns results within 24 hours. The tradeoff is latency, but the reward is a flat 50% discount on all token costs — input and output — across every model.
| Model | Standard | Batch |
|---|---|---|
| GPT-5 | $1.25 / $10.00 | $0.625 / $5.00 |
| GPT-4.1 | $2.00 / $8.00 | $1.00 / $4.00 |
| GPT-5 Mini | $0.25 / $2.00 | $0.125 / $1.00 |
Best for: content generation, data processing, bulk analysis, nightly report generation — anything that doesn’t need real-time responses.
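The Batch API workflow is file-based: you upload a JSONL file with one request per line, then create a batch job against it. A minimal sketch of building that input file (the `custom_id` values are arbitrary keys you choose for matching results back to requests; `gpt-5` is the model string from the tables above):

```python
import json

def build_batch_file(prompts: list[str], model: str, path: str) -> None:
    """Write one JSONL line per request in the Batch API's input format."""
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            line = {
                "custom_id": f"req-{i}",  # your key for matching results
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(line) + "\n")

build_batch_file(["Summarize Q1 sales", "Summarize Q2 sales"],
                 "gpt-5", "batch_input.jsonl")
```

From there you upload the file (purpose `batch`) and start the job via the SDK's files and batches endpoints, then poll for the output file once the job completes.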
3. Flex Processing (Variable Discount)
For requests that aren’t time-sensitive but need faster turnaround than Batch, Flex processing offers lower prices with higher latency. It’s a middle ground between standard and batch pricing.
4. Stacking Discounts (The 90%+ Strategy)
You can combine batch pricing with prompt caching:
GPT-4.1 with caching + batch:
- Cached input: $0.25 per 1M tokens (87.5% off standard)
- Output: $4.00 per 1M tokens (50% off standard)
GPT-5 with caching + batch:
- Cached input: $0.0625 per 1M tokens (95% off standard)
- Output: $5.00 per 1M tokens (50% off standard)
This is how production systems processing millions of tokens daily keep costs under control.
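A small calculator makes the stacking arithmetic explicit. The prices are hard-coded from the tables above, and `effective_rates` is an illustrative helper, not an SDK function:

```python
PRICES = {  # $ per 1M tokens, from the tables above: (input, cached_input, output)
    "gpt-5":   (1.25, 0.125, 10.00),
    "gpt-4.1": (2.00, 0.50,  8.00),
}

def effective_rates(model: str, cached: bool = False, batch: bool = False):
    """Per-million rates after stacking the cache discount on input
    with the flat 50% batch discount on both input and output."""
    inp, cached_inp, out = PRICES[model]
    rate_in = cached_inp if cached else inp
    if batch:
        rate_in, out = rate_in / 2, out / 2
    return rate_in, out

print(effective_rates("gpt-5", cached=True, batch=True))  # (0.0625, 5.0)
```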
How to Pick the Right Model
Here’s a straightforward decision framework based on what you’re actually building:
High-quality chat, content, or code generation → GPT-5 ($1.25/$10) The best balance of quality and cost. Handles text, vision, structured output, and function calling. Start here if you’re unsure.
Processing large documents, codebases, or datasets → GPT-4.1 ($2/$8) The 1M token context window is its defining feature. Use it when your input data won’t fit in other models.
Complex reasoning, math, multi-step logic → o3 ($2/$8) Chain-of-thought reasoning model. Excels at problems that need internal deliberation before responding.
Maximum capability, hardest problems → GPT-5.4 ($2.50/$15) Computer use, tool search, 1.05M context. Use for agent workflows and tasks requiring frontier performance.
High-volume classification or extraction → GPT-4.1 Nano ($0.10/$0.40) The cheapest capable model. Good for routing, tagging, entity extraction, and any task where you need millions of calls per day.
Lightweight chat or customer support → GPT-5 Mini ($0.25/$2) Strong quality at one-fifth the price of GPT-5. Good enough for most customer-facing chat applications.
Real-World Cost Examples
Example 1: Customer Support Chatbot (10,000 conversations/day)
Average conversation: 800 input tokens, 400 output tokens.
| Model | Monthly Cost |
|---|---|
| GPT-5 | $300 input + $1,200 output = $1,500/month |
| GPT-5 Mini | $60 input + $240 output = $300/month |
| GPT-5 Mini + Caching | $6 input + $240 output = $246/month |
Example 2: RAG Pipeline Processing 1M Queries/Month
Average query: 2,000 input tokens (including retrieved context), 500 output tokens.
| Model | Monthly Cost |
|---|---|
| GPT-4.1 | $4,000 input + $4,000 output = $8,000/month |
| GPT-4.1 + Caching | $1,000 input + $4,000 output = $5,000/month |
| GPT-4.1 + Caching + Batch | $500 input + $2,000 output = $2,500/month |
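These figures follow directly from the per-token rates. A small helper reproduces them, assuming (as the cached rows in the table do) that all input tokens hit the cache:

```python
def monthly_cost(queries: int, in_tokens: int, out_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Total monthly cost in USD; rates are $ per 1M tokens."""
    input_cost = queries * in_tokens / 1e6 * in_rate
    output_cost = queries * out_tokens / 1e6 * out_rate
    return input_cost + output_cost

# GPT-4.1 standard: $2 in / $8 out
print(monthly_cost(1_000_000, 2_000, 500, 2.00, 8.00))  # 8000.0
# Cached input ($0.50) with batch halving both rates: $0.25 in / $4 out
print(monthly_cost(1_000_000, 2_000, 500, 0.25, 4.00))  # 2500.0
```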
Example 3: Code Review Agent (500 PRs/day)
Average PR: 15,000 input tokens (code diff + context), 2,000 output tokens.
| Model | Monthly Cost |
|---|---|
| GPT-5.4 | $562 input + $450 output = $1,012/month |
| GPT-5 | $281 input + $300 output = $581/month |
| o3 | $450 input + $240 output = $690/month |
OpenAI vs Competitors: Quick Price Comparison
How does OpenAI stack up against the alternatives as of March 2026?
| Model | Input (per 1M) | Output (per 1M) | Context | Best For |
|---|---|---|---|---|
| GPT-5 | $1.25 | $10.00 | 400K | General purpose |
| GPT-5.4 | $2.50 | $15.00 | 1.05M | Agents, computer use |
| Claude Opus 4.6 | $5.00 | $25.00 | 1M | Nuanced writing, coding |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | Balanced quality/cost |
| Gemini 3.1 Pro | $2.00 | $12.00 | 1M | Multimodal, search integration |
| Gemini 3 Flash | $0.50 | $3.00 | 1M | Budget multimodal |
| DeepSeek V4 | $0.27 | $1.10 | 128K | Budget general purpose |
| Grok 4.1 | $0.20 | $0.50 | 128K | Cheapest option |
Key takeaways:
- GPT-5 and Gemini 3.1 Pro are closely price-matched. Google’s advantage is the 1M context window; OpenAI’s is the broader tool ecosystem.
- Claude models cost 2-4x more per token but have strong advantages in instruction following and creative tasks.
- DeepSeek and Grok are significantly cheaper but have smaller context windows and fewer built-in tools.
- For pure cost optimization, GPT-5 at batch pricing ($0.625/$5) beats standard Claude Sonnet 4.6 pricing ($3/$15) while delivering comparable quality.
Rate Limits by Tier
Every OpenAI API account has usage limits based on how much you’ve spent on the platform:
| Tier | Qualification | GPT-5 RPM | GPT-5 TPM |
|---|---|---|---|
| Free | Sign up | 3 | 40,000 |
| Tier 1 | $5 spend | 500 | 200,000 |
| Tier 2 | $50 spend | 5,000 | 2,000,000 |
| Tier 3 | $100 spend | 5,000 | 4,000,000 |
| Tier 4 | $250 spend | 10,000 | 10,000,000 |
| Tier 5 | $1,000 spend | 10,000 | 30,000,000 |
RPM = requests per minute. TPM = tokens per minute. Most production apps need Tier 2+ to avoid throttling.
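When you do hit a limit, the standard remedy is to retry 429 responses with exponential backoff and jitter. A minimal sketch of the delay schedule (the official Python SDK already retries rate-limit errors automatically, so hand-rolling this is mainly relevant for custom HTTP clients):

```python
import random

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 60.0):
    """Yield sleep durations (seconds) for retrying HTTP 429 responses.
    Full jitter: each delay is uniform between 0 and an exponentially
    growing ceiling, capped so retries never wait longer than `cap`."""
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))
```

The jitter matters: without it, many clients that were throttled at the same moment retry at the same moment and get throttled again.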
Built-in Tool Pricing
OpenAI’s built-in tools are billed separately from model tokens:
| Tool | Pricing |
|---|---|
| Web Search | Per-call fee (varies by model) + search content tokens at model’s input rate |
| Code Interpreter | Included in token costs (no separate fee) |
| File Search | $0.10/GB storage/day + token costs |
| Image Generation | $0.01-$0.17 per image (varies by quality/size) |
| Computer Use | Token costs only (no separate tool fee) |
For web search specifically: tool calls are billed per 1,000 calls based on the model and tool version. Search content tokens (the data retrieved from the web) are billed at the model’s input token rate.
7 Ways to Cut Your OpenAI API Bill
- Use prompt caching — If your system prompt is consistent, you’re automatically saving 75-90% on those tokens. Structure your prompts with the static context first.
- Route to cheaper models — Send simple tasks to GPT-4.1 Nano ($0.10/$0.40) and only escalate to GPT-5 when needed. A waterfall approach handles 70-80% of traffic at a fraction of the cost.
- Use the Batch API — For anything that doesn’t need real-time results, batch processing cuts costs by 50% across the board.
- Control output length — Output tokens cost 4-8x more than input tokens. Asking for concise responses and setting `max_tokens` limits directly reduces your biggest cost driver.
- Use reasoning effort levels — GPT-5.4 and GPT-5.2 support a `reasoning.effort` parameter. Set it to `none` or `low` for simple tasks instead of defaulting to full reasoning.
- Trim your input context — Don't send entire documents when a relevant excerpt will do. RAG pipelines that retrieve focused chunks save significantly over stuffing the full context window.
- Monitor and set alerts — Use OpenAI’s usage dashboard and billing APIs to track token consumption. Set spending limits to avoid surprises.
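The model-routing strategy above can start as a rule-based waterfall before you invest in a learned classifier. A sketch, where the task labels and the length threshold are illustrative placeholders, not an official scheme:

```python
def route_model(task: str, prompt: str) -> str:
    """Waterfall routing: send cheap, well-defined tasks to GPT-4.1 Nano,
    short chat turns to GPT-5 Mini, and everything else to GPT-5."""
    if task in {"classify", "tag", "extract"}:
        return "gpt-4.1-nano"
    if task == "chat" and len(prompt) < 500:
        return "gpt-5-mini"
    return "gpt-5"

print(route_model("classify", "spam or not: free crypto!!!"))  # gpt-4.1-nano
```

A useful refinement is escalation: if the cheap model's answer fails a validation check, re-run the request on the next tier up.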
ChatGPT Subscription Plans (Not API)
If you’re using ChatGPT directly (not the API), here are the current subscription options:
| Plan | Price | What You Get |
|---|---|---|
| Free | $0 | Access to GPT-4o Mini, basic features, limited usage |
| Plus | $20/month | GPT-5, DALL·E, browsing, advanced data analysis |
| Pro | $200/month | Maximum usage limits, priority access, o3 Pro mode |
| Team | $25-30/user/month | Workspace, admin controls, higher limits |
| Enterprise | Custom pricing | SSO, audit logs, unlimited access, no data training |
Key Changes from 2025 to 2026
If you’re updating from last year’s pricing, here’s what changed:
- GPT-5.4 launched (March 2026) — New flagship with computer use and 1.05M context at $2.50/$15
- GPT-5.2 price stable — Still $1.75/$14, but now marked as “previous frontier” since GPT-5.4
- GPT-5 remains best value — At $1.25/$10 with 90% cache discount, it’s the production workhorse
- Container pricing change — Starting March 31, 2026, containers are billed per 20-minute session
- Regional processing surcharge — GPT-5.4 has a 10% uplift for data residency endpoints
- Free tier access expanded — GPT-5 Mini now available on the free tier (with strict rate limits)
Bottom Line
OpenAI’s 2026 pricing gives developers more options than ever. The key insight is that model selection combined with discount stacking matters more than raw per-token rates.
A well-optimized GPT-5 setup using prompt caching and batch processing can cost less than $0.10 per million input tokens — cheaper than almost any competitor’s standard pricing, including open-source API providers.
Start with GPT-5 for most use cases. Upgrade to GPT-5.4 only when you need computer use, the 1.05M context window, or tool search. Route simple tasks to GPT-4.1 Nano. And always use caching.