TL;DR: OpenAI now offers 12+ API models ranging from $0.10 to $168 per million tokens. GPT-5.4 (the newest, March 2026) costs $2.50/$15 per 1M tokens with a 1.05M context window. GPT-5 remains the best-value flagship at $1.25/$10. GPT-4.1 Nano is the cheapest capable model at $0.10/$0.40. The Batch API saves 50%, prompt caching saves 75-90%, and combining both can reduce your total bill by over 90%. Full pricing table and cost optimization strategies below.
OpenAI’s API pricing has changed significantly since 2025. With GPT-5.4 launching on March 5, 2026, the model lineup now spans four families, multiple context window sizes, and several discount tiers that most developers don’t fully use.
This guide covers every current model’s pricing, the discount mechanisms that actually save money in production, and a decision framework for picking the right model for your use case.
All prices are verified against OpenAI’s official pricing page as of March 2026.
Complete OpenAI API Pricing Table (March 2026)
Flagship Models
| Model | Input (per 1M tokens) | Cached Input | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| GPT-5.4 | $2.50 | $0.25 | $15.00 | 1.05M (922K in / 128K out) |
| GPT-5.4 Pro | $15.00 | $1.50 | $60.00 | 1.05M |
| GPT-5.2 | $1.75 | $0.175 | $14.00 | 1.05M |
| GPT-5.2 Pro | $21.00 | $2.10 | $168.00 | 1.05M |
| GPT-5 | $1.25 | $0.125 | $10.00 | 400K |
Mid-Tier Models
| Model | Input (per 1M tokens) | Cached Input | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| GPT-4.1 | $2.00 | $0.50 | $8.00 | 1M |
| GPT-4o | $2.50 | $1.25 | $10.00 | 128K |
| o3 | $2.00 | $0.50 | $8.00 | 200K |
| o4-mini | $1.10 | $0.275 | $4.40 | 200K |
Budget Models
| Model | Input (per 1M tokens) | Cached Input | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| GPT-5 Mini | $0.25 | $0.025 | $2.00 | 128K |
| GPT-4.1 Mini | $0.40 | $0.10 | $1.60 | 1M |
| GPT-4o Mini | $0.15 | $0.075 | $0.60 | 128K |
| GPT-4.1 Nano | $0.10 | $0.025 | $0.40 | 1M |
Specialized Models
| Model | Pricing | Notes |
|---|---|---|
| gpt-image-1.5 | ~$0.01-$0.17 per image | Varies by quality (low/medium/high) and size |
| Whisper (speech-to-text) | $0.006 per minute | |
| TTS (text-to-speech) | $15.00 per 1M characters | |
| TTS HD | $30.00 per 1M characters | |
| text-embedding-3-small | $0.02 per 1M tokens | |
| text-embedding-3-large | $0.13 per 1M tokens | |
Important pricing note for GPT-5.4 and GPT-5.2: Prompts exceeding 272K input tokens are billed at 2x input and 1.5x output for the entire session. Keep your prompts under this threshold to avoid the surcharge.
GPT-5.4: What’s New and What It Costs
GPT-5.4 launched March 5, 2026 and is now OpenAI’s most capable model. At $2.50 per million input tokens and $15.00 per million output tokens, it’s more expensive than GPT-5 but brings several features that justify the premium for specific use cases.
Key capabilities:
- Native computer use (browse, click, type), built into the API via the `computer_use_preview` tool
- 1.05M token context window (the largest OpenAI has offered commercially)
- Tool Search for agent-heavy workflows (47% fewer tokens when using many tools)
- Reasoning effort control (`none`, `low`, `medium`, `high`, `xhigh`)
- 33% fewer false claims compared to GPT-5.2
When to use GPT-5.4 vs GPT-5:
Use GPT-5.4 when you need computer use, the 1M+ context window, or tool search in agentic workflows. Use GPT-5 ($1.25/$10) for everything else — it’s 50% cheaper on input and 33% cheaper on output, and still handles coding, generation, and reasoning extremely well.
Model strings for API calls:

```text
gpt-5.4       # standard
gpt-5.4-pro   # maximum capability
```
How OpenAI Discount Tiers Work
Most developers leave money on the table by not using OpenAI’s built-in discount mechanisms. Here’s how each one works:
1. Prompt Caching (75-90% Off Input Tokens)
When you send the same prompt prefix across multiple requests, OpenAI automatically caches it and charges a reduced rate on subsequent calls.
| Model Family | Cache Discount |
|---|---|
| GPT-5 family | 90% off cached input |
| GPT-4.1 family | 75% off cached input |
| GPT-4o family | 50% off cached input |
This is automatic — you don’t need to enable it. If your application uses a consistent system prompt, few-shot examples, or document context, you’re already saving on repeat calls.
Real example: GPT-5 drops from $1.25 to $0.125 per million cached tokens, which is nearly as cheap as GPT-4.1 Nano's standard input rate of $0.10.
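In practice only part of your input hits the cache, so the useful number is the blended rate. A small sketch, using the GPT-5 rates from the tables above (the `blended_input_cost` helper is illustrative, not part of any SDK):

```python
def blended_input_cost(tokens_m: float, rate: float,
                       cached_rate: float, hit_rate: float) -> float:
    """Blended input cost in USD for tokens_m million tokens.
    hit_rate is the fraction of input tokens served from the prompt cache;
    rates are $ per 1M tokens."""
    return tokens_m * (hit_rate * cached_rate + (1 - hit_rate) * rate)

# GPT-5: $1.25 standard, $0.125 cached. 100M input tokens, 80% cache hits:
print(blended_input_cost(100, 1.25, 0.125, 0.8))  # ≈ $35 instead of $125
```

Note that a 100% hit rate is an idealization: at minimum, the first request for each prompt prefix is billed at the standard rate.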
2. Batch API (50% Off Everything)
The Batch API processes requests asynchronously and returns results within 24 hours. The tradeoff is latency, but the reward is a flat 50% discount on all token costs — input and output — across every model.
| Model | Standard | Batch |
|---|---|---|
| GPT-5 | $1.25 / $10.00 | $0.625 / $5.00 |
| GPT-4.1 | $2.00 / $8.00 | $1.00 / $4.00 |
| GPT-5 Mini | $0.25 / $2.00 | $0.125 / $1.00 |
Best for: content generation, data processing, bulk analysis, nightly report generation — anything that doesn’t need real-time responses.
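The Batch API workflow is file-based: you upload a JSONL file with one request per line, then create a batch job against it. A minimal sketch of building that input file (the `custom_id` values are arbitrary keys you choose for matching results back to requests; `gpt-5` is the model string from the tables above):

```python
import json

def build_batch_file(prompts: list[str], model: str, path: str) -> None:
    """Write one JSONL line per request in the Batch API's input format."""
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            line = {
                "custom_id": f"req-{i}",  # your key for matching results
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(line) + "\n")

build_batch_file(["Summarize Q1 sales", "Summarize Q2 sales"],
                 "gpt-5", "batch_input.jsonl")
```

From there you upload the file (purpose `batch`) and start the job via the SDK's files and batches endpoints, then poll for the output file once the job completes.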
3. Flex Processing (Variable Discount)
For requests that aren’t time-sensitive but need faster turnaround than Batch, Flex processing offers lower prices with higher latency. It’s a middle ground between standard and batch pricing.
4. Stacking Discounts (The 90%+ Strategy)
You can combine batch pricing with prompt caching:
GPT-4.1 with caching + batch:
- Cached input: $0.25 per 1M tokens (87.5% off standard)
- Output: $4.00 per 1M tokens (50% off standard)
GPT-5 with caching + batch:
- Cached input: $0.0625 per 1M tokens (95% off standard)
- Output: $5.00 per 1M tokens (50% off standard)
This is how production systems processing millions of tokens daily keep costs under control.
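A small calculator makes the stacking arithmetic explicit. The prices are hard-coded from the tables above, and `effective_rates` is an illustrative helper, not an SDK function:

```python
PRICES = {  # $ per 1M tokens, from the tables above: (input, cached_input, output)
    "gpt-5":   (1.25, 0.125, 10.00),
    "gpt-4.1": (2.00, 0.50,  8.00),
}

def effective_rates(model: str, cached: bool = False, batch: bool = False):
    """Per-million rates after stacking the cache discount on input
    with the flat 50% batch discount on both input and output."""
    inp, cached_inp, out = PRICES[model]
    rate_in = cached_inp if cached else inp
    if batch:
        rate_in, out = rate_in / 2, out / 2
    return rate_in, out

print(effective_rates("gpt-5", cached=True, batch=True))  # (0.0625, 5.0)
```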
How to Pick the Right Model
Here’s a straightforward decision framework based on what you’re actually building:
High-quality chat, content, or code generation → GPT-5 ($1.25/$10) The best balance of quality and cost. Handles text, vision, structured output, and function calling. Start here if you’re unsure.
Processing large documents, codebases, or datasets → GPT-4.1 ($2/$8) The 1M token context window is its defining feature. Use it when your input data won’t fit in other models.
Complex reasoning, math, multi-step logic → o3 ($2/$8) Chain-of-thought reasoning model. Excels at problems that need internal deliberation before responding.
Maximum capability, hardest problems → GPT-5.4 ($2.50/$15) Computer use, tool search, 1.05M context. Use for agent workflows and tasks requiring frontier performance.
High-volume classification or extraction → GPT-4.1 Nano ($0.10/$0.40) The cheapest capable model. Good for routing, tagging, entity extraction, and any task where you need millions of calls per day.
Lightweight chat or customer support → GPT-5 Mini ($0.25/$2) Strong quality at one-fifth the price of GPT-5. Good enough for most customer-facing chat applications.
Real-World Cost Examples
Example 1: Customer Support Chatbot (10,000 conversations/day)
Average conversation: 800 input tokens, 400 output tokens.
| Model | Monthly Cost |
|---|---|
| GPT-5 | $300 input + $1,200 output = $1,500/month |
| GPT-5 Mini | $60 input + $240 output = $300/month |
| GPT-5 Mini + Caching | $6 input + $240 output = $246/month |
Example 2: RAG Pipeline Processing 1M Queries/Month
Average query: 2,000 input tokens (including retrieved context), 500 output tokens.
| Model | Monthly Cost |
|---|---|
| GPT-4.1 | $4,000 input + $4,000 output = $8,000/month |
| GPT-4.1 + Caching | $1,000 input + $4,000 output = $5,000/month |
| GPT-4.1 + Caching + Batch | $500 input + $2,000 output = $2,500/month |
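These figures follow directly from the per-token rates. A small helper reproduces them, assuming (as the cached rows in the table do) that all input tokens hit the cache:

```python
def monthly_cost(queries: int, in_tokens: int, out_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Total monthly cost in USD; rates are $ per 1M tokens."""
    input_cost = queries * in_tokens / 1e6 * in_rate
    output_cost = queries * out_tokens / 1e6 * out_rate
    return input_cost + output_cost

# GPT-4.1 standard: $2 in / $8 out
print(monthly_cost(1_000_000, 2_000, 500, 2.00, 8.00))  # 8000.0
# Cached input ($0.50) with batch halving both rates: $0.25 in / $4 out
print(monthly_cost(1_000_000, 2_000, 500, 0.25, 4.00))  # 2500.0
```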
Example 3: Code Review Agent (500 PRs/day)
Average PR: 15,000 input tokens (code diff + context), 2,000 output tokens.
| Model | Monthly Cost |
|---|---|
| GPT-5.4 | $562 input + $450 output = $1,012/month |
| GPT-5 | $281 input + $300 output = $581/month |
| o3 | $450 input + $240 output = $690/month |
OpenAI vs Competitors: Quick Price Comparison
How does OpenAI stack up against the alternatives as of March 2026?
| Model | Input (per 1M) | Output (per 1M) | Context | Best For |
|---|---|---|---|---|
| GPT-5 | $1.25 | $10.00 | 400K | General purpose |
| GPT-5.4 | $2.50 | $15.00 | 1.05M | Agents, computer use |
| Claude Opus 4.6 | $5.00 | $25.00 | 1M | Nuanced writing, coding |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | Balanced quality/cost |
| Gemini 3.1 Pro | $2.00 | $12.00 | 1M | Multimodal, search integration |
| Gemini 3 Flash | $0.50 | $3.00 | 1M | Budget multimodal |
| DeepSeek V4 | $0.27 | $1.10 | 128K | Budget general purpose |
| Grok 4.1 | $0.20 | $0.50 | 128K | Cheapest option |
Key takeaways:
- GPT-5 and Gemini 3.1 Pro are closely price-matched. Google’s advantage is the 1M context window; OpenAI’s is the broader tool ecosystem.
- Claude models cost 2-4x more per token but have strong advantages in instruction following and creative tasks.
- DeepSeek and Grok are significantly cheaper but have smaller context windows and fewer built-in tools.
- For pure cost optimization, GPT-5 at batch pricing ($0.625/$5) beats standard Claude Sonnet 4.6 pricing ($3/$15) while delivering comparable quality.
Rate Limits by Tier
Every OpenAI API account has usage limits based on how much you’ve spent on the platform:
| Tier | Qualification | GPT-5 RPM | GPT-5 TPM |
|---|---|---|---|
| Free | Sign up | 3 | 40,000 |
| Tier 1 | $5 spend | 500 | 200,000 |
| Tier 2 | $50 spend | 5,000 | 2,000,000 |
| Tier 3 | $100 spend | 5,000 | 4,000,000 |
| Tier 4 | $250 spend | 10,000 | 10,000,000 |
| Tier 5 | $1,000 spend | 10,000 | 30,000,000 |
RPM = requests per minute. TPM = tokens per minute. Most production apps need Tier 2+ to avoid throttling.
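When you do hit a limit, the standard remedy is to retry 429 responses with exponential backoff and jitter. A minimal sketch of the delay schedule (the official Python SDK already retries rate-limit errors automatically, so hand-rolling this is mainly relevant for custom HTTP clients):

```python
import random

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 60.0):
    """Yield sleep durations (seconds) for retrying HTTP 429 responses.
    Full jitter: each delay is uniform between 0 and an exponentially
    growing ceiling, capped so retries never wait longer than `cap`."""
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))
```

The jitter matters: without it, many clients that were throttled at the same moment retry at the same moment and get throttled again.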
Built-in Tool Pricing
OpenAI’s built-in tools are billed separately from model tokens:
| Tool | Pricing |
|---|---|
| Web Search | Per-call fee (varies by model) + search content tokens at model’s input rate |
| Code Interpreter | Included in token costs (no separate fee) |
| File Search | $0.10/GB storage/day + token costs |
| Image Generation | $0.01-$0.17 per image (varies by quality/size) |
| Computer Use | Token costs only (no separate tool fee) |
For web search specifically: tool calls are billed per 1,000 calls based on the model and tool version. Search content tokens (the data retrieved from the web) are billed at the model’s input token rate.
7 Ways to Cut Your OpenAI API Bill
- Use prompt caching — If your system prompt is consistent, you’re automatically saving 75-90% on those tokens. Structure your prompts with the static context first.
- Route to cheaper models — Send simple tasks to GPT-4.1 Nano ($0.10/$0.40) and only escalate to GPT-5 when needed. A waterfall approach handles 70-80% of traffic at a fraction of the cost.
- Use the Batch API — For anything that doesn’t need real-time results, batch processing cuts costs by 50% across the board.
- Control output length — Output tokens cost 4-8x more than input tokens. Asking for concise responses and setting `max_tokens` limits directly reduces your biggest cost driver.
- Use reasoning effort levels — GPT-5.4 and GPT-5.2 support a `reasoning.effort` parameter. Set it to `none` or `low` for simple tasks instead of defaulting to full reasoning.
- Trim your input context — Don't send entire documents when a relevant excerpt will do. RAG pipelines that retrieve focused chunks save significantly over stuffing the full context window.
- Monitor and set alerts — Use OpenAI’s usage dashboard and billing APIs to track token consumption. Set spending limits to avoid surprises.
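The model-routing strategy above can start as a rule-based waterfall before you invest in a learned classifier. A sketch, where the task labels and the length threshold are illustrative placeholders, not an official scheme:

```python
def route_model(task: str, prompt: str) -> str:
    """Waterfall routing: send cheap, well-defined tasks to GPT-4.1 Nano,
    short chat turns to GPT-5 Mini, and everything else to GPT-5."""
    if task in {"classify", "tag", "extract"}:
        return "gpt-4.1-nano"
    if task == "chat" and len(prompt) < 500:
        return "gpt-5-mini"
    return "gpt-5"

print(route_model("classify", "spam or not: free crypto!!!"))  # gpt-4.1-nano
```

A useful refinement is escalation: if the cheap model's answer fails a validation check, re-run the request on the next tier up.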
ChatGPT Subscription Plans (Not API)
If you’re using ChatGPT directly (not the API), here are the current subscription options:
| Plan | Price | What You Get |
|---|---|---|
| Free | $0 | Access to GPT-4o Mini, basic features, limited usage |
| Plus | $20/month | GPT-5, DALL·E, browsing, advanced data analysis |
| Pro | $200/month | Maximum usage limits, priority access, o3 Pro mode |
| Team | $25-30/user/month | Workspace, admin controls, higher limits |
| Enterprise | Custom pricing | SSO, audit logs, unlimited access, no data training |
Key Changes from 2025 to 2026
If you’re updating from last year’s pricing, here’s what changed:
- GPT-5.4 launched (March 2026) — New flagship with computer use and 1.05M context at $2.50/$15
- GPT-5.2 price stable — Still $1.75/$14, but now marked as “previous frontier” since GPT-5.4
- GPT-5 remains best value — At $1.25/$10 with 90% cache discount, it’s the production workhorse
- Container pricing change — Starting March 31, 2026, containers are billed per 20-minute session
- Regional processing surcharge — GPT-5.4 has a 10% uplift for data residency endpoints
- Free tier access expanded — GPT-5 Mini now available on the free tier (with strict rate limits)
Bottom Line
OpenAI’s 2026 pricing gives developers more options than ever. The key insight is that model selection combined with discount stacking matters more than raw per-token rates.
A well-optimized GPT-5 setup using prompt caching and batch processing can cost less than $0.10 per million input tokens — cheaper than almost any competitor’s standard pricing, including open-source API providers.
Start with GPT-5 for most use cases. Upgrade to GPT-5.4 only when you need computer use, the 1.05M context window, or tool search. Route simple tasks to GPT-4.1 Nano. And always use caching.