TL;DR: OpenAI now offers 12+ API models ranging from $0.10 to $168 per million tokens. GPT-5.4 (the newest, March 2026) costs $2.50/$15 per 1M tokens with a 1.05M context window. GPT-5 remains the best value flagship at $1.25/$10. GPT-4.1 Nano is the cheapest capable model at $0.10/$0.40. Batch API saves 50%, prompt caching saves 75-90%, and combining both can reduce your total bill by over 90%. Full pricing table and cost optimization strategies below.


OpenAI’s API pricing has changed significantly since 2025. With GPT-5.4 launching on March 5, 2026, the model lineup now spans four families, multiple context window sizes, and several discount tiers that most developers don’t fully use.

This guide covers every current model’s pricing, the discount mechanisms that actually save money in production, and a decision framework for picking the right model for your use case.

All prices are verified against OpenAI’s official pricing page as of March 2026.


Complete OpenAI API Pricing Table (March 2026)

Flagship Models

| Model | Input (per 1M tokens) | Cached Input | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| GPT-5.4 | $2.50 | $0.25 | $15.00 | 1.05M (922K in / 128K out) |
| GPT-5.4 Pro | $15.00 | $1.50 | $60.00 | 1.05M |
| GPT-5.2 | $1.75 | $0.175 | $14.00 | 1.05M |
| GPT-5.2 Pro | $21.00 | $2.10 | $168.00 | 1.05M |
| GPT-5 | $1.25 | $0.125 | $10.00 | 400K |

Mid-Tier Models

| Model | Input (per 1M tokens) | Cached Input | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| GPT-4.1 | $2.00 | $0.50 | $8.00 | 1M |
| GPT-4o | $2.50 | $1.25 | $10.00 | 128K |
| o3 | $2.00 | $0.50 | $8.00 | 200K |
| o4-mini | $1.10 | $0.275 | $4.40 | 200K |

Budget Models

| Model | Input (per 1M tokens) | Cached Input | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| GPT-5 Mini | $0.25 | $0.025 | $2.00 | 128K |
| GPT-4.1 Mini | $0.40 | $0.10 | $1.60 | 1M |
| GPT-4o Mini | $0.15 | $0.075 | $0.60 | 128K |
| GPT-4.1 Nano | $0.10 | $0.025 | $0.40 | 1M |

Specialized Models

| Model | Pricing | Notes |
|---|---|---|
| gpt-image-1.5 | ~$0.01-$0.17 per image | Varies by quality (low/medium/high) and size |
| Whisper (speech-to-text) | $0.006 per minute | |
| TTS (text-to-speech) | $15.00 per 1M characters | |
| TTS HD | $30.00 per 1M characters | |
| text-embedding-3-small | $0.02 per 1M tokens | |
| text-embedding-3-large | $0.13 per 1M tokens | |

Important pricing note for GPT-5.4 and GPT-5.2: prompts exceeding 272K input tokens are billed at 2x the input rate and 1.5x the output rate for the entire request, not just the tokens above the threshold. Keep your prompts under this limit to avoid the surcharge.
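The surcharge math is easy to get wrong because it applies to the whole request. A minimal sketch, using the threshold and GPT-5.4 rates from the note and table above:

```python
# Sketch of the long-context surcharge: once a prompt exceeds 272K input
# tokens, the whole request bills at 2x input and 1.5x output rates.
# Rates are dollars per 1M tokens.

LONG_CONTEXT_THRESHOLD = 272_000

def request_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Estimated dollar cost of a single request, surcharge included."""
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        input_rate *= 2.0     # surcharge hits the entire request,
        output_rate *= 1.5    # not just the tokens above the threshold
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# GPT-5.4 at $2.50 / $15.00 per 1M tokens
print(request_cost(100_000, 5_000, 2.50, 15.00))  # 0.325
print(request_cost(300_000, 5_000, 2.50, 15.00))  # 1.6125
```

Note that a 300K-token prompt costs roughly five times a 100K-token one here, not three: the 2x multiplier on top of the extra tokens is what makes staying under the threshold worthwhile.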


GPT-5.4: What’s New and What It Costs

GPT-5.4 launched March 5, 2026 and is now OpenAI’s most capable model. At $2.50 per million input tokens and $15.00 per million output tokens, it’s more expensive than GPT-5 but brings several features that justify the premium for specific use cases.

Key capabilities:

  • Native computer use (browse, click, type — built into the API via computer_use_preview tool)
  • 1.05M token context window (the largest OpenAI has offered commercially)
  • Tool Search for agent-heavy workflows (47% fewer tokens when using many tools)
  • Reasoning effort control (none, low, medium, high, xhigh)
  • 33% fewer false claims compared to GPT-5.2

When to use GPT-5.4 vs GPT-5:

Use GPT-5.4 when you need computer use, the 1M+ context window, or tool search in agentic workflows. Use GPT-5 ($1.25/$10) for everything else — it’s 50% cheaper on input and 33% cheaper on output, and still handles coding, generation, and reasoning extremely well.

Model string for API calls:

```
gpt-5.4          # Standard
gpt-5.4-pro      # Maximum capability
```
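If you call these models programmatically, the reasoning effort control mentioned above travels as a request parameter. A sketch of assembling the request body only (no network call); the `gpt-5.4` model string and the none/low/medium/high/xhigh effort levels come from this article, and the exact parameter shape may differ in the SDK version you use:

```python
# Hypothetical request body for a GPT-5.4 call with reasoning effort
# dialed down. Verify parameter names against your SDK before relying
# on this shape; it is an illustration, not the official schema.

def build_request(prompt, *, model="gpt-5.4", effort="low", max_output_tokens=1024):
    """Assemble request parameters; lower effort means fewer billed reasoning tokens."""
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": effort},
        "max_output_tokens": max_output_tokens,
    }

req = build_request("Classify this ticket as bug or feature.", effort="none")
```

Capping `max_output_tokens` and lowering effort for simple tasks attacks the two biggest cost levers at once: output tokens and reasoning tokens.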

How OpenAI Discount Tiers Work

Most developers leave money on the table by not using OpenAI’s built-in discount mechanisms. Here’s how each one works:

1. Prompt Caching (75-90% Off Input Tokens)

When you send the same prompt prefix across multiple requests, OpenAI automatically caches it and charges a reduced rate on subsequent calls.

| Model Family | Cache Discount |
|---|---|
| GPT-5 family | 90% off cached input |
| GPT-4.1 family | 75% off cached input |
| GPT-4o family | 50% off cached input |

This is automatic — you don’t need to enable it. If your application uses a consistent system prompt, few-shot examples, or document context, you’re already saving on repeat calls.

Real example: GPT-5's input rate drops from $1.25 to $0.125 per million cached tokens, within striking distance of GPT-4.1 Nano's standard $0.10 rate.
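In practice, only the shared prompt prefix hits the cache, so your effective input price is a blend of the cached and standard rates. A small sketch of that blend, using the GPT-5 numbers above:

```python
def blended_input_price(base_price, cache_discount, cached_fraction):
    """Effective per-1M-token input price when `cached_fraction` of each
    prompt hits the cache and the remainder bills at the standard rate."""
    cached_price = base_price * (1 - cache_discount)
    return cached_fraction * cached_price + (1 - cached_fraction) * base_price

# GPT-5: $1.25 standard input, 90% cache discount
print(blended_input_price(1.25, 0.90, 1.0))   # fully cached prefix
print(blended_input_price(1.25, 0.90, 0.6))   # only 60% of each prompt cached
```

This is why putting static content (system prompt, few-shot examples) first matters: it maximizes `cached_fraction` on every repeat call.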

2. Batch API (50% Off Everything)

The Batch API processes requests asynchronously and returns results within 24 hours. The tradeoff is latency, but the reward is a flat 50% discount on all token costs — input and output — across every model.

| Model | Standard (in / out) | Batch (in / out) |
|---|---|---|
| GPT-5 | $1.25 / $10.00 | $0.625 / $5.00 |
| GPT-4.1 | $2.00 / $8.00 | $1.00 / $4.00 |
| GPT-5 Mini | $0.25 / $2.00 | $0.125 / $1.00 |

Best for: content generation, data processing, bulk analysis, nightly report generation — anything that doesn’t need real-time responses.
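Batch jobs are submitted as a JSONL file, one request per line, each carrying a `custom_id` so you can match results back to requests. A sketch of building that file (the `gpt-5-mini` model string is assumed from this article; uploading the file and creating the job via the Files and Batches endpoints are not shown):

```python
import json

prompts = ["Summarize report A.", "Summarize report B."]

# One JSON object per line; custom_id pairs each result with its
# original request when the batch completes.
lines = [
    json.dumps({
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-5-mini",  # model string assumed from this article
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 300,
        },
    })
    for i, prompt in enumerate(prompts)
]

batch_jsonl = "\n".join(lines)
```

Results come back asynchronously (within the 24-hour window), so this shape works best for the nightly, non-interactive workloads listed above.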

3. Flex Processing (Variable Discount)

For requests that aren’t time-sensitive but need faster turnaround than Batch, Flex processing offers lower prices with higher latency. It’s a middle ground between standard and batch pricing.

4. Stacking Discounts (The 90%+ Strategy)

You can combine batch pricing with prompt caching:

GPT-4.1 with caching + batch:

  • Cached input: $0.25 per 1M tokens (87.5% off standard)
  • Output: $4.00 per 1M tokens (50% off standard)

GPT-5 with caching + batch:

  • Cached input: $0.0625 per 1M tokens (95% off standard)
  • Output: $5.00 per 1M tokens (50% off standard)

This is how production systems processing millions of tokens daily keep costs under control.
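The stacked numbers above are just multiplied discounts: the cache discount applies to input, then the batch discount halves everything. A quick sketch:

```python
def stacked_prices(input_base, output_base, cache_discount):
    """Per-1M-token (input, output) prices with prompt caching on input
    and the Batch API's flat 50% discount applied to both."""
    cached_input = input_base * (1 - cache_discount)
    return (cached_input * 0.5, output_base * 0.5)

print(stacked_prices(1.25, 10.00, 0.90))  # GPT-5: caching + batch
print(stacked_prices(2.00, 8.00, 0.75))   # GPT-4.1: caching + batch
```

For GPT-5 this lands at $0.0625 input / $5.00 output, matching the figures above.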


How to Pick the Right Model

Here’s a straightforward decision framework based on what you’re actually building:

High-quality chat, content, or code generation → GPT-5 ($1.25/$10). The best balance of quality and cost. Handles text, vision, structured output, and function calling. Start here if you're unsure.

Processing large documents, codebases, or datasets → GPT-4.1 ($2/$8). The 1M token context window is its defining feature. Use it when your input data won't fit in other models.

Complex reasoning, math, multi-step logic → o3 ($2/$8). A chain-of-thought reasoning model that excels at problems needing internal deliberation before responding.

Maximum capability, hardest problems → GPT-5.4 ($2.50/$15). Computer use, tool search, 1.05M context. Use it for agent workflows and tasks requiring frontier performance.

High-volume classification or extraction → GPT-4.1 Nano ($0.10/$0.40). The cheapest capable model. Good for routing, tagging, entity extraction, and any task needing millions of calls per day.

Lightweight chat or customer support → GPT-5 Mini ($0.25/$2). Strong quality at one-fifth the price of GPT-5. Good enough for most customer-facing chat applications.
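The framework above reduces to a lookup with a safe default. A minimal routing sketch; the model strings are assumed from this article's tables, so substitute whatever identifiers your account exposes:

```python
# Task-type to model routing per the decision framework above.
ROUTES = {
    "generation": "gpt-5",            # chat, content, code
    "long_context": "gpt-4.1",        # large documents / codebases
    "reasoning": "o3",                # math, multi-step logic
    "agent": "gpt-5.4",               # computer use, tool search
    "classification": "gpt-4.1-nano", # high-volume tagging/extraction
    "support": "gpt-5-mini",          # lightweight customer chat
}

def pick_model(task_type):
    """Fall back to the general-purpose flagship when unsure."""
    return ROUTES.get(task_type, "gpt-5")
```

A router like this is also the foundation of the waterfall strategy described later: cheap models handle most traffic, and only ambiguous or hard tasks escalate.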


Real-World Cost Examples

Example 1: Customer Support Chatbot (10,000 conversations/day)

Average conversation: 800 input tokens, 400 output tokens.

That works out to 240M input and 120M output tokens per month (10,000 conversations × 30 days).

| Model | Monthly Cost |
|---|---|
| GPT-5 | $300 input + $1,200 output = $1,500/month |
| GPT-5 Mini | $60 input + $240 output = $300/month |
| GPT-5 Mini + Caching | $6 input + $240 output = $246/month |

The caching row assumes the input is almost entirely a cached prompt prefix; real savings depend on your hit rate.

Example 2: RAG Pipeline Processing 1M Queries/Month

Average query: 2,000 input tokens (including retrieved context), 500 output tokens.

| Model | Monthly Cost |
|---|---|
| GPT-4.1 | $4,000 input + $4,000 output = $8,000/month |
| GPT-4.1 + Caching | $1,000 input + $4,000 output = $5,000/month |
| GPT-4.1 + Caching + Batch | $500 input + $2,000 output = $2,500/month |
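The arithmetic behind these tables is simple enough to keep as a reusable helper; here it reproduces Example 2's baseline from the stated assumptions:

```python
def monthly_cost(requests, in_tokens, out_tokens, in_rate, out_rate):
    """Monthly bill in dollars; rates are per 1M tokens."""
    total_in = requests * in_tokens
    total_out = requests * out_tokens
    return (total_in * in_rate + total_out * out_rate) / 1_000_000

# Example 2 baseline: 1M queries/month, 2,000 in / 500 out, GPT-4.1 at $2/$8
print(monthly_cost(1_000_000, 2_000, 500, 2.00, 8.00))  # 8000.0
```

Swapping the input rate for the cached rate ($0.50) reproduces the $5,000 caching row; halving both rates for batch gives the $2,500 row.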

Example 3: Code Review Agent (500 PRs/day)

Average PR: 15,000 input tokens (code diff + context), 2,000 output tokens.

| Model | Monthly Cost |
|---|---|
| GPT-5.4 | $562 input + $450 output = $1,012/month |
| GPT-5 | $281 input + $300 output = $581/month |
| o3 | $450 input + $240 output = $690/month |

OpenAI vs Competitors: Quick Price Comparison

How does OpenAI stack up against the alternatives as of March 2026?

| Model | Input (per 1M) | Output (per 1M) | Context | Best For |
|---|---|---|---|---|
| GPT-5 | $1.25 | $10.00 | 400K | General purpose |
| GPT-5.4 | $2.50 | $15.00 | 1.05M | Agents, computer use |
| Claude Opus 4.6 | $5.00 | $25.00 | 1M | Nuanced writing, coding |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | Balanced quality/cost |
| Gemini 3.1 Pro | $2.00 | $12.00 | 1M | Multimodal, search integration |
| Gemini 3 Flash | $0.50 | $3.00 | 1M | Budget multimodal |
| DeepSeek V4 | $0.27 | $1.10 | 128K | Budget general purpose |
| Grok 4.1 | $0.20 | $0.50 | 128K | Cheapest option |

Key takeaways:

  • GPT-5 and Gemini 3.1 Pro are closely price-matched. Google’s advantage is the 1M context window; OpenAI’s is the broader tool ecosystem.
  • Claude models cost 2-4x more per token but have strong advantages in instruction following and creative tasks.
  • DeepSeek and Grok are significantly cheaper but have smaller context windows and fewer built-in tools.
  • For pure cost optimization, GPT-5 at batch pricing ($0.625/$5) beats standard Claude Sonnet 4.6 pricing ($3/$15) while delivering comparable quality.

Rate Limits by Tier

Every OpenAI API account has usage limits based on how much you’ve spent on the platform:

| Tier | Qualification | GPT-5 RPM | GPT-5 TPM |
|---|---|---|---|
| Free | Sign up | 3 | 40,000 |
| Tier 1 | $5 spend | 500 | 200,000 |
| Tier 2 | $50 spend | 5,000 | 2,000,000 |
| Tier 3 | $100 spend | 5,000 | 4,000,000 |
| Tier 4 | $250 spend | 10,000 | 10,000,000 |
| Tier 5 | $1,000 spend | 10,000 | 30,000,000 |

RPM = requests per minute. TPM = tokens per minute. Most production apps need Tier 2+ to avoid throttling.
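A quick sanity check on whether a tier's TPM ceiling covers your expected traffic:

```python
def required_tpm(requests_per_minute, tokens_per_request):
    """Token throughput per minute a workload needs (input + output)."""
    return requests_per_minute * tokens_per_request

# 100 requests/min averaging ~1,200 total tokens each
need = required_tpm(100, 1_200)
print(need, need <= 200_000)  # fits within Tier 1's 200K TPM
```

Remember that both RPM and TPM must fit: 100 requests/min is also within Tier 1's 500 RPM, but it would exceed the free tier's 3 RPM immediately.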


Built-in Tool Pricing

OpenAI’s built-in tools are billed separately from model tokens:

| Tool | Pricing |
|---|---|
| Web Search | Per-call fee (varies by model) + search content tokens at model's input rate |
| Code Interpreter | Included in token costs (no separate fee) |
| File Search | $0.10/GB storage/day + token costs |
| Image Generation | $0.01-$0.17 per image (varies by quality/size) |
| Computer Use | Token costs only (no separate tool fee) |

For web search specifically: tool calls are billed per 1,000 calls based on the model and tool version. Search content tokens (the data retrieved from the web) are billed at the model’s input token rate.


7 Ways to Cut Your OpenAI API Bill

  1. Use prompt caching — If your system prompt is consistent, you’re automatically saving 75-90% on those tokens. Structure your prompts with the static context first.
  2. Route to cheaper models — Send simple tasks to GPT-4.1 Nano ($0.10/$0.40) and only escalate to GPT-5 when needed. A waterfall approach handles 70-80% of traffic at a fraction of the cost.
  3. Use the Batch API — For anything that doesn’t need real-time results, batch processing cuts costs by 50% across the board.
  4. Control output length — Output tokens cost 4-8x more than input tokens. Asking for concise responses and setting max_tokens limits directly reduces your biggest cost driver.
  5. Use reasoning effort levels — GPT-5.4 and GPT-5.2 support reasoning.effort parameter. Set it to none or low for simple tasks instead of defaulting to full reasoning.
  6. Trim your input context — Don’t send entire documents when a relevant excerpt will do. RAG pipelines that retrieve focused chunks save significantly over stuffing the full context window.
  7. Monitor and set alerts — Use OpenAI’s usage dashboard and billing APIs to track token consumption. Set spending limits to avoid surprises.

ChatGPT Subscription Plans (Not API)

If you’re using ChatGPT directly (not the API), here are the current subscription options:

| Plan | Price | What You Get |
|---|---|---|
| Free | $0 | Access to GPT-4o Mini, basic features, limited usage |
| Plus | $20/month | GPT-5, DALL·E, browsing, advanced data analysis |
| Pro | $200/month | Maximum usage limits, priority access, o3 Pro mode |
| Team | $25-30/user/month | Workspace, admin controls, higher limits |
| Enterprise | Custom pricing | SSO, audit logs, unlimited access, no data training |

Key Changes from 2025 to 2026

If you’re updating from last year’s pricing, here’s what changed:

  • GPT-5.4 launched (March 2026) — New flagship with computer use and 1.05M context at $2.50/$15
  • GPT-5.2 price stable — Still $1.75/$14, now positioned as the previous frontier model behind GPT-5.4
  • GPT-5 remains best value — At $1.25/$10 with 90% cache discount, it’s the production workhorse
  • Container pricing change — Starting March 31, 2026, containers are billed per 20-minute session
  • Regional processing surcharge — GPT-5.4 has a 10% uplift for data residency endpoints
  • Free tier access expanded — GPT-5 Mini now available on the free tier (with strict rate limits)

Bottom Line

OpenAI’s 2026 pricing gives developers more options than ever. The key insight is that model selection combined with discount stacking matters more than raw per-token rates.

A well-optimized GPT-5 setup using prompt caching and batch processing can cost less than $0.10 per million input tokens — cheaper than almost any competitor’s standard pricing, including open-source API providers.

Start with GPT-5 for most use cases. Upgrade to GPT-5.4 only when you need computer use, the 1.05M context window, or tool search. Route simple tasks to GPT-4.1 Nano. And always use caching.
