OpenAI released GPT-5.4 on March 5, 2026 — and this isn’t just another incremental update. For the first time, a general-purpose model ships with native computer-use capabilities, a 1 million token context window, and a tool search mechanism that cuts token costs by 47% in agent-heavy workflows.
If you’re building AI-powered applications, this changes how you architect agents, handle large codebases, and manage API costs.
This guide covers everything you need to know as a developer: what’s new, how the API works, pricing breakdown, migration from GPT-5.2/5.3-Codex, and practical Python examples you can run today.
What’s New in GPT-5.4
GPT-5.4 unifies the GPT and Codex lines into a single frontier model. Here’s what that means in practice:
Native Computer Use
GPT-5.4 can operate computers directly — clicking buttons, typing text, reading screenshots, and navigating between applications. This was previously limited to specialized models, but now it’s built into the general-purpose API.
The numbers speak for themselves: GPT-5.4 scores 75.0% on OSWorld-Verified, surpassing human performance at 72.4% on desktop navigation tasks. That’s a jump from GPT-5.2’s 47.3%.
In practice, this means you can build agents that:
- Navigate web applications and fill forms
- Debug frontend UIs by visually inspecting them
- Automate multi-step workflows across different software
- Replace fragile Selenium/Playwright scripts with intelligent navigation
1 Million Token Context Window
The context window jumps to 1.05 million tokens (922K input + 128K output). For reference, that’s roughly:
- An entire medium-sized codebase
- 15-20 full-length technical documents
- Hours of transcribed conversation
- Complete contract review packages
This is accessible via the API and Codex. ChatGPT users get the standard context limits based on their plan.
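Before shipping an entire codebase in one request, it helps to check whether it actually fits. The sketch below uses the common rough heuristic of ~4 characters per token; the real count depends on the tokenizer, so treat it as ballpark sizing only:

```python
# Rough token estimate for a directory of source files.
# Assumes ~4 characters per token; this is a sizing heuristic,
# not the exact tokenizer count.
from pathlib import Path

CHARS_PER_TOKEN = 4
INPUT_LIMIT = 922_000  # GPT-5.4 input window


def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN


def codebase_tokens(root: str, suffixes=(".py",)) -> int:
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += estimate_tokens(path.read_text(errors="ignore"))
    return total
```

If the estimate is anywhere near the limit, measure with a real tokenizer before sending.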
Tool Search
When your agent has access to dozens or hundreds of tools, loading every tool definition into the prompt wastes tokens and increases latency. Tool search lets GPT-5.4 receive a lightweight tool list and look up full definitions on demand.
The result: 47% fewer tokens in tool-heavy workflows with zero loss in accuracy. If you’re building agents with MCP servers or large function libraries, this is significant.
Reasoning Effort Control
GPT-5.4 introduces the reasoning.effort parameter that controls how much internal compute the model allocates before responding:
- none — No chain-of-thought. Fastest, cheapest. Good for simple formatting or extraction tasks.
- low — Minimal reasoning. Good for classification and straightforward Q&A.
- medium — Balanced. Solid default for most development tasks.
- high — Deep reasoning. Use for complex code generation and multi-step analysis.
- xhigh — Maximum compute. Reserved for hard benchmarks, legal analysis, and complex debugging.
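In application code, one way to keep these levels consistent is a small lookup keyed by task type. The task categories below are this sketch's own convention, not an official taxonomy:

```python
# Map task categories to reasoning effort levels.
# The category names are illustrative; adjust to your workload.
EFFORT_BY_TASK = {
    "extraction": "none",       # formatting, field extraction
    "classification": "low",    # labels, routing, simple Q&A
    "development": "medium",    # everyday coding tasks
    "codegen": "high",          # complex generation, multi-step analysis
    "debugging": "xhigh",       # hardest problems
}


def reasoning_params(task: str) -> dict:
    # Fall back to the balanced default for unknown task types
    return {"effort": EFFORT_BY_TASK.get(task, "medium")}
```

The returned dict can be passed directly as the `reasoning` argument.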
Improved Token Efficiency
GPT-5.4 uses up to 47% fewer tokens on complex tasks compared to GPT-5.2. Combined with the reasoning effort control, this means you can get better results for less money if you tune the parameters correctly.
Better Factual Accuracy
OpenAI reports that GPT-5.4’s claims are 33% less likely to be false and full responses are 18% less likely to contain any errors compared to GPT-5.2. For production applications where hallucinations cost real money and trust, that’s a meaningful improvement.
GPT-5.4 API: Key Changes
Model String
```
gpt-5.4      # Standard
gpt-5.4-pro  # Higher compute for hardest problems
```
GPT-5.4 is available via the Responses API. You need a Tier 1+ API account (minimum $5 prior spend).
Basic API Call
```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.4",
    input="Explain the differences between REST and GraphQL for a production API.",
    reasoning={"effort": "medium"},
)

print(response.output_text)
```
Using Reasoning Effort
```python
# For a simple extraction task — use minimal reasoning
response = client.responses.create(
    model="gpt-5.4",
    input="Extract all email addresses from this text: ...",
    reasoning={"effort": "none"},
)

# For complex code generation — use high reasoning
response = client.responses.create(
    model="gpt-5.4",
    input="Refactor this FastAPI application to use the repository pattern with dependency injection.",
    reasoning={"effort": "high"},
)
```
Computer Use
Computer use is accessed through the computer_use_preview tool in the Responses API:
```python
response = client.responses.create(
    model="gpt-5.4",
    tools=[{
        "type": "computer_use_preview",
        "display_width": 1920,
        "display_height": 1080,
        "environment": "browser",
    }],
    input="Go to GitHub and create a new repository named 'my-project' with a Python .gitignore.",
    reasoning={"effort": "medium"},
)
```
Safety note: Always run computer-use agents in isolated environments (containers, VMs, sandboxed browsers). Keep a human in the loop for high-impact actions like payments, account changes, or data deletion.
Tool Search
If your agent has access to many tools, use tool search to avoid loading all definitions upfront:
```python
response = client.responses.create(
    model="gpt-5.4",
    tools=[
        {
            "type": "function",
            "name": "search_tools",
            "description": "Search available tools by keyword",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                }
            }
        }
    ],
    input="Find the tool for sending Slack notifications and use it to post a deployment update.",
    reasoning={"effort": "medium"},
)
```
Verbosity Control
GPT-5.4 introduces a text.verbosity parameter to control output length:
```python
response = client.responses.create(
    model="gpt-5.4",
    input="Summarize this 50-page contract.",
    text={"verbosity": "concise"},  # Options: concise, default, verbose
)
```
Pricing Breakdown
Here’s the complete pricing structure — and the critical threshold you need to know about.
Standard Pricing (Under 272K Input Tokens)
| Component | Price per 1M Tokens |
|---|---|
| Input | $2.50 |
| Cached Input | $1.25 |
| Output | $15.00 |
Long Context Pricing (Over 272K Input Tokens)
| Component | Price per 1M Tokens |
|---|---|
| Input | $5.00 (2x) |
| Output | $22.50 (1.5x) |
GPT-5.4 Pro Pricing
| Component | Price per 1M Tokens |
|---|---|
| Input | $30.00 |
| Output | $180.00 |
The 272K Threshold — Read This Carefully
The 272K token boundary is the most important pricing detail in GPT-5.4. Once your input exceeds 272K tokens, the higher rate applies to the entire session, not just the overflow. This means crossing from 271K to 273K tokens doesn’t just make those 2K extra tokens more expensive — it doubles the cost of your entire input.
Practical advice: For most applications, stay under 272K. Use prompt compaction, summarization, or chunking strategies to keep inputs lean. The 1M context window exists for when you genuinely need it (full codebase analysis, multi-document legal review), not as a default.
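The cliff is easier to see with numbers. A minimal cost sketch using the rates from the tables above ($2.50/M in and $15.00/M out standard; $5.00/M in and $22.50/M out once input crosses 272K):

```python
# Estimate a single request's cost under GPT-5.4's tiered pricing.
# Once input crosses the threshold, the long-context rate applies
# to the ENTIRE input, not just the overflow.
LONG_CONTEXT_THRESHOLD = 272_000


def request_cost(input_tokens: int, output_tokens: int) -> float:
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = 5.00, 22.50
    else:
        in_rate, out_rate = 2.50, 15.00
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# 271K vs 273K input with 10K output: 2K extra tokens nearly double the bill
print(round(request_cost(271_000, 10_000), 4))  # 0.8275
print(round(request_cost(273_000, 10_000), 4))  # 1.59
```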
Cost Saving Strategies
- Use cached input pricing — Keep your system prompt and common context identical across requests. Cached tokens cost $1.25/M vs $2.50/M — a 50% saving.
- Tune reasoning effort — Don't use high or xhigh for simple tasks. A classification task at none costs a fraction of what it costs at high.
- Use Batch API — For non-time-sensitive tasks, batch processing runs at 50% of standard pricing.
- Use Flex processing — Similar to batch, offers lower prices with higher latency tolerance.
- Stay under 272K — Structure your prompts to avoid the long-context surcharge unless the task genuinely requires it.
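Cached input pricing only applies when requests share an identical prefix. One way to make that systematic is to build every request from the same frozen system block, so only the user-specific tail varies. A sketch (the caching itself happens server-side; your job is just to keep the prefix byte-identical):

```python
# Keep the shared prefix identical across requests so prompt caching
# can apply; only the per-request tail should vary.
SYSTEM_BLOCK = (
    "You are a code review assistant.\n"
    "Always answer in JSON.\n"
)


def build_input(user_text: str) -> str:
    # Never interpolate per-request data (timestamps, user IDs) into the
    # prefix; any byte difference defeats the cache match.
    return SYSTEM_BLOCK + user_text


a = build_input("Review snippet A")
b = build_input("Review snippet B")
assert a[: len(SYSTEM_BLOCK)] == b[: len(SYSTEM_BLOCK)]
```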
GPT-5.4 vs GPT-5.2 vs GPT-5.3-Codex
| Feature | GPT-5.2 | GPT-5.3-Codex | GPT-5.4 |
|---|---|---|---|
| Context Window | 400K | 1M (code-focused) | 1.05M (general) |
| Computer Use | No | No | Yes (native) |
| Tool Search | No | No | Yes |
| Reasoning Effort | Yes | Yes | Yes (improved) |
| OSWorld Score | 47.3% | N/A | 75.0% |
| SWE-Bench Pro | Lower | Strong | Matches/exceeds |
| Input Price/1M | $1.75 | Varies | $2.50 |
| Output Price/1M | $7.00 | Varies | $15.00 |
| Token Efficiency | Baseline | High (code) | 47% fewer tokens |
When to use GPT-5.4: Most new development. It combines the strengths of both GPT-5.2 and 5.3-Codex.
When GPT-5.2 still makes sense: Budget-constrained applications where you don’t need computer use or the 1M context window. At $1.75/$7.00, it’s significantly cheaper for simpler tasks.
Migration Guide: Moving from GPT-5.2 to GPT-5.4
Step 1: Change the Model String
```python
# Before
response = client.responses.create(model="gpt-5.2", ...)

# After
response = client.responses.create(model="gpt-5.4", ...)
```
Step 2: Re-Evaluate Reasoning Effort
The same effort levels exist (none through xhigh), but GPT-5.4 may produce different quality tradeoffs at each level. Test your existing defaults — you may be able to drop from high to medium without quality loss.
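A lightweight way to run this test is to sweep the same prompt across effort levels and compare outputs and token usage. The sketch below takes the API call as an injected function, so you can wrap `client.responses.create` for real runs or pass a stub while wiring things up:

```python
# Sweep one prompt across reasoning effort levels. `call` is any
# function (prompt, effort) -> result; injecting it keeps the sweep
# itself independent of the API client.
EFFORTS = ["none", "low", "medium", "high"]


def sweep_efforts(call, prompt: str, efforts=EFFORTS) -> dict:
    return {effort: call(prompt, effort) for effort in efforts}


# Example wrapper (assumes an OpenAI client named `client` is in scope):
# def call_model(prompt, effort):
#     return client.responses.create(
#         model="gpt-5.4", input=prompt, reasoning={"effort": effort}
#     ).output_text
```

Compare the results side by side; if medium matches high on your eval set, take the cheaper setting.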
Step 3: Review Output Token Budgets
GPT-5.4 is more concise — up to 47% fewer tokens on complex tasks. If you’re setting max_output_tokens, you may be able to lower it and save on output costs.
Step 4: Evaluate Computer Use
If you had workarounds for UI automation (Selenium scripts, custom Playwright setups, RPA tools), GPT-5.4’s native computer use may replace them entirely. Test your automation workflows against the new capability.
Step 5: Watch the 272K Boundary
If your GPT-5.2 workloads used large contexts but stayed within its pricing structure, recalculate costs with GPT-5.4’s tiered pricing. Prompts that were affordable at GPT-5.2 rates might be significantly more expensive if they cross the 272K threshold.
Step 6: Test Preambles
GPT-5.4 supports preambles — brief explanations the model generates before tool calls. Enable them for better debugging and user confidence:
```python
response = client.responses.create(
    model="gpt-5.4",
    instructions="Before you call a tool, explain why you are calling it.",
    tools=[...],
    input="Analyze this month's sales data and create a summary report.",
)
```
Practical Example: Building an Agent with GPT-5.4
Here’s a complete example of a FastAPI application that uses GPT-5.4 as an intelligent code review agent:
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI()


class ReviewRequest(BaseModel):
    code: str
    language: str = "python"
    focus: str = "security"  # security, performance, readability


@app.post("/review")
async def review_code(request: ReviewRequest):
    # Choose reasoning effort based on code length
    effort = "medium"
    if len(request.code) > 5000:
        effort = "high"

    try:
        response = client.responses.create(
            model="gpt-5.4",
            reasoning={"effort": effort},
            instructions=f"""You are a senior {request.language} developer
performing a code review focused on {request.focus}.

Return your review as JSON with this structure:
{{
  "summary": "Brief overall assessment",
  "issues": [
    {{
      "severity": "critical|warning|info",
      "line": "approximate line number or range",
      "description": "what's wrong",
      "suggestion": "how to fix it"
    }}
  ],
  "score": 1-10
}}""",
            input=f"Review this {request.language} code:\n\n```{request.language}\n{request.code}\n```",
            text={"format": {"type": "json_object"}},
        )
        return {"review": response.output_text}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
This example demonstrates several GPT-5.4 features working together: reasoning effort tuned to input complexity, structured JSON output, and a practical developer tool that could be integrated into a CI/CD pipeline.
What GPT-5.4 Means for Developers
The release of GPT-5.4 signals a clear direction: AI is becoming standard infrastructure, not an add-on feature.
Native computer use means agents can now interact with software the way humans do — through the UI. Tool search means agents can work with massive tool ecosystems without drowning in token costs. The 1M context window means entire codebases and document collections fit in a single request.
For developers building AI-powered products, the practical implications are:
- UI automation is now a model capability, not a separate toolchain. If you’re maintaining Selenium or Puppeteer scripts for AI-driven automation, evaluate whether GPT-5.4 can replace them.
- Agent architectures get simpler. Tool search and improved reasoning mean less scaffolding code and fewer retry loops.
- Cost optimization requires active management. The 272K pricing threshold and tiered reasoning effort mean your API costs are directly tied to how well you configure each request. Default settings will cost more than tuned ones.
- The GPT-5.3-Codex niche is absorbed. GPT-5.4 matches or exceeds Codex performance while adding computer use and broader capabilities. For new projects, there’s little reason to target Codex specifically.
Start building. The model is available now at gpt-5.4 for Tier 1+ API accounts.
