OpenAI released GPT-5.4 on March 5, 2026 — and this isn’t just another incremental update. For the first time, a general-purpose model ships with native computer-use capabilities, a 1 million token context window, and a tool search mechanism that cuts token costs by 47% in agent-heavy workflows.
If you’re building AI-powered applications, this changes how you architect agents, handle large codebases, and manage API costs.
This guide covers everything you need to know as a developer: what’s new, how the API works, pricing breakdown, migration from GPT-5.2/5.3-Codex, and practical Python examples you can run today.
What’s New in GPT-5.4
GPT-5.4 unifies the GPT and Codex lines into a single frontier model. Here’s what that means in practice:
Native Computer Use
GPT-5.4 can operate computers directly — clicking buttons, typing text, reading screenshots, and navigating between applications. This was previously limited to specialized models, but now it’s built into the general-purpose API.
The numbers speak for themselves: GPT-5.4 scores 75.0% on OSWorld-Verified, surpassing human performance at 72.4% on desktop navigation tasks. That’s a jump from GPT-5.2’s 47.3%.
In practice, this means you can build agents that:
- Navigate web applications and fill forms
- Debug frontend UIs by visually inspecting them
- Automate multi-step workflows across different software
- Replace fragile Selenium/Playwright scripts with intelligent navigation
1 Million Token Context Window
The context window jumps to 1.05 million tokens (922K input + 128K output). For reference, that’s roughly:
- An entire medium-sized codebase
- 15-20 full-length technical documents
- Hours of transcribed conversation
- Complete contract review packages
This is accessible via the API and Codex. ChatGPT users get the standard context limits based on their plan.
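Before shipping an entire codebase in one request, it helps to check whether it actually fits. The sketch below uses the common rough heuristic of ~4 characters per token; the real count depends on the tokenizer, so treat it as ballpark sizing only:

```python
# Rough token estimate for a directory of source files.
# Assumes ~4 characters per token; this is a sizing heuristic,
# not the exact tokenizer count.
from pathlib import Path

CHARS_PER_TOKEN = 4
INPUT_LIMIT = 922_000  # GPT-5.4 input window


def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN


def codebase_tokens(root: str, suffixes=(".py",)) -> int:
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += estimate_tokens(path.read_text(errors="ignore"))
    return total
```

If the estimate is anywhere near the limit, measure with a real tokenizer before sending.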
Tool Search
When your agent has access to dozens or hundreds of tools, loading every tool definition into the prompt wastes tokens and increases latency. Tool search lets GPT-5.4 receive a lightweight tool list and look up full definitions on demand.
The result: 47% fewer tokens in tool-heavy workflows with zero loss in accuracy. If you’re building agents with MCP servers or large function libraries, this is significant.
Reasoning Effort Control
GPT-5.4 introduces the reasoning.effort parameter that controls how much internal compute the model allocates before responding:
- none — No chain-of-thought. Fastest, cheapest. Good for simple formatting or extraction tasks.
- low — Minimal reasoning. Good for classification and straightforward Q&A.
- medium — Balanced. Solid default for most development tasks.
- high — Deep reasoning. Use for complex code generation and multi-step analysis.
- xhigh — Maximum compute. Reserved for hard benchmarks, legal analysis, and complex debugging.
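In application code, one way to keep these levels consistent is a small lookup keyed by task type. The task categories below are this sketch's own convention, not an official taxonomy:

```python
# Map task categories to reasoning effort levels.
# The category names are illustrative; adjust to your workload.
EFFORT_BY_TASK = {
    "extraction": "none",       # formatting, field extraction
    "classification": "low",    # labels, routing, simple Q&A
    "development": "medium",    # everyday coding tasks
    "codegen": "high",          # complex generation, multi-step analysis
    "debugging": "xhigh",       # hardest problems
}


def reasoning_params(task: str) -> dict:
    # Fall back to the balanced default for unknown task types
    return {"effort": EFFORT_BY_TASK.get(task, "medium")}
```

The returned dict can be passed directly as the `reasoning` argument.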
Improved Token Efficiency
GPT-5.4 uses up to 47% fewer tokens on complex tasks compared to GPT-5.2. Combined with the reasoning effort control, this means you can get better results for less money if you tune the parameters correctly.
Better Factual Accuracy
OpenAI reports that GPT-5.4’s claims are 33% less likely to be false and full responses are 18% less likely to contain any errors compared to GPT-5.2. For production applications where hallucinations cost real money and trust, that’s a meaningful improvement.
GPT-5.4 API: Key Changes
Model String
```
gpt-5.4      # Standard
gpt-5.4-pro  # Higher compute for hardest problems
```
GPT-5.4 is available via the Responses API. You need a Tier 1+ API account (minimum $5 prior spend).
Basic API Call
```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.4",
    input="Explain the differences between REST and GraphQL for a production API.",
    reasoning={"effort": "medium"},
)

print(response.output_text)
```
Using Reasoning Effort
```python
# For a simple extraction task — use minimal reasoning
response = client.responses.create(
    model="gpt-5.4",
    input="Extract all email addresses from this text: ...",
    reasoning={"effort": "none"},
)

# For complex code generation — use high reasoning
response = client.responses.create(
    model="gpt-5.4",
    input="Refactor this FastAPI application to use the repository pattern with dependency injection.",
    reasoning={"effort": "high"},
)
```
Computer Use
Computer use is accessed through the computer_use_preview tool in the Responses API:
```python
response = client.responses.create(
    model="gpt-5.4",
    tools=[{
        "type": "computer_use_preview",
        "display_width": 1920,
        "display_height": 1080,
        "environment": "browser",
    }],
    input="Go to GitHub and create a new repository named 'my-project' with a Python .gitignore.",
    reasoning={"effort": "medium"},
)
```
Safety note: Always run computer-use agents in isolated environments (containers, VMs, sandboxed browsers). Keep a human in the loop for high-impact actions like payments, account changes, or data deletion.
Tool Search
If your agent has access to many tools, use tool search to avoid loading all definitions upfront:
```python
response = client.responses.create(
    model="gpt-5.4",
    tools=[
        {
            "type": "function",
            "name": "search_tools",
            "description": "Search available tools by keyword",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                }
            }
        }
    ],
    input="Find the tool for sending Slack notifications and use it to post a deployment update.",
    reasoning={"effort": "medium"},
)
```
Verbosity Control
GPT-5.4 introduces a text.verbosity parameter to control output length:
```python
response = client.responses.create(
    model="gpt-5.4",
    input="Summarize this 50-page contract.",
    text={"verbosity": "concise"},  # Options: concise, default, verbose
)
```
Pricing Breakdown
Here’s the complete pricing structure — and the critical threshold you need to know about.
Standard Pricing (Under 272K Input Tokens)
| Component | Price per 1M Tokens |
|---|---|
| Input | $2.50 |
| Cached Input | $1.25 |
| Output | $15.00 |
Long Context Pricing (Over 272K Input Tokens)
| Component | Price per 1M Tokens |
|---|---|
| Input | $5.00 (2x) |
| Output | $22.50 (1.5x) |
GPT-5.4 Pro Pricing
| Component | Price per 1M Tokens |
|---|---|
| Input | $30.00 |
| Output | $180.00 |
The 272K Threshold — Read This Carefully
The 272K token boundary is the most important pricing detail in GPT-5.4. Once your input exceeds 272K tokens, the higher rate applies to the entire session, not just the overflow. This means crossing from 271K to 273K tokens doesn’t just make those 2K extra tokens more expensive — it doubles the cost of your entire input.
Practical advice: For most applications, stay under 272K. Use prompt compaction, summarization, or chunking strategies to keep inputs lean. The 1M context window exists for when you genuinely need it (full codebase analysis, multi-document legal review), not as a default.
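The cliff is easier to see with numbers. A minimal cost sketch using the rates from the tables above ($2.50/M in and $15.00/M out standard; $5.00/M in and $22.50/M out once input crosses 272K):

```python
# Estimate a single request's cost under GPT-5.4's tiered pricing.
# Once input crosses the threshold, the long-context rate applies
# to the ENTIRE input, not just the overflow.
LONG_CONTEXT_THRESHOLD = 272_000


def request_cost(input_tokens: int, output_tokens: int) -> float:
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = 5.00, 22.50
    else:
        in_rate, out_rate = 2.50, 15.00
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# 271K vs 273K input with 10K output: 2K extra tokens nearly double the bill
print(round(request_cost(271_000, 10_000), 4))  # 0.8275
print(round(request_cost(273_000, 10_000), 4))  # 1.59
```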
Cost Saving Strategies
- Use cached input pricing — Keep your system prompt and common context identical across requests. Cached tokens cost $1.25/M vs $2.50/M — a 50% saving.
- Tune reasoning effort — Don't use high or xhigh for simple tasks. A classification task at none costs a fraction of what it costs at high.
- Use Batch API — For non-time-sensitive tasks, batch processing runs at 50% of standard pricing.
- Use Flex processing — Similar to batch, offers lower prices with higher latency tolerance.
- Stay under 272K — Structure your prompts to avoid the long-context surcharge unless the task genuinely requires it.
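Cached input pricing only applies when requests share an identical prefix. One way to make that systematic is to build every request from the same frozen system block, so only the user-specific tail varies. A sketch (the caching itself happens server-side; your job is just to keep the prefix byte-identical):

```python
# Keep the shared prefix identical across requests so prompt caching
# can apply; only the per-request tail should vary.
SYSTEM_BLOCK = (
    "You are a code review assistant.\n"
    "Always answer in JSON.\n"
)


def build_input(user_text: str) -> str:
    # Never interpolate per-request data (timestamps, user IDs) into the
    # prefix; any byte difference defeats the cache match.
    return SYSTEM_BLOCK + user_text


a = build_input("Review snippet A")
b = build_input("Review snippet B")
assert a[: len(SYSTEM_BLOCK)] == b[: len(SYSTEM_BLOCK)]
```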
GPT-5.4 vs GPT-5.2 vs GPT-5.3-Codex
| Feature | GPT-5.2 | GPT-5.3-Codex | GPT-5.4 |
|---|---|---|---|
| Context Window | 400K | 1M (code-focused) | 1.05M (general) |
| Computer Use | No | No | Yes (native) |
| Tool Search | No | No | Yes |
| Reasoning Effort | Yes | Yes | Yes (improved) |
| OSWorld Score | 47.3% | N/A | 75.0% |
| SWE-Bench Pro | Lower | Strong | Matches/exceeds |
| Input Price/1M | $1.75 | Varies | $2.50 |
| Output Price/1M | $7.00 | Varies | $15.00 |
| Token Efficiency | Baseline | High (code) | 47% fewer tokens |
When to use GPT-5.4: Most new development. It combines the strengths of both GPT-5.2 and 5.3-Codex.
When GPT-5.2 still makes sense: Budget-constrained applications where you don’t need computer use or the 1M context window. At $1.75/$7.00, it’s significantly cheaper for simpler tasks.
Migration Guide: Moving from GPT-5.2 to GPT-5.4
Step 1: Change the Model String
```python
# Before
response = client.responses.create(model="gpt-5.2", ...)

# After
response = client.responses.create(model="gpt-5.4", ...)
```
Step 2: Re-Evaluate Reasoning Effort
The same effort levels exist (none through xhigh), but GPT-5.4 may produce different quality tradeoffs at each level. Test your existing defaults — you may be able to drop from high to medium without quality loss.
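A lightweight way to run this test is to sweep the same prompt across effort levels and compare outputs and token usage. The sketch below takes the API call as an injected function, so you can wrap `client.responses.create` for real runs or pass a stub while wiring things up:

```python
# Sweep one prompt across reasoning effort levels. `call` is any
# function (prompt, effort) -> result; injecting it keeps the sweep
# itself independent of the API client.
EFFORTS = ["none", "low", "medium", "high"]


def sweep_efforts(call, prompt: str, efforts=EFFORTS) -> dict:
    return {effort: call(prompt, effort) for effort in efforts}


# Example wrapper (assumes an OpenAI client named `client` is in scope):
# def call_model(prompt, effort):
#     return client.responses.create(
#         model="gpt-5.4", input=prompt, reasoning={"effort": effort}
#     ).output_text
```

Compare the results side by side; if medium matches high on your eval set, take the cheaper setting.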
Step 3: Review Output Token Budgets
GPT-5.4 is more concise — up to 47% fewer tokens on complex tasks. If you’re setting max_output_tokens, you may be able to lower it and save on output costs.
Step 4: Evaluate Computer Use
If you had workarounds for UI automation (Selenium scripts, custom Playwright setups, RPA tools), GPT-5.4’s native computer use may replace them entirely. Test your automation workflows against the new capability.
Step 5: Watch the 272K Boundary
If your GPT-5.2 workloads used large contexts but stayed within its pricing structure, recalculate costs with GPT-5.4’s tiered pricing. Prompts that were affordable at GPT-5.2 rates might be significantly more expensive if they cross the 272K threshold.
Step 6: Test Preambles
GPT-5.4 supports preambles — brief explanations the model generates before tool calls. Enable them for better debugging and user confidence:
```python
response = client.responses.create(
    model="gpt-5.4",
    instructions="Before you call a tool, explain why you are calling it.",
    tools=[...],
    input="Analyze this month's sales data and create a summary report.",
)
```
Practical Example: Building an Agent with GPT-5.4
Here’s a complete example of a FastAPI application that uses GPT-5.4 as an intelligent code review agent:
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI()


class ReviewRequest(BaseModel):
    code: str
    language: str = "python"
    focus: str = "security"  # security, performance, readability


@app.post("/review")
async def review_code(request: ReviewRequest):
    # Choose reasoning effort based on code length
    effort = "medium"
    if len(request.code) > 5000:
        effort = "high"

    try:
        response = client.responses.create(
            model="gpt-5.4",
            reasoning={"effort": effort},
            instructions=f"""You are a senior {request.language} developer
performing a code review focused on {request.focus}.

Return your review as JSON with this structure:
{{
  "summary": "Brief overall assessment",
  "issues": [
    {{
      "severity": "critical|warning|info",
      "line": "approximate line number or range",
      "description": "what's wrong",
      "suggestion": "how to fix it"
    }}
  ],
  "score": 1-10
}}""",
            input=f"Review this {request.language} code:\n\n```{request.language}\n{request.code}\n```",
            text={"format": {"type": "json_object"}},
        )
        return {"review": response.output_text}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
This example demonstrates several GPT-5.4 features working together: reasoning effort tuned to input complexity, structured JSON output, and a practical developer tool that could be integrated into a CI/CD pipeline.
What GPT-5.4 Means for Developers
The release of GPT-5.4 signals a clear direction: AI is becoming standard infrastructure, not an add-on feature.
Native computer use means agents can now interact with software the way humans do — through the UI. Tool search means agents can work with massive tool ecosystems without drowning in token costs. The 1M context window means entire codebases and document collections fit in a single request.
For developers building AI-powered products, the practical implications are:
- UI automation is now a model capability, not a separate toolchain. If you’re maintaining Selenium or Puppeteer scripts for AI-driven automation, evaluate whether GPT-5.4 can replace them.
- Agent architectures get simpler. Tool search and improved reasoning mean less scaffolding code and fewer retry loops.
- Cost optimization requires active management. The 272K pricing threshold and tiered reasoning effort mean your API costs are directly tied to how well you configure each request. Default settings will cost more than tuned ones.
- The GPT-5.3-Codex niche is absorbed. GPT-5.4 matches or exceeds Codex performance while adding computer use and broader capabilities. For new projects, there’s little reason to target Codex specifically.
Start building. The model is available now at gpt-5.4 for Tier 1+ API accounts.
