Table of Contents
- Understanding the Challenge
- Architecture Fundamentals
- Load Balancing
- Caching Strategies
- Database Optimization
- Asynchronous Processing
- Rate Limiting
- Horizontal Scaling
- Monitoring and Observability
- Real-World Implementation
Understanding the Challenge
What Does 1 Million RPS Mean?
1 million requests per second (RPS) means your API needs to handle:
- 1,000,000 requests every second
- 60 million requests per minute
- 86.4 billion requests per day
To put this in perspective:
- A single server typically handles 1,000-10,000 RPS
- Twitter handles around 6,000 tweets per second at peak
- Netflix serves millions of concurrent streams
Why Single Server Approach Fails
A typical Flask or Laravel application on one server can handle:
- Flask: 500-2,000 RPS (with Gunicorn/uWSGI)
- Laravel: 300-1,500 RPS (with PHP-FPM)
Bottlenecks:
- CPU processing limits
- Memory constraints
- Network bandwidth
- Database connections
- Disk I/O
The solution: Distributed architecture, not bigger servers.
Architecture Fundamentals
The High-Level Architecture
Internet → CDN → Load Balancer → API Servers → Cache → Database
                                      ↓
                             Message Queue → Workers
Key Principles
- Statelessness: Each request is independent
- Horizontal Scaling: Add more servers, not bigger ones
- Caching: Store frequently accessed data
- Async Processing: Don’t make users wait
- Database Optimization: Your database is often the bottleneck
Load Balancing
What is a Load Balancer?
A load balancer distributes incoming requests across multiple servers, preventing any single server from being overwhelmed.
Load Balancing Algorithms
1. Round Robin
- Distributes requests evenly in sequence
- Simple but doesn’t consider server load
2. Least Connections
- Sends requests to server with fewest active connections
- Better for long-running requests
3. IP Hash
- Routes based on client IP
- Ensures same client goes to same server
Implementation with NGINX
# /etc/nginx/nginx.conf
http {
    upstream api_backend {
        least_conn;  # Use least connections algorithm
        server api1.example.com:5000 weight=3;
        server api2.example.com:5000 weight=2;
        server api3.example.com:5000 weight=2;
        server api4.example.com:5000 backup;  # Backup server
        keepalive 32;  # Connection pooling
    }

    server {
        listen 80;

        location /api/ {
            proxy_pass http://api_backend;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

            # Timeouts
            proxy_connect_timeout 5s;
            proxy_send_timeout 10s;
            proxy_read_timeout 10s;
        }
    }
}
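If you need the IP hash behavior described above (session affinity), the same upstream block can swap the algorithm. A minimal variant (note that ip_hash cannot be combined with backup servers):

upstream api_backend {
    ip_hash;  # Requests from the same client IP always reach the same server
    server api1.example.com:5000;
    server api2.example.com:5000;
    server api3.example.com:5000;
}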
Hardware Load Balancers vs Software
Hardware (F5, Citrix):
- Expensive ($10,000+)
- High performance
- Dedicated hardware
Software (NGINX, HAProxy, AWS ALB):
- Cost-effective
- Flexible and programmable
- Easy to scale
Caching Strategies
Why Caching Matters
Caching can reduce database load by 80-95% and response time from 200ms to 5ms.
Caching Layers
Browser Cache → CDN → Application Cache → Database Cache
1. In-Memory Caching (Redis/Memcached)
Flask Example with Redis:
from flask import Flask, jsonify
from redis import Redis
import json
import time

app = Flask(__name__)
redis_client = Redis(host='localhost', port=6379, decode_responses=True)

@app.route('/api/user/<int:user_id>')
def get_user(user_id):
    # Try cache first
    cache_key = f'user:{user_id}'
    cached_user = redis_client.get(cache_key)
    if cached_user:
        print("Cache HIT")
        return jsonify(json.loads(cached_user))

    # Cache MISS - fetch from database
    print("Cache MISS")
    user = fetch_user_from_db(user_id)  # Expensive DB query

    # Store in cache for 1 hour
    redis_client.setex(
        cache_key,
        3600,  # TTL in seconds
        json.dumps(user)
    )
    return jsonify(user)

def fetch_user_from_db(user_id):
    time.sleep(0.1)  # Simulate slow DB query
    return {
        'id': user_id,
        'name': 'John Doe',
        'email': 'john@example.com'
    }
Laravel Example with Redis:
<?php
// routes/api.php
use Illuminate\Support\Facades\Route;
use Illuminate\Support\Facades\Redis;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

Route::get('/user/{id}', function ($id) {
    $cacheKey = "user:{$id}";

    // Try cache first
    $cached = Redis::get($cacheKey);
    if ($cached) {
        Log::info("Cache HIT");
        return response()->json(json_decode($cached));
    }

    // Cache MISS - fetch from database
    Log::info("Cache MISS");
    $user = DB::table('users')->find($id);

    // Store in cache for 1 hour
    Redis::setex($cacheKey, 3600, json_encode($user));

    return response()->json($user);
});
2. Cache Invalidation Strategies
Problem: When do you update/delete cached data?
Strategy 1: Time-based (TTL – Time To Live)
# Cache expires after 1 hour
redis_client.setex('key', 3600, value)
Strategy 2: Event-based
@app.route('/api/user/<int:user_id>', methods=['PUT'])
def update_user(user_id):
    # Update database
    update_user_in_db(user_id, request.json)

    # Invalidate cache
    redis_client.delete(f'user:{user_id}')
    return jsonify({'message': 'Updated'})
Strategy 3: Cache Aside Pattern
def get_user_with_cache_aside(user_id):
    cache_key = f'user:{user_id}'

    # 1. Try cache
    user = redis_client.get(cache_key)
    if user:
        return json.loads(user)

    # 2. Fetch from DB
    user = fetch_user_from_db(user_id)

    # 3. Populate cache
    redis_client.setex(cache_key, 3600, json.dumps(user))
    return user
3. CDN for Static Assets
Use CDNs (CloudFlare, AWS CloudFront, Fastly) for:
- Images, CSS, JavaScript
- API responses for public data

Serving these from the edge can reduce origin server load by 60-80%.
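For a CDN to cache public API responses, the origin has to send cache headers. A minimal Flask sketch (the endpoint and the 5-minute TTL are illustrative, not from the original setup):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/public/products')
def public_products():
    response = jsonify({'products': []})  # Public, non-personalized data only
    # Allow browsers and CDN edges to cache this response for 5 minutes
    response.headers['Cache-Control'] = 'public, max-age=300'
    return response

Never set public cache headers on personalized or authenticated responses, or the CDN may serve one user's data to another.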
Database Optimization
The Database is Your Bottleneck
At high scale, the large majority of performance problems trace back to the database.
1. Connection Pooling
Problem: Creating database connections is expensive (50-100ms each).
Solution: Reuse connections from a pool.
Flask Example:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://user:pass@localhost/db'

# Connection pooling configuration
# (Flask-SQLAlchemy 3.x takes pool settings via SQLALCHEMY_ENGINE_OPTIONS)
app.config['SQLALCHEMY_ENGINE_OPTIONS'] = {
    'pool_size': 20,       # Max persistent connections
    'pool_timeout': 10,    # Seconds to wait for a free connection
    'pool_recycle': 3600,  # Recycle connections every hour
    'max_overflow': 10,    # Extra connections when the pool is full
}

db = SQLAlchemy(app)

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(100))
    email = db.Column(db.String(100))
Laravel Example:
<?php
// config/database.php
'mysql' => [
    'driver' => 'mysql',
    'host' => env('DB_HOST', '127.0.0.1'),
    'database' => env('DB_DATABASE', 'forge'),
    'username' => env('DB_USERNAME', 'forge'),
    'password' => env('DB_PASSWORD', ''),

    // Stock Laravel has no connection-pool option; under PHP-FPM the
    // closest built-in is persistent PDO connections, reused per worker.
    // True pooling needs Octane/Swoole or an external pooler (e.g. ProxySQL).
    'options' => [
        PDO::ATTR_PERSISTENT => true, // Persistent connections
        PDO::ATTR_TIMEOUT => 5,       // Connection timeout (seconds)
    ],
],
2. Database Indexing
Without an index: the database scans every row (slow).
With an index: the database uses a B-tree lookup (fast).
-- Check query performance
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';
-- Add index
CREATE INDEX idx_users_email ON users(email);
-- Composite index for multiple columns
CREATE INDEX idx_users_status_created ON users(status, created_at);
Performance Impact:
- Query without index: 2000ms (scanning 10M rows)
- Query with index: 5ms (B-tree lookup)
3. Read Replicas
Concept: Split reads and writes across different database servers.
Master DB (writes only) → replicates to → Replica 1, 2, 3 (reads only)
Flask Example:
from flask import Flask, request, jsonify
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://master/db'  # Writes
app.config['SQLALCHEMY_BINDS'] = {
    'read_replica': 'postgresql://replica/db'  # Reads
}
db = SQLAlchemy(app)

# Write operation
@app.route('/api/user', methods=['POST'])
def create_user():
    user = User(name=request.json['name'])
    db.session.add(user)
    db.session.commit()  # Goes to master
    return jsonify({'id': user.id})

# Read operation
@app.route('/api/users')
def get_users():
    # Route the query to the replica engine explicitly
    # (Flask-SQLAlchemy's Query has no with_bind(); pass the bind instead)
    users = db.session.execute(
        db.select(User),
        bind_arguments={'bind': db.engines['read_replica']}
    ).scalars().all()
    return jsonify([u.to_dict() for u in users])
Laravel Example:
<?php
// config/database.php
'mysql' => [
    'read' => [
        'host' => [
            '192.168.1.2', // Replica 1
            '192.168.1.3', // Replica 2
        ],
    ],
    'write' => [
        'host' => [
            '192.168.1.1', // Master
        ],
    ],
    'driver' => 'mysql',
    'database' => 'myapp',
],
4. Database Sharding
Concept: Split data across multiple databases.
Range-based sharding:
Users 1-100K → DB Shard 1
Users 100K-200K → DB Shard 2
Users 200K-300K → DB Shard 3

The logic below uses hash-based (modulo) sharding instead, which spreads users evenly across shards:
Simple Sharding Logic:
def get_shard_for_user(user_id):
    shard_count = 4
    shard_id = user_id % shard_count
    return f'shard_{shard_id}'

def get_user_from_shard(user_id):
    shard = get_shard_for_user(user_id)
    # Connect to specific shard
    connection = get_db_connection(shard)
    user = connection.execute(
        'SELECT * FROM users WHERE id = ?',
        (user_id,)
    ).fetchone()
    return user
Asynchronous Processing
Why Async Matters
Synchronous (Bad):
Request → Process (200ms) → Wait for Email (500ms) → Response   (Total: 700ms)

Asynchronous (Good):
Request → Process (200ms) → Queue Email Task → Response   (Total: 200ms)
                                   ↓
                      Background worker sends email
Task Queues (Celery for Flask, Queues for Laravel)
Flask + Celery Example:
# celery_app.py
from celery import Celery

celery_app = Celery(
    'tasks',
    broker='redis://localhost:6379/0',
    backend='redis://localhost:6379/1'
)

@celery_app.task
def send_welcome_email(user_email):
    """This runs in a background worker"""
    import time
    time.sleep(2)  # Simulate email sending
    print(f"Email sent to {user_email}")
    return True

# app.py
from flask import Flask, request, jsonify
from celery_app import send_welcome_email

app = Flask(__name__)

@app.route('/api/register', methods=['POST'])
def register():
    email = request.json['email']

    # Save user to database (fast)
    user = create_user(email)

    # Queue email task (don't wait)
    send_welcome_email.delay(email)

    # Return immediately
    return jsonify({
        'message': 'Registration successful',
        'user_id': user.id
    }), 201
Start Celery Worker:
celery -A celery_app worker --loglevel=info
Laravel Queue Example:
<?php
// app/Jobs/SendWelcomeEmail.php
namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;

class SendWelcomeEmail implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    protected $email;

    public function __construct($email)
    {
        $this->email = $email;
    }

    public function handle()
    {
        // Send email (runs in background)
        sleep(2);
        Log::info("Email sent to {$this->email}");
    }
}

// routes/api.php
use App\Jobs\SendWelcomeEmail;
use App\Models\User;
use Illuminate\Http\Request;

Route::post('/register', function (Request $request) {
    $email = $request->input('email');

    // Save user (fast)
    $user = User::create(['email' => $email]);

    // Dispatch job to queue (don't wait)
    SendWelcomeEmail::dispatch($email);

    // Return immediately
    return response()->json([
        'message' => 'Registration successful',
        'user_id' => $user->id
    ], 201);
});
Start Laravel Worker:
php artisan queue:work redis
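In production the worker is normally kept alive by a process manager rather than run by hand. A minimal Supervisor program sketch (the artisan path, log path, and worker count are illustrative):

; /etc/supervisor/conf.d/laravel-worker.conf
[program:laravel-worker]
command=php /var/www/artisan queue:work redis --sleep=3 --tries=3
numprocs=8                  ; Run 8 worker processes
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
stdout_logfile=/var/log/laravel-worker.log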
Rate Limiting
Why Rate Limit?
Protect your API from:
- Abuse and DDoS attacks
- Single user consuming all resources
- Scraping and unauthorized access
Implementation Strategies
1. Fixed Window
- Allow N requests per time window
- Simple but allows bursts
2. Sliding Window
- Smoother rate limiting
- More complex to implement
3. Token Bucket
- Allows burst traffic
- Industry standard
Flask Rate Limiting Example
from flask import Flask, jsonify, request
from functools import wraps
from redis import Redis

app = Flask(__name__)
redis_client = Redis(host='localhost', port=6379, decode_responses=True)

def rate_limit(max_requests=100, window_seconds=60):
    """Fixed-window rate limiter (counts requests per time window)"""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            # Use API key or IP as identifier
            identifier = request.headers.get('X-API-Key') or request.remote_addr
            key = f'rate_limit:{identifier}'

            # Get current request count
            current = redis_client.get(key)
            if current is None:
                # First request in window
                redis_client.setex(key, window_seconds, 1)
                return f(*args, **kwargs)

            if int(current) >= max_requests:
                # Rate limit exceeded
                return jsonify({
                    'error': 'Rate limit exceeded',
                    'retry_after': redis_client.ttl(key)
                }), 429

            # Increment counter
            redis_client.incr(key)
            return f(*args, **kwargs)
        return wrapper
    return decorator

@app.route('/api/data')
@rate_limit(max_requests=100, window_seconds=60)
def get_data():
    return jsonify({'data': 'Your data here'})
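The example above is a fixed window; the token bucket algorithm described earlier is not shown. A minimal single-process sketch (the rate and capacity are illustrative; a distributed version would move this logic into a Redis Lua script):

import time

class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; one token per request."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=100, capacity=200)  # 100 req/s steady, bursts up to 200

if bucket.allow():
    pass  # handle the request
else:
    pass  # return 429

Because the bucket holds up to `capacity` tokens, short bursts above the steady rate are allowed, which is why this is the common choice for public APIs.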
Laravel Rate Limiting Example
<?php
// routes/api.php
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;
use Illuminate\Support\Facades\Route;

// Define rate limiter (normally placed in a service provider's boot method)
RateLimiter::for('api', function (Request $request) {
    return Limit::perMinute(100)
        ->by($request->header('X-API-Key') ?: $request->ip())
        ->response(function () {
            return response()->json([
                'error' => 'Rate limit exceeded'
            ], 429);
        });
});

// Apply to routes
Route::middleware('throttle:api')->group(function () {
    Route::get('/data', function () {
        return response()->json(['data' => 'Your data here']);
    });
});

// Custom rate limit for specific endpoint
Route::get('/heavy-operation', function () {
    return response()->json(['result' => 'Done']);
})->middleware('throttle:10,1'); // 10 requests per minute
Horizontal Scaling
Vertical vs Horizontal Scaling
Vertical (Scaling Up):
- Add more CPU/RAM to one server
- Limited by hardware
- Expensive
- Single point of failure
Horizontal (Scaling Out):
- Add more servers
- Near-unlimited scaling
- Cost-effective
- High availability
Auto-Scaling Strategy
# Pseudo-code for auto-scaling logic
if cpu_usage > 70% for 5 minutes:
    add_server()

if cpu_usage < 30% for 15 minutes:
    remove_server()
Stateless Application Design
Bad (Stateful):
# Storing session in memory
user_sessions = {}

@app.route('/login', methods=['POST'])
def login():
    user_sessions[user_id] = {'logged_in': True}  # Problem: lost if server restarts
Good (Stateless):
# Store session in Redis (shared across all servers)
@app.route('/login', methods=['POST'])
def login():
    session_token = generate_token()
    redis_client.setex(
        f'session:{session_token}',
        3600,
        json.dumps({'user_id': user_id})
    )
    return jsonify({'token': session_token})
Docker + Kubernetes for Scaling
Dockerfile (Flask):
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:5000", "app:app"]
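Assuming the image is tagged your-api (a placeholder, as in the deployment below), you can build and smoke-test it locally before pushing:

# Build the image and run it with the container port mapped to the host
docker build -t your-api:latest .
docker run -p 5000:5000 your-api:latest
curl http://localhost:5000/health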
Kubernetes Deployment:
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
spec:
  replicas: 10  # Start with 10 pods
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: your-api:latest
          ports:
            - containerPort: 5000
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi
---
# Auto-scaling configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment
  minReplicas: 10
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
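Applying the manifests and watching the autoscaler react uses standard kubectl commands:

kubectl apply -f deployment.yaml
kubectl get hpa api-hpa --watch   # Observe replica count react to CPU load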
Monitoring and Observability
The Three Pillars
- Logs: What happened?
- Metrics: How much/how fast?
- Traces: Where did time go?
1. Logging
Flask Structured Logging:
import logging
import json
from datetime import datetime

class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_data = {
            'timestamp': datetime.utcnow().isoformat(),
            'level': record.levelname,
            'message': record.getMessage(),
            'path': record.pathname,
            'line': record.lineno
        }
        return json.dumps(log_data)

# Configure logger
logger = logging.getLogger(__name__)
handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

@app.route('/api/user/<int:user_id>')
def get_user(user_id):
    logger.info(f'Fetching user {user_id}')
    # ... your code
    logger.info(f'User {user_id} fetched successfully')
Laravel Logging:
<?php
use Illuminate\Support\Facades\Log;

Route::get('/user/{id}', function ($id) {
    Log::info('Fetching user', ['user_id' => $id]);

    $user = User::find($id);

    Log::info('User fetched successfully', [
        'user_id' => $id,
        'execution_time_ms' => 50
    ]);

    return response()->json($user);
});
2. Metrics Collection
Flask with Prometheus:
from flask import Flask, jsonify
from prometheus_flask_exporter import PrometheusMetrics

app = Flask(__name__)
metrics = PrometheusMetrics(app)

# Automatic metrics:
# - Request count
# - Request duration
# - Request errors

# Custom metrics
from prometheus_client import Counter, Histogram

user_registrations = Counter(
    'user_registrations_total',
    'Total user registrations'
)

db_query_duration = Histogram(
    'db_query_duration_seconds',
    'Database query duration'
)

@app.route('/api/register', methods=['POST'])
def register():
    # ... registration logic
    user_registrations.inc()
    return jsonify({'status': 'success'})

# Metrics endpoint: /metrics
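On the Prometheus side, a minimal scrape config pointing at that /metrics endpoint could look like this (the job name and target hostnames are illustrative):

# prometheus.yml
scrape_configs:
  - job_name: 'api'
    scrape_interval: 15s
    static_configs:
      - targets: ['api1:5000', 'api2:5000', 'api3:5000']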
3. Key Metrics to Track
# Request metrics
- Requests per second (RPS)
- Average response time
- 95th percentile response time (p95)
- 99th percentile response time (p99)
- Error rate
# System metrics
- CPU usage
- Memory usage
- Disk I/O
- Network throughput
# Application metrics
- Active database connections
- Cache hit rate
- Queue length
- Worker processing time
4. Alerting Rules
# Example Prometheus alerting rules
groups:
  - name: api_alerts
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 5m
        annotations:
          summary: "High error rate detected"

      - alert: HighLatency
        expr: http_request_duration_seconds{quantile="0.95"} > 1
        for: 5m
        annotations:
          summary: "95th percentile latency > 1s"

      - alert: HighCPU
        expr: cpu_usage_percent > 80
        for: 10m
        annotations:
          summary: "CPU usage above 80%"
Real-World Implementation
Complete Production-Ready Flask API
# app.py
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy
from functools import wraps
from redis import Redis
from prometheus_flask_exporter import PrometheusMetrics
import json
import logging
# Initialize Flask app
app = Flask(__name__)
# Database configuration
app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://user:pass@db-master/mydb'
app.config['SQLALCHEMY_BINDS'] = {
    'read_replica': 'postgresql://user:pass@db-replica/mydb'
}
app.config['SQLALCHEMY_ENGINE_OPTIONS'] = {
    'pool_size': 20,
    'max_overflow': 10,
    'pool_recycle': 3600,
}
db = SQLAlchemy(app)
redis_client = Redis(host='redis', port=6379, decode_responses=True)
metrics = PrometheusMetrics(app)
# Logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Models
class User(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(100))
    email = db.Column(db.String(100), unique=True, index=True)

    def to_dict(self):
        return {
            'id': self.id,
            'name': self.name,
            'email': self.email
        }
# Rate limiting decorator
def rate_limit(max_requests=100, window=60):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            key = f'rate_limit:{request.remote_addr}'
            current = redis_client.get(key)
            if current and int(current) >= max_requests:
                return jsonify({'error': 'Rate limit exceeded'}), 429
            # Only set the TTL when the key is created; re-setting it on
            # every request would slide the window forward and never reset
            count = redis_client.incr(key)
            if count == 1:
                redis_client.expire(key, window)
            return f(*args, **kwargs)
        return wrapper
    return decorator
# Caching decorator
def cached(ttl=3600):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            # Flask passes URL parameters as keyword arguments, so the
            # cache key must include kwargs, not just positional args
            parts = [str(a) for a in args] + \
                    [str(v) for _, v in sorted(kwargs.items())]
            cache_key = f'cache:{f.__name__}:{":".join(parts)}'

            # Try cache
            cached_result = redis_client.get(cache_key)
            if cached_result:
                logger.info(f'Cache HIT for {cache_key}')
                return jsonify(json.loads(cached_result))

            # Execute function
            logger.info(f'Cache MISS for {cache_key}')
            result = f(*args, **kwargs)

            # Store successful responses in cache (error responses come
            # back as (response, status) tuples and are skipped)
            if result and not isinstance(result, tuple):
                redis_client.setex(cache_key, ttl, json.dumps(result.get_json()))
            return result
        return wrapper
    return decorator
# Health check endpoint
@app.route('/health')
def health():
    return jsonify({'status': 'healthy'}), 200
# API endpoints
@app.route('/api/users', methods=['GET'])
@rate_limit(max_requests=1000, window=60)
def get_users():
    """Get all users with pagination"""
    page = request.args.get('page', 1, type=int)
    per_page = request.args.get('per_page', 20, type=int)

    # Read from the replica (Query has no with_bind(); route the
    # statements to the replica engine explicitly)
    replica = db.engines['read_replica']
    total = db.session.execute(
        db.select(db.func.count(User.id)),
        bind_arguments={'bind': replica}
    ).scalar()
    users = db.session.execute(
        db.select(User).offset((page - 1) * per_page).limit(per_page),
        bind_arguments={'bind': replica}
    ).scalars().all()

    return jsonify({
        'users': [u.to_dict() for u in users],
        'total': total,
        'page': page,
        'per_page': per_page
    })

@app.route('/api/user/<int:user_id>', methods=['GET'])
@rate_limit(max_requests=1000, window=60)
@cached(ttl=3600)
def get_user(user_id):
    """Get single user with caching"""
    user = db.session.execute(
        db.select(User).where(User.id == user_id),
        bind_arguments={'bind': db.engines['read_replica']}
    ).scalar_one_or_none()
    if not user:
        return jsonify({'error': 'User not found'}), 404
    return jsonify(user.to_dict())
@app.route('/api/users', methods=['POST'])
@rate_limit(max_requests=100, window=60)
def create_user():
    """Create new user"""
    data = request.get_json()

    # Validation
    if not data.get('email') or not data.get('name'):
        return jsonify({'error': 'Missing required fields'}), 400

    # Create user (writes go to master)
    user = User(name=data['name'], email=data['email'])
    try:
        db.session.add(user)
        db.session.commit()
        logger.info(f'User created: {user.id}')
        return jsonify(user.to_dict()), 201
    except Exception as e:
        db.session.rollback()
        logger.error(f'Error creating user: {str(e)}')
        return jsonify({'error': 'Failed to create user'}), 500
@app.route('/api/user/<int:user_id>', methods=['DELETE'])
@rate_limit(max_requests=50, window=60)
def delete_user(user_id):
    """Delete user and invalidate cache"""
    user = db.session.get(User, user_id)
    if not user:
        return jsonify({'error': 'User not found'}), 404
    try:
        db.session.delete(user)
        db.session.commit()
        # Invalidate cache (key format matches the @cached decorator)
        redis_client.delete(f'cache:get_user:{user_id}')
        logger.info(f'User deleted: {user_id}')
        return jsonify({'message': 'User deleted'}), 200
    except Exception as e:
        db.session.rollback()
        logger.error(f'Error deleting user: {str(e)}')
        return jsonify({'error': 'Failed to delete user'}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
requirements.txt:
Flask==3.0.0
Flask-SQLAlchemy==3.1.1
psycopg2-binary==2.9.9
redis==5.0.1
gunicorn==21.2.0
prometheus-flask-exporter==0.23.0
Running the Flask API:
# Development
python app.py
# Production with Gunicorn (4 workers)
gunicorn -w 4 -b 0.0.0.0:5000 app:app
# Production with more workers and threads
gunicorn -w 8 --threads 2 -b 0.0.0.0:5000 --timeout 30 --keep-alive 5 app:app
Complete Production-Ready Laravel API
<?php
// routes/api.php
use Illuminate\Support\Facades\Route;
use Illuminate\Support\Facades\Redis;
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Log;
use App\Models\User;
use Illuminate\Http\Request;
// Health check
Route::get('/health', function () {
    return response()->json(['status' => 'healthy']);
});

// Rate limiting middleware group
Route::middleware(['throttle:api'])->group(function () {

    // Get all users with pagination
    Route::get('/users', function (Request $request) {
        $page = $request->query('page', 1);
        $perPage = $request->query('per_page', 20);

        // Use read replica
        $users = User::on('read_replica')
            ->paginate($perPage, ['*'], 'page', $page);

        return response()->json([
            'users' => $users->items(),
            'total' => $users->total(),
            'page' => $users->currentPage(),
            'per_page' => $users->perPage()
        ]);
    });

    // Get single user with caching
    Route::get('/user/{id}', function ($id) {
        $cacheKey = "user:{$id}";

        // Cache::remember returns the cached value on a hit and only
        // runs the closure (a replica query) on a miss
        $user = Cache::remember($cacheKey, 3600, function () use ($id) {
            Log::info("Cache MISS for user {$id}");
            return User::on('read_replica')->find($id);
        });

        if (!$user) {
            return response()->json(['error' => 'User not found'], 404);
        }

        return response()->json($user);
    });

    // Create user
    Route::post('/users', function (Request $request) {
        $validated = $request->validate([
            'name' => 'required|string|max:100',
            'email' => 'required|email|unique:users,email'
        ]);

        try {
            $user = User::create($validated);
            Log::info("User created: {$user->id}");
            return response()->json($user, 201);
        } catch (\Exception $e) {
            Log::error("Error creating user: {$e->getMessage()}");
            return response()->json(['error' => 'Failed to create user'], 500);
        }
    })->middleware('throttle:100,1'); // Stricter rate limit

    // Delete user
    Route::delete('/user/{id}', function ($id) {
        $user = User::find($id);
        if (!$user) {
            return response()->json(['error' => 'User not found'], 404);
        }

        try {
            $user->delete();
            // Invalidate cache
            Cache::forget("user:{$id}");
            Log::info("User deleted: {$id}");
            return response()->json(['message' => 'User deleted'], 200);
        } catch (\Exception $e) {
            Log::error("Error deleting user: {$e->getMessage()}");
            return response()->json(['error' => 'Failed to delete user'], 500);
        }
    })->middleware('throttle:50,1');
});
User Model:
<?php
// app/Models/User.php
namespace App\Models;

use Illuminate\Database\Eloquent\Model;

class User extends Model
{
    protected $fillable = ['name', 'email'];
    protected $hidden = ['password'];

    // Specify which connection to use by default
    protected $connection = 'mysql';
}
Database Configuration:
<?php
// config/database.php
return [
    'default' => env('DB_CONNECTION', 'mysql'),

    'connections' => [
        'mysql' => [
            'driver' => 'mysql',
            'read' => [
                'host' => [
                    env('DB_READ_HOST_1', '192.168.1.2'),
                    env('DB_READ_HOST_2', '192.168.1.3'),
                ],
            ],
            'write' => [
                'host' => [
                    env('DB_WRITE_HOST', '192.168.1.1'),
                ],
            ],
            'sticky' => true, // Reads after a write use the master
            'database' => env('DB_DATABASE', 'myapp'),
            'username' => env('DB_USERNAME', 'root'),
            'password' => env('DB_PASSWORD', ''),
            'charset' => 'utf8mb4',
            'collation' => 'utf8mb4_unicode_ci',
            'prefix' => '',
            // Persistent connections (reused per PHP-FPM worker)
            'options' => [
                \PDO::ATTR_PERSISTENT => true,
                \PDO::ATTR_TIMEOUT => 5,
            ],
        ],

        'read_replica' => [
            'driver' => 'mysql',
            'host' => env('DB_READ_HOST_1', '192.168.1.2'),
            'database' => env('DB_DATABASE', 'myapp'),
            'username' => env('DB_USERNAME', 'root'),
            'password' => env('DB_PASSWORD', ''),
        ],
    ],

    'redis' => [
        'client' => env('REDIS_CLIENT', 'phpredis'),
        'default' => [
            'host' => env('REDIS_HOST', '127.0.0.1'),
            'password' => env('REDIS_PASSWORD', null),
            'port' => env('REDIS_PORT', 6379),
            'database' => 0,
        ],
        'cache' => [
            'host' => env('REDIS_HOST', '127.0.0.1'),
            'password' => env('REDIS_PASSWORD', null),
            'port' => env('REDIS_PORT', 6379),
            'database' => 1,
        ],
    ],
];
Running Laravel API:
# Install dependencies
composer install
# Run migrations
php artisan migrate
# Production server with PHP-FPM + NGINX
# Install: apt-get install php-fpm nginx
# Or use Laravel Octane for better performance
composer require laravel/octane
php artisan octane:install --server=swoole
# Run with Octane (keeps the framework booted between requests; a large throughput gain)
php artisan octane:start --workers=8 --max-requests=1000
Performance Optimization Checklist
Application Level
- Enable gzip compression (see the NGINX snippet after this list)
- Minimize response payload size
- Use pagination for large datasets
- Implement connection pooling
- Use async processing for heavy tasks
- Add appropriate indexes to database
- Avoid N+1 queries
- Use eager loading for relationships
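For the gzip item above, a minimal NGINX snippet (the thresholds are illustrative; JSON responses compress particularly well):

# Inside the http {} block of nginx.conf
gzip on;
gzip_types application/json text/css application/javascript;
gzip_min_length 1024;   # Skip tiny payloads where compression adds overhead
gzip_comp_level 5;      # Balance CPU cost against compression ratio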
Infrastructure Level
- Deploy behind load balancer
- Use CDN for static assets
- Implement multi-layer caching
- Set up database read replicas
- Use container orchestration (Kubernetes)
- Enable auto-scaling
- Deploy in multiple regions
- Use managed services (RDS, ElastiCache)
Monitoring Level
- Track key metrics (RPS, latency, errors)
- Set up alerting for anomalies
- Implement distributed tracing
- Log all critical events
- Monitor database performance
- Track cache hit rates
- Monitor server resources
Architecture Diagram
Internet
   │
   ▼
CDN (CloudFlare)
   │
   ▼
Load Balancer (ELB / nginx / HAProxy)
   │
   ├──► API Server 1
   ├──► API Server 2
   ├──► ...
   └──► API Server 100
          │
          ├──► Redis Cache
          ├──► Redis Sessions
          └──► Redis Queue ──► Celery Workers
                                    │
                                    ▼
                    DB Master ──► DB Replica 1, DB Replica 2
Deployment Example (Docker Compose)
# docker-compose.yml
version: '3.8'

services:
  # Load Balancer
  nginx:
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - api1
      - api2
      - api3

  # API Servers
  api1:
    build: .
    environment:
      - DB_HOST=postgres-master
      - REDIS_HOST=redis
    depends_on:
      - postgres-master
      - redis

  api2:
    build: .
    environment:
      - DB_HOST=postgres-master
      - REDIS_HOST=redis
    depends_on:
      - postgres-master
      - redis

  api3:
    build: .
    environment:
      - DB_HOST=postgres-master
      - REDIS_HOST=redis
    depends_on:
      - postgres-master
      - redis

  # Redis Cache
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    command: redis-server --maxmemory 2gb --maxmemory-policy allkeys-lru

  # PostgreSQL Master
  postgres-master:
    image: postgres:16
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: secret
    volumes:
      - postgres-data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  # PostgreSQL Replica
  # (Note: streaming replication is not automatic; the replica still needs
  # standby/primary_conninfo configuration to actually follow the master)
  postgres-replica:
    image: postgres:16
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: secret
    ports:
      - "5433:5432"

  # Celery Worker (for Flask)
  celery-worker:
    build: .
    command: celery -A celery_app worker --loglevel=info --concurrency=4
    environment:
      - REDIS_HOST=redis
    depends_on:
      - redis

  # Monitoring - Prometheus
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  # Monitoring - Grafana
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

volumes:
  postgres-data:
Load Testing
Using Apache Bench (ab)
# Install
apt-get install apache2-utils
# Test 1: Simple load test
ab -n 10000 -c 100 http://localhost/api/users
# -n: Total requests
# -c: Concurrent requests
# Test 2: POST request
ab -n 1000 -c 50 -p data.json -T application/json http://localhost/api/users
# Test 3: With authentication
ab -n 5000 -c 100 -H "Authorization: Bearer token123" http://localhost/api/user/1
Using wrk (More Advanced)
# Install
apt-get install wrk
# Test 1: Basic load test
wrk -t12 -c400 -d30s http://localhost/api/users
# -t: Threads
# -c: Connections
# -d: Duration
# Test 2: With script
wrk -t12 -c400 -d30s -s script.lua http://localhost/api/users
# script.lua
wrk.method = "POST"
wrk.body = '{"name":"John","email":"john@example.com"}'
wrk.headers["Content-Type"] = "application/json"
Using Locust (Python)
# locustfile.py
from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(1, 3)  # Wait 1-3 seconds between requests

    @task(3)  # Weight: 3x more likely
    def get_users(self):
        self.client.get("/api/users")

    @task(1)
    def get_user(self):
        self.client.get("/api/user/1")

    @task(1)
    def create_user(self):
        self.client.post("/api/users", json={
            "name": "Test User",
            "email": "test@example.com"
        })

# Run locust
# locust -f locustfile.py --host=http://localhost
# Open browser: http://localhost:8089
# Set users: 10000, Spawn rate: 100/s
Reading Load Test Results
# Example output from wrk:
Running 30s test @ http://localhost/api/users
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    45.23ms   12.45ms  200.00ms   75.23%
    Req/Sec     8.85k    890.00    11.20k    68.12%
  3180000 requests in 30.00s, 1.20GB read
Requests/sec: 106000.00   ← Your RPS!
Transfer/sec:     41.00MB
Key Metrics:
- Requests/sec: 106,000 (Good! Target: 1M needs ~10 such servers)
- Avg Latency: 45.23ms (Good! Under 100ms)
- Max Latency: 200ms (Acceptable)
- Errors: 0 (Perfect!)
Database Schema Best Practices
Optimal Schema Design
-- Users table with indexes
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,  -- UNIQUE already creates an index
    name VARCHAR(100) NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'active',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Index for sorting/filtering by creation time
CREATE INDEX idx_users_created ON users(created_at);

-- Composite partial index for filtered queries (PostgreSQL)
CREATE INDEX idx_users_status_created ON users(status, created_at)
    WHERE status = 'active';

-- Partial index on email for active users only (PostgreSQL)
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
Avoiding N+1 Queries
Bad (N+1 Query Problem):
# Flask/SQLAlchemy
@app.route('/api/posts')
def get_posts():
    posts = Post.query.all()
    return jsonify([{
        'id': p.id,
        'title': p.title,
        'author': p.author.name  # Triggers a separate query for EACH post!
    } for p in posts])

# Generates:
# SELECT * FROM posts;              (1 query)
# SELECT * FROM users WHERE id=1;   (then 1 query per post...)
# SELECT * FROM users WHERE id=2;
# ... (N more queries)
Good (Eager Loading):
# Flask/SQLAlchemy
from sqlalchemy.orm import joinedload

@app.route('/api/posts')
def get_posts():
    posts = Post.query.options(joinedload(Post.author)).all()
    return jsonify([{
        'id': p.id,
        'title': p.title,
        'author': p.author.name  # No additional query!
    } for p in posts])

# Generates only:
# SELECT * FROM posts JOIN users ON posts.author_id = users.id;
Laravel Example:
// Bad
$posts = Post::all();
foreach ($posts as $post) {
    echo $post->author->name; // N+1 problem
}

// Good
$posts = Post::with('author')->get(); // Eager loading
foreach ($posts as $post) {
    echo $post->author->name; // No extra queries
}
Security Best Practices
1. API Authentication
JWT (JSON Web Tokens):
# Flask with JWT
import os
from flask_jwt_extended import JWTManager, create_access_token, jwt_required

# Load the secret from the environment; never hardcode it
app.config['JWT_SECRET_KEY'] = os.environ['JWT_SECRET_KEY']
jwt = JWTManager(app)

@app.route('/api/login', methods=['POST'])
def login():
    email = request.json.get('email')
    password = request.json.get('password')

    # Validate credentials
    user = authenticate_user(email, password)
    if not user:
        return jsonify({'error': 'Invalid credentials'}), 401

    # Create token
    access_token = create_access_token(identity=user.id)
    return jsonify({'token': access_token})

@app.route('/api/protected')
@jwt_required()
def protected():
    return jsonify({'message': 'Access granted'})
2. Input Validation
# Flask with marshmallow
from marshmallow import Schema, fields, validate, ValidationError

class UserSchema(Schema):
    name = fields.Str(required=True, validate=validate.Length(min=1, max=100))
    email = fields.Email(required=True)
    age = fields.Int(validate=validate.Range(min=0, max=150))

@app.route('/api/users', methods=['POST'])
def create_user():
    schema = UserSchema()
    try:
        data = schema.load(request.json)
    except ValidationError as err:
        return jsonify({'errors': err.messages}), 400

    # Data is validated and safe to use
    user = User(**data)
    db.session.add(user)
    db.session.commit()
    return jsonify(user.to_dict()), 201
3. SQL Injection Prevention
# Bad - Vulnerable to SQL injection
query = f"SELECT * FROM users WHERE email = '{user_input}'"
# Good - Use parameterized queries
query = "SELECT * FROM users WHERE email = %s"
result = db.execute(query, (user_input,))
# ORM (automatically safe)
user = User.query.filter_by(email=user_input).first()
4. CORS Configuration
# Flask-CORS
from flask_cors import CORS

app = Flask(__name__)
CORS(app, resources={
    r"/api/*": {
        "origins": ["https://myapp.com"],
        "methods": ["GET", "POST", "PUT", "DELETE"],
        "allow_headers": ["Content-Type", "Authorization"],
        "max_age": 3600
    }
})
Cost Optimization
Cloud Provider Comparison (Monthly)
Scenario: 1M RPS, 99.99% uptime
AWS:
- 100 EC2 instances (t3.medium): $3,600
- Application Load Balancer: $30
- RDS (Multi-AZ): $400
- ElastiCache (Redis): $200
- CloudFront (CDN): $150
- Data transfer: $500
Total: ~$4,880/month
Google Cloud:
- 100 GCE instances: $3,200
- Cloud Load Balancing: $25
- Cloud SQL: $350
- Memorystore (Redis): $180
- Cloud CDN: $120
- Data transfer: $450
Total: ~$4,325/month
Azure:
- 100 VMs: $3,400
- Load Balancer: $35
- Azure Database: $380
- Azure Cache: $190
- Azure CDN: $140
- Data transfer: $480
Total: ~$4,625/month
Cost Saving Tips
- Use spot/preemptible instances (60-90% discount)
- Reserved instances (30-50% discount for 1-3 year commitment)
- Auto-scaling: Scale down during low traffic
- Aggressive caching: Reduce database/compute load
- Optimize data transfer: Use CDN, compress responses
- Use managed services: RDS cheaper than self-managed
- Multi-region carefully: Only if needed for latency
Troubleshooting Common Issues
Issue 1: High Latency
Symptoms: Response time > 1 second
Debug Steps:
import time
from flask import g, request

@app.before_request
def start_timer():
    g.start = time.time()

@app.after_request
def log_request(response):
    duration = time.time() - g.start
    if duration > 0.5:  # Log slow requests
        logger.warning(f'Slow request: {request.path} took {duration:.2f}s')
    return response
Common Causes:
- Slow database queries → Add indexes
- No caching → Implement Redis
- N+1 queries → Use eager loading
- Blocking I/O → Use async
Issue 2: High Error Rate
Symptoms: 500 errors, connection timeouts
Debug:
@app.errorhandler(500)
def internal_error(error):
    logger.error(f'Internal error: {error}', exc_info=True)
    db.session.rollback()
    return jsonify({'error': 'Internal server error'}), 500

@app.errorhandler(Exception)
def handle_exception(e):
    logger.error(f'Unhandled exception: {e}', exc_info=True)
    # Log the details, but avoid echoing raw exception text to clients
    return jsonify({'error': 'Internal server error'}), 500
Common Causes:
- Database connection pool exhausted
- Memory leaks
- Unhandled exceptions
- External API failures
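For the pool-exhaustion cause above, SQLAlchemy can report pool state directly; a minimal sketch (exposing this only on a debug endpoint is assumed):

@app.route('/api/debug/db-pool')
def db_pool_status():
    # Pool.status() returns a summary string, e.g.
    # "Pool size: 20  Connections in pool: 3  Current Overflow: 0 ..."
    return jsonify({'pool': db.engine.pool.status()})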
Issue 3: Memory Leaks
Debug:
import tracemalloc

tracemalloc.start()

@app.route('/api/memory-snapshot')
def memory_snapshot():
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')
    return jsonify([{
        'file': str(stat.traceback),
        'size_mb': stat.size / 1024 / 1024
    } for stat in top_stats[:10]])
Conclusion
Building an API that handles 1 million requests per second requires:
- Distributed Architecture: No single server can handle this load
- Caching Everything: Reduces database load by 80-95%
- Database Optimization: Read replicas, connection pooling, indexes
- Async Processing: Don’t make users wait for background tasks
- Monitoring: You can’t improve what you don’t measure
- Load Balancing: Distribute traffic evenly
- Auto-Scaling: Add servers during traffic spikes
- Rate Limiting: Protect against abuse
Key Takeaways
- Start simple, scale gradually
- Measure before optimizing
- Cache aggressively
- Database is usually the bottleneck
- Horizontal scaling > Vertical scaling
- Monitor everything
- Test under load before going live
Next Steps
- Implement basic caching (biggest win)
- Set up connection pooling
- Add database indexes
- Implement rate limiting
- Deploy behind load balancer
- Set up monitoring
- Load test regularly
- Scale horizontally
Remember: Premature optimization is the root of all evil. Build for your current scale, but architect for future growth.
Happy scaling!
