Table of Contents
- Understanding the Challenge
- Architecture Fundamentals
- Load Balancing
- Caching Strategies
- Database Optimization
- Asynchronous Processing
- Rate Limiting
- Horizontal Scaling
- Monitoring and Observability
- Real-World Implementation
Understanding the Challenge
What Does 1 Million RPS Mean?
1 million requests per second (RPS) means your API needs to handle:
- 1,000,000 requests every second
- 60 million requests per minute
- 86.4 billion requests per day
To put this in perspective:
- A single server typically handles 1,000-10,000 RPS
- Twitter handles around 6,000 tweets per second at peak
- Netflix serves millions of concurrent streams
Why Single Server Approach Fails
A typical Flask or Laravel application on one server can handle:
- Flask: 500-2,000 RPS (with Gunicorn/uWSGI)
- Laravel: 300-1,500 RPS (with PHP-FPM)
Bottlenecks:
- CPU processing limits
- Memory constraints
- Network bandwidth
- Database connections
- Disk I/O
The solution: Distributed architecture, not bigger servers.
Architecture Fundamentals
The High-Level Architecture
Internet → CDN → Load Balancer → API Servers → Cache → Database
                                      ↓
                             Message Queue → Workers
Key Principles
- Statelessness: Each request is independent
- Horizontal Scaling: Add more servers, not bigger ones
- Caching: Store frequently accessed data
- Async Processing: Don’t make users wait
- Database Optimization: Your database is often the bottleneck
Load Balancing
What is a Load Balancer?
A load balancer distributes incoming requests across multiple servers, preventing any single server from being overwhelmed.
Load Balancing Algorithms
1. Round Robin
- Distributes requests evenly in sequence
- Simple but doesn’t consider server load
2. Least Connections
- Sends requests to server with fewest active connections
- Better for long-running requests
3. IP Hash
- Routes based on client IP
- Ensures same client goes to same server
Implementation with NGINX
# /etc/nginx/nginx.conf
http {
    upstream api_backend {
        least_conn;  # Use least connections algorithm
        server api1.example.com:5000 weight=3;
        server api2.example.com:5000 weight=2;
        server api3.example.com:5000 weight=2;
        server api4.example.com:5000 backup;  # Backup server
        keepalive 32;  # Connection pooling
    }

    server {
        listen 80;

        location /api/ {
            proxy_pass http://api_backend;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

            # Timeouts
            proxy_connect_timeout 5s;
            proxy_send_timeout 10s;
            proxy_read_timeout 10s;
        }
    }
}
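If you need the IP hash behavior described above (session affinity), the same upstream block can swap the algorithm. A minimal variant (note that ip_hash cannot be combined with backup servers):

upstream api_backend {
    ip_hash;  # Requests from the same client IP always reach the same server
    server api1.example.com:5000;
    server api2.example.com:5000;
    server api3.example.com:5000;
}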
Hardware Load Balancers vs Software
Hardware (F5, Citrix):
- Expensive ($10,000+)
- High performance
- Dedicated hardware
Software (NGINX, HAProxy, AWS ALB):
- Cost-effective
- Flexible and programmable
- Easy to scale
Caching Strategies
Why Caching Matters
Caching can reduce database load by 80-95% and response time from 200ms to 5ms.
Caching Layers
Browser Cache → CDN → Application Cache → Database Cache
1. In-Memory Caching (Redis/Memcached)
Flask Example with Redis:
from flask import Flask, jsonify
from redis import Redis
import json
import time

app = Flask(__name__)
redis_client = Redis(host='localhost', port=6379, decode_responses=True)

@app.route('/api/user/<int:user_id>')
def get_user(user_id):
    # Try cache first
    cache_key = f'user:{user_id}'
    cached_user = redis_client.get(cache_key)
    if cached_user:
        print("Cache HIT")
        return jsonify(json.loads(cached_user))

    # Cache MISS - fetch from database
    print("Cache MISS")
    user = fetch_user_from_db(user_id)  # Expensive DB query

    # Store in cache for 1 hour
    redis_client.setex(
        cache_key,
        3600,  # TTL in seconds
        json.dumps(user)
    )
    return jsonify(user)

def fetch_user_from_db(user_id):
    time.sleep(0.1)  # Simulate slow DB query
    return {
        'id': user_id,
        'name': 'John Doe',
        'email': 'john@example.com'
    }
Laravel Example with Redis:
<?php
// routes/api.php
use Illuminate\Support\Facades\Route;
use Illuminate\Support\Facades\Redis;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

Route::get('/user/{id}', function ($id) {
    $cacheKey = "user:{$id}";

    // Try cache first
    $cached = Redis::get($cacheKey);
    if ($cached) {
        Log::info("Cache HIT");
        return response()->json(json_decode($cached));
    }

    // Cache MISS - fetch from database
    Log::info("Cache MISS");
    $user = DB::table('users')->find($id);

    // Store in cache for 1 hour
    Redis::setex($cacheKey, 3600, json_encode($user));

    return response()->json($user);
});
2. Cache Invalidation Strategies
Problem: When do you update/delete cached data?
Strategy 1: Time-based (TTL – Time To Live)
# Cache expires after 1 hour
redis_client.setex('key', 3600, value)
Strategy 2: Event-based
@app.route('/api/user/<int:user_id>', methods=['PUT'])
def update_user(user_id):
    # Update database
    update_user_in_db(user_id, request.json)

    # Invalidate cache
    redis_client.delete(f'user:{user_id}')
    return jsonify({'message': 'Updated'})
Strategy 3: Cache Aside Pattern
def get_user_with_cache_aside(user_id):
    cache_key = f'user:{user_id}'

    # 1. Try cache
    user = redis_client.get(cache_key)
    if user:
        return json.loads(user)

    # 2. Fetch from DB
    user = fetch_user_from_db(user_id)

    # 3. Populate cache
    redis_client.setex(cache_key, 3600, json.dumps(user))
    return user
3. CDN for Static Assets
Use CDNs (CloudFlare, AWS CloudFront, Fastly) for:
- Images, CSS, JavaScript
- API responses for public data

Serving these from the edge can reduce origin server load by 60-80%.
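For a CDN to cache public API responses, the origin has to send cache headers. A minimal Flask sketch (the endpoint and the 5-minute TTL are illustrative, not from the original setup):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/public/products')
def public_products():
    response = jsonify({'products': []})  # Public, non-personalized data only
    # Allow browsers and CDN edges to cache this response for 5 minutes
    response.headers['Cache-Control'] = 'public, max-age=300'
    return response

Never set public cache headers on personalized or authenticated responses, or the CDN may serve one user's data to another.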
Database Optimization
The Database is Your Bottleneck
At high scale, the large majority of performance problems trace back to the database.
1. Connection Pooling
Problem: Creating database connections is expensive (50-100ms each).
Solution: Reuse connections from a pool.
Flask Example:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://user:pass@localhost/db'

# Connection pooling configuration
# (Flask-SQLAlchemy 3.x takes pool settings via SQLALCHEMY_ENGINE_OPTIONS)
app.config['SQLALCHEMY_ENGINE_OPTIONS'] = {
    'pool_size': 20,       # Max persistent connections
    'pool_timeout': 10,    # Seconds to wait for a free connection
    'pool_recycle': 3600,  # Recycle connections every hour
    'max_overflow': 10,    # Extra connections when the pool is full
}

db = SQLAlchemy(app)

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(100))
    email = db.Column(db.String(100))
Laravel Example:
<?php
// config/database.php
'mysql' => [
    'driver' => 'mysql',
    'host' => env('DB_HOST', '127.0.0.1'),
    'database' => env('DB_DATABASE', 'forge'),
    'username' => env('DB_USERNAME', 'forge'),
    'password' => env('DB_PASSWORD', ''),

    // Stock Laravel has no connection-pool option; under PHP-FPM the
    // closest built-in is persistent PDO connections, reused per worker.
    // True pooling needs Octane/Swoole or an external pooler (e.g. ProxySQL).
    'options' => [
        PDO::ATTR_PERSISTENT => true, // Persistent connections
        PDO::ATTR_TIMEOUT => 5,       // Connection timeout (seconds)
    ],
],
2. Database Indexing
Without an index: the database scans every row (slow).
With an index: the database uses a B-tree lookup (fast).
-- Check query performance
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';
-- Add index
CREATE INDEX idx_users_email ON users(email);
-- Composite index for multiple columns
CREATE INDEX idx_users_status_created ON users(status, created_at);
Performance Impact:
- Query without index: 2000ms (scanning 10M rows)
- Query with index: 5ms (B-tree lookup)
3. Read Replicas
Concept: Split reads and writes across different database servers.
Master DB (writes only) → replicates to → Replica 1, 2, 3 (reads only)
Flask Example:
from flask import Flask, request, jsonify
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://master/db'  # Writes
app.config['SQLALCHEMY_BINDS'] = {
    'read_replica': 'postgresql://replica/db'  # Reads
}
db = SQLAlchemy(app)

# Write operation
@app.route('/api/user', methods=['POST'])
def create_user():
    user = User(name=request.json['name'])
    db.session.add(user)
    db.session.commit()  # Goes to master
    return jsonify({'id': user.id})

# Read operation
@app.route('/api/users')
def get_users():
    # Route the query to the replica engine explicitly
    # (Flask-SQLAlchemy's Query has no with_bind(); pass the bind instead)
    users = db.session.execute(
        db.select(User),
        bind_arguments={'bind': db.engines['read_replica']}
    ).scalars().all()
    return jsonify([u.to_dict() for u in users])
Laravel Example:
<?php
// config/database.php
'mysql' => [
    'read' => [
        'host' => [
            '192.168.1.2', // Replica 1
            '192.168.1.3', // Replica 2
        ],
    ],
    'write' => [
        'host' => [
            '192.168.1.1', // Master
        ],
    ],
    'driver' => 'mysql',
    'database' => 'myapp',
],
4. Database Sharding
Concept: Split data across multiple databases.
Range-based sharding:
Users 1-100K → DB Shard 1
Users 100K-200K → DB Shard 2
Users 200K-300K → DB Shard 3

The logic below uses hash-based (modulo) sharding instead, which spreads users evenly across shards:
Simple Sharding Logic:
def get_shard_for_user(user_id):
    shard_count = 4
    shard_id = user_id % shard_count
    return f'shard_{shard_id}'

def get_user_from_shard(user_id):
    shard = get_shard_for_user(user_id)
    # Connect to specific shard
    connection = get_db_connection(shard)
    user = connection.execute(
        'SELECT * FROM users WHERE id = ?',
        (user_id,)
    ).fetchone()
    return user
Asynchronous Processing
Why Async Matters
Synchronous (Bad):
Request → Process (200ms) → Wait for Email (500ms) → Response   (Total: 700ms)

Asynchronous (Good):
Request → Process (200ms) → Queue Email Task → Response   (Total: 200ms)
                                   ↓
                      Background worker sends email
Task Queues (Celery for Flask, Queues for Laravel)
Flask + Celery Example:
# celery_app.py
from celery import Celery

celery_app = Celery(
    'tasks',
    broker='redis://localhost:6379/0',
    backend='redis://localhost:6379/1'
)

@celery_app.task
def send_welcome_email(user_email):
    """This runs in a background worker"""
    import time
    time.sleep(2)  # Simulate email sending
    print(f"Email sent to {user_email}")
    return True

# app.py
from flask import Flask, request, jsonify
from celery_app import send_welcome_email

app = Flask(__name__)

@app.route('/api/register', methods=['POST'])
def register():
    email = request.json['email']

    # Save user to database (fast)
    user = create_user(email)

    # Queue email task (don't wait)
    send_welcome_email.delay(email)

    # Return immediately
    return jsonify({
        'message': 'Registration successful',
        'user_id': user.id
    }), 201
Start Celery Worker:
celery -A celery_app worker --loglevel=info
Laravel Queue Example:
<?php
// app/Jobs/SendWelcomeEmail.php
namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;

class SendWelcomeEmail implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    protected $email;

    public function __construct($email)
    {
        $this->email = $email;
    }

    public function handle()
    {
        // Send email (runs in background)
        sleep(2);
        Log::info("Email sent to {$this->email}");
    }
}

// routes/api.php
use App\Jobs\SendWelcomeEmail;
use App\Models\User;
use Illuminate\Http\Request;

Route::post('/register', function (Request $request) {
    $email = $request->input('email');

    // Save user (fast)
    $user = User::create(['email' => $email]);

    // Dispatch job to queue (don't wait)
    SendWelcomeEmail::dispatch($email);

    // Return immediately
    return response()->json([
        'message' => 'Registration successful',
        'user_id' => $user->id
    ], 201);
});
Start Laravel Worker:
php artisan queue:work redis
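In production the worker is normally kept alive by a process manager rather than run by hand. A minimal Supervisor program sketch (the artisan path, log path, and worker count are illustrative):

; /etc/supervisor/conf.d/laravel-worker.conf
[program:laravel-worker]
command=php /var/www/artisan queue:work redis --sleep=3 --tries=3
numprocs=8                  ; Run 8 worker processes
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
stdout_logfile=/var/log/laravel-worker.log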
Rate Limiting
Why Rate Limit?
Protect your API from:
- Abuse and DDoS attacks
- Single user consuming all resources
- Scraping and unauthorized access
Implementation Strategies
1. Fixed Window
- Allow N requests per time window
- Simple but allows bursts
2. Sliding Window
- Smoother rate limiting
- More complex to implement
3. Token Bucket
- Allows burst traffic
- Industry standard
Flask Rate Limiting Example
from flask import Flask, jsonify, request
from functools import wraps
from redis import Redis

app = Flask(__name__)
redis_client = Redis(host='localhost', port=6379, decode_responses=True)

def rate_limit(max_requests=100, window_seconds=60):
    """Fixed-window rate limiter (counts requests per time window)"""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            # Use API key or IP as identifier
            identifier = request.headers.get('X-API-Key') or request.remote_addr
            key = f'rate_limit:{identifier}'

            # Get current request count
            current = redis_client.get(key)
            if current is None:
                # First request in window
                redis_client.setex(key, window_seconds, 1)
                return f(*args, **kwargs)

            if int(current) >= max_requests:
                # Rate limit exceeded
                return jsonify({
                    'error': 'Rate limit exceeded',
                    'retry_after': redis_client.ttl(key)
                }), 429

            # Increment counter
            redis_client.incr(key)
            return f(*args, **kwargs)
        return wrapper
    return decorator

@app.route('/api/data')
@rate_limit(max_requests=100, window_seconds=60)
def get_data():
    return jsonify({'data': 'Your data here'})
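The example above is a fixed window; the token bucket algorithm described earlier is not shown. A minimal single-process sketch (the rate and capacity are illustrative; a distributed version would move this logic into a Redis Lua script):

import time

class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; one token per request."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=100, capacity=200)  # 100 req/s steady, bursts up to 200

if bucket.allow():
    pass  # handle the request
else:
    pass  # return 429

Because the bucket holds up to `capacity` tokens, short bursts above the steady rate are allowed, which is why this is the common choice for public APIs.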
Laravel Rate Limiting Example
<?php
// routes/api.php
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;
use Illuminate\Support\Facades\Route;

// Define rate limiter (normally placed in a service provider's boot method)
RateLimiter::for('api', function (Request $request) {
    return Limit::perMinute(100)
        ->by($request->header('X-API-Key') ?: $request->ip())
        ->response(function () {
            return response()->json([
                'error' => 'Rate limit exceeded'
            ], 429);
        });
});

// Apply to routes
Route::middleware('throttle:api')->group(function () {
    Route::get('/data', function () {
        return response()->json(['data' => 'Your data here']);
    });
});

// Custom rate limit for specific endpoint
Route::get('/heavy-operation', function () {
    return response()->json(['result' => 'Done']);
})->middleware('throttle:10,1'); // 10 requests per minute
Horizontal Scaling
Vertical vs Horizontal Scaling
Vertical (Scaling Up):
- Add more CPU/RAM to one server
- Limited by hardware
- Expensive
- Single point of failure
Horizontal (Scaling Out):
- Add more servers
- Near-unlimited scaling
- Cost-effective
- High availability
Auto-Scaling Strategy
# Pseudo-code for auto-scaling logic
if cpu_usage > 70% for 5 minutes:
    add_server()

if cpu_usage < 30% for 15 minutes:
    remove_server()
Stateless Application Design
Bad (Stateful):
# Storing session in memory
user_sessions = {}

@app.route('/login', methods=['POST'])
def login():
    user_sessions[user_id] = {'logged_in': True}  # Problem: lost if server restarts
Good (Stateless):
# Store session in Redis (shared across all servers)
@app.route('/login', methods=['POST'])
def login():
    session_token = generate_token()
    redis_client.setex(
        f'session:{session_token}',
        3600,
        json.dumps({'user_id': user_id})
    )
    return jsonify({'token': session_token})
Docker + Kubernetes for Scaling
Dockerfile (Flask):
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:5000", "app:app"]
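Assuming the image is tagged your-api (a placeholder, as in the deployment below), you can build and smoke-test it locally before pushing:

# Build the image and run it with the container port mapped to the host
docker build -t your-api:latest .
docker run -p 5000:5000 your-api:latest
curl http://localhost:5000/health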
Kubernetes Deployment:
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
spec:
  replicas: 10  # Start with 10 pods
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: your-api:latest
          ports:
            - containerPort: 5000
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi
---
# Auto-scaling configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment
  minReplicas: 10
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
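Applying the manifests and watching the autoscaler react uses standard kubectl commands:

kubectl apply -f deployment.yaml
kubectl get hpa api-hpa --watch   # Observe replica count react to CPU load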
Monitoring and Observability
The Three Pillars
- Logs: What happened?
- Metrics: How much/how fast?
- Traces: Where did time go?
1. Logging
Flask Structured Logging:
import logging
import json
from datetime import datetime

class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_data = {
            'timestamp': datetime.utcnow().isoformat(),
            'level': record.levelname,
            'message': record.getMessage(),
            'path': record.pathname,
            'line': record.lineno
        }
        return json.dumps(log_data)

# Configure logger
logger = logging.getLogger(__name__)
handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

@app.route('/api/user/<int:user_id>')
def get_user(user_id):
    logger.info(f'Fetching user {user_id}')
    # ... your code
    logger.info(f'User {user_id} fetched successfully')
Laravel Logging:
<?php
use Illuminate\Support\Facades\Log;

Route::get('/user/{id}', function ($id) {
    Log::info('Fetching user', ['user_id' => $id]);

    $user = User::find($id);

    Log::info('User fetched successfully', [
        'user_id' => $id,
        'execution_time_ms' => 50
    ]);

    return response()->json($user);
});
2. Metrics Collection
Flask with Prometheus:
from flask import Flask, jsonify
from prometheus_flask_exporter import PrometheusMetrics

app = Flask(__name__)
metrics = PrometheusMetrics(app)

# Automatic metrics:
# - Request count
# - Request duration
# - Request errors

# Custom metrics
from prometheus_client import Counter, Histogram

user_registrations = Counter(
    'user_registrations_total',
    'Total user registrations'
)

db_query_duration = Histogram(
    'db_query_duration_seconds',
    'Database query duration'
)

@app.route('/api/register', methods=['POST'])
def register():
    # ... registration logic
    user_registrations.inc()
    return jsonify({'status': 'success'})

# Metrics endpoint: /metrics
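On the Prometheus side, a minimal scrape config pointing at that /metrics endpoint could look like this (the job name and target hostnames are illustrative):

# prometheus.yml
scrape_configs:
  - job_name: 'api'
    scrape_interval: 15s
    static_configs:
      - targets: ['api1:5000', 'api2:5000', 'api3:5000']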
3. Key Metrics to Track
# Request metrics
- Requests per second (RPS)
- Average response time
- 95th percentile response time (p95)
- 99th percentile response time (p99)
- Error rate
# System metrics
- CPU usage
- Memory usage
- Disk I/O
- Network throughput
# Application metrics
- Active database connections
- Cache hit rate
- Queue length
- Worker processing time
4. Alerting Rules
# Example Prometheus alerting rules
groups:
  - name: api_alerts
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 5m
        annotations:
          summary: "High error rate detected"

      - alert: HighLatency
        expr: http_request_duration_seconds{quantile="0.95"} > 1
        for: 5m
        annotations:
          summary: "95th percentile latency > 1s"

      - alert: HighCPU
        expr: cpu_usage_percent > 80
        for: 10m
        annotations:
          summary: "CPU usage above 80%"
Real-World Implementation
Complete Production-Ready Flask API
# app.py
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy
from functools import wraps
from redis import Redis
from prometheus_flask_exporter import PrometheusMetrics
import json
import logging
# Initialize Flask app
app = Flask(__name__)
# Database configuration
app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://user:pass@db-master/mydb'
app.config['SQLALCHEMY_BINDS'] = {
    'read_replica': 'postgresql://user:pass@db-replica/mydb'
}
app.config['SQLALCHEMY_ENGINE_OPTIONS'] = {
    'pool_size': 20,
    'max_overflow': 10,
    'pool_recycle': 3600,
}
db = SQLAlchemy(app)
redis_client = Redis(host='redis', port=6379, decode_responses=True)
metrics = PrometheusMetrics(app)
# Logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Models
class User(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(100))
    email = db.Column(db.String(100), unique=True, index=True)

    def to_dict(self):
        return {
            'id': self.id,
            'name': self.name,
            'email': self.email
        }
# Rate limiting decorator
def rate_limit(max_requests=100, window=60):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            key = f'rate_limit:{request.remote_addr}'
            current = redis_client.get(key)
            if current and int(current) >= max_requests:
                return jsonify({'error': 'Rate limit exceeded'}), 429
            # Only set the TTL when the key is created; re-setting it on
            # every request would slide the window forward and never reset
            count = redis_client.incr(key)
            if count == 1:
                redis_client.expire(key, window)
            return f(*args, **kwargs)
        return wrapper
    return decorator
# Caching decorator
def cached(ttl=3600):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            # Flask passes URL parameters as keyword arguments, so the
            # cache key must include kwargs, not just positional args
            parts = [str(a) for a in args] + \
                    [str(v) for _, v in sorted(kwargs.items())]
            cache_key = f'cache:{f.__name__}:{":".join(parts)}'

            # Try cache
            cached_result = redis_client.get(cache_key)
            if cached_result:
                logger.info(f'Cache HIT for {cache_key}')
                return jsonify(json.loads(cached_result))

            # Execute function
            logger.info(f'Cache MISS for {cache_key}')
            result = f(*args, **kwargs)

            # Store successful responses in cache (error responses come
            # back as (response, status) tuples and are skipped)
            if result and not isinstance(result, tuple):
                redis_client.setex(cache_key, ttl, json.dumps(result.get_json()))
            return result
        return wrapper
    return decorator
# Health check endpoint
@app.route('/health')
def health():
    return jsonify({'status': 'healthy'}), 200
# API endpoints
@app.route('/api/users', methods=['GET'])
@rate_limit(max_requests=1000, window=60)
def get_users():
    """Get all users with pagination"""
    page = request.args.get('page', 1, type=int)
    per_page = request.args.get('per_page', 20, type=int)

    # Read from the replica (Query has no with_bind(); route the
    # statements to the replica engine explicitly)
    replica = db.engines['read_replica']
    total = db.session.execute(
        db.select(db.func.count(User.id)),
        bind_arguments={'bind': replica}
    ).scalar()
    users = db.session.execute(
        db.select(User).offset((page - 1) * per_page).limit(per_page),
        bind_arguments={'bind': replica}
    ).scalars().all()

    return jsonify({
        'users': [u.to_dict() for u in users],
        'total': total,
        'page': page,
        'per_page': per_page
    })

@app.route('/api/user/<int:user_id>', methods=['GET'])
@rate_limit(max_requests=1000, window=60)
@cached(ttl=3600)
def get_user(user_id):
    """Get single user with caching"""
    user = db.session.execute(
        db.select(User).where(User.id == user_id),
        bind_arguments={'bind': db.engines['read_replica']}
    ).scalar_one_or_none()
    if not user:
        return jsonify({'error': 'User not found'}), 404
    return jsonify(user.to_dict())
@app.route('/api/users', methods=['POST'])
@rate_limit(max_requests=100, window=60)
def create_user():
    """Create new user"""
    data = request.get_json()

    # Validation
    if not data.get('email') or not data.get('name'):
        return jsonify({'error': 'Missing required fields'}), 400

    # Create user (writes go to master)
    user = User(name=data['name'], email=data['email'])
    try:
        db.session.add(user)
        db.session.commit()
        logger.info(f'User created: {user.id}')
        return jsonify(user.to_dict()), 201
    except Exception as e:
        db.session.rollback()
        logger.error(f'Error creating user: {str(e)}')
        return jsonify({'error': 'Failed to create user'}), 500
@app.route('/api/user/<int:user_id>', methods=['DELETE'])
@rate_limit(max_requests=50, window=60)
def delete_user(user_id):
    """Delete user and invalidate cache"""
    user = db.session.get(User, user_id)
    if not user:
        return jsonify({'error': 'User not found'}), 404
    try:
        db.session.delete(user)
        db.session.commit()
        # Invalidate cache (key format matches the @cached decorator)
        redis_client.delete(f'cache:get_user:{user_id}')
        logger.info(f'User deleted: {user_id}')
        return jsonify({'message': 'User deleted'}), 200
    except Exception as e:
        db.session.rollback()
        logger.error(f'Error deleting user: {str(e)}')
        return jsonify({'error': 'Failed to delete user'}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
requirements.txt:
Flask==3.0.0
Flask-SQLAlchemy==3.1.1
psycopg2-binary==2.9.9
redis==5.0.1
gunicorn==21.2.0
prometheus-flask-exporter==0.23.0
Running the Flask API:
# Development
python app.py
# Production with Gunicorn (4 workers)
gunicorn -w 4 -b 0.0.0.0:5000 app:app
# Production with more workers and threads
gunicorn -w 8 --threads 2 -b 0.0.0.0:5000 --timeout 30 --keep-alive 5 app:app
Complete Production-Ready Laravel API
<?php
// routes/api.php
use Illuminate\Support\Facades\Route;
use Illuminate\Support\Facades\Redis;
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Log;
use App\Models\User;
use Illuminate\Http\Request;
// Health check
Route::get('/health', function () {
    return response()->json(['status' => 'healthy']);
});

// Rate limiting middleware group
Route::middleware(['throttle:api'])->group(function () {

    // Get all users with pagination
    Route::get('/users', function (Request $request) {
        $page = $request->query('page', 1);
        $perPage = $request->query('per_page', 20);

        // Use read replica
        $users = User::on('read_replica')
            ->paginate($perPage, ['*'], 'page', $page);

        return response()->json([
            'users' => $users->items(),
            'total' => $users->total(),
            'page' => $users->currentPage(),
            'per_page' => $users->perPage()
        ]);
    });

    // Get single user with caching
    Route::get('/user/{id}', function ($id) {
        $cacheKey = "user:{$id}";

        // Cache::remember returns the cached value on a hit and only
        // runs the closure (a replica query) on a miss
        $user = Cache::remember($cacheKey, 3600, function () use ($id) {
            Log::info("Cache MISS for user {$id}");
            return User::on('read_replica')->find($id);
        });

        if (!$user) {
            return response()->json(['error' => 'User not found'], 404);
        }

        return response()->json($user);
    });

    // Create user
    Route::post('/users', function (Request $request) {
        $validated = $request->validate([
            'name' => 'required|string|max:100',
            'email' => 'required|email|unique:users,email'
        ]);

        try {
            $user = User::create($validated);
            Log::info("User created: {$user->id}");
            return response()->json($user, 201);
        } catch (\Exception $e) {
            Log::error("Error creating user: {$e->getMessage()}");
            return response()->json(['error' => 'Failed to create user'], 500);
        }
    })->middleware('throttle:100,1'); // Stricter rate limit

    // Delete user
    Route::delete('/user/{id}', function ($id) {
        $user = User::find($id);
        if (!$user) {
            return response()->json(['error' => 'User not found'], 404);
        }

        try {
            $user->delete();
            // Invalidate cache
            Cache::forget("user:{$id}");
            Log::info("User deleted: {$id}");
            return response()->json(['message' => 'User deleted'], 200);
        } catch (\Exception $e) {
            Log::error("Error deleting user: {$e->getMessage()}");
            return response()->json(['error' => 'Failed to delete user'], 500);
        }
    })->middleware('throttle:50,1');
});
User Model:
<?php
// app/Models/User.php
namespace App\Models;

use Illuminate\Database\Eloquent\Model;

class User extends Model
{
    protected $fillable = ['name', 'email'];
    protected $hidden = ['password'];

    // Specify which connection to use by default
    protected $connection = 'mysql';
}
Database Configuration:
<?php
// config/database.php
return [
    'default' => env('DB_CONNECTION', 'mysql'),

    'connections' => [
        'mysql' => [
            'driver' => 'mysql',
            'read' => [
                'host' => [
                    env('DB_READ_HOST_1', '192.168.1.2'),
                    env('DB_READ_HOST_2', '192.168.1.3'),
                ],
            ],
            'write' => [
                'host' => [
                    env('DB_WRITE_HOST', '192.168.1.1'),
                ],
            ],
            'sticky' => true, // Reads after a write use the master
            'database' => env('DB_DATABASE', 'myapp'),
            'username' => env('DB_USERNAME', 'root'),
            'password' => env('DB_PASSWORD', ''),
            'charset' => 'utf8mb4',
            'collation' => 'utf8mb4_unicode_ci',
            'prefix' => '',
            // Persistent connections (reused per PHP-FPM worker)
            'options' => [
                \PDO::ATTR_PERSISTENT => true,
                \PDO::ATTR_TIMEOUT => 5,
            ],
        ],

        'read_replica' => [
            'driver' => 'mysql',
            'host' => env('DB_READ_HOST_1', '192.168.1.2'),
            'database' => env('DB_DATABASE', 'myapp'),
            'username' => env('DB_USERNAME', 'root'),
            'password' => env('DB_PASSWORD', ''),
        ],
    ],

    'redis' => [
        'client' => env('REDIS_CLIENT', 'phpredis'),
        'default' => [
            'host' => env('REDIS_HOST', '127.0.0.1'),
            'password' => env('REDIS_PASSWORD', null),
            'port' => env('REDIS_PORT', 6379),
            'database' => 0,
        ],
        'cache' => [
            'host' => env('REDIS_HOST', '127.0.0.1'),
            'password' => env('REDIS_PASSWORD', null),
            'port' => env('REDIS_PORT', 6379),
            'database' => 1,
        ],
    ],
];
Running Laravel API:
# Install dependencies
composer install
# Run migrations
php artisan migrate
# Production server with PHP-FPM + NGINX
# Install: apt-get install php-fpm nginx
# Or use Laravel Octane for better performance
composer require laravel/octane
php artisan octane:install --server=swoole
# Run with Octane (keeps the framework booted between requests; a large throughput gain)
php artisan octane:start --workers=8 --max-requests=1000
Performance Optimization Checklist
Application Level
- Enable gzip compression (see the NGINX snippet after this list)
- Minimize response payload size
- Use pagination for large datasets
- Implement connection pooling
- Use async processing for heavy tasks
- Add appropriate indexes to database
- Avoid N+1 queries
- Use eager loading for relationships
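For the gzip item above, a minimal NGINX snippet (the thresholds are illustrative; JSON responses compress particularly well):

# Inside the http {} block of nginx.conf
gzip on;
gzip_types application/json text/css application/javascript;
gzip_min_length 1024;   # Skip tiny payloads where compression adds overhead
gzip_comp_level 5;      # Balance CPU cost against compression ratio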
Infrastructure Level
- Deploy behind load balancer
- Use CDN for static assets
- Implement multi-layer caching
- Set up database read replicas
- Use container orchestration (Kubernetes)
- Enable auto-scaling
- Deploy in multiple regions
- Use managed services (RDS, ElastiCache)
Monitoring Level
- Track key metrics (RPS, latency, errors)
- Set up alerting for anomalies
- Implement distributed tracing
- Log all critical events
- Monitor database performance
- Track cache hit rates
- Monitor server resources
Architecture Diagram
Internet
   │
   ▼
CDN (CloudFlare)
   │
   ▼
Load Balancer (ELB / nginx / HAProxy)
   │
   ├──► API Server 1
   ├──► API Server 2
   ├──► ...
   └──► API Server 100
          │
          ├──► Redis Cache
          ├──► Redis Sessions
          └──► Redis Queue ──► Celery Workers
                                    │
                                    ▼
                    DB Master ──► DB Replica 1, DB Replica 2
Deployment Example (Docker Compose)
# docker-compose.yml
version: '3.8'

services:
  # Load Balancer
  nginx:
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - api1
      - api2
      - api3

  # API Servers
  api1:
    build: .
    environment:
      - DB_HOST=postgres-master
      - REDIS_HOST=redis
    depends_on:
      - postgres-master
      - redis

  api2:
    build: .
    environment:
      - DB_HOST=postgres-master
      - REDIS_HOST=redis
    depends_on:
      - postgres-master
      - redis

  api3:
    build: .
    environment:
      - DB_HOST=postgres-master
      - REDIS_HOST=redis
    depends_on:
      - postgres-master
      - redis

  # Redis Cache
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    command: redis-server --maxmemory 2gb --maxmemory-policy allkeys-lru

  # PostgreSQL Master
  postgres-master:
    image: postgres:16
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: secret
    volumes:
      - postgres-data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  # PostgreSQL Replica
  # (Note: streaming replication is not automatic; the replica still needs
  # standby/primary_conninfo configuration to actually follow the master)
  postgres-replica:
    image: postgres:16
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: secret
    ports:
      - "5433:5432"

  # Celery Worker (for Flask)
  celery-worker:
    build: .
    command: celery -A celery_app worker --loglevel=info --concurrency=4
    environment:
      - REDIS_HOST=redis
    depends_on:
      - redis

  # Monitoring - Prometheus
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  # Monitoring - Grafana
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

volumes:
  postgres-data:
Load Testing
Using Apache Bench (ab)
# Install
apt-get install apache2-utils
# Test 1: Simple load test
ab -n 10000 -c 100 http://localhost/api/users
# -n: Total requests
# -c: Concurrent requests
# Test 2: POST request
ab -n 1000 -c 50 -p data.json -T application/json http://localhost/api/users
# Test 3: With authentication
ab -n 5000 -c 100 -H "Authorization: Bearer token123" http://localhost/api/user/1
Using wrk (More Advanced)
# Install
apt-get install wrk
# Test 1: Basic load test
wrk -t12 -c400 -d30s http://localhost/api/users
# -t: Threads
# -c: Connections
# -d: Duration
# Test 2: With script
wrk -t12 -c400 -d30s -s script.lua http://localhost/api/users
# script.lua
wrk.method = "POST"
wrk.body = '{"name":"John","email":"john@example.com"}'
wrk.headers["Content-Type"] = "application/json"
Using Locust (Python)
# locustfile.py
from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(1, 3)  # Wait 1-3 seconds between requests

    @task(3)  # Weight: 3x more likely
    def get_users(self):
        self.client.get("/api/users")

    @task(1)
    def get_user(self):
        self.client.get("/api/user/1")

    @task(1)
    def create_user(self):
        self.client.post("/api/users", json={
            "name": "Test User",
            "email": "test@example.com"
        })

# Run locust
# locust -f locustfile.py --host=http://localhost
# Open browser: http://localhost:8089
# Set users: 10000, Spawn rate: 100/s
Reading Load Test Results
# Example output from wrk:
Running 30s test @ http://localhost/api/users
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    45.23ms   12.45ms  200.00ms   75.23%
    Req/Sec     8.85k    890.00    11.20k    68.12%
  3180000 requests in 30.00s, 1.20GB read
Requests/sec: 106000.00   ← Your RPS!
Transfer/sec:     41.00MB
Key Metrics:
- Requests/sec: 106,000 (Good! Target: 1M needs ~10 such servers)
- Avg Latency: 45.23ms (Good! Under 100ms)
- Max Latency: 200ms (Acceptable)
- Errors: 0 (Perfect!)
Database Schema Best Practices
Optimal Schema Design
-- Users table with indexes
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,  -- UNIQUE already creates an index
    name VARCHAR(100) NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'active',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Index for sorting/filtering by creation time
CREATE INDEX idx_users_created ON users(created_at);

-- Composite partial index for filtered queries (PostgreSQL)
CREATE INDEX idx_users_status_created ON users(status, created_at)
    WHERE status = 'active';

-- Partial index on email for active users only (PostgreSQL)
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
Avoiding N+1 Queries
Bad (N+1 Query Problem):
# Flask/SQLAlchemy
@app.route('/api/posts')
def get_posts():
    posts = Post.query.all()
    return jsonify([{
        'id': p.id,
        'title': p.title,
        'author': p.author.name  # Triggers a separate query for EACH post!
    } for p in posts])

# Generates:
# SELECT * FROM posts;              (1 query)
# SELECT * FROM users WHERE id=1;   (then 1 query per post...)
# SELECT * FROM users WHERE id=2;
# ... (N more queries)
Good (Eager Loading):
# Flask/SQLAlchemy
from sqlalchemy.orm import joinedload

@app.route('/api/posts')
def get_posts():
    posts = Post.query.options(joinedload(Post.author)).all()
    return jsonify([{
        'id': p.id,
        'title': p.title,
        'author': p.author.name  # No additional query!
    } for p in posts])

# Generates only:
# SELECT * FROM posts JOIN users ON posts.author_id = users.id;
Laravel Example:
// Bad
$posts = Post::all();
foreach ($posts as $post) {
    echo $post->author->name; // N+1 problem
}

// Good
$posts = Post::with('author')->get(); // Eager loading
foreach ($posts as $post) {
    echo $post->author->name; // No extra queries
}
Security Best Practices
1. API Authentication
JWT (JSON Web Tokens):
# Flask with JWT
import os
from flask_jwt_extended import JWTManager, create_access_token, jwt_required

# Load the secret from the environment; never hardcode it
app.config['JWT_SECRET_KEY'] = os.environ['JWT_SECRET_KEY']
jwt = JWTManager(app)

@app.route('/api/login', methods=['POST'])
def login():
    email = request.json.get('email')
    password = request.json.get('password')

    # Validate credentials
    user = authenticate_user(email, password)
    if not user:
        return jsonify({'error': 'Invalid credentials'}), 401

    # Create token
    access_token = create_access_token(identity=user.id)
    return jsonify({'token': access_token})

@app.route('/api/protected')
@jwt_required()
def protected():
    return jsonify({'message': 'Access granted'})
2. Input Validation
# Flask with marshmallow
from marshmallow import Schema, fields, validate, ValidationError

class UserSchema(Schema):
    name = fields.Str(required=True, validate=validate.Length(min=1, max=100))
    email = fields.Email(required=True)
    age = fields.Int(validate=validate.Range(min=0, max=150))

@app.route('/api/users', methods=['POST'])
def create_user():
    schema = UserSchema()
    try:
        data = schema.load(request.json)
    except ValidationError as err:
        return jsonify({'errors': err.messages}), 400

    # Data is validated and safe to use
    user = User(**data)
    db.session.add(user)
    db.session.commit()
    return jsonify(user.to_dict()), 201
3. SQL Injection Prevention
# Bad - Vulnerable to SQL injection
query = f"SELECT * FROM users WHERE email = '{user_input}'"
# Good - Use parameterized queries
query = "SELECT * FROM users WHERE email = %s"
result = db.execute(query, (user_input,))
# ORM (automatically safe)
user = User.query.filter_by(email=user_input).first()
4. CORS Configuration
# Flask-CORS
from flask_cors import CORS

app = Flask(__name__)
CORS(app, resources={
    r"/api/*": {
        "origins": ["https://myapp.com"],
        "methods": ["GET", "POST", "PUT", "DELETE"],
        "allow_headers": ["Content-Type", "Authorization"],
        "max_age": 3600
    }
})
Cost Optimization
Cloud Provider Comparison (Monthly)
Scenario: 1M RPS, 99.99% uptime
AWS:
- 100 EC2 instances (t3.medium): $3,600
- Application Load Balancer: $30
- RDS (Multi-AZ): $400
- ElastiCache (Redis): $200
- CloudFront (CDN): $150
- Data transfer: $500
Total: ~$4,880/month
Google Cloud:
- 100 GCE instances: $3,200
- Cloud Load Balancing: $25
- Cloud SQL: $350
- Memorystore (Redis): $180
- Cloud CDN: $120
- Data transfer: $450
Total: ~$4,325/month
Azure:
- 100 VMs: $3,400
- Load Balancer: $35
- Azure Database: $380
- Azure Cache: $190
- Azure CDN: $140
- Data transfer: $480
Total: ~$4,625/month
Cost Saving Tips
- Use spot/preemptible instances (60-90% discount)
- Reserved instances (30-50% discount for 1-3 year commitment)
- Auto-scaling: Scale down during low traffic
- Aggressive caching: Reduce database/compute load
- Optimize data transfer: Use CDN, compress responses
- Use managed services: RDS cheaper than self-managed
- Multi-region carefully: Only if needed for latency
Troubleshooting Common Issues
Issue 1: High Latency
Symptoms: Response time > 1 second
Debug Steps:
import time
from flask import g, request

@app.before_request
def start_timer():
    g.start = time.time()

@app.after_request
def log_request(response):
    duration = time.time() - g.start
    if duration > 0.5:  # Log slow requests
        logger.warning(f'Slow request: {request.path} took {duration:.2f}s')
    return response
Common Causes:
- Slow database queries → Add indexes
- No caching → Implement Redis
- N+1 queries → Use eager loading
- Blocking I/O → Use async
Issue 2: High Error Rate
Symptoms: 500 errors, connection timeouts
Debug:
@app.errorhandler(500)
def internal_error(error):
    logger.error(f'Internal error: {error}', exc_info=True)
    db.session.rollback()
    return jsonify({'error': 'Internal server error'}), 500

@app.errorhandler(Exception)
def handle_exception(e):
    logger.error(f'Unhandled exception: {e}', exc_info=True)
    # Log the details, but avoid echoing raw exception text to clients
    return jsonify({'error': 'Internal server error'}), 500
Common Causes:
- Database connection pool exhausted
- Memory leaks
- Unhandled exceptions
- External API failures
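For the pool-exhaustion cause above, SQLAlchemy can report pool state directly; a minimal sketch (exposing this only on a debug endpoint is assumed):

@app.route('/api/debug/db-pool')
def db_pool_status():
    # Pool.status() returns a summary string, e.g.
    # "Pool size: 20  Connections in pool: 3  Current Overflow: 0 ..."
    return jsonify({'pool': db.engine.pool.status()})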
Issue 3: Memory Leaks
Debug:
import tracemalloc

tracemalloc.start()

@app.route('/api/memory-snapshot')
def memory_snapshot():
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')
    return jsonify([{
        'file': str(stat.traceback),
        'size_mb': stat.size / 1024 / 1024
    } for stat in top_stats[:10]])
Conclusion
Building an API that handles 1 million requests per second requires:
- Distributed Architecture: No single server can handle this load
- Caching Everything: Reduces database load by 80-95%
- Database Optimization: Read replicas, connection pooling, indexes
- Async Processing: Don’t make users wait for background tasks
- Monitoring: You can’t improve what you don’t measure
- Load Balancing: Distribute traffic evenly
- Auto-Scaling: Add servers during traffic spikes
- Rate Limiting: Protect against abuse
Key Takeaways
- Start simple, scale gradually
- Measure before optimizing
- Cache aggressively
- Database is usually the bottleneck
- Horizontal scaling > Vertical scaling
- Monitor everything
- Test under load before going live
Next Steps
- Implement basic caching (biggest win)
- Set up connection pooling
- Add database indexes
- Implement rate limiting
- Deploy behind load balancer
- Set up monitoring
- Load test regularly
- Scale horizontally
Remember: Premature optimization is the root of all evil. Build for your current scale, but architect for future growth.
Happy scaling!
