Rate limiting restricts how many requests a client, user, or IP address can make to your API within a given timeframe, preventing abuse. Once the limit is exceeded, additional requests are rejected until the window resets.
Think of it as a bouncer at a club counting how many times someone enters. After 100 entries in an hour, they stop letting that person in until the next hour.
Prevent Abuse: Without limits, one user could spam your API with millions of requests, crashing your servers or racking up huge costs.
Fair Usage: Ensure all users get reasonable access. Prevent one user from hogging all resources.
Cost Control: Cloud services charge per request. Rate limiting prevents astronomical bills from malicious or buggy clients.
Security: Slow down brute force attacks, credential stuffing, and automated scraping.
Set a limit: "100 requests per hour per user."
Track requests: Each time a user makes a request, increment their counter.
Enforce: When they hit 100, reject additional requests with a 429 Too Many Requests error.
Reset: After an hour, the counter resets to zero.
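Here is a minimal sketch of that loop, assuming a single server and an in-memory counter (the is_allowed helper is hypothetical; production systems use shared storage such as Redis, covered below):

```python
import time

# Hypothetical in-memory fixed-window limiter: 100 requests per hour per user.
LIMIT = 100
WINDOW_SECONDS = 3600
counters = {}  # user_id -> (window_start, request_count)

def is_allowed(user_id: str) -> bool:
    now = time.time()
    window_start, count = counters.get(user_id, (now, 0))

    # Reset the counter once the window has elapsed.
    if now - window_start >= WINDOW_SECONDS:
        window_start, count = now, 0

    if count >= LIMIT:
        counters[user_id] = (window_start, count)
        return False  # caller should respond with 429 Too Many Requests

    counters[user_id] = (window_start, count + 1)
    return True
```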
Fixed Window: 100 requests per hour, resetting at the top of each hour. Simple, but bursty at window boundaries: a user can make 100 requests at 1:59 and another 100 at 2:01, pushing 200 requests through in two minutes.
Sliding Window: Tracks requests over a rolling 60-minute window. More accurate but more complex to implement.
Token Bucket: Users get tokens that refill over time. Allows bursts while maintaining an average rate (see the sketch after this list).
Leaky Bucket: Requests processed at steady rate. Excess requests queue up or get rejected.
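To make the token bucket concrete, here is a minimal sketch (the capacity and refill rate are illustrative, not taken from any particular API):

```python
import time

class TokenBucket:
    """Minimal token bucket: allows short bursts up to `capacity`
    while enforcing an average rate of `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now

        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Example: burst up to 10 requests, with an average of ~1 request per second.
bucket = TokenBucket(capacity=10, refill_rate=1.0)
```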
Twitter API: 900 requests per 15 minutes for most endpoints. Exceed it and you wait.
GitHub API: 5,000 requests per hour for authenticated users, 60 per hour for unauthenticated. Clear, documented limits.
Stripe: Prevents brute forcing payment methods by limiting failed payment attempts.
OpenAI API: Rate limits prevent abuse while allowing legitimate usage.
When a client is rate limited, APIs typically return 429 Too Many Requests with headers such as:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 23
X-RateLimit-Reset: 1640000000
Good clients respect these headers and slow down.
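For example, a client can check these headers and pause before the window resets. A sketch using Python's requests library, assuming the conventional X-RateLimit-* header names shown above (real APIs vary in what they send):

```python
import time
import requests

def polite_get(url: str) -> requests.Response:
    """Fetch a URL and, if the rate limit is exhausted,
    sleep until the window resets before returning."""
    response = requests.get(url)

    remaining = int(response.headers.get("X-RateLimit-Remaining", 1))
    reset_at = int(response.headers.get("X-RateLimit-Reset", 0))  # Unix timestamp

    if remaining == 0 and reset_at:
        wait = max(0, reset_at - time.time())
        time.sleep(wait)  # slow down instead of hammering the API

    return response
```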
Redis: Store counters in Redis. Fast, handles high traffic, expires keys automatically (a minimal sketch follows below).
API Gateway: AWS API Gateway, Kong, and Nginx have built-in rate limiting.
Libraries: Express-rate-limit (Node.js), Django-ratelimit (Python). Easy to add to existing apps.
Most developers use existing tools rather than building from scratch.
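If you do build your own, the core of the Redis approach is just an atomic counter with a TTL. A minimal fixed-window sketch using redis-py (key naming and limits are illustrative):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

LIMIT = 100
WINDOW_SECONDS = 3600

def is_allowed(user_id: str) -> bool:
    key = f"ratelimit:{user_id}"
    count = r.incr(key)  # atomic increment; creates the key at 1 if missing
    if count == 1:
        r.expire(key, WINDOW_SECONDS)  # start the window; Redis deletes the key when it expires
    return count <= LIMIT
```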
Public API: Generous limits for normal use, strict enough to prevent abuse.
Internal API: Higher limits since traffic is controlled and trusted.
Authentication: Very strict on login attempts (prevent brute force).
Free vs Paid Tiers: Free users get lower limits; paid users get higher ones. A common monetization strategy.
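In practice this often boils down to a per-tier config lookup before the limiter runs. A hypothetical sketch (tier names and numbers are made up):

```python
# Hypothetical per-tier limits (requests per hour).
TIER_LIMITS = {
    "anonymous": 60,
    "free": 1_000,
    "pro": 10_000,
    "enterprise": 100_000,
}

def limit_for(user) -> int:
    # Fall back to the anonymous limit for unknown or missing tiers.
    tier = getattr(user, "tier", "anonymous")
    return TIER_LIMITS.get(tier, TIER_LIMITS["anonymous"])
```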
Communicate limits clearly in documentation. Show remaining requests in responses. Provide meaningful error messages when limited.
Bad: "Error 429"
Good: "Rate limit exceeded. You can make 100 requests per hour. Try again in 23 minutes."
Respect Limits: Do not hammer APIs. Implement backoff strategies (see the retry sketch after this list).
Cache Responses: Reduce requests by caching data locally.
Batch Requests: Some APIs allow requesting multiple resources in one call.
Monitor Usage: Track your request count to avoid hitting limits unexpectedly.
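A common client-side pattern combines these: retry on 429 with exponential backoff and jitter, honoring Retry-After when the server provides it. A sketch using requests (retry counts and delays are illustrative):

```python
import random
import time
import requests

def get_with_backoff(url: str, max_retries: int = 5) -> requests.Response:
    """Retry on 429 with exponential backoff and jitter."""
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:
            return response

        retry_after = response.headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)  # assumes Retry-After is given in seconds
        else:
            delay = (2 ** attempt) + random.random()  # 1s, 2s, 4s, ... plus jitter

        time.sleep(delay)
    return response  # give up after max_retries attempts
```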
Too strict: Frustrates legitimate users.
Too lenient: Allows abuse and drives up costs.
Finding the right balance requires understanding your users and monitoring actual usage patterns.
Rate limiting is essential infrastructure for any production API. It protects your service, controls costs, and ensures fair access for all users.