Rate Limiting and Abuse Prevention

After this lesson, you will be able to: add rate limiting to public endpoints, choose between fixed-window, sliding-window, and token-bucket algorithms, implement rate limiting in Next.js with Upstash Redis, and back it with Cloudflare WAF rules.

Every public endpoint without a rate limit is an open door to brute force, credential stuffing, scraping, and resource exhaustion. This lesson covers why rate limiting matters, the main algorithms and their trade-offs, implementing it in a Next.js app with Upstash Redis, layering it at the Cloudflare edge, and returning friendly 429 responses.

Prerequisites: Working With APIs: HTTP, Status Codes, Postman, and fetch

Why every public endpoint needs a rate limit

Without a rate limit, an attacker can hit your login endpoint thousands of times a second trying passwords (brute force) or known leaked credentials (credential stuffing), scrape your entire API, or simply exhaust your database connections to take you offline. Rate limiting caps how many requests a single client can make in a given time window. It is one of the cheapest defenses you can add, and one of the highest-payoff.

The three algorithms and their trade-offs

Fixed window: count requests per clock window (e.g. 100 per minute). Simple, but a client can send up to 200 requests in a short span that straddles a window boundary: 100 at the end of one window and 100 at the start of the next.

Sliding window: count requests in the trailing N seconds from now. Smoother, slightly more expensive to track.

Token bucket: tokens refill at a steady rate, and each request spends one. Bursts are allowed up to the bucket size, then traffic is throttled to the refill rate. Best for APIs that want to allow short bursts while holding a steady average.

Most apps want sliding window for auth and token bucket for general API traffic.
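To make the trade-offs concrete, here is a minimal in-memory sketch of a fixed-window counter and a token bucket. The class names, and the choice to pass the current time in explicitly, are assumptions made for illustration. A real deployment would keep this state in a shared store rather than in process memory.

```typescript
// Illustrative in-memory limiters. Times are passed in (milliseconds) so the
// behaviour is easy to trace. Class names are hypothetical.

class FixedWindow {
  private limit: number;
  private windowMs: number;
  private windowStart = 0;
  private count = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(nowMs: number): boolean {
    // Windows are aligned to the clock: [0, windowMs), [windowMs, 2*windowMs), ...
    const start = Math.floor(nowMs / this.windowMs) * this.windowMs;
    if (start !== this.windowStart) {
      this.windowStart = start; // a new window begins, so the counter resets
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}

class TokenBucket {
  private capacity: number;
  private refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, startMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start with a full bucket
    this.lastRefillMs = startMs;
  }

  allow(nowMs: number): boolean {
    if (nowMs > this.lastRefillMs) {
      // Add tokens for the time elapsed, capped at the bucket size.
      const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
      this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
      this.lastRefillMs = nowMs;
    }
    if (this.tokens < 1) return false;
    this.tokens -= 1; // each request spends one token
    return true;
  }
}
```

With a limit of 100 per minute, the fixed-window counter accepts 100 requests at 59.9 s and another 100 at 60.0 s, which is the boundary burst described above. The token bucket accepts an initial burst up to its capacity, then only as many requests as have been refilled since.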

Rate limiting a Next.js route with Upstash Redis

Upstash provides serverless Redis with an HTTP API, which works in edge and serverless functions where a TCP Redis connection does not.

typescript

import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(5, "60 s"), // 5 requests / 60s
  prefix: "ratelimit:login",
});

export async function POST(req: Request) {
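  // x-forwarded-for may hold a comma-separated chain; the first entry is the
  // original client. Only trust it behind a proxy that sets or overwrites it.
  const ip = req.headers.get("x-forwarded-for")?.split(",")[0].trim() || "unknown";
  const { success, reset } = await ratelimit.limit(ip);
  if (!success) {
    // reset is a Unix timestamp in milliseconds; never send less than 1 second.
    const retryAfter = Math.max(1, Math.ceil((reset - Date.now()) / 1000));
    return new Response("Too many attempts", {
      status: 429,
      headers: { "Retry-After": String(retryAfter) },
    });
  }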
  }
  // ... handle the login
}

Rate limiting at the infrastructure layer (Cloudflare WAF)

Application-level limits protect logic, but the request still reaches your server. Cloudflare WAF rate-limiting rules stop abusive traffic at the edge before it costs you anything. A typical rule: 'if the request path matches /api/auth/* and the same IP exceeds 10 requests in 1 minute, block for 10 minutes.' Layer this with your in-app limit (defense in depth): the edge absorbs floods, the app enforces per-user fairness.
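The typical rule above could be expressed roughly as follows. This is a sketch of a rule object shaped like the payload Cloudflare's Rulesets API accepts for the http_ratelimit phase, written as a TypeScript constant so it can live next to the app code. The constant name is hypothetical, and the allowed values for the period and the block duration depend on your Cloudflare plan, so check the field names and limits against Cloudflare's current documentation before using it.

```typescript
// Sketch of a Cloudflare rate limiting rule (http_ratelimit phase).
// Field names follow the Rulesets API as documented by Cloudflare; treat the
// exact shape as an assumption and verify it against the current reference.
const authRateLimitRule = {
  description: "Limit /api/auth/* to 10 requests per minute per IP",
  // Matches any path under /api/auth/
  expression: 'starts_with(http.request.uri.path, "/api/auth/")',
  action: "block",
  ratelimit: {
    characteristics: ["cf.colo.id", "ip.src"], // count per client IP, per data center
    period: 60, // counting window, in seconds
    requests_per_period: 10, // threshold within that window
    mitigation_timeout: 600, // keep blocking for 10 minutes once exceeded
  },
};
```

The same rule can be created by hand in the Cloudflare dashboard. Keeping it as data in the repository mainly helps with review and with keeping the edge limit and the in-app limit in step.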

Rate limiting vs throttling, and good 429 responses

Rate limiting rejects requests over the cap (returns 429). Throttling slows them down (queues or delays) instead of rejecting. For a user-facing app, return 429 with a Retry-After header and a friendly message ('Too many attempts, try again in 30 seconds') rather than a cryptic error. For failed logins specifically, progressive delays (each failure waits a bit longer) are gentler on legitimate users who mistype than a hard lockout, but a hard lockout after N failures is stronger against automated attacks. Many apps combine both: small progressive delays, then a lockout.
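The combined approach (progressive delays, then a lockout) and a friendly 429 can be sketched as two small pure functions. The thresholds, function names, and message wording below are illustrative assumptions, not recommended values.

```typescript
// Hypothetical policy values, chosen only for illustration.
const MAX_FAILURES_BEFORE_LOCKOUT = 5;
const LOCKOUT_SECONDS = 15 * 60;
const BASE_DELAY_SECONDS = 1;

type Penalty =
  | { kind: "none" }
  | { kind: "delay"; seconds: number }
  | { kind: "lockout"; seconds: number };

// Decide what a client faces after N consecutive failed logins.
function failedLoginPenalty(consecutiveFailures: number): Penalty {
  if (consecutiveFailures <= 0) return { kind: "none" };
  if (consecutiveFailures >= MAX_FAILURES_BEFORE_LOCKOUT) {
    return { kind: "lockout", seconds: LOCKOUT_SECONDS };
  }
  // Progressive delay that doubles each time: 1s, 2s, 4s, 8s.
  return { kind: "delay", seconds: BASE_DELAY_SECONDS * 2 ** (consecutiveFailures - 1) };
}

// Build the pieces of a 429 response with Retry-After and a readable message.
// resetMs is when the limit resets (Unix time in ms), nowMs is the current time.
function tooManyRequests(resetMs: number, nowMs: number) {
  const retryAfter = Math.max(1, Math.ceil((resetMs - nowMs) / 1000));
  return {
    status: 429,
    headers: {
      "Retry-After": String(retryAfter),
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      error: `Too many attempts, try again in ${retryAfter} seconds.`,
    }),
  };
}
```

In a route handler, the result of tooManyRequests could be passed to new Response(result.body, { status: result.status, headers: result.headers }). Returning plain data keeps the policy easy to unit test.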

Quick Check

Which algorithm best allows short bursts of traffic while enforcing a steady average rate?

Pick one.

Token bucket
Fixed window
Sliding window
No limit, just log

Common mistakes only experienced devs catch

Keying the limit on a value the client controls (a header or user-supplied ID) instead of IP or authenticated user ID.

Trusting x-forwarded-for blindly when not behind a trusted proxy. It is spoofable; behind Cloudflare or Vercel, use the platform's real-IP header.

Rate limiting in process memory on a serverless platform, where each instance has its own counter and the limit effectively multiplies by the instance count. Use shared Redis.

Forgetting Retry-After, so clients retry immediately and keep hammering the endpoint.

Limiting only the happy path and leaving password-reset and signup wide open.
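The first two mistakes come down to how the limiter key is chosen. A minimal sketch, assuming the app sits behind a proxy that sets a trustworthy client-IP header (Cloudflare sends CF-Connecting-IP, for example): the function name, the HeaderReader type, and the key format are hypothetical.

```typescript
// Anything with a get(name) method works here, including the Headers object
// on a standard Request.
type HeaderReader = { get(name: string): string | null };

// Prefer the authenticated user ID; fall back to the IP from a header that
// the trusted proxy sets. Never read the key from a client-supplied field.
function rateLimitKey(
  headers: HeaderReader,
  userId: string | null,
  trustedIpHeader: string,
): string {
  if (userId) return `user:${userId}`;
  const ip = headers.get(trustedIpHeader)?.trim();
  // Requests without the trusted header all share one bucket, so a missing
  // header cannot be used to dodge the limit.
  return ip ? `ip:${ip}` : "ip:unknown";
}
```

A handler might call rateLimitKey(req.headers, session?.userId ?? null, "cf-connecting-ip") and pass the result to the limiter, where session stands for whatever auth object the app already has.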
