Node.js API Rate Limiting & Auth: Complete Security Guide

My API bill jumped from $40 to $380 in three days.

I woke up to that Stripe notification last year, ran to my VPS, and saw the problem immediately: 47,000 failed login attempts against my authentication endpoint. Someone was brute-forcing user accounts. My API had no rate limiting, no request throttling, and the database was melting under the load.

I killed the process, added IP-based rate limiting, and rebuilt the auth flow with proper JWT refresh tokens. The next attack — and there was a next attack — hit the rate limit at 100 requests and stopped cold.

Most guides split rate limiting and authentication into separate topics. That misses the point. You need both, and you need them working together. Authenticated users get higher limits. Anonymous requests get throttled hard. Expensive endpoints like search or export get their own caps. And when you're running on a budget VPS — not a million-dollar cloud bill — rate limiting isn't just security. It's cost control.

This is the combined guide I wish I'd had: multi-tier rate limiting with Redis, JWT authentication with refresh tokens, and implementations for both Express and Next.js that work in production.

Why Rate Limiting and Auth Matter

If your API is public, it will be abused.

I learned this running a SaaS API on a $12/month VPS. Bots hammered my endpoints until the database melted. Rate limiting stops the flood, but authentication is what lets you differentiate between legitimate users and attackers.

DoS and brute-force attacks. An attacker can try thousands of password combinations per second without throttling. I've seen credential-stuffing attacks cycle through 200,000+ email/password pairs in an hour.

API abuse and cost implications. When my API got hammered, the database connection pool maxed out, legitimate requests timed out, and my hosting bill jumped from $40 to $380 in three days.

Resource protection. A single rogue script making 1,000 req/sec can monopolize your database connections and take down the service.

Compliance requirements. SOC2 and PCI-DSS audits ask how you prevent brute-force attacks. Rate limiting + proper authentication isn't optional for certifications.

Authentication tells you who is making the request. Rate limiting tells you how much they're allowed to do.

Authentication Strategies Overview

JWT vs session-based auth. JWTs are stateless — the server validates the signature without storing session data. This scales well for Docker deployments. Session-based auth gives you instant revocation but requires shared Redis storage.

I use JWTs for most APIs. They work cleanly with multi-container deployments, and you can embed user tiers in the payload to enforce different rate limits.

API keys for server-to-server. Generate a long random string, hash it before storing, check it on every request. Rotate periodically.
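
That generate-hash-check cycle can be sketched with Node's built-in `crypto` module (function names here are illustrative, not from a specific library):

```javascript
import { randomBytes, createHash, timingSafeEqual } from 'node:crypto';

// Generate the key once, show it to the client once, store only the hash.
export function generateApiKey() {
  const key = `sk_${randomBytes(32).toString('hex')}`;
  const hash = createHash('sha256').update(key).digest('hex');
  return { key, hash };
}

// On each request, hash the presented key and compare in constant time.
export function verifyApiKey(presentedKey, storedHash) {
  const presented = createHash('sha256').update(presentedKey).digest();
  const stored = Buffer.from(storedHash, 'hex');
  return presented.length === stored.length && timingSafeEqual(presented, stored);
}
```

Unlike passwords, a slow hash like bcrypt isn't needed here: the key is high-entropy random data, so SHA-256 is sufficient.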

OAuth2 for third-party integrations. Use OAuth2 for GitHub/Google logins, but JWT for internal auth.

When to use each:

  • Internal API, mobile app, SPA → JWT with refresh tokens
  • Server-to-server, webhooks → API keys
  • Third-party integrations → OAuth2
  • High-security, instant revocation → session-based with Redis

Implementing JWT Authentication

JWTs are self-contained. The server signs them, the client stores them, and subsequent requests send them back. You validate the signature without hitting the database — until the token expires and the client requests a new one using a refresh token.

Here's how I implement JWT auth for production APIs:

Token generation and validation. When a user logs in successfully, generate two tokens:

  • Access token (short-lived, 15 minutes): used for API requests
  • Refresh token (long-lived, 7 days): used to get a new access token

// auth.js
import jwt from 'jsonwebtoken';
import bcrypt from 'bcrypt';

const ACCESS_TOKEN_SECRET = process.env.ACCESS_TOKEN_SECRET;
const REFRESH_TOKEN_SECRET = process.env.REFRESH_TOKEN_SECRET;

export async function login(email, password, db) {
  const user = await db.users.findOne({ email });
  if (!user || !await bcrypt.compare(password, user.passwordHash)) {
    throw new Error('Invalid credentials');
  }

  const accessToken = jwt.sign(
    { userId: user.id, email: user.email, tier: user.tier },
    ACCESS_TOKEN_SECRET,
    { expiresIn: '15m' }
  );

  const refreshToken = jwt.sign(
    { userId: user.id },
    REFRESH_TOKEN_SECRET,
    { expiresIn: '7d' }
  );

  // Store refresh token in DB for revocation capability
  await db.refreshTokens.insert({ userId: user.id, token: refreshToken });

  return { accessToken, refreshToken };
}

export function verifyAccessToken(token) {
  try {
    // Pin the algorithm so a forged "alg" header can't downgrade verification
    return jwt.verify(token, ACCESS_TOKEN_SECRET, { algorithms: ['HS256'] });
  } catch (err) {
    throw new Error('Invalid or expired token');
  }
}

export async function refreshAccessToken(refreshToken, db) {
  const payload = jwt.verify(refreshToken, REFRESH_TOKEN_SECRET, { algorithms: ['HS256'] });
  
  // Check if refresh token exists in DB (not revoked)
  const storedToken = await db.refreshTokens.findOne({ 
    userId: payload.userId, 
    token: refreshToken 
  });
  
  if (!storedToken) {
    throw new Error('Refresh token revoked');
  }

  const user = await db.users.findOne({ id: payload.userId });
  
  return jwt.sign(
    { userId: user.id, email: user.email, tier: user.tier },
    ACCESS_TOKEN_SECRET,
    { expiresIn: '15m' }
  );
}

Refresh token flow. The client stores both tokens. When an API request returns 401 Unauthorized, the client automatically calls /auth/refresh with the refresh token, gets a new access token, and retries the original request. This flow is invisible to the user.
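
A minimal client-side sketch of that retry flow, assuming the `/auth/refresh` route shown above (the `apiFetch` wrapper name is illustrative):

```javascript
// Wrap fetch: on a 401, refresh the access token once and retry.
async function apiFetch(url, options, tokens) {
  const withAuth = (token) => ({
    ...options,
    headers: { ...(options.headers || {}), Authorization: `Bearer ${token}` }
  });

  let res = await fetch(url, withAuth(tokens.accessToken));
  if (res.status !== 401) return res;

  // Access token expired — trade the refresh token for a new one
  const refreshRes = await fetch('/auth/refresh', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ refreshToken: tokens.refreshToken })
  });
  if (!refreshRes.ok) throw new Error('Session expired, please log in again');

  const { accessToken } = await refreshRes.json();
  tokens.accessToken = accessToken;

  // Retry the original request with the fresh token
  return fetch(url, withAuth(accessToken));
}
```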

Secure storage and transmission. Access tokens go in the Authorization: Bearer <token> header. Never put them in URLs — they get logged. Refresh tokens should be stored in HTTP-only cookies (for web clients) or secure storage (for mobile apps). Never localStorage — it's vulnerable to XSS.
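
For Express web clients, setting the refresh token cookie might look like this (a sketch; the helper name is hypothetical, and `res` is the Express response):

```javascript
// Put the refresh token in an HTTP-only cookie scoped to the refresh
// endpoint, and keep it out of the JSON response body entirely.
function setRefreshCookie(res, refreshToken) {
  res.cookie('refreshToken', refreshToken, {
    httpOnly: true,                   // not readable from JS — blocks XSS theft
    secure: true,                     // HTTPS only
    sameSite: 'strict',               // not sent on cross-site requests
    path: '/auth/refresh',            // only sent to the refresh endpoint
    maxAge: 7 * 24 * 60 * 60 * 1000   // match the 7-day refresh token lifetime
  });
}
```

Scoping the cookie's `path` to `/auth/refresh` means the browser never attaches the refresh token to ordinary API requests.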

Common JWT vulnerabilities and fixes:

  • None algorithm attack: Always specify the algorithm (HS256, RS256) when verifying. Never accept algorithm: "none".
  • Secret reuse: Use different secrets for access and refresh tokens. If an access token is compromised, the refresh token secret is still safe.
  • No expiration: Always set expiresIn. A token that never expires is a permanent backdoor.
  • Sensitive data in payload: JWTs are base64-encoded, not encrypted. Don't put passwords, credit card numbers, or SSNs in the payload.

NextAuth.js integration. If you're using Next.js, NextAuth handles token generation and validation. Wrap it with custom rate limiting since NextAuth doesn't enforce per-user limits by default.

JWT authentication gives you the foundation. Now let's limit what authenticated users can do.

Rate Limiting Fundamentals

Rate limiting is just math: count requests per time window, reject when the count exceeds a threshold.

The simplest implementation is a counter in memory. But memory-based counters don't work when you scale horizontally — each server instance has its own counter, so a user can bypass limits by hitting different servers.

This is why production rate limiting uses Redis. It's shared state that all your API servers can read and write.

Token bucket algorithm. Each user (or IP) has a "bucket" with a fixed number of tokens. Every request consumes one token. Tokens refill at a constant rate (e.g., 10 tokens per minute). When the bucket is empty, requests are rejected.

This is better than a simple counter because it allows bursts. A user can make 100 requests instantly if they haven't used the API in a while, but sustained traffic gets throttled to the refill rate.
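
A minimal in-memory version of the algorithm, with timestamps passed in explicitly so the refill math is visible (production code would use Redis, as below):

```javascript
// Token bucket: `capacity` tokens, refilled at `refillPerSec` tokens per
// second, one token consumed per allowed request.
class TokenBucket {
  constructor(capacity, refillPerSec, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;   // start full — allows an initial burst
    this.lastRefill = now;
  }

  tryConsume(now = Date.now()) {
    // Refill proportionally to elapsed time, capped at capacity
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;  // request allowed
    }
    return false;   // bucket empty — reject
  }
}
```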

Fixed window vs sliding window. Fixed window resets the counter every N seconds. If the limit is 100 req/minute, a user can make 100 requests at 0:59 and another 100 at 1:00 — 200 requests in two seconds.

Sliding window smooths this out by tracking the exact timestamp of each request and evicting old requests as the window slides forward. It's more accurate but slightly more expensive to compute.
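
One common way to implement a sliding window is a Redis sorted set: each request is stored with its timestamp as the score, old entries are evicted, and the remaining count is checked. A sketch, assuming an ioredis-style client:

```javascript
// Sliding-window check: evict timestamps older than the window, count
// what's left, and record this request only if it's allowed.
async function slidingWindowAllow(redis, key, limit, windowMs, now = Date.now()) {
  const zkey = `rl:sliding:${key}`;

  await redis.zremrangebyscore(zkey, 0, now - windowMs); // evict old requests
  const count = await redis.zcard(zkey);
  if (count >= limit) return false;

  // Random suffix keeps members unique if two requests share a timestamp
  await redis.zadd(zkey, now, `${now}:${Math.random()}`);
  await redis.pexpire(zkey, windowMs); // idle keys clean themselves up
  return true;
}
```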

For most APIs, a fixed window or a token bucket is good enough. The token bucket's burst tolerance evens out the fixed window's edge cases.

Distributed rate limiting with Redis. Every request increments a key in Redis like ratelimit:ip:203.0.113.5. You set a TTL on the key equal to the window duration. If the key's value exceeds the limit, reject the request.

Redis is fast enough that the round-trip adds ~1-2ms to each request. For a VPS deployment, run Redis in a Docker container with proper security hardening. For AWS/Azure, use their managed Redis (ElastiCache, Azure Cache) or Upstash if you're on Vercel.

Rate limit headers (X-RateLimit-*). Always return these headers:

  • X-RateLimit-Limit: 100
  • X-RateLimit-Remaining: 42
  • X-RateLimit-Reset: 1672531200

Clients need these to back off gracefully.

Multi-Tier Rate Limiting Strategy

One global rate limit for all traffic is too crude. You need different limits for different contexts.

Here's the three-tier strategy I use in production:

Tier 1: IP-based (anonymous requests). Before authentication, you don't know who the user is. Rate limit by IP address with a strict cap: 20 requests per minute for public endpoints like /login, /register, /forgot-password.

This stops brute-force attacks cold. An attacker can't try 10,000 passwords per second if they're capped at 20 login attempts per minute.

Tier 2: User-based (authenticated requests). Once a user logs in, you have their userId from the JWT. Switch from IP-based to user-based limits: 100 requests per minute for standard users, 500 for premium users.

This is where you encode the user's tier in the JWT payload. The rate limiter reads req.user.tier from the decoded token and applies the matching limit.

Tier 3: Endpoint-specific (expensive operations). Some endpoints are more expensive than others. A GET /posts request is cheap. A POST /export-all-data request might take 10 seconds and hammer the database.

Layer a third rate limit on top: 5 exports per hour, even if the user has 500 general requests remaining.

Free vs paid tier differentiation. This is the real payoff. Free users get 100 req/min. Premium users get 500. Enterprise users get 10,000. You encode this in the JWT, enforce it in middleware, and your API scales with revenue.

Here's what the middleware logic looks like:

// middleware/rateLimiter.js
import rateLimit from 'express-rate-limit';
import RedisStore from 'rate-limit-redis';
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL);

// Tier 1: IP-based for anonymous requests
export const anonymousLimiter = rateLimit({
  store: new RedisStore({ client: redis, prefix: 'rl:anon:' }),
  windowMs: 60 * 1000, // 1 minute
  max: 20,
  standardHeaders: true,
  legacyHeaders: false,
  message: 'Too many requests from this IP, please try again later.'
});

// Tier 2: User-based for authenticated requests.
// Build one limiter per tier at startup — constructing a new limiter
// (and Redis store) inside the request handler would leak a store per request.
const tierLimits = { free: 100, premium: 500, enterprise: 10000 };

const tierLimiters = Object.fromEntries(
  Object.entries(tierLimits).map(([tier, max]) => [
    tier,
    rateLimit({
      store: new RedisStore({ client: redis, prefix: `rl:user:${tier}:` }),
      windowMs: 60 * 1000,
      max,
      standardHeaders: true,
      legacyHeaders: false,
      keyGenerator: (req) => req.user.userId
    })
  ])
);

export const authenticatedLimiter = (req, res, next) => {
  if (!req.user) {
    return res.status(401).json({ error: 'Unauthorized' });
  }
  const limiter = tierLimiters[req.user.tier] || tierLimiters.free;
  limiter(req, res, next);
};

// Tier 3: Endpoint-specific limits
export const expensiveOperationLimiter = rateLimit({
  store: new RedisStore({ client: redis, prefix: 'rl:export:' }),
  windowMs: 60 * 60 * 1000, // 1 hour
  max: 5,
  standardHeaders: true,
  keyGenerator: (req) => req.user?.userId || req.ip
});

Multi-tier limiting is how you protect your API without punishing legitimate users. A free user browsing your docs at 10 req/min never hits the cap. A bot hammering login endpoints at 100 req/sec gets blocked in 200ms.

Implementation in Express.js

Express middleware makes this clean. You stack rate limiters in front of your routes, and they fire in order.

Here's the full Express setup with all three tiers:

// server.js
import express from 'express';
import { login, refreshAccessToken } from './auth.js';
import { authenticateJWT } from './middleware/auth.js';
import { 
  anonymousLimiter, 
  authenticatedLimiter, 
  expensiveOperationLimiter 
} from './middleware/rateLimiter.js';
import { db } from './db.js'; // your database client module (path assumed)

const app = express();
app.use(express.json());

// Public routes: IP-based rate limiting only
app.post('/auth/login', anonymousLimiter, async (req, res) => {
  try {
    const { email, password } = req.body;
    const tokens = await login(email, password, db);
    res.json(tokens);
  } catch (err) {
    res.status(401).json({ error: err.message });
  }
});

app.post('/auth/refresh', anonymousLimiter, async (req, res) => {
  try {
    const { refreshToken } = req.body;
    const accessToken = await refreshAccessToken(refreshToken, db);
    res.json({ accessToken });
  } catch (err) {
    res.status(401).json({ error: err.message });
  }
});

// Protected routes: JWT + user-based rate limiting
app.use('/api', authenticateJWT, authenticatedLimiter);

app.get('/api/posts', async (req, res) => {
  const posts = await db.posts.find({ userId: req.user.userId });
  res.json(posts);
});

app.post('/api/posts', async (req, res) => {
  const post = await db.posts.insert({ ...req.body, userId: req.user.userId });
  res.json(post);
});

// Expensive operation: additional endpoint-specific limit
app.post('/api/export', expensiveOperationLimiter, async (req, res) => {
  const data = await generateFullExport(req.user.userId);
  res.json(data);
});

app.listen(3000, () => {
  console.log('API server running on port 3000');
});

Error handling and user feedback. When a rate limit triggers, return the reset time:

const limiter = rateLimit({
  // ... other config
  handler: (req, res) => {
    res.status(429).json({
      error: 'Rate limit exceeded',
      limit: req.rateLimit.limit,
      remaining: 0,
      // express-rate-limit exposes resetTime as a Date, not an offset
      resetAt: req.rateLimit.resetTime?.toISOString()
    });
  }
});

Express + Redis gives you production-grade rate limiting with 50 lines of code. Now let's look at how to do the same thing in Next.js.

Implementation in Next.js API Routes

Next.js API routes don't have Express-style middleware chains, but you can build the same system with a wrapper function.

Rate limiting in edge runtime. Next.js Edge Runtime doesn't support all Redis clients. Use Upstash Redis or Vercel KV:

// app/api/posts/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { checkRateLimit } from '@/lib/rateLimiter';
import { verifyAuth } from '@/lib/auth';

export const runtime = 'edge';

export async function GET(req: NextRequest) {
  // Authenticate
  const user = await verifyAuth(req);
  if (!user) {
    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 });
  }

  // Rate limit by user
  const limits: Record<string, number> = { free: 100, premium: 500, enterprise: 10000 };
  const userLimit = limits[user.tier] ?? limits.free;

  const { success, remaining, resetAt } = await checkRateLimit(
    `user:${user.id}`,
    userLimit,
    60 * 1000 // 1 minute
  );

  if (!success) {
    return NextResponse.json(
      { 
        error: 'Rate limit exceeded',
        resetAt: new Date(resetAt).toISOString()
      },
      { 
        status: 429,
        headers: {
          'X-RateLimit-Limit': userLimit.toString(),
          'X-RateLimit-Remaining': '0',
          'X-RateLimit-Reset': Math.floor(resetAt / 1000).toString()
        }
      }
    );
  }

  // Fetch data (db: your data-access client, assumed imported elsewhere)
  const posts = await db.posts.findMany({ where: { userId: user.id } });

  return NextResponse.json(posts, {
    headers: {
      'X-RateLimit-Limit': userLimit.toString(),
      'X-RateLimit-Remaining': remaining.toString(),
      'X-RateLimit-Reset': Math.floor(resetAt / 1000).toString()
    }
  });
}
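
The `checkRateLimit` helper imported above isn't shown. Here's one way it could look as a fixed-window counter; the signature is adapted to take the client explicitly (any client exposing `incr`/`pexpire` works, e.g. `@upstash/redis` in the Edge Runtime), whereas the route above would close over a module-level client and drop the first argument:

```javascript
// lib/rateLimiter.js — fixed-window counter (a sketch, not the only option).
export async function checkRateLimit(client, key, limit, windowMs) {
  const window = Math.floor(Date.now() / windowMs);
  const windowKey = `rl:${key}:${window}`;

  const count = await client.incr(windowKey);
  if (count === 1) {
    await client.pexpire(windowKey, windowMs); // key expires with the window
  }

  return {
    success: count <= limit,
    remaining: Math.max(0, limit - count),
    resetAt: (window + 1) * windowMs // epoch ms when the window rolls over
  };
}
```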

For a production deployment on a VPS, I prefer Express. The middleware model is cleaner, and you can run Redis in a Docker container alongside your app.

Monitoring and Response

Rate limiting only works if you're watching it.

I log every rate limit hit to a separate stream so I can spot abuse patterns:

const limiter = rateLimit({
  // ... other config
  handler: (req, res) => {
    logger.info({ // logger: any structured logger, e.g. pino or winston
      ip: req.ip,
      userId: req.user?.userId,
      path: req.path,
      timestamp: new Date().toISOString()
    });
    res.status(429).json({ error: 'Rate limit exceeded' });
  }
});

Rate limit hit metrics. I track three metrics:

  1. Hit rate: should be <1% for normal traffic. If it spikes to 5-10%, you're under attack or your limits are too strict.
  2. Unique IPs hitting limits: A single IP hitting limits all day is a bot. Twenty different users in an hour means your cap is too low.
  3. Endpoint distribution: Which endpoints are getting hammered? If /api/search is rate-limited by legitimate users, raise the cap.

Alerting on abuse patterns. Simple rule: if the same IP hits rate limits on 3+ different endpoints in 10 minutes, block the IP at the firewall level.

IP blocking strategies. When an IP crosses the abuse threshold, add it to an in-memory blocklist that expires after 24 hours. For persistent blocking, use iptables on the VPS or a Cloudflare WAF rule.

Logging for security audits. Keep 90 days of rate limit logs. Generate weekly reports showing total hits, unique IPs blocked, most-hit endpoints, and average time-to-block. This goes into compliance folders for SOC2 audits.

Production Case Study

Before: I was running a SaaS API on a single VPS with no rate limiting. In March 2025, someone brute-forced the login endpoint. Database CPU hit 98%, response times spiked from 50ms to 8 seconds, and my Stripe bill jumped from $40 to $380.

The attacker rotated IPs slowly — a new IP every 500 requests. By the time I noticed, they'd tried 47,000 username/password combinations over three days.

Implemented: Multi-tier rate limiting + JWT refresh:

  1. IP-based limiting on /login: 20 attempts/minute per IP
  2. Redis-backed distributed rate limiting
  3. JWT with 15-minute access tokens + 7-day refresh tokens
  4. User-based limiting post-login: 100 req/min
  5. Expensive-endpoint caps: 5 exports/hour

After: The next brute-force attempt hit the rate limit in 12 seconds. The attacker made 240 requests across 12 IPs, got rate-limited, and stopped. No database overload, no cost spike.

Metrics: 240 blocked requests, 0.02% false positive rate, database CPU stayed under 30%, response time under 100ms, VPS bill stayed flat.

Rate limiting isn't about perfection — it's about making attacks expensive enough that attackers move on.

Security Checklist

Before launching a new API:

Rate limiting:

  • Anonymous endpoints have IP-based rate limits
  • Authenticated endpoints have user-based rate limits
  • Expensive endpoints have additional caps
  • Rate limit headers returned on every response
  • State stored in Redis
  • Limits scale with user tier

Token security:

  • Access tokens expire ≤15 minutes
  • Refresh tokens stored securely and revocable
  • Token secrets are long, random, never committed
  • JWT algorithm explicitly specified (HS256/RS256, never none)
  • No sensitive data in JWT payload

Monitoring:

  • Rate limit hits logged separately
  • Alert when hit rate >2%
  • Alert when single IP hits 3+ endpoints in 10 minutes
  • Weekly compliance report generated

Incident response:

  • Procedure for blocking IPs at firewall
  • Procedure for revoking compromised user tokens
  • Emergency contact for WAF provider
  • Redis backup/rebuild plan

Run this checklist before every production deploy.

Wrapping Up

Rate limiting and authentication work best together. Authentication tells you who the user is. Rate limiting tells you how much they can do. When you combine them with multi-tier limits and distributed state in Redis, you get an API that scales with legitimate traffic and shuts down abuse before it costs you money.

The setup I showed here — JWT with refresh tokens, Redis-backed rate limiting, and multi-tier limits — is what I run in production on every Node.js API I build. It's simple enough to implement in an afternoon, and robust enough to survive brute-force attacks, credential stuffing, and traffic spikes.

If you're deploying this on a VPS, the next step is getting the Redis and Node.js containers talking to each other in a Docker Compose stack. That's how you build a production-ready Docker deployment that stays up under load.


Tested environment: Node.js 22 LTS, Redis 7.2, Docker 27.0, Ubuntu 22.04
