Skip to main content
Back to Blog
Technical8 min read

Rate Limiting Strategies for SaaS APIs

Rate limiting protects your API from abuse, prevents a single user from consuming all resources, and keeps your infrastructure costs predictable. This article covers the three main algorithms, when to use each, and how Styrby implements rate limiting with Upstash Redis.

Three Algorithms

Fixed Window

Divide time into fixed intervals (e.g., 1 minute) and count requests per interval. When the count exceeds the limit, reject requests until the next interval starts.

// Fixed window: 100 requests per minute
const key = `rate:${userId}:${Math.floor(Date.now() / 60000)}`;
const count = await redis.incr(key);
if (count === 1) await redis.expire(key, 60);
if (count > 100) return { status: 429, retryAfter: 60 - (Date.now() % 60000) / 1000 };

Pro: Simple to implement and understand. Low memory: one counter per user per window.

Con: Burst problem at window boundaries. A user can send 100 requests at second 59 of window 1, then 100 more at second 0 of window 2. That is 200 requests in 2 seconds despite a 100/minute limit.

Sliding Window

Combines the current window count with a weighted portion of the previous window to smooth out boundary bursts:

// Sliding window: 100 requests per minute
const currentWindow = Math.floor(Date.now() / 60000);
const previousWindow = currentWindow - 1;
const elapsed = (Date.now() % 60000) / 60000; // 0.0 to 1.0

const currentCount = await redis.get(`rate:${userId}:${currentWindow}`) || 0;
const previousCount = await redis.get(`rate:${userId}:${previousWindow}`) || 0;

const weightedCount = previousCount * (1 - elapsed) + currentCount;
if (weightedCount >= 100) return { status: 429 };

Pro: Eliminates the boundary burst problem. Still uses fixed memory per user.

Con: The weighted count is an approximation, not exact. For most applications, the approximation is close enough.

Token Bucket

Each user has a bucket that fills with tokens at a steady rate. Each request consumes one token. If the bucket is empty, the request is rejected. The bucket has a maximum capacity that limits bursts.

// Token bucket: 100 tokens/minute, max burst of 20
interface Bucket {
  tokens: number;
  lastRefill: number;
}

function checkRateLimit(bucket: Bucket): { allowed: boolean; bucket: Bucket } {
  const now = Date.now();
  const elapsed = (now - bucket.lastRefill) / 1000;
  const refillRate = 100 / 60; // tokens per second

  // Refill tokens based on elapsed time
  const newTokens = Math.min(
    20, // max burst capacity
    bucket.tokens + elapsed * refillRate
  );

  if (newTokens < 1) {
    return { allowed: false, bucket: { ...bucket, tokens: newTokens } };
  }

  return {
    allowed: true,
    bucket: { tokens: newTokens - 1, lastRefill: now },
  };
}

Pro: Natural burst handling. Allows short bursts up to the bucket capacity while enforcing the average rate.

Con: Slightly more complex. Requires storing the token count and last refill timestamp per user.

When to Use Each

AlgorithmBest ForAvoid When
Fixed WindowSimple APIs, internal services, prototypingBoundary bursts would cause problems
Sliding WindowPublic APIs, SaaS products, general useYou need exact (not approximate) counting
Token BucketAPIs that should allow controlled burstsSimple rate limits suffice

Per-Endpoint vs. Per-User Limits

Styrby uses both:

  • Per-user global limit: 1,000 requests per minute across all endpoints. Prevents any single user from overwhelming the system.
  • Per-endpoint limits: Sensitive endpoints have tighter limits. The session creation endpoint allows 10 requests per minute. The cost data endpoint allows 60 requests per minute.

Both limits apply simultaneously. A user might be within their global limit but hit the per-endpoint limit on a specific route.

Implementation with Upstash Redis

Styrby uses Upstash Redis for rate limiting because it provides a serverless Redis instance that works well with Vercel and Supabase Edge Functions. The @upstash/ratelimit library handles the algorithm implementation:

import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_URL!,
  token: process.env.UPSTASH_REDIS_TOKEN!,
});

// Sliding window: 10 session creations per minute
const sessionCreateLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, "1 m"),
  prefix: "ratelimit:session-create",
});

// In the API route handler
export async function POST(request: Request) {
  const userId = getUserId(request);
  const { success, limit, remaining, reset } =
    await sessionCreateLimit.limit(userId);

  if (!success) {
    return Response.json(
      { error: "RATE_LIMITED", message: "Too many requests", retryAfter: reset },
      {
        status: 429,
        headers: {
          "X-RateLimit-Limit": String(limit),
          "X-RateLimit-Remaining": String(remaining),
          "X-RateLimit-Reset": String(reset),
          "Retry-After": String(Math.ceil((reset - Date.now()) / 1000)),
        },
      }
    );
  }

  // Process the request...
}

Rate Limit Headers

Always include rate limit information in response headers so clients can self-regulate:

  • X-RateLimit-Limit: Maximum requests allowed
  • X-RateLimit-Remaining: Requests remaining in window
  • X-RateLimit-Reset: Unix timestamp when the window resets
  • Retry-After: Seconds until the client should retry (on 429 only)

Monitoring and Tuning

Track how often rate limits fire. If legitimate users regularly hit limits, the thresholds are too low. If limits never fire, they might be too high to provide protection. Review rate limit metrics monthly and adjust based on actual usage patterns.

Ready to manage your AI agents from one place?

Styrby gives you cost tracking, remote permissions, and session replay across five agents.