
Webhook Rate Limiting: Strategies for Senders and Receivers

A comprehensive technical guide to webhook rate limiting covering both sender and receiver perspectives, including implementation strategies, code examples, and best practices for handling high-volume event delivery.

Rate limiting is essential for reliable webhook infrastructure. Whether you're sending millions of events daily or receiving webhooks from dozens of services, understanding rate limiting helps you build systems that hold up under pressure.

This guide covers both sides: rate limiting as a sender and protecting your endpoints as a receiver.

Why Rate Limiting Matters for Webhooks

Unlike traditional APIs where clients control volume, webhooks flip the model—senders control traffic. A flash sale triggering thousands of webhooks can overwhelm unprepared endpoints.

For senders, uncontrolled delivery exhausts worker resources, gets your traffic blocked, and loses events. For receivers, missing rate limiting means a single misbehaving upstream service can take down your entire pipeline.

Rate Limiting for Webhook Senders

Why Limit Your Outbound Rate

Smart senders implement rate limiting to:

  • Protect receivers: Not all endpoints can handle 1,000 req/sec
  • Prevent cascading failures: Slow endpoints don't exhaust workers
  • Maintain fairness: High-volume customers don't starve smaller ones
  • Reduce retry storms: Controlled delivery doesn't overwhelm recovering endpoints

Implementing Per-Endpoint Limits

Maintain a separate rate limit for each endpoint. This isolates customers from one another and lets you tune limits to each endpoint's capacity.

import time
from dataclasses import dataclass
import redis

@dataclass
class EndpointRateLimit:
    requests_per_second: int
    burst_size: int

class WebhookRateLimiter:
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client
        self.default_limit = EndpointRateLimit(
            requests_per_second=10,
            burst_size=20
        )

    def can_send(self, endpoint_id: str) -> tuple[bool, float]:
        """
        Token bucket implementation for per-endpoint limiting.
        Returns (allowed, wait_time_seconds).
        """
        limit = self.get_endpoint_limit(endpoint_id)
        key = f"webhook_bucket:{endpoint_id}"
        now = time.time()

        result = self.redis.hgetall(key)

        tokens = float(result.get(b'tokens', limit.burst_size))
        last_update = float(result.get(b'last_update', now))

        # Replenish tokens based on elapsed time
        elapsed = now - last_update
        tokens = min(
            limit.burst_size,
            tokens + elapsed * limit.requests_per_second
        )

        if tokens >= 1:
            # Consume a token and allow the request
            self.redis.hset(key, mapping={
                'tokens': tokens - 1,
                'last_update': now
            })
            self.redis.expire(key, 300)
            return True, 0

        # Calculate wait time for next available token
        wait_time = (1 - tokens) / limit.requests_per_second
        return False, wait_time

    def get_endpoint_limit(self, endpoint_id: str) -> EndpointRateLimit:
        # Load custom limits from config or return default
        custom = self.redis.hgetall(f"endpoint_config:{endpoint_id}")
        if custom:
            return EndpointRateLimit(
                requests_per_second=int(custom.get(b'rps', 10)),
                burst_size=int(custom.get(b'burst', 20))
            )
        return self.default_limit
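
A quick usage sketch, assuming a local Redis instance ("endpoint_1234" is a made-up identifier):

limiter = WebhookRateLimiter(redis.Redis())

allowed, wait_time = limiter.can_send("endpoint_1234")
if allowed:
    print("send the webhook now")
else:
    print(f"rate limited, retry in {wait_time:.2f}s")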

Handling Backpressure

Queue events for later delivery instead of dropping or hammering unresponsive endpoints.

import json

class BackpressureHandler:
    def __init__(self, rate_limiter: WebhookRateLimiter):
        self.limiter = rate_limiter
        self.queue = redis.Redis()

    async def deliver_webhook(self, endpoint_id: str, payload: dict):
        allowed, wait_time = self.limiter.can_send(endpoint_id)

        if not allowed:
            # Queue for delayed delivery instead of blocking
            scheduled_time = time.time() + wait_time
            self.queue.zadd(
                f"delayed_webhooks:{endpoint_id}",
                {json.dumps(payload): scheduled_time}
            )
            return {"status": "queued", "deliver_at": scheduled_time}

        return await self._send_webhook(endpoint_id, payload)
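
Something still has to drain that sorted set. One possible worker, sketched as an extra method on the same class (running it on a timer is left out):

    async def drain_delayed(self, endpoint_id: str):
        """Deliver queued events whose scheduled time has passed."""
        key = f"delayed_webhooks:{endpoint_id}"
        due = self.queue.zrangebyscore(key, 0, time.time())
        for raw in due:
            allowed, _ = self.limiter.can_send(endpoint_id)
            if not allowed:
                break  # still rate limited; leave the rest queued
            self.queue.zrem(key, raw)
            await self._send_webhook(endpoint_id, json.loads(raw))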

Respecting Retry-After Headers

When receivers return 429, they often include a Retry-After header indicating when it's safe to retry. Respect these signals.

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

import httpx

async def _send_webhook(self, endpoint_id: str, payload: dict):
    # Method on BackpressureHandler; the endpoint_url lookup is application-specific
    endpoint_url = self.get_endpoint_url(endpoint_id)

    async with httpx.AsyncClient() as client:
        response = await client.post(endpoint_url, json=payload)

    if response.status_code == 429:
        delay = 60.0  # fallback if no Retry-After header is provided
        retry_after = response.headers.get('Retry-After')

        if retry_after:
            if retry_after.isdigit():
                delay = float(retry_after)
            else:
                # Handle HTTP-date format
                retry_date = parsedate_to_datetime(retry_after)
                delay = (retry_date - datetime.now(timezone.utc)).total_seconds()

            # Temporarily reduce the rate limit for this endpoint
            self.limiter.apply_backoff(endpoint_id, delay)

        # Requeue with delay
        return self._requeue_with_delay(endpoint_id, payload, delay)

    return response
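
The apply_backoff call above isn't defined in the earlier WebhookRateLimiter. A minimal sketch is to empty the endpoint's bucket and push its refill clock into the future, so can_send keeps answering "wait" until the Retry-After window has passed:

def apply_backoff(self, endpoint_id: str, delay: float):
    """Drain the bucket and delay replenishment (method on WebhookRateLimiter)."""
    key = f"webhook_bucket:{endpoint_id}"
    self.redis.hset(key, mapping={
        'tokens': 0,
        # A last_update in the future means no tokens replenish until then
        'last_update': time.time() + delay
    })
    self.redis.expire(key, max(300, int(delay) + 60))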

Rate Limiting for Webhook Receivers

Protecting Your Endpoints

Defend against high-volume senders and abuse. Implement rate limiting at ingestion.

const express = require('express');
const rateLimit = require('express-rate-limit');
const RedisStore = require('rate-limit-redis');

const app = express();

// Per-sender rate limiting using API key or signature
const webhookLimiter = rateLimit({
  store: new RedisStore({
    client: redisClient,
    prefix: 'webhook_rl:'
  }),
  windowMs: 60 * 1000, // 1 minute window
  max: 100, // 100 requests per minute per sender
  keyGenerator: (req) => {
    // Use webhook source identifier
    return req.headers['x-webhook-source'] || req.ip;
  },
  handler: (req, res) => {
    // resetTime is a Date; convert it to seconds-from-now for Retry-After
    const retryAfter = Math.max(0, Math.ceil((req.rateLimit.resetTime - Date.now()) / 1000));
    res.set('Retry-After', String(retryAfter));
    res.status(429).json({
      error: 'rate_limit_exceeded',
      message: 'Too many webhooks received',
      retry_after: retryAfter
    });
  }
});

app.post('/webhooks/:source', webhookLimiter, processWebhook);

Token Bucket vs Leaky Bucket

Token Bucket: Allows bursts up to bucket size, then steady rate. Accommodates traffic spikes.

Leaky Bucket: Constant processing rate, queues excess. Ensures consistent throughput.

class LeakyBucket:
    """
    Leaky bucket for consistent webhook processing rate.
    """
    def __init__(self, rate: float, capacity: int):
        self.rate = rate  # requests per second
        self.capacity = capacity
        self.water = 0
        self.last_leak = time.time()

    def allow_request(self) -> bool:
        now = time.time()

        # Leak water based on elapsed time
        elapsed = now - self.last_leak
        self.water = max(0, self.water - elapsed * self.rate)
        self.last_leak = now

        if self.water < self.capacity:
            self.water += 1
            return True
        return False
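
For illustration, wiring the bucket into a receiver might look like this (the values and the handle() call are placeholders):

# Accept at most ~50 webhooks/sec, with room for bursts of up to 100
bucket = LeakyBucket(rate=50, capacity=100)

def on_webhook(payload):
    if not bucket.allow_request():
        return {"error": "rate_limit_exceeded"}, 429
    return handle(payload)  # handle() stands in for real processing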

Queue-Based Processing

Decouple ingestion from processing. Accept quickly, process at controlled rate.

from celery import Celery
from flask import Flask, request

flask_app = Flask(__name__)
celery_app = Celery('webhooks')

@flask_app.route('/webhooks', methods=['POST'])
def receive_webhook():
    payload = request.json

    # Immediately queue for async processing
    process_webhook.apply_async(args=[payload], queue='webhooks')

    # Return 202 Accepted immediately
    return {'status': 'accepted'}, 202

# rate_limit is Celery's built-in per-worker rate limiting
@celery_app.task(bind=True, max_retries=3, rate_limit='100/m')
def process_webhook(self, payload):
    try:
        handle_webhook_event(payload)  # application-specific processing
    except ProcessingError as exc:
        raise self.retry(countdown=60, exc=exc)

Best Practices for Production Systems

Graceful degradation: Under pressure, prioritize stability. Use circuit breakers to reject excess load cleanly rather than crash.
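
A minimal circuit-breaker sketch (the thresholds are illustrative, not prescriptive):

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: allow a probe request once the cooldown has elapsed
        return time.time() - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.time()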

Burst handling: Configure limits with burst allowances. A steady 10 req/sec with a burst of 50 absorbs normal traffic spikes while still protecting against sustained overload.
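
With the EndpointRateLimit dataclass from earlier, that configuration is simply (numbers illustrative):

limit = EndpointRateLimit(requests_per_second=10, burst_size=50)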

Customer-specific limits: Enterprise customers can typically handle much higher rates than small integrations. Make limits configurable per endpoint.
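
With the get_endpoint_limit lookup shown earlier, a per-customer override is a single Redis hash (field names match that sketch; redis_client and the values are illustrative):

# Give a high-volume endpoint 10x the default throughput
redis_client.hset("endpoint_config:endpoint_enterprise_42", mapping={
    "rps": 100,
    "burst": 500
})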

Capacity planning: Anticipate spikes and pre-scale before product launches. See our guide on scaling webhooks from 0 to 10K req/sec.

Conclusion

Effective rate limiting requires work on both sides: senders protect receivers and handle backpressure gracefully; receivers defend their endpoints and give senders clear feedback.

Rate limiting works hand in hand with other webhook reliability patterns: retries, circuit breakers, and observability. Understanding these fundamentals helps you build systems that scale reliably.
