Webhook Rate Limiting: Strategies for Senders and Receivers
A comprehensive technical guide to webhook rate limiting covering both sender and receiver perspectives, including implementation strategies, code examples, and best practices for handling high-volume event delivery.

Rate limiting is essential for reliable webhook infrastructure. Whether you send millions of events daily or receive webhooks from dozens of services, understanding rate limiting helps you build systems that hold up under pressure.
This guide covers both sides: rate limiting as a sender and protecting your endpoints as a receiver.
Why Rate Limiting Matters for Webhooks
Unlike traditional APIs where clients control volume, webhooks flip the model—senders control traffic. A flash sale triggering thousands of webhooks can overwhelm unprepared endpoints.
For senders, uncontrolled delivery exhausts worker resources, triggers blocks from receivers, and loses events. For receivers, a lack of rate limiting means one misbehaving upstream service can take down your entire processing pipeline.
Rate Limiting for Webhook Senders
Why Limit Your Outbound Rate
Smart senders implement rate limiting to:
- Protect receivers: Not all can handle 1,000 req/sec
- Prevent cascading failures: Slow endpoints don't exhaust workers
- Maintain fairness: High-volume customers don't starve smaller ones
- Reduce retry storms: Controlled delivery doesn't overwhelm recovering endpoints
Implementing Per-Endpoint Limits
Maintain separate rate limits per endpoint. This isolates customers from one another and lets you tune limits to each endpoint's capacity.
import time
from dataclasses import dataclass

import redis


@dataclass
class EndpointRateLimit:
    requests_per_second: int
    burst_size: int


class WebhookRateLimiter:
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client
        self.default_limit = EndpointRateLimit(
            requests_per_second=10,
            burst_size=20
        )

    def can_send(self, endpoint_id: str) -> tuple[bool, float]:
        """
        Token bucket implementation for per-endpoint limiting.
        Returns (allowed, wait_time_seconds).
        """
        limit = self.get_endpoint_limit(endpoint_id)
        key = f"webhook_bucket:{endpoint_id}"
        now = time.time()

        # Read the current bucket state. Note: this read-modify-write is not
        # atomic; under many concurrent workers a Lua script would be needed.
        result = self.redis.hgetall(key)
        tokens = float(result.get(b'tokens', limit.burst_size))
        last_update = float(result.get(b'last_update', now))

        # Replenish tokens based on elapsed time
        elapsed = now - last_update
        tokens = min(
            limit.burst_size,
            tokens + elapsed * limit.requests_per_second
        )

        if tokens >= 1:
            # Consume a token and allow the request
            self.redis.hset(key, mapping={
                'tokens': tokens - 1,
                'last_update': now
            })
            self.redis.expire(key, 300)
            return True, 0

        # Calculate wait time for next available token
        wait_time = (1 - tokens) / limit.requests_per_second
        return False, wait_time

    def get_endpoint_limit(self, endpoint_id: str) -> EndpointRateLimit:
        # Load custom limits from config or return default
        custom = self.redis.hgetall(f"endpoint_config:{endpoint_id}")
        if custom:
            return EndpointRateLimit(
                requests_per_second=int(custom.get(b'rps', 10)),
                burst_size=int(custom.get(b'burst', 20))
            )
        return self.default_limit

Handling Backpressure
Queue events for later delivery instead of dropping or hammering unresponsive endpoints.
import json
import time


class BackpressureHandler:
    def __init__(self, rate_limiter: WebhookRateLimiter):
        self.limiter = rate_limiter
        self.queue = redis.Redis()

    async def deliver_webhook(self, endpoint_id: str, payload: dict):
        allowed, wait_time = self.limiter.can_send(endpoint_id)

        if not allowed:
            # Queue for delayed delivery instead of blocking; a separate
            # worker drains this sorted set once scheduled_time has passed.
            scheduled_time = time.time() + wait_time
            self.queue.zadd(
                f"delayed_webhooks:{endpoint_id}",
                {json.dumps(payload): scheduled_time}
            )
            return {"status": "queued", "deliver_at": scheduled_time}

        return await self._send_webhook(endpoint_id, payload)

Respecting Retry-After Headers
When receivers return 429, they often include a Retry-After header indicating when to retry. Respect these signals.
import httpx
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


# Continuation of BackpressureHandler
async def _send_webhook(self, endpoint_id: str, payload: dict):
    # Resolve the destination URL for this endpoint (lookup not shown)
    endpoint_url = self._get_endpoint_url(endpoint_id)

    async with httpx.AsyncClient() as client:
        response = await client.post(endpoint_url, json=payload)

    if response.status_code == 429:
        retry_after = response.headers.get('Retry-After')
        if retry_after:
            if retry_after.isdigit():
                delay = int(retry_after)
            else:
                # Handle HTTP-date format
                retry_date = parsedate_to_datetime(retry_after)
                delay = max(0, (retry_date - datetime.now(timezone.utc)).total_seconds())

            # Temporarily reduce the rate limit for this endpoint
            # (apply_backoff is assumed to lower the endpoint's rate; not shown)
            self.limiter.apply_backoff(endpoint_id, delay)

            # Requeue with delay
            return self._requeue_with_delay(endpoint_id, payload, delay)

    return response

Rate Limiting for Webhook Receivers
Protecting Your Endpoints
Defend your endpoints against high-volume senders and abuse by rate limiting at the point of ingestion.
const express = require('express');
const rateLimit = require('express-rate-limit');
const RedisStore = require('rate-limit-redis');

const app = express();
app.use(express.json());

// redisClient: an already-connected Redis client (setup not shown)

// Per-sender rate limiting using API key or signature
const webhookLimiter = rateLimit({
  store: new RedisStore({
    client: redisClient,
    prefix: 'webhook_rl:'
  }),
  windowMs: 60 * 1000, // 1 minute window
  max: 100, // 100 requests per minute per sender
  keyGenerator: (req) => {
    // Use webhook source identifier
    return req.headers['x-webhook-source'] || req.ip;
  },
  handler: (req, res) => {
    // Seconds until the current window resets
    const retryAfter = Math.max(0, Math.ceil((req.rateLimit.resetTime - Date.now()) / 1000));
    res.set('Retry-After', String(retryAfter));
    res.status(429).json({
      error: 'rate_limit_exceeded',
      message: 'Too many webhooks received',
      retry_after: retryAfter
    });
  }
});

app.post('/webhooks/:source', webhookLimiter, processWebhook);

Token Bucket vs Leaky Bucket
Token Bucket: Allows bursts up to bucket size, then steady rate. Accommodates traffic spikes.
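For comparison with the leaky bucket below, here is a minimal in-memory token bucket sketch; the class name and parameters are illustrative, and the Redis-backed sender limiter earlier in this guide applies the same idea per endpoint.
import time

class TokenBucket:
    """
    Minimal in-memory token bucket: bursts up to `capacity`,
    then a steady `rate` of requests per second.
    """
    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # token refill rate (tokens/second)
        self.capacity = capacity  # maximum burst size
        self.tokens = float(capacity)
        self.last_refill = time.time()

    def allow_request(self) -> bool:
        now = time.time()
        # Refill tokens based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now

        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False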
Leaky Bucket: Constant processing rate, queues excess. Ensures consistent throughput.
class LeakyBucket:
    """
    Leaky bucket for consistent webhook processing rate.
    """
    def __init__(self, rate: float, capacity: int):
        self.rate = rate  # requests per second
        self.capacity = capacity
        self.water = 0
        self.last_leak = time.time()

    def allow_request(self) -> bool:
        now = time.time()

        # Leak water based on elapsed time
        elapsed = now - self.last_leak
        self.water = max(0, self.water - elapsed * self.rate)
        self.last_leak = now

        if self.water < self.capacity:
            self.water += 1
            return True
        return False

Queue-Based Processing
Decouple ingestion from processing: accept webhooks quickly, then process them at a controlled rate.
from celery import Celery
from flask import Flask, request

flask_app = Flask(__name__)
celery_app = Celery('webhooks')

@flask_app.route('/webhooks', methods=['POST'])
def receive_webhook():
    payload = request.json

    # Immediately queue for async processing
    process_webhook.apply_async(
        args=[payload],
        queue='webhooks'
    )

    # Return 202 Accepted immediately
    return {'status': 'accepted'}, 202

# rate_limit uses Celery's built-in per-worker task rate limiting
@celery_app.task(bind=True, max_retries=3, rate_limit='100/m')
def process_webhook(self, payload):
    try:
        handle_webhook_event(payload)  # application-specific handler
    except ProcessingError as e:
        raise self.retry(countdown=60)

Best Practices for Production Systems
Graceful degradation: Under pressure, prioritize stability. Use circuit breakers to reject excess load rather than crash.
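As a sketch of that idea, the snippet below shows a simple circuit breaker that rejects work while a downstream dependency is unhealthy; the class name and thresholds are illustrative, not a prescribed implementation (see the circuit breaker post below for a fuller treatment).
import time

class CircuitBreaker:
    """Illustrative circuit breaker: reject fast while a dependency is unhealthy."""
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (healthy)

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cooldown, allow a trial request (half-open)
        return time.time() - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.time()
Before processing an event, check allow(); call record_success() or record_failure() afterward so the breaker opens under sustained failures and retries only after the cooldown.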
Burst handling: Configure limits with burst allowances. A limit of 10 req/sec with a burst of 50 absorbs normal traffic spikes while still protecting against sustained overload.
Customer-specific limits: Enterprise customers can often handle higher rates than smaller ones. Make limits configurable per endpoint.
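For example, with the Redis-backed limiter from earlier (which reads endpoint_config:{endpoint_id} with rps and burst fields), a per-customer override might be stored like this; the endpoint ID and values are illustrative.
import redis

r = redis.Redis()

# Give a high-volume endpoint a larger allowance than the default
# of 10 req/sec with a burst of 20 (values are illustrative).
r.hset("endpoint_config:acme-prod", mapping={"rps": 100, "burst": 500})

limiter = WebhookRateLimiter(r)
allowed, wait = limiter.can_send("acme-prod")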
Capacity planning: Anticipate spikes and pre-scale ahead of product launches. See the related post From 0 to 10K Webhooks: Scaling Your First Implementation.
Conclusion
Effective rate limiting requires work on both sides: senders protect receivers and handle backpressure gracefully; receivers defend their endpoints and give senders clear feedback.
Rate limiting works alongside other webhook reliability patterns: retries, circuit breakers, and observability. Together, these fundamentals let you build systems that scale reliably.
Related Posts
Circuit Breakers for Webhooks: Protecting Your Infrastructure
Learn how to implement the circuit breaker pattern for webhook delivery to prevent cascading failures, handle failing endpoints gracefully, and protect your infrastructure from retry storms.
Webhook Retry Strategies: Linear vs Exponential Backoff
A technical deep-dive into webhook retry strategies, comparing linear and exponential backoff approaches, with code examples and best practices for building reliable webhook delivery systems.
From 0 to 10K Webhooks: Scaling Your First Implementation
A practical guide for startups on how to scale webhooks from your first implementation to handling 10,000+ events per hour. Learn what breaks at each growth phase and how to fix it before your customers notice.
Build vs Buy: Should You Build Webhook Infrastructure In-House?
A practical guide for engineering teams deciding whether to build webhook delivery infrastructure from scratch or use a managed service. Covers engineering costs, timelines, and when each approach makes sense.
Webhook Observability: Logging, Metrics, and Distributed Tracing
A comprehensive technical guide to implementing observability for webhook systems. Learn about structured logging, key metrics to track, distributed tracing with OpenTelemetry, and alerting best practices.