Circuit Breakers for Webhooks: Protecting Your Infrastructure
Learn how to implement the circuit breaker pattern for webhook delivery to prevent cascading failures, handle failing endpoints gracefully, and protect your infrastructure from retry storms.

Without safeguards, a single failing endpoint can trigger cascading retries that overwhelm your servers, delay healthy deliveries, and waste resources. The circuit breaker pattern contains this damage and is essential for resilient webhook infrastructure.
What Is a Circuit Breaker?
The circuit breaker pattern, borrowed from electrical engineering, prevents repeated attempts on likely-to-fail operations. When failure rates cross a threshold, the breaker "opens" to stop requests, giving the service time to recover while protecting infrastructure from wasted resources.
For webhooks, a circuit breaker monitors delivery attempts per endpoint. When failures mount, it opens to halt requests, then later attempts recovery with test probes.
Why You Need Circuit Breakers
Retry storms: When a customer's endpoint fails and receives hundreds of webhooks/hour, thousands of retries accumulate, delaying healthy endpoints.
Cascading failures: A timeout that locks workers slows the entire pipeline. Healthy endpoints wait minutes instead of milliseconds.
Resource exhaustion: Retries to doomed endpoints waste CPU, memory, network, and database operations without delivering value.
Circuit Breaker States
A circuit breaker operates in three states:
Closed: Default state. All requests pass through normally. The breaker monitors successes and failures.
Open: When failures exceed threshold, the breaker opens. Requests are rejected immediately without contacting the endpoint, giving it time to recover.
Half-Open: After timeout, the breaker allows test probes. If they succeed, return to closed. If they fail, reopen and wait.
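The three states and their transitions can be summarized as a small lookup table. The sketch below is illustrative only: the event names (`threshold_exceeded`, `timeout_elapsed`, `probes_succeeded`, `probe_failed`) and the `nextState` helper are assumptions chosen for this example, not part of any library.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Hypothetical event names, chosen for this sketch
type BreakerEvent =
  | 'threshold_exceeded'
  | 'timeout_elapsed'
  | 'probes_succeeded'
  | 'probe_failed';

// Transitions described above; any pair not listed leaves the state unchanged
const transitions: Record<BreakerState, Partial<Record<BreakerEvent, BreakerState>>> = {
  closed: { threshold_exceeded: 'open' },
  open: { timeout_elapsed: 'half-open' },
  'half-open': { probes_succeeded: 'closed', probe_failed: 'open' },
};

function nextState(state: BreakerState, event: BreakerEvent): BreakerState {
  return transitions[state][event] ?? state;
}
```

Keeping the transitions in one table makes it easy to check that no state can be left by an unexpected event, for example that a failed probe never moves a closed breaker.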
Implementing Circuit Breaker Logic
Here's a practical implementation of a webhook circuit breaker:
interface CircuitBreakerConfig {
  failureThreshold: number;   // Failures before opening
  successThreshold: number;   // Successes to close from half-open
  timeout: number;            // Ms before trying half-open
  errorRateThreshold: number; // Percentage (0-100)
  minimumRequests: number;    // Min requests before rate calculation
}
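// Thrown by execute() when the breaker rejects a delivery without attempting it
class CircuitOpenError extends Error {
  constructor(
    public readonly endpointId: string,
    public readonly resetTime: number // Epoch ms when a probe will next be allowed
  ) {
    super(`Circuit open for endpoint ${endpointId}`);
    this.name = 'CircuitOpenError';
  }
}

class WebhookCircuitBreaker {
  // state and the transition methods are protected so subclasses can extend them
  protected state: 'closed' | 'open' | 'half-open' = 'closed';
  private failures: number = 0;
  private successes: number = 0;
  private lastFailureTime: number = 0;
  private halfOpenSuccesses: number = 0;

  constructor(
    private endpointId: string,
    private config: CircuitBreakerConfig
  ) {}

  async execute(deliveryFn: () => Promise<void>): Promise<void> {
    if (!this.canExecute()) {
      throw new CircuitOpenError(this.endpointId, this.getResetTime());
    }
    try {
      await deliveryFn();
      this.onSuccess();
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  // Earliest time (epoch ms) at which an open breaker will allow a probe
  getResetTime(): number {
    return this.lastFailureTime + this.config.timeout;
  }

  protected canExecute(): boolean {
    switch (this.state) {
      case 'closed':
        return true;
      case 'open':
        // After the timeout, let one request through as a recovery probe
        if (Date.now() - this.lastFailureTime >= this.config.timeout) {
          this.transitionTo('half-open');
          return true;
        }
        return false;
      case 'half-open':
        return this.canExecuteInHalfOpen();
    }
  }

  // Hook: subclasses can throttle traffic while half-open
  protected canExecuteInHalfOpen(): boolean {
    return true;
  }

  protected onSuccess(): void {
    if (this.state === 'half-open') {
      this.onHalfOpenSuccess();
    } else {
      this.successes++;
      // Successes decay the failure count instead of resetting it
      this.failures = Math.max(0, this.failures - 1);
    }
  }

  // Hook: close the breaker once enough probes have succeeded
  protected onHalfOpenSuccess(): void {
    this.halfOpenSuccesses++;
    if (this.halfOpenSuccesses >= this.config.successThreshold) {
      this.transitionTo('closed');
    }
  }

  protected onFailure(): void {
    this.failures++;
    this.lastFailureTime = Date.now();
    if (this.state === 'half-open') {
      // Any failed probe reopens the breaker immediately
      this.transitionTo('open');
    } else if (this.shouldTrip()) {
      this.transitionTo('open');
    }
  }

  private shouldTrip(): boolean {
    // Trigger 1: absolute failure count
    if (this.failures >= this.config.failureThreshold) {
      return true;
    }
    // Trigger 2: error rate, evaluated only once enough requests have been seen
    const total = this.successes + this.failures;
    if (total >= this.config.minimumRequests) {
      const errorRate = (this.failures / total) * 100;
      return errorRate >= this.config.errorRateThreshold;
    }
    return false;
  }

  protected transitionTo(newState: 'closed' | 'open' | 'half-open'): void {
    this.state = newState;
    if (newState === 'closed') {
      this.failures = 0;
      this.successes = 0;
      this.halfOpenSuccesses = 0;
    } else if (newState === 'half-open') {
      this.halfOpenSuccesses = 0;
    }
  }
}
When to Trip the Circuit Breaker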
Balance sensitivity against stability. Trip too eagerly and transient issues disrupt delivery; trip too slowly and a failing endpoint puts your infrastructure at risk.
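One way to keep a breaker responsive without reacting to old history is to evaluate failures over a recent time window rather than over the breaker's whole lifetime. The sketch below is an assumption for illustration, not part of the implementation above; `RollingErrorRate` and `windowMs` are hypothetical names, and it assumes outcomes are recorded in time order.

```typescript
// Hypothetical helper: tracks delivery outcomes within the last `windowMs` milliseconds
class RollingErrorRate {
  private outcomes: { at: number; failed: boolean }[] = [];

  constructor(private windowMs: number) {}

  record(failed: boolean, now: number = Date.now()): void {
    this.outcomes.push({ at: now, failed });
    this.prune(now);
  }

  // Error rate as a percentage (0-100), or null when the window holds
  // fewer than `minimumRequests` samples
  errorRate(minimumRequests: number, now: number = Date.now()): number | null {
    this.prune(now);
    if (this.outcomes.length < minimumRequests) return null;
    const failed = this.outcomes.filter((o) => o.failed).length;
    return (failed / this.outcomes.length) * 100;
  }

  // Drop samples older than the window
  private prune(now: number): void {
    const cutoff = now - this.windowMs;
    while (this.outcomes.length > 0 && this.outcomes[0].at < cutoff) {
      this.outcomes.shift();
    }
  }
}
```

A shorter window trips faster but is more easily swayed by a brief burst of errors; a longer window is steadier but slower to notice a genuine outage.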
Consecutive failures: Trip after N failures (works well for rarely-failing endpoints). Because the implementation above decrements the failure count by one on each success rather than resetting it, the threshold effectively counts failures net of recent successes:
failureThreshold: 5 // Trip after 5 net failures
Error rate thresholds: For high-volume endpoints, percentage-based triggers work better, since an endpoint receiving 1,000 webhooks/min can be expected to fail occasionally:
errorRateThreshold: 50 // Trip when 50% fail
minimumRequests: 20 // Calculate rate after 20 requests
Timeout handling: Weight timeouts more heavily than other failures, because they consume more resources:
const weight = isTimeout ? 3 : 1;
this.failures += weight;
Recovery Strategies
How your circuit breaker recovers matters as much as how it trips. Poor recovery logic can cause oscillation between open and closed states, creating unpredictable delivery behavior.
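One common way to dampen this kind of oscillation is to lengthen the open period each time a recovery attempt fails. The sketch below is an assumption layered on top of the `timeout` setting, not part of the implementation above; `nextOpenTimeout`, `baseTimeoutMs`, `consecutiveReopens`, and `maxTimeoutMs` are hypothetical names.

```typescript
// Hypothetical helper: the open-state wait grows with each consecutive failed recovery
function nextOpenTimeout(
  baseTimeoutMs: number,
  consecutiveReopens: number,
  maxTimeoutMs: number
): number {
  // Double the wait per failed recovery attempt, capped at maxTimeoutMs
  const scaled = baseTimeoutMs * 2 ** consecutiveReopens;
  return Math.min(scaled, maxTimeoutMs);
}
```

With a 30-second base and a 10-minute cap, the first open period lasts 30 seconds, the fourth lasts 4 minutes, and later ones stay at the cap. Resetting `consecutiveReopens` to zero once the breaker closes returns the endpoint to the base timeout.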
Gradual Recovery
Rather than immediately returning to full traffic after successful probes, gradually increase the load:
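// Assumes the base class exposes canExecuteInHalfOpen, onHalfOpenSuccess,
// and transitionTo as protected hooks
class GradualRecoveryBreaker extends WebhookCircuitBreaker {
  // Share of traffic (0-100) allowed through while half-open
  private recoveryPercentage: number = 0;

  protected canExecuteInHalfOpen(): boolean {
    // Gradually allow more traffic through
    return Math.random() * 100 < this.recoveryPercentage;
  }

  protected onHalfOpenSuccess(): void {
    // Each successful delivery widens the allowed share by 10 points
    this.recoveryPercentage = Math.min(100, this.recoveryPercentage + 10);
    if (this.recoveryPercentage >= 100) {
      this.transitionTo('closed');
    }
  }

  protected transitionTo(newState: 'closed' | 'open' | 'half-open'): void {
    super.transitionTo(newState);
    // Restart the ramp on every new recovery attempt
    if (newState === 'half-open') this.recoveryPercentage = 0;
  }
}
Health Check Probes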
Instead of using real webhook deliveries as probes, implement dedicated health checks. This prevents customer webhooks from being lost during recovery testing:
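// Assumed minimal shape of an endpoint record for this example
interface Endpoint {
  url: string;
  healthCheckUrl?: string;
}

async function probeEndpointHealth(endpoint: Endpoint): Promise<boolean> {
  try {
    const response = await fetch(endpoint.healthCheckUrl || endpoint.url, {
      method: 'HEAD',
      // fetch has no `timeout` option; abort the request after 5 seconds instead
      signal: AbortSignal.timeout(5000),
    });
    return response.ok;
  } catch {
    // Network errors and timeouts both count as unhealthy
    return false;
  }
}
Manual Reset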
Sometimes automated recovery isn't appropriate. Provide operators with manual control for situations requiring human judgment:
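// Assumes the base class exposes canExecute and transitionTo as protected
class ManualResetBreaker extends WebhookCircuitBreaker {
  private manuallyOpened: boolean = false;

  manualOpen(reason: string): void {
    this.manuallyOpened = true;
    this.transitionTo('open');
    this.logManualAction('open', reason);
  }

  manualClose(reason: string): void {
    this.manuallyOpened = false;
    this.transitionTo('closed');
    this.logManualAction('close', reason);
  }

  protected canExecute(): boolean {
    // A manual open overrides automatic recovery until an operator closes it
    if (this.manuallyOpened) return false;
    return super.canExecute();
  }

  // Placeholder audit hook; replace with your own logging or audit system
  private logManualAction(action: 'open' | 'close', reason: string): void {
    console.info(`[circuit-breaker] manual ${action}: ${reason}`);
  }
}
Endpoint Health Tracking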
Circuit breakers need persistent state across delivery infrastructure. Store breaker state in a shared data store (Redis):
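interface EndpointHealth {
  endpointId: string;
  state: 'closed' | 'open' | 'half-open';
  failureCount: number;
  lastFailure: Date | null;
}

// Store in Redis for fast access; hash fields must be flat strings or numbers,
// so the Date is serialized and null becomes an empty string
await redis.hset(`circuit:${health.endpointId}`, {
  endpointId: health.endpointId,
  state: health.state,
  failureCount: health.failureCount,
  lastFailure: health.lastFailure?.toISOString() ?? '',
});
See webhook observability for visibility into breaker states.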
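Redis hashes hold flat string fields, so a record with a `Date | null` field needs explicit conversion in both directions. The sketch below shows one possible mapping that can be checked without a Redis connection; `toHashFields` and `fromHashFields` are hypothetical helper names, and the field layout (including the empty string standing in for `null`) is an assumption rather than a required schema.

```typescript
interface EndpointHealth {
  endpointId: string;
  state: 'closed' | 'open' | 'half-open';
  failureCount: number;
  lastFailure: Date | null;
}

// Flatten to string fields suitable for a hash; an empty string stands in for null
function toHashFields(health: EndpointHealth): Record<string, string> {
  return {
    endpointId: health.endpointId,
    state: health.state,
    failureCount: String(health.failureCount),
    lastFailure: health.lastFailure ? health.lastFailure.toISOString() : '',
  };
}

// Rebuild the typed record from the string fields read back from the hash
function fromHashFields(fields: Record<string, string>): EndpointHealth {
  const state = fields.state;
  if (state !== 'closed' && state !== 'open' && state !== 'half-open') {
    throw new Error(`Unknown breaker state: ${state}`);
  }
  return {
    endpointId: fields.endpointId,
    state,
    failureCount: Number(fields.failureCount),
    lastFailure: fields.lastFailure ? new Date(fields.lastFailure) : null,
  };
}
```

Validating the state on the way back in means a corrupted or outdated hash fails loudly instead of producing a breaker in an undefined state.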
Conclusion
Circuit breakers transform webhook delivery from fragile to resilient infrastructure. Combined with retry strategies, rate limiting, and dead letter queues, they form a pillar of webhook reliability.
Proper implementation requires attention to failure detection, state management, and recovery logic. Whether you build your own or use a managed solution, this pattern is essential for production systems.