Circuit Breakers for Webhooks: Protecting Your Infrastructure

Without safeguards, a single failing endpoint can trigger cascading retries that overwhelm your servers, delay deliveries to healthy endpoints, and waste resources. The circuit breaker pattern addresses this by cutting off traffic to endpoints that keep failing, which makes it a core building block of resilient webhook infrastructure.

What Is a Circuit Breaker?

The circuit breaker pattern, borrowed from electrical engineering, prevents repeated attempts on likely-to-fail operations. When failure rates cross a threshold, the breaker "opens" to stop requests, giving the service time to recover while protecting infrastructure from wasted resources.

For webhooks, a circuit breaker monitors delivery attempts per endpoint. When failures mount, it opens to halt requests, then later attempts recovery with test probes.
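
Tracking state per endpoint usually means keeping one breaker instance per endpoint ID. A minimal sketch of such a lookup follows; the BreakerRegistry name and its factory callback are illustrative assumptions, not part of any particular library.

```typescript
// Hypothetical registry: lazily creates one breaker per endpoint ID.
// B is whatever breaker type you use; the factory is supplied by the caller.
class BreakerRegistry<B> {
  private readonly breakers = new Map<string, B>();
  private readonly create: (endpointId: string) => B;

  constructor(create: (endpointId: string) => B) {
    this.create = create;
  }

  // Returns the existing breaker for this endpoint, creating it on first use.
  get(endpointId: string): B {
    let breaker = this.breakers.get(endpointId);
    if (breaker === undefined) {
      breaker = this.create(endpointId);
      this.breakers.set(endpointId, breaker);
    }
    return breaker;
  }

  // Number of endpoints currently tracked.
  size(): number {
    return this.breakers.size;
  }
}
```

An in-memory map like this only covers a single process. With several delivery workers, the state needs shared storage, as discussed under Endpoint Health Tracking.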

Why You Need Circuit Breakers

Retry storms: When an endpoint that receives hundreds of webhooks per hour starts failing, retries pile up into the thousands and delay deliveries to healthy endpoints.

Cascading failures: A timeout that locks workers slows the entire pipeline. Healthy endpoints wait minutes instead of milliseconds.

Resource exhaustion: Retries to doomed endpoints waste CPU, memory, network, and database operations without delivering value.
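
To put rough numbers on the retry storm, the arithmetic below assumes every webhook sent to the failing endpoint fails and is retried a fixed number of times within the window. The function name and the figures (300 webhooks per hour, 5 retries each) are illustrative assumptions.

```typescript
// Hypothetical helper: counts delivery attempts aimed at one failing endpoint,
// assuming every webhook fails and is retried maxRetries times within the window.
function attemptsAgainstFailingEndpoint(
  webhooksPerHour: number,
  maxRetries: number,
  hours: number
): { originals: number; retries: number; total: number } {
  const originals = webhooksPerHour * hours;
  const retries = originals * maxRetries;
  return { originals, retries, total: originals + retries };
}

const storm = attemptsAgainstFailingEndpoint(300, 5, 1);
console.log(storm); // { originals: 300, retries: 1500, total: 1800 }
```

Under these assumptions, one hour of failures produces 1,800 attempts, none of which delivers anything.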

Circuit Breaker States

A circuit breaker operates in three states:

Closed: Default state. All requests pass through normally. The breaker monitors successes and failures.

Open: When failures exceed the configured threshold, the breaker opens. Requests are rejected immediately without contacting the endpoint, giving it time to recover.

Half-Open: After the timeout elapses, the breaker allows test probes through. If they succeed, the breaker returns to closed. If they fail, it reopens and waits again.
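
The three states and their transitions can be sketched as a small pure function. This is a simplification for illustration: the event names are hypothetical, and a single successful probe closes the breaker here, whereas a real breaker may require several.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Hypothetical event names, chosen for this sketch only.
type BreakerEvent =
  | 'threshold-exceeded'
  | 'timeout-elapsed'
  | 'probe-succeeded'
  | 'probe-failed';

// Pure transition table: given the current state and an event, return the next state.
// Events that do not apply to the current state leave it unchanged.
function nextState(state: BreakerState, event: BreakerEvent): BreakerState {
  switch (state) {
    case 'closed':
      return event === 'threshold-exceeded' ? 'open' : 'closed';
    case 'open':
      return event === 'timeout-elapsed' ? 'half-open' : 'open';
    case 'half-open':
      if (event === 'probe-succeeded') return 'closed';
      if (event === 'probe-failed') return 'open';
      return 'half-open';
  }
}
```

Keeping the transitions in one pure function makes them easy to test separately from timers and network calls.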

Implementing Circuit Breaker Logic

Here's a practical implementation of a webhook circuit breaker:

interface CircuitBreakerConfig {
  failureThreshold: number;      // Failures before opening
  successThreshold: number;      // Successes to close from half-open
  timeout: number;               // Ms before trying half-open
  errorRateThreshold: number;    // Percentage (0-100)
  minimumRequests: number;       // Min requests before rate calculation
}

class WebhookCircuitBreaker {
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  private failures: number = 0;
  private successes: number = 0;
  private lastFailureTime: number = 0;
  private halfOpenSuccesses: number = 0;

  constructor(
    private endpointId: string,
    private config: CircuitBreakerConfig
  ) {}

  async execute(deliveryFn: () => Promise<void>): Promise<void> {
    if (!this.canExecute()) {
      throw new CircuitOpenError(this.endpointId, this.getResetTime());
    }

    try {
      await deliveryFn();
      this.onSuccess();
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  // Protected rather than private so subclasses (see Manual Reset) can override it.
  protected canExecute(): boolean {
    switch (this.state) {
      case 'closed':
        return true;
      case 'open':
        if (Date.now() - this.lastFailureTime >= this.config.timeout) {
          this.transitionTo('half-open');
          return true;
        }
        return false;
      case 'half-open':
        return true;
    }
  }

  private onSuccess(): void {
    if (this.state === 'half-open') {
      this.halfOpenSuccesses++;
      if (this.halfOpenSuccesses >= this.config.successThreshold) {
        this.transitionTo('closed');
      }
    } else {
      this.successes++;
      this.failures = Math.max(0, this.failures - 1);
    }
  }

  private onFailure(): void {
    this.failures++;
    this.lastFailureTime = Date.now();

    if (this.state === 'half-open') {
      this.transitionTo('open');
    } else if (this.shouldTrip()) {
      this.transitionTo('open');
    }
  }

  private shouldTrip(): boolean {
    if (this.failures >= this.config.failureThreshold) {
      return true;
    }

    const total = this.successes + this.failures;
    if (total >= this.config.minimumRequests) {
      const errorRate = (this.failures / total) * 100;
      return errorRate >= this.config.errorRateThreshold;
    }

    return false;
  }

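  // Protected so subclasses can force a state change (see Manual Reset).
  protected transitionTo(newState: 'closed' | 'open' | 'half-open'): void {
    this.state = newState;
    if (newState === 'closed') {
      this.failures = 0;
      this.successes = 0;
      this.halfOpenSuccesses = 0;
    } else if (newState === 'half-open') {
      this.halfOpenSuccesses = 0;
    }
  }

  // Earliest time (ms since epoch) at which an open breaker will allow a probe.
  private getResetTime(): number {
    return this.lastFailureTime + this.config.timeout;
  }
}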
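
The listing above throws CircuitOpenError without defining it. One possible shape is sketched below, together with how a delivery worker might react to it. The resetTime field and the decideRetry helper are assumptions made for illustration.

```typescript
// Assumed shape of the error thrown when the breaker rejects a request.
// The constructor arguments match the call in execute() above.
class CircuitOpenError extends Error {
  readonly endpointId: string;
  readonly resetTime: number; // ms since epoch when a probe may next be allowed

  constructor(endpointId: string, resetTime: number) {
    super(`Circuit open for endpoint ${endpointId}`);
    this.name = 'CircuitOpenError';
    this.endpointId = endpointId;
    this.resetTime = resetTime;
  }
}

type RetryDecision =
  | { action: 'requeue-at'; timestamp: number } // breaker open: endpoint was never contacted
  | { action: 'normal-retry' };                 // real delivery failure: use the usual retry policy

// Hypothetical helper a worker could call from its catch block.
function decideRetry(error: unknown): RetryDecision {
  if (error instanceof CircuitOpenError) {
    return { action: 'requeue-at', timestamp: error.resetTime };
  }
  return { action: 'normal-retry' };
}
```

Treating a rejected request differently from a failed delivery keeps breaker rejections from consuming the webhook's retry budget.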

When to Trip the Circuit Breaker

Balance sensitivity against stability. Trip too eagerly and transient issues disrupt delivery; trip too slowly and a failing endpoint puts your infrastructure at risk.

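Failure count: Trip after N failures (works well for endpoints that rarely fail). The implementation above decrements the count by one on each success rather than resetting it, so the failures need not be strictly consecutive:

failureThreshold: 5  // Trip once the failure count reaches 5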

Error rate thresholds: For high-volume endpoints, percentage-based triggers work better, because an endpoint receiving 1,000 webhooks/min will see occasional failures even when it is healthy:

errorRateThreshold: 50    // Trip when 50% fail
minimumRequests: 20       // Calculate rate after 20 requests
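
A rate computed over counters that never expire reacts slowly once an endpoint has a long history of successes. One common alternative is to compute the rate over a recent time window. A minimal sketch, assuming an in-memory list of outcomes; the class name and its parameters are hypothetical:

```typescript
// Hypothetical sliding-window tracker: only outcomes newer than windowMs count.
class SlidingWindowErrorRate {
  private outcomes: { at: number; failed: boolean }[] = [];
  private readonly windowMs: number;
  private readonly minimumRequests: number;

  constructor(windowMs: number, minimumRequests: number) {
    this.windowMs = windowMs;
    this.minimumRequests = minimumRequests;
  }

  // Record one delivery outcome; `now` is injectable to make testing easy.
  record(failed: boolean, now: number = Date.now()): void {
    this.outcomes.push({ at: now, failed });
    this.prune(now);
  }

  // Failure percentage (0-100) within the window, or null with too few samples.
  errorRate(now: number = Date.now()): number | null {
    this.prune(now);
    if (this.outcomes.length < this.minimumRequests) return null;
    const failures = this.outcomes.filter((o) => o.failed).length;
    return (failures / this.outcomes.length) * 100;
  }

  // Drop outcomes that have aged out of the window.
  private prune(now: number): void {
    const cutoff = now - this.windowMs;
    this.outcomes = this.outcomes.filter((o) => o.at >= cutoff);
  }
}
```

A breaker could compare the returned rate with errorRateThreshold and treat null as "not enough data to trip".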

Timeout handling: Weight timeouts more heavily than other failures, because a timed-out request holds a worker for the full timeout period:

const weight = isTimeout ? 3 : 1;
this.failures += weight;
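
To see what the weighting does in practice, the sketch below counts how many failures it takes to reach a threshold. The failure categories and helper names are assumptions for illustration; only the weight of 3 for timeouts comes from the snippet above.

```typescript
// Hypothetical failure categories for this sketch.
type FailureKind = 'timeout' | 'connection-error' | 'http-error';

const TIMEOUT_WEIGHT = 3; // same weight as in the snippet above

function failureWeight(kind: FailureKind): number {
  return kind === 'timeout' ? TIMEOUT_WEIGHT : 1;
}

// Returns how many failures from the sequence it takes for the weighted count
// to reach the threshold, or null if the sequence never reaches it.
function failuresBeforeTrip(kinds: FailureKind[], failureThreshold: number): number | null {
  let weighted = 0;
  let count = 0;
  for (const kind of kinds) {
    count++;
    weighted += failureWeight(kind);
    if (weighted >= failureThreshold) return count;
  }
  return null;
}
```

With a threshold of 5, two timeouts are enough to trip the breaker, while it takes five ordinary HTTP errors to do the same.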

Recovery Strategies

How your circuit breaker recovers matters as much as how it trips. Poor recovery logic can cause oscillation between open and closed states, creating unpredictable delivery behavior.
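
One common way to damp oscillation is to lengthen the open-state wait each time a recovery attempt fails, so an endpoint that keeps failing is probed less and less often. The helper below is a hypothetical sketch; the doubling factor and the cap are assumptions, not values taken from the implementation above.

```typescript
// Hypothetical helper: doubles the open-state wait after each consecutive
// failed recovery attempt, up to a fixed cap.
function openTimeoutMs(
  baseTimeoutMs: number,
  consecutiveReopens: number,
  maxTimeoutMs: number
): number {
  const scaled = baseTimeoutMs * Math.pow(2, consecutiveReopens);
  return Math.min(scaled, maxTimeoutMs);
}

// With a 30 s base and a 10 min cap:
console.log(openTimeoutMs(30000, 0, 600000)); // 30000
console.log(openTimeoutMs(30000, 3, 600000)); // 240000
console.log(openTimeoutMs(30000, 10, 600000)); // 600000
```

The counter of consecutive reopens would be reset whenever the breaker closes successfully.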

Gradual Recovery

Rather than immediately returning to full traffic after successful probes, gradually increase the load:

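class GradualRecoveryBreaker extends WebhookCircuitBreaker {
  // Start above zero: at 0% no request would ever pass, so recovery could never begin.
  private recoveryPercentage: number = 10;

  // Hook: assumes the base class consults this from canExecute() while half-open.
  protected canExecuteInHalfOpen(): boolean {
    // Admit a random share of traffic equal to the current recovery percentage
    return Math.random() * 100 < this.recoveryPercentage;
  }

  // Hook: assumes the base class calls this from onSuccess() while half-open
  // and exposes transitionTo() as protected.
  protected onHalfOpenSuccess(): void {
    this.recoveryPercentage = Math.min(100, this.recoveryPercentage + 10);
    if (this.recoveryPercentage >= 100) {
      this.recoveryPercentage = 10; // Reset for the next recovery cycle
      this.transitionTo('closed');
    }
  }
}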

Health Check Probes

Instead of using real webhook deliveries as probes, implement dedicated health checks. This prevents customer webhooks from being lost during recovery testing:

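// Minimal shape inferred from the usage below.
interface Endpoint {
  url: string;
  healthCheckUrl?: string;
}

async function probeEndpointHealth(endpoint: Endpoint): Promise<boolean> {
  try {
    const response = await fetch(endpoint.healthCheckUrl || endpoint.url, {
      method: 'HEAD',
      // fetch has no `timeout` option; abort the request after 5 seconds instead
      signal: AbortSignal.timeout(5000),
    });
    return response.ok;
  } catch {
    // Network errors and timeouts both count as unhealthy
    return false;
  }
}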

Manual Reset

Sometimes automated recovery isn't appropriate. Provide operators with manual control for situations requiring human judgment:

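class ManualResetBreaker extends WebhookCircuitBreaker {
  private manuallyOpened: boolean = false;

  manualOpen(reason: string): void {
    this.manuallyOpened = true;
    this.transitionTo('open');
    this.logManualAction('open', reason);
  }

  manualClose(reason: string): void {
    this.manuallyOpened = false;
    this.transitionTo('closed');
    this.logManualAction('close', reason);
  }

  // Requires canExecute() and transitionTo() to be protected in the base class.
  protected canExecute(): boolean {
    if (this.manuallyOpened) return false;
    return super.canExecute();
  }

  // Placeholder audit log; replace with your own logger.
  private logManualAction(action: 'open' | 'close', reason: string): void {
    console.log(`[circuit-breaker] manual ${action}: ${reason}`);
  }
}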

Endpoint Health Tracking

Circuit breaker state must be shared across your delivery workers; otherwise each worker trips and recovers on its own. Store breaker state in a shared data store such as Redis:

interface EndpointHealth {
  endpointId: string;
  state: 'closed' | 'open' | 'half-open';
  failureCount: number;
  lastFailure: Date | null;
}

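// Store in Redis for fast access (assumes an ioredis-style client).
// Hash fields are strings, so flatten the number and the Date before writing.
await redis.hset(`circuit:${health.endpointId}`, {
  endpointId: health.endpointId,
  state: health.state,
  failureCount: String(health.failureCount),
  lastFailure: health.lastFailure ? health.lastFailure.toISOString() : '',
});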
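
When the hash is read back, every field arrives as a string and has to be parsed and validated. A sketch of that step, assuming the date was stored as an ISO-8601 string and a missing date as an empty string; both conventions and the fromHash name are assumptions:

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

interface EndpointHealth {
  endpointId: string;
  state: BreakerState;
  failureCount: number;
  lastFailure: Date | null;
}

function isBreakerState(value: unknown): value is BreakerState {
  return value === 'closed' || value === 'open' || value === 'half-open';
}

// Parses the string fields of a stored hash back into EndpointHealth.
// Returns null when the hash is empty or holds an unknown state.
function fromHash(hash: Record<string, string | undefined>): EndpointHealth | null {
  const endpointId = hash['endpointId'];
  const state = hash['state'];
  if (!endpointId || !isBreakerState(state)) return null;

  const count = Number(hash['failureCount']);
  const lastFailure = hash['lastFailure'];
  return {
    endpointId,
    state,
    failureCount: isNaN(count) ? 0 : count, // fall back to 0 on a missing or malformed count
    lastFailure: lastFailure ? new Date(lastFailure) : null,
  };
}
```

Returning null for an empty or malformed hash lets the caller fall back to a fresh closed breaker instead of trusting corrupt state.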

See webhook observability for visibility into breaker states.

Conclusion

Circuit breakers transform webhook delivery from fragile to resilient infrastructure. Combined with retry strategies, rate limiting, and dead letter queues, they form a pillar of webhook reliability.

Proper implementation requires attention to failure detection, state management, and recovery logic. Whether you build your own or use a managed solution, this pattern is essential for production systems.

Related Posts

Webhook Retry Strategies: Linear vs Exponential Backoff

Webhook Rate Limiting: Strategies for Senders and Receivers

Dead Letter Queues for Failed Webhooks: A Complete Technical Guide

Webhook Observability: Logging, Metrics, and Distributed Tracing

From 0 to 10K Webhooks: Scaling Your First Implementation