Circuit Breaker

Hook Mesh's circuit breaker automatically detects consistently failing endpoints and temporarily pauses deliveries to prevent wasted retry attempts. When the endpoint recovers, deliveries automatically resume.

How It Works

The circuit breaker monitors endpoint health and takes action when it detects persistent failures. This prevents overwhelming already-failing endpoints and conserves retry budget for transient failures.

Circuit States

CLOSED (Normal)
Endpoint is healthy. All webhook jobs are delivered normally with standard retry logic.
OPEN (Failing)
Endpoint is consistently failing. New webhook deliveries are paused. A test delivery is attempted every 5 minutes.
HALF-OPEN (Testing)
Circuit is testing if the endpoint has recovered. One test delivery is sent to check health.

When the Circuit Opens

The circuit opens (deliveries pause) when Hook Mesh detects consistent failure patterns indicating the endpoint is down or misconfigured.

Trigger Condition    | Threshold            | Window
---------------------|----------------------|------------------------
Consecutive Failures | 5+ failed deliveries | In a row (no successes)
Failure Rate         | ≥50% failures        | Last 10 deliveries

Failure Definition

A delivery counts as "failed" if the endpoint returns a 5xx error, the request times out, or the connection fails. 4xx errors do NOT trigger the circuit breaker, since they indicate client-side issues rather than endpoint downtime. The one exception is 429, which does count as a failure.
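
The failure rule above can be written as a small predicate. This is a minimal sketch, not Hook Mesh's implementation: the function name is hypothetical, and using `None` to stand for "no HTTP response arrived" (timeout or connection error) is an assumption made for illustration. The document does not say how 3xx responses are treated, so the sketch treats them as non-failing.

```python
from typing import Optional


def counts_toward_circuit(status_code: Optional[int]) -> bool:
    """Return True if a delivery outcome counts as a circuit breaker failure.

    `status_code` is None when no HTTP response arrived (timeout or
    connection error). That convention is an assumption of this sketch.
    """
    if status_code is None:
        return True   # timeout or connection error
    if status_code >= 500:
        return True   # 5xx: server-side failure
    if status_code == 429:
        return True   # the one 4xx code that counts
    # 2xx and other 4xx do not count. 3xx handling is not specified in
    # this document and is treated as non-failing here.
    return False
```

For example, `counts_toward_circuit(503)` is true, while `counts_toward_circuit(404)` is false.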

Example: Circuit Opens After 5 Failures
Delivery 1: 500 Internal Server Error → Failed
Delivery 2: 503 Service Unavailable → Failed
Delivery 3: Timeout (30s) → Failed
Delivery 4: Connection Refused → Failed
Delivery 5: 500 Internal Server Error → Failed

→ Circuit Opens: Endpoint paused
→ Status changed to "paused" (circuit_breaker_open)
→ Customer notified via webhook (endpoint.circuit_breaker_opened)
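
The two trigger conditions can be checked against a delivery history as sketched below. The function and constant names are hypothetical. The sketch also assumes the failure rate is only evaluated once a full window of 10 deliveries exists, which this document does not state.

```python
from typing import Sequence

CONSECUTIVE_THRESHOLD = 5   # "5+ failed deliveries" in a row
RATE_WINDOW = 10            # "last 10 deliveries"
RATE_THRESHOLD = 0.5        # "≥50% failures"


def should_open(history: Sequence[bool]) -> bool:
    """Decide whether the circuit should open.

    `history` lists delivery outcomes oldest-first; True means the
    delivery failed.
    """
    # Condition 1: count the unbroken run of failures at the end.
    streak = 0
    for failed in reversed(history):
        if not failed:
            break
        streak += 1
    if streak >= CONSECUTIVE_THRESHOLD:
        return True

    # Condition 2: failure rate over the most recent window.
    # Assumption: the rate is only evaluated once a full window exists.
    window = history[-RATE_WINDOW:]
    if len(window) < RATE_WINDOW:
        return False
    return sum(window) / RATE_WINDOW >= RATE_THRESHOLD
```

With the five failed deliveries from the example above, `should_open([True] * 5)` returns `True` through the first condition. An endpoint that alternates between success and failure across 10 deliveries opens the circuit through the second.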

Circuit Lifecycle

When the circuit opens, Hook Mesh automatically tests the endpoint periodically to detect when it recovers.

Lifecycle Diagram
                    ┌──────────────┐
                    │              │
          ┌────────▶│    CLOSED    │◀────────┐
          │         │   (Healthy)  │         │
          │         │              │         │
          │         └───────┬──────┘         │
          │                 │                │
          │      5+ failures│                │ Test succeeds
          │                 ▼                │
          │         ┌──────────────┐         │
          │         │     OPEN     │         │
          │         │   (Paused)   │─────────┤
          │         │              │         │
          │         └──────┬───────┘         │
          │                │                 │
          │   After 5 min  │                 │
          │                ▼                 │
          │         ┌──────────────┐         │
          │         │  HALF-OPEN   │         │
          └─────────│  (Testing)   │─────────┘
                    │              │
                    └──────────────┘

State Details:
• CLOSED: Normal delivery with retries
• OPEN: All new jobs queued, test every 5 minutes
• HALF-OPEN: Send one test delivery
  ├─ Success → Return to CLOSED
  └─ Failure → Return to OPEN (wait 5 min)
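
The lifecycle above can be modeled as a small state machine. This is an illustrative sketch under stated assumptions, not Hook Mesh's code: the class and method names are hypothetical, time is passed in as plain seconds, and only the consecutive-failure trigger shown in the diagram is modeled.

```python
from dataclasses import dataclass
from enum import Enum

TEST_INTERVAL_SECONDS = 5 * 60  # OPEN waits 5 minutes before testing
FAILURE_THRESHOLD = 5           # consecutive failures that open the circuit


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


@dataclass
class CircuitBreaker:
    state: State = State.CLOSED
    consecutive_failures: int = 0
    opened_at: float = 0.0

    def record_delivery(self, failed: bool, now: float) -> None:
        """Record a normal delivery result while CLOSED."""
        if self.state is not State.CLOSED:
            return
        if failed:
            self.consecutive_failures += 1
        else:
            self.consecutive_failures = 0
        if self.consecutive_failures >= FAILURE_THRESHOLD:
            self.state = State.OPEN
            self.opened_at = now

    def tick(self, now: float) -> None:
        """Move OPEN to HALF_OPEN once the test interval has elapsed."""
        if self.state is State.OPEN and now - self.opened_at >= TEST_INTERVAL_SECONDS:
            self.state = State.HALF_OPEN

    def record_test(self, succeeded: bool, now: float) -> None:
        """Apply the result of the single HALF_OPEN test delivery."""
        if self.state is not State.HALF_OPEN:
            return
        if succeeded:
            self.state = State.CLOSED
            self.consecutive_failures = 0
        else:
            self.state = State.OPEN  # wait another 5 minutes
            self.opened_at = now
```

Passing `now` explicitly rather than reading the clock keeps the transitions deterministic, which makes the 5-minute wait easy to exercise in tests.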

Recovery Process

1. Circuit Opens
   After 5 consecutive failures, the circuit opens. Endpoint status changes to paused.

2. Jobs Queue Up
   New webhook jobs for this endpoint are queued (not delivered) while the circuit is open.

3. Test After 5 Minutes
   Hook Mesh waits 5 minutes, then sends a test delivery to check if the endpoint has recovered.

4. Test Result
   If the test succeeds (2xx response), the circuit closes and queued jobs are delivered. If it fails, wait another 5 minutes and retry.

What Happens to Webhook Jobs

When the circuit is open, webhook jobs are not discarded right away. They are queued and delivered once the endpoint recovers, as long as that happens within the 48-hour delivery window.

Scenario                           | Behavior
-----------------------------------|---------
Jobs created before circuit opened | Continue retry schedule (will eventually be discarded after 48h if endpoint doesn't recover)
Jobs created while circuit is open | Queued with status pending. Delivered when circuit closes.
Circuit closes (endpoint recovers) | All queued jobs are delivered immediately (with rate limiting to avoid overwhelming the endpoint)

48-Hour Delivery Window

Jobs are still subject to the 48-hour delivery window. If an endpoint is down for more than 48 hours, jobs created during that time will be discarded after 48 hours even if still queued.
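
As a worked example of the window arithmetic, the sketch below computes when a queued job would be discarded. It assumes the 48 hours are measured from the job's creation time and that a job expires exactly at the 48-hour mark; both are assumptions, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DELIVERY_WINDOW = timedelta(hours=48)


def expires_at(created_at: datetime) -> datetime:
    """Return the moment a job leaves the delivery window.

    Assumption: the window starts at job creation.
    """
    return created_at + DELIVERY_WINDOW


def is_expired(created_at: datetime, now: datetime) -> bool:
    """Return True once the job has reached the end of its window."""
    return now >= expires_at(created_at)


# A job created at 12:00 UTC on 1 January 2024 would be discarded at
# 12:00 UTC on 3 January 2024 if the circuit is still open by then.
created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
deadline = expires_at(created)
```

Under these assumptions, an outage longer than 48 hours loses the oldest queued jobs first, so a manual reset soon after a fix reduces the number of jobs at risk.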

Manual Circuit Breaker Reset

If you know an endpoint has recovered (e.g., after a deployment), you can manually close the circuit via the API instead of waiting for the automatic test.

Node.js - Reset Circuit Breaker
import fetch from 'node-fetch';

const endpointId = 'ep_abc123';

const response = await fetch(
  `https://api.hookmesh.com/v1/endpoints/${endpointId}/circuit-breaker/reset`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.HOOKMESH_API_KEY}`
    }
  }
);

if (response.ok) {
  console.log('Circuit breaker reset - deliveries will resume');
} else {
  // Include the HTTP status so a failed reset can be diagnosed
  console.error(`Failed to reset circuit breaker (HTTP ${response.status})`);
}

Python - Reset Circuit Breaker
import os

import requests

endpoint_id = 'ep_abc123'

response = requests.post(
    f'https://api.hookmesh.com/v1/endpoints/{endpoint_id}/circuit-breaker/reset',
    headers={'Authorization': f'Bearer {os.environ["HOOKMESH_API_KEY"]}'},
    timeout=10,  # avoid hanging indefinitely if the API is unreachable
)

if response.ok:
    print('Circuit breaker reset - deliveries will resume')
else:
    # Include the HTTP status so a failed reset can be diagnosed
    print(f'Failed to reset circuit breaker (HTTP {response.status_code})')

When to Reset Manually

Manual reset is useful after deployments, configuration fixes, or when you've verified the endpoint is healthy. The circuit will immediately close and queued jobs will be delivered.

Monitoring Circuit State

Track circuit breaker state via the Endpoints API and webhook events.

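Node.js - Check Endpoint Status
const endpointId = 'ep_abc123';

const response = await fetch(
  `https://api.hookmesh.com/v1/endpoints/${endpointId}`,
  {
    headers: {
      'Authorization': `Bearer ${process.env.HOOKMESH_API_KEY}`
    }
  }
);

// Stop early if the API call itself failed
if (!response.ok) {
  throw new Error(`Failed to fetch endpoint (HTTP ${response.status})`);
}

const endpoint = await response.json();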

console.log('Status:', endpoint.status);
console.log('Circuit state:', endpoint.circuit_breaker_state);
console.log('Last failure:', endpoint.last_failure_at);
console.log('Consecutive failures:', endpoint.consecutive_failures);

// Alert if circuit is open
if (endpoint.circuit_breaker_state === 'open') {
  console.warn(`⚠ Endpoint ${endpoint.url} has open circuit!`);
  console.warn(`Failing since: ${endpoint.last_failure_at}`);
}
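
A Python counterpart to the alert check could look like the sketch below. It works on the already-parsed JSON body of the endpoint response, so fetching it is left to the caller. The function name is hypothetical; the field names are the ones used in the Node.js example above.

```python
from typing import Optional


def circuit_alert(endpoint: dict) -> Optional[str]:
    """Return an alert message if the endpoint's circuit is open, else None.

    `endpoint` is the parsed JSON body of GET /v1/endpoints/{id}.
    """
    if endpoint.get("circuit_breaker_state") != "open":
        return None
    return (
        f"Endpoint {endpoint.get('url')} has an open circuit "
        f"(consecutive failures: {endpoint.get('consecutive_failures')}, "
        f"last failure: {endpoint.get('last_failure_at')})"
    )
```

Returning `None` for a healthy endpoint lets a monitoring job call this for every endpoint and forward only the non-empty results.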

Webhook Events

Hook Mesh can send webhook events to notify you of circuit state changes.

Event                           | Description
--------------------------------|------------
endpoint.circuit_breaker_opened | Circuit opened due to failures. Deliveries paused.
endpoint.circuit_breaker_closed | Circuit closed after successful test. Deliveries resumed.
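
A receiver for these events might route them as sketched below. The event names come from the table above. The payload shape (`type` and `endpoint_id` keys) is hypothetical, since this page does not document it, so check the actual event schema before relying on it.

```python
def handle_circuit_event(event: dict) -> str:
    """Map a circuit breaker event to a message for an alerting system.

    Assumed (hypothetical) payload shape:
    {"type": "<event name>", "endpoint_id": "<endpoint id>"}
    """
    event_type = event.get("type")
    endpoint_id = event.get("endpoint_id", "unknown")
    if event_type == "endpoint.circuit_breaker_opened":
        return f"ALERT: deliveries paused for {endpoint_id}"
    if event_type == "endpoint.circuit_breaker_closed":
        return f"RESOLVED: deliveries resumed for {endpoint_id}"
    return "ignored"  # not a circuit breaker event
```

Pairing the opened and closed events this way lets an alerting system resolve an incident automatically when the endpoint recovers.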

Benefits

  • Prevents wasted retries - Retry attempts aren't spent on endpoints that are clearly down
  • Protects failing endpoints - Avoids overwhelming endpoints during outages or deployments
  • Automatic recovery - Deliveries resume when the endpoint recovers, with no manual intervention
  • Preserves delivery guarantees - Jobs are queued rather than discarded until the endpoint is healthy (within the 48-hour delivery window)
  • Improves overall reliability - The system adapts to endpoint health on its own

Best Practices

  • Monitor circuit state - Set up alerts for endpoint.circuit_breaker_opened events
  • Fix root causes quickly - Investigate why an endpoint is failing so the circuit breaker doesn't keep opening
  • Use manual reset after deployments - Don't wait 5 minutes if you know the endpoint is fixed
  • Return proper status codes - 5xx responses count toward the circuit breaker; 4xx responses (except 429) don't. Use the code that matches the error.
  • Test endpoint health - Use the test webhook endpoint to verify health before deploying changes
  • Implement graceful degradation - Handle webhook delivery failures gracefully in your application
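
To illustrate the status code practice, the sketch below shows how a receiving endpoint might pick its response. The function and parameter names are hypothetical, and the two checks stand in for whatever validation and dependency checks a real handler performs.

```python
def response_status(payload_valid: bool, dependency_available: bool) -> int:
    """Choose the HTTP status a webhook receiver should return."""
    if not payload_valid:
        # The request itself is the problem, so the endpoint is not down.
        # A 4xx response does not count toward the circuit breaker.
        return 400
    if not dependency_available:
        # A temporary outage on the receiver's side. A 5xx response
        # counts as a failed delivery and can open the circuit.
        return 503
    return 200
```

Returning 500 for a malformed payload would count as an endpoint failure and could pause deliveries for an endpoint that is actually healthy.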
