From 0 to 10K Webhooks: Scaling Your First Implementation

Your MVP works. Then you land an enterprise customer or hit Hacker News—and everything breaks.

Scaling webhooks is predictable. The same problems break systems at the same volume thresholds. This guide maps that journey from first webhook to 10,000/hour, explaining what breaks at each phase and how to fix it before customers complain.

Phase 1: 0-100 Webhooks/Hour

Simplicity wins. You have a handful of customers, and the goal is shipping features.

Starting Architecture

Most teams start with synchronous delivery: when an event occurs, the app sends an HTTP POST and waits for the response:

async function sendWebhook(endpoint, payload) {
  try {
    const response = await fetch(endpoint.url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
      // fetch has no `timeout` option; abort the request after 30 seconds instead
      signal: AbortSignal.timeout(30000)
    });
    await logDelivery(endpoint, payload, response.status);
  } catch (error) {
    await logFailure(endpoint, payload, error);
  }
}

This approach is easy to understand and debug, and it adds no infrastructure overhead.

What Works

Synchronous delivery from main app
Simple database logging
Manual retry on customer reports
Basic error logging

Warning Signs

Watch for:

API response times increasing during webhook peaks
Customer complaints about delays/missing webhooks
Timeouts on slow customer endpoints

Phase 2: 100-1K Webhooks/Hour

Real customers depend on webhooks. Reliability becomes non-negotiable.

What Breaks: Synchronous Delivery

One slow customer endpoint blocks processing for everyone. Your API slows, background jobs back up, users notice.

The Fix: Introduce a Queue

Make delivery asynchronous: push the event to a queue and return immediately. A separate worker handles delivery:

// Event occurs - push to queue instantly
async function onOrderCompleted(order) {
  await webhookQueue.push({
    type: 'order.completed',
    payload: serializeOrder(order),
    endpoints: await getSubscribedEndpoints(order.customerId)
  });
}

// Worker processes queue independently
async function processWebhookJob(job) {
  for (const endpoint of job.endpoints) {
    await attemptDelivery(endpoint, job.payload);
  }
}

Decoupling is transformative. Main app stays fast. Customer endpoint problems become isolated incidents.

Retry Logic

With a queue, implement proper retries. Exponential backoff prevents hammering failed endpoints:

const RETRY_DELAYS = [60, 300, 1800, 7200, 86400]; // seconds: 1m, 5m, 30m, 2h, 24h

async function attemptDelivery(endpoint, payload, attempt = 0) {
  let delivered = false;
  try {
    const response = await deliverWebhook(endpoint, payload);
    delivered = response.ok;
  } catch (error) {
    // Network errors and timeouts count as failed attempts
  }

  if (delivered) {
    await markDelivered(endpoint, payload);
  } else if (attempt < RETRY_DELAYS.length) {
    // Assumes scheduleRetry accepts the delay (in seconds) as a fourth argument
    await scheduleRetry(endpoint, payload, attempt + 1, RETRY_DELAYS[attempt]);
  } else {
    // All retries exhausted
    await markFailed(endpoint, payload);
  }
}

Metrics to Track

Delivery rate (target: 99%+)
Latency p95 (target: <5 sec)
Failure rate (target: <0.1%)
Queue depth (should stay low)

Climbing queue depth or dropping delivery rate signals problems early.
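
As a rough illustration, the delivery rate, failure rate, and p95 latency above could be computed from delivery log records along these lines. This is a minimal sketch, not a prescribed implementation: the record shape (`status`, `latencyMs`) and the function names are assumptions made for the example.

```javascript
// Hypothetical record shape: { status: 'delivered' | 'failed', latencyMs: number }

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return null;
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(rank, 1) - 1];
}

// Summarizes one window of delivery records (for example, the last hour).
function summarizeDeliveries(records) {
  const total = records.length;
  const delivered = records.filter((r) => r.status === 'delivered');
  const latencies = delivered.map((r) => r.latencyMs).sort((a, b) => a - b);
  return {
    total,
    deliveryRate: total === 0 ? null : delivered.length / total,
    failureRate: total === 0 ? null : (total - delivered.length) / total,
    p95LatencyMs: percentile(latencies, 95),
  };
}
```

Running this over a sliding window and plotting the results next to queue depth makes a slow decline visible before it becomes an incident.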

Phase 3: 1K-10K Webhooks/Hour

Scaling reveals infrastructure limits. Theoretical problems become urgent.

Challenge 1: Database Load

Every delivery attempt generates writes. At 10K webhooks/hour with three attempts each, that is at least 30K write operations per hour.

Solutions:

Batch writes instead of individual inserts
Use a time-series DB for logs (optimized for append-heavy workloads)
Rotate and archive logs (no need for instant 6-month access)
Separate webhook logging from main DB
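
To make the first item concrete, here is one possible shape for batched log writes. It is a sketch under assumptions: `insertMany(rows)` is a hypothetical async function standing in for whatever bulk-insert call your database client offers, and the batch size and flush interval are placeholder values to tune.

```javascript
// Buffers delivery log rows and writes them in batches instead of one insert per attempt.
// `insertMany(rows)` is a hypothetical async bulk-insert function supplied by the caller.
class BatchedLogWriter {
  constructor(insertMany, { maxBatchSize = 500, flushIntervalMs = 1000 } = {}) {
    this.insertMany = insertMany;
    this.maxBatchSize = maxBatchSize;
    this.buffer = [];
    // Periodic flush so a quiet period does not leave rows stuck in memory.
    this.timer = setInterval(() => {
      this.flush().catch((err) => console.error('log flush failed', err));
    }, flushIntervalMs);
    this.timer.unref?.(); // in Node, do not keep the process alive just for this timer
  }

  async log(row) {
    this.buffer.push(row);
    if (this.buffer.length >= this.maxBatchSize) await this.flush();
  }

  async flush() {
    if (this.buffer.length === 0) return;
    const rows = this.buffer;
    this.buffer = []; // swap first so rows logged during the write join the next batch
    try {
      await this.insertMany(rows);
    } catch (err) {
      this.buffer = rows.concat(this.buffer); // put rows back so a later flush retries them
      throw err;
    }
  }

  async close() {
    clearInterval(this.timer);
    await this.flush();
  }
}
```

The trade-off is durability: rows held in memory are lost if the process crashes before a flush, so the interval should reflect how much log loss you can tolerate.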

Challenge 2: Retry Storms

Suppose a customer endpoint is down for an hour. When it recovers, all queued retries fire simultaneously: thousands of requests in seconds. That overwhelms the queue, delays new webhooks, and can crash the recovering endpoint.

Solutions:

Add jitter to retry timing
Per-endpoint rate limiting
Circuit breakers for consistently failing endpoints
Gradually ramp up delivery on recovery
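
Jitter, the first item above, can be as small as one helper. The sketch below reuses the retry schedule from the Retry Logic section and spreads each delay by a random offset; the function name, the 20% default, and the injectable `random` parameter are illustrative assumptions.

```javascript
const RETRY_DELAYS_SECONDS = [60, 300, 1800, 7200, 86400]; // same schedule as the Retry Logic section

// Returns the delay in seconds before retry number `attempt` (1-based),
// randomized within +/- `jitterRatio` of the scheduled delay.
// `random` is injectable so the function can be tested deterministically.
function retryDelayWithJitter(attempt, jitterRatio = 0.2, random = Math.random) {
  // Clamp out-of-range attempts to the first or last scheduled delay.
  const index = Math.min(Math.max(attempt, 1), RETRY_DELAYS_SECONDS.length) - 1;
  const base = RETRY_DELAYS_SECONDS[index];
  const offset = (random() * 2 - 1) * jitterRatio * base;
  return Math.round(base + offset);
}
```

With the default ratio, retries scheduled for the 5-minute mark land anywhere between 4 and 6 minutes, so a recovering endpoint sees a spread of requests rather than a single spike.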

// Circuit breaker pattern
async function checkEndpointHealth(endpoint) {
  const recentFailures = await getRecentFailures(endpoint, '5m');
  if (recentFailures > 10) {
    await pauseEndpoint(endpoint, '15m');
    await notifyCustomer(endpoint, 'Endpoint paused due to failures');
    return false;
  }
  return true;
}
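
Per-endpoint rate limiting is often done with a token bucket. The following is a minimal in-memory sketch, assuming a single worker process; the class name, option names, and default limits are made up for the example, and a multi-worker setup would need shared state instead of a local `Map`.

```javascript
// Token bucket per endpoint: each endpoint may burst up to `capacity` deliveries,
// then is limited to `refillPerSecond` deliveries per second.
class EndpointRateLimiter {
  constructor({ capacity = 10, refillPerSecond = 5, now = () => Date.now() } = {}) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now; // injectable clock (milliseconds) for testing
    this.buckets = new Map(); // endpointId -> { tokens, updatedAt }
  }

  // Returns true if a delivery to this endpoint may proceed now.
  tryAcquire(endpointId) {
    const t = this.now();
    const bucket = this.buckets.get(endpointId) ?? { tokens: this.capacity, updatedAt: t };
    // Refill in proportion to the time elapsed since the last check.
    const elapsedSeconds = (t - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.updatedAt = t;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(endpointId, bucket);
    return allowed;
  }
}
```

A worker would call `tryAcquire(endpoint.id)` before each delivery and requeue the job with a short delay when it returns false. Starting a recovered endpoint with a small `refillPerSecond` and raising it over time is one way to get a gradual ramp-up.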

Challenge 3: Queue Management

Simple Redis queues struggle at 10K/hour. You need visibility, prioritization, and graceful backlog handling.

Considerations:

Dead letter queues for permanent failures
Priority lanes for time-sensitive events
Monitor consumer lag, scale workers
Consider managed queues for auto-scaling
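
The routing logic behind dead letter queues and priority lanes is small enough to sketch in memory. This is only an illustration of the idea: a real deployment would use the equivalent features of its queue or broker, and the class, method, and field names here are hypothetical.

```javascript
// In-memory stand-in for a real broker, to show the routing logic only.
class WebhookQueues {
  constructor(maxAttempts = 5) {
    this.maxAttempts = maxAttempts;
    this.lanes = { high: [], normal: [] };
    this.deadLetter = []; // jobs that exhausted their attempts, kept for inspection
  }

  enqueue(job, priority = 'normal') {
    const lane = priority === 'high' ? 'high' : 'normal';
    this.lanes[lane].push({ ...job, priority: lane, attempts: job.attempts ?? 0 });
  }

  // Time-sensitive events are always drained before normal ones.
  dequeue() {
    return this.lanes.high.shift() ?? this.lanes.normal.shift() ?? null;
  }

  // Called by a worker when delivery fails: requeue, or park the job permanently.
  recordFailure(job, error) {
    const attempts = job.attempts + 1;
    if (attempts >= this.maxAttempts) {
      this.deadLetter.push({ ...job, attempts, lastError: String(error) });
    } else {
      this.enqueue({ ...job, attempts }, job.priority);
    }
  }
}
```

Strict priority like this can starve the normal lane during a sustained burst of high-priority events, so some systems reserve a share of worker capacity for each lane instead.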

Monitoring & Alerting

Proactive alerting is essential at this volume:

Alert if delivery rate drops below 98%
Alert on p95 latency >30 sec
Alert on sustained high queue depth
Alert on high-value customer failures
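
The thresholds in the list above could be expressed as simple rules evaluated against a metrics snapshot. This is a sketch only: the metric field names are assumptions, the queue-depth limit of 1000 is a placeholder to tune, and "sustained" is approximated as every recent sample exceeding the limit.

```javascript
// Each rule returns true when its alert should fire for the given metrics snapshot.
const ALERT_RULES = [
  { name: 'delivery-rate-below-98pct', test: (m) => m.deliveryRate < 0.98 },
  { name: 'p95-latency-above-30s', test: (m) => m.p95LatencySeconds > 30 },
  {
    // "Sustained" here means every recent sample is above the placeholder limit.
    name: 'queue-depth-sustained-high',
    test: (m) => m.queueDepthSamples.length > 0 && m.queueDepthSamples.every((d) => d > 1000),
  },
  { name: 'high-value-customer-failures', test: (m) => m.highValueCustomerFailures > 0 },
];

// Returns the names of all rules that fire for this snapshot.
function evaluateAlerts(metrics, rules = ALERT_RULES) {
  return rules.filter((rule) => rule.test(metrics)).map((rule) => rule.name);
}
```

In practice these rules usually live in the monitoring system's own configuration rather than in application code; the point is that each alert maps to one measurable condition.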

Set up on-call rotations. Webhook failures at 3 AM affect customers' businesses.

When DIY Stops Making Sense

Teams discover:

Maintenance grows faster than expected. Edge cases (slow endpoints, DNS failures, SSL problems) accumulate complexity.

Customer expectations increase. Enterprise customers expect 99.99% delivery, detailed logs, same-day resolution.

Opportunity cost rises. Engineers debugging retry logic could ship differentiating features.

The math changes at 10K/hour: a managed service costs a few hundred dollars per month. That is less than the cost of a few hours of engineering time, and you will spend far more than that on maintenance.

Signs to Consider Managed Services

Spending more than a few hours/week on webhook infrastructure
Customers report delivery issues you cannot diagnose
Scaling queue workers has become a routine chore
Building features that managed services already provide
An engineer's departure would create a knowledge gap

Conclusion

Scaling from 0 to 10K/hour is predictable. Synchronous delivery works at launch but breaks when reliability matters. Simple queues scale poorly. DIY systems become distractions from the core product.

Every startup reaches the point where webhook infrastructure stops being an advantage and becomes a tax on engineering time. Recognizing that transition is the difference between smooth and painful scaling.

Queue your delivery, implement retries, monitor, and know when to rent expertise rather than build it. Your customers depend on reliable webhooks. How you deliver that reliability matters less than delivering it.
