Hook Mesh Engineering

Why You Shouldn't Rely on Webhook Ordering

A deep technical analysis of why webhook ordering guarantees are nearly impossible in distributed systems, and practical patterns for building systems that handle out-of-order delivery gracefully.

You update a subscription, then cancel it. Your consumer processes the cancellation first, then the update—and the cancelled customer now has an active subscription. This is out-of-order webhook delivery.

Webhook ordering is one of the most commonly misunderstood parts of event-driven architecture. Developers assume that events emitted in sequence arrive in sequence. Often they don't, and systems built around that assumption develop subtle bugs that surface weeks later.

This post explores why ordering is hard, what causes out-of-order delivery, and how to design systems that handle disorder gracefully.

Why Ordering Is Hard: The Distributed Systems Reality

Webhooks operate in a distributed environment where ordering guarantees are extremely difficult—often impossible—to provide. Understanding why requires grasping a few fundamental challenges.

Network unreliability: Two HTTP requests sent in sequence are not guaranteed to arrive in that sequence. The first may take a slow path while the second takes a fast one and lands first. Load balancers, CDNs, and geographic distribution make this worse.

Parallel processing for scale: Strictly sequential delivery (waiting for each request to complete before sending the next) cannot keep up at volume. At 500ms per delivery, sending 10,000 webhooks one at a time takes 5,000 seconds, well over an hour. Parallel workers are essential, but they break ordering:

Event A (T=0ms)  → Worker 1 → delivers T=520ms
Event B (T=10ms) → Worker 2 → delivers T=180ms
Result: B arrives 340ms before A
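
The timeline above can be reproduced with a few lines of arithmetic. This is only a sketch: the Dispatch class and arrival_order function are hypothetical names, and the latencies are chosen to match the figures in the example (A completes at 520ms, B at 180ms).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dispatch:
    name: str
    sent_ms: int     # when the provider hands the event to a worker
    latency_ms: int  # how long that worker's HTTP request takes


def arrival_order(dispatches):
    """Return event names sorted by the time each delivery completes."""
    return [d.name for d in sorted(dispatches, key=lambda d: d.sent_ms + d.latency_ms)]


dispatches = [
    Dispatch("A", sent_ms=0, latency_ms=520),   # completes at 520ms
    Dispatch("B", sent_ms=10, latency_ms=170),  # completes at 180ms
]
print(arrival_order(dispatches))  # ['B', 'A']
```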

Retries destroy order: When the first delivery attempt fails and the event enters a retry queue, later events succeed immediately and arrive first. This is unavoidable as long as retries exist, and retries are essential for reliable delivery:

Event 1: Fails → retry in 30s
Event 2: Succeeds → immediate
Event 3: Succeeds → immediate
Event 1: Retry succeeds → 30s late
Arrival order: 2, 3, 1
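
A small simulation makes the reordering concrete. The function name and the send times (0s, 1s, 2s) are assumptions made for illustration; only the 30-second retry delay comes from the example above.

```python
def arrival_order_with_retries(attempts, retry_delay_s=30):
    """attempts is a list of (event_id, sent_at_s, fails_first_attempt)."""
    arrivals = []
    for event_id, sent_at_s, fails_first in attempts:
        # A failed first attempt only lands after the retry delay has passed.
        arrived_at = sent_at_s + (retry_delay_s if fails_first else 0)
        arrivals.append((arrived_at, event_id))
    return [event_id for _, event_id in sorted(arrivals)]


attempts = [(1, 0, True), (2, 1, False), (3, 2, False)]
print(arrival_order_with_retries(attempts))  # [2, 3, 1]
```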

Why Providers Don't Guarantee Order

Most webhook providers—including Stripe, GitHub, Twilio, and yes, Hook Mesh—explicitly do not guarantee FIFO (first-in, first-out) delivery. This isn't laziness or oversight; it's a deliberate architectural decision.

Guaranteeing order requires:

  1. Sequential delivery per destination (throughput penalty)
  2. Head-of-line blocking (one slow endpoint blocks all)
  3. Complex distributed coordination
  4. Retry complications (block all or break guarantee)
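
To see what head-of-line blocking costs, compare completion times under strict FIFO with completion times when every delivery runs on its own worker. This is a rough model with hypothetical function names and made-up durations (one delivery stuck for 30 seconds, two that take half a second each).

```python
def fifo_completion_times(durations_s):
    """Strict FIFO: each delivery starts only after the previous one finishes."""
    finished, clock = [], 0.0
    for duration in durations_s:
        clock += duration
        finished.append(clock)
    return finished


def parallel_completion_times(durations_s):
    """Unordered: every delivery starts at time zero on its own worker."""
    return list(durations_s)


durations = [30.0, 0.5, 0.5]
print(fifo_completion_times(durations))      # [30.0, 30.5, 31.0]
print(parallel_completion_times(durations))  # [30.0, 0.5, 0.5]
```

Under FIFO the two fast events wait behind the slow one; without the ordering constraint they finish in half a second, but ahead of the event that was sent first.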

Trade-off: reliability beats ordering. A system that guarantees order but drops events is less useful than one that guarantees delivery but not order.

Some providers offer ordered delivery as an option for specific high-value use cases, at the cost of throughput. For general-purpose delivery at scale, assume events can arrive out of order.

Real-World Scenarios Where Order Breaks

Understanding specific failure modes helps you design appropriate defenses.

Scenario 1: Rapid State Changes

A user updates their profile three times in quick succession:

T=0ms: user.updated (name: "Alice")
T=50ms: user.updated (name: "Alicia")
T=100ms: user.updated (name: "Ali")

If the first delivery takes 200ms and the others take 50ms each, arrivals could be:

Arrived: user.updated (name: "Alicia")
Arrived: user.updated (name: "Ali")
Arrived: user.updated (name: "Alice") // This is now stale!

Processing these in arrival order leaves the user with the wrong name.

Scenario 2: Create-Then-Update Race

An object is created and immediately modified:

T=0ms: order.created (status: "pending")
T=10ms: order.updated (status: "confirmed")

If the update arrives first, your system might reject it—the order doesn't exist yet. When the create finally arrives, you have an order stuck in "pending" status.
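
One way to avoid this is to park updates for unknown orders and apply them once the create arrives. The sketch below uses an in-memory OrderStore, a hypothetical stand-in for a real database, and does not order the parked updates among themselves; in practice you would combine it with timestamps or sequence numbers.

```python
from collections import defaultdict


class OrderStore:
    """In-memory stand-in for a database, used only for illustration."""

    def __init__(self):
        self.orders = {}                 # order_id -> status
        self.parked = defaultdict(list)  # order_id -> updates that arrived early

    def handle(self, event_type, order_id, status):
        if event_type == "order.created":
            self.orders[order_id] = status
            # Apply any updates that arrived before the create.
            for parked_status in self.parked.pop(order_id, []):
                self.orders[order_id] = parked_status
        elif order_id in self.orders:
            self.orders[order_id] = status
        else:
            # Update for an unknown order: park it instead of rejecting it.
            self.parked[order_id].append(status)


store = OrderStore()
store.handle("order.updated", "ord_1", "confirmed")  # arrives first
store.handle("order.created", "ord_1", "pending")    # arrives second
print(store.orders["ord_1"])  # confirmed
```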

Scenario 3: Retry-Induced Reversal

A subscription is upgraded, then downgraded:

T=0ms: subscription.updated (plan: "pro") → Fails, retries in 60s
T=30s: subscription.updated (plan: "basic") → Succeeds immediately
T=60s: subscription.updated (plan: "pro") → Retry succeeds

The customer downgraded to "basic," but after the retry, your system shows "pro."

How to Handle Out-of-Order Delivery

The solution isn't preventing disorder—it's designing systems that handle it correctly.

Strategy 1: Timestamp-Based Ordering

Include a timestamp in every webhook payload and only process events newer than your current state.

import datetime
from dataclasses import dataclass
from typing import Optional

@dataclass
class WebhookEvent:
    event_type: str
    resource_id: str
    timestamp: datetime.datetime
    data: dict

class OrderProcessor:
    def __init__(self, db):
        self.db = db

    def process_webhook(self, event: WebhookEvent) -> bool:
        """Process webhook only if it's newer than current state."""
        current = self.db.get_resource(event.resource_id)

        if current and current.last_updated >= event.timestamp:
            # Event is stale, acknowledge but don't process
            print(f"Skipping stale event: {event.timestamp} <= {current.last_updated}")
            return True  # Return success to prevent retries

        # Event is newer, apply the update
        self.db.update_resource(
            resource_id=event.resource_id,
            data=event.data,
            last_updated=event.timestamp
        )
        return True

This approach requires the webhook provider to include reliable timestamps. Most major providers do—check for fields like created_at, timestamp, or occurred_at in the payload.
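
Timestamp formats vary between providers, so it helps to normalize them before comparing. The helper below is a sketch under assumptions: the function name is hypothetical, numeric values are treated as Unix seconds, strings as ISO 8601, and timestamps without a timezone as UTC. Check your provider's documentation for the actual field name and format.

```python
from datetime import datetime, timezone

# Field names mentioned above; real payloads may use something else.
TIMESTAMP_FIELDS = ("created_at", "timestamp", "occurred_at")


def extract_event_time(payload: dict) -> datetime:
    """Return the event time as a timezone-aware UTC datetime."""
    for field in TIMESTAMP_FIELDS:
        value = payload.get(field)
        if value is None:
            continue
        if isinstance(value, (int, float)):
            # Assumption: numeric timestamps are Unix seconds.
            return datetime.fromtimestamp(value, tz=timezone.utc)
        parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            # Assumption: a timestamp without a timezone means UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    raise ValueError("payload has no recognised timestamp field")


older = extract_event_time({"timestamp": 1700000000})
newer = extract_event_time({"created_at": "2024-01-01T00:00:00Z"})
print(older < newer)  # True
```

Normalizing to timezone-aware UTC values keeps comparisons like the one in process_webhook from failing on mixed naive and aware datetimes.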

Strategy 2: Sequence Numbers

Some providers include monotonically increasing sequence numbers. These are more reliable than timestamps because they're immune to clock skew.

async function processWebhook(event) {
  const { resourceId, sequenceNumber, data } = event;

  const result = await db.query(`
    UPDATE resources
    SET data = $1, sequence_number = $2
    WHERE id = $3
      AND (sequence_number IS NULL OR sequence_number < $2)
    RETURNING id
  `, [data, sequenceNumber, resourceId]);

  if (result.rowCount === 0) {
    console.log(`Skipping event with sequence ${sequenceNumber} - already processed newer`);
  }

  // Always return 200 to acknowledge receipt
  return { status: 200 };
}

The database query atomically checks and updates only if the new sequence number is higher.
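
The same conditional update can be tried locally with Python's built-in sqlite3 module. This is a sketch, not the query above verbatim: the table layout, the seeded row, and the apply_if_newer name are assumptions, and SQLite uses ? placeholders instead of $1.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resources (id TEXT PRIMARY KEY, data TEXT, sequence_number INTEGER)"
)
conn.execute("INSERT INTO resources (id) VALUES ('res_1')")
conn.commit()


def apply_if_newer(conn, resource_id, sequence_number, data):
    """Update the row only if the incoming sequence number is higher."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE resources SET data = ?, sequence_number = ? "
            "WHERE id = ? AND (sequence_number IS NULL OR sequence_number < ?)",
            (json.dumps(data), sequence_number, resource_id, sequence_number),
        )
    # rowcount is 0 when the stored sequence number was already newer or equal.
    return cur.rowcount == 1


print(apply_if_newer(conn, "res_1", 2, {"plan": "basic"}))  # True
print(apply_if_newer(conn, "res_1", 1, {"plan": "pro"}))    # False
```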

Strategy 3: State Machines

For complex state transitions, model your domain as a state machine with explicit valid transitions.

from datetime import datetime
from enum import Enum
from typing import Set

class OrderStatus(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"
    SHIPPED = "shipped"
    DELIVERED = "delivered"
    CANCELLED = "cancelled"

VALID_TRANSITIONS: dict[OrderStatus, Set[OrderStatus]] = {
    OrderStatus.PENDING: {OrderStatus.CONFIRMED, OrderStatus.CANCELLED},
    OrderStatus.CONFIRMED: {OrderStatus.SHIPPED, OrderStatus.CANCELLED},
    OrderStatus.SHIPPED: {OrderStatus.DELIVERED},
    OrderStatus.DELIVERED: set(),  # Terminal state
    OrderStatus.CANCELLED: set(),  # Terminal state
}

def process_order_webhook(order_id: str, new_status: OrderStatus, timestamp: datetime):
    order = db.get_order(order_id)

    if order is None:
        if new_status == OrderStatus.PENDING:
            db.create_order(order_id, new_status, timestamp)
        else:
            # Non-creation event for unknown order - queue for later
            queue_for_retry(order_id, new_status, timestamp)
        return

    # Check if transition is valid
    if new_status not in VALID_TRANSITIONS[order.status]:
        # Invalid transition - might be out of order or duplicate
        if timestamp <= order.last_updated:
            return  # Stale event, ignore
        # Log for investigation - might indicate a bug
        log_invalid_transition(order, new_status, timestamp)
        return

    db.update_order(order_id, new_status, timestamp)

Design Patterns for Disorder

Event Sourcing

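Store every incoming event in an append-only log and treat that log as the source of truth. Arrival order stops mattering, because current state is rebuilt by replaying the events in their original order whenever it is needed.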

def append_event(resource_id: str, event: dict):
    """Append event to log regardless of order."""
    db.insert_event(
        resource_id=resource_id,
        event_type=event["type"],
        timestamp=event["timestamp"],
        payload=event
    )

def rebuild_state(resource_id: str) -> dict:
    """Rebuild current state by replaying events in timestamp order."""
    events = db.get_events(resource_id, order_by="timestamp")
    state = {}
    for event in events:
        state = apply_event(state, event)
    return state
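
To show that replay makes arrival order irrelevant, here is a self-contained in-memory version. It is a sketch: the replay function and the event shape (a numeric timestamp plus a data dict of changed fields) are assumptions, and the merge stands in for whatever apply_event does in your system.

```python
def replay(events):
    """Rebuild state by applying events in timestamp order, not arrival order."""
    state = {}
    for event in sorted(events, key=lambda e: e["timestamp"]):
        # Merge the fields this event changed into the running state.
        state = {**state, **event["data"]}
    return state


# The three profile updates from Scenario 1, listed in the order they arrived.
arrived = [
    {"timestamp": 50, "data": {"name": "Alicia"}},
    {"timestamp": 100, "data": {"name": "Ali"}},
    {"timestamp": 0, "data": {"name": "Alice"}},
]
print(replay(arrived))  # {'name': 'Ali'}
```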

Idempotent Operations

Design every handler so that it is safe to call multiple times with the same event. See webhook idempotency for patterns.

def process_payment_webhook(event: dict):
    event_id = event["id"]

    # Check if already processed
    if db.event_exists(event_id):
        return {"status": "already_processed"}

    # Process with idempotency key
    try:
        with db.transaction():
            db.record_event(event_id)
            apply_payment(event["data"])
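    except UniqueViolation:
        # UniqueViolation stands for your driver's unique-constraint error
        # (for example psycopg2.errors.UniqueViolation) and assumes a unique
        # index on the recorded event id.
        # Race condition - another worker processed it
        return {"status": "already_processed"}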

    return {"status": "processed"}

Version Vectors

For complex multi-field updates, track versions per field rather than per resource. This allows different fields to be updated independently while still maintaining consistency—each field only accepts updates with a higher version number than what's currently stored.
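
A minimal sketch of the per-field idea follows. It uses one plain version number per field, which is simpler than a true version vector with per-writer counters; the merge_fields name and the (version, value) tuple layout are assumptions for illustration.

```python
def merge_fields(stored: dict, update: dict) -> dict:
    """Both arguments map field name -> (version, value)."""
    merged = dict(stored)
    for field, (version, value) in update.items():
        current_version = merged.get(field, (-1, None))[0]
        # Accept the field only if its version is strictly higher.
        if version > current_version:
            merged[field] = (version, value)
    return merged


stored = {"email": (3, "new@example.com"), "plan": (1, "basic")}
late_update = {"email": (2, "old@example.com"), "plan": (2, "pro")}
print(merge_fields(stored, late_update))
# {'email': (3, 'new@example.com'), 'plan': (2, 'pro')}
```

The stale email is rejected while the newer plan from the same event is still applied, which a single per-resource version could not do.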

Practical Advice: Design for Disorder

Don't depend on order, ever. Even providers that advertise ordered delivery can violate it under network issues, bugs, or edge cases.

Include ordering metadata: Timestamps and sequence numbers help consumers handle disorder.

Make handlers idempotent: Every handler should produce the same result whether called once or ten times.

Test with disorder: Deliberately deliver out of order. Shuffle sequences. Duplicate events. Your system should handle this gracefully.
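
One inexpensive way to do this is to feed a handler every permutation of a short event sequence, with a duplicate thrown in, and check that the final state never changes. The handler below is a hypothetical last-write-wins example keyed on a seq field; substitute your own handler and events.

```python
from itertools import permutations


def handle(state, event):
    """Last-write-wins by sequence number; stale and duplicate events are ignored."""
    if state is None or event["seq"] > state["seq"]:
        return event
    return state


def final_state(events):
    state = None
    for event in events:
        state = handle(state, event)
    return state


events = [
    {"seq": 1, "plan": "free"},
    {"seq": 2, "plan": "pro"},
    {"seq": 3, "plan": "basic"},
]
expected = final_state(events)

# Every arrival order, including one duplicated event, must converge.
for order in permutations(events + [events[1]]):
    assert final_state(order) == expected

print(expected["plan"])  # basic
```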

Monitor for anomalies: Track stale events, out-of-order arrivals, processing conflicts. Spikes indicate upstream issues. Webhook observability makes these visible.
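
A sketch of the kind of counters worth tracking, assuming events carry a per-resource sequence number. The OrderingMonitor class and its category names are hypothetical; in production these counts would feed whatever metrics system you already use.

```python
from collections import Counter


class OrderingMonitor:
    def __init__(self):
        self.latest_seq = {}     # resource_id -> highest sequence seen so far
        self.counts = Counter()  # category -> number of events

    def observe(self, resource_id, seq):
        latest = self.latest_seq.get(resource_id)
        if latest is None or seq > latest:
            self.latest_seq[resource_id] = seq
            self.counts["in_order"] += 1
        elif seq == latest:
            # Repeat of the newest event seen for this resource.
            self.counts["duplicate"] += 1
        else:
            # Older than something already seen: arrived out of order.
            self.counts["stale"] += 1

    def stale_ratio(self):
        total = sum(self.counts.values())
        return self.counts["stale"] / total if total else 0.0


monitor = OrderingMonitor()
for seq in (1, 3, 2, 3):
    monitor.observe("sub_1", seq)
print(dict(monitor.counts))   # {'in_order': 2, 'stale': 1, 'duplicate': 1}
print(monitor.stale_ratio())  # 0.25
```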

Conclusion

Webhook ordering is a distributed systems problem with no perfect solution. Reliable delivery beats strict ordering.

You don't need ordering guarantees. Design your systems to handle out-of-order delivery with timestamps, sequence numbers, state machines, and idempotent operations, and your integrations will work regardless of arrival order.

Most providers prioritize reliable delivery over strict ordering, and that is the right trade-off. Embrace this reality: design for disorder and your integrations will be robust. Understanding these complexities also informs build-vs-buy decisions for webhook infrastructure.
