Replaying dead-lettered messages in RabbitMQ

When RabbitMQ dead-letters a message, the payload is intact but stranded — it sits in a dead-letter queue and will never reach a consumer until you move it back. This guide is part of Dead-Letter Queues & Poison-Message Handling within Queue Fundamentals & Architecture, and it walks through the exact procedure for inspecting a dead-letter queue and re-publishing its messages to the original work queue without re-poisoning it or duplicating side effects.

Problem Statement

A deploy shipped a bug that threw on every order with a missing currency field. Your work queue dead-lettered roughly 4,000 of these to orders.dlq over an hour. You have now fixed the handler and redeployed. The messages are valid; they only failed because of a transient code defect. You need to move all 4,000 back to the orders queue so they process normally — once each, with no manual re-entry of payloads, and with confidence that messages already processed on an earlier attempt will not double-charge a customer.

Prerequisites

  • A running RabbitMQ 3.8+ cluster with the management plugin enabled (rabbitmq-plugins enable rabbitmq_management).
  • The work queue already declared with x-dead-letter-exchange pointing at a DLX, and a dead-letter queue bound to it (see the setup below).
  • Management UI or API credentials with access to the relevant vhost.
  • For the scripted approach: Python 3.9+ with pika installed, or the rabbitmq_shovel plugin available.
  • A fixed and redeployed consumer — never replay before the root cause is corrected.
  • An idempotency key on each message (an order ID, a UUID, or similar) so replays are safe.

Step 1 — Confirm the Dead-Letter Topology

Replay only works if the original queue routes failures to a DLX and you know which queue accumulates them. Declare the topology explicitly so the dead-letter queue is durable and named.

# topology.py — idempotent declaration of the work queue + DLX + DLQ
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq-host"))
ch = conn.channel()

# Dead-letter exchange and its backing queue
ch.exchange_declare(exchange="dlx", exchange_type="direct", durable=True)
ch.queue_declare(queue="orders.dlq", durable=True)            # terminal: NO x-dead-letter-* here
ch.queue_bind(queue="orders.dlq", exchange="dlx", routing_key="orders")

# Work queue routes rejected/expired messages to the DLX
ch.queue_declare(
    queue="orders",
    durable=True,
    arguments={
        "x-dead-letter-exchange": "dlx",
        "x-dead-letter-routing-key": "orders",  # DLQ binds on this key
    },
)
conn.close()

The dead-letter queue itself must have no x-dead-letter-exchange argument: a DLQ that can dead-letter further is a place where messages vanish.

It is worth understanding why RabbitMQ put a message here in the first place, because the reason changes how you replay it. On a classic queue, RabbitMQ dead-letters a message under three conditions: the consumer rejected it with basic.reject or basic.nack and requeue=false; the message exceeded a per-message or per-queue TTL; or the queue hit a length or byte limit (x-max-length or x-max-length-bytes) and overflowed. Quorum queues add a fourth: the message exceeded x-delivery-limit. Only rejection and delivery-limit dead-letters are true processing failures. TTL and overflow dead-letters are capacity signals, and replaying them blindly without addressing the underlying backlog will just refill the DLQ. The x-death header's reason field tells you which case you are looking at, which is why Step 2 inspects it before you move anything.
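The distinction between processing failures and capacity signals can be encoded as a small helper. The following is a minimal sketch, not part of any RabbitMQ client: the function names and the returned labels are hypothetical, and it assumes the most recent dead-lettering event is the first entry of the x-death list.

```python
# classify_death.py: suggest how to treat a dead-lettered message from its x-death header
from typing import Any, Optional

# Reason strings written into x-death entries. "delivery_limit" is set by quorum queues.
PROCESSING_FAILURE = {"rejected", "delivery_limit"}
CAPACITY_SIGNAL = {"expired", "maxlen"}


def latest_death(headers: Optional[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the most recent x-death entry, or None if the header is absent."""
    deaths = (headers or {}).get("x-death") or []
    # Assumption: new events are prepended, so index 0 is the latest one.
    return deaths[0] if deaths else None


def replay_advice(headers: Optional[dict[str, Any]]) -> str:
    """Map the latest dead-letter reason to a coarse, hypothetical replay label."""
    death = latest_death(headers)
    if death is None:
        return "unknown"
    reason = death.get("reason")
    if reason in PROCESSING_FAILURE:
        return "replay-after-fix"
    if reason in CAPACITY_SIGNAL:
        return "fix-capacity-first"
    return "unknown"
```

A message labelled fix-capacity-first is a hint to look at TTLs, queue limits, or consumer throughput before moving anything back.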

Step 2 — Inspect the Dead-Letter Queue

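Before moving anything, look at what is actually there. The management API's get endpoint lets you peek at messages: with ackmode=reject_requeue_true it fetches them and immediately requeues them, so the queue depth is unchanged (the messages come back flagged as redelivered). The examples use guest:guest for brevity; the default guest account may only connect from localhost, so substitute credentials that are valid for your vhost.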

# Peek at up to 10 messages without removing them from orders.dlq
curl -s -u guest:guest \
  -X POST http://rabbitmq-host:15672/api/queues/%2F/orders.dlq/get \
  -H "content-type: application/json" \
  -d '{"count":10,"ackmode":"reject_requeue_true","encoding":"auto"}' \
  | python3 -m json.tool

Each entry exposes the payload and, critically, the x-death header, which records every dead-lettering event with a count, reason (rejected, expired, maxlen), the source queue, and a timestamp. Read it to confirm the failures share the root cause you fixed and are not a mix of unrelated problems.

# Current depth of the DLQ — your replay target count
curl -s -u guest:guest \
  http://rabbitmq-host:15672/api/queues/%2F/orders.dlq \
  | python3 -c "import sys,json; print('messages:', json.load(sys.stdin)['messages'])"
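Reading x-death by eye does not scale to thousands of messages, so a tally helps confirm that the failures share one cause. The sketch below assumes the get endpoint returns a JSON list whose items carry properties.headers["x-death"]; the function name summarize is hypothetical.

```python
# summarize_dlq.py: tally dead-letter reasons from the management API "get" response
from collections import Counter


def summarize(messages):
    """Count (source queue, reason) pairs using the first x-death entry of each message."""
    tally = Counter()
    for msg in messages:
        # Empty properties or headers may be rendered as [] rather than {}.
        headers = (msg.get("properties") or {}).get("headers") or {}
        deaths = headers.get("x-death") or []
        if deaths:
            tally[(deaths[0].get("queue"), deaths[0].get("reason"))] += 1
        else:
            tally[(None, "no-x-death")] += 1
    return tally
```

Fed the parsed output of the peek request (for example json.load(sys.stdin) in a small wrapper script), a single dominant pair such as ("orders", "rejected") suggests one root cause, while a mix of reasons is a signal to filter before replaying.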

Step 3a — Replay With the Shovel Plugin (no code)

For a one-shot bulk move, the dynamic shovel plugin is the simplest tool. It consumes from the DLQ and re-publishes to the default exchange with routing key orders, which lands messages in the orders queue.

rabbitmq-plugins enable rabbitmq_shovel rabbitmq_shovel_management

# Create a dynamic shovel that drains orders.dlq back into orders, then stops
rabbitmqctl set_parameter shovel replay-orders '{
  "src-protocol": "amqp091",
  "src-uri": "amqp://guest:guest@localhost",
  "src-queue": "orders.dlq",
  "src-delete-after": "queue-length",
  "dest-protocol": "amqp091",
  "dest-uri": "amqp://guest:guest@localhost",
  "dest-exchange": "",
  "dest-routing-key": "orders"
}'

src-delete-after: queue-length snapshots the depth at start and stops after that many messages, so a producer that is still dead-lettering will not cause an endless loop. The shovel is also resilient: in its default acknowledgement mode (on-confirm) it uses publisher confirms, reconnects automatically if either broker connection drops, and does not acknowledge a message on the source until the destination has accepted it. An interrupted shovel therefore resumes without losing messages, although a message whose confirm was still in flight when the connection dropped can be delivered twice, which is one more reason Step 4 matters. The trade-off is control: the shovel moves everything indiscriminately at full speed, with no batching, filtering, or header rewriting. If you need any of those, reach for the script in Step 3b. A shovel with src-delete-after set removes itself once it has moved its quota; if you interrupted it or it is still listed, delete it manually:

rabbitmqctl clear_parameter shovel replay-orders
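If the replay is something you expect to repeat, generating the shovel definition from code keeps stray keys and quoting mistakes out of the JSON. This is a minimal sketch: replay_shovel and as_set_parameter_arg are hypothetical helper names, and the keys are the same ones used in the command above.

```python
# shovel_definition.py: build the JSON definition for a one-shot DLQ replay shovel
import json


def replay_shovel(uri, src_queue, dest_queue):
    """Return a shovel definition that drains src_queue into dest_queue once."""
    return {
        "src-protocol": "amqp091",
        "src-uri": uri,
        "src-queue": src_queue,
        "src-delete-after": "queue-length",  # stop after the depth seen at start
        "dest-protocol": "amqp091",
        "dest-uri": uri,
        "dest-exchange": "",                 # default exchange
        "dest-routing-key": dest_queue,      # routes straight to the named queue
    }


def as_set_parameter_arg(definition):
    """Serialize the definition as compact JSON for rabbitmqctl set_parameter."""
    return json.dumps(definition, separators=(",", ":"))
```

The resulting string can be passed as the final argument of rabbitmqctl set_parameter shovel replay-orders, single-quoted in the shell.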

Step 3b — Replay With a Controlled Script (batched)

When you want batching, filtering, or to strip the x-death header before re-publishing, a script gives you control. This consumes from the DLQ and republishes in bounded batches, persisting each message.

# replay.py — move messages from orders.dlq back to orders in safe batches
import pika

BATCH = 200  # messages moved per run; rerun once the work queue has drained

conn = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq-host"))
ch = conn.channel()
ch.confirm_delivery()  # publisher confirms: only ack the source after the dest accepts

moved = 0
while moved < BATCH:
    method, props, body = ch.basic_get(queue="orders.dlq", auto_ack=False)
    if method is None:
        break  # DLQ empty

    # Strip x-death so a fresh failure starts a clean count, keep idempotency key
    headers = dict(props.headers or {})
    headers.pop("x-death", None)
    new_props = pika.BasicProperties(
        delivery_mode=2,                 # persistent
        headers=headers,
        message_id=props.message_id,     # preserve idempotency key
        content_type=props.content_type,
    )
    try:
        # mandatory=True makes the broker return unroutable messages so pika can raise
        ch.basic_publish(exchange="", routing_key="orders", body=body,
                         properties=new_props, mandatory=True)
        ch.basic_ack(method.delivery_tag)       # remove from DLQ only after confirmed publish
        moved += 1
    except (pika.exceptions.UnroutableError, pika.exceptions.NackError):
        ch.basic_nack(method.delivery_tag, requeue=True)  # leave it in the DLQ
        break

print(f"Replayed {moved} messages")
conn.close()

Publisher confirms (confirm_delivery) are the safety hinge: the source message is only acknowledged — and thus deleted from the DLQ — after the broker confirms the republish. If the process dies mid-batch, the in-flight message stays in the DLQ rather than disappearing.

Step 4 — Enforce Idempotency on the Consumer

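Some of the replayed messages may have completed part of their work before failing on an earlier attempt. The consumer must treat replays as possible duplicates. Key off the preserved message_id and a deduplication store. In the listing below, process() stands for your own handler. Note that the key is claimed before the work runs: a handler that raises releases it, but a consumer that is killed mid-handler leaves the key set until it expires, so the redelivered message would be dropped as a duplicate. Keep the expiry short enough, or record completion separately, if that matters for your workload.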

# consumer.py — reject already-processed messages before doing work
import redis
r = redis.Redis()

def on_message(ch, method, props, body):
    key = f"processed:{props.message_id}"
    # SET NX returns False if the key already exists -> already handled
    if not r.set(key, "1", nx=True, ex=86400):   # 24h dedup window
        ch.basic_ack(method.delivery_tag)        # drop the duplicate, do not reprocess
        return
    try:
        process(body)
        ch.basic_ack(method.delivery_tag)
    except Exception:
        r.delete(key)                            # release the key so a real retry can run
        ch.basic_reject(method.delivery_tag, requeue=False)

This is the same principle covered in depth in Preventing duplicate job execution with idempotency and rooted in the broker's at-least-once delivery contract.
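The consumer above builds its key from message_id, which becomes processed:None for every message if the publisher never set that property. One way to guard against that is a fallback chain, sketched here under assumptions: the payload is JSON and carries an order_id field, and idempotency_key is a hypothetical helper name.

```python
# idempotency_key.py: derive a stable dedup key even when message_id is missing
import hashlib
import json
from typing import Optional


def idempotency_key(message_id: Optional[str], body: bytes) -> str:
    """Prefer message_id, then an order_id field in a JSON body, then a body hash."""
    if message_id:
        return f"processed:{message_id}"
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        payload = None
    if isinstance(payload, dict) and payload.get("order_id") is not None:
        return f"processed:order:{payload['order_id']}"
    # Last resort: byte-identical payloads map to the same key.
    return "processed:sha256:" + hashlib.sha256(body).hexdigest()
```

The hash fallback only deduplicates byte-identical payloads, so it is a safety net rather than a substitute for a real idempotency key set by the publisher.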

Verification

Confirm both queues reached the expected state. The DLQ should drain to zero and the work queue should absorb and then process the replayed messages.

# DLQ should report messages: 0 after a full replay
curl -s -u guest:guest http://rabbitmq-host:15672/api/queues/%2F/orders.dlq \
  | python3 -c "import sys,json; d=json.load(sys.stdin); print('dlq:', d['messages'])"

# Work queue should show messages being drained by consumers (ack rate > 0)
curl -s -u guest:guest http://rabbitmq-host:15672/api/queues/%2F/orders \
  | python3 -c "import sys,json; d=json.load(sys.stdin); \
print('orders:', d['messages'], 'ack_rate:', d.get('message_stats',{}).get('ack_details',{}).get('rate'))"

In your application logs or metrics, the deduplication counter (duplicates dropped) confirms idempotency is working: a healthy replay of partially-processed messages should show some duplicates rejected, not reprocessed.
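Rather than rerunning the depth check by hand, the wait can be scripted. This is a minimal sketch with a hypothetical helper, wait_until_drained; get_depth is any callable you supply that returns the current message count (for example, one that wraps the management API request above).

```python
# wait_drained.py: poll a depth function until the queue is empty or a deadline passes
import time


def wait_until_drained(get_depth, timeout=300.0, interval=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Return True once get_depth() reports 0, False if timeout seconds elapse first."""
    deadline = clock() + timeout
    while True:
        if get_depth() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be tested without real delays; in normal use the defaults are enough.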

Gotchas & Edge Cases

Re-poisoning by replaying too early. If the consumer fix is not actually deployed, every replayed message fails again and lands right back in the DLQ — often faster, since there is now a backlog. Always verify the fix on a single message before bulk replay.

Stale x-death inflating counts. If you keep the x-death header on replay and your retry-limit logic reads its count field (the role maxReceiveCount plays in SQS), the messages may dead-letter again on the first new failure because their counter is already at the limit. Strip x-death (as the script does) so replayed messages get a fresh retry budget.
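Besides x-death, RabbitMQ also records the first dead-lettering event in x-first-death-reason, x-first-death-queue, and x-first-death-exchange. Whether to drop those as well depends on what your consumers read; the sketch below removes all four and leaves every other header alone. clean_headers is a hypothetical helper name, and newer broker releases may add further bookkeeping headers not listed here.

```python
# clean_headers.py: drop dead-letter bookkeeping headers before republishing
DEATH_HEADERS = (
    "x-death",
    "x-first-death-reason",
    "x-first-death-queue",
    "x-first-death-exchange",
)


def clean_headers(headers):
    """Return a copy of headers without dead-letter bookkeeping; the input is not mutated."""
    return {k: v for k, v in (headers or {}).items() if k not in DEATH_HEADERS}
```

In the Step 3b script this would replace the headers.pop("x-death", None) line, for example headers = clean_headers(props.headers).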

Lost messages without publisher confirms. Acknowledging the source message before confirming the republish creates a window where a crash loses the message entirely. Always enable confirm_delivery and ack the DLQ only after the destination confirms.

Routing-key mismatch. Re-publishing to the wrong exchange or routing key silently drops messages (or, if the publish sets mandatory=True, returns them as unroutable). Publishing to the default exchange ("") with the routing key equal to the queue name is the reliable path back into a named queue.

Replay storms overwhelming downstream. Dumping thousands of messages back at once can overload a database or third-party API that just recovered. Batch the replay and watch downstream latency; combine with backpressure strategies for fast producers if needed.
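Pacing can be as simple as pausing between batches. The following sketch assumes you have wrapped one run of the Step 3b loop in a callable that returns how many messages it moved; replay_paced and its parameters are hypothetical names, and the pause length is something to tune against downstream latency.

```python
# paced_replay.py: run replay batches with a pause between them
import time


def replay_paced(replay_batch, pause_seconds=5.0, max_batches=None, sleep=time.sleep):
    """Call replay_batch() until it returns 0, sleeping after each batch.

    Returns the total number of messages moved. max_batches caps the number
    of batches so a run can be stopped early for inspection.
    """
    total = 0
    batches = 0
    while max_batches is None or batches < max_batches:
        moved = replay_batch()
        if moved == 0:
            break  # DLQ empty, or the batch could not publish anything
        total += moved
        batches += 1
        sleep(pause_seconds)
    return total
```

Starting with a small max_batches and a generous pause gives you a chance to watch downstream dashboards before committing to the full backlog.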
