Sidekiq Batch Jobs & Workflows

This guide extends the throughput work in Sidekiq Performance Tuning and the broader Backend Frameworks & Worker Scaling material into multi-job coordination — running hundreds of jobs in parallel and reliably knowing when they have all finished.

A single Sidekiq job is easy. The hard part is the workflow: fan out a thousand image-resize jobs and then send one "your gallery is ready" email only after every resize succeeds. The naive approaches all break. A completion counter kept with a read-then-write (GET, add one, SET) loses updates under concurrency, and even an atomic INCR over-counts when a child is retried. Polling for "is the queue empty" gives a false positive the instant a worker picks up the last job, because that job has left the queue but has not finished. The symptom is a completion callback that fires too early (before all children are done) or never (because a counter update was lost or a failed child never reported in). This page shows the reliable patterns: Sidekiq::Batch from Sidekiq Pro for fan-out/fan-in with on(:success) and on(:complete) callbacks, an open-source equivalent for teams without Pro, batch progress tracking, and chaining dependent jobs into multi-stage pipelines.

Prerequisites

  • A working Sidekiq install (Sidekiq 7.x assumed) with Redis configured. See Tuning the Sidekiq Redis connection pool — batches push many jobs at once and lean on the client pool.
  • For native batches: a Sidekiq Pro license and the sidekiq-pro gem. For the open-source path: the gush gem or a small custom coordinator (both shown).
  • Idempotent child jobs — batch jobs retry like any other, so a child may run more than once.
  • Enough worker concurrency to actually run the fan-out in parallel; a batch of 1,000 jobs against a single thread is just a slow loop.

Step 1 — Fan out work with Sidekiq::Batch (Pro)

Sidekiq::Batch groups jobs pushed inside its jobs block and tracks their collective completion in Redis. Define the batch, attach a callback class, then enqueue children inside the block.

# app/services/gallery_processor.rb
class GalleryProcessor
  def self.run(gallery_id, image_ids)
    batch = Sidekiq::Batch.new
    batch.description = "Resize gallery #{gallery_id}"
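    batch.on(:success, GalleryCallbacks, gallery_id: gallery_id)
    # register :complete as well, or GalleryCallbacks#on_complete is never invoked
    batch.on(:complete, GalleryCallbacks, gallery_id: gallery_id)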
    batch.jobs do
      image_ids.each { |id| ResizeImageJob.perform_async(id) }  # fan-out
    end
    batch.bid  # the batch id, persist this to query progress later
  end
end

Every perform_async issued inside batch.jobs is registered as a child of the batch. Sidekiq increments and decrements the batch's pending counter atomically in Redis as children complete, sidestepping the race conditions of a hand-rolled counter.
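To see why a hand-rolled counter races, the following plain-Ruby sketch replays one unlucky interleaving by hand. The Hash stands in for Redis, and the names (store, "done") are illustrative rather than anything Sidekiq provides. Two workers each read the counter before either writes, so one completion is lost; applying each increment as a single step, which is what an atomic Redis command gives you, keeps both.

```ruby
# A Hash stands in for Redis here; nothing below talks to a real server.
store = { "done" => 0 }

# Read-modify-write: both workers read before either writes.
seen_by_a = store["done"]        # worker A reads 0
seen_by_b = store["done"]        # worker B reads 0
store["done"] = seen_by_a + 1    # worker A writes 1
store["done"] = seen_by_b + 1    # worker B also writes 1; A's update is lost
racy_total = store["done"]

# Atomic-style update: each increment is modeled as one indivisible step
# (trivially true here because this script is single-threaded).
store["done"] = 0
2.times { store["done"] += 1 }
atomic_total = store["done"]

puts "read-modify-write counted #{racy_total}, atomic counted #{atomic_total}"
```

Two completions happened in both cases, but only the atomic version records two.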

Step 2 — React with on(:success) and on(:complete) callbacks

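The two callback events have a critical difference. :complete fires once every child has run at least once, whether it succeeded or failed, so some children may still be waiting on a retry. :success fires only after every child has succeeded, which can be well after :complete when retries are involved and never happens if a child dies permanently (Pro also offers a :death event for that case). Use :success for "all-or-nothing" workflows and :complete for "every child has been attempted" reporting and failure checks.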

# app/callbacks/gallery_callbacks.rb
class GalleryCallbacks
  def on_success(status, options)
    # every child succeeded — safe to publish the result
    GalleryMailer.ready(options["gallery_id"]).deliver_later
  end

  def on_complete(status, options)
    # every child has run at least once; failed ones may still be retrying
    if status.failures > 0
      AdminAlert.batch_partial_failure(options["gallery_id"], status.failures)
    end
  end
end

The status object carries total, pending, and failures counts, so a single callback can branch on whether the workflow was fully or partially successful.
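As a rough illustration of branching on those counts, the sketch below classifies a batch from total, pending, and failures. BatchSnapshot is a hypothetical Struct standing in for the Pro status object so the example runs without Redis, and the outcome labels are made up for this page.

```ruby
# Hypothetical stand-in for the status object: only the three counts are modeled.
BatchSnapshot = Struct.new(:total, :pending, :failures)

# Map the counts to a coarse outcome label (labels are illustrative).
def batch_outcome(status)
  return :succeeded if status.pending.zero?          # no child is left unsucceeded
  return :partially_failed if status.failures > 0    # at least one child has failed
  :running                                           # still working, no failures yet
end

puts batch_outcome(BatchSnapshot.new(3, 0, 0))
puts batch_outcome(BatchSnapshot.new(3, 1, 1))
puts batch_outcome(BatchSnapshot.new(3, 2, 0))
```

A callback could use the same kind of check to decide between publishing, alerting, or doing nothing yet.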

Step 3 — Nest batches for fan-out/fan-in stages

Real workflows have stages: process all items, then aggregate. You can open a new batch inside a parent batch's success callback, so the aggregation stage starts only after the processing stage fully succeeds. This is fan-in.

# app/callbacks/gallery_callbacks.rb
class GalleryCallbacks
  def on_success(status, options)
    # processing stage done -> start the aggregation stage as a new, follow-on batch
    aggregate = Sidekiq::Batch.new
    aggregate.on(:success, FinalizeCallbacks, gallery_id: options["gallery_id"])
    aggregate.jobs do
      BuildGalleryManifestJob.perform_async(options["gallery_id"])
      GenerateThumbnailSpriteJob.perform_async(options["gallery_id"])
    end
  end
end

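Chaining batches this way gives you a dependency graph without polling: each stage's success callback is the trigger for the next, and it is enqueued once the stage has no unsucceeded children left. Two caveats apply. Callbacks run as ordinary Sidekiq jobs, so a callback that raises is retried; write it to be idempotent rather than assuming exactly-once execution. And a batch created this way is independent of the one that triggered it. If an enclosing parent batch must wait for the new stage, create the stage inside the parent's jobs block, which Pro allows by reopening the parent with Sidekiq::Batch.new(status.parent_bid).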

Step 4 — Open-source alternative without Sidekiq Pro

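Without Pro, use the gush gem, which models workflows as an explicit DAG and runs its jobs through ActiveJob (with Sidekiq as the queue adapter here). Each job class subclasses Gush::Job and reads its arguments from params, so these are not the same classes as the Sidekiq::Job workers shown elsewhere on this page. Declare jobs and their dependencies; gush handles fan-out/fan-in ordering.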

# app/workflows/gallery_workflow.rb
class GalleryWorkflow < Gush::Workflow
  def configure(gallery_id, image_ids)
    resize_jobs = image_ids.map do |id|
      run ResizeImageJob, params: { image_id: id }     # fan-out, run in parallel
    end
    # manifest runs only after ALL resize jobs finish (fan-in)
    run BuildManifestJob, params: { gallery_id: gallery_id }, after: resize_jobs
  end
end

# kick it off
flow = GalleryWorkflow.create(gallery_id, image_ids)
flow.start!

If you want zero extra dependencies, a minimal coordinator works for simple fan-in: seed a Redis counter with the number of children before enqueuing them, have each child DECR it (DECR is atomic) as its last step, and trigger the finalizer when it hits zero.

# app/jobs/resize_image_job.rb
class ResizeImageJob
  include Sidekiq::Job
  def perform(image_id, batch_key)
    ImageResizer.call(image_id)
    # atomic decrement avoids the read-modify-write race
    remaining = Sidekiq.redis { |r| r.decr(batch_key) }
    FinalizeGalleryJob.perform_async(batch_key) if remaining.zero?
  end
end

The DECR approach is reliable for counting but lacks Pro's failure accounting — a child that exhausts retries never decrements, so pair it with a dead-letter strategy so a stuck child doesn't wedge the finalizer forever.
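One way to close that gap is to seed the counter before fan-out and to decrement it on permanent failure as well as on success. The sketch below models the idea with an in-memory FakeCounterStore in place of Redis so it runs on its own; FakeCounterStore, GalleryCoordinator, and their method names are hypothetical. In a real job, the permanent-failure path would typically live in Sidekiq's sidekiq_retries_exhausted hook, and the set/decr calls would go through Sidekiq.redis.

```ruby
# In-memory stand-in for the few Redis commands used here (SET, DECR, INCR, GET).
class FakeCounterStore
  def initialize
    @data = Hash.new(0)
  end

  def set(key, value)
    @data[key] = value
  end

  def decr(key)
    @data[key] -= 1
  end

  def incr(key)
    @data[key] += 1
  end

  def get(key)
    @data[key]
  end
end

# Hypothetical coordinator: one counter for remaining children, one for dead ones.
class GalleryCoordinator
  attr_reader :finalized_with

  def initialize(store, batch_key)
    @store = store
    @batch_key = batch_key
    @finalized_with = nil
  end

  # Seed the counter BEFORE enqueuing, so an early child cannot drive it below zero.
  def start(child_count)
    @store.set(@batch_key, child_count)
  end

  # Called at the very end of a successful child.
  def child_succeeded
    finish_one
  end

  # Called once when a child has exhausted its retries.
  def child_died
    @store.incr("#{@batch_key}:dead")
    finish_one
  end

  private

  # Whichever call performs the decrement that reaches zero triggers the finalizer.
  def finish_one
    remaining = @store.decr(@batch_key)
    @finalized_with = { dead: @store.get("#{@batch_key}:dead") } if remaining.zero?
  end
end

store = FakeCounterStore.new
coordinator = GalleryCoordinator.new(store, "gallery:42")
coordinator.start(3)
coordinator.child_succeeded
coordinator.child_died        # a permanently failed child still counts down
coordinator.child_succeeded
puts coordinator.finalized_with.inspect
```

The finalizer can then decide whether a non-zero dead count should block publishing. This sketch does not make the decrement idempotent: a child that decrements and then raises will decrement again on retry, so the decrement should stay the last thing a child does.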

Step 5 — Track batch progress

For a progress bar or status endpoint, query the batch status by its id. Pro exposes Sidekiq::Batch::Status; the custom path reads the Redis counter.

# app/controllers/batches_controller.rb
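def show
  status = Sidekiq::Batch::Status.new(params[:bid])
  total  = status.total
  done   = total - status.pending  # pending counts every child that has not yet succeeded
  render json: {
    total:    total,
    pending:  status.pending,
    failures: status.failures,
    complete: status.complete?,
    # guard against a zero total so the percentage never divides by zero
    percent:  total.zero? ? 0.0 : (done * 100.0 / total).round(1),
  }
end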

Persist the bid returned in Step 1 alongside the owning record (the gallery row) so the UI can look up progress without scanning Redis.

Step 6 — Chain dependent jobs

When stage B simply needs to run after stage A and there is no fan-out, you do not need a batch — just enqueue the next job from the end of the first. Keep the chain explicit so retries of stage A re-trigger stage B correctly.

# app/jobs/import_job.rb
class ImportJob
  include Sidekiq::Job
  def perform(file_id)
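    Importer.call(file_id)  # assumed to persist the imported rows
    # enqueue the next stage only after this one's work is committed;
    # pass the small id rather than the rows, since job args are serialized to JSON in Redis
    ValidateImportJob.perform_async(file_id)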
  end
end

For chains with retries, make each link idempotent so a redelivered ImportJob does not enqueue duplicate ValidateImportJob runs — use a unique job key or a guard on already-imported state.
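A guard of that kind might look like the following sketch. It uses an in-memory Set where production code would need an atomic operation in a shared store, such as Redis SET with the NX option; EnqueueOnce and the key format are hypothetical names used only for illustration.

```ruby
require "set"

# Hypothetical guard: remembers which chain links have already been triggered.
# The Set is a stand-in for an atomic "set if absent" in a shared store.
class EnqueueOnce
  def initialize
    @seen = Set.new
  end

  # Yields only the first time a given key is claimed; returns true if it yielded.
  def claim(key)
    return false unless @seen.add?(key)   # add? returns nil when the key is already present
    yield
    true
  end
end

guard = EnqueueOnce.new
enqueued = []

# Simulate ImportJob running twice for the same file (original run plus a retry).
2.times do
  guard.claim("validate:file-7") { enqueued << "ValidateImportJob(file-7)" }
end

puts enqueued.length
```

If the enqueue itself can fail, release the claim in a rescue so that a later retry is still able to trigger the next stage.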

Verification

Confirm the workflow coordinates correctly before relying on it.

Watch the batch drain and the callback fire in the Sidekiq logs:

# run the worker in the foreground and watch its log output; expect child job completions, then the on_success callback
bundle exec sidekiq -C config/sidekiq.yml

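Assert the callback logic in a test. Sidekiq's inline and fake test modes do not exercise Pro's Redis-backed batch bookkeeping, so do not rely on them to fire callbacks; call the callback class (Step 2's version here) directly instead:

# spec/callbacks/gallery_callbacks_spec.rb
it "publishes the gallery when the batch succeeds" do
  mail = double("mail", deliver_later: true)  # lets .deliver_later be called on the stubbed mailer result
  expect(GalleryMailer).to receive(:ready).with(gallery.id).once.and_return(mail)
  # on_success ignores status, so a bare double is enough here
  GalleryCallbacks.new.on_success(double("status"), { "gallery_id" => gallery.id })
end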

Inspect live batch state from the console:

# rails console
status = Sidekiq::Batch::Status.new(bid)
puts "#{status.pending} of #{status.total} pending, #{status.failures} failed"

Gotchas & edge cases

  • on(:complete) fires even with failures. It runs once every child has been attempted, so using it as the "everything worked" trigger publishes results while some children are still failing or retrying. Use on(:success) for all-or-nothing workflows.
  • Children must be idempotent. A batch child retries like any Sidekiq job; a retried resize that runs twice must not corrupt state or double-decrement a custom counter.
  • A custom DECR counter never recovers from a dead child. If a child exhausts retries it never decrements, so the finalizer never fires. Decrement in an exhausted-retries handler too, or use Pro's failure-aware batches.
  • Pushing thousands of children in one batch.jobs block bursts the client Redis pool. Enqueue in chunks and ensure the client pool is sized for the burst — see Tuning the Sidekiq Redis connection pool.
  • Batch metadata expires. Sidekiq Pro batches have a Redis TTL (default ~30 days). Do not assume a lookup for an old bid returns a usable status: depending on the Pro version it can raise Sidekiq::Batch::NoSuchBatch, so rescue that (or otherwise handle the missing batch) in status endpoints.
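The chunking advice above can be sketched in plain Ruby. The lambda below is a hypothetical recorder standing in for a bulk enqueue call; inside a real batch.jobs block it would be replaced by something like ResizeImageJob.perform_bulk, which is an assumption to check against your Sidekiq version. The slice size of 1,000 is likewise only an example.

```ruby
# Hypothetical recorder standing in for a bulk enqueue call; it only counts jobs per push.
pushes = []
enqueue_bulk = ->(args_list) { pushes << args_list.size }

image_ids = (1..2_500).to_a

# Enqueue in slices so each push is one bounded round trip instead of 2,500 small ones.
image_ids.each_slice(1_000) do |slice|
  enqueue_bulk.call(slice.map { |id| [id] })   # one argument array per job
end

puts pushes.inspect
```

Here 2,500 ids become three pushes of 1,000, 1,000, and 500 jobs.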

Related