I am a Sr. Software Developer at Oracle Cloud. The opinions expressed here are my own and not necessarily those of my employer.
What is the right size for background jobs?
In a previous post I wrote about pre-generating cache via background jobs. I described an example of an online banking app where we pre-generate a cache of `recent_transactions`. This helps even out load on the system by pushing some of the data into cache before visitors come to the site. In this post I will compare a few ways to structure these jobs:
- One job for all records
- One job for each record
- Loop through records in slices
- Different queues and workers
One job for all records
The simplest design is to loop through all records in one job.
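A minimal sketch of what that might look like (the job class and the `generate_recent_transactions_cache` method are hypothetical names used for illustration):

```ruby
# One job that walks every record and regenerates its cached data.
class RegenerateAllCachesJob < ApplicationJob
  queue_as :default

  def perform
    MyModel.find_each do |record|
      # Hypothetical method that rebuilds the recent_transactions cache for this record.
      record.generate_recent_transactions_cache
    end
  end
end
```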
The downside of this approach is that if we have millions of `MyModel` records it can take a very long time for this job to complete. And what if we need to deploy code that restarts the background job workers? We won’t know which records have been processed and which have not. Best practices for background jobs recommend keeping them small and idempotent.
One job for each record
We can queue one job per record by separating our code into two jobs.
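A sketch of the two jobs, reusing the same hypothetical cache method as above:

```ruby
# Parent job: only enqueues work, one child job per record.
class QueueRecordCacheJobsJob < ApplicationJob
  def perform
    MyModel.find_each do |record|
      RegenerateRecordCacheJob.perform_later(record.id)
    end
  end
end

# Child job: looks up a single record by ID and regenerates its cache.
class RegenerateRecordCacheJob < ApplicationJob
  def perform(record_id)
    MyModel.find(record_id).generate_recent_transactions_cache
  end
end
```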
Each job will complete very quickly and they will run in parallel. Since it is not recommended to serialize complete objects into the queue, we will use some kind of record identifier (such as a GlobalID). But this will cause a lot of queries against the primary DB to look up records one at a time.
Loop through records in slices
And now we come to the Goldilocks solution - not too big and not too small. We want to break up the process into smaller chunks but instead of processing one record at a time we will process several (let’s say 10).
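A sketch of the sliced version (again with hypothetical names), plucking the IDs and enqueuing one job per slice of 10:

```ruby
# Parent job: slices the full list of IDs into groups of 10.
class QueueCacheSlicesJob < ApplicationJob
  def perform
    MyModel.pluck(:id).each_slice(10) do |ids|
      RegenerateSliceCacheJob.perform_later(ids)
    end
  end
end

# Child job: regenerates the cache for one slice of records with a single lookup query.
class RegenerateSliceCacheJob < ApplicationJob
  def perform(ids)
    MyModel.where(id: ids).each do |record|
      record.generate_recent_transactions_cache
    end
  end
end
```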
One downside of this approach is that `pluck` will request IDs for ALL records from the primary DB, store them in an array, and loop through them. Different ORMs support a `batch_size` option for querying records, so we can do the equivalent of `select id from TableName limit 10 offset ...`.
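With ActiveRecord, for example, `in_batches` can replace the big `pluck` so that only one batch of IDs is loaded at a time (a sketch; note that ActiveRecord batches by primary key ranges rather than OFFSET):

```ruby
class QueueCacheSlicesJob < ApplicationJob
  def perform
    # Loads 10 records' worth of IDs per query instead of all IDs up front.
    MyModel.in_batches(of: 10) do |relation|
      RegenerateSliceCacheJob.perform_later(relation.pluck(:id))
    end
  end
end
```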
Different queues and workers
The same approach can be applied to other situations (not just cache pre-generation). When a record is created or updated we might have a callback (see the previous post) to update various reports. The primary `UpdateReportsJob` will be called from the `after_save` callback. We want it to complete as quickly as possible and queue a separate `UpdateEachReportJob`, passing the appropriate report ID. We can process these jobs through separate queues, as sketched below.
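Here is one way the two jobs and their queues could be wired up (the `Report` model, its `source_id` column, and the `regenerate` method are placeholders for illustration):

```ruby
# Enqueued from the after_save callback; finishes fast because it only queues more jobs.
class UpdateReportsJob < ApplicationJob
  queue_as :high

  def perform(record_id)
    # Placeholder lookup of the reports affected by this record.
    Report.where(source_id: record_id).pluck(:id).each do |report_id|
      UpdateEachReportJob.perform_later(report_id)
    end
  end
end

# Does the heavy lifting for a single report on the low-priority queue.
class UpdateEachReportJob < ApplicationJob
  queue_as :low

  def perform(report_id)
    Report.find(report_id).regenerate
  end
end
```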
We can even assign dedicated Sidekiq workers to watch only specific queues. Here is a sample configuration for capistrano-sidekiq:
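A sketch of what this could look like in `config/deploy.rb`, assuming a capistrano-sidekiq version that supports `:sidekiq_options_per_process`:

```ruby
# config/deploy.rb
set :sidekiq_processes, 4

# First process watches only the high queue; the other three share default and low.
set :sidekiq_options_per_process, [
  "--queue high",
  "--queue default --queue low",
  "--queue default --queue low",
  "--queue default --queue low"
]
```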
This way each server will have a dedicated process watching only the `high` queue to ensure that those jobs complete as quickly as possible and do not get backlogged. The other three workers will process the `default` queue (used for other jobs) and the `low` queue (used for reports).