Sidekiq batches
Recently I was finally able to implement a background job importer (see my previous post on Importing LOTS of data). It proved to be very interesting and challenging. I have been successfully running Sidekiq in production for several months, but only with individual jobs. For this I needed to run batches of jobs, temporarily store the results of each job, and email the combined results when all jobs were done.
Sidekiq Pro supports the concept of batches, but we did not need all the extra features, and batching was only part of our challenge. I also came across the active_job_status gem but did not use it. So here is the solution that I went with.
When a user uploads a spreadsheet with records, I parse it one row at a time and queue up a job for each row. But first I set up these batch parameters:
def setup_batch_params(num_rows, current_user)
  @batch_id = "#{Time.now.to_i}_#{current_user.id}"
  REDIS.set("#{@batch_id}:size", num_rows)            # total number of jobs
  REDIS.set("#{@batch_id}:counter", num_rows)         # decremented by each job
  REDIS.set("#{@batch_id}:owner", current_user.email) # email results to
end
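For completeness, here is a rough sketch of the upload step itself. The roo gem, the import_spreadsheet method, and the ImportRowJob class are my stand-ins for illustration, not something prescribed above:

require 'roo'

def import_spreadsheet(file_path, current_user)
  sheet = Roo::Spreadsheet.open(file_path)
  rows = sheet.each_row_streaming(offset: 1).to_a # skip the header row
  setup_batch_params(rows.size, current_user)
  rows.each do |row|
    # each spreadsheet row becomes its own job (ImportRowJob is hypothetical)
    ImportRowJob.perform_later(@batch_id, row.map(&:value))
  end
end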
After processing each row of data, I take the result and run the code below. This could be called via an after_perform ActiveJob callback:
def after_process_row(row, result)
  # result is either :success or :error; store the row data in a Redis list
  REDIS.rpush("#{@batch_id}:#{result}", row.to_json)
  # DECR is atomic and returns the new value, so only the job that takes
  # the counter to 0 (the last one) triggers the batch wrap-up
  after_process_batch if REDIS.decr("#{@batch_id}:counter") <= 0
end
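To make the callback idea concrete, here is a minimal sketch of the job class. ImportRowJob and import_row are hypothetical names; the only thing the setup above requires is that after_process_row runs once per completed job:

class ImportRowJob < ApplicationJob
  queue_as :default
  attr_reader :result # :success or :error, set by perform

  # ActiveJob runs this block after perform returns without raising
  after_perform do |job|
    job.after_process_row(job.arguments.last, job.result)
  end

  def perform(batch_id, row)
    @batch_id = batch_id
    @result = import_row(row) ? :success : :error # import_row is a placeholder
  end
end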
I used the axlsx gem to create the output XLSX file with a success_sheet and an error_sheet.
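Roughly, the workbook setup could look like this; the sheet names and output path are my guesses, since only the add_row calls appear in the original code:

require 'axlsx'

package = Axlsx::Package.new
success_sheet = package.workbook.add_worksheet(name: 'Success')
error_sheet = package.workbook.add_worksheet(name: 'Errors')
# rows are added in after_process_batch below, then the file is written with:
package.serialize("import_results_#{@batch_id}.xlsx")

The after_process_batch method then fills the sheets and emails the file: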
def after_process_batch
  # process the success queue
  REDIS.lrange("#{@batch_id}:success", 0, -1).each do |record|
    success_sheet.add_row(JSON.parse(record))
  end
  # process the error queue
  REDIS.lrange("#{@batch_id}:error", 0, -1).each do |record|
    error_sheet.add_row(JSON.parse(record))
  end
  # look up the owner's email address and send the results
  email_to = REDIS.get("#{@batch_id}:owner")
  YourMailer.send_results(...)
  # expire all 5 batch keys after a week, useful to keep them around just in case
  %w[size counter owner success error].each do |key|
    REDIS.expire("#{@batch_id}:#{key}", 60 * 60 * 24 * 7)
  end
end
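For reference, a bare-bones version of the mailer might look like this; the method name comes from the snippet above, but the arguments and subject line are placeholders:

class YourMailer < ApplicationMailer
  # emails the generated XLSX file to the batch owner
  def send_results(email_to, xlsx_path)
    attachments[File.basename(xlsx_path)] = File.read(xlsx_path)
    mail(to: email_to, subject: 'Your import results')
  end
end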
So far we have tested it on several imports (the biggest was over 50K rows), and Sidekiq just worked through each job and emailed the results when done. Much better than the old solution, where a server restart during a deploy would stop the import process.
The slow part turned out to be queueing the jobs, so I might turn the queueing itself into a background job. I also plan to set up different priority queues: we have other jobs running all the time and I don't want this import to block them.
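A rough sketch of both ideas together, reusing the hypothetical import_spreadsheet helper from earlier; the EnqueueImportJob class and the :import queue name are my placeholders, not part of the current implementation:

class EnqueueImportJob < ApplicationJob
  # a dedicated queue that can be given a low weight in sidekiq.yml,
  # so the import never starves the other jobs
  queue_as :import

  def perform(file_path, user_id)
    # parsing and enqueueing 50K+ rows is too slow for a web request,
    # so the enqueueing itself runs in the background
    import_spreadsheet(file_path, User.find(user_id))
  end
end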