I was recently able to finally implement a background job importer (see my previous post on Importing LOTS of data). It proved to be both interesting and challenging. I have been running Sidekiq in production for several months, but only for individual jobs. For this importer I needed to run batches of jobs, temporarily store the result of each job, and email the results once all jobs were done.
Sidekiq Pro supports the concept of batches, but we did not need all the extra features, and batching was only part of our challenge. I also came across the active_job_status gem but did not use it. So here is the solution I went with.
When a user uploads a spreadsheet of records, I parse it one row at a time and queue a job for each row. Before queueing anything, I set up the parameters for the batch.
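A minimal sketch of what that setup could look like. Everything named here is an assumption for illustration: `BatchStore` is an in-memory stand-in for a shared store (Redis would be a natural fit next to Sidekiq), the key names are invented, and `enqueue` stands for whatever actually pushes the job, such as a wrapper around `perform_later`.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for a shared store such as Redis,
# so the sketch runs without any external service.
class BatchStore
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end
end

# Hypothetical helper: stores the batch parameters first, then queues one job per row.
# `enqueue` is any callable that pushes a single job.
def start_batch(store, rows, email:, enqueue:)
  batch_id = SecureRandom.uuid
  store.set("batch:#{batch_id}:total", rows.size)
  store.set("batch:#{batch_id}:pending", rows.size)
  store.set("batch:#{batch_id}:email", email)
  rows.each_with_index do |row, index|
    enqueue.call(batch_id, index, row)
  end
  batch_id
end

# Usage with a fake queue that just collects the jobs.
store = BatchStore.new
queued = []
rows = [%w[Alice alice@example.com], %w[Bob bob@example.com]]
batch_id = start_batch(store, rows, email: 'admin@example.com',
                       enqueue: ->(id, index, row) { queued << [id, index, row] })
```

Writing the counters before the first job is queued matters: a fast worker could otherwise finish a row before the batch knows how many rows to expect.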
After processing each row of data, I take the result and run the code that updates the batch. This could be called from an ActiveJob after_perform callback.
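One way that per-row step might look, as a sketch under stated assumptions: `BatchStore`, `record_result`, the key names and the result hash shape are all hypothetical, and the `on_complete` callable stands in for the mailer. The store is in-memory so the example runs on its own; with Redis, the decrement would map to an atomic `DECR`.

```ruby
# Hypothetical in-memory stand-in for a shared store such as Redis.
# A Mutex keeps the decrement atomic, as a Redis DECR would be.
class BatchStore
  def initialize
    @data = {}
    @lock = Mutex.new
  end

  def set(key, value)
    @lock.synchronize { @data[key] = value }
  end

  def push(key, value)
    @lock.synchronize { (@data[key] ||= []) << value }
  end

  def list(key)
    @lock.synchronize { (@data[key] || []).dup }
  end

  # Returns the counter value after decrementing.
  def decrement(key)
    @lock.synchronize { @data[key] -= 1 }
  end
end

# Hypothetical per-row hook: stores the result and, once the pending counter
# reaches zero, hands every stored result to `on_complete` (e.g. a mailer).
def record_result(store, batch_id, result, on_complete:)
  store.push("batch:#{batch_id}:results", result)
  remaining = store.decrement("batch:#{batch_id}:pending")
  on_complete.call(store.list("batch:#{batch_id}:results")) if remaining.zero?
  remaining
end

# Usage: a batch of two rows; the callback fires only after the second one.
store = BatchStore.new
store.set('batch:demo:pending', 2)
delivered = []
notify = ->(results) { delivered << results }

record_result(store, 'demo', { row: 1, ok: true }, on_complete: notify)
record_result(store, 'demo', { row: 2, ok: false, error: 'Email is invalid' }, on_complete: notify)
```

Because only the job that brings the counter to zero triggers the callback, the email goes out once regardless of the order in which workers finish.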
I used the axlsx gem to create the output XLSX file with a success_sheet and an error_sheet.
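As a rough illustration of the step before the file is written, the stored results can be split into rows for the two sheets. The result shape and the `build_sheet_rows` helper are assumptions, and the axlsx calls are left out so the sketch runs without the gem; with axlsx, each inner array would be passed to `add_row` on a worksheet created with `add_worksheet`.

```ruby
# Hypothetical result shape: { row: Integer, ok: true/false, error: String or nil }.
# Returns header-plus-data rows for each sheet, ready to be written out.
def build_sheet_rows(results)
  successes, failures = results.partition { |result| result[:ok] }
  {
    success_sheet: [['Row']] + successes.map { |result| [result[:row]] },
    error_sheet: [%w[Row Error]] + failures.map { |result| [result[:row], result[:error]] }
  }
end

sheets = build_sheet_rows([
  { row: 1, ok: true },
  { row: 2, ok: false, error: 'Email is invalid' },
  { row: 3, ok: true }
])
```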
So far we have tested it on several imports (the biggest was over 50K rows), and Sidekiq just worked through each job and emailed the results when done. This is much better than the old solution, where a server restart caused by a deploy stopped the import process.
The slow part turned out to be queueing the jobs, so I might turn that step into a background job itself. I also plan to set up different priority queues: we have other jobs running all the time, and I don't want this process to block them.