In building complex applications we can use callbacks to perform additional actions before or after records are created, updated, or deleted. The challenge is that these callbacks often fire additional DB queries, which slows things down as the application scales.
How can we use background jobs to separate saving the primary record from the secondary process of creating/updating other records?
First let’s look at a really simple example: counter_cache is a common Rails pattern for pre-generating data. Behind the scenes, creating or deleting an Article fires an update to increment or decrement User.articles_count. Now we can sort users by the number of articles they have written without having to do an expensive COUNT on every request. At scale the constant writes to the User table can become a problem too, so there is an interesting library, counter-cache, which queues these updates in the background.
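Here is a minimal sketch of that setup (assuming a `users.articles_count` integer column already exists):

```ruby
class User < ApplicationRecord
  has_many :articles
end

class Article < ApplicationRecord
  # counter_cache: true makes creating / destroying an Article
  # automatically increment / decrement users.articles_count
  belongs_to :user, counter_cache: true
end

# Sorting by article count is now a simple ORDER BY on users,
# no COUNT or JOIN against articles needed:
User.order(articles_count: :desc)
```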
But counter cache updates are pretty fast, and we have to reach REALLY large scale before they start impacting system performance. Let’s look at a more real-world example.
More complex DB updates
At my day job I work on a fundraising platform where, on behalf of our customers (large universities), we send emails to prospective donors asking for donations. Here are our basic models (rather oversimplified):
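Something along these lines, with illustrative field and model names:

```ruby
class Fundraiser < ApplicationRecord
  has_many :emails
  has_many :donations
end

class Email < ApplicationRecord
  belongs_to :fundraiser
  has_many :recipients
end

class Recipient < ApplicationRecord
  belongs_to :email
  # tracks opens / clicks per prospective donor
end

class Donation < ApplicationRecord
  belongs_to :fundraiser
end
```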
In a previous post I wrote about how we use background jobs to increment opens and clicks so our customers can see which recipients interacted with their emails. But the final step in the conversion process is whether a specific email resulted in the user donating money to the fundraiser.
To do that we built a callback in our Donation model that attributes each new donation to the email the donor interacted with:
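A hypothetical sketch of that callback (the column and rollup names here are illustrative, not our actual schema):

```ruby
class Donation < ApplicationRecord
  belongs_to :fundraiser

  after_create :attribute_to_email

  def attribute_to_email
    # Several multi-table lookups: find the recipient matching this
    # donor within the fundraiser, then bump the email's rollup count.
    recipient = Recipient.joins(:email)
                         .where(emails: { fundraiser_id: fundraiser_id })
                         .find_by(address: donor_email)
    recipient&.email&.increment!(:donations_count)
  end
end
```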
These queries take time, and we do not want to keep the user waiting during the donation process. So we moved the work into a background job with ActiveJob:
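A sketch of what that looks like (DonationAttributionJob is an illustrative name):

```ruby
class DonationAttributionJob < ApplicationJob
  queue_as :default

  def perform(donation_id)
    # the slow lookups now run outside the user's request
    Donation.find(donation_id).attribute_to_email
  end
end

class Donation < ApplicationRecord
  # enqueue after commit so the job can never run before
  # the donation row is visible to other connections
  after_create_commit { DonationAttributionJob.perform_later(id) }
end
```

Using after_create_commit (rather than after_create) matters here: it guarantees the record is committed before the job can pick it up.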
We also want the queueing itself to be as fast as possible, otherwise it can still slow down our primary DB update. For that we use Sidekiq, which in turn uses Redis. Alternatively we could have used other queueing solutions such as AWS SQS.
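Assuming a standard Rails setup, pointing ActiveJob at Sidekiq is a one-line adapter setting:

```ruby
# config/application.rb
config.active_job.queue_adapter = :sidekiq
```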
This creates a small delay between the primary record creation and the time when the summarized data is updated, but for us that time difference is insignificant.
The same pattern can be extended to many other tasks (such as generating reporting data in a separate OLAP DB). Instead of running a few large periodic jobs we can constantly run lots of small ones. Data stays more in sync between the different systems, AND we are less likely to hit issues where a large job fails to complete before the next one is scheduled to start.