Model Callbacks and Background Jobs
In building complex applications we can use callbacks to perform additional actions before / after records are created / updated / deleted. The challenge is that these callbacks often fire additional DB queries, which slows things down as the system scales.
How can we use background jobs to separate saving the primary record from the secondary process of creating/updating other records?
Counter caches
First let’s look at a really simple example. counter_cache is a common pattern for pre-generating data.
class User
  include Mongoid::Document
  has_many :articles
  field :articles_count, type: Integer
end

class Article
  include Mongoid::Document
  belongs_to :user, counter_cache: true
end
Behind the scenes, creating / deleting an Article fires updates to increment / decrement User.articles_count. Now we can sort users by the number of articles they have written without having to join and group by the articles table.
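For example, sorting by the pre-computed counter becomes a single query against the users collection. A minimal sketch, assuming Mongoid (which the field DSL above suggests):

# Top authors by article count, read straight off the counter field.
# No join or aggregation against the articles collection is needed.
top_authors = User.order_by(articles_count: :desc).limit(10)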
Constantly updating the User table can be a problem too, so there is an interesting library, counter-cache, which queues these updates in the background.
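The core idea is easy to sketch ourselves. This is just an illustration of deferring the increment (using ActiveJob, which we will come back to below), not the counter-cache gem's actual API:

class IncrementArticlesCountJob < ApplicationJob
  def perform(user_id, delta)
    # one atomic increment, executed outside the request cycle
    User.find(user_id).inc(articles_count: delta)
  end
end

class Article
  include Mongoid::Document
  belongs_to :user # inline counter_cache: true is dropped; the job owns the update

  after_create  { IncrementArticlesCountJob.perform_later(user_id.to_s, 1) }
  after_destroy { IncrementArticlesCountJob.perform_later(user_id.to_s, -1) }
end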
But counter cache updates are pretty fast, and we have to reach REALLY large scale before they start impacting system performance. Let’s look at a more real-world example.
More complex DB updates
At my day job I work on a fundraising platform where, on behalf of our customers (large universities), we send emails to prospective donors asking for donations. Here are our basic models (rather oversimplified):
class User
  include Mongoid::Document
  field :name
  field :email
  has_many :donations
  has_many :emails
end

class Fundraiser
  include Mongoid::Document
  field :name
  has_many :donations
  has_many :emails
end

class Email
  include Mongoid::Document
  belongs_to :user
  belongs_to :fundraiser
  field :opens, type: Integer
  field :clicks, type: Integer
  field :donated, type: Boolean
end

class Donation
  include Mongoid::Document
  field :amount, type: Money # custom field type
  belongs_to :user
  belongs_to :fundraiser
end
In a previous post I wrote about how we use background jobs to increment opens and clicks so our customers can see which recipients interacted with emails. But the final step in the conversion process is whether a specific email resulted in the user donating money to the fundraiser.
To do that we built a callback in our Donation model.
class Donation
  after_save { update_email_donation }

  def update_email_donation
    # check if the donation was successfully processed by the credit card processor
    # use the unique email_id in the URL to find the email record
    # if email_id is blank/invalid, check whether the email address used during
    # donation matches an existing user and look for an email record
    if email_record.present?
      email_record.update(donated: true)
    end
  end
end
These queries take time, and we do not want to keep the user waiting during the donation process. So we created a background job with ActiveJob.
class Donation
  after_save { update_email_donation }

  def update_email_donation
    # if the donation was processed successfully
    UpdateEmailDonationJob.perform_later(donation: self)
  end
end

class UpdateEmailDonationJob < ApplicationJob
  def perform(donation:)
    # same logic as above to query the DB and update the email record
  end
end
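One thing worth noting about passing the model itself: ActiveJob serializes model arguments with GlobalID and re-fetches the record when the job runs, so the job sees fresh data (assuming the model supports GlobalID, as ActiveRecord and recent Mongoid versions do). With a raw Sidekiq worker we would pass the donation id ourselves.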
We also want the queueing itself to be as fast as possible, otherwise it can still slow down our primary DB update. For that we use Sidekiq, which in turn uses Redis. Alternatively we could have used other queueing solutions such as AWS SQS.
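Wiring ActiveJob to Sidekiq is a one-line config change (standard Rails setup, shown here for completeness):

# config/application.rb
config.active_job.queue_adapter = :sidekiq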
This creates a small delay between the primary record creation and the time when the summarized data is updated, but for us that time difference is insignificant.
The same pattern can be extended to many other tasks (such as generating reporting data in a separate OLAP DB). Instead of running a few large periodic jobs we can constantly run lots of small jobs. Data stays more in sync between the different systems AND there is less risk of a large job not completing before the next one is scheduled to start.
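As a minimal sketch of that idea (ReportingDB is a hypothetical client for the OLAP store, not something from our codebase):

class SyncDonationToReportingJob < ApplicationJob
  queue_as :reporting

  def perform(donation:)
    # upsert one row per donation instead of rebuilding the table nightly
    ReportingDB.upsert(:donations,
      id:            donation.id.to_s,
      amount:        donation.amount.to_s, # assumes the custom Money type serializes cleanly
      user_id:       donation.user_id.to_s,
      fundraiser_id: donation.fundraiser_id.to_s)
  end
end

Each record change enqueues one tiny job, so the reporting side trails the primary DB by seconds rather than by a whole batch cycle.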