Batch email sending
In our application we send out LOTS of emails, and our clients need to control when the emails are sent and their exact content. Here is a previous post on how we first attempted to solve this. We later switched to the SendGrid bulk sending API to avoid making an individual API call for every email.
Here are the basic models implemented with Mongoid:
# app/models/client.rb
class Client
  include Mongoid::Document

  has_many :users
  has_many :newsletters
end

# app/models/user.rb
class User
  include Mongoid::Document

  belongs_to :client

  field :email, type: String
end

# app/models/newsletter.rb
class Newsletter
  include Mongoid::Document
  extend Enumerize

  belongs_to :client

  field :subject, type: String
  field :body,    type: String
  field :sent_at, type: Time
  field :status,  type: String, default: :draft

  enumerize :status, in: [:draft, :approved, :sending, :sent]

  scope :to_send, -> { where(status: :approved, :sent_at.lte => Time.now) }
end
We created a simple job and set up cron to run SendNewslettersJob.perform_later every 5 minutes. If there are no newsletters to send, it does nothing.
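As a point of reference, the scheduling itself might look like this with the whenever gem (an assumption on our part; any scheduler, such as sidekiq-scheduler or a plain crontab entry, works just as well):

# config/schedule.rb
# Sketch using the whenever gem (assumed); it generates the crontab entry.
every 5.minutes do
  runner 'SendNewslettersJob.perform_later'
end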
# app/jobs/send_newsletters_job.rb
class SendNewslettersJob < ApplicationJob
  def perform
    Newsletter.to_send.each do |newsletter|
      newsletter.update(status: :sending)
      Sendgrid.new.perform(newsletter)
      newsletter.update(status: :sent)
    end
  end
end
# app/services/sendgrid.rb
class Sendgrid
  def perform(newsletter)
    users = newsletter.client.users
    # pass user emails and newsletter subject/body to the SendGrid API
  end
end
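The API call itself is elided above. As a minimal sketch, a bulk send via the sendgrid-ruby gem could look roughly like the following, with one personalization per recipient so users do not see each other's addresses (the sender address and ENV['SENDGRID_API_KEY'] are assumptions):

# Hypothetical body for Sendgrid#perform, using the sendgrid-ruby gem.
require 'sendgrid-ruby'

mail = SendGrid::Mail.new
mail.from = SendGrid::Email.new(email: 'news@example.com') # assumed sender
mail.subject = newsletter.subject
mail.add_content(SendGrid::Content.new(type: 'text/html', value: newsletter.body))

# One personalization per user keeps each recipient in their own envelope.
users.each do |user|
  personalization = SendGrid::Personalization.new
  personalization.add_to(SendGrid::Email.new(email: user.email))
  mail.add_personalization(personalization)
end

sendgrid = SendGrid::API.new(api_key: ENV['SENDGRID_API_KEY'])
sendgrid.client.mail._('send').post(request_body: mail.to_json)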
The problem with this approach is that a newsletter might go to 100 users or to 100K users. The process runs sequentially, so one large sending can delay all the others. It is also best to pass email addresses to SendGrid in reasonably sized chunks (say, 100 at a time).
The first step is to break up each newsletter sending into a separate job so they can run in parallel.
# app/jobs/send_newsletters_job.rb
class SendNewslettersJob < ApplicationJob
  def perform
    Newsletter.to_send.each do |newsletter|
      newsletter.update(status: :sending)
      SendEachNewsletterJob.perform_later(newsletter)
    end
  end
end
# app/jobs/send_each_newsletter_job.rb
class SendEachNewsletterJob < ApplicationJob
  def perform(newsletter)
    Sendgrid.new.perform(newsletter)
    newsletter.update(status: :sent)
  end
end
Next let’s change it so each sending goes to a group of 100 users.
# app/jobs/send_each_newsletter_job.rb
class SendEachNewsletterJob < ApplicationJob
  def perform(newsletter)
    user_ids = newsletter.client.users.pluck(:_id)
    # pass false so the last group is not padded with nils
    user_ids.in_groups_of(100, false).each do |user_id_group|
      SendNewsletterUserGroupJob.perform_later(newsletter, user_id_group)
    end
    newsletter.update(status: :sent)
  end
end
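Note that ActiveSupport's in_groups_of pads the final group with nil by default, which would pass nil IDs into the job; passing false as the second argument avoids that:

# in_groups_of pads by default; false disables the padding
[1, 2, 3, 4, 5].in_groups_of(2)        # => [[1, 2], [3, 4], [5, nil]]
[1, 2, 3, 4, 5].in_groups_of(2, false) # => [[1, 2], [3, 4], [5]]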
# app/jobs/send_newsletter_user_group_job.rb
class SendNewsletterUserGroupJob < ApplicationJob
  def perform(newsletter, user_ids)
    # sendgrid code here
  end
end
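The sending body is elided above. As a minimal sketch, assuming we extend the Sendgrid service to accept the explicit list of users for the group (an assumed signature), it might look like:

# Hypothetical perform body: load this group's users and delegate to
# the Sendgrid service (assumed here to take a users argument).
def perform(newsletter, user_ids)
  users = User.where(:_id.in => user_ids)
  Sendgrid.new.perform(newsletter, users)
end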
One problem with this approach is the final newsletter.update(status: :sent). We have not actually sent the emails to the users yet; the jobs are merely queued. What we really want is to run each sending job and update the newsletter status when the last job completes.
We need to record the IDs of all individual jobs in the batch. I like using Redis for storing this kind of ephemeral data, and for a unique list of IDs Redis SETs are a good fit.
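As a quick illustration of the set operations used below (SADD adds members, SREM removes one, SCARD counts what is left), here is a sketch with the redis gem and a throwaway key:

# Throwaway example of the Redis SET operations relied on below.
redis = Redis.new
redis.sadd('demo_batch', 'job-1') # add members to the set
redis.sadd('demo_batch', 'job-2')
redis.srem('demo_batch', 'job-1') # remove a member
redis.scard('demo_batch')         # => 1, one member remaining
redis.del('demo_batch')           # delete the whole key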
# config/initializers/redis.rb
redis_conn = Redis.new(host: 'localhost', port: 6379, db: 0)
SEND_NEWSLETTER_BATCH = Redis::Namespace.new('news_batch', redis: redis_conn)
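Redis::Namespace comes from the redis-namespace gem, so both it and the redis gem need to be declared:

# Gemfile
gem 'redis'
gem 'redis-namespace'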
We create a unique batch_id, grab each job_id, and record them using SADD.
# app/jobs/send_each_newsletter_job.rb
class SendEachNewsletterJob < ApplicationJob
  def perform(newsletter)
    newsletter.update(status: :sending)
    user_ids = newsletter.client.users.pluck(:_id)
    batch_id = SecureRandom.uuid
    user_ids.in_groups_of(100, false).each do |user_id_group|
      job = SendNewsletterUserGroupJob.perform_later(newsletter, user_id_group, batch_id)
      # record the job ID in the Redis SET for this batch
      SEND_NEWSLETTER_BATCH.sadd(batch_id, job.job_id)
    end
  end
end
Now each sending job, upon completion, can remove its own job ID from Redis and check whether any other jobs are left.
# app/jobs/send_newsletter_user_group_job.rb
class SendNewsletterUserGroupJob < ApplicationJob
  after_perform :batch_tasks

  def perform(newsletter, user_ids, batch_id)
    # ...
  end

  private

  def batch_tasks
    # the callback runs on the job instance, so we can read the
    # deserialized arguments and job_id straight off the job
    newsletter, _user_ids, batch_id = arguments
    # remove our own ID
    SEND_NEWSLETTER_BATCH.srem(batch_id, job_id)
    # if no other IDs are left, this was the last job in the batch
    if SEND_NEWSLETTER_BATCH.scard(batch_id) == 0
      newsletter.update(status: :sent)
      SEND_NEWSLETTER_BATCH.del(batch_id)
    end
  end
end
We can now consolidate our jobs so SendNewslettersJob calls SendNewsletterUserGroupJob directly.
# app/jobs/send_newsletters_job.rb
class SendNewslettersJob < ApplicationJob
  def perform
    Newsletter.to_send.each do |newsletter|
      newsletter.update(status: :sending)
      user_ids = newsletter.client.users.pluck(:_id)
      batch_id = SecureRandom.uuid
      user_ids.in_groups_of(100, false).each do |user_id_group|
        job = SendNewsletterUserGroupJob.perform_later(newsletter, user_id_group, batch_id)
        # record the job ID in the Redis SET for this batch
        SEND_NEWSLETTER_BATCH.sadd(batch_id, job.job_id)
      end
    end
  end
end
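To exercise the whole flow from a Rails console, here is a sketch (the client lookup and newsletter values are illustrative):

# Illustrative walkthrough; names and values are assumptions.
client = Client.first
Newsletter.create!(
  client:  client,
  subject: 'March news',
  body:    '<p>Hello!</p>',
  status:  :approved,
  sent_at: Time.now
)

SendNewslettersJob.perform_later
# queues one SendNewsletterUserGroupJob per 100 users; the last one to
# finish flips the newsletter status to :sent and deletes the batch key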
Also, here is a relevant post on using Sidekiq batches for data import.