I am a Sr. Software Developer at Oracle Cloud. The opinions expressed here are my own and not necessarily those of my employer.
Redis and async microservices - part deux
I previously wrote about Microservices with Sidekiq and Redis and async microservices. In this post I will continue expanding on those ideas.
I spent a number of years working in internet advertising and will use relevant examples from my past experience (appropriately abstracted into more general use cases). A large-scale ad platform can serve billions of ads and process millions of clicks per day. You need to be able to quickly cap accounts as they run out of budget. When an end user clicks on an ad, the request goes to the click server, which records the click and forwards the user to the destination.
You also need a UI to manage ads. A typical ad will contain the following attributes: CPC (cost per click), budget, title, body and a link to the destination site. For each click you usually track the IP, User Agent, URL of the page where the click took place and when it happened.
Separately you track impressions, but you can aggregate the data by hour to see how often the ad was shown in that period of time. Recording each impression individually would put significant load on your DB. It can also be useful to aggregate which keywords you are getting ad requests for. All this information helps you analyze ad performance. To demo these concepts I built a sample app.
UI
It is built on top of Rails 5 with a SQL DB and a RailsAdmin CRUD dashboard, so you can view the Ads, Clicks and Impressions tables at http://localhost:3001/admin. After cloning the repo, run:
cd ui && bundle && rake db:seed && rails s -p 3001
You can then log in with admin@email.com / password.
The basic models are:
class Ad < ApplicationRecord
  has_many :clicks
  has_many :impressions
end

class Click < ApplicationRecord
  # records every click
  belongs_to :ad
  validates :ad, presence: true
end

class Impression < ApplicationRecord
  # records impressions counter for ad, date and hour
  belongs_to :ad
  validates :ad, presence: true
end
Ad Server
It is built using Rails 5 API and only talks to Redis (not the SQL DB). After cloning the repo, run:
cd adserver && bundle && rails s
To keep the controller light we move the logic into a GetAds service object.
# app/controllers/ad_controller.rb
class AdController < ApplicationController
  def index
    ads = GetAds.new(keyword: request.params[:kw]).perform
    render json: ads
  end
end

# config/initializers/redis.rb
redis_conn = Redis.new(host: Rails.application.config.redis_host, port: 6379, db: 0, driver: :hiredis)
# I prefer using namespaces to separate Redis keys
REDIS_ADS = Redis::Namespace.new('ads', redis: redis_conn)

# app/services/get_ads.rb
class GetAds
  # you will see later why @keyword and @ads are instance variables
  def initialize(keyword:)
    @keyword = keyword
  end

  def perform
    @ads = REDIS_ADS.smembers @keyword
    return @ads
  end
end
Ads are stored in a Redis SET with the keyword as the key and the ads as SET members. When you browse to http://localhost:3000/?kw=keyword1 the Ad controller responds with JSON:
[
  "{"ad_id":11,"title":"title 1","body":"body 1","cpc":4,
  "link":"http://localhost:3000/click?ad_id=11&url=aHR0cDovL3dlYnNpdGU3LmNvbQ"}",
  ...
]
The url param in link is a simple Base64 encoding of the destination URL for that ad. In a real ad server you would have complex logic to show the best match, that is, the ad most likely to result in a click.
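The token in the example link has no `=` padding and no trailing newline, which is what a URL-safe, unpadded Base64 encoding produces (plain `Base64.encode64` appends both). A minimal sketch of the round trip, assuming Ruby 2.3 or later; the helper names `encode_destination` and `decode_destination` are hypothetical and not part of the sample app:

```ruby
require "base64"

# Hypothetical helpers; the names are illustrative only.
def encode_destination(url)
  # URL-safe alphabet, no "=" padding and no trailing newline.
  Base64.urlsafe_encode64(url, padding: false)
end

def decode_destination(token)
  # urlsafe_decode64 accepts unpadded input on Ruby 2.3+.
  Base64.urlsafe_decode64(token)
end

token = encode_destination("http://website7.com")
puts token                      # aHR0cDovL3dlYnNpdGU3LmNvbQ
puts decode_destination(token)  # http://website7.com
```

A token built this way can be dropped into a query string without further escaping, since it only uses letters, digits, `-` and `_`.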
Ads Cache
Redis is a great cache for storing ads. To populate it we use a callback in the UI app's Ad model.
class Ad < ApplicationRecord
  after_save :update_ads_cache

  private

  def update_ads_cache
    # => check if any important attributes changed
    # (the trailing || keeps the condition on one logical line; a line that
    # starts with `or` would not parse as a continuation)
    return unless keywords_changed? || cpc_changed? || budget_changed? ||
                  title_changed? || body_changed? || link_changed?
    # keywords are comma separated strings
    REDIS_ADS.pipelined do
      keywords.split(',').each do |kw|
        REDIS_ADS.srem kw, ad_content # => remove ad
        # => insert ad if there is budget
        REDIS_ADS.sadd kw, ad_content if budget > 0
      end
    end
  end

  def ad_content
    # => encode link into redirect URL
    {ad_id: id, title: title, body: body, cpc: cpc, link: redirect_url}.to_json
  end

  def redirect_url
    query = { ad_id: id, url: Base64.encode64(link) }.to_query
    "http://localhost:3000/click?#{query}"
  end
end
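REDIS_ADS.sadd and REDIS_ADS.srem add and remove the appropriate ads. A Redis SET can hold up to 4294967295 (2^32 - 1) members, so that is the maximum number of ads per keyword. SADD is O(1) for each member added; it is O(N) only in the sense that adding N members in one call takes N steps.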
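One caveat worth checking: a SET member is matched by its exact string, so calling SREM with the new JSON does not remove a member that was stored with the old title or CPC. The sketch below illustrates this with a hypothetical in-memory stand-in (`FakeSetStore`) instead of Redis, and a hypothetical `refresh_ad` helper that receives both the previous and the current content. How the previous content would be obtained inside the callback is left open here.

```ruby
require "set"
require "json"

# Hypothetical in-memory stand-in for the three SET commands used above.
class FakeSetStore
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def sadd(key, member)
    @sets[key].add(member)
  end

  def srem(key, member)
    @sets[key].delete(member)
  end

  def smembers(key)
    @sets[key].to_a
  end
end

# Remove the previously cached JSON (if any), then re-add the current JSON
# only while the ad still has budget.
def refresh_ad(store, keyword, old_content, new_content, budget)
  store.srem(keyword, old_content) if old_content
  store.srem(keyword, new_content)
  store.sadd(keyword, new_content) if budget > 0
end

store = FakeSetStore.new
old_json = { ad_id: 11, title: "title 1", cpc: 4 }.to_json
new_json = { ad_id: 11, title: "title 1", cpc: 5 }.to_json

refresh_ad(store, "keyword1", nil, old_json, 100)      # first save
refresh_ad(store, "keyword1", old_json, new_json, 100) # CPC changed
puts store.smembers("keyword1").length                 # 1
```

Without the first `srem`, the keyword would end up holding both versions of the ad after the CPC change.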
Click Processing
When the end user clicks the link http://localhost:3000/click?ad_id=88&url=aHR0cDovL3dlYnNpdGU3LmNvbQ the request is routed to the Click controller (part of the Ad Server here, but it could live inside the UI or in a separate app).
class ClickController < ApplicationController
  def index
    ProcessClickJob.perform_later({ad_id: request.params[:ad_id]})
    redirect_url = Base64.decode64(request.params[:url])
    redirect_to redirect_url
  end
end

class ProcessClickJob < ApplicationJob
  queue_as :click

  def perform(ad_id:)
    # simply queue the job
  end
end
Notice the special click queue, which you can set to high priority in Sidekiq. Queueing the job with Redis/Sidekiq is very fast. To actually process the click we have ProcessClickJob in the UI app; in a true microservice architecture it could be a separate application. It records the click and decrements the ad budget (which triggers update_ads_cache).
class ProcessClickJob < ApplicationJob
  queue_as :click

  def perform(*args)
    ad_id = args.first[:ad_id].to_i
    ad = Ad.find(ad_id)
    # decrement ad budget
    ad.update(budget: ad.budget - ad.cpc)
    # record the click
    ad.clicks.create
  end
end
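To make the capping behaviour concrete, here is a plain-Ruby sketch with no Rails or Redis involved. `FakeAd` and `process_click` are hypothetical stand-ins that mirror only the two steps of the job above, and the return value stands for the `budget > 0` check in update_ads_cache. The numbers are made up for illustration.

```ruby
# Hypothetical stand-in for the Ad model; only the fields the job touches.
FakeAd = Struct.new(:id, :budget, :cpc, :click_count)

# Decrement the budget, record the click, and report whether the ad would
# stay in the ads cache (the cache only keeps ads with budget > 0).
def process_click(ad)
  ad.budget -= ad.cpc
  ad.click_count += 1
  ad.budget > 0
end

ad = FakeAd.new(88, 10, 4, 0)
3.times do
  still_served = process_click(ad)
  puts "budget=#{ad.budget} still_served=#{still_served}"
end
# budget=6 still_served=true
# budget=2 still_served=true
# budget=-2 still_served=false
```

With these numbers the third click takes the budget below zero before the ad is pulled, so with this simple subtraction an ad can overspend by up to one CPC.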
Data storage in Redis
So now we have seen how data flows between the UI and the Ad Server via Redis. The UI writes to Redis directly through a model callback, and the Ad Server queues a Sidekiq background job. But we also want to aggregate stats on how many impressions we served and which keywords are getting requests. How can Redis help us with that?
Temporary data storage
We add a method to the GetAds class in the Ad Server. It loops through @ads and increments Redis counters with keys that look like AD_ID:20160922:HOUR. Redis helps us count impressions with minimal impact on ad serving.
# config/initializers/redis.rb
REDIS_IMPR = Redis::Namespace.new('impr', redis: redis_conn)

# app/services/get_ads.rb
class GetAds
  def perform
    @ads = REDIS_ADS.smembers @keyword
    record_impressions
    return @ads
  end

  private

  # keep track of the number of impressions for each ad by hour. Data gets moved into main DB
  def record_impressions
    # => current date and hour
    date_hour = Time.now.strftime("%Y%m%d:%H")
    REDIS_IMPR.pipelined do
      @ads.each do |ad|
        # => grab ad_id from each JSON
        ad2 = JSON.parse(ad)
        ad_id = ad2['ad_id']
        key = [ad_id, date_hour].join(':')
        REDIS_IMPR.incr key
      end
    end
  end
end
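Because the Ad Server writes these keys and the UI app later reads them, it can help to keep the key format in one pair of functions. A small sketch; `impression_key` and `parse_impression_key` are hypothetical names, and passing a UTC time is an assumption made here so that both apps agree on the hour bucket (the code above uses Time.now, which follows the server's local zone).

```ruby
# Hypothetical helpers for the counter key format AD_ID:YYYYMMDD:HH.
def impression_key(ad_id, time)
  [ad_id, time.strftime("%Y%m%d:%H")].join(":")
end

# Split a key back into its three parts.
def parse_impression_key(key)
  ad_id, date, hour = key.split(":")
  { ad_id: ad_id.to_i, date: date, hour: hour.to_i }
end

key = impression_key(459, Time.utc(2016, 9, 22, 17, 5))
puts key   # 459:20160922:17
p parse_impression_key(key)
```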
Inside the UI app we create an hourly job. It moves data from temporary Redis storage into the Impressions table in the permanent SQL DB.
# config/initializers/redis.rb
REDIS_IMPR = Redis::Namespace.new('impr', redis: redis_conn)

# app/jobs/process_impression_job.rb
class ProcessImpressionJob < ApplicationJob
  queue_as :low

  def perform
    REDIS_IMPR.keys.each do |key|
      counter = REDIS_IMPR.get(key)
      # split 459:20160922:17 ad_id:date:hour
      key2 = key.split(':')
      ad_id = key2[0]
      date = key2[1]
      hour = key2[2]
      ad = Ad.find(ad_id)
      # => create impression record in main DB
      ad.impressions.create(date: date, hour: hour, counter: counter)
      REDIS_IMPR.del(key) # => delete the key
    end
  end
end
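One thing the job does not guard against is the counter for the hour that is still in progress: the Ad Server may keep incrementing that key between the GET and the DEL. A hedged sketch of one way to sidestep this is to process only keys from completed hours. `completed_hour_keys` is a hypothetical helper, and it assumes the job and the Ad Server build the date and hour in the same time zone.

```ruby
# Hypothetical helper: keep only keys whose date:hour bucket is earlier than
# the current hour, so counters still being incremented are left alone.
# The zero-padded YYYYMMDD:HH format makes plain string comparison work.
def completed_hour_keys(keys, now)
  current_bucket = now.strftime("%Y%m%d:%H")
  keys.select do |key|
    _ad_id, date, hour = key.split(":")
    "#{date}:#{hour}" < current_bucket
  end
end

keys = ["459:20160922:16", "459:20160922:17", "12:20160921:23"]
now = Time.utc(2016, 9, 22, 17, 30)
p completed_hour_keys(keys, now)
# ["459:20160922:16", "12:20160921:23"]
```

In the job this filter would sit between REDIS_IMPR.keys and the loop; the key for 17:00 would then be picked up on the next hourly run.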
Permanent data storage
But we also want to track which keywords are getting requested at least once a week. We add another method to GetAds. This time the key is the keyword and the value is the counter.
# config/initializers/redis.rb
REDIS_KW = Redis::Namespace.new('kw', redis: redis_conn)

# app/services/get_ads.rb
class GetAds
  def perform
    @ads = REDIS_ADS.smembers @keyword
    record_impressions
    record_keyword
    return @ads
  end

  private

  # keep track which keywords get requested at least once a week, data remains in Redis
  def record_keyword
    REDIS_KW.pipelined do
      REDIS_KW.incr @keyword
      REDIS_KW.expire @keyword, 1.week.to_i
    end
  end
end
Because the TTL is reset on every request, Redis will automatically purge keywords that are requested infrequently. To display this data in our UI we built a simple page, which you can see at http://localhost:3001/admin/keywords (ui\app\views\rails_admin\main\keywords.html.erb):
<% REDIS_KW.keys.each do |keyword| %>
  <tr>
    <td><%= keyword %></td>
    <td><%= REDIS_KW.get keyword %></td>
  </tr>
<% end %>
But there is an obvious downside: you cannot sort these records by value, so we cannot see which keywords are requested most often. For that we need to build a Redis secondary index. I will cover that in a different blog post.
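Until such an index exists, a stopgap for a small number of keywords is to sort on the client after reading the counters. This is only a sketch and not the secondary index mentioned above. `top_keywords` is a hypothetical helper, and the sample counts are invented.

```ruby
# Hypothetical helper: sort keyword counters client-side, highest first,
# breaking ties alphabetically. Redis GET returns strings, hence to_i.
def top_keywords(counts, limit = 10)
  counts
    .map { |keyword, value| [keyword, value.to_i] }
    .sort_by { |keyword, count| [-count, keyword] }
    .first(limit)
end

counts = { "keyword1" => "42", "keyword2" => "7", "keyword3" => "130" }
p top_keywords(counts, 2)
# [["keyword3", 130], ["keyword1", 42]]
```

This reads every key on each page load, so it only makes sense while the keyword list stays short.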
Testing
Previously I have written about testing your code with Redis. You can either set up a real Redis instance or use the mock_redis gem.
# config/initializers/redis.rb
if Rails.env.test?
  REDIS_ADS = Redis::Namespace.new('ads', redis: MockRedis.new)
  ...
else
  # real Redis connections here
end

# spec/rails_helper.rb
require 'mock_redis'
...
config.before(:each) do
  # data is not saved into real Redis but you still need to clear it
  REDIS_ADS.flushdb
end
Then in your tests for ProcessImpressionJob you can set up data with REDIS_IMPR.incrby(key, 10), where key follows the ad_id:date:hour format, and in tests for GetAds check expect(REDIS_IMPR.keys).to eq ...
Since there are no live HTTP calls between your microservices, you do not need gems like webmock, VCR or discoball. For a real production system I would still recommend a good overall integration test pass. But as long as you define the message format for how data flows between your applications via Redis, you can stub and test components separately.
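One way to pin down such a message format is a small module that both the producer and the consumer can be tested against. The sketch below is an assumption about what that could look like for the click queue. The `ClickMessage` module and its methods are hypothetical and not part of the sample app, which passes a plain hash to the job.

```ruby
require "json"

# Hypothetical message contract for the click queue.
module ClickMessage
  REQUIRED_KEYS = %w[ad_id].freeze

  # Producer side (ad server): build the payload that goes onto the queue.
  def self.build(ad_id:)
    JSON.generate("ad_id" => Integer(ad_id))
  end

  # Consumer side (UI app): parse and validate before touching the database.
  def self.parse(payload)
    data = JSON.parse(payload)
    missing = REQUIRED_KEYS - data.keys
    raise ArgumentError, "missing keys: #{missing.join(', ')}" unless missing.empty?
    { ad_id: Integer(data["ad_id"]) }
  end
end

payload = ClickMessage.build(ad_id: "88")
puts payload                                    # {"ad_id":88}
puts ClickMessage.parse(payload)[:ad_id]        # 88
```

Each side's specs can then use `build` or `parse` with fixed payloads, without the other application running.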