Redis and Cache Pre-Generation
A common pattern is to use Redis as a cache store where the first application request forces code to execute and then caches the results. Subsequent requests use the cached data until Redis purges it via TTL.
- Basic caching
- Pre-generating cache
- Busting the cache
- Selecting which data to cache
- Iteration overrun
- More complex example
- Links
Basic caching
Here is how it can be implemented with Ruby on Rails, but similar patterns exist in other frameworks.
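A minimal sketch of the pattern (the model setup is assumed; the one-hour TTL matches the job discussion below):

```ruby
class Article < ApplicationRecord
  belongs_to :user, touch: true
end

class User < ApplicationRecord
  has_many :articles

  def articles_count
    # first call executes the query; later calls read from Redis until the TTL
    Rails.cache.fetch("#{cache_key}/articles_count", expires_in: 1.hour) do
      articles.count
    end
  end
end
```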
`cache_key` is a method that generates a unique Redis key like `user/user_id-user_updated_at_timestamp`. Specifying `touch: true` on the `Article` relationship will modify `User.updated_at` when articles are created / deleted, which forces a new `cache_key`. Appending the method name to `cache_key` ensures Redis key uniqueness if we cache other methods on the User model. Here is what the data looks like stored in Redis:
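Something like this (a hypothetical redis-cli session; the exact key format depends on the Rails version and cache configuration):

```
127.0.0.1:6379> KEYS user/*
1) "user/1-20200101120000/articles_count"
127.0.0.1:6379> TTL user/1-20200101120000/articles_count
(integer) 3547
```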
When browsing to a page showing all users and the number of articles they authored, the first request will be slow but subsequent ones will be faster with cached data.
In reality this specific task can be better accomplished with a well written SQL JOIN, but usually the business logic is more complex.
Pre-generating cache
What if we need / want to pre-generate cached data in Redis so that the first user does not have to wait? We could run a background job.
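A sketch of such a job (assuming Sidekiq; the scheduling itself would be handled by cron or a scheduler gem):

```ruby
class PreGenerateCacheJob
  include Sidekiq::Job

  def perform
    # calling articles_count executes the query and warms the cache
    User.find_each do |user|
      user.articles_count
    end
  end
end
```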
The downside is that if users do not write articles very often, the job will keep re-generating the same data for `articles_count` because Redis will flush it every hour with the TTL specified. How can we make this process more scalable?
Busting the cache
In a previous post I covered various cache busting techniques. In caching `articles_count` we are trying to find a balance between how long we want to cache the current data and how often it changes. If the user has not written any new articles there is no reason to re-generate `articles_count`, so we could make the TTL longer. But if articles are created / deleted frequently (or `User.updated_at` changes for other reasons), Redis will accumulate `cache_key` records with previous `updated_at` timestamps.
Setting the TTL to never expire will force Redis to hold the data forever. And if we exclude the `updated_at` timestamp from `cache_key` then we can re-use the same Redis key. But how do we avoid showing the same `articles_count` value that was calculated the first time? Rails cache has an option `force: true` that forces a cache miss, which executes the code and creates / updates the data in Redis.
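A sketch of the re-worked method (passing `force` through as a keyword argument is an assumption, not part of the Rails API):

```ruby
class User < ApplicationRecord
  # key no longer includes updated_at, so the same Redis key is re-used;
  # force: true triggers a cache miss and overwrites the stored value
  def articles_count(force: false)
    Rails.cache.fetch("user/#{id}/articles_count", force: force) do
      articles.count
    end
  end
end
```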
Selecting which data to cache
This cached data will be stale until we re-generate it. We want to run the job frequently (say every 5 minutes) and in the job we specify `force: true`. But this will keep re-creating the cache for all users. We need a way to filter out which users have created / deleted articles. For that we can use the `updated_at` timestamp: if the user has created / deleted an article, the `Article` `touch: true` will change `User.updated_at`. There could be other reasons we want to re-generate the cache, so we encapsulate the logic in a User model scope.
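A sketch of the scope and the updated job (the `cache_expired` scope name and the 5-minute window are assumptions tied to the job schedule):

```ruby
class User < ApplicationRecord
  # users touched since the last job run (the job runs every 5 minutes)
  scope :cache_expired, -> { where('updated_at >= ?', 5.minutes.ago) }
end

class PreGenerateCacheJob
  include Sidekiq::Job

  def perform
    User.cache_expired.find_each do |user|
      user.articles_count(force: true)
    end
  end
end
```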
If we want to optimize the code further, then instead of using `Article` `touch: true` we can build a custom callback that only updates the User when articles are created and deleted (not edited).
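A sketch of such a callback, replacing `touch: true` on the association:

```ruby
class Article < ApplicationRecord
  belongs_to :user # no touch: true

  # touch the user only on create / destroy, not on edits
  after_create  :touch_user
  after_destroy :touch_user

  private

  def touch_user
    user.touch
  end
end
```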
One downside with using `updated_at` is that if users edit their info (but do not create / delete articles) or create other records (with `belongs_to :user, touch: true`), that will also force re-generation of `user.articles_count`. So it's not a perfect solution, but it works for many use cases.
Iteration overrun
Our job runs every 5 minutes but it could take more than 5 minutes to complete, and we might not want two instances of this job running at the same time. Here are different ways to address it:
- Set a special Redis key (like a PID file) at the beginning of the job, with a TTL so the key expires on its own even if the job crashes; see the sketch after this list.
- Alternatively we could use the APIs provided by the background job library to check at the beginning whether another job with the same class name is already running. The implementation will vary with the underlying library.
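A minimal locking sketch for the first option (assuming the redis-rb client is available as `$redis`; the key name and the 10-minute TTL are assumptions):

```ruby
class PreGenerateCacheJob
  include Sidekiq::Job

  LOCK_KEY = 'pre_generate_cache_job:lock'.freeze

  def perform
    # NX => set only if the key does not exist; EX => TTL in seconds,
    # so the lock clears itself even if the job crashes
    return unless $redis.set(LOCK_KEY, Process.pid, nx: true, ex: 10 * 60)

    begin
      User.cache_expired.find_each { |user| user.articles_count(force: true) }
    ensure
      $redis.del(LOCK_KEY)
    end
  end
end
```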
More complex example
The example above with users and articles is way too simple. Let's imagine we are building the backend system for an online banking app. Customers use their phones to check the latest transactions on their way to work. As a result we have a HUGE spike in DB load in the early morning hours (which requires powerful hardware and an expensive software license).
What if we could even out that load and push data into the cache during earlier hours when stress on the overall system is lower? We don't need to push ALL transactions into the cache as most people are likely to look at only the first page (say 10 most recent records). And we don't need to do it for all customers, just the ones that check their accounts frequently (say twice a week).
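A sketch of the pre-generation (the `recent_transactions` method name and the 12-hour TTL are assumptions):

```ruby
class Customer < ApplicationRecord
  has_many :transactions

  def recent_transactions
    # cache only the first page: the 10 most recent records
    Rails.cache.fetch("#{cache_key}/recent_transactions", expires_in: 12.hours) do
      transactions.order(created_at: :desc).limit(10).to_a
    end
  end
end

# in the job:
Customer.all.each { |customer| customer.recent_transactions }
```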
This will load the most recent transactions for all customers. How do we track customers that log in frequently enough to pre-generate the cache only for them? We could have a field in the DB to track recent logins, but we could also use Redis counters and TTL.
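With the redis-objects gem that could look like this (the one-week expiration matches the login frequency we care about):

```ruby
class Customer < ApplicationRecord
  include Redis::Objects

  # the Redis key expires if the customer does not log in within a week
  counter :num_logins, expiration: 1.week
end
```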
redis-objects creates a Redis key based on the model name, record ID and method name. Every time a customer logs in we call `customer.num_logins.incr`, which is very fast. But if the customer does not log in within a week, that key will expire via the Redis TTL and the next time the `num_logins` counter will start at 1. Then we create a job to move the data to the primary DB. The job might be slow but it will only run once a week.
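A sketch of that weekly job and a scope built on top of it (the `weekly_logins` column, the job name, and the threshold of two logins are assumptions based on the example above):

```ruby
class SyncLoginCountsJob
  include Sidekiq::Job

  def perform
    Customer.find_each do |customer|
      # the counter reads 0 once its Redis key has expired
      customer.update_column(:weekly_logins, customer.num_logins.value)
    end
  end
end

class Customer < ApplicationRecord
  # customers who logged in at least twice in the past week
  scope :frequent_logins, -> { where('weekly_logins >= ?', 2) }
end
```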
In `PreGenerateCacheJob`, instead of calling `Customer.all.each ...` we call `Customer.frequent_logins.each ...`.
This approach is also not perfect, but it enables us to cache data for the customers most likely to log in. Requests for other customers will require DB queries. And in this case we do NOT want to hold on to cached data indefinitely as new transactions keep coming in, so we expire the cache and include the timestamp in `cache_key`.
The approaches described above introduce complexity. The logic for determining which methods to cache and for which records will vary widely depending on the business requirements and resources available. And what if the job fails to run? The system will show stale cached data, so we want to monitor the job process and also ensure there is enough RAM in Redis. But applied wisely, these solutions allow us to trade slight delays in data freshness for significant scalability gains.