Recently I had a chance to present at RedisConf on various ways Redis can be used to quickly scale Rails applications. This is a blog post is an expansion of the ideas that I discussed. You also might want to read my previous posts about Redis here and here.
- Caching vs pre-generating data in your DB
- Background jobs
- Dev and test
- Additional links
When we think of scalability we usually think of dealing with terabytes of data and millions of users. We think of a dev team laboring for many months to build great technologies. But there is another kind of scalabilty when your MVP starts getting more traffic and you need to QUICKLY scale the application, often by yourself.
First I want to say that caching should not be used as bandaid solution. We want to have well written code with optimized DB queries / indexes. So use caching wisely.
At the core of Rails caching is the concept of Cache Key and Cached Content. Once you create it, you do not change it. You do not need to write observers or use callbacks to update your cache when your data changes. You simply create new key with new content and you let the old one expire using TTL. This approach obviously takes more RAM to store stale content but it significantly simplifies your code. Here is a more in-depth article by DHH.
To enable caching you put this in your production.rb.
You can implement caching in different layers in Rails. You can do Fragment (view) layer caching where you will cache actual HTML of all or portions of your pages.
If the user record changes or you deploy a new template, it will require new cache key.
To cache index.html.erb you need to create a key based on all records on the page:
The same can be done with Jbuilder files
Let’s imagine you have a online fundraising system where Users give Donations to Fundraisers.
By appending method name to cache_key I ensure their uniqueness. You also see how to override default application cache_key expiration.
You might be asking what if a donation is given after this content is cached. Currently the application will show stale data until Redis purges the key (which might be OK). Better solution is to define
touch: true on the parent-child relationship. When child record is created/updated it will touch timestamp of a parent (forcing new key).
You also can have a situation where you want to bust child’s cache when specific parent attribute changes (you are using parent attribute w/in child’s method). Best way is to use a callback to conditionally update children timestamps if specific parent attribute changed.
This will bust ALL cached methods for all children. If you want to selectively bust specific cached method then do this:
Method level cache can be applied to other classes, not just models. Here I am using object.cache_key and also including self.class.name to ensure key uniqueness. And you can include parameters in cache keys.
Caching data froml external APIs
Let’s imagine an app where users can view weather data for specific zipcodes. Obviously you are going to call 3rd party API to get the data. In your MVP you can call on every user request but with more traffic this will become slow and $ expensive. You could create an hourly background job to fetch weather data for all zipcodes, parse JSON and load it into your DB.
Or you could implement 1 line caching solution (which also will cost less $ because it will only fetch data users actually need). In this case we have no user ID or timestamp so the cache key is based on class, method and parameter. We will need to expire it using default Redis TTL.
Caching controller action output
Caching vs pre-generating data in your DB
As you can see caching can be implemented fairly quickly and changed easily but the downside is you cannot use results in your DB queries. Earlier in this post I described caching total_raised and number_of_donations methods. In order to get fundraisers that have at least X number of donations or raised Y dollars I would need fetch all records and loop through them. Here is a simple way to pregenerate data in DB.
Storing data in the main DB AND in Redis
redis-objects is an interesting gem. Let’s imagine we are building an application for TV show like American Idol where in a few minutes we have to record millions of votes cast by viewers. In our main DB we will have a table Performers with usual info (name, bio, etc). But to increment record in traditional DB and persist it to disk will put a lot of stress on your servers. This is where Redis can help us. We create tmp_vote_counter in Redis and only save to main DB once the voting is done. But as far as your Performer model is concerned it’s just a method call.
Ohm gem provides even more functionality but it’s more suited for situations where Redis is your primary DB. I am not quite ready to let go of features provided by SQL (or Mongo) and prefer to use Redis as my secondary DB.
In an earlier post I described using Sidekiq to import records in batches of jobs. Here are additional ideas for background jobs:
Cache warming - you could create a job to run periodically and warm up your cache. This will help so the first user that hits specific page will have faster load time and keep system load more even. But you need to be careful so that duration of these jobs is not greater than frequency. In the past I selectively pre-cached only specific pages and let less traffic pages load on demand.
Report generation - some of our reports were getting slow so in addition to doing code optimization we switched them to run in the background and email user the results.
Dev and test
You often find that you need to implement caching once your application begins to slow down. In an earlier blog post I write how to use rack-mini-profiler to analyze which pages have slow methods or large number of DB queries. You also can you use New Relic gem. Just browse to http://localhost:3000/newrelic and you will see lots of useful stats. And you don’t even need to sign up for their service.
Functional testing of your code is pretty much the same when you implement caching. But what is really important is performance testing. The approach I usually take is to identify the bottlenecks, implement caching (or other code improvements), deploy to one of the prod servers and run a series of tests using tools like Siege or wrk. Then if I am satisfied, deploy the code to other prod servers.
The reason I like readthis gem is because of Readthis.fault_tolerant = true option. This way Rails app will not crash if it can’t connect to Redis. redis-store gem has an issue on this with some monkey patch ideas but readthis solution is more robust.
We run our sites on AWS and use ElastiCache with multi AZ replication group. Use Group Primary Endpoint in your config files (NOT individual cache clusters). AWS will sometimes switch your nodes so read-write will become read only. If you are talking directly to a node you will get an error “You can’t write against a read only slave”.
You could run your own Redis on EC2 instances (which also gives you controler over which version of Redis to use) but then you are responsible for backups, failover, etc. And you could also use a hosted solution like RedisLabs. Services like Heroku provide very simple integrations with free trials but higher long term costs.
For monitoring I primarily use AWS BytesUsedForCache to make sure I do not run out of RAM. I also have a /health controller endpoint in my application where I check for Redis connection. The URL gets pinged every minute by 3rd party service and will email/text an alert in case of errors. I also would like implement a monitor on my Sidekiq background jobs (threshold of number of jobs in error state or number of retries).
redis-dump is not actively maintained but allows you to dump Redis data into JSON files and then restore on a different server. I’ve used it to move data from prod into dev.
If you do not want to integrate with service like LogEntries or Rollbar you can try Logster gem. Just browse to http://yourwebsite.com/logs. Your can trigger your own alerts based on certain thresholds.
- Sample app that I built - https://github.com/dmitrypol/rails_redis