I am a Sr. Software Developer at Oracle Cloud. The opinions expressed here are my own and not necessarily those of my employer.
Redis and IP Throttling
Recently one of our websites was hit by a scraper. We could see the requests in our logs as they were querying our site for different keywords. Instead of adding a bunch of IPs to our firewalls we decided to implement more intelligent throttling.
We chose the rack-attack Ruby gem. It leverages Redis, which we were already using for caching and background jobs.
Basic configuration
rack-attack allows us to limit the number of requests our application will accept from the same IP in a given time period. It builds a Redis key based on Time.now.to_i/:period and request.ip. On each request it does a Redis INCR operation (which will either create the key if it doesn't exist or increment it). During creation it sets a TTL equal to our time period.
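To make the counting logic concrete, here is a minimal sketch of a fixed-window counter in plain Ruby. It is not rack-attack's actual code: the FixedWindowCounter class is hypothetical, and an in-memory Hash stands in for Redis, so nothing expires the way a real TTL would.

```ruby
# In-memory stand-in for Redis; a real setup would rely on INCR plus a TTL.
class FixedWindowCounter
  def initialize(limit:, period:)
    @limit = limit
    @period = period
    @counts = Hash.new(0) # key => number of requests seen in that window
  end

  # Returns true when the request should be allowed.
  def allow?(ip, now = Time.now.to_i)
    window = now / @period   # same idea as Time.now.to_i / :period
    key = "#{window}:#{ip}"
    @counts[key] += 1        # plays the role of Redis INCR (create or increment)
    @counts[key] <= @limit
  end
end

counter = FixedWindowCounter.new(limit: 3, period: 60)
results = Array.new(4) { counter.allow?('1.2.3.4', 120) } # four requests, same window
next_window = counter.allow?('1.2.3.4', 180)              # first request, next window
```

The fourth request in the same window is rejected, and the first request in the following window is allowed again because it lands on a new key.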
Once the Redis key value exceeds the limit it will block the request at the rack middleware layer. When the key expires access will be allowed again. Here is the wiki page with more details. Data in Redis will look like this (4977978 is Time.now.to_i/:period).
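As a rough illustration of how such a key could be put together, the sketch below builds one from a prefix, the window number, a throttle name and the IP. The throttle_key helper, the 'req/ip' name, the 300 second period and the timestamp are all assumptions chosen so that the window number comes out as 4977978. The exact key layout may differ between rack-attack versions and cache stores.

```ruby
# Hypothetical helper: prefix, window number, throttle name and IP joined with colons.
def throttle_key(name:, ip:, period:, now: Time.now.to_i, prefix: 'rack::attack')
  window = now / period # integer division, changes once per period
  "#{prefix}:#{window}:#{name}:#{ip}"
end

# 1_493_393_400 / 300 == 4_977_978
key = throttle_key(name: 'req/ip', ip: '1.2.3.4', period: 300, now: 1_493_393_400)
puts key
```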
This approach will keep out most scrapers but someone determined can easily figure out the thresholds. It also depends on whether we want to truly restrict someone from abusing the system or just limit the stress on our servers.
Advanced configuration
To keep out more malicious users we can implement exponential backoff. This will create multiple keys for each IP and time period (using more Redis RAM). There is a clever example on the wiki page showing us how to create multiple levels in the same loop.
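A minimal sketch of the multi-level idea, in plain Ruby so it runs without the gem. The base limit of 20 and base period of 8 seconds are illustrative assumptions, not values from our configuration: each level allows a few more requests but over a much longer period.

```ruby
BASE_LIMIT = 20  # assumed number of requests allowed at level 1
BASE_PERIOD = 8  # assumed period in seconds at level 1

# The limit grows linearly while the period grows exponentially.
levels = (1..5).map do |level|
  { name: "req/ip/#{level}", limit: BASE_LIMIT * level, period: BASE_PERIOD**level }
end

levels.each { |l| puts "#{l[:name]}: #{l[:limit]} requests per #{l[:period]}s" }

# With rack-attack, each entry would become one throttle, roughly:
#   Rack::Attack.throttle(l[:name], limit: l[:limit], period: l[:period]) { |req| req.ip }
```

Because every level gets its own key per IP and window, five levels mean up to five Redis keys per client at any time.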
But what if we have lots of legitimate users behind the same IP? We can add IPs to a safelist or a blocklist. We could put the IPs in a config file but that would require a code deploy to change. Why not use Redis to store these IPs in separate keys?
To add/remove these records we built a simple GUI so our internal users can respond quickly if needed. We also set a default TTL of 1 week so these IPs do not remain in the system permanently.
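The sketch below shows one way the lists with a default TTL could behave. The IpList class is hypothetical and keeps expiry timestamps in memory; the comments note the Redis command each step would correspond to.

```ruby
ONE_WEEK = 7 * 24 * 60 * 60 # default TTL in seconds

# Hypothetical in-memory stand-in for one Redis key per listed IP.
class IpList
  def initialize(name)
    @name = name
    @expires_at = {} # "name:ip" => expiry timestamp
  end

  def add(ip, ttl: ONE_WEEK, now: Time.now.to_i)
    @expires_at["#{@name}:#{ip}"] = now + ttl # Redis: SET key 1 EX ttl
  end

  def remove(ip)
    @expires_at.delete("#{@name}:#{ip}")      # Redis: DEL key
  end

  def include?(ip, now: Time.now.to_i)
    expiry = @expires_at["#{@name}:#{ip}"]
    !expiry.nil? && expiry > now              # Redis: EXISTS key
  end
end

safelist = IpList.new('safelist')
blocklist = IpList.new('blocklist')
safelist.add('10.0.0.1', now: 0)
blocklist.add('203.0.113.9', now: 0)
```

In the initializer, a safelist or blocklist block would then only need to ask whether a key for req.ip exists, so changes made in the GUI take effect without a deploy.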
Customer specific configuration for APIs
IP throttling can be used for websites but it is also very common for APIs. We may have multiple customers using our API and we want to control access for each one. The configuration examples above apply to the entire application so we need something more flexible. Full confession - I have not implemented this solution in production so be careful and please share feedback in the comments below.
Let’s assume that when a request hits our servers there is a customer_id param. Let’s also assume that we have Free, Pro and Enterprise tiers with the following limits:
- Free - 100 requests per hour.
- Pro - 100 requests per minute and 5K requests per hour.
- Enterprise - 200 requests per minute and 10K requests per hour.
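The tiers above could be written down as a small lookup table, sketched here as a Ruby hash of period (in seconds) to max requests. The TIER_LIMITS constant, the limit_for helper and the fallback to the Free tier for unknown tiers are assumptions for illustration.

```ruby
# period in seconds => max requests, per tier
TIER_LIMITS = {
  'free'       => { 3600 => 100 },
  'pro'        => { 60 => 100, 3600 => 5_000 },
  'enterprise' => { 60 => 200, 3600 => 10_000 }
}.freeze

# Returns the limit for a tier and period, or nil when the tier has no limit for that period.
# Unknown tiers fall back to the Free limits (an assumption, not a requirement).
def limit_for(tier, period)
  TIER_LIMITS.fetch(tier, TIER_LIMITS['free'])[period]
end
```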
We do not want to query our primary DB during the IP check so we will store this data in Redis with the help of the redis-objects gem.
We are storing tier in both the primary DB and in Redis (with a before_save callback) because we need to query customers by tier. Data in Redis will look like this:
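A plain-Ruby sketch of the mirroring step, under stated assumptions: FAKE_REDIS is an in-memory Hash standing in for Redis, this Customer class is a stand-in rather than our actual model, and the "customer:<id>:tier" key layout is hypothetical. In a Rails model the copy would happen in a before_save callback, with redis-objects managing the Redis side.

```ruby
FAKE_REDIS = {} # stand-in for a Redis connection

# Stand-in for the model; `save` plays the role of before_save plus the DB write.
class Customer
  attr_reader :id
  attr_accessor :tier

  def initialize(id:, tier:)
    @id = id
    @tier = tier
  end

  def save
    FAKE_REDIS["customer:#{id}:tier"] = tier # mirror tier so the throttle check can skip the DB
    true                                      # the real model would also persist to the primary DB
  end
end

customer = Customer.new(id: 42, tier: 'pro')
customer.save
```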
Now the throttle check can be modified. The challenge is that this check occurs in an initializer at the Rack layer and we need to grab customer_id from the request to dynamically determine throttling.
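One way to approach this is to resolve the limit per request instead of hard-coding it. The sketch below uses plain lambdas so it runs on its own: the Request struct, the TIERS lookup (a stand-in for reading the tier from Redis), and the fallback to the Free tier are all assumptions.

```ruby
Request = Struct.new(:params) # minimal stand-in for a Rack request

TIERS = { '42' => 'pro', '7' => 'free' }.freeze # stand-in for the tier lookup in Redis
PER_HOUR = { 'free' => 100, 'pro' => 5_000, 'enterprise' => 10_000 }.freeze

# Discriminator: count requests per customer instead of per IP.
discriminator = ->(req) { req.params['customer_id'] }

# Limit resolved on every request; unknown customers get the Free limit (assumption).
hourly_limit = lambda do |req|
  tier = TIERS.fetch(req.params['customer_id'], 'free')
  PER_HOUR.fetch(tier)
end

# rack-attack accepts a proc for limit:, so these could be wired up roughly as:
#   Rack::Attack.throttle('api/hour', limit: hourly_limit, period: 3600, &discriminator)
```

Since the lambda runs on each request, a customer who changes tier gets the new limit as soon as the value in Redis changes.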
To have even more flexibility we can store unique configuration for each customer in Redis hashes.
This will allow 100 requests per minute, 1K requests per hour and 10K requests per day. The key is period (number of seconds) and the value is limit (max requests). We would then use the hash to configure throttle.
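As a sketch of that last step, the snippet below turns such a hash into a list of throttle definitions. The raw_config literal stands in for what a Redis hash read would return (field names and values come back as strings), and the "customer/<period>" naming is an assumption.

```ruby
# Stand-in for the per-customer Redis hash: period in seconds => max requests.
raw_config = { '60' => '100', '3600' => '1000', '86400' => '10000' }

# Convert the string fields to integers and build one throttle definition per period.
throttles = raw_config.map do |period, limit|
  { name: "customer/#{period}", period: Integer(period), limit: Integer(limit) }
end

throttles.each { |t| puts "#{t[:name]}: #{t[:limit]} requests per #{t[:period]}s" }
```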
The problem is that we would need to restart the app to pick up these custom configurations. Honestly, I am not sure the custom Hash approach delivers much value, and it significantly complicates things. If anyone has suggestions feel free to share them.
Links
- http://stackoverflow.com/questions/34774086/how-do-i-rate-limit-page-requests-by-ip-address
- https://stripe.com/blog/rate-limiters
- https://github.com/dryruby/rack-throttle
- https://github.com/jeremy/rack-ratelimit
- http://nginx.org/en/docs/http/ngx_http_limit_req_module.html
- https://blog.codinghorror.com/dictionary-attacks-101/
- https://devcentral.f5.com/articles/implementing-the-exponential-backoff-algorithm-to-thwart-dictionary-attacks