One of the limitations of Redis is that all data is stored in RAM. If we cannot scale up we need to scale out and shard our data.
I want to share one of my first experiences with Redis sharding. I was not directly invovled in the project but worked closely with the team. The system was tracking phone calls so we sharded on the last integer of the number dialed. This gave us very uniform 10 shards.
In this case we are separating keys by hosts but we could just as easily specify different namespaces, ports or DB values. Since our goal is scalability, we do not want to increase the size of our keys (and consume more RAM) by adding namespace to it.
Here we have an API that recevies HTTP requests with 2 params
More complex example
Alas, in many situations we do not have a number that we can split into 10 buckets. What if we are tracking unique visitors to our website using combination of IP and user agent? Here is a preivous post where I used murmurhash. To split data into 4 shards we simply
% 4 to get the right Redis connection.
Rebalancing data after adding / removing nodes
What if later we need to add more Redis servers (going from 4 to 6)? We can change our code to
% 6 but we also need to move some of the records from the current 4 Redis servers to the new ones? Full confession - I have not implemented this in real production system, these are just general thoughts.
The challenge is to run this on a real production system while data is actively used. Can’t say I am looking forward to trying this for real if I ever have to ;-). Read the links below for better ideas on paritioning and clustering.