Monitoring Redis with Sentinels
Running Redis in production can be a complicated undertaking. Ideally our cloud provider will offer a managed service, but sometimes that is not an option. In this article we will explore how to run Redis and monitor it ourselves.
Redis Sentinel provides high availability for Redis. It allows us to set up Redis instances that can recover from certain types of failures. The advantage of this approach is that we control the exact version of Redis we run, and we can connect directly to our Redis instances for extensive troubleshooting if necessary.
- Local environment
- Sentinel dockerfile
- Setting up Redis replication
- Configuring Redis for Sentinel monitoring
- Testing the failover process
- sentinel.conf
- Running Redis in production
- Links
Local environment
We will prototype the setup in a local environment with docker-compose. Save the compose file as docker-compose.yml.
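A minimal sketch of that file, assuming three Redis containers (redis1 to redis3) and three Sentinels built from the sentinel.Dockerfile described in the next section; the Redis image tag is only an example and should be pinned to whatever version you intend to run:

```yaml
version: "3"
services:
  redis1:
    image: redis:6.2        # pin to the Redis version you actually want to run
    container_name: redis1
  redis2:
    image: redis:6.2
    container_name: redis2
  redis3:
    image: redis:6.2
    container_name: redis3
  sentinel1:
    build:
      context: .
      dockerfile: sentinel.Dockerfile
    container_name: sentinel1
  sentinel2:
    build:
      context: .
      dockerfile: sentinel.Dockerfile
    container_name: sentinel2
  sentinel3:
    build:
      context: .
      dockerfile: sentinel.Dockerfile
    container_name: sentinel3
```

All six containers share the default compose network, so they can reach each other by service name.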
Sentinel dockerfile
Redis Sentinel requires a local sentinel.conf file. Save this as sentinel.Dockerfile and run docker-compose up --build -d.
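A minimal sketch of the Dockerfile, assuming an empty sentinel.conf sits next to it (create it with touch sentinel.conf). Sentinel must be started with a configuration file it can write to, since it persists its runtime state back into that file:

```dockerfile
# Base image version is illustrative; use the same Redis version as the data nodes
FROM redis:6.2

# The (initially blank) sentinel.conf must exist and be writable:
# Sentinel rewrites this file with its runtime state
COPY sentinel.conf /etc/sentinel.conf

# Start Redis in Sentinel mode on the default Sentinel port 26379
CMD ["redis-sentinel", "/etc/sentinel.conf"]
```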
Setting up Redis replication
Now we can manually set up replication between the Redis instances.
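A sketch of that step, assuming redis1 is the initial primary and using the compose service names as hostnames:

```bash
# Point redis2 and redis3 at redis1, the initial primary
docker exec -it redis2 redis-cli replicaof redis1 6379
docker exec -it redis3 redis-cli replicaof redis1 6379

# Verify: redis1 should report role:master and connected_slaves:2
docker exec -it redis1 redis-cli info replication
```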
Configuring Redis for Sentinel monitoring
Now we need to register the Redis primary (master) with each Redis Sentinel. The Sentinels will then automatically discover the replicas.
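A sketch of the registration, assuming the primary is redis1 and a quorum of 2 (two Sentinels must agree before the primary is considered objectively down). Older Sentinel versions only accept an IP address here; hostname support requires sentinel resolve-hostnames yes (Redis 6.2+), so we look up the container IP first:

```bash
# Find the primary's IP on the compose network
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' redis1

# sentinel monitor <name> <ip> <port> <quorum> -- register the primary with every Sentinel
docker exec -it sentinel1 redis-cli -p 26379 sentinel monitor my_redis REDIS1_IP_HERE 6379 2
docker exec -it sentinel2 redis-cli -p 26379 sentinel monitor my_redis REDIS1_IP_HERE 6379 2
docker exec -it sentinel3 redis-cli -p 26379 sentinel monitor my_redis REDIS1_IP_HERE 6379 2
```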
Testing the failover process
We can subscribe to Redis Sentinel events via Pub/Sub in one bash tab:
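For example, against sentinel1 (the pattern subscription catches every event channel Sentinel publishes on):

```bash
docker exec -it sentinel1 redis-cli -p 26379 psubscribe '*'
```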
We can also subscribe to the other Sentinels by repeating the step above in separate bash tabs. Now we will simulate a Redis failure on our primary instance in another bash tab.
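One way to do this is to block the primary for longer than Sentinel's down-after-milliseconds threshold (30 seconds by default); stopping the container works just as well:

```bash
docker exec -it redis1 redis-cli debug sleep 60
```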
In the first bash tab, where we subscribed to Sentinel events, we will see the failover unfold.
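The exact messages depend on timing and the Redis version, but the stream should include documented events such as +sdown and +odown (the primary flagged as subjectively, then objectively down), +try-failover and +elected-leader (a Sentinel winning the right to run the failover), +selected-slave (the replica chosen for promotion), and finally +switch-master announcing the new primary.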
We can see how Sentinels perform various tasks in the process. Separately we could write another tool to capture these messages and alert our engineers appropriately but that is outside the scope of this article.
Now if we switch back to redis1 to check replication status we will see it has been demoted to a replica.
We can connect to redis2 and redis3; one of them should be the new primary. We can use the debug sleep 60 command to perform another failover.
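Putting those checks together, assuming redis2 turned out to be the new primary:

```bash
# redis1 should now report role:slave
docker exec -it redis1 redis-cli info replication

# One of the remaining nodes will report role:master
docker exec -it redis2 redis-cli info replication
docker exec -it redis3 redis-cli info replication

# Trigger another failover by blocking the current primary
docker exec -it redis2 redis-cli debug sleep 60
```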
We can also ask the Sentinels who the current primary is via sentinel get-master-addr-by-name my_redis. We can force a failover by sending the sentinel failover my_redis command to a specific Sentinel. If we check get-master-addr-by-name again we will get the new address, and the other Sentinels will also be informed of the new primary. Another useful command is sentinel ckquorum my_redis, which should return OK 3 usable Sentinels. Quorum and failover authorization can be reached.
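The same commands against our local containers:

```bash
# Ask a Sentinel for the current primary's address
docker exec -it sentinel1 redis-cli -p 26379 sentinel get-master-addr-by-name my_redis

# Force a failover; a manual failover does not require agreement from the other Sentinels
docker exec -it sentinel1 redis-cli -p 26379 sentinel failover my_redis

# Check that enough Sentinels are reachable for quorum and failover authorization
docker exec -it sentinel1 redis-cli -p 26379 sentinel ckquorum my_redis
```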
sentinel.conf
Sentinel will write updated state to the sentinel.conf file, overwriting the blank file we created during the docker build.
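The rewritten file holds the Sentinel's view of the deployment; the exact directives vary by version, but it will look roughly like this (IDs, epochs and addresses will differ):

```
port 26379
sentinel myid <generated id>
sentinel monitor my_redis 172.18.0.2 6379 2
sentinel known-replica my_redis 172.18.0.3 6379
sentinel known-sentinel my_redis 172.18.0.5 26379 <other sentinel id>
sentinel current-epoch 1
```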
Running Redis in production
Now that we have prototyped this locally we need to set up appropriate production infrastructure. Sentinels do not require much capacity, so those instances can be smaller. But we will need to think carefully about how much data we are likely to store in our Redis instances and provision enough RAM.
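As a starting point we can look at how much memory the dataset currently uses and cap Redis safely below each instance's physical RAM; the 2gb value below is purely illustrative:

```bash
# used_memory_human / used_memory_peak_human show the current footprint
docker exec -it redis1 redis-cli info memory

# Cap memory usage below the instance's RAM and choose an eviction policy consciously
docker exec -it redis1 redis-cli config set maxmemory 2gb
docker exec -it redis1 redis-cli config set maxmemory-policy noeviction
```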
Adding instances
In production we will need to launch new instances. We can practice this locally by modifying our docker-compose.yml and running docker-compose up --build -d.
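For example, adding a fourth Redis service next to the existing ones (again, the image tag is illustrative):

```yaml
# added under the existing services: key in docker-compose.yml
  redis4:
    image: redis:6.2
    container_name: redis4
```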
Now we need to find the current primary by running docker exec -it redis2 redis-cli info replication against the various Redis containers and checking for a response containing role:master. Then we can make the new container a replica of the current primary: docker exec -it redis4 redis-cli replicaof REDIS_PRIMARY_NAME_HERE 6379.
If we run docker exec -it REDIS_PRIMARY_NAME_HERE redis-cli info replication we will see connected_slaves:3. We can check that the Sentinels automatically became aware of the replica with docker exec -it sentinel1 redis-cli -p 26379 sentinel replicas my_redis. The output should contain 3 records.
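The whole sequence, with REDIS_PRIMARY_NAME_HERE standing in for whichever container reported role:master:

```bash
# 1. Find the current primary (run against each Redis container, look for role:master)
docker exec -it redis2 redis-cli info replication

# 2. Attach the new container as a replica of that primary
docker exec -it redis4 redis-cli replicaof REDIS_PRIMARY_NAME_HERE 6379

# 3. The primary should now report connected_slaves:3
docker exec -it REDIS_PRIMARY_NAME_HERE redis-cli info replication

# 4. The Sentinels should have discovered the new replica on their own
docker exec -it sentinel1 redis-cli -p 26379 sentinel replicas my_redis
```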
Removing instances
In production we would terminate Redis instances, but locally we will run docker stop redis4. Now docker exec -it REDIS_PRIMARY_NAME_HERE redis-cli info replication will tell us connected_slaves:2.
But docker exec -it sentinel1 redis-cli -p 26379 sentinel replicas my_redis still returns 3 records. One of them should contain 9) "flags" 10) "s_down,slave". That is the container we just killed, but Sentinel still knows about it, which is what we want if this was unintentional. If we were subscribed to the Sentinel Pub/Sub channels we would have received a +sdown event for that replica.
To make Sentinel forget about the instance we killed manually we need to run docker exec -it sentinel1 redis-cli -p 26379 sentinel reset my_redis. Repeat the process for sentinel2 and sentinel3. Now running docker exec -it sentinel1 redis-cli -p 26379 sentinel replicas my_redis will return only 2 records.
Increasing / decreasing amount of RAM on each instance
- If decreasing the amount of RAM we need to make sure that the amount of data currently stored in Redis can fit into the new amount of RAM.
- The process is largely a repetition of various steps above.
- Launch new instances with appropriate amount of RAM.
- Make new instances replicas of the current primary.
- Terminate original replicas one at a time.
- Stop if there are any issues and rollback.
- Failover the primary by sending sentinel failover my_redis to one of the Sentinels and terminate the instance (see the sketch after this list).
- Do sentinel reset my_redis on all Sentinels.
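A sketch of those last two steps against the local setup; a manual failover triggered this way does not need agreement from the other Sentinels:

```bash
# Promote one of the new, larger replicas by failing over away from the old primary
docker exec -it sentinel1 redis-cli -p 26379 sentinel failover my_redis

# After terminating the old primary, clear stale state on every Sentinel
docker exec -it sentinel1 redis-cli -p 26379 sentinel reset my_redis
docker exec -it sentinel2 redis-cli -p 26379 sentinel reset my_redis
docker exec -it sentinel3 redis-cli -p 26379 sentinel reset my_redis
```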
Upgrading / downgrading Redis version
- The process is also largely a repetition of various steps above.
- Launch new instances with new version of Redis.
- Make new instances replicas of the current primary.
- Terminate original replicas one at a time.
- Stop if there are any issues and rollback.
- Failover the primary (with previous Redis version) and terminate the instance.
- Do sentinel reset my_redis on all Sentinels.
Links
- https://redis.io/topics/sentinel
- https://github.com/redis/redis/blob/unstable/sentinel.conf