Queues are a useful tool for scaling applications and integrating complex systems. Here is a basic use case: a user registers and we need to send a welcome email. We record the data in the User table and separately call the API of an email service provider. Sending the email via a background process makes for a faster UX, and we can retry in case of failure. But which technology should we use as a queue backend?
Ruby on Rails has a robust delayed_job library (other languages / frameworks have alternative tools). delayed_job will create a table in MySQL / Postgres (there is also delayed_job_mongoid for MongoDB).
Using the primary DB as a queue means there is no need to introduce other technologies. It is usually faster to record data in the local DB than to talk to an external API (such as sending email).
delayed_job also has integrations with deployment (Capistrano, Chef) and monitoring (New Relic) tools. It is a good choice to get started with.
Ruby on Rails provides the Active Job framework, which allows us to configure the queue backend globally at the application level, but also to customize it per environment (dev vs prod) or even per job.
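A sketch of what that configuration can look like (the specific adapter choices and job name here are illustrative, not prescriptive):

```ruby
# config/application.rb -- app-wide default
config.active_job.queue_adapter = :delayed_job

# config/environments/development.rb -- run jobs synchronously in dev
config.active_job.queue_adapter = :inline

# Per-job override: this one class uses a different backend
class WelcomeEmailJob < ApplicationJob
  self.queue_adapter = :sidekiq
  queue_as :mailers
end
```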
When a job fails it will go back to the DB and be executed again.
delayed_job supports scheduling a job to execute in the future, and we can configure recurring jobs with delayed_cron_job.
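A sketch of both options (job names and intervals are illustrative; the recurring example follows delayed_cron_job's enqueue style, where a `cron` attribute is added to the job record):

```ruby
# Run once in the future -- delayed_job stores this in its run_at column
WelcomeEmailJob.set(wait: 1.hour).perform_later(user)

# Recurring job via delayed_cron_job (standard cron syntax: 2am daily)
Delayed::Job.enqueue(NightlyReportJob.new, cron: "0 2 * * *")
```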
We can create separate queues within delayed_job and start different delayed_job processes on different servers. We can give a job higher priority within a queue. Data is stored in a regular DB (just columns in a table), so we can view the contents of the delayed_jobs table or use the Delayed::Job class to build a simple GUI. Having a rich data structure such as a SQL table gives us lots of flexibility.
The biggest downside is scalability. As our application grows, the primary DB will become very busy, and it will also need to persist this data to disk. That is not a problem when running thousands of daily jobs but can be a challenge when running millions.
Redis can be used for a variety of tasks (caching, pub/sub), but it also makes a great queue with its Lists data structure. Since adding an item to a list is an O(1) operation, queueing jobs is very fast. Sidekiq is a mature library, with free and commercial versions, which uses many of the powerful data structures Redis provides. Sidekiq also has a number of plugins which create their own Redis records.
It uses Lists to queue individual jobs, and a Set stores the names of the different queues.
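A pure-Ruby sketch of these two structures: a List per queue (LPUSH to enqueue, BRPOP to fetch) plus a Set of queue names (SADD). The key names and payload shape here are illustrative, not Sidekiq's actual keys.

```ruby
require "json"

queues      = Hash.new { |h, k| h[k] = [] } # one Redis List per queue
queue_names = []                            # Redis Set of queue names

enqueue = lambda do |queue, job|
  queue_names << queue unless queue_names.include?(queue) # SADD queues <name>
  queues[queue].unshift(JSON.generate(job))               # LPUSH queue:<name> <payload>
end

dequeue = lambda do |queue|
  payload = queues[queue].pop                             # BRPOP queue:<name>
  payload && JSON.parse(payload)
end

enqueue.call("default", "class" => "WelcomeEmailJob", "args" => [42])
job = dequeue.call("default")
# job => {"class"=>"WelcomeEmailJob", "args"=>[42]}
```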
When a job fails it will be scheduled for retry and stored in a Sorted Set. The score (e.g. 1509474704.5617971) is the time to execute it, and Sidekiq implements exponential backoff in case of multiple failures. A different Sorted Set is used to store jobs simply scheduled for later execution (as far in the future as necessary).
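A sketch of that retry scheduling in plain Ruby. The delay formula approximates Sidekiq's default exponential backoff; the random jitter Sidekiq adds is omitted so the values are deterministic.

```ruby
# Approximation of Sidekiq's default backoff (jitter omitted).
def retry_delay(retry_count)
  (retry_count ** 4) + 15 # seconds until the next attempt
end

# The retry set holds [score, payload] pairs; the score is the Unix time
# of the next attempt, mirroring ZADD on a Redis Sorted Set.
now = Time.now.to_f
retry_set = (0..3).map { |n| [now + retry_delay(n), "job retry ##{n}"] }
retry_set.sort_by!(&:first) # Redis keeps Sorted Set members ordered by score
```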
Sidekiq/Redis also supports multiple queues (which can be given different weights). To prioritize jobs within a queue we can use Sorted Sets.
Hashes are used for various statistics (sidekiq-statistic gem) and to store recurring jobs with sidekiq-cron. sidekiq-unique-jobs creates separate Redis Strings to track jobs with their parameters and ensure uniqueness. activejob-traffic_control uses Redis Strings to control how fast jobs execute, with throttling and concurrency limits. gush serializes the workflows we want to enforce as JSON strings.
Hosting Redis does introduce more complexity into our infrastructure. Fortunately there are many reliable and affordable hosting services (AWS ElastiCache, RedisCloud). We can run Redis with Multi-AZ failover (important if we are using Redis to store other data). Sidekiq has a rich UI / API, and since the data is stored in Redis it can be easily extracted for custom purposes.
One thing to be cautious of is running out of Redis RAM. Redis is often used for a variety of purposes, and we do not want to evict important jobs because too much RAM is used for caching. Queueing 1 million simple jobs used up about 1 GB of RAM (this will vary with how many params are passed to each job).
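A back-of-the-envelope check of that measurement: 1 GB for 1 million queued jobs works out to roughly 1 KB per job.

```ruby
# Rough per-job memory cost implied by the measurement above;
# actual size depends on the job's class name and arguments.
jobs      = 1_000_000
ram_bytes = 1 * 1024 ** 3      # 1 GB
per_job   = ram_bytes / jobs   # ~1 KB per queued job
```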
To integrate we set config.active_job.queue_adapter = :shoryuken and provide AWS creds. Make sure the queues are created in SQS, otherwise we get The specified queue default does not exist. (Aws::SQS::Errors::NonExistentQueue).
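A setup sketch (the region, credential source, and queue name are placeholder assumptions; credentials often come from an IAM role instead):

```ruby
# config/application.rb
config.active_job.queue_adapter = :shoryuken

# config/initializers/shoryuken.rb
Aws.config.update(
  region: "us-east-1",
  credentials: Aws::Credentials.new(ENV["AWS_ACCESS_KEY_ID"],
                                    ENV["AWS_SECRET_ACCESS_KEY"])
)

# Create the queue up front to avoid NonExistentQueue errors
Aws::SQS::Client.new.create_queue(queue_name: "default")
```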
SQS does not have the rich data structures of Redis, so it does not support all the options listed above. Retrying failed jobs happens automatically unless the message is explicitly deleted (read about visibility timeout). We can only delay jobs for 15 minutes, or we get The maximum allowed delay is 15 minutes (RuntimeError).
Recurring jobs are not supported, but there are workarounds with AWS Lambda and CloudWatch. The AWS SQS UI is decent, and we can use the AWS SDK to access data directly. SQS has other interesting features such as long polling, batch operations and dead letter queues.
SQS also has FIFO queues, which guarantee the order in which messages are sent and received (and do not allow dupes). However, FIFO queues only allow 300 TPS (much less than regular SQS). Shoryuken works with both standard and FIFO queues.
Hosting is easy to set up and cheap (pay for what you use) but obviously only available on AWS. SQS is a great choice when we need to run LOTS of jobs, or when we do not care about more advanced options such as scheduling.
RabbitMQ supports powerful message routing via exchanges. This is very important when we need to run the same job on a specific server, a group of servers or all servers: our application sends one message and the exchange routes it. RabbitMQ also has vhosts, so that multiple applications can share the same RabbitMQ server but be isolated from each other (we can create unique logins for separate applications to access their vhosts). RabbitMQ can be set up in clusters for redundancy / failover and will acknowledge receipt of messages.
Delaying jobs is not supported natively by RabbitMQ, and we will get This queueing backend does not support scheduling jobs if we try to run MyJob.set(wait: 1.minute).perform_later. There is a plugin we can install and here is an overview. RabbitMQ also does not support recurring jobs, so we would need an alternative mechanism to send the message to RabbitMQ on a schedule.
RabbitMQ has a powerful GUI accessible at http://localhost:15672/, and an API available via http://localhost:15672/api/. It also provides statistics on how our jobs are running, but we cannot just access the data structures directly as in Redis.
So which queue technology should we use? There is no easy answer and it really depends on our needs. Personally I really like the abstraction provided by Active Job (even though it does not support all features provided by some queue backends). It makes it easier to structure jobs in a standard way and switch between queues when needed. If I were building a simple system I would start with DelayedJob. Then I would upgrade to Sidekiq (especially if I were already using Redis). Then investigate SQS for very large scale and RabbitMQ for complex workflows.
One interesting possibility is to use two queue backends within the same application. Perhaps we really like Sidekiq and it works for most of our needs, but some of our jobs need to run millions of times a day and we do not want to pay for a large Redis instance. We simply install and configure both backends.
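A sketch of routing one high-volume job to a second backend via Active Job's per-class override (the job names and adapter pairing are illustrative):

```ruby
# config/application.rb -- Sidekiq handles most jobs
config.active_job.queue_adapter = :sidekiq

# This job runs millions of times a day, so it goes to SQS instead
class HighVolumeJob < ApplicationJob
  self.queue_adapter = :shoryuken
end
```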
Another option is to use Redis as a datastore for throttling or workflows, and use RabbitMQ to run our jobs on multiple servers.