Queues are a useful tool for scaling applications and integrating complex systems. Here is a basic use case: a user registers and we need to send a welcome email. We record the data in the User table and separately call the API of an email service provider. Sending the email via a background process makes for a faster UX, and we can retry in case of failure. But which technology should we use as a queue backend?
Ruby on Rails has a robust delayed_job library (other languages / frameworks have alternative tools). delayed_job will create a table in MySQL / Postgres (there is also delayed_job_mongoid for MongoDB).
Using the primary DB as a queue means there is no need to introduce other technologies. It is usually faster to record data in the local DB than to talk to an external API (such as sending email).
delayed_job also has integrations with deployment (Capistrano, Chef) and monitoring (New Relic) tools. It is a good choice to get started with.
Ruby on Rails provides the Active Job framework, which allows us to configure the queue backend globally at the application level, but also to customize it per environment (dev vs prod) or even per job.
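A sketch of what that configuration can look like (the specific adapter choices and job name here are illustrative, not prescriptive):

```ruby
# config/application.rb -- app-wide default
config.active_job.queue_adapter = :delayed_job

# config/environments/development.rb -- run jobs synchronously in dev
config.active_job.queue_adapter = :inline

# Per-job override: this one class uses a different backend
class WelcomeEmailJob < ApplicationJob
  self.queue_adapter = :sidekiq
  queue_as :mailers
end
```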
When a job fails it will go back to the DB and be executed again.
delayed_job supports scheduling a job to execute in the future, and we can configure recurring jobs with delayed_cron_job.
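A sketch of both options (job names and intervals are illustrative; the recurring example follows delayed_cron_job's enqueue style, where a `cron` attribute is added to the job record):

```ruby
# Run once in the future -- delayed_job stores this in its run_at column
WelcomeEmailJob.set(wait: 1.hour).perform_later(user)

# Recurring job via delayed_cron_job (standard cron syntax: 2am daily)
Delayed::Job.enqueue(NightlyReportJob.new, cron: "0 2 * * *")
```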
We can create separate queues within delayed_job and start different delayed_job processes on different servers. We can give a job higher priority within a queue. Data is stored in a regular DB (just columns in a table), so we can view the contents of the delayed_jobs table or use the Delayed::Job class to build a simple GUI. Having a rich data structure such as a SQL table gives us lots of flexibility.
The biggest downside is scalability. As our application grows, the primary DB will become very busy, and it will also need to persist this data to disk. That is not a problem when running thousands of daily jobs but can be a challenge when running millions.
Redis can be used for a variety of tasks (caching, pub/sub), but it also makes a great queue with its Lists data structure. Since adding an item to a list is an O(1) operation, queueing jobs is very fast. Sidekiq is a mature library, with free and commercial versions, which uses many of the powerful data structures Redis provides. Sidekiq also has a number of plugins which create their own Redis records.
It uses Lists to queue individual jobs, and a Set stores the names of the different queues.
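A pure-Ruby sketch of these two structures: a List per queue (LPUSH to enqueue, BRPOP to fetch) plus a Set of queue names (SADD). The key names and payload shape here are illustrative, not Sidekiq's actual keys.

```ruby
require "json"

queues      = Hash.new { |h, k| h[k] = [] } # one Redis List per queue
queue_names = []                            # Redis Set of queue names

enqueue = lambda do |queue, job|
  queue_names << queue unless queue_names.include?(queue) # SADD queues <name>
  queues[queue].unshift(JSON.generate(job))               # LPUSH queue:<name> <payload>
end

dequeue = lambda do |queue|
  payload = queues[queue].pop                             # BRPOP queue:<name>
  payload && JSON.parse(payload)
end

enqueue.call("default", "class" => "WelcomeEmailJob", "args" => [42])
job = dequeue.call("default")
# job => {"class"=>"WelcomeEmailJob", "args"=>[42]}
```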
When a job fails it will be scheduled for retry and stored in a Sorted Set. The score (e.g. 1509474704.5617971) is the time to execute it, and Sidekiq implements exponential backoff in case of multiple failures. A different Sorted Set is used to store jobs simply scheduled for later execution (as far in the future as necessary).
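A sketch of that retry scheduling in plain Ruby. The delay formula approximates Sidekiq's default exponential backoff; the random jitter Sidekiq adds is omitted so the values are deterministic.

```ruby
# Approximation of Sidekiq's default backoff (jitter omitted).
def retry_delay(retry_count)
  (retry_count ** 4) + 15 # seconds until the next attempt
end

# The retry set holds [score, payload] pairs; the score is the Unix time
# of the next attempt, mirroring ZADD on a Redis Sorted Set.
now = Time.now.to_f
retry_set = (0..3).map { |n| [now + retry_delay(n), "job retry ##{n}"] }
retry_set.sort_by!(&:first) # Redis keeps Sorted Set members ordered by score
```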
Sidekiq/Redis also supports multiple queues (which can be given different weights). To prioritize jobs within a queue we can use Sorted Sets.
Hashes are used for various statistics (sidekiq-statistic gem) and to store recurring jobs with sidekiq-cron. sidekiq-unique-jobs creates separate Redis Strings to track jobs with their parameters and ensure uniqueness. activejob-traffic_control uses Redis Strings to control how fast jobs execute, with throttling and concurrency limits. gush serializes the workflows we want to enforce as JSON strings.
Hosting Redis does introduce more complexity into our infrastructure. Fortunately there are many reliable and affordable hosting services (AWS ElastiCache, RedisCloud). We can run Redis with Multi-AZ failover (important if we are using Redis to store other data). Sidekiq has a rich UI / API, and since the data is stored in Redis it can be easily extracted for custom purposes.
One thing to be cautious of is running out of Redis RAM. Redis is often used for a variety of purposes, and we do not want to evict important jobs because too much RAM is used for caching. Queueing 1 million simple jobs used up about 1 GB of RAM (this will vary with how many params are passed to each job).
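A back-of-the-envelope check of that measurement: 1 GB for 1 million queued jobs works out to roughly 1 KB per job.

```ruby
# Rough per-job memory cost implied by the measurement above;
# actual size depends on the job's class name and arguments.
jobs      = 1_000_000
ram_bytes = 1 * 1024 ** 3      # 1 GB
per_job   = ram_bytes / jobs   # ~1 KB per queued job
```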
To integrate we set config.active_job.queue_adapter = :shoryuken and provide AWS creds. Make sure the queues are created in SQS, otherwise we get The specified queue default does not exist. (Aws::SQS::Errors::NonExistentQueue).
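A setup sketch (the region, credential source, and queue name are placeholder assumptions; credentials often come from an IAM role instead):

```ruby
# config/application.rb
config.active_job.queue_adapter = :shoryuken

# config/initializers/shoryuken.rb
Aws.config.update(
  region: "us-east-1",
  credentials: Aws::Credentials.new(ENV["AWS_ACCESS_KEY_ID"],
                                    ENV["AWS_SECRET_ACCESS_KEY"])
)

# Create the queue up front to avoid NonExistentQueue errors
Aws::SQS::Client.new.create_queue(queue_name: "default")
```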
SQS does not have the rich data structures of Redis, so it does not support all the options listed above. Retrying failed jobs happens automatically unless the message is explicitly deleted (read about visibility timeout). We can only delay jobs for 15 minutes, or we get The maximum allowed delay is 15 minutes (RuntimeError).
Recurring jobs are not supported, but there are workarounds with AWS Lambda and CloudWatch. The AWS SQS UI is decent, and we can use the AWS SDK to access data directly. SQS has other interesting features such as long polling, batch operations and dead letter queues.
SQS also has FIFO queues, which guarantee the order in which messages are sent and received (and do not allow dupes). However, FIFO queues only allow 300 TPS (much less than regular SQS). Shoryuken works with both standard and FIFO queues.
Hosting is easy to set up and cheap (pay for what you use) but obviously only available on AWS. SQS is a great choice when we need to run LOTS of jobs, or when we do not care about more advanced options such as scheduling.
RabbitMQ supports powerful message routing via exchanges. This is very important when we need to run the same job on a specific server, a group of servers or all servers: our application sends one message and the exchange routes it. RabbitMQ also has vhosts, so that multiple applications can share the same RabbitMQ server but be isolated from each other (we can create unique logins for separate applications to access their vhosts). RabbitMQ can be set up in clusters for redundancy / failover and will acknowledge receipt of messages.
Delaying jobs is not supported natively by RabbitMQ, and we will get This queueing backend does not support scheduling jobs if we try to run MyJob.set(wait: 1.minute).perform_later. There is a plugin we can install and here is an overview. RabbitMQ also does not support recurring jobs, so we would need an alternative mechanism to send the message to RabbitMQ on a schedule.
RabbitMQ has a powerful GUI accessible at http://localhost:15672/, and an API available via http://localhost:15672/api/. It also provides statistics on how our jobs are running, but we cannot just access the data structures directly as in Redis.
So which queue technology should we use? There is no easy answer and it really depends on our needs. Personally I really like the abstraction provided by Active Job (even though it does not support all features provided by some queue backends). It makes it easier to structure jobs in a standard way and switch between queues when needed. If I were building a simple system I would start with DelayedJob. Then I would upgrade to Sidekiq (especially if I were already using Redis). Then investigate SQS for very large scale and RabbitMQ for complex workflows.
One interesting possibility is to use two queue backends within the same application. Perhaps we really like Sidekiq and it works for most of our needs, but some of our jobs need to run millions of times a day and we do not want to pay for a large Redis instance. We simply install and configure both backends.
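A sketch of routing one high-volume job to a second backend via Active Job's per-class override (the job names and adapter pairing are illustrative):

```ruby
# config/application.rb -- Sidekiq handles most jobs
config.active_job.queue_adapter = :sidekiq

# This job runs millions of times a day, so it goes to SQS instead
class HighVolumeJob < ApplicationJob
  self.queue_adapter = :shoryuken
end
```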
Another option is to use Redis as a datastore for throttling or workflows, and use RabbitMQ to run our jobs on multiple servers.