It is easy to find articles online debating pros and cons of different databases. Often they have titles like “Why you should never use X DB”. And yes, different databases have different strengths and weaknesses. Choosing a DB that does not fit the long term needs can be a costly decision.
The question I’d like to ask is why should we choose? Why can’t we use different databases w/in the same application for different purposes? Obviously it introduces additional complexity into our code and ops infrastructure but it can be a tremendous benefit too. We can use MySQL as the main relational database but leverage Redis for caching and MongoDB for advanced data aggregation.
Let’s imagine we are building a blogging platform with Ruby on Rails. We will have UI where users can manage their profiles, create articles, etc. We also need a separate publishing server that can display millions of pages per day. And we need a service to run various background processes, generate reports, aggregate page views, etc.
Disclaimer - this post will cover topics (caching, microservices, background jobs) that I already discussed in previous articles. But instead of going into details I want to focus on combining various approaches.
SQL gives us a very rich ecosystem of various other gems/libraries that work with it. Ability to use Joins and Transactions is crucial for many applications. It is supported by various cloud providers such as Google and AWS. We have been using MongoDB with Mongoid extensively and while it’s ecosystem is broad, we sometimes encounter gems that only work with ActiveRecord.
In our UI we need to have basic things like authentication, authorization, admin CRUD, etc. We can choose RailsAdmin or Administrate for CRUD, pundit or cancancan for authorization, devise or clearance for authentication. counter_culture helps us create powerful cache counters right in our DB.
To scale our platform we can implement caching.
That will cache entire page. We can also implement method level caching (modify Article
belongs_to :user to have
touch: true to bust cache):
We can also use Redis to store data, for example to count unique visitors (combination of IP and UserAgent). We do not even need a background job since Redis is very fast.
We do not need to limit ourselves to just one Redis server. We could use one for caching, another for background jobs and third for data storage. We could even implement application level data sharding.
Ability to have flexible schema and aggregate data in one document is a useful tool. And with Mongo (unlike Redis) we can query by values. Our
ArticleShowJob can push data using Mongo Ruby driver into Mongo
page_views collection (we are NOT using an ORM like Mongoid).
We are using Mongo upsert (update/insert) feature will create a document if it doesn’t exist. Then it will increment a specific field in the document for the date in question.
To display data in our reporting dashboard we can do this:
We can aggregate data on which domains are driving our traffic, which IPs users are coming from, etc. We can create different collections in Mongo for grouping this data by different time periods (daily vs. monthly) and then use Mongo TTL indexes to purge old records.
Similar as with multiple Redis servers above we could talk to different Mongo servers, databases or collections.
I hope the ideas above were a useful overview. As I said at the beginning of this post, I did not go into the details but focused on general design. This approach can be used to integrate Rails apps with other DBs (Neo4j, RethinkDB, etc).