When I first started using Redis, I loved its speed and powerful data structures. Over the years I have used Redis for data analysis, caching, queuing background jobs and permanent data storage.
The one feature I missed is built-in support for searching records by value (not by key). Coming from a strong SQL background, I found it frustrating to be unable to do the equivalent of
select first_name, last_name from users where email = .... In this post I will provide a high-level overview of different approaches to implementing search in Redis.
Read the next post on the RediSearch module.
Using a client library
As a proof of concept (POC), we will build an application that uses Redis as the primary DB, with the Ruby Ohm library. To keep things simple we will start with only one model, User.
When we create a user record, we get a Redis Hash with the user's name and email. The key is a combination of the model name and the ID (for example User:1).
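A minimal sketch of that layout follows. It assumes a plain Ruby Hash as a stand-in for the Redis keyspace so it runs without a server; store_record is a hypothetical helper, not part of Ohm. The comment at the top shows roughly how the model is declared in Ohm.

```ruby
# In Ohm, the model would be declared roughly like this:
#
#   class User < Ohm::Model
#     attribute :name
#     attribute :email
#     index :name
#     index :email
#   end
#
# Below, a plain Ruby Hash stands in for the Redis keyspace so the sketch
# runs without a Redis server (store_record is a hypothetical helper).

keyspace = {}

# Store a record as one Hash under "<Model>:<id>", the equivalent of writing
# those fields into the Redis Hash at that key.
def store_record(keyspace, model, id, attrs)
  key = "#{model}:#{id}"
  keyspace[key] = attrs
  key
end

key = store_record(keyspace, 'User', 1, { 'name' => 'John', 'email' => 'john@example.com' })
puts key                    # User:1
puts keyspace[key]['email'] # john@example.com
```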
The Ohm library will also create several Redis Sets. One Set, with the key User:all, holds the IDs of all users; this way we can find a User by ID.
Separately, there will be Sets whose keys are prefixed with User:indices and built from the indexed attributes (name:...). Set members are the IDs of records that match the criteria. This lets us search for users by name or email, since we defined an index for those fields.
Finally, it will create a Set for each record listing the indexes that the record matches.
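To make those three kinds of Sets concrete, here is a sketch of the keys written when a user is saved. It is an in-memory simulation under the assumption that keys follow the naming described above (User:all, User:indices:field:value, User:ID:_indices). save_user and INDEXED are hypothetical names, not Ohm's internals.

```ruby
require 'set'

# Hypothetical in-memory stand-in for the Redis keyspace: each key maps
# either to a Hash (a Redis Hash) or to a Set (a Redis Set).
keyspace = {}
INDEXED = %w[name email].freeze # attributes declared with `index` in the model

# Sketch of the keys written when a User is saved.
def save_user(keyspace, id, attrs)
  key = "User:#{id}"
  keyspace[key] = attrs                            # the record itself (Hash)
  (keyspace['User:all'] ||= Set.new) << id         # IDs of all users
  own = (keyspace["#{key}:_indices"] ||= Set.new)  # indexes this record is in
  INDEXED.each do |field|
    index_key = "User:indices:#{field}:#{attrs[field]}"
    (keyspace[index_key] ||= Set.new) << id        # IDs where field == value
    own << index_key
  end
end

save_user(keyspace, '1', { 'name' => 'John', 'email' => 'john@example.com' })
save_user(keyspace, '2', { 'name' => 'John', 'email' => 'john.smith@example.com' })

keyspace.keys.sort.each { |k| puts k }
```

Both users share the User:indices:name:John Set, while each email gets its own Set with a single ID.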
To search for users by their attributes we can do this:
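In Ohm this is a call such as User.find(email: 'john@example.com'). With several attributes, Ohm returns only the records that match all of them. The sketch below mimics that idea with hypothetical sample data: read each index Set and intersect them (the in-memory analogue of Redis SINTER), then load each matching Hash. find_users is a hypothetical name, not Ohm's API.

```ruby
require 'set'

# Index Sets as they might look after saving two users (hypothetical sample data).
keyspace = {
  'User:1' => { 'name' => 'John', 'email' => 'john@example.com' },
  'User:2' => { 'name' => 'John', 'email' => 'john.smith@example.com' },
  'User:indices:name:John' => Set['1', '2'],
  'User:indices:email:john@example.com' => Set['1'],
  'User:indices:email:john.smith@example.com' => Set['2']
}

# Exact-match lookup: fetch the index Set for each condition, intersect them,
# then load each matching record Hash.
def find_users(keyspace, conditions)
  id_sets = conditions.map do |field, value|
    keyspace.fetch("User:indices:#{field}:#{value}", Set.new)
  end
  ids = id_sets.reduce { |acc, s| acc & s } || Set.new
  ids.sort.map { |id| keyspace["User:#{id}"] }
end

puts find_users(keyspace, { name: 'John' }).size                                      # 2
puts find_users(keyspace, { name: 'John', email: 'john@example.com' }).first['email'] # john@example.com
```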
To make things a little more complicated, we will introduce an Article model that belongs_to User.
Since the Ohm library only lets us search by exact match (email: 'email@example.com'), it does not make sense to index body, as it will be very long. But we could index title.
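To see why an exact-match index on body is of little use, consider how such an index key is formed. Here is a hypothetical sketch: index_key only mimics the Model:indices:field:value naming described above, and the sample text is made up. The comment shows roughly how the Article model might be declared in Ohm, where reference adds a user_id attribute and indexes it.

```ruby
# In Ohm, the Article model might be declared roughly like this:
#
#   class Article < Ohm::Model
#     attribute :title
#     attribute :body
#     reference :user, :User
#     index :title
#   end

# Hypothetical helper mimicking the "<Model>:indices:<field>:<value>" naming:
# the key embeds the *entire* attribute value.
def index_key(model, field, value)
  "#{model}:indices:#{field}:#{value}"
end

title = 'Redis Search'
body  = 'Redis is fast and supports many useful data structures.'

# Index keys that saving this article would create if body were indexed too.
existing = [index_key('Article', 'title', title), index_key('Article', 'body', body)]

# An exact-match lookup only hits a key built from the identical value.
puts existing.include?(index_key('Article', 'title', 'Redis Search')) # true
puts existing.include?(index_key('Article', 'body', 'Redis'))         # false
```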
Here is the Hash with core Article data:
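As a rough illustration, the record might look like the sketch below, with hypothetical sample values and a plain Ruby Hash standing in for Redis. The belongs_to side is kept as an ordinary user_id field, so following the reference means rebuilding the author's key from it.

```ruby
# Plain Ruby Hash standing in for the Redis keyspace (sample values are made up).
keyspace = {
  'User:1'    => { 'name' => 'John', 'email' => 'john@example.com' },
  'Article:1' => {
    'title'   => 'Redis Search',
    'body'    => 'Redis is fast and supports many useful data structures.',
    'user_id' => '1' # the belongs_to side: just the author's ID
  }
}

# Following the reference means rebuilding the author's key from user_id.
article = keyspace['Article:1']
author  = keyspace["User:#{article['user_id']}"]
puts author['name'] # John
```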
Here are the Sets with index info. Notice that the library automatically created an index for user_id in addition to title.
To search Articles by user we can do this:
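In Ohm this is roughly Article.find(user_id: user.id). The sketch below uses hypothetical sample data laid out like the Article keys described above. It shows that the lookup is just a read of the user_id index Set (like SMEMBERS) followed by loading each Article Hash. articles_by_user is a hypothetical name.

```ruby
require 'set'

# Hypothetical sample data laid out like the Article keys described above.
keyspace = {
  'Article:1' => { 'title' => 'Redis Search', 'user_id' => '1' },
  'Article:2' => { 'title' => 'Redis Queues', 'user_id' => '1' },
  'Article:3' => { 'title' => 'Redis Search', 'user_id' => '2' },
  'Article:indices:user_id:1' => Set['1', '2'],
  'Article:indices:user_id:2' => Set['3'],
  'Article:indices:title:Redis Search' => Set['1', '3'],
  'Article:indices:title:Redis Queues' => Set['2']
}

# Articles by a given user: read the user_id index Set, then load each Hash.
def articles_by_user(keyspace, user_id)
  ids = keyspace.fetch("Article:indices:user_id:#{user_id}", Set.new)
  ids.sort.map { |id| keyspace["Article:#{id}"]['title'] }
end

p articles_by_user(keyspace, '1') # ["Redis Search", "Redis Queues"]
```

Searching by title works the same way, only reading the Article:indices:title:... Set instead.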
To search by article title we can do Article.find(title: 'Redis Search'), and the data comes back in the same way.
Now we are able to create indexes and search for exact matches on our records or on relationship attributes. We are using a regular Redis DB both to store and to search our data.
At RedisConf 2016, support for modules that extend Redis capabilities was announced. I found the RediSearch module especially interesting. It adds commands such as FT.CREATE and FT.SEARCH to build and query full-text search indexes (not just exact match) in Redis. Module installation instructions can be found at http://redisearch.io/Quick_Start/
To simplify development I have been working on the redi_search_rails gem. It integrates with application models and provides handy methods like ft_search. We can install it from RubyGems or GitHub.
Since RediSearch supports full-text search, we can index the body and search for keywords within it. In this application we are using Redis both to store the Users and Articles records AND to hold the RediSearch indexes. But we could store Users and Articles in MySQL or MongoDB, and redi_search_rails would work the same way.
To index data we run these commands in
Now completely different records are created in Redis. The RediSearch module creates Hashes to store the indexed attributes. In my RediSearchRails library I am using GlobalID to create unique IDs.
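The GlobalID gem produces IDs of the form gid://app/Model/id, normally via record.to_global_id. The sketch below builds that format by hand so it runs without Rails. The global_id helper and the 'blog' app name are hypothetical. It shows why such IDs are unique across models: a User and an Article with the same numeric ID still get different document keys.

```ruby
# Hypothetical helper: builds an ID in the gid://<app>/<Model>/<id> format
# used by the GlobalID gem (the real gem produces it via record.to_global_id).
def global_id(app, model, id)
  "gid://#{app}/#{model}/#{id}"
end

# Sketch: indexed attributes stored under that unique ID, so records of
# different models with the same numeric ID do not collide.
documents = {}
documents[global_id('blog', 'User', 1)]    = { 'name' => 'John', 'email' => 'john@example.com' }
documents[global_id('blog', 'Article', 1)] = { 'title' => 'Redis Search' }

puts documents.keys
```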
RediSearch will also create custom data types. There will be one key of the ft_index type for each index that we created:
And there will be multiple keys of the ft_invidx data type, one per indexed keyword:
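Conceptually, those ft_invidx keys form an inverted index: each keyword maps to the documents that contain it. The sketch below is only an illustration of that idea. RediSearch's real values use a compact binary encoding, and the naive tokenizer here (lowercase, split on anything that is not a letter or digit) is an assumption, not RediSearch's exact rules. The gid:// document IDs are hypothetical.

```ruby
require 'set'

# Naive tokenizer (an assumption for illustration): lowercase and keep runs of letters/digits.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Build keyword => Set of document IDs from every field of every document.
def build_inverted_index(docs)
  index = Hash.new { |h, term| h[term] = Set.new }
  docs.each do |doc_id, fields|
    fields.each_value { |value| tokenize(value).each { |term| index[term] << doc_id } }
  end
  index
end

docs = {
  'gid://blog/User/1' => { 'name' => 'John Smith', 'email' => 'john@example.com' },
  'gid://blog/User/2' => { 'name' => 'Jane Doe',   'email' => 'jane@example.com' }
}

index = build_inverted_index(docs)
puts index.keys.sort.join(', ') # com, doe, example, jane, john, smith
```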
Now we can execute full-text search commands. The RediSearch module uses the custom indexes to find the matching keys and return search results.
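Here is a sketch of what a multi-word query does conceptually, assuming an inverted index like the one described above (hypothetical sample data). It looks up each keyword and keeps only the documents that contain all of them, since RediSearch treats space-separated terms as an AND by default. The search method name is hypothetical and is not the gem's ft_search.

```ruby
require 'set'

# Hypothetical inverted index (keyword => document IDs).
index = {
  'john'    => Set['gid://blog/User/1', 'gid://blog/User/3'],
  'smith'   => Set['gid://blog/User/1'],
  'jane'    => Set['gid://blog/User/2'],
  'example' => Set['gid://blog/User/1', 'gid://blog/User/2', 'gid://blog/User/3']
}

# Look up each query term and intersect the results (AND semantics).
def search(index, query)
  terms = query.downcase.split
  return [] if terms.empty?
  terms.map { |t| index.fetch(t, Set.new) }.reduce(:&).to_a.sort
end

p search(index, 'John')       # ["gid://blog/User/1", "gid://blog/User/3"]
p search(index, 'john smith') # ["gid://blog/User/1"]
```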
I started with a simple text file containing 10K users with names and email addresses. The file was about 400KB on disk. Once loaded into RediSearch, it created 22.5K keys and the RDB file was 1.7MB.
I then indexed 1 million users. The indexing process took about 6 minutes and created 1.5 million keys, with a 202MB RDB file. Finally, I indexed 10 million users, which took almost 60 minutes and produced 12 million keys and a 1.9GB RDB file.
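To put these numbers in perspective, here is a quick calculation of the averages they imply. It assumes decimal units (1MB = 1,000,000 bytes). The 10K run has no timing, and "almost 60 minutes" is treated as 60, so the last rate is approximate. Nothing here goes beyond the figures above.

```ruby
# Measurements reported above; decimal units assumed (1 MB = 1_000_000 bytes).
runs = [
  { users: 10_000,     keys: 22_500,     rdb_bytes: 1.7e6, minutes: nil },
  { users: 1_000_000,  keys: 1_500_000,  rdb_bytes: 202e6, minutes: 6 },
  { users: 10_000_000, keys: 12_000_000, rdb_bytes: 1.9e9, minutes: 60 }
]

# Per-user averages and indexing throughput (nil when no timing was recorded).
stats = runs.map do |r|
  {
    users: r[:users],
    bytes_per_user: r[:rdb_bytes] / r[:users],
    keys_per_user: r[:keys].fdiv(r[:users]),
    users_per_sec: r[:minutes] && r[:users] / (r[:minutes] * 60.0)
  }
end

stats.each do |s|
  line = format('%d users: %.0f RDB bytes/user, %.2f keys/user',
                s[:users], s[:bytes_per_user], s[:keys_per_user])
  line += format(', ~%.0f users/s indexed', s[:users_per_sec]) if s[:users_per_sec]
  puts line
end
```

All three runs work out to roughly 170 to 200 bytes of RDB per user, and both timed runs to roughly 2,800 users indexed per second.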
In all three cases, searches via Ruby (User.ft_search(keyword: 'John')) and via redis-cli (FT.SEARCH User john) returned results nearly instantaneously. Tests were performed on a Dell workstation with 16GB RAM. Obviously, results will vary widely depending on the types of records indexed.
As we can see, the two approaches are very different. RediSearch allows us to implement full-text search across documents. The module is under active development by RedisLabs and other contributors.
RediSearch supports other interesting features, such as indexing numeric values (prices, dates, …) and FT.SUGGET for auto-complete suggestions. I plan to cover those in a future blog post. I look forward to the day it officially moves out of beta and becomes supported by Redis hosting providers.
On the other hand, Ohm's secondary indexes allow us to do exact matches and build relationships between records, bringing it close to ORM-like functionality. It also works with regular Redis, without requiring additional modules to be installed on the server.