I am a Sr. Software Developer at Oracle Cloud. The opinions expressed here are my own and not necessarily those of my employer.
Redis Search
When I first started using Redis I loved the speed and the powerful data structures. Over the years I have used Redis for data analysis, caching, queuing background jobs and permanent data storage.
The one feature I missed was built-in support for searching records by value (not by key). Coming from a strong SQL background, I found it frustrating to be unable to do the equivalent of select first_name, last_name from users where email = .... In this post I will provide a high-level overview of different approaches to implementing search in Redis.
Read the next post on the RediSearch module.
Using a client library
As a proof of concept we will build an application that uses Redis as the primary database via the Ruby Ohm library. To keep things simple we will start with a single model, User, with name and email attributes.
Users
When we create a user record, Ohm stores a Redis Hash with the user's name and email. The key is a combination of the model name and the ID (for example User:1).
The Ohm library also creates several Redis Sets. The Set with key User:all holds the IDs of all users, which is how Ohm can find a User by ID.
Separately, there are Sets whose keys are prefixed with User:indices and followed by the attribute and its value (email:... and name:...). The Set members are the IDs of the records that match each criterion (in this case only [1]). This enables searching for users by name or email, since we defined an index for those fields.
Ohm also creates a Set for each record listing the indexes that this record matches.
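A minimal, in-memory sketch of this key layout may help. It uses a made-up ToyRedis class in place of a real Redis connection and a hypothetical save_user helper: it imitates the keys Ohm writes rather than calling Ohm, and the per-record User:1:_indices key name is an assumption about Ohm's internals.

```ruby
require 'set'

# Hypothetical stand-in for Redis: Hash keys and Set keys kept in memory.
class ToyRedis
  attr_reader :hashes, :sets

  def initialize
    @hashes = {}                               # key => { field => value }
    @sets = Hash.new { |h, k| h[k] = Set.new } # key => Set of members
  end

  def hset(key, fields)
    (@hashes[key] ||= {}).merge!(fields)
  end

  def sadd(key, member)
    @sets[key] << member
  end
end

# Hypothetical helper mimicking the keys written when a User is saved.
# "User:<id>:_indices" is an assumed name for the per-record index Set.
def save_user(db, id, name:, email:)
  key = "User:#{id}"
  db.hset(key, 'name' => name, 'email' => email) # the record itself
  db.sadd('User:all', id)                        # every User ID
  { 'name' => name, 'email' => email }.each do |attr, value|
    index_key = "User:indices:#{attr}:#{value}"
    db.sadd(index_key, id)                       # IDs where attr == value
    db.sadd("#{key}:_indices", index_key)        # indexes this record is in
  end
end

db = ToyRedis.new
save_user(db, 1, name: 'John Smith', email: 'john.smith@gmail.com')
puts db.hashes.keys
puts db.sets.keys.sort
```

The per-record Set is what lets a library clean up every index entry when a record changes or is deleted, without scanning all index keys.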
To search for users by their attributes we can call User.find(email: 'john.smith@gmail.com'), which returns an Ohm::Set of matching records.
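Conceptually, an exact-match lookup across these index Sets is a set intersection (on a Redis server that would be SINTER). The sketch below uses a hypothetical find_ids helper and made-up data to show the idea in plain Ruby; it is not Ohm's actual implementation.

```ruby
require 'set'

# Index Sets in the "User:indices:<attr>:<value>" => IDs shape described
# above. The data (including jsmith@example.com) is invented for illustration.
INDICES = {
  'User:indices:name:John Smith'            => Set[1, 3],
  'User:indices:email:john.smith@gmail.com' => Set[1],
  'User:indices:email:jsmith@example.com'   => Set[3]
}.freeze

# Hypothetical find: fetch one index Set per criterion and intersect them.
# A missing key means no record has that exact value.
def find_ids(criteria)
  sets = criteria.map do |attr, value|
    INDICES.fetch("User:indices:#{attr}:#{value}", Set.new)
  end
  sets.reduce { |acc, s| acc & s } || Set.new
end

p find_ids(name: 'John Smith')                                # IDs 1 and 3
p find_ids(name: 'John Smith', email: 'john.smith@gmail.com') # ID 1 only
p find_ids(email: 'John')                                     # no match
```

The last call finds nothing: an index keyed by the full value cannot answer partial or keyword queries, which is the limitation RediSearch addresses.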
Articles
To make things a little more complicated, we will introduce an Article model that belongs to User (in Ohm this is declared with reference :user, :User).
Since the Ohm library only supports exact-match searches (email: 'john.smith@gmail.com'), it does not make sense to index body: the value will be very long, and nobody searches for the full text of an article. We can, however, index title.
Here is the Hash with core Article data:
Here are the Sets with index info. Notice that the library automatically created an index for user_id in addition to title.
To search Articles by user we can query that automatically created index, for example Article.find(user_id: 1).
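The relationship lookup works the same way as the User indexes: saving an Article writes a Hash plus exact-match index Sets, including one on user_id. The following in-memory sketch uses hypothetical save_article and articles_for helpers with invented sample data; key names follow the pattern described above and are assumptions, not Ohm's code.

```ruby
require 'set'

HASHES  = {}                                   # "Article:<id>" => fields
INDEXES = Hash.new { |h, k| h[k] = Set.new }   # index key => Set of IDs

# Hypothetical save: store the record, index user_id and title only.
# body is deliberately not indexed, since exact match on long text is useless.
def save_article(id, user_id:, title:, body:)
  HASHES["Article:#{id}"] = { 'user_id' => user_id.to_s, 'title' => title, 'body' => body }
  INDEXES["Article:indices:user_id:#{user_id}"] << id
  INDEXES["Article:indices:title:#{title}"] << id
end

# Articles by user: one Set lookup on the user_id index, then load the Hashes.
def articles_for(user_id)
  ids = INDEXES.fetch("Article:indices:user_id:#{user_id}", Set.new)
  ids.sort.map { |id| HASHES["Article:#{id}"] }
end

save_article(1, user_id: 1, title: 'Redis Search', body: 'When I first started...')
save_article(2, user_id: 1, title: 'Ohm Tips', body: '...')
save_article(3, user_id: 2, title: 'Other', body: '...')

p articles_for(1).map { |a| a['title'] }
```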
To search by article title we can call Article.find(title: 'Redis Search'), and the data comes back in the same Ohm::Set format.
Now we are able to create indexes and search for exact matches on our records or on relationship attributes. We are using a regular Redis DB both to store and to search our data.
RediSearch module
At RedisConf 2016, support for modules that extend Redis capabilities was announced. I found the RediSearch module especially interesting. It adds commands such as FT.CREATE, FT.ADD and FT.SEARCH to build full-text search indexes (not just exact-match ones) in Redis. Module installation instructions can be found at http://redisearch.io/Quick_Start/
To simplify development I have been working on the redi_search_rails gem. It integrates into application models and provides handy methods like ft_create and ft_search. It can be installed from RubyGems or GitHub.
Since RediSearch supports full-text search, we can index the body and search for keywords within it. In this application we use Redis both to store the User and Article records and to hold the RediSearch indexes, but we could store Users and Articles in MySQL or MongoDB and redi_search_rails would work the same way.
To index the data, we run these commands in the Rails console.
Now a completely different set of records is created in Redis. The RediSearch module creates Hashes to store the indexed attributes. In my RediSearchRails library I use GlobalID to create unique IDs.
RediSearch also creates custom data types. There will be keys of type ft_index, one for each index that we created, and multiple keys of the ft_invidx (inverted index) data type, based on the different keywords.
Now we can execute full-text search commands. The RediSearch module uses these custom indexes to find the matching keys and return search results.
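To make the difference from exact-match indexes concrete, here is a toy inverted index in plain Ruby: one posting list per keyword, which is the idea the ft_invidx name refers to. ToyInvertedIndex, its tokenizer and the GlobalID-style IDs (gid://blog/User/1) are illustrative assumptions; RediSearch's actual data structures are considerably more elaborate.

```ruby
require 'set'

# Toy inverted index: term => Set of document IDs containing that term.
class ToyInvertedIndex
  def initialize
    @postings = Hash.new { |h, k| h[k] = Set.new }
    @docs = {} # doc ID => indexed fields, loosely like the per-document Hashes
  end

  # Simple tokenizer: lowercase, then take runs of word characters.
  def self.tokenize(text)
    text.downcase.scan(/\w+/)
  end

  def add(doc_id, fields)
    @docs[doc_id] = fields
    fields.each_value do |value|
      self.class.tokenize(value).each { |term| @postings[term] << doc_id }
    end
  end

  # IDs of documents containing every query term (AND semantics), sorted.
  def search(query)
    terms = self.class.tokenize(query)
    return [] if terms.empty?

    terms.map { |t| @postings.fetch(t, Set.new) }.reduce(:&).to_a.sort
  end
end

idx = ToyInvertedIndex.new
idx.add('gid://blog/User/1', 'name' => 'John Smith', 'email' => 'john.smith@gmail.com')
idx.add('gid://blog/User/2', 'name' => 'Jane Johnson', 'email' => 'jane@example.com')

p idx.search('John')
p idx.search('smith john')
```

Note that search('John') finds John Smith but not Jane Johnson, because this toy does whole-word matching only, with no prefix matching or stemming.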
Benchmark(eting) stats
I started with a simple text file containing 10K users with names and email addresses. The file was about 400KB on disk. Once loaded into RediSearch, it produced 22.5K keys and a 1.7MB RDB file.
I then indexed 1 million users. The indexing process took about 6 minutes and produced 1.5 million keys and a 202MB RDB file. Finally, I indexed 10 million users, which took almost 60 minutes and produced 12 million keys and a 1.9GB RDB file.
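These figures can be reduced to per-user numbers with a little arithmetic. The sketch copies the reported values, treats MB and GB as decimal units and rounds "almost 60 minutes" up to 60, so the results are approximations; RUNS and stats are illustrative names.

```ruby
# Reported benchmark figures (decimal MB/GB; the 10K run has no timing).
RUNS = [
  { users: 10_000,     keys: 22_500,     rdb_bytes: 1.7e6, seconds: nil },
  { users: 1_000_000,  keys: 1_500_000,  rdb_bytes: 202e6, seconds: 6 * 60 },
  { users: 10_000_000, keys: 12_000_000, rdb_bytes: 1.9e9, seconds: 60 * 60 }
].freeze

# Keys per user, RDB bytes per user and indexing throughput (users/second).
def stats(run)
  {
    keys_per_user:  (run[:keys].to_f / run[:users]).round(2),
    bytes_per_user: (run[:rdb_bytes] / run[:users]).round,
    users_per_sec:  run[:seconds] && (run[:users] / run[:seconds].to_f).round
  }
end

RUNS.each { |r| puts "#{r[:users]} users: #{stats(r)}" }
```

On these numbers both large runs indexed roughly 2,800 users per second and the RDB file stayed at roughly 170 to 200 bytes per user, which suggests the cost grew about linearly with the number of users on this data set.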
In all three cases, search results (via Ruby User.ft_search(keyword: 'John') and via redis-cli FT.SEARCH User john) were nearly instantaneous. The tests were performed on a Dell workstation with 16GB of RAM. Results will obviously vary widely depending on the types of records indexed.
Conclusion
As we can see, the two approaches are very different. RediSearch allows us to implement full-text search across documents. The module is under active development by Redis Labs and other contributors.
RediSearch supports other interesting features, such as indexing numeric values (prices, dates, …) and FT.SUGADD/FT.SUGGET for autocomplete suggestions. I plan to cover those in a future blog post. I look forward to the module officially moving out of beta and being supported by Redis hosting providers.
On the other hand, Ohm's secondary indexes let us do exact-match lookups and build relationships between records, bringing Redis close to ORM-like functionality. They also work with regular Redis, without requiring additional modules to be installed on the server.