I am a Sr. Software Developer at Oracle Cloud. The opinions expressed here are my own and not necessarily those of my employer.
RediSearch and time series data
In a previous post we explored integration between Redis and Elasticsearch for time series data. Now we will take a deeper dive into how to search time series data within Redis using the RediSearch module.
We will be using the same POC app for a nationwide retail chain, built with the Ruby on Rails framework. We want to search various user interactions on the website, such as which zipcodes users are coming from and which products they are looking for.
A common approach for time series data is to create periodic (usually daily) indexes. Then we can run a regular process that removes older indexes (or moves them to a different data store), so we only keep the last X days of data in the primary Redis DB.
Separate daily indexes
To encapsulate the logic we will create a separate class that creates indexes and inserts records into the appropriate one.
We are passing in the time because data processing might be delayed and we do not want to insert data from yesterday into today’s index. We are also passing in optional parameters to determine the naming pattern for indexes and whether we will create daily indexes (by appending a date stamp to the index pattern).
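A minimal sketch of what such a class could look like is shown here. The class name, the `search_log` pattern, the `zipcode` / `product` fields and the date stamp format are all assumptions for illustration, not the original code. It only builds the FT.CREATE and FT.ADD command arrays, which could then be sent with something like `redis.call(*command)`.

```ruby
# Hypothetical helper: picks the index name for a given event time and
# builds RediSearch commands as plain arrays (nothing is sent to Redis here).
class SearchLogIndex
  def initialize(time: Time.now, pattern: "search_log", index_per_day: true)
    @time = time
    @pattern = pattern
    @index_per_day = index_per_day
  end

  # "search_log:2018-01-15" for daily indexes, plain "search_log" otherwise.
  def name
    return @pattern unless @index_per_day
    "#{@pattern}:#{@time.strftime('%Y-%m-%d')}"
  end

  # Assumed schema: two TEXT fields matching the data we want to search.
  def create_command
    ["FT.CREATE", name, "SCHEMA", "zipcode", "TEXT", "product", "TEXT"]
  end

  # fields is a Hash such as { zipcode: "98101", product: "shoes" }.
  def add_command(doc_id, fields)
    ["FT.ADD", name, doc_id, "1.0", "FIELDS",
     *fields.flat_map { |key, value| [key.to_s, value.to_s] }]
  end
end
```

Passing the event time rather than relying on `Time.now` is what lets a delayed job still write yesterday's record into yesterday's index.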
To query across multiple indexes we will have to make a separate FT.SEARCH request to Redis for each index and then merge the results in our code. To return the list of indexes we get the Redis keys that match a pattern.
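The merge step could be sketched roughly as follows. The function names are hypothetical, and `redis` is assumed to be any client object with a `call` method that forwards raw commands (as redis-rb does). The sketch relies on the default FT.SEARCH reply shape: the total count first, followed by alternating document IDs and flat field/value arrays.

```ruby
# Merge raw FT.SEARCH replies of the form
#   [total, id1, [field, value, ...], id2, [field, value, ...], ...]
def merge_search_replies(replies)
  total = 0
  docs = []
  replies.each do |reply|
    count, *rest = reply
    total += count
    rest.each_slice(2) do |doc_id, fields|
      docs << { id: doc_id, fields: fields.each_slice(2).to_h }
    end
  end
  { total: total, docs: docs }
end

# Run the same query against every index and combine the results.
def search_all(redis, index_names, query)
  replies = index_names.map { |name| redis.call("FT.SEARCH", name, query) }
  merge_search_replies(replies)
end
```

Note that this simply concatenates documents in index order; any ranking across indexes would have to be done in application code.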
Data in Redis will look like this. For each date we will have one ft_index0 and multiple ft_invidx keys.
To purge old indexes we will call FT.DROP, passing the appropriate time (Time.now - X.days) to the class initializer.
Once we drop an index, RediSearch will remove the ft_index0 and ft_invidx keys plus the Redis Hashes used to store the documents themselves.
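As a rough illustration of the purge step, the sketch below computes the name of the index that is a given number of days old and drops it. The function names, the `search_log` pattern and the date format are assumptions, and plain Ruby arithmetic stands in for the Rails `X.days` helper so the snippet runs outside Rails. `redis` is again assumed to respond to `call`.

```ruby
# Hypothetical naming helper; must match however the indexes were created.
def daily_index_name(time, pattern = "search_log")
  "#{pattern}:#{time.strftime('%Y-%m-%d')}"
end

# Drop the daily index that is days_to_keep days older than `now`.
# Returns the name of the index it asked RediSearch to drop.
def drop_old_index(redis, days_to_keep, now: Time.now, pattern: "search_log")
  cutoff = now - days_to_keep * 24 * 60 * 60
  name = daily_index_name(cutoff, pattern)
  redis.call("FT.DROP", name)
  name
end
```

Run once a day from a scheduled job, this keeps a rolling window of indexes, provided the job does not skip a day.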
This code still needs a lot of work to support other methods in RediSearch and to further abstract the index SCHEMA and document fields. But it is simply meant to show a pattern we can follow to manage these multiple related indexes in our application.
One index for all data
We might not want the challenge of managing multiple indexes and merging the results from multiple searches. Instead we could use one index and build logic to delete old documents.
We can still use the same class, only now we specify @index_per_day = false, which excludes the date stamp from the index name when creating records.
FT.SEARCH returns the number of records matching our query as the first element of its reply. We will use it to loop through all documents in the index, check their IDs (derived from timestamps) and use the FT.DEL command to remove each document that is too old.
Specifying DD will also remove the document itself (stored in a Redis Hash). It will leave behind ft_invidx keys such as ft:search_log/redis.
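The loop described above might look something like the sketch below. Several things here are assumptions: the function names, that document IDs start with an epoch timestamp in seconds (for example `1514764800-abc`), and that the caller supplies a query that matches every document in the index. It pages through the IDs with the NOCONTENT and LIMIT options first, then deletes the old ones, so that deletions do not shift the paging offsets.

```ruby
# Collect the IDs of all documents older than cutoff (a Time).
# With NOCONTENT the reply is [total, id1, id2, ...].
def expired_doc_ids(redis, index, query, cutoff, page_size: 100)
  ids = []
  offset = 0
  loop do
    total, *page = redis.call("FT.SEARCH", index, query,
                              "NOCONTENT", "LIMIT", offset, page_size)
    ids.concat(page)
    offset += page_size
    break if page.empty? || offset >= total
  end
  # Assumption: the ID begins with epoch seconds, so to_i extracts it.
  ids.select { |id| id.to_i < cutoff.to_i }
end

# Delete each expired document; DD also removes the stored Hash.
# Returns the list of IDs that were deleted.
def purge_old_documents(redis, index, query, cutoff, page_size: 100)
  expired_doc_ids(redis, index, query, cutoff, page_size: page_size).each do |id|
    redis.call("FT.DEL", index, id, "DD")
  end
end
```

Even in this small form it takes one FT.SEARCH call per page plus one FT.DEL call per expired document.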
The big downside of this approach is the necessity of making multiple Redis calls to query and remove documents. It is MUCH more complex than the Redis TTL approach of expiring keys.
Links
- http://redisearch.io/
- https://github.com/danni-m/redis-timeseries
- https://redis.io/commands/ttl