I am a Sr. Software Developer at Oracle Cloud. The opinions expressed here are my own and not necessarily those of my employer.
-
Monitoring Redis with Sentinels
Running Redis in production can be a complicated undertaking. Ideally our cloud provider offers a managed service, but sometimes that is not an option. In this article we will explore how to run Redis and monitor it ourselves.
-
Caching: Redis vs Nginx
In previous articles on this blog we explored various options for using Redis for caching. This time we will compare Redis to Nginx as a caching technology. Code is available at https://github.com/dmitrypol/cache_nginx_redis
-
Redis for Data Engineering and Data Science
Recently I spoke at RedisDay Seattle about using Redis for Data Engineering and Data Science. In this article I want to revisit these ideas.
-
Envoy Proxy with Redis
In the previous article we explored using Redis Cluster. Now we will discuss using Envoy Proxy to scale our Redis infrastructure. This article assumes that the reader is familiar with Redis and Docker Compose.
-
1000 node Redis Cluster
In this article we will explore how to launch a 1000 (one thousand) node Redis Cluster running on bare metal servers in the cloud. We will then run 2000 (two thousand) workers (as Kubernetes pods) to create load. They will perform 1 billion writes in about an hour, generating over 300GB of data.
-
Redis the Red-Nosed Reindeer
During Christmas Santa delivers presents to children around the world. To do this he supposedly “makes his list and checks it twice”. But what if Santa had better software tools to keep track of all the presents that need to be delivered to millions of homes?
-
KubeCon18
Recently I attended KubeCon18 in Seattle. I was also able to give a couple of lightning talks on using Ansible, Terraform and Packer (my previous 2 blog posts were the foundation for these presentations).
-
Managing dynamic inventory in the cloud with Ansible
Ansible is a useful tool for provisioning infrastructure (installing software, modifying config files). Usually Ansible expects an inventory file which specifies servers and their IP addresses. The challenge is that in the cloud those IP addresses can change frequently.
-
Infrastructure as Code with Packer, Ansible and Terraform
This article is meant to demonstrate one possible way of integrating Packer, Ansible and Terraform. It assumes that the reader is somewhat familiar with these (or similar) tools. For more in-depth info on each tool please consult other resources.
-
Using Redis probabilistic data structures to track unique events
On a previous project we had to develop a process to de-dupe events from our web log files as we were processing them through a data pipeline. We decided to use a hash of the IP & UserAgent combination to “uniquely” identify the users responsible for the events and temporarily stored the data in Redis.
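A minimal sketch of that idea, assuming the redis gem and a local Redis instance (key names and TTLs are illustrative, not the exact pipeline from the article):

```ruby
require 'redis'
require 'digest'
require 'date'

redis = Redis.new

# Returns true only the first time we see this visitor today; the HyperLogLog
# key keeps an approximate unique-visitor count with very little memory.
def track_event(redis, ip, user_agent)
  fingerprint = Digest::SHA1.hexdigest("#{ip}:#{user_agent}")
  first_seen = redis.set("seen:#{Date.today}:#{fingerprint}", 1, nx: true, ex: 86_400)
  redis.pfadd("uniques:#{Date.today}", fingerprint)
  first_seen
end

track_event(redis, '203.0.113.7', 'Mozilla/5.0')
puts redis.pfcount("uniques:#{Date.today}")
```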
-
RedisConf 2018
Last week I presented at RedisConf18. My talk was about integrating Redis with Elasticsearch (here are my slides). I spoke on how to use Redis as a temporary store during data processing and touched on Redis Streams (new data type coming soon).
-
Processing time series data
Modern software systems can collect LOTS of time series data. It could be an analytics platform tracking user interactions or an IoT system receiving measurements from sensors. How do we process this data in a timely and cost-effective way? We will explore two different approaches below.
-
Redis and Rube Goldberg machines
Rube Goldberg machines are deliberately complex contraptions that require the designer to perform a series of excessively convoluted steps to accomplish a very simple task (turn on a light switch). In my career I worked on some applications that also were a little too complex.
-
RediSearch and time series data
In a previous post we explored integration between Redis and Elasticsearch for time series data. Now we will take a deeper dive into how to search time series data within Redis using the RediSearch module.
-
Distributing DB load when running background jobs
What if we had a multi-tenant system where we needed to generate various reports? Typically we would do it at night, as the load on the DB is usually lower at that time.
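One way to spread that load even further is to add a random delay per tenant; here is a hedged ActiveJob sketch (Tenant and NightlyReportJob are made-up names):

```ruby
# Stagger nightly report jobs so tenants do not hit the DB at exactly the same moment.
Tenant.find_each do |tenant|
  NightlyReportJob.set(wait: rand(0..120).minutes).perform_later(tenant.id)
end
```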
-
Elasticsearch and Redis Pub/Sub
In previous posts we discussed integration between Elasticsearch and Redis and using Redis Streams to work with time series data. Now we will explore Redis Pub/Sub using the same example of a Ruby on Rails website for a national retail chain.
-
Elasticsearch and Redis streams
Redis Lists can be used as job queues to move data from the primary data store to Elasticsearch. What if we have time-series data that needs to stay in Redis AND be copied to Elasticsearch?
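Roughly the shape of it with the redis gem (stream and field names are invented, not from the article):

```ruby
require 'redis'
redis = Redis.new

# Producers append time series events to a capped stream...
redis.xadd('page_views', { 'page' => '/home', 'user_id' => '42' }, maxlen: 100_000)

# ...and a consumer reads entries and copies them to Elasticsearch.
entries = redis.xread('page_views', '0')
entries.fetch('page_views', []).each do |id, fields|
  puts "#{id} #{fields.inspect}"   # index the fields into Elasticsearch here
end
```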
-
Elasticsearch and Redis
Elasticsearch and Redis are powerful technologies with different strengths. They are very flexible and can be used for a variety of purposes. We will explore different ways to integrate them.
-
Queues - DB vs Redis vs RabbitMQ vs SQS
Queues can be a useful tool to scale applications or integrate complex systems. Here is a basic use case: a user registers and we need to send a welcome email. We record data in the User table and separately call the API of an email service provider. Sending the email via a background process is a faster UX, plus we can retry in case of failure. But which technology should we use as a queue backend?
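For the Redis option, a hedged Sidekiq sketch (class names are illustrative, assuming a typical Rails app with a User model and UserMailer):

```ruby
require 'sidekiq'

class WelcomeEmailJob
  include Sidekiq::Worker
  sidekiq_options queue: :mailers, retry: 5   # retries cover transient ESP failures

  def perform(user_id)
    user = User.find(user_id)
    UserMailer.welcome(user).deliver_now
  end
end

# In the registration flow, enqueueing is just a fast Redis write:
# WelcomeEmailJob.perform_async(user.id)
```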
-
Redis Workflow Engine
When we need to scale applications Redis can be a great tool. Slow tasks such as sending emails can be done via background process which is an easy win for user experience. In many situations we do not care about the order in which different jobs are executed.
-
Storing complex data structures in Redis
We use various data structures (linked lists, arrays, hashes, etc.) in our applications. They are usually implemented in memory, but sometimes we need persistence AND speed. This is where an in-memory DB like Redis can be very useful.
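For example, with the redis gem (keys and fields are made up):

```ruby
require 'redis'
redis = Redis.new

# A hash keeps related fields for one record together.
redis.hset('user:1', 'name', 'Jane')
redis.hincrby('user:1', 'visits', 1)

# A capped list works as a persistent "most recent items" structure.
redis.lpush('recent_searches:1', 'redis data structures')
redis.ltrim('recent_searches:1', 0, 99)

puts redis.hgetall('user:1')                    # => {"name"=>"Jane", "visits"=>"1"}
puts redis.lrange('recent_searches:1', 0, -1)
```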
-
Does SQL plus Key Value equal Document?
Storing records in a SQL DB requires a fixed set of columns. User has first name, last name, email, etc. But in a multi-tenant application some records may require specific fields. We can have optional fields (middle name), but having too many of them is not practical.
-
Keeping our tests DRY
As we add more features to our applications we inevitably have to refactor existing code. At one point I had to introduce a polymorphic relation to a model. Previously it was simply this with Mongoid and Ruby on Rails.
-
RedisConf 2017
Last week I presented at RedisConf. My presentation was about storing volatile data in Redis and searching it with RediSearch (here are the slides). I was also able to attend other interesting presentations and got a chance to spend time with @antirez discussing future Redis features.
-
Trying out Rethink DB
Sometimes there is a technology which we love right away, until we really use it in depth and start encountering its limitations. When I first used MongoDB with the Mongoid ORM I loved the flexible schema and the ability to declare fields right in my model classes (no need for schema migrations). But now, after using Mongo for a number of years on different projects, I really miss some traditional SQL features (like JOINs and transactions).
-
Redis data sharding
One of the limitations of Redis is that all data is stored in RAM. If we cannot scale up we need to scale out and shard our data.
-
Redis and ETL
Frequently our application captures highly volatile data in Redis but we also need to ETL some of those data elements to a different DB or data warehouse. We can change the same value (increment a counter) tens of thousands of times per second in Redis but we cannot (and don’t really need to) make the same updates in our SQL DB (where data is persisted to disk).
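A rough sketch of that split, assuming the redis gem and a hypothetical PageViewStat ActiveRecord model:

```ruby
require 'redis'
require 'date'

redis = Redis.new

# Hot path: this can run tens of thousands of times per second in Redis.
redis.incr("page_views:#{Date.today}:home")

# Periodic ETL job: flush the aggregated totals to the SQL DB / warehouse.
def flush_counters(redis)
  redis.scan_each(match: 'page_views:*') do |key|
    _, day, page = key.split(':')
    stat = PageViewStat.find_or_initialize_by(day: day, page: page)
    stat.update(views: redis.get(key).to_i)
  end
end
```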
-
Rails session storage
When we build applications on a single server things are very simple. But then we need to start scaling out (usually a better approach than scaling up) and we need to worry about session state management. Here is a great article by Justin Weiss and a video of his talk on Rails sessions.
-
Bulk data import - part two
In a previous post I wrote about using Redis and Sidekiq to do bulk data imports. But as with all scalability challenges, this solution works only up to a certain level. What if we have very large imports with millions of records?
-
Rails cache with variable TTL
Method level caching can be a useful tool to scale our applications. When the underlying data changes we need to bust the cache by creating a new `cache_key`. But the old cached content still remains in RAM (Redis or Memcached) until it is purged via TTL.
-
Sidekiq with multiple queues
Sidekiq is a great library for background job processing. It uses Redis as a backend which makes queuing jobs extremely fast. In this article I will discuss various options for scaling and managing job processing with greater control.
-
RediSearch Module
In a previous post I wrote about different ways we can search for records in Redis. In this article I want to do a deeper dive on the RediSearch module.
-
Redis Search
When I first started using Redis I loved the speed and the powerful data structures. Over the years I used Redis for data analysis, caching, queuing background jobs and permanent data storage.
-
Redis and IP Throttling
Recently one of our websites was hit by a scraper. We could see the requests in our logs as they were querying our site for different keywords. Instead of adding a bunch of IPs to our firewalls we decided to implement more intelligent throttling.
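The core of such a throttle can be a counter per IP per time window; a hedged sketch with the redis gem (the limits are made up):

```ruby
require 'redis'

REDIS  = Redis.new
LIMIT  = 100   # requests allowed per window
WINDOW = 60    # window length in seconds

def throttled?(ip)
  key = "throttle:#{ip}:#{Time.now.to_i / WINDOW}"
  count = REDIS.incr(key)
  REDIS.expire(key, WINDOW * 2) if count == 1   # let old windows expire on their own
  count > LIMIT
end

puts throttled?('203.0.113.7')   # => false until the IP exceeds LIMIT
```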
-
What is the right size for background jobs?
In a previous post I wrote about pre-generating cache via background jobs. I described an example of an online banking app where we pre-generate a cache of `recent_transactions`. This helps even out the load on the system by pushing some of the data into cache before visitors come to the site.
-
API Integration via Background Jobs
Often our applications integrate with different APIs. User may click one button but the application will make multiple API calls behind the scenes. That can take time and lead to situations where one API call succeeds but others fail. How can we make the process faster and more reliable?
-
Redis and Cache Pre-Generation
A common pattern is to use Redis as a cache store where the first application request forces code to execute and then caches the results. Subsequent requests use the cached data until Redis purges it via TTL.
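A hedged sketch of pre-warming that same cache from a background job (class, model and key names are illustrative):

```ruby
class WarmRecentTransactionsJob < ApplicationJob
  queue_as :low

  def perform(account_id)
    # Same fetch the controller uses; running it here pre-populates Redis
    # before the first visitor request arrives.
    Rails.cache.fetch(['recent_transactions', account_id], expires_in: 12.hours) do
      Transaction.where(account_id: account_id).order(created_at: :desc).limit(20).to_a
    end
  end
end
```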
-
Model Callbacks and Background Jobs
In building complex applications we can use callbacks to perform additional actions before / after records are created / updated / deleted. The challenge is that these callbacks often fire additional DB queries, which slows things down at scale.
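One common mitigation is to keep the callback itself cheap and push the extra queries into a job; a sketch (model and job names are invented):

```ruby
class Order < ApplicationRecord
  # Fires after the transaction commits, so the job never sees a stale record.
  after_commit :refresh_customer_stats, on: :create

  private

  def refresh_customer_stats
    RefreshCustomerStatsJob.perform_later(customer_id)
  end
end
```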
-
SendGrid Webhooks and Background Jobs
We use SendGrid for sending emails from our Rails application. SendGrid Webhooks send us notifications when the emails are opened / clicked. We then use the `email_id` to find the appropriate record in our DB and increment the `opens` and `clicks` counters. This enables us to quickly aggregate stats on how each mailing is performing.
-
Redis NFL Leaderboard
In a previous post I discussed using Redis for Leaderboards. Let’s expand on these ideas. Recently at work we upgraded our fundraiser leaderboard and switched to using Redis as the data store with the leaderboard gem.
-
Rails leaderboards
A leaderboard is a useful way to show the ranking of various records by specific criteria. Let’s imagine a system where Users have Purchases. We want to display users by the following metrics: number of purchases, total amount spent and average purchase amount.
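Redis sorted sets map naturally to those metrics; a small sketch with the redis gem (key names are made up):

```ruby
require 'redis'
redis = Redis.new

# Record a purchase: bump the user's total spent and purchase count.
def record_purchase(redis, user_id, amount)
  redis.zincrby('lb:total_spent', amount, user_id)
  redis.zincrby('lb:purchase_count', 1, user_id)
end

record_purchase(redis, 'user:1', 25.0)
record_purchase(redis, 'user:2', 99.0)

# Top 10 spenders, highest first.
puts redis.zrevrange('lb:total_spent', 0, 9, with_scores: true).inspect
```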
-
Mongoid has_and_belongs_to_many with inverse_of :nil
Mongoid has_and_belongs_to_many gives us new ways of modeling relationships by not creating mapping tables/collections.
-
Rails and static content
Often in our applications we need to add pages with fairly static content (FAQ, About Us, etc). We could implement a full blown CMS or we could create a few HTML/ERB files. Let’s explore different approaches.
-
Rails Rspec mock tests
In object oriented programming classes call methods on other classes. While it’s important to test the integration between classes it is very useful to test them in isolation, simulating valid and invalid responses from the dependent objects.
-
RailsAdmin background import
rails_admin_import is a great gem allowing us to import records. But sometimes we have to import many thousands of records and this gem does not scale well. What we want to do is display an “import has begun” message to the user and queue up a background process to import the records. Here is a pattern I have been following.
-
Rails testing DB indexes
With ActiveRecord we need to run migrations to create indexes. But with Mongoid we simply specify indexes in our models. Here is a hypothetical User model: we want email and name to be required and email to be unique.
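Roughly what such a model looks like (a sketch, not the exact code from the article):

```ruby
class User
  include Mongoid::Document

  field :email, type: String
  field :name,  type: String

  validates :email, :name, presence: true
  validates :email, uniqueness: true

  # Built with `rake db:mongoid:create_indexes`; a spec can assert these
  # options so an accidental change is caught by the test suite.
  index({ email: 1 }, { unique: true, background: true })
end
```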
-
Rails static data and system settings
Usually application data is stored in the DB. We use controllers and models to read and write it. But sometimes that data is static (system settings, list of countries, etc.) so it does not make sense to put it in the DB. Plus, storing data in file(s) guarantees that when we deploy the application, the data will be there. Otherwise we have to enter it manually via the UI or load it via a SQL script.
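One simple approach is a YAML file that ships with the code; a sketch (file name and structure are made up):

```ruby
require 'yaml'

# config/countries.yml:
#   - code: US
#     name: United States
#   - code: CA
#     name: Canada
COUNTRIES = YAML.load_file(Rails.root.join('config', 'countries.yml')).freeze

COUNTRIES.first   # => {"code"=>"US", "name"=>"United States"}
```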
-
Rails cache busting
Rails caching is a great tool for scaling websites. We can use different cache stores (Redis and Memcached being common choices).
-
Polymorphic relation to Single Table Inheritance records
Previously I have written about single table inheritance and polymorphic relations. Here is an interesting combination of the two.
-
Cloning records in Rails
Sometimes we need to enable our users to clone their records instead of creating new ones from scratch (huge time saver). Let’s imagine a system where users have many accounts.
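The core of it can be ActiveRecord's `dup`; a hedged sketch (the `account_settings` child association is hypothetical):

```ruby
def clone_account(account)
  copy = account.dup                       # copies attributes but not the id
  copy.name = "#{account.name} (copy)"
  account.account_settings.each do |setting|
    copy.account_settings << setting.dup   # clone child records as well
  end
  copy.save!
  copy
end
```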
-
Batch email sending
In our application we send out LOTS of emails. And our clients need to control when the emails are sent and the exact content. Here is a previous post on how we attempted to solve it. We later switched to use SendGrid bulk sending API to avoid making individual API calls for every email.
-
Rails with many different DBs
It is easy to find articles online debating pros and cons of different databases. Often they have titles like “Why you should never use X DB”. And yes, different databases have different strengths and weaknesses. Choosing a DB that does not fit the long term needs can be a costly decision.
-
Rails application object pattern
Ruby on Rails has patterns for ApplicationRecord, ApplicationMailer, ApplicationJob, etc. Other gems follow this approach. Pundit has ApplicationPolicy and ActiveModelSerializers has ApplicationSerializer.
-
Rails concerns
By default Rails 4 and higher applications come with concerns in `app/models/concerns/*` and `app/controllers/concerns/*`. It can be a useful place to put code that needs to be shared across classes. It is also a way to implement multiple inheritance.
-
Rake tasks vs. Ruby classes
When I started working with Ruby on Rails several years ago I liked Rake tasks. To me they were a great step up from ad hoc bash and SQL scripts. They were a way to build powerful CLIs to do basic sysadmin tasks, generate ad hoc reports, upload/download files, etc.
-
Rails and complex data migrations
When working with NoSQL DBs we do not worry about schema changes but we still need to do data migrations. We have been using mongoid_rails_migrations for this.
-
Rails nested routes and polymorphic associations
Recently at work we had to implement Rails nested resources with a polymorphic association. I thought it would create an interesting blog post.
-
redis_app_join ruby gem
Last week I wrote a post Redis as temp cache for application-side joins. I kept thinking of ways to make the process easier so I decided to create a redis_app_join gem.
-
Rails scopes inside other scopes
Rails scopes are a useful feature. We can define biz logic in the scopes and use them from controller actions or other model methods. We can also pass parameters into scopes and chain scopes together. I am not going to go into all the options but instead share how I recently started using scopes inside other scopes in the models.
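For example (model and scope names are invented):

```ruby
class Order < ApplicationRecord
  scope :recent, -> { where('created_at > ?', 30.days.ago) }
  scope :paid,   -> { where(status: 'paid') }

  # A scope built from other scopes keeps the definition in one place.
  scope :reportable, -> { recent.paid }
end

Order.reportable.count
```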
-
Redis and production code coverage
Would you like to know how much of your code in production is actually getting used? And how often? When we run our tests we can use code coverage metrics (like simplecov) to see which parts of our code are tested or not.
-
Redis as temp cache for application-side joins
SQL joins are a powerful feature that lets the DB bring back records from different tables without making multiple queries. Unfortunately some of the newer NoSQL DBs do not support them.
-
Roles and permissions - switching from CanCanCan to Pundit
Recently we switched our application from CanCanCan to Pundit. CanCanCan is a great gem but we outgrew it. Here are the various lessons learned.
-
Rails validators
Rails model validations are very important for ensuring data integrity. You usually start with a really simple inline `validates :name, presence: true`.
-
Choosing a DB hosting service
Small tech startups often use cloud services like AWS, Azure or Google Cloud. When you are just getting started (perhaps paying for it yourself) you can get by with a single EC2 instance hosting both the DB and the application on the same server. But with success come scalability problems.
-
Redis and async microservices - part deux
I previously wrote about Microservices with Sidekiq and Redis and async microservices. In this post I will continue expanding on those ideas.
-
Storing ephemeral data in Redis
Usually our applications have a DB (MySQL, Postgres, etc.) that we use to permanently store information about our users and other records. But there are also situations where we need to temporarily store data used by a background process. This data might be structured very differently and would not fit into our relational DB.
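That kind of data fits nicely in Redis with a TTL; a small sketch with the redis gem (key names are made up):

```ruby
require 'redis'
require 'json'

redis = Redis.new

# Progress of a background import, gone automatically an hour after it finishes.
redis.set('import:123:progress', { processed: 500, failed: 3 }.to_json, ex: 3600)

progress = JSON.parse(redis.get('import:123:progress') || '{}')
puts progress['processed']   # => 500
```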
-
Redis and asynchronous microservices
Previously I blogged about creating Microservices with Sidekiq. This article is an expansion on those ideas.
-
Mongo Atlas
We have been running MongoDB as the primary database for our Rails app for close to two years. Gradually the amount of data has increased and we started having scalability issues. The problems were primarily in disk IO, as we did not do a good job optimizing the drives on our EC2 instances.
-
Rails routes for wp-login.php and other URLs
You work hard and build your awesome Rails app. You launch your MVP and slowly users start coming to your site. But then you start noticing exceptions in your logs for URLs like “wp-login.php” and “login.aspx”.
-
Redis modules
A couple of months ago I had a chance to attend RedisConf and present about using Rails with Redis. You can read my blog post about it here or watch the presentation.
-
What app version are you running?
How do you know which specific version of your code is running on each server? Even with automated deployment tools (Chef, Capistrano, Puppet) it’s easy to make a mistake and deploy the wrong code. And then you are wondering why the new feature is not working.
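One lightweight answer is to expose the deployed revision over HTTP; a sketch that assumes the deploy tool writes a REVISION file at the app root (as Capistrano does):

```ruby
class VersionController < ApplicationController
  def show
    revision = Rails.root.join('REVISION')
    render plain: revision.exist? ? revision.read.strip : 'unknown'
  end
end

# config/routes.rb
# get '/version', to: 'version#show'
```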
-
Global ID and has_and_belongs_to_many
Recently I had to design a feature where users could save specific reports with preset filter options to make it easier to run them. The challenge is that reports can be run across different types of records, so modeling the relationships was interesting.
-
Modular code structure
More thoughts on structuring code and running it via background jobs. This post was inspired by me trying to wrap my head around Sandi Metz’ Rules For Developers.
-
Polymorphic has_and_belongs_to_many relationships
When we first start designing Rails (or other framework) applications we encounter the typical has_many and belongs_to relationships. User has_many Articles and Article belongs_to User.
-
Redis and testing your code
In previous posts I blogged about using Redis to store data. The question is how to test the code that leverages this data?
-
Doing more with Redis and Rails
Recently I had a chance to present at RedisConf on various ways Redis can be used to quickly scale Rails applications. This blog post is an expansion of the ideas that I discussed. You also might want to read my previous posts about Redis here and here.
-
Why I am writing this blog
Writing this blog certainly takes time: coming up with good ideas, doing appropriate research, writing the right code examples. A single post can easily take hours to write and edit before publishing. And it certainly isn’t making me rich or famous. So why do I do it?
-
Reporting frameworks
As the amount of data we deal with grows it is important to present it effectively. Users need to see a high level summary and then drill into specific details. It might not be glamorous but it’s essential for any organization.
-
Sidekiq batches
Recently I was finally able to implement a background job importer (see the previous post on Importing LOTS of data). It proved to be very interesting and challenging. I have been successfully running Sidekiq in production for several months, but only with individual jobs. For this I needed to run job batches, temporarily store the results of each job and email the results when all jobs were done.
-
Open source software and documentation
Many of us have come across an open source library that seems to be just what we need to solve a specific problem. Or watched a YouTube video of the project founder demoing it at a conference. Except when we actually use the library, we hit a wall. Or we use it in a different way (due to custom requirements) and get cryptic error messages. So, being open source, we read the code and hit an even bigger wall.
-
Single table inheritance
Often you have models with optional fields. And then you have to implement business logic that makes those fields required IF another field is set to a specific value. Otherwise these fields are not allowed.
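Single table inheritance makes that conditional logic explicit; a sketch (class and field names are invented):

```ruby
# The accounts table has a `type` column plus the optional fields.
class Account < ApplicationRecord
end

class PersonalAccount < Account
end

class BusinessAccount < Account
  validates :tax_id, presence: true   # required only for this subtype
end
```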
-
Microservices with Sidekiq
Much has been written about the pros and cons of a monolithic app vs microservices. Here is a great post by Martin Fowler. I am not going to talk about the big issues but simply share ideas on how I have been thinking of breaking up a Rails app I am working on.
-
Stackoverflow
I recently became intrigued by how StackOverflow reputation system worked. Naturally I decided to find ways to grow my own reputation. After all, with higher reputation you get permissions to do more things on the site. I thought one way to get more reputation points would be to search for questions about technologies that I am familiar with (rails_admin, devise, sidekiq, mongoid) and answer them.
-
Using Rails Admin to implement state machine workflow UI
Happy New Year. As we transition from old to new I was thinking about state machines. Well, not really but I thought it was a good opening for this post.
-
Blogging using Jekyll and GitHub
This blog is published using Jekyll and GitHub pages so I wanted to share my experience with it.
-
Sending LOTS of emails from Rails ActionMailer
ActionMailer is great. It allows you to create email templates and put logic in Mailer classes. You can use Roadie to merge in CSS, customizing look and feel. The problem arises when you have to send tens or hundreds of thousands of emails. Each one is a separate API or SMTP call to your email service provider.
-
Why contribute to open source?
Many of us use open source software to get stuff done. But few of us contribute back. I am guilty of that. I’ve done a few PRs, created one gem that I consider somewhat valuable, plus wrote a few comments on StackOverflow (my reputation is not very high).
-
Lessons learned upgrading MongoDB from 2.6 to 3.0 and WiredTiger Engine
We have been using MongoDB for a while and overall it has served us well. Our system has been growing and we began experiencing some pain with our DB writes. CPU would spike to over 60% and we were unable to perform some background jobs as fast as we wanted. Part of the problem is that we need to do a better job optimizing our hard drives but that’s another story.
-
Importing LOTS of data
Often you have to enable users to load large amounts of records into your application (usually from spreadsheets). So you build a few methods on your models, create controller end points and basic upload forms. With model validations your code is fairly clean and works great. Except then users start loading very large amounts of data (many thousands of records). And they sit there waiting for the controller response.
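A hedged sketch of pushing that work into background jobs, chunk by chunk (class names are made up, and `uploaded_file` stands in for the file from the upload form):

```ruby
require 'csv'

class ImportChunkJob < ApplicationJob
  queue_as :imports

  def perform(rows)
    rows.each { |attrs| User.create(attrs) }   # model validations still apply per record
  end
end

# In the controller: respond immediately and let the jobs do the work.
CSV.read(uploaded_file.path, headers: true).map(&:to_h).each_slice(500) do |chunk|
  ImportChunkJob.perform_later(chunk)
end
```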
-
Using Mongo flexible schema with Rails app
One of the great things about MongoDB is the flexible schema. In the past, whenever we had to store custom data attributes in relational DBs we had to create separate lookup tables. There would be a preset number of these custom fields with specific data types, and separately we stored what their labels should be. Or we would create tables for key/value pairs and do complex lookups. It was a pain. That’s where a flexible schema is great.
-
Logging important info
As we build websites they hopefully grow in functionality and usage. It becomes important to log appropriate information so you can later investigate issues in case something goes wrong. Grepping multiple log files across many servers is quite time consuming, so services like Logentries and Loggly can be helpful. Or you can roll your own with Fluentd or https://github.com/le0pard/mongodb_logger. But sometimes you just need something simple for a very specific need. Here is how I recently solved it at work in our Rails 4.1 app.
-
Hash fields vs embedded documents in MongoDB
I have been using MongoDB for a few years and really like it. One useful feature is the ability to store complex data types such as Hashes or Arrays in fields. Actually MongoDB itself does not support Hashes, but you can do it using an ODM like Mongoid. It is much easier than serializing a complex structure and storing it as a string in the DB.
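For example, with Mongoid (class and field names are invented):

```ruby
class Profile
  include Mongoid::Document

  field :name,        type: String
  field :preferences, type: Hash, default: {}   # stored as a nested document

end

profile = Profile.create(name: 'Jane', preferences: { 'theme' => 'dark' })
profile.preferences['newsletter'] = true
profile.save
```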
-
Tips & tricks with Redis and Rails
Much has been written about using Redis in Rails for various things. We are using it for caching data and running Sidekiq. Sidekiq web UI gives you nice visibility into how Sidekiq is doing but I wanted to have a more in-depth view of what is actually stored in my Redis DB. I came across Redis-Browser and wanted to share some lessons learned.
-
Using online perf test services to test your production site
Recently I had to run an extensive (6 hour) online stress test to prove to an important customer that our system can handle the load. Internally I have been using tools like Siege and Wrk to stress test the site. But obviously the customer wanted something “official” from a third party service. I ended up using Loader.io from SendGrid.
-
RailsAdmin custom actions
RailsAdmin is a great gem and it can be extended via custom actions. But the documentation is slightly incomplete, so I wanted to share my experience using it over the last couple of years. Using these custom actions we were able to significantly extend our internal admin UI. The look & feel is not as important as the ability to quickly enable basic editing functionality.
-
Figuring out which DB queries to cache
I like using RailsAdmin for basic UI. To calculate certain business stats I implemented various methods on my models (for example Customer model can have total_orders method). Then in my rails_admin.rb initializer I can do this:
-
AWS T2 instances and how to NOT run out of CPU credits
When you first launch your site you are not sure what the traffic will be and don’t want to spend too much $ on hosting. The nice thing about AWS is that you can scale up as you need by adding new instances or upgrading existing ones. But in those early days you will often have mostly low volume with occasional spikes when you have an influx of visitors or are running a periodic background process.
-
My bad development habits
Like all developers I have some bad habits. And even though I know they are bad, they are hard to break (like flossing teeth at night). The purpose of making this list was almost like therapy, to get me to change these habits. Here are some of them:
-
Speeding up automated tests
Like many developers today I embrace automated tests. Both for unit tests and more high level integration tests. Ability to run a test suite with good (hopefully 90%) code coverage gives me a great feeling even when I make minor changes. And it’s especially important when building major features or doing significant refactoring.
-
Avoiding single points of failure
I hate Single Points of Failure (SPOF). To me it’s like rolling the dice over and over, hoping that it works, until eventually something breaks. Your code may work fine but the server behind it fails. With modern cloud computing we are largely isolated from hardware failures but there is still a (however remote) possibility of an OS crash. Or that particular server could be down for maintenance.
-
Structuring background jobs
Many applications need to do certain background tasks such as sending daily emails, generating reports or downloading data. Rails 4.2 provides a really good framework with ActiveJob which also has a backport to previous Rails versions.
-
UI for backend devs
I spent most of my career in various individual contributor / team lead roles building software systems. I love dealing with data, I worry about things like redundant servers, backups and failover scenarios.
-
One way to setup staging servers for final verification before going live
Once we are done coding and testing software on our dev machines it’s often important to check it in an environment similar to production (at least do basic visual verification). I worked in a lot of places where we had separate staging/demo environments with dedicated databases. Code would be deployed there and verified by biz users as the last step before launch.
-
My ideal production monitoring solution
It’s not fun getting woken up in the middle of the night when your system crashes, but it’s even worse sleeping through it and waking up to a much bigger mess in the morning. But heck, it’s a strong incentive to write quality code that will perform in the real world (not just on your laptop).
-
Rails debugging tips and tricks
All of us have various debugging techniques we prefer. Some like features provided by powerful IDEs (Visual Studio, RubyMine, Eclipse), others use vim, Sublime or Atom.
-
Combining MongoDB and Redis
Much has been written about NoSQL databases such as Mongo and Redis. I wanted to share some of my recent experience of using them together, but for very different purposes. This is NOT an in-depth guide to either, as there are plenty of other resources for that online.