Rails and complex data migrations

When working with NoSQL DBs we do not worry about schema changes but we still need to do data migrations. We have been using mongoid_rails_migrations for this.

Private methods
Exception handling
Testing
Polymorphic relationships
Has And Belongs To Many relationships
Lots of data
Useful links

Sometimes the migrations are pretty simple.

class MyMigration < Mongoid::Migration
  def self.up
    Model.where(field: 'value1').update_all(field: 'value2')
  end
  def self.down
    Model.where(field: 'value2').update_all(field: 'value1')
  end
end

Private methods

And sometimes they are more complex. We can have 30+ lines in the up method as we are looping through records, validating / transforming the data and then updating / creating other records in our DB. Why not move that logic into separate private methods in the migration class (it’s a Ruby class after all) and call them as needed?

class AnotherMigration < Mongoid::Migration
  def self.up
    Model.where(field: 'value1').each do |record|
      result = process_record(record)
      update_related_record(result)
    end
  end
  def self.down
    Model.where(field: 'value1').each do |record|
      revert_record(record)
    end
  end
private
  # need to use self because these class methods
  def self.process_record
    # actual logic here
  end
  def self.update_related_record(result)
    ...
  end
  def self.revert_record
    ...
  end
end

Exception handling

When running these migrations it might be OK to just skip a few errors and continue. For that we can use exceptions. I also like to use limit clause to speed things up when debugging.

def self.up
  Model.where(field: 'value1').limit(10).each do |record|
    begin
      # do stuff
    rescue Exception => e
      puts e
    end
  end
end

Testing

Sometimes the migrations are so complex that we want to write actual automated tests.

# spec/migrations/user_migration_spec.rb
require 'rails_helper'
# load migration class
require Dir[Rails.root.join('db/migrate/*_user_migration.rb')].first
describe UserMigration, type: :migration do
  it 'up' do
    # create records using FactoryGirl
    user = create(:user)
    UserMigration.up
    expect(user.reload.field).to eq 'new value'
  end
  it 'down' do
    user = create(:user, field: 'new value')
    UserMigration.down
    expect(user.reload.field).to eq 'old value'
  end
end

The same approach should work with data migrations in SQL DBs. Just treat migrations as Ruby classes and test their methods.

Polymorphic relationships

Here are a few simple polymorphic models.

# app/models/user.rb
class User
  has_many :articles, as: :author, dependent: :delete
end
# app/models/article.rb
class Article
  belongs_to :article, polymorphic: true
end

We now need to rename User model to Person. We can rename the class and DB table but how do we change the article relationships? Well, as long as the IDs of indvividual person/user records did not change we can do this:

Article.where(author_type: 'User').update_all(author_type: 'Person')

Has And Belongs To Many relationships

With Mongoid has_and_belongs_to_many we can store child records in an array inside the parent.

# app/models/user.rb
class User
  has_and_belongs_to_many :groups
end
# app/models/group.rb
class Group
  has_and_belongs_to_many :users
end

It will look like this in the DB:

# User record
{
    "_id" : ObjectId("56941557213ae91d96000002"),
    "name" : "Bob Smith",
    "group_ids" : [
        ObjectId("56158d9269702d7a8c00018a")
    ]
}
# Group record
{
    "_id" : ObjectId("56158d9269702d7a8c00018a"),
    "name" : "Soccer group",
    "user_ids" : [
        ObjectId("56941557213ae91d96000002")
    ]
}

Now we need to rename Group to Team. Here is the migraiton.

User.exists(group_ids: true).rename(group_ids: :team_ids)

Lots of data

Let’s imagine a blogging platform.

class Company
  has_many :users
end
class User
  belongs_to :company
  has_many :articles
end
class Article
  belongs_to :user
  has_many :comments
end
class Comment
  belongs_to :article
end

Now we need to create a relationship between comment and article author.

class User
  has_many :article_comments, class_name: 'Comment', inverse_of: :article_author
end
class Comment
  belongs_to :article_author, class_name: 'User', inverse_of: :article_comments
end

And we need a migration to update records. But we have millions of comments and thousands of articles. This will be VERY slow as it will query for each article AND user and then do indvividual updates.

Comment.all.no_timeout.each do |c|
  c.update(article_author_id: c.article.user_id)
end

This will be faster because it will eager load related articles. But it will require lots of RAM.

Comment.all.includes(:article).no_timeout.each do |c|
  c.update(article_author_id: c.article.user_id)
end

This will be even faster because it will do bulk updates for ALL comments for specific article but will still require lots of RAM.

Article.all.includes(:comments).no_timeout.each do |a|
  a.comments.update_all(article_author_id: a.user_id)
end

This will break up work into smaller chunks for each group of users (by company). It will require far less RAM.

def self.up
  Company.all.no_timeout.each do |company|
    update_comments company.users.pluck(:_id)
  end
end
def self.update_comments user_ids
  Article.in(user_id: user_ids).includes(:comments).no_timeout.each do |art|
    art.comments.update_all(article_author_id: art.user_id)
  end
end

Alternatively we could batch users. With ActiveRecord we could use find_in_batches. For Mongoid use something like this gist

def self.up
  User.find_in_batches(batch_size: 100) do |batch|
    update_comments batch.pluck(:id)
  end
end
def update_comments user_ids
end

I am a Sr. Software Developer at Oracle Cloud. The opinions expressed here are my own and not necessarily those of my employer.