I am a Sr. Software Developer at Oracle Cloud. The opinions expressed here are my own and not necessarily those of my employer.
Apr 3, 2016
As the amount of data we deal with grows, it becomes important to present it effectively. Users need to see a high-level summary and then drill into specific details. It might not be glamorous, but it’s essential for any organization.
When I used .NET technologies in the past I really liked SQL Server Reporting Services. While the reports had a fairly standard look, they were easy to build with WYSIWYG tools and wizards, and they gave you features such as exporting and emailing reports right out of the box.
Unfortunately, I have not been able to find such an integrated framework in the Ruby world, but there are lots of gems that let you build something much more customized to your needs.
Let’s assume our application has Users (first_name, last_name, email), Articles (title, body, belongs_to user) and Comments (body, belongs_to user and article) models. All models have created_at and updated_at timestamps. You might need reports such as articles published per day, articles with the most comments, users sorted by the number of articles they have published, etc.
Charts help us visualize data easily. There are many libraries out there, but I like the simplicity of chartkick with the Google Charts API.
Instead of putting business logic in our view layer, I prefer to do this:
This allows you to build much more complex data-gathering logic and test it within your Ruby classes. If you don’t like service objects, you could put the logic in your models or decorators. Chartkick and Google Charts support many other chart types and configuration options; read their docs for more details.
If you need more control over the presentation, you can build a custom form with form_tag and modify the index action to accept additional parameters.
Another interesting gem is datagrid, but I haven’t done much with it. For pagination I usually use kaminari; it works with many ORMs and is highly configurable.
Even when we build highly visual and interactive dashboards, users still need to export records to Excel.
When all you need is a basic report that dumps data into XLSX, it can be done via a simple Rake task. Create lib/tasks/reports.rake:
I am using the axlsx and axlsx_rails gems. You can run this task on demand or via crontab with the whenever gem. Obviously this approach is not very scalable and is hard to test, but it’s great for prototyping.
For a more robust solution I like to use service objects as report generators. I create app/services/reports and spec/services/reports folders.
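A first report generator might look like this; the class name, headers, and CSV output are my assumptions (the same shape works with axlsx for XLSX). The rows are injected so the class is trivial to test:

```ruby
# app/services/reports/articles_by_user.rb
# Sketch: names are assumptions; rows are injected for easy testing.
require "csv"

module Reports
  class ArticlesByUser
    HEADERS = ["User", "Articles published"].freeze

    # In the app the rows would come from ActiveRecord, e.g.
    # User.joins(:articles).group("users.id").order("count_all DESC").count
    def initialize(rows)
      @rows = rows
    end

    def to_csv
      CSV.generate do |csv|
        csv << HEADERS
        @rows.each { |row| csv << row }
      end
    end
  end
end
```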
But then you need to create more reports. There will be lots of common code when it comes to saving and emailing files, while the business logic of each query will differ. Object-oriented programming to the rescue.
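One way to sketch that: the shared plumbing lives in a base class and each subclass supplies only its own headers and query (class and method names here are assumptions):

```ruby
require "csv"
require "date"

module Reports
  # Shared plumbing: serialization (and, in the app, saving/emailing the file).
  class Base
    def generate
      CSV.generate do |csv|
        csv << headers
        rows.each { |row| csv << row }
      end
    end

    def filename
      "#{self.class.name.split('::').last}-#{Date.today}.csv"
    end

    private

    # Each subclass implements its own headers and query.
    def headers
      raise NotImplementedError
    end

    def rows
      raise NotImplementedError
    end
  end

  class CommentsPerArticle < Base
    private

    def headers
      ["Article", "Comments"]
    end

    def rows
      # Real app: Article.joins(:comments).group(:title).count.to_a
      [["Intro post", 5], ["Follow-up", 2]]
    end
  end
end
```

Adding a new report is now just a new subclass with its own query.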
You also want to give users the ability to execute these reports via the UI so you don’t have to run bash commands. For that I use a ReportsController with index and create actions. You can download the file with send_data or send_file.
Using duck typing, I call the appropriate report class. So if I add a new report type, I just include it in application.rb and create a new service object following the naming convention. No need to change the controller or the UI.
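The convention-based lookup can be as simple as this (the helper and report names are assumptions; inside Rails you would typically write params[:report_type].camelize.constantize instead):

```ruby
# Minimal stand-in for one of the report service objects (assumption).
module Reports
  class ArticlesByUser
    def generate
      "csv data"
    end
  end
end

# "articles_by_user" => Reports::ArticlesByUser
def report_class_for(type)
  Object.const_get("Reports::#{type.split('_').map(&:capitalize).join}")
end

# In ReportsController#create this would be followed by something like:
#   report = report_class_for(params[:report_type]).new
#   send_data report.generate, filename: "#{params[:report_type]}.csv"
```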
Some reports can take a long time to execute, so we run them via a background job using Sidekiq (or Resque, or DelayedJob).
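A sketch of the worker, with the Sidekiq-specific pieces called out in comments (all class names are assumptions):

```ruby
# Minimal stand-in for a report service object (assumption, for illustration).
module Reports
  class DailySummary
    def generate
      "report body"
    end
  end
end

# app/workers/report_worker.rb
# In the app this class would also `include Sidekiq::Worker` and be
# enqueued with ReportWorker.perform_async("DailySummary").
class ReportWorker
  def perform(report_type)
    report = Object.const_get("Reports::#{report_type}").new
    data = report.generate
    # In the app: save the file and notify the user, e.g.
    # ReportMailer.report_ready(data).deliver_later
    data
  end
end
```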
A nice feature is to let users opt in to receive specific reports by email on a periodic basis.
We need a new model to store each user’s selected reports and frequency, on top of which you can build a standard Rails scaffold with CRUD operations.
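The backing table can be as small as user_id, report_type, and frequency. A sketch of the model (the names and the frequency list are assumptions):

```ruby
# app/models/report_subscription.rb
# Sketch: in the app this would inherit from ActiveRecord::Base and use
# `validates :frequency, inclusion: { in: FREQUENCIES }` instead of #valid?.
class ReportSubscription
  FREQUENCIES = %w[daily weekly monthly].freeze

  attr_accessor :user_id, :report_type, :frequency

  def initialize(user_id:, report_type:, frequency:)
    @user_id = user_id
    @report_type = report_type
    @frequency = frequency
  end

  def valid?
    FREQUENCIES.include?(frequency)
  end
end
```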
Then we schedule a nightly job. To keep things simple, users receive daily reports for the previous day’s data, weekly reports on Monday (for the previous week), and monthly reports on the 1st (for the previous month).
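That scheduling rule fits in one small method (the method name is an assumption):

```ruby
require "date"

# Returns true if a subscription with the given frequency should be
# delivered today, per the rules above: daily always, weekly on Monday,
# monthly on the 1st.
def due_today?(frequency, today = Date.today)
  case frequency
  when "daily"   then true
  when "weekly"  then today.monday?
  when "monthly" then today.day == 1
  else                false
  end
end

# The nightly job then selects the due subscriptions, e.g.
# ReportSubscription.all.select { |sub| due_today?(sub.frequency) }
```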
I like using sidekiq-cron to schedule the jobs, but there are alternatives.
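With sidekiq-cron the nightly job can be declared in a schedule file and loaded from an initializer with Sidekiq::Cron::Job.load_from_hash; a sketch (the job and class names are assumptions):

```yaml
# config/schedule.yml
nightly_reports:
  cron: "0 2 * * *"        # every night at 2am
  class: "NightlyReportsWorker"
```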
The most common format I export data to is Excel. For PDF you can use prawn; there is a good RailsCasts episode on it.
Sometimes you have large customers with unique data requirements that cannot be satisfied by the common reports. As much as it pains us to write custom code, sometimes we need to do it. The important thing is to keep it separate from your main codebase.
I usually create folders such as app/services/custom/ and app/jobs/custom/ and place the code there. You can build these one-off reports quickly and email the files (have your customers create email distribution lists so they can add and remove recipients themselves).
To expose these reports via the UI, you can create a config file that maps which reports can be seen by whom.
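For example, a YAML file keyed by customer (the structure, customer keys, and class names are purely illustrative):

```yaml
# config/custom_reports.yml
# Maps customers to the custom report classes they may run (names are made up).
acme_corp:
  - Custom::AcmeInvoiceDetail
globex:
  - Custom::GlobexUsageSummary
```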
The ReportsController code will use duck typing to run the appropriate class. I have never actually implemented such a solution in production, so there might be better ideas out there.
Different users might need to be restricted from seeing sensitive reports (financial data, for example). You can use cancancan to define permissions like this:
For a more flexible solution, check out pundit: less DSL and more plain Ruby code.
Data pre-generation and archiving
Even with declining storage costs, you eventually need to start purging some of the granular data and retain only high-level summary stats. I personally like to pre-generate data at all granularity levels upfront and then delete the more granular data sets. Let’s say we are tracking article views.
Use Mongo TTL indexes to expire data (no need for a cron job).
Alternatively, you could model the data differently using Mongo documents, where each document contains records for all dates for a given article.
Another option is to roll up detailed stats into daily summaries (deleting the detailed stats), then convert daily summaries to monthly ones (deleting the daily records). I personally find that approach more complex and error prone.
Often users need to extract data from your application and load it into another system. Excel can be a viable option at small scale, but you will need a more robust solution as you grow.
Reporting APIs can be synchronous, with the data sent back on the first request. That is simpler to build, but it can time out if report generation takes too long. Alternatively, you can create an asynchronous API where the client is issued a token, report generation is queued up, and the client makes a second request to retrieve the data within a certain time period (before the file is purged).
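The asynchronous flow can be sketched with a small token store (all class and method names are assumptions); in a real app the state would live in Redis or the database rather than in memory:

```ruby
require "securerandom"

class ReportTokenStore
  def initialize
    @reports = {}
  end

  # POST /api/reports: issue a token and queue generation.
  def enqueue
    token = SecureRandom.hex(16)
    @reports[token] = { status: :pending, data: nil }
    # In the app: ReportWorker.perform_async(token)
    token
  end

  # Called by the background job when the file is ready.
  def complete(token, data)
    @reports[token] = { status: :ready, data: data }
  end

  # GET /api/reports/:token: respond 202 Accepted while :pending,
  # send the file when :ready, 404 after the file is purged.
  def fetch(token)
    @reports[token]
  end
end
```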
I also prefer to create separate URL endpoints for APIs (api.mywebsite.com vs mywebsite.com) so they can be directed to a different load balancer using DNS (easier to scale if needed). You could even build the API with a different technology (express, sinatra, or rails-api).
Well, that’s it for this post; I hope you find it useful. All comments are welcome.