Ruby on Rails is a pretty straightforward framework to get started on, it is easy to understand and learn, but to achieve mastery it requires time, just like any other framework or tool. "Rails do not scale", this is one of the biggest misconception and argument used to downplay the framework. In my opinion bad system design no matter the language or framework will always be slow and hard to scale.
N+1
The N+1 problem is one of the most common anti-patterns that affects performance in a very meaningful way. This happens when a query is executed as a result of the previous query. This issue might not be as obvious when you have a small application or dataset, but if we want to keep our application light and fast we need to remove this anti-pattern.
Here is an example:
Tweet.all.each do |tweet|
puts "#{tweet.title} was written by #{tweet.user.username}"
end
This will produce the following queries:
Tweet Load (0.4ms) SELECT "tweets".* FROM "tweets"
User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ? [["id", 1], ["LIMIT", 1]]
User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ? [["id", 1], ["LIMIT", 1]]
User Load (0.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ? [["id", 2], ["LIMIT", 1]]
User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ? [["id", 2], ["LIMIT", 1]]
User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ? [["id", 2], ["LIMIT", 1]]
User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ? [["id", 2], ["LIMIT", 1]]
In this example, we first retrieve all the Tweet objects and then for every tweet we retrieve the users. There are 6 tweets in the database, this example executes 7 queries.
Imagine if we have a bigger application, where there are 15,000 tweets in the database, the application will timeout and hit a database connection limit.
Eager Loading for N+1
There are several ways to implement eager loading on Ruby on Rails, the easiest one is to use the includes method to specify the relationship that needs to be included in the result set.
Let's create a fix for the N+1 with the includes method:
Tweet Load (103.7ms) SELECT "tweets".* FROM "tweets"
User Load (32.1ms) SELECT "users".* FROM "users" WHERE "users"."id" IN (?, ?, ?, ?, ?) [["id", 1], ["id", 2]]
With the includes method, now we have a the same output but this time we are executing only 2 queries to load the data into memory. Even if we have 15,000 tweets on the database, this will only execute 2 queries!
How does it work?
What is happening is that Tweet.includes(:user)
is querying and retrieving the corresponding user records from the database immediately after the initial request for all tweets.
Using Scopes
Scopes are a powerful feature in Ruby on Rails that allow you to define reusable query fragments. Scopes can help you keep your code DRY (Don't Repeat Yourself) and make it easier to write complex queries.
For example, let's say we want to find all the posts that were created in the last week. We could write a scope like this:
class Tweet < ApplicationRecord
scope :recent, -> { where("created_at >= ?", 1.week.ago) }
end
# We can then use this scope in our code like this:
@tweets = Tweet.recent
This will return all the posts that were created in the last week. Scopes can be chained together to create more complex queries, like this:
@tweets = Tweet.recent.includes(:comments).order(created_at: :desc)
This will return all the posts that were created in the last week, including their comments, and order them by creation date in descending order.
Use database indexes
Indexes are a way to optimize database queries by creating a data structure that allows the database to quickly find the data that it needs. In Ruby on Rails, you can create indexes by adding a migration that creates an index on a table.