codedecoder

breaking into the unknown…

eager loading in rails

Leave a comment

recently I got a interview call from a reputed company and come to know that, I have not so good understanding of data loading in rails. So, I tried to dig deeper into data loading. To start with – data can be loaded in two ways : lazy loading and eager loading. The first one refer to loading associated data when need arises and second one refer to loading the associated data , at the time of retrieving the parent object itself. say you have below association

class Physician < ActiveRecord::Base
  has_many :appointments
  has_many :patients, :through => :appointments
  attr_accessible :name, :specialization, :mobile_no
end
 
class Appointment < ActiveRecord::Base
  belongs_to :physician
  belongs_to :patient
  attr_accessible :reason, :physician_id, :patient_id
end
 
class Patient < ActiveRecord::Base
  has_many :appointments
  has_many :physicians, :through => :appointments
  attr_accessible :name, :mobile_no
end

The corresponding tables for them will look as below :

class CreatePhysicians < ActiveRecord::Migration
  def change
    create_table :physicians do |t|
      t.string :name
      t.string :specialization
      t.integer :mobile_no
      t.timestamps
    end
  end
end

class CreateAppointments < ActiveRecord::Migration
  def change
    create_table :appointments do |t|
      t.string :reason
      t.references :physician
      t.references :patient
      t.timestamps
    end
  end
end

class CreatePatients < ActiveRecord::Migration
  def change
    create_table :patients do |t|
      t.string :name
      t.integer :mobile_no
      t.timestamps
    end
  end
end

Let us create some seed data to experiment with. Add below in your seeds.rb file

Physician.create(:name => "Arun", :specialization => "ENT", :mobile_no => "9569806453")
Physician.create(:name => "Anand", :specialization => "MS", :mobile_no => "9569807853")

Patient.create(:name => "Raman", :mobile_no => "9569807851")
Patient.create(:name => "Kapil", :mobile_no => "9569807852")
Patient.create(:name => "Gita", :mobile_no => "9569807853")
Patient.create(:name => "Santosh", :mobile_no => "9569807854")

Appointment.create(:reason => "Ear pain", :physician_id  => 1, :patient_id => 2)
Appointment.create(:reason => "Teeth pain", :physician_id  => 1, :patient_id => 3)
Appointment.create(:reason => "Teeth pain", :physician_id  => 1, :patient_id => 4)
Appointment.create(:reason => "Surgery", :physician_id  => 2, :patient_id => 4)
Appointment.create(:reason => "Surgery", :physician_id  => 2, :patient_id => 1)

run rake db:seed to populate the above data.

Now we have the association ready along with some dummy data.But before proceeding further, try to understand, why we need eager loading. Say the hospital need to generate a report for the months, indicating the number of patients handled by a individual doctor. You may do something like this.

1.9.3-p194 :052 > @doctors = Physician.all
Physician Load (0.2ms)  SELECT `physicians`.* FROM `physicians`
=> [#<Physician id: 1, name: “Arun”, specialization: “ENT”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Physician id: 2, name: “Anand”, specialization: “MS”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]
1.9.3-p194 :059 > @doctors.each do |doctor|
1.9.3-p194 :060 >     doctor.patients.each do |patient|
1.9.3-p194 :061 >       puts patient.name
1.9.3-p194 :062?>     end
1.9.3-p194 :063?>   end
Patient Load (0.2ms)  SELECT `patients`.* FROM `patients` INNER JOIN `appointments` ON `patients`.`id` = `appointments`.`patient_id` WHERE `appointments`.`physician_id` = 1
Kapil
Gita
Santosh
Patient Load (0.3ms)  SELECT `patients`.* FROM `patients` INNER JOIN `appointments` ON `patients`.`id` = `appointments`.`patient_id` WHERE `appointments`.`physician_id` = 2
Raman
Santosh

So, you can see that for 2 doctors in database, 3 queries hitting the database. the first query was to retrieve all the doctors and 2 queries to retrieve the details of there patients. So it leads to n+ 1 queries hitting database. Imagine the load on database, if there is thousands of doctors. This can be avoided by eager loading

Now we are ready to get into the detail of eager loading.The first thing to remember is that, you can do eager loading in any of the following ways

=> includes
=> preload
=> eager_load

O.K so we have too many ways to do eager loading. Let us see how they differ by trying out them on console .

1.9.3-p194 :039 > @doctors = Physician.includes(:patients)
Physician Load (0.3ms)  SELECT `physicians`.* FROM `physicians`
Appointment Load (0.2ms)  SELECT `appointments`.* FROM `appointments` WHERE `appointments`.`physician_id` IN (1, 2)
Patient Load (0.3ms)  SELECT `patients`.* FROM `patients` WHERE `patients`.`id` IN (2, 3, 4, 1)
=> [#<Physician id: 1, name: “Arun”, specialization: “ENT”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Physician id: 2, name: “Anand”, specialization: “MS”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]
1.9.3-p194 :040 > @doctors.first.patients
=> [#<Patient id: 2, name: “Kapil”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Patient id: 3, name: “Gita”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Patient id: 4, name: “Santosh”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]
1.9.3-p194 :041 > @doctors.first.appointments
=> [#<Appointment id: 1, reason: “Ear pain”, physician_id: 1, patient_id: 2, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Appointment id: 2, reason: “Teeth pain”, physician_id: 1, patient_id: 3, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Appointment id: 3, reason: “Teeth pain”, physician_id: 1, patient_id: 4, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]

 

1.9.3-p194 :043 > @doctors = Physician.preload(:patients)
Physician Load (0.1ms)  SELECT `physicians`.* FROM `physicians`
Appointment Load (0.1ms)  SELECT `appointments`.* FROM `appointments` WHERE `appointments`.`physician_id` IN (1, 2)
Patient Load (0.1ms)  SELECT `patients`.* FROM `patients` WHERE `patients`.`id` IN (2, 3, 4, 1)
=> [#<Physician id: 1, name: “Arun”, specialization: “ENT”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Physician id: 2, name: “Anand”, specialization: “MS”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]
1.9.3-p194 :044 > @doctors.first.patients
=> [#<Patient id: 2, name: “Kapil”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Patient id: 3, name: “Gita”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Patient id: 4, name: “Santosh”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]
1.9.3-p194 :045 > @doctors.first.appointments
=> [#<Appointment id: 1, reason: “Ear pain”, physician_id: 1, patient_id: 2, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Appointment id: 2, reason: “Teeth pain”, physician_id: 1, patient_id: 3, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Appointment id: 3, reason: “Teeth pain”, physician_id: 1, patient_id: 4, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]

 

1.9.3-p194 :046 > @doctors = Physician.eager_load(:patients)
SQL (0.4ms)  SELECT `physicians`.`id` AS t0_r0, `physicians`.`name` AS t0_r1, `physicians`.`specialization` AS t0_r2, `physicians`.`mobile_no` AS t0_r3, `physicians`.`created_at` AS t0_r4, `physicians`.`updated_at` AS t0_r5, `patients`.`id` AS t1_r0, `patients`.`name` AS t1_r1, `patients`.`mobile_no` AS t1_r2, `patients`.`created_at` AS t1_r3, `patients`.`updated_at` AS t1_r4 FROM `physicians` LEFT OUTER JOIN `appointments` ON `appointments`.`physician_id` = `physicians`.`id` LEFT OUTER JOIN `patients` ON `patients`.`id` = `appointments`.`patient_id`
=> [#<Physician id: 1, name: “Arun”, specialization: “ENT”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Physician id: 2, name: “Anand”, specialization: “MS”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]
1.9.3-p194 :047 > @doctors.first.patients
=> [#<Patient id: 2, name: “Kapil”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Patient id: 3, name: “Gita”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Patient id: 4, name: “Santosh”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]
1.9.3-p194 :048 > @doctors.first.appointments
Appointment Load (0.3ms)  SELECT `appointments`.* FROM `appointments` WHERE `appointments`.`physician_id` = 1
=> [#<Appointment id: 1, reason: “Ear pain”, physician_id: 1, patient_id: 2, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Appointment id: 2, reason: “Teeth pain”, physician_id: 1, patient_id: 3, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Appointment id: 3, reason: “Teeth pain”, physician_id: 1, patient_id: 4, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]

strange, appointment is not loading the through relation appointments and firing a separate query for it .let us try again by including it explicitly.

1.9.3-p194 :064 > @doctors = Physician.eager_load(:patients,:appointments)
SQL (0.4ms)  SELECT `physicians`.`id` AS t0_r0, `physicians`.`name` AS t0_r1, `physicians`.`specialization` AS t0_r2, `physicians`.`mobile_no` AS t0_r3, `physicians`.`created_at` AS t0_r4, `physicians`.`updated_at` AS t0_r5, `patients`.`id` AS t1_r0, `patients`.`name` AS t1_r1, `patients`.`mobile_no` AS t1_r2, `patients`.`created_at` AS t1_r3, `patients`.`updated_at` AS t1_r4, `appointments_physicians`.`id` AS t2_r0, `appointments_physicians`.`reason` AS t2_r1, `appointments_physicians`.`physician_id` AS t2_r2, `appointments_physicians`.`patient_id` AS t2_r3, `appointments_physicians`.`created_at` AS t2_r4, `appointments_physicians`.`updated_at` AS t2_r5 FROM `physicians` LEFT OUTER JOIN `appointments` ON `appointments`.`physician_id` = `physicians`.`id` LEFT OUTER JOIN `patients` ON `patients`.`id` = `appointments`.`patient_id` LEFT OUTER JOIN `appointments` `appointments_physicians` ON `appointments_physicians`.`physician_id` = `physicians`.`id`
=> [#<Physician id: 1, name: “Arun”, specialization: “ENT”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Physician id: 2, name: “Anand”, specialization: “MS”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]
1.9.3-p194 :065 > @doctors.first.patients
=> [#<Patient id: 2, name: “Kapil”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Patient id: 3, name: “Gita”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Patient id: 4, name: “Santosh”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]
1.9.3-p194 :066 > @doctors.first.appointments
=> [#<Appointment id: 1, reason: “Ear pain”, physician_id: 1, patient_id: 2, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Appointment id: 2, reason: “Teeth pain”, physician_id: 1, patient_id: 3, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Appointment id: 3, reason: “Teeth pain”, physician_id: 1, patient_id: 4, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]

you can see that, this time the through association appointment also get eager loaded and no separate query fired for it.

O.K….so let us stop here and analyse the above result.

=> includes and preload made 3 queries to database . eager_load made single query to database
=> the 3 queries made by includes and preload is small queries targeted to the three table physicians, appointments and patients. the single query by eager_load involve join of all the three table
=> the query executed by includes and preload is exactly same, but eager_load query differ from both
=> eager_load do not load the through association implicitly, but you have to specify it explicitly.
=> overall time taken by includes is 0.8ms,preload is 0.3ms and eager_load is o.4ms

So, it is clear that executing a set of small queries through preload is faster than, executing a big join of these queries. Also join suffer from cartesian product overload problem as join produce a number of duplicate record which though not affect the database but hit rails on its back as it have to deal with larger number of small and short lived object.

Here rails show its smartness through includes. Actually internally rails eager load data through preload(i,e set of small queries) or eager_load(i,e through join). include just delegate the eager load to preload or eager_load depending on the nature of query you are making. In simple word, If you are not sure which is more efficient for you : preload or eager_load, just use include and it will smartly decide which one is better and delegate the job to that.

Let us see it in action on console.

1.9.3-p194 :002 > @doctors = Physician.includes(:patients).where(“patients.mobile_no != ?”, “765778989”)
SQL (0.4ms)  SELECT `physicians`.`id` AS t0_r0, `physicians`.`name` AS t0_r1, `physicians`.`specialization` AS t0_r2, `physicians`.`mobile_no` AS t0_r3, `physicians`.`created_at` AS t0_r4, `physicians`.`updated_at` AS t0_r5, `patients`.`id` AS t1_r0, `patients`.`name` AS t1_r1, `patients`.`mobile_no` AS t1_r2, `patients`.`created_at` AS t1_r3, `patients`.`updated_at` AS t1_r4 FROM `physicians` LEFT OUTER JOIN `appointments` ON `appointments`.`physician_id` = `physicians`.`id` LEFT OUTER JOIN `patients` ON `patients`.`id` = `appointments`.`patient_id` WHERE (patients.mobile_no != ‘765778989’)
=> [#<Physician id: 1, name: “Arun”, specialization: “ENT”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Physician id: 2, name: “Anand”, specialization: “MS”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]

So, you can see that, this time includes fire a single query through join. Here rails saw that, patient to be retrieved has some condition imposed on it, so delegated the job to eager_load internally rather then preload. Note that you can’t eager load conditional data using preload. try the above query through preload.

1.9.3-p194 :003 > @doctors = Physician.preload(:patients).where(“patients.mobile_no != ?”, “765778989”)
Physician Load (0.3ms)  SELECT `physicians`.* FROM `physicians` WHERE (patients.mobile_no != ‘765778989’)
Mysql2::Error: Unknown column ‘patients.mobile_no’ in ‘where clause’: SELECT `physicians`.* FROM `physicians`  WHERE (patients.mobile_no != ‘765778989’)
ActiveRecord::StatementInvalid: Mysql2::Error: Unknown column ‘patients.mobile_no’ in ‘where clause’: SELECT `physicians`.* FROM `physicians`  WHERE (patients.mobile_no != ‘765778989’)

So we get the error as expected.

Now let us make some conclusion at this point
=> use includes if you want rails to decide, to load data in single join query or small set of queries
=> use eager_load if you want to make single query to db without leaving the decision to includes
=> use preload when you want to load data with set of queries

use of preload do not look convincing…right? . let us see a case, where preload can only fit the need. Say you want the physician whose patient is Raman, but at the same time you want to eager load all the patient of that physician.

The query to find doctor whose patient is raman is as below.
1.9.3-p194 :019 > @doctors=Physician.joins(:patients).where(“patients.name = ?”, “Raman”)
Physician Load (0.1ms) SELECT `physicians`.* FROM `physicians` INNER JOIN `appointments` ON `appointments`.`physician_id` = `physicians`.`id` INNER JOIN `patients` ON `patients`.`id` = `appointments`.`patient_id` WHERE (patients.name = ‘Raman’)
=> [#]

let us chain includes to the above query to eager load the patients detail

1.9.3-p194 :017 > @doctors=Physician.joins(:patients).where(“patients.name = ?”, “Raman”).includes(:patients)
SQL (0.4ms) SELECT `physicians`.`id` AS t0_r0, `physicians`.`name` AS t0_r1, `physicians`.`specialization` AS t0_r2, `physicians`.`mobile_no` AS t0_r3, `physicians`.`created_at` AS t0_r4, `physicians`.`updated_at` AS t0_r5, `patients`.`id` AS t1_r0, `patients`.`name` AS t1_r1, `patients`.`mobile_no` AS t1_r2, `patients`.`created_at` AS t1_r3, `patients`.`updated_at` AS t1_r4 FROM `physicians` INNER JOIN `appointments` ON `appointments`.`physician_id` = `physicians`.`id` INNER JOIN `patients` ON `patients`.`id` = `appointments`.`patient_id` WHERE (patients.name = ‘Raman’)
 => [#<Physician id: 2, name: “Anand”, specialization: “MS”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]
1.9.3-p194 :018 > @doctors.first.patients
=> [#<Patient id: 1, name: “Raman”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]

So you can see that here, eager loading returning the patient Raman only as includes , seeing where clause in the query delgate the job to eager_load which create a single join with the given condition, returning only the single patient raman.

Now fire the same query, using preload.

1.9.3-p194 :015 > @doctors=Physician.joins(:patients).where(“patients.name = ?”, “Raman”).preload(:patients)
Physician Load (0.2ms)  SELECT `physicians`.* FROM `physicians` INNER JOIN `appointments` ON `appointments`.`physician_id` = `physicians`.`id` INNER JOIN `patients` ON `patients`.`id` = `appointments`.`patient_id` WHERE (patients.name = ‘Raman’)
Appointment Load (0.1ms)  SELECT `appointments`.* FROM `appointments` WHERE `appointments`.`physician_id` IN (2)
Patient Load (0.1ms)  SELECT `patients`.* FROM `patients` WHERE `patients`.`id` IN (4, 1)
=> [#<Physician id: 2, name: “Anand”, specialization: “MS”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]
1.9.3-p194 :016 > @doctors.first.patients
=> [#<Patient id: 4, name: “Santosh”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>, #<Patient id: 1, name: “Raman”, mobile_no: 2147483647, created_at: “2014-07-21 11:51:18”, updated_at: “2014-07-21 11:51:18”>]

This time you got all the patients of the physician Anand.

So, we are done with our research on data loading in rails.

But wait….Rails still evolving, and it keep changing with each release. The includes has also changed in rails4. Now rails has stopped being super smart. thus it will not automatically delegate the job to eager_load on seing a where clause in the query. instead, you need to pass the reference of the table whose attributes used in the where clause.

Example :

@doctors = Physician.includes(:patients).where(“patients.mobile_no != ?”, “765778989”) # will throw error

@doctors = Physician.includes(:patients).where(“patients.mobile_no != ?”, “765778989”).references(:patients) # will work

 

Reference :
http://guides.rubyonrails.org/active_record_querying.html
http://stackoverflow.com/questions/1208636/rails-include-vs-joins

Advertisements

Author: arunyadav4u

over 7 years experience in web development with Ruby on Rails.Involved in all stage of development lifecycle : requirement gathering, planing, coding, deployment & Knowledge transfer. I can adept to any situation, mixup very easily with people & can be a great friend.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s