Ruby-on-rails – Rails :include vs. :joins

databasejoin;rails-activerecordrubyruby-on-rails

This is more of a "why do things work this way" question rather than a "I don't know how to do this" question…

So the gospel on pulling associated records that you know you're going to use is to use :include because you'll get a join and avoid a whole bunch of extra queries:

Post.all(:include => :comments)

However when you look at the logs, there's no join happening:

Post Load (3.7ms)   SELECT * FROM "posts"
Comment Load (0.2ms)   SELECT "comments.*" FROM "comments" 
                       WHERE ("comments".post_id IN (1,2,3,4)) 
                       ORDER BY created_at asc) 

It is taking a shortcut because it pulls all of the comments at once, but it's still not a join (which is what all the documentation seems to say). The only way I can get a join is to use :joins instead of :include:

Post.all(:joins => :comments)

And the logs show:

Post Load (6.0ms)  SELECT "posts".* FROM "posts" 
                   INNER JOIN "comments" ON "posts".id = "comments".post_id

Am I missing something? I have an app with half a dozen associations and on one screen I display data from all of them. Seems like it would be better to have one join-ed query instead of 6 individuals. I know that performance-wise it's not always better to do a join rather than individual queries (in fact if you're going by time spent, it looks like the two individual queries above are faster than the join), but after all the docs I've been reading I'm surprised to see :include not working as advertised.

Maybe Rails is cognizant of the performance issue and doesn't join except in certain cases?

Best Answer

It appears that the :include functionality was changed with Rails 2.1. Rails used to do the join in all cases, but for performance reasons it was changed to use multiple queries in some circumstances. This blog post by Fabio Akita has some good information on the change (see the section entitled "Optimized Eager Loading").