PHP Performance – Simple Ways to Combat the N+1 Problem

object-orientedperformancePHP

I'm trying to better understand performance in PHP. One issue I'm thinking about is the n+1 problem. By n+1 I mean something like this:

$posts = Posts::getPosts();

foreach($posts as $post) {
  $comments = Comments::getComments(array('post_id' => $post->id));
  // do something with comments..
}

It's quite inefficient as we have to do many queries for every post to get the comments.

Is something like this better? It's more PHP code but only two queries will be executed:

$posts = Posts::getPosts();

// get the ids into an array
$post_ids = array();
foreach($posts as $post) {
  array_push($post_ids, $post->id);
}

// get comments for ALL posts in one query by passing array of post ids (post_id IN (...))
$comments = Comments::getComments(array('post_id' => $post_ids));

// map comments to posts
foreach($posts as $key => $post) {
  foreach($comments as $comment) {
    if($post->id == $comment->post_id) {
      $post->pushComment($comment);
    }
  }
}

foreach($posts as $post) {
  $comment = $post->comments;
  // do something with comments..
}

This is much more PHP, and kinda messy too, but this time I'm only using two queries (one for posts, the other for fetching ALL comments of those posts in one query). Is this a good proposal to tackle the n+1 problem in PHP?

Also, how do frameworks generally deal with this under the hood?

Best Answer

Your original approach does lazy loading while your modified code does eager loading.

You are absolutely right that eager loading is more efficient in your situation. In most cases, minimising the number of queries is the best thing you can do to speed up your app. If you were only going to look at one of the posts then lazy loading would be faster.

Most ORMs (at least, all the good ones!) support lazy and eager loading. For example, SQLAlchemy has extensive support. You were asking about PHP; I expect the main PHP ORMs have similar features. Usually the ORM defaults to lazy loading, but you can tell it to eager load a particular table when executing a query.

In fact, it is possible load all your data in one query. SQLAlchemy has two eager loading modes: joined and subquery. joined does it all in one query, which is sometimes the most efficient way, but it does end up querying duplicate data. subquery eager load is just like your approach. I'm not sure whether PHP ORMs support this distinction.