Mongodb Explain for Aggregation framework

aggregation-frameworkmongodb

Is there an explain function for the Aggregation framework in MongoDB? I can't see it in the documentation.

If not is there some other way to check, how a query performs within the aggregation framework?

I know with find you just do

db.collection.find().explain()

But with the aggregation framework I get an error

db.collection.aggregate(
    { $project : { "Tags._id" : 1 }},
    { $unwind : "$Tags" },
    { $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}},
    { 
        $group: 
        { 
            _id : { id: "$_id"},
            "count": { $sum:1 } 
        }
    },
    { $sort: {"count":-1}}
).explain()

Best Answer

Starting with MongoDB version 3.0, simply changing the order from

collection.aggregate(...).explain()

collection.explain().aggregate(...)

will give you the desired results (documentation here).

For older versions >= 2.6, you will need to use the explain option for aggregation pipeline operations

`explain:true`

db.collection.aggregate([
    { $project : { "Tags._id" : 1 }},
    { $unwind : "$Tags" },
    { $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}},
    { $group: { 
        _id : "$_id",
        count: { $sum:1 } 
    }},
    {$sort: {"count":-1}}
  ],
  {
    explain:true
  }
)

An important consideration with the Aggregation Framework is that an index can only be used to fetch the initial data for a pipeline (e.g. usage of $match, $sort, $geonear at the beginning of a pipeline) as well as subsequent $lookup and $graphLookup stages. Once data has been fetched into the aggregation pipeline for processing (e.g. passing through stages like $project, $unwind, and $group) further manipulation will be in-memory (possibly using temporary files if the allowDiskUse option is set).

Optimizing pipelines

In general, you can optimize aggregation pipelines by:

Starting a pipeline with a $match stage to restrict processing to relevant documents.
Ensuring the initial $match / $sort stages are supported by an efficient index.
Filtering data early using $match, $limit , and $skip .
Minimizing unnecessary stages and document manipulation (perhaps reconsidering your schema if complicated aggregation gymnastics are required).
Taking advantage of newer aggregation operators if you have upgraded your MongoDB server. For example, MongoDB 3.4 added many new aggregation stages and expressions including support for working with arrays, strings, and facets.

There are also a number of Aggregation Pipeline Optimizations that automatically happen depending on your MongoDB server version. For example, adjacent stages may be coalesced and/or reordered to improve execution without affecting the output results.

Limitations

As at MongoDB 3.4, the Aggregation Framework explain option provides information on how a pipeline is processed but does not support the same level of detail as the executionStats mode for a find() query. If you are focused on optimizing initial query execution you will likely find it beneficial to review the equivalent find().explain() query with executionStats or allPlansExecution verbosity.

There are a few relevant feature requests to watch/upvote in the MongoDB issue tracker regarding more detailed execution stats to help optimize/profile aggregation pipelines:

Related Solutions

Sql – How to query MongoDB with “like”

That would have to be:

db.users.find({"name": /.*m.*/})

Or, similar:

db.users.find({"name": /m/})

You're looking for something that contains "m" somewhere (SQL's '%' operator is equivalent to regular expressions' '.*'), not something that has "m" anchored to the beginning of the string.

Note: MongoDB uses regular expressions which are more powerful than "LIKE" in SQL. With regular expressions you can create any pattern that you imagine.

For more information on regular expressions, refer to Regular expressions (MDN).

Mongodb – $unwind an object in aggregation framework

It is not possible to do the type of computation you are describing with the aggregation framework - and it's not because there is no $unwind method for non-arrays. Even if the person:value objects were documents in an array, $unwind would not help.

The "group by" functionality (whether in MongoDB or in any relational database) is done on the value of a field or column. We group by value of field and sum/average/etc based on the value of another field.

Simple example is a variant of what you suggest, ratings field added to the example article collection, but not as a map from user to rating but as an array like this:

{ title : title of article", ...
  ratings: [
         { voter: "user1", score: 5 },
         { voter: "user2", score: 8 },
         { voter: "user3", score: 7 }
  ]
}

Now you can aggregate this with:

[ {$unwind: "$ratings"},
  {$group : {_id : "$ratings.voter", averageScore: {$avg:"$ratings.score"} } } 
]

But this example structured as you describe it would look like this:

{ title : title of article", ...
  ratings: {
         user1: 5,
         user2: 8,
         user3: 7
  }
}

or even this:

{ title : title of article", ...
  ratings: [
         { user1: 5 },
         { user2: 8 },
         { user3: 7 }
  ]
}

Even if you could $unwind this, there is nothing to aggregate on here. Unless you know the complete list of all possible keys (users) you cannot do much with this. [*]

An analogous relational DB schema to what you have would be:

CREATE TABLE T (
   user1: integer,
   user2: integer,
   user3: integer
   ...
);

That's not what would be done, instead we would do this:

CREATE TABLE T (
   username: varchar(32),
   score: integer
);

and now we aggregate using SQL:

select username, avg(score) from T group by username;

There is an enhancement request for MongoDB that may allow you to do this in the aggregation framework in the future - the ability to project values to keys to vice versa. Meanwhile, there is always map/reduce.

[*] There is a complicated way to do this if you know all unique keys (you can find all unique keys with a method similar to this) but if you know all the keys you may as well just run a sequence of queries of the form db.articles.find({"ratings.user1":{$exists:true}},{_id:0,"ratings.user1":1}) for each userX which will return all their ratings and you can sum and average them simply enough rather than do a very complex projection the aggregation framework would require.