Indexes And MongoDB Aggregation Framework Queries

One of the most powerful tools for checking the performance of our queries is explain(). Its use is very simple, we only have to add it to them, in this way:

The explain() method has three verbosity options:

  • queryPlanner (default): It gives us information about the winning plan
  • executionStats: The same that queryPlanner plus execution times and some other statistics
  • allPlansExecution: Equal that executionStats but not only for the winning plan but also for the rejected plans

The default option is enough to know whether the winning plan uses an index or not but, for knowing the execution time we must use the ‘executionStats’ option.

Ok but, is this valid for aggregation queries? The answer is: NO. With the aggregation framework explain() will always gives us back the basic information, the winning plan.

Here you have some examples. We are going to insert 10,000 documents.

And now we execute the aggregation query with explain(), for example:

We have obtained the winning plan and we can check that the plan does not use any index, it is scanning all the documents in the collection.

What about if we create an index and execute the same query?:

The result is the same because we are asking to group all the documents in the collection.

So, what must we do for the query to be able to use the index? We have only two options:

  • Add a $sort stage at the very beginning or
  • Add a $match stage at the very beginning

Once the execution of the first stage hast ended MongoDB only knows the documents that are going to be the input for the second stage. In this moment, MongoDB does not know anything about any index. Therefore, once the execution of the first stage is index MongoDB only knows about the documents that will be the input for the second stage, but it does not know anything about indexes. Therefore, our only chance is in the first stage and only through the $match and $sort procedures.

Lets go:

In some cases, this information it is plenty enough, but sometimes we need to know the time our query is going to be running. We already know that explain() is not going to tell it to us when we use the aggregation framework. What can we do then?

The tricky is to enable the profiling and execute the query, in this manner:

Now, we only have to look for our query in the log:

The information we obtain is:

  • database
  • collection
  • query
  • planSummary
  • keysExamined
  • docsExamined
  • nreturned
  • execution time

Do not forget to disable the profiling:

I hope you have learned something with this post. As always, I will like to read your comments. Thank you!

Leave a comment

Your email address will not be published. Required fields are marked *

eleven + 6 =