MongoDB Aggregation Framework

Today we begin a series of posts explaining how to work with the Aggregation Framework.

In my post First steps with MongoDB we saw how to run basic queries against the database. Now is the right moment to take a step forward and learn how to extract aggregated information from our data.

This is the outline we are going to work with in the next posts:

  • Philosophy
  • Aggregation Framework characteristics
  • Limits
  • Syntax
  • MongoDB vs SQL
  • Summary
  • Single purpose
  • Examples


This framework allows us to process our data and obtain results, which we can either read through a cursor or store in a new collection for later processing.

This framework is stage-based. We can think of a stage as a box that performs a single function: its input is a document or group of documents, and its output is another, processed document or group of documents. The number of input documents can differ from the number of output documents.

We can chain as many stages as necessary to obtain the result we need. The output of one stage is the input of the next.

This is the fundamental concept on which the Aggregation Framework is based, and we must understand it properly to make good use of it.
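To illustrate the stage idea, here is a minimal sketch in plain JavaScript (no MongoDB needed): each stage takes an array of documents and returns a new array, and the output of one stage is the input of the next. The sample documents and field names are made up for illustration.

```javascript
const docs = [
  { item: "pen", qty: 5 },
  { item: "pen", qty: 3 },
  { item: "ink", qty: 2 },
];

// Stage 1 (like $match): keep only the documents we are interested in.
const stage1 = docs.filter(d => d.item === "pen");

// Stage 2 (like $group with $sum): collapse them into a single document.
// Note: 3 documents went in, 1 document comes out.
const stage2 = [{ _id: "pen", total: stage1.reduce((s, d) => s + d.qty, 0) }];

console.log(stage2); // [ { _id: 'pen', total: 8 } ]
```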

Aggregation Framework characteristics

  • It processes data in RAM.
  • It makes use of indexes to improve performance at every stage. Moreover, it has an internal optimization phase.
  • It is executed as native C++ code.
  • It is flexible, functional and easy to learn.
  • The aggregate command operates on only one collection at a time. Map-Reduce is the tool MongoDB provides to work around this limitation.
  • The Aggregation Framework works with sharded collections.


Result Size

If the result of our aggregate command is returned as a single document, it cannot be larger than 16 megabytes, the maximum size established for BSON documents. There is no limit if the output is a cursor or is stored in a collection.
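For instance, adding a $out stage at the end of the pipeline stores the result in a collection instead of returning it directly. This sketch assumes hypothetical sales and dailyTotals collections and requires a running MongoDB instance:

```javascript
db.sales.aggregate([
  { $group: { _id: "$date", total: { $sum: "$amount" } } },
  { $out: "dailyTotals" }  // writes the result to the dailyTotals collection
])
```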

Memory restrictions

The maximum amount of RAM each stage can use is 100 megabytes. To process large amounts of data that exceed this limit, we must enable the allowDiskUse option, which lets stages write to temporary files.
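As a sketch of how the option is passed, allowDiskUse goes in the options document of the aggregate call. The logs collection and timestamp field here are hypothetical, and the snippet needs a running MongoDB instance:

```javascript
db.logs.aggregate(
  [ { $sort: { timestamp: 1 } } ],  // a potentially memory-hungry stage
  { allowDiskUse: true }            // allow stages to spill to temporary files
)
```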


Syntax

This is the aggregation pipeline syntax:

All stages are placed inside an array.
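For example, a pipeline in the mongo shell is an array of stage documents passed to aggregate. The collection and field names below (orders, status, cust_id, amount) are hypothetical, and the call requires a running MongoDB instance:

```javascript
db.orders.aggregate([
  { $match: { status: "A" } },                                  // stage 1: filter
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },  // stage 2: aggregate
  { $sort: { total: -1 } }                                      // stage 3: order
])
```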

We can use each stage more than once, except $out and $geoNear.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
