What is a MongoDB chunk?


We already know that MongoDB is capable of keeping our cluster balanced for one or more collections (Two steps to shard a MongodB collection). This is done chunk-based (per shard), it is not document-based or size-based.

What is a chunk?

But, what is a chunk? To get our data evenly distributed across the shards we need to choose one of our fields to make this division, it is called the shard key.

This field will accept values from a minimum to a maximum value. This range will be split, and each shard will store the data belonging to one or more ranges. One range will belong to only one shard.

Chunk is an abstract concept, chunk is not data, it refers to each of the ranges in which we divide the data.

This information (metadata) is stored in the configdb, which is the database that the mongos manage. Beyond this, here we have the shards data distribution. In a simple way the chunks collection has the following information:

Chunk _id Minimum value Maximum value Shard
“_id” : “testdb.presplit-x_7000.0” “min” : { “x” : 7000 } “max” : { “x” : 8000 } “shard” : “shard0001”
“_id” : “testdb.presplit-x_8000.0” “min” : { “x” : 8000 } “min” : { “x” : 9000 } “shard” : “shard0001”
“_id” : “testdb.presplit-x_9000.0” “min” : { “x” : 9000 } “min” : { “x” : 10000 } “shard” : “shard0002”
“_id” : “testdb.presplit-x_10000.0” “min” : { “x” : 10000 } “min” : { “x” : 11000 } “shard” : “shard0002”

Is our cluster balanced?

This table shows when MongoDB considers that the cluster is balanced and when not:

Chunks Diff
Fewer than 20 2
20 – 79 4
80 and greater 8

For example, suppose that our cluster is made of 2 shards and we have a collection divided in 31 chunks. If shard0000 has 17 chunks and shard0001 has 14 chunks, MongoDB will consider the system balanced due to the chunks difference is less than 4.

How MongoDB works?

When we create an empty collection MongoDB creates one chunk, assigned to the shard where the database belongs to. This chunk assumes the whole range of values for our shard key.

In this example we can see:

  • The shard key choosen is the x field.
  • We have only one chunk.
  • Data the chunk refers belong to shard0000.
  • The range for this chunk goes from the minimum possible value for the shard key to the maximum value.

When we insert data at the collection the chunk will grow until its maximum value (64MB by default). At this moment MongoDB does the split, resulting two smaller chunks.

MongoDB checks if the system is balanced. If not, a chunk will be moved from the shard that contains more chunks to the shard that contains less chunks (chunk migration).

Surely, you are wondering What is the chunk chosen for migration? That that refers to the lowest values for the shard key.

In the next post I will talk about the steps to split a chunk.

Leave a comment

Your email address will not be published. Required fields are marked *

nine − 9 =