How MongoDB balance your data? 3 comments

migration_steps3We have already studied the chunk concept What is a MongoDB chunk? and also how MongoDB splits it when grows beyond the maximum size established by default Four steps to split a MongoDB chunk. At this post we are going to study the steps that MongoDB follows to balance our cluster.

The first thing we must know is that we choose the data to balance in our shards. So, we decide the collections to divide. You can read at this post Two steps to shard a MongoDB collection how to split a collection.

The balancing process does not disturb the normal cluster work, it is a background process. There is only one migration at a time, per cluster, in order to do not overload it. Therefore, only two shards will be working on (its primaries). If we do not change the by default chunk size, 64MB, this is the maximum amount of data that MongoDB will migrate at a time. This is a size so big so that there will not be too much migrations, and at the same time, it is little in order to do not overload our database.

Balancing round

Any of the mongos can begin a round balancing, but only can be an active one at a time. So, the mongos must check this through the locks collection at the configdb.

Only when the value equals 0 the mongos can begin the balancing round.

Is the cluster balanced?

The mongos checks the number of chunks per shard and decides whether the collection/collections is balanced. At this post What is a MongoDB chunk? I explain how MongoDB determines this. When the collection is not balanced the balancer will move the necessary chunks to get it. When the migration has been finished the mongos will update the locks collection and the round balancer will be finished.

Does the chunk need to be split?

Ok, the collection is not well balanced. The mongos will choose a chunk to move it to another shard. How MongoDB chooses this chunk is studied at this post What is a MongoDB chunk?

Before moving the chunk the mongos asks the shard which owns it (shard FROM) if this is too big and must be split.

The balancing begins

The mongos orders to the shard FROM that begins the transfer, but before beginning this shard makes sure that it is not removing data from a chunk previously migrated.

Please, read this chunk

The shard FROM asks shard TO (shard chunk destination) to read the chosen chunk to been migrated.

The transfer begins

The chunk belongs to shard FROM until the transfer will be finished. Until then, there can be write operations on this chunk that must also be transferred to shard TO.

The transfer ends

The FROM TO (Alex, thanks for your input) shard updates the chunk migration at the config servers.

The transferred data is deleted

The shard FROM begins to delete the data that has been moved.

The cache is refreshed

The mongos refreshes its cache. The remaining mongos will look for this data at the shard FROM and will get an STALE CONFIG EXCEPTION. This will cause them to read the metadata at the config servers for refreshing its cache. The clients will not realize it.

I wish that this post helps you to understand how MongoDB chooses the chunks to move between shards and the steps it follows for balancing the cluster. Please, if you read something wrong or something is omitted do not hesitate to use the comments. We will learn each other.

Leave a comment

Your email address will not be published. Required fields are marked *

twenty − twelve =

3 thoughts on “How MongoDB balance your data?

  • Alex Fraseniuc

    The Transfer Ends part does not seem to coincide with mongoDB documentation for 3.2. The shard TO is the one that updates the config database.
    If we check the chunk migration procedure, in step 6 it states: “When fully synchronized, the destination shard connects to the config database and updates the cluster metadata with the new location for the chunk. ”

    Also I think it would be interesting to take into account open cursors, especially for heavy load servers. The problem is that if the chunk you wish to migrate has too many opened cursors on the associated data, the moveChunk procedure will fail(inability to acquire lock on metadata).

    • juanroy Post author

      Hello Alex.
      Thank you very much for your input. I agree with you, it is the TO shard who updates the metadata.
      Please, let me know if you realize any other mistake.

      • Asya

        The open cursors are an issue are only for deleting (cleaning up) the moved data from the “FROM” shard. The delete process will wait till all the cursors are done – so if you open cursors with “no timeout” (which is a bad idea) then it could block migration deletions from completing, which means new migrations from that shard cannot start.