In the post How to set up a MongoDB Sharded Cluster we studied how to set up a MongoDB Sharded Cluster. Its goal, as we already know, is to scale and to balance the workload uniformly across all our shards.
Today, we are going to learn what to do in order to shard a collection and get all its documents well distributed among our shards.
We must execute all administrative tasks related to shards clusters connected to a mongos:
1 2 |
$mongo mongos> |
MongoDB shards at a collection level. This means that, for a given database, we can have sharded collections and non-sharded collections.
First step
Sharding must be enabled in the database the collection belongs to is necessary before trying to shard it. We are going to do it by this way:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
mongos> sh.enableSharding("shardTestDB") { "ok" : 1 } mongos> sh.status() --- Sharding Status --- sharding version: { "_id" : 1, "version" : 4, "minCompatibleVersion" : 4, "currentVersion" : 5, "clusterId" : ObjectId("54eb010dc6d2e5d19ca9df05") } shards: { "_id" : "shard0000", "host" : "PSINFW95:27000" } { "_id" : "shard0001", "host" : "PSINFW95:27001" } databases: { "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } { "_id" : "shardTestDB", "partitioned" : true, "primary" : "shard0000" } mongos> |
Collections are sharded using a field called ‘shardkey’, hence, it is very important to choose it properly. You can read about the characteristics a shard key must have at this url: Choosing a shard key
Second step
We have this command to shard a collection (as a shardkey we use the ‘username’ field):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
mongos> sh.shardCollection("shardTestDB.users", { "username" : 1 } ) { "collectionsharded" : "shardTestDB.users", "ok" : 1 } mongos> sh.status() --- Sharding Status --- sharding version: { "_id" : 1, "version" : 4, "minCompatibleVersion" : 4, "currentVersion" : 5, "clusterId" : ObjectId("54eb010dc6d2e5d19ca9df05") } shards: { "_id" : "shard0000", "host" : "PSINFW95:27000" } { "_id" : "shard0001", "host" : "PSINFW95:27001" } databases: { "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } { "_id" : "shardTestDB", "partitioned" : true, "primary" : "shard0000" } shardTestDB.users shard key: { "username" : 1 } chunks: shard0000 1 { "username" : { "$minKey" : 1 } } -->> { "username" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0) mongos> |
When we shard a collection MongoDB creates an index on the shardkey:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
mongos> db.users.getIndexes() [ { "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "shardTestDB.users" }, { "v" : 1, "key" : { "username" : 1 }, "name" : "username_1", "ns" : "shardTestDB.users" } ] mongos> |
If our collection has some previous data these will be distributed among the shards.
Very easy, right?
Let’s see MongoDB in action. We are going to insert documents in our collection to check that MongoDB distributes them among the shards. First of all we have to stop de balancer, afterwards we are going to do the inserts and all the documents must be located at the shard which the collection belongs to (shard 0 by default). We are going to continue running the balancer to check that MongoDB moves uniformly our documents across all the shards.
We set the size of the chunk in 1Mb to avoid inserting too much documents.
1 2 3 4 5 6 7 8 9 |
mongos> use config switched to db config mongos> db.settings.find( { "_id" : "chunksize" } ) { "_id" : "chunksize", "value" : 64 } mongos> db.settings.save( { "_id" : "chunksize", value : 1 } ) WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 }) mongos> db.settings.find( { "_id" : "chunksize" } ) { "_id" : "chunksize", "value" : 1 } mongos> |
We stop the balancer:
1 2 3 4 5 6 7 |
mongos> sh.stopBalancer() Waiting for active hosts... Waiting for the balancer lock... Waiting again for active hosts after balancer is off... mongos> sh.getBalancerState() false mongos> |
We insert the documents:
1 2 3 4 5 6 7 |
mongos> use shardTestDB switched to db shardTestDB mongos> for (var i=0; i<100000; i++) { ... db.users.insert( { "username" : "user"+i, "created at" : new Date() } ); ... } WriteResult({ "nInserted" : 1 }) mongos> mongos> db.users.count() 100000 mongos> |
We check that all the documents have been stored at the shard the collection belongs to.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
mongos> sh.status( { verbose : 1 } ) --- Sharding Status --- sharding version: { "_id" : 1, "version" : 4, "minCompatibleVersion" : 4, "currentVersion" : 5, "clusterId" : ObjectId("54eb010dc6d2e5d19ca9df05") } shards: { "_id" : "shard0000", "host" : "PSINFW95:27000" } { "_id" : "shard0001", "host" : "PSINFW95:27001" } databases: { "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } { "_id" : "shardTestDB", "partitioned" : true, "primary" : "shard0000" } shardTestDB.users shard key: { "username" : 1 } chunks: shard0000 21 { "username" : { "$minKey" : 1 } } -->> { "username" : "user0" } on : shard0000 Timestamp(1, 1) { "username" : "user0" } -->> { "username" : "user1421" } on : shard0000 Timestamp(1, 7) { "username" : "user1421" } -->> { "username" : "user18423" } on : shard0000 Timestamp(1, 9) { "username" : "user18423" } -->> { "username" : "user22636" } on : shard0000 Timestamp(1, 11) { "username" : "user22636" } -->> { "username" : "user3131" } on : shard0000 Timestamp(1, 12) { "username" : "user3131" } -->> { "username" : "user35523" } on : shard0000 Timestamp(1, 15) { "username" : "user35523" } -->> { "username" : "user39737" } on : shard0000 Timestamp(1, 17) { "username" : "user39737" } -->> { "username" : "user4395" } on : shard0000 Timestamp(1, 19) { "username" : "user4395" } -->> { "username" : "user48162" } on : shard0000 Timestamp(1, 21) { "username" : "user48162" } -->> { "username" : "user5314" } on : shard0000 Timestamp(1, 22) { "username" : "user5314" } -->> { "username" : "user57353" } on : shard0000 Timestamp(1, 23) { "username" : "user57353" } -->> { "username" : "user61566" } on : shard0000 Timestamp(1, 25) { "username" : "user61566" } -->> { "username" : "user6578" } on : shard0000 Timestamp(1, 27) { "username" : "user6578" } -->> { "username" : "user69993" } on : shard0000 Timestamp(1, 29) { "username" : "user69993" } -->> { "username" : "user74204" } on : shard0000 Timestamp(1, 31) { "username" : "user74204" } -->> { "username" : "user78418" } on : shard0000 Timestamp(1, 33) { "username" : "user78418" } -->> { "username" : "user82630" } on : shard0000 Timestamp(1, 35) { "username" : "user82630" } -->> { "username" : "user86844" } on : shard0000 Timestamp(1, 37) { "username" : "user86844" } -->> { "username" : "user91056" } on : shard0000 Timestamp(1, 39) { "username" : "user91056" } -->> { "username" : "user999" } on : shard0000 Timestamp(1, 40) { "username" : "user999" } -->> { "username" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 4) mongos> |
We start the balancer (automatically MongoDB moves the documents):
1 |
mongos> sh.startBalancer() |
And, finally, we can check that all the documents have been moved as we expected.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 |
mongos> sh.status( { verbose : 1 } ) --- Sharding Status --- sharding version: { "_id" : 1, "version" : 4, "minCompatibleVersion" : 4, "currentVersion" : 5, "clusterId" : ObjectId("54eb010dc6d2e5d19ca9df05") } shards: { "_id" : "shard0000", "host" : "PSINFW95:27000" } { "_id" : "shard0001", "host" : "PSINFW95:27001" } databases: { "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } { "_id" : "shardTestDB", "partitioned" : true, "primary" : "shard0000" } shardTestDB.users shard key: { "username" : 1 } chunks: shard0001 2 shard0000 19 { "username" : { "$minKey" : 1 } } -->> { "username" : "user0" } on : shard0001 Timestamp(2, 0) { "username" : "user0" } -->> { "username" : "user1421" } on : shard0001 Timestamp(3, 0) { "username" : "user1421" } -->> { "username" : "user18423" } on : shard0000 Timestamp(3, 1) { "username" : "user18423" } -->> { "username" : "user22636" } on : shard0000 Timestamp(1, 11) { "username" : "user22636" } -->> { "username" : "user3131" } on : shard0000 Timestamp(1, 12) { "username" : "user3131" } -->> { "username" : "user35523" } on : shard0000 Timestamp(1, 15) { "username" : "user35523" } -->> { "username" : "user39737" } on : shard0000 Timestamp(1, 17) { "username" : "user39737" } -->> { "username" : "user4395" } on : shard0000 Timestamp(1, 19) { "username" : "user4395" } -->> { "username" : "user48162" } on : shard0000 Timestamp(1, 21) { "username" : "user48162" } -->> { "username" : "user5314" } on : shard0000 Timestamp(1, 22) { "username" : "user5314" } -->> { "username" : "user57353" } on : shard0000 Timestamp(1, 23) { "username" : "user57353" } -->> { "username" : "user61566" } on : shard0000 Timestamp(1, 25) { "username" : "user61566" } -->> { "username" : "user6578" } on : shard0000 Timestamp(1, 27) { "username" : "user6578" } -->> { "username" : "user69993" } on : shard0000 Timestamp(1, 29) { "username" : "user69993" } -->> { "username" : "user74204" } on : shard0000 Timestamp(1, 31) { "username" : "user74204" } -->> { "username" : "user78418" } on : shard0000 Timestamp(1, 33) { "username" : "user78418" } -->> { "username" : "user82630" } on : shard0000 Timestamp(1, 35) { "username" : "user82630" } -->> { "username" : "user86844" } on : shard0000 Timestamp(1, 37) { "username" : "user86844" } -->> { "username" : "user91056" } on : shard0000 Timestamp(1, 39) { "username" : "user91056" } -->> { "username" : "user999" } on : shard0000 Timestamp(1, 40) { "username" : "user999" } -->> { "username" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 4) mongos> sh.status( { verbose : 1 } ) --- Sharding Status --- sharding version: { "_id" : 1, "version" : 4, "minCompatibleVersion" : 4, "currentVersion" : 5, "clusterId" : ObjectId("54eb010dc6d2e5d19ca9df05") } shards: { "_id" : "shard0000", "host" : "PSINFW95:27000" } { "_id" : "shard0001", "host" : "PSINFW95:27001" } databases: { "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } { "_id" : "shardTestDB", "partitioned" : true, "primary" : "shard0000" } shardTestDB.users shard key: { "username" : 1 } chunks: shard0001 9 shard0000 12 { "username" : { "$minKey" : 1 } } -->> { "username" : "user0" } on : shard0001 Timestamp(2, 0) { "username" : "user0" } -->> { "username" : "user1421" } on : shard0001 Timestamp(3, 0) { "username" : "user1421" } -->> { "username" : "user18423" } on : shard0001 Timestamp(4, 0) { "username" : "user18423" } -->> { "username" : "user22636" } on : shard0001 Timestamp(5, 0) { "username" : "user22636" } -->> { "username" : "user3131" } on : shard0001 Timestamp(6, 0) { "username" : "user3131" } -->> { "username" : "user35523" } on : shard0001 Timestamp(7, 0) { "username" : "user35523" } -->> { "username" : "user39737" } on : shard0001 Timestamp(8, 0) { "username" : "user39737" } -->> { "username" : "user4395" } on : shard0001 Timestamp(9, 0) { "username" : "user4395" } -->> { "username" : "user48162" } on : shard0001 Timestamp(10, 0) { "username" : "user48162" } -->> { "username" : "user5314" } on : shard0000 Timestamp(10, 1) { "username" : "user5314" } -->> { "username" : "user57353" } on : shard0000 Timestamp(1, 23) { "username" : "user57353" } -->> { "username" : "user61566" } on : shard0000 Timestamp(1, 25) { "username" : "user61566" } -->> { "username" : "user6578" } on : shard0000 Timestamp(1, 27) { "username" : "user6578" } -->> { "username" : "user69993" } on : shard0000 Timestamp(1, 29) { "username" : "user69993" } -->> { "username" : "user74204" } on : shard0000 Timestamp(1, 31) { "username" : "user74204" } -->> { "username" : "user78418" } on : shard0000 Timestamp(1, 33) { "username" : "user78418" } -->> { "username" : "user82630" } on : shard0000 Timestamp(1, 35) { "username" : "user82630" } -->> { "username" : "user86844" } on : shard0000 Timestamp(1, 37) { "username" : "user86844" } -->> { "username" : "user91056" } on : shard0000 Timestamp(1, 39) { "username" : "user91056" } -->> { "username" : "user999" } on : shard0000 Timestamp(1, 40) { "username" : "user999" } -->> { "username" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 4) mongos> sh.status( { verbose : 1 } ) --- Sharding Status --- sharding version: { "_id" : 1, "version" : 4, "minCompatibleVersion" : 4, "currentVersion" : 5, "clusterId" : ObjectId("54eb010dc6d2e5d19ca9df05") } shards: { "_id" : "shard0000", "host" : "PSINFW95:27000" } { "_id" : "shard0001", "host" : "PSINFW95:27001" } databases: { "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } { "_id" : "shardTestDB", "partitioned" : true, "primary" : "shard0000" } shardTestDB.users shard key: { "username" : 1 } chunks: shard0001 10 shard0000 11 { "username" : { "$minKey" : 1 } } -->> { "username" : "user0" } on : shard0001 Timestamp(2, 0) { "username" : "user0" } -->> { "username" : "user1421" } on : shard0001 Timestamp(3, 0) { "username" : "user1421" } -->> { "username" : "user18423" } on : shard0001 Timestamp(4, 0) { "username" : "user18423" } -->> { "username" : "user22636" } on : shard0001 Timestamp(5, 0) { "username" : "user22636" } -->> { "username" : "user3131" } on : shard0001 Timestamp(6, 0) { "username" : "user3131" } -->> { "username" : "user35523" } on : shard0001 Timestamp(7, 0) { "username" : "user35523" } -->> { "username" : "user39737" } on : shard0001 Timestamp(8, 0) { "username" : "user39737" } -->> { "username" : "user4395" } on : shard0001 Timestamp(9, 0) { "username" : "user4395" } -->> { "username" : "user48162" } on : shard0001 Timestamp(10, 0) { "username" : "user48162" } -->> { "username" : "user5314" } on : shard0001 Timestamp(11, 0) { "username" : "user5314" } -->> { "username" : "user57353" } on : shard0000 Timestamp(11, 1) { "username" : "user57353" } -->> { "username" : "user61566" } on : shard0000 Timestamp(1, 25) { "username" : "user61566" } -->> { "username" : "user6578" } on : shard0000 Timestamp(1, 27) { "username" : "user6578" } -->> { "username" : "user69993" } on : shard0000 Timestamp(1, 29) { "username" : "user69993" } -->> { "username" : "user74204" } on : shard0000 Timestamp(1, 31) { "username" : "user74204" } -->> { "username" : "user78418" } on : shard0000 Timestamp(1, 33) { "username" : "user78418" } -->> { "username" : "user82630" } on : shard0000 Timestamp(1, 35) { "username" : "user82630" } -->> { "username" : "user86844" } on : shard0000 Timestamp(1, 37) { "username" : "user86844" } -->> { "username" : "user91056" } on : shard0000 Timestamp(1, 39) { "username" : "user91056" } -->> { "username" : "user999" } on : shard0000 Timestamp(1, 40) { "username" : "user999" } -->> { "username" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 4) mongos> |
If you are asking yourself how do I know the shard in which the data I need is stored?, do not worry, you only have to request it to the mongos and it will retrieve it for you.
This is the end of the post, I wish that you have understood all the steps and you can get the most of your MongoDB Sharded Cluster.