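In the previous article I built a product-center service. While I was at it, I also wanted to scrape the matching product reviews and store them in a sharded MongoDB collection, so these are my notes on using MongoDB sharding. Only range sharding is covered here.
Sharding exists for one reason: when the data volume gets too large, splitting it evenly across nodes raises query and write speed and bandwidth. That is the benefit of a sharded cluster. However, having a sharded cluster does not by itself make your queries faster or raise read/write throughput; the result also depends on how well you choose the shard key, and a poorly chosen key wastes the cluster. Note: in a sharded cluster you must explicitly shard each collection. If you don't, all of its data stays on a single shard!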
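Range sharding
MongoDB splits the data into chunks according to ranges of shard key values; each chunk holds the documents whose key falls within one contiguous range. This approach suits a shard key whose values vary within a relatively fixed range and do not increase or decrease monotonically, and workloads that rely on range queries.
- Pro: mongos can quickly locate the chunk that holds the requested data and route the request to the shard that owns it
- Con: data can end up unevenly distributed across the shards, which easily leads to data skew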
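A rough illustration of both points, using hypothetical split points rather than real chunk metadata: finding the chunk that owns a key is a binary search over the sorted chunk boundaries, and keys that keep growing past the last boundary all pile into the final chunk. The `chunk_for` and `distribution` helpers are illustrative names, not MongoDB APIs.

```python
import bisect
import random
from collections import Counter

# Hypothetical split points: 4 chunks covering
# [MinKey, 250), [250, 500), [500, 750), [750, MaxKey)
SPLITS = [250, 500, 750]


def chunk_for(key, splits=SPLITS):
    """Index of the range chunk owning `key` (lower bound inclusive)."""
    return bisect.bisect_right(splits, key)


def distribution(keys, splits=SPLITS):
    """Number of keys that fall into each chunk."""
    counts = Counter(chunk_for(k, splits) for k in keys)
    return [counts.get(i, 0) for i in range(len(splits) + 1)]


random.seed(42)
# Keys spread over a fixed range land in every chunk.
print("random   :", distribution(random.randint(0, 999) for _ in range(10_000)))
# Ever-increasing keys (think reviewId) all go to the last chunk.
print("monotonic:", distribution(range(1_000, 11_000)))
```

The second line of output shows the skew problem in its simplest form: every new key lands in the chunk bounded by MaxKey, so one shard absorbs all the inserts.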
Generate some test data with Python's faker
[root@mongodb-server ~]# cat comment.py
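from faker import Faker
from pymongo import MongoClient

# Create the Faker instance
fake = Faker()
# Connect to mongos
client = MongoClient('mongodb://127.0.0.1:38017/')
# Get the database and collection
db = client['mydatabase']
collection = db['productReviews']

# Generate random reviews and insert them in batches:
# one insert_many per 1000 documents instead of one round trip per document
BATCH_SIZE = 1000
batch = []
for i in range(10000000):
    created = fake.date_time_between(start_date='-30d', end_date='now')
    review = {
        'reviewId': i + 1,
        'spu': fake.random_int(min=100000, max=999999),
        'sku': fake.random_int(min=1000000000, max=9999999999),
        'userName': fake.name(),
        'rating': fake.random_int(min=1, max=5),
        'title': fake.sentence(nb_words=6),
        'content': fake.paragraph(nb_sentences=3),
        'createDate': created,
        # An update can only happen after the review was created
        'updateTime': fake.date_time_between(start_date=created, end_date='now')
    }
    batch.append(review)
    if len(batch) == BATCH_SIZE:
        collection.insert_many(batch)
        print(f'inserted {i + 1} reviews')
        batch = []
# Flush the last partial batch
if batch:
    collection.insert_many(batch)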
Enable sharding on the database
mongos> use mydatabase
switched to db mydatabase
mongos> sh.enableSharding('mydatabase')
{
"ok" : 1,
"operationTime" : Timestamp(1678692768, 2),
"$clusterTime" : {
"clusterTime" : Timestamp(1678692768, 2),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
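# Check the db stats at this point: all the data is on shard03, the primary shard of mydatabase, because an unsharded collection lives entirely on its database's primary shard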
mongos> db.stats()
{
"raw" : {
"shard03/mongodb-server:38025,mongodb-server:38026,mongodb-server:38027" : {
"db" : "mydatabase",
"collections" : 1,
"views" : 0,
"objects" : 1443422,
"avgObjSize" : 291.4877062979503,
"dataSize" : 420739768,
"storageSize" : 263249920,
"indexes" : 1,
"indexSize" : 24645632,
"totalSize" : 287895552,
"scaleFactor" : 1,
"fsUsedSize" : 9247023104,
"fsTotalSize" : 39700664320,
"ok" : 1
},
"shard01/mongodb-server:38019,mongodb-server:38020,mongodb-server:38021" : {
"db" : "mydatabase",
"collections" : 0,
"views" : 0,
"objects" : 0,
"avgObjSize" : 0,
"dataSize" : 0,
"storageSize" : 0,
"totalSize" : 0,
"indexes" : 0,
"indexSize" : 0,
"scaleFactor" : 1,
"fileSize" : 0,
"fsUsedSize" : 0,
"fsTotalSize" : 0,
"ok" : 1
},
"shard02/mongodb-server:38022,mongodb-server:38023,mongodb-server:38024" : {
"db" : "mydatabase",
"collections" : 0,
"views" : 0,
"objects" : 0,
"avgObjSize" : 0,
"dataSize" : 0,
"storageSize" : 0,
"totalSize" : 0,
"indexes" : 0,
"indexSize" : 0,
"scaleFactor" : 1,
"fileSize" : 0,
"fsUsedSize" : 0,
"fsTotalSize" : 0,
"ok" : 1
}
},
"objects" : 1443422,
"avgObjSize" : 291,
"dataSize" : 420739768,
"storageSize" : 263249920,
"totalSize" : 287895552,
"indexes" : 1,
"indexSize" : 24645632,
"scaleFactor" : 1,
"fileSize" : 0,
"ok" : 1,
"operationTime" : Timestamp(1678692775, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1678692777, 3),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
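To read this kind of output at a glance, a small helper can turn the per-shard `raw` section into percentages. It assumes the result has already been fetched as a dict (for example with pymongo's `client["mydatabase"].command("dbStats")` through mongos) and only looks at each shard's `objects` field; `shard_shares` is an illustrative name.

```python
def shard_shares(raw):
    """Map each shard name to its share of the documents reported by db.stats()."""
    # Keys look like "shard03/host:port,...": keep only the shard name.
    objects = {name.split("/")[0]: stats.get("objects", 0) for name, stats in raw.items()}
    total = sum(objects.values())
    return {name: (count / total if total else 0.0) for name, count in objects.items()}


# Per-shard "objects" copied from the db.stats() output above.
raw = {
    "shard01/mongodb-server:38019,mongodb-server:38020,mongodb-server:38021": {"objects": 0},
    "shard02/mongodb-server:38022,mongodb-server:38023,mongodb-server:38024": {"objects": 0},
    "shard03/mongodb-server:38025,mongodb-server:38026,mongodb-server:38027": {"objects": 1443422},
}
for shard, share in sorted(shard_shares(raw).items()):
    print(f"{shard}: {share:.1%}")
```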
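Create an index on the sku field
The shard key must be backed by an index, and because this collection already contains data, the index has to exist before the collection is sharded.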
mongos> db.productReviews.findOne()
{
"_id" : ObjectId("640eba227572fb6e8d819d9c"),
"reviewId" : 1,
"spu" : 196170,
"sku" : NumberLong("6291367729"),
"userName" : "Jeffrey Durham",
"rating" : 5,
"title" : "For well exactly sound perform hotel sell.",
"content" : "Answer candidate hit. Determine interesting society. Include science evidence begin data wish vote.",
"createDate" : ISODate("2023-02-12T07:42:50Z"),
"updateTime" : ISODate("2023-02-15T18:24:33Z")
}
mongos> db.productReviews.createIndex({"sku": 1})
{
"raw" : {
"shard03/mongodb-server:38025,mongodb-server:38026,mongodb-server:38027" : {
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"commitQuorum" : "votingMembers",
"ok" : 1
}
},
"ok" : 1,
"operationTime" : Timestamp(1678692925, 5),
"$clusterTime" : {
"clusterTime" : Timestamp(1678692925, 5),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
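Shard the collection on the sku field
Note: choose the shard key carefully. On MongoDB 4.4 (the version used here) a collection's shard key cannot be replaced once it is set; it can only be refined by appending suffix fields with refineCollectionShardKey, and changing it entirely (reshardCollection) requires MongoDB 5.0 or later.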
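Because the key is hard to change later, it can pay to profile candidate fields on a sample first. The sketch below is a rough heuristic, not a MongoDB feature: for values taken in insertion order, it reports the cardinality ratio and how often each value is larger than the previous one (close to 1.0 means monotonically increasing, the case range sharding handles poorly). The samples are synthetic, shaped like `reviewId` and `sku` from the generator script; `key_profile` is an illustrative name.

```python
import random


def key_profile(values):
    """Cardinality ratio and fraction of increasing steps for sampled key values."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("need at least two sampled values")
    distinct_ratio = len(set(values)) / len(values)
    increasing = sum(1 for a, b in zip(values, values[1:]) if b > a)
    return {
        "distinct_ratio": distinct_ratio,
        "increasing_ratio": increasing / (len(values) - 1),
    }


random.seed(7)
# Synthetic samples in insertion order, shaped like the generated documents.
review_ids = list(range(1, 10_001))  # like reviewId: strictly increasing
skus = [random.randint(1_000_000_000, 9_999_999_999) for _ in range(10_000)]  # like sku

print("reviewId:", key_profile(review_ids))
print("sku     :", key_profile(skus))
```

reviewId comes out fully increasing, so every new review would hit the MaxKey chunk; sku values are high-cardinality and increase only about half the time, which fits the criteria listed under range sharding.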
mongos> use config
switched to db config
mongos> sh.shardCollection('mydatabase.productReviews', {"sku": 1})
{
"collectionsharded" : "mydatabase.productReviews",
"collectionUUID" : UUID("b75a543a-4fd7-4295-bf6c-dbff2dfa6ac4"),
"ok" : 1,
"operationTime" : Timestamp(1678693048, 13),
"$clusterTime" : {
"clusterTime" : Timestamp(1678693048, 13),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
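The same three steps (create the index, enable sharding, shard the collection) can also be issued from Python. This is a sketch assuming pymongo and the mongos address from the generator script; `sharding_steps` and `run_steps` are hypothetical helpers that build the command documents behind `createIndex`, `sh.enableSharding()` and `sh.shardCollection()` and only send them when a client is passed in.

```python
def sharding_steps(db_name, coll_name, key):
    """Command documents equivalent to the shell steps above, in order.

    Each entry is (database to run on, command document). Dicts keep
    insertion order, so the command name stays the first key as MongoDB expects.
    """
    namespace = f"{db_name}.{coll_name}"
    # Default index name, e.g. {"sku": 1} -> "sku_1"
    index_name = "_".join(f"{field}_{direction}" for field, direction in key.items())
    return [
        (db_name, {"createIndexes": coll_name,
                   "indexes": [{"key": dict(key), "name": index_name}]}),
        ("admin", {"enableSharding": db_name}),
        ("admin", {"shardCollection": namespace, "key": dict(key)}),
    ]


def run_steps(client, steps):
    """Send each command through a pymongo MongoClient connected to mongos."""
    return [client[db].command(cmd) for db, cmd in steps]


steps = sharding_steps("mydatabase", "productReviews", {"sku": 1})
for db, cmd in steps:
    print(db, cmd)

# With a live cluster (assumed address, same as the generator script):
# from pymongo import MongoClient
# run_steps(MongoClient("mongodb://127.0.0.1:38017/"), steps)
```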
Now check the sharding status again
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("640eb4e77a504d88d33c6581")
}
shards:
{ "_id" : "shard01", "host" : "shard01/mongodb-server:38019,mongodb-server:38020,mongodb-server:38021", "state" : 1 }
{ "_id" : "shard02", "host" : "shard02/mongodb-server:38022,mongodb-server:38023,mongodb-server:38024", "state" : 1 }
{ "_id" : "shard03", "host" : "shard03/mongodb-server:38025,mongodb-server:38026,mongodb-server:38027", "state" : 1 }
active mongoses:
"4.4.19" : 2
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
686 : Success
databases:
{ "_id" : "config", "primary" : "config", "partitioned" : true }
config.system.sessions
shard key: { "_id" : 1 }
unique: false
balancing: true
chunks:
shard01 342
shard02 341
shard03 341
too many chunks to print, use verbose if you want to force print
{ "_id" : "mydatabase", "primary" : "shard03", "partitioned" : true, "version" : { "uuid" : UUID("d257af67-e4ae-4368-b313-0e6eea87a8f2"), "lastMod" : 1 } }
mydatabase.productReviews
shard key: { "sku" : 1 }
unique: false
balancing: true
chunks:
shard01 2
shard02 2
shard03 3
{ "sku" : { "$minKey" : 1 } } -->> { "sku" : NumberLong("2439394823") } on : shard01 Timestamp(2, 0)
{ "sku" : NumberLong("2439394823") } -->> { "sku" : NumberLong("3880973244") } on : shard02 Timestamp(3, 0)
{ "sku" : NumberLong("3880973244") } -->> { "sku" : NumberLong("5313913946") } on : shard02 Timestamp(4, 0)
{ "sku" : NumberLong("5313913946") } -->> { "sku" : NumberLong("6483173253") } on : shard01 Timestamp(5, 0)
{ "sku" : NumberLong("6483173253") } -->> { "sku" : NumberLong("7656267505") } on : shard03 Timestamp(5, 1)
{ "sku" : NumberLong("7656267505") } -->> { "sku" : NumberLong("8825674039") } on : shard03 Timestamp(1, 5)
{ "sku" : NumberLong("8825674039") } -->> { "sku" : { "$maxKey" : 1 } } on : shard03 Timestamp(1, 6)
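The chunk table above is exactly what mongos consults when routing a query on sku. Each line covers a half-open range `[min, max)`, so locating the owner of a key is a binary search over the chunk lower bounds. A sketch using the boundaries and owners copied from this `sh.status()` snapshot (they change as chunks split and migrate); `shard_for_sku` is an illustrative name:

```python
import bisect
from collections import Counter

# Lower bounds of chunks 2..7 and the owner of every chunk, copied from the
# sh.status() output above. The first chunk starts at MinKey.
BOUNDS = [2439394823, 3880973244, 5313913946, 6483173253, 7656267505, 8825674039]
OWNERS = ["shard01", "shard02", "shard02", "shard01", "shard03", "shard03", "shard03"]


def shard_for_sku(sku):
    """Shard that owns `sku` according to this snapshot of the chunk table."""
    return OWNERS[bisect.bisect_right(BOUNDS, sku)]


print(shard_for_sku(6291367729))  # the sku of the document shown by findOne() → shard01
print(Counter(OWNERS))            # chunks per shard, matching the 2 / 2 / 3 above
```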
Check the database stats at this point
mongos> db.stats()
{
"raw" : {
"shard01/mongodb-server:38019,mongodb-server:38020,mongodb-server:38021" : {
"db" : "mydatabase",
"collections" : 1,
"views" : 0,
"objects" : 418507,
"avgObjSize" : 290.25706380060547,
"dataSize" : 121474613,
"storageSize" : 78262272,
"indexes" : 2,
"indexSize" : 25698304,
"totalSize" : 103960576,
"scaleFactor" : 1,
"fsUsedSize" : 11241979904,
"fsTotalSize" : 39700664320,
"ok" : 1
},
"shard02/mongodb-server:38022,mongodb-server:38023,mongodb-server:38024" : {
"db" : "mydatabase",
"collections" : 1,
"views" : 0,
"objects" : 461228,
"avgObjSize" : 292.03945987667703,
"dataSize" : 134696776,
"storageSize" : 86073344,
"indexes" : 2,
"indexSize" : 22794240,
"totalSize" : 108867584,
"scaleFactor" : 1,
"fsUsedSize" : 11241979904,
"fsTotalSize" : 39700664320,
"ok" : 1
},
"shard03/mongodb-server:38025,mongodb-server:38026,mongodb-server:38027" : {
"db" : "mydatabase",
"collections" : 2,
"views" : 0,
"objects" : 1443422,
"avgObjSize" : 291.4877062979503,
"dataSize" : 420739768,
"storageSize" : 263254016,
"indexes" : 4,
"indexSize" : 42885120,
"totalSize" : 306139136,
"scaleFactor" : 1,
"fsUsedSize" : 11241979904,
"fsTotalSize" : 39700664320,
"ok" : 1
}
},
"objects" : 2323157,
"avgObjSize" : 291.0183892005577,
"dataSize" : 676911157,
"storageSize" : 427589632,
"totalSize" : 518967296,
"indexes" : 8,
"indexSize" : 91377664,
"scaleFactor" : 1,
"fileSize" : 0,
"ok" : 1,
"operationTime" : Timestamp(1678693218, 2),
"$clusterTime" : {
"clusterTime" : Timestamp(1678693224, 3),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
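The totals in this output do not add up: the per-shard counts sum to 2,323,157, while the collection held 1,443,422 documents before it was sharded. A quick arithmetic check, assuming no documents were written between the two db.stats() calls; `orphan_surplus` is an illustrative name:

```python
def orphan_surplus(per_shard_objects, logical_total):
    """Documents counted by db.stats() beyond the collection's real size."""
    return sum(per_shard_objects.values()) - logical_total


# Per-shard "objects" from the db.stats() output above, plus the total
# reported before sharding (assumes nothing was written in between).
per_shard = {"shard01": 418507, "shard02": 461228, "shard03": 1443422}
surplus = orphan_surplus(per_shard, 1443422)
print(surplus, surplus == per_shard["shard01"] + per_shard["shard02"])
```

The surplus equals exactly what shard01 and shard02 received, which suggests shard03 still held its copies of the migrated chunks (orphaned documents waiting to be deleted), and db.stats() counts them.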
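Enabling the balancer
The output above shows that the data is not evenly spread across the shards, so the next thing to look at is the balancer. The balancer migrates chunks between shards; in 4.4 it balances the number of chunks per shard for each sharded collection, not the data size, and the 2/2/3 chunk split reported by sh.status() already differs by less than its migration threshold. Two more things to note: the balancer was already enabled (sh.status() reports "Currently enabled: yes"), and sh.enableBalancing() expects a collection namespace, not a boolean. The call below therefore matched no collection ("nMatched" : 0) and changed nothing. To turn the balancer on for the whole cluster, use sh.startBalancer() (or sh.setBalancerState(true)); to re-enable balancing for a single collection, use sh.enableBalancing("mydatabase.productReviews").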
mongos> sh.enableBalancing(true)
WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })
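After a while, the distribution is much more even than before. Looking closer, shard01 and shard02 did not change at all; only shard03 dropped from 1,443,422 to 563,687 objects, and the three counts now add up to exactly 1,443,422. In other words, no further chunks moved: shard03 finished deleting the documents of the chunks it had already migrated to shard01 and shard02. Until that cleanup ran, db.stats() counted those orphaned documents, which is why the earlier total was 2,323,157.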
mongos> db.stats()
{
"raw" : {
"shard02/mongodb-server:38022,mongodb-server:38023,mongodb-server:38024" : {
"db" : "mydatabase",
"collections" : 1,
"views" : 0,
"objects" : 461228,
"avgObjSize" : 292.03945987667703,
"dataSize" : 134696776,
"storageSize" : 86073344,
"indexes" : 2,
"indexSize" : 22794240,
"totalSize" : 108867584,
"scaleFactor" : 1,
"fsUsedSize" : 11576156160,
"fsTotalSize" : 39700664320,
"ok" : 1
},
"shard01/mongodb-server:38019,mongodb-server:38020,mongodb-server:38021" : {
"db" : "mydatabase",
"collections" : 1,
"views" : 0,
"objects" : 418507,
"avgObjSize" : 290.25706380060547,
"dataSize" : 121474613,
"storageSize" : 78262272,
"indexes" : 2,
"indexSize" : 25698304,
"totalSize" : 103960576,
"scaleFactor" : 1,
"fsUsedSize" : 11576156160,
"fsTotalSize" : 39700664320,
"ok" : 1
},
"shard03/mongodb-server:38025,mongodb-server:38026,mongodb-server:38027" : {
"db" : "mydatabase",
"collections" : 2,
"views" : 0,
"objects" : 563687,
"avgObjSize" : 291.94992788551093,
"dataSize" : 164568379,
"storageSize" : 543461376,
"indexes" : 4,
"indexSize" : 85598208,
"totalSize" : 629059584,
"scaleFactor" : 1,
"fsUsedSize" : 11576156160,
"fsTotalSize" : 39700664320,
"ok" : 1
}
},
"objects" : 1443422,
"avgObjSize" : 291.0295970270648,
"dataSize" : 420739768,
"storageSize" : 707796992,
"totalSize" : 841887744,
"indexes" : 8,
"indexSize" : 134090752,
"scaleFactor" : 1,
"fileSize" : 0,
"ok" : 1,
"operationTime" : Timestamp(1678694656, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1678694656, 2),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
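Disabling the automatic balancer
Note: chunk migrations consume disk I/O, network bandwidth and CPU, and they run concurrently with normal inserts, updates and deletes, so they add overhead to the cluster. For that reason we don't leave the balancer running all the time.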
mongos> sh.enableBalancing(false)
WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })
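If the cluster is serving heavy read/write traffic while the balancer is migrating chunks, the cluster load goes up, so the automatic balancer is usually turned off. Be aware that, just like the earlier call, sh.enableBalancing(false) matched nothing ("nMatched" : 0) and did not disable anything; the sh.status() output in the next step still reports "Currently enabled: yes". To stop the balancer cluster-wide (this also waits for an in-progress balancing round to finish), use: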
sh.stopBalancer()
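From Python, the same cluster-wide switch is available through the `balancerStart`, `balancerStop` and `balancerStatus` admin commands, run against mongos. A minimal sketch assuming pymongo; `set_balancer` and `balancer_state` are illustrative names, and the client address is the one from the generator script:

```python
def set_balancer(client, enabled):
    """Cluster-wide equivalent of sh.startBalancer() / sh.stopBalancer()."""
    return client.admin.command("balancerStart" if enabled else "balancerStop")


def balancer_state(client):
    """Return (mode, in_round) from the balancerStatus admin command."""
    status = client.admin.command("balancerStatus")
    return status["mode"], status["inBalancerRound"]


# Assumed usage against the mongos from the generator script:
# from pymongo import MongoClient
# client = MongoClient("mongodb://127.0.0.1:38017/")
# set_balancer(client, False)
# print(balancer_state(client))  # mode should now be "off"
```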
Running the balancer only during a scheduled window
use config
db.settings.update(
{ _id: "balancer" },
{ $set: { activeWindow : { start : "<start-time>", stop : "<stop-time>" } } },
{ upsert: true }
)
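- <start-time>: start time in HH:MM format (server local time, i.e. the time zone of the config server replica set primary); HH ranges from 00 to 23 and MM from 00 to 59.
- <stop-time>: stop time in HH:MM format (same time zone); HH ranges from 00 to 23 and MM from 00 to 59.
The window only applies while the balancer is enabled; if it was stopped with sh.stopBalancer(), start it again with sh.startBalancer().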
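A small sketch for building and sanity-checking the window before writing it, assuming pymongo for the actual update. `active_window_update` and `in_window` are illustrative helpers: the first validates the HH:MM format described above and returns the same `$set` document as the shell command; the second checks whether a time falls inside a window, treating it as half-open and allowing it to wrap past midnight (e.g. 23:00 to 02:00). That boundary handling is an assumption, not a model of MongoDB's internal check.

```python
import re
from datetime import time

_HHMM = re.compile(r"([01]\d|2[0-3]):([0-5]\d)")


def _parse(hhmm):
    """Parse a strict HH:MM string into a datetime.time."""
    m = _HHMM.fullmatch(hhmm)
    if not m:
        raise ValueError(f"expected HH:MM (00-23:00-59), got {hhmm!r}")
    return time(int(m.group(1)), int(m.group(2)))


def active_window_update(start, stop):
    """Validate both times and build the $set document used above."""
    _parse(start)
    _parse(stop)
    return {"$set": {"activeWindow": {"start": start, "stop": stop}}}


def in_window(now, start, stop):
    """Whether `now` falls in [start, stop); the window may wrap past midnight."""
    s, e = _parse(start), _parse(stop)
    if s <= e:
        return s <= now < e
    return now >= s or now < e


# Assumed usage with pymongo, connected to mongos:
# client.config.settings.update_one(
#     {"_id": "balancer"}, active_window_update("03:00", "06:30"), upsert=True)
# Removing the window again:
# client.config.settings.update_one({"_id": "balancer"}, {"$unset": {"activeWindow": True}})
```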
mongos> use config
switched to db config
mongos> db.settings.update(
... { _id: "balancer" },
... { $set: { activeWindow : { start : "03:00", stop : "06:30" } } },
... { upsert: true }
... )
WriteResult({ "nMatched" : 0, "nUpserted" : 1, "nModified" : 0, "_id" : "balancer" })
mongos>
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("640eb4e77a504d88d33c6581")
}
shards:
{ "_id" : "shard01", "host" : "shard01/mongodb-server:38019,mongodb-server:38020,mongodb-server:38021", "state" : 1 }
{ "_id" : "shard02", "host" : "shard02/mongodb-server:38022,mongodb-server:38023,mongodb-server:38024", "state" : 1 }
{ "_id" : "shard03", "host" : "shard03/mongodb-server:38025,mongodb-server:38026,mongodb-server:38027", "state" : 1 }
active mongoses:
"4.4.19" : 2
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Balancer active window is set between 03:00 and 06:30 server local time
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
686 : Success
Removing the active window
db.settings.update({ _id : "balancer" }, { $unset : { activeWindow : true } })