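I've been tinkering with ES (Elasticsearch) from Python lately, learning as I follow along with a project's development, even though the developer uses Java and I use Python. Bulk inserts came up over a meal with him today, which prompted this post.
How do you write data into ES? One document at a time? With a lot of data, inserting one by one is slow. Here is my test.
Install the handy faker library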
pip3 install faker
Simulating one-by-one inserts
import time

from faker import Faker
from elasticsearch import Elasticsearch

es = Elasticsearch(
    hosts=["192.168.44.142:9200", "192.168.44.143:9200", "192.168.44.144:9200"],
    http_auth=('elastic', 'elastic'),
)
print(es.ping())

# Create a Faker instance that generates Chinese-locale data
faker = Faker(locale="zh-CN")


# Timing decorator
def timer(func):
    def wrapper(*args, **kwargs):
        print("开始计时......")  # "Start timing......"
        start = time.time()
        res = func(*args, **kwargs)
        print('共耗时约%.3f秒' % (time.time() - start))  # "Took about %.3f seconds"
        return res
    return wrapper


@timer
def es_write():
    # Index 10,000 documents, one request per document
    for i in range(10000):
        mydoc = {
            "name": faker.name(),
            "phone": faker.phone_number(),
            "address": faker.address(),
            "car_license": faker.license_plate(),
            "company": faker.company(),
            "job": faker.job(),
            "message": faker.paragraph(nb_sentences=10, variable_nb_sentences=True, ext_word_list=None)
        }
        res = es.index(index='test-index-write', body=mydoc)


if __name__ == '__main__':
    es_write()
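Test run result: the 10,000 single-document inserts took about 85.5 seconds (the output lines read "Start timing......" and "Took about 85.466 seconds")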
/home/xadocker/PycharmProjects/untitled2/venv/bin/python /home/xadocker/PycharmProjects/untitled2/test_buil.py
True
开始计时......
共耗时约85.466秒
Bulk insert with helpers.bulk
import time

from faker import Faker
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(
    hosts=["192.168.44.142:9200", "192.168.44.143:9200", "192.168.44.144:9200"],
    http_auth=('elastic', 'elastic'),
)
print(es.ping())

faker = Faker(locale="zh-CN")


def timer(func):
    def wrapper(*args, **kwargs):
        print("开始计时......")  # "Start timing......"
        start = time.time()
        res = func(*args, **kwargs)
        print('共耗时约%.3f秒' % (time.time() - start))  # "Took about %.3f seconds"
        return res
    return wrapper


@timer
def es_write2():
    # Build all 10,000 bulk actions first, then send them with helpers.bulk
    data = [
        {
            "_index": "test-index-write",
            "_source": {
                "name": faker.name(),
                "phone": faker.phone_number(),
                "address": faker.address(),
                "car_license": faker.license_plate(),
                "company": faker.company(),
                "job": faker.job(),
                "message": faker.paragraph(nb_sentences=10, variable_nb_sentences=True, ext_word_list=None)
            }
        } for i in range(10000)
    ]
    helpers.bulk(es, data)


if __name__ == '__main__':
    es_write2()
This time the run took about 7.8 seconds:
/home/xadocker/PycharmProjects/untitled2/venv/bin/python /home/xadocker/PycharmProjects/untitled2/test_buil.py
True
开始计时......
共耗时约7.766秒
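One likely reason for the gap: es.index sends one request per document, while helpers.bulk groups actions into batches before sending them (its chunk_size parameter defaults to 500). The sketch below uses only the standard library to show how 10,000 actions would collapse into 20 batches at that size. The chunked helper and the placeholder documents are made up for illustration and are not part of the elasticsearch client, and the real helper also caps batches by byte size, which this sketch ignores.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Placeholder actions (hypothetical data; real ones would carry the Faker fields)
actions = (
    {"_index": "test-index-write", "_source": {"seq": i}}
    for i in range(10000)
)

batches = list(chunked(actions, 500))
print(len(batches))     # number of batches, i.e. bulk requests
print(len(batches[0]))  # documents in the first batch
```

Because the actions here come from a generator, nothing forces the full list into memory up front; helpers.bulk accepts any iterable of actions in the same way.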
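However, the data for each test is generated on the fly, and that generation happens inside the timing decorator, so the numbers are skewed. Let's look at how long it takes just to generate the 10,000 documents: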
@timer
def data_genera():
    # Only build the 10,000 actions; nothing is sent to ES
    data = [
        {
            "_index": "test-index-write",
            "_source": {
                "name": faker.name(),
                "phone": faker.phone_number(),
                "address": faker.address(),
                "car_license": faker.license_plate(),
                "company": faker.company(),
                "job": faker.job(),
                "message": faker.paragraph(nb_sentences=10, variable_nb_sentences=True, ext_word_list=None)
            }
        } for i in range(10000)
    ]


if __name__ == '__main__':
    data_genera()
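After a few runs, generating the data takes roughly 4.2 s, so the bulk insert itself should take about 7.8 - 4.2 = 3.6 s. Readers can also move the data generation logic outside the timer; I've skipped that here.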
/home/xadocker/PycharmProjects/untitled2/venv/bin/python /home/xadocker/PycharmProjects/untitled2/test_buil.py
True
开始计时......
共耗时约4.193秒
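For readers who want to try moving data generation outside the timer, here is a minimal sketch of one way to do it, assuming you time each step separately. The names timed, make_actions and write_stub are hypothetical; write_stub only counts the actions so the example runs without a cluster, and you would swap in a call to helpers.bulk(es, actions) for a real measurement.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def make_actions(n, index="test-index-write"):
    # Stand-in for the Faker-based list comprehension used in this post
    return [{"_index": index, "_source": {"seq": i}} for i in range(n)]


def write_stub(actions):
    # Hypothetical stand-in for: helpers.bulk(es, actions)
    return len(actions)


# Generation and writing are timed independently of each other
actions, gen_seconds = timed(make_actions, 10000)
written, write_seconds = timed(write_stub, actions)

print("generated %d docs in %.3fs" % (len(actions), gen_seconds))
print("wrote %d docs in %.3fs" % (written, write_seconds))
```

Timing the two steps apart avoids having to subtract one measurement from another taken in a different run.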