Elasticsearch Queries: Performance Testing of Pagination over Large Data Sets

1. Test Environment

Python 3.7
Elasticsearch 6.8
elasticsearch-dsl 7

Install elasticsearch-dsl (pinned to the 7.x line used in this test)

pip install "elasticsearch-dsl>=7,<8"

Test connectivity to Elasticsearch

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search


client = Elasticsearch(hosts=['http://127.0.0.1:9200'])
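s = Search(using=client, index="my_store_index").query("match_phrase_prefix", name="us")
s = s.source(['id'])  # only return the id field of each hit
s = s.params(http_auth=["test", "test"])  # credentials for this request
response = s.execute()

for hit in response:
    # print the relevance score and the id (the only field kept in _source)
    print(hit.meta.score, hit.id)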

11.642133 945d0426-033e-4a8a-86db-b776c6c9a082
11.642133 3c1aead4-aa6f-4256-a126-f29f84c9ac89
11.642133 77782add-ab58-4eb6-85af-bcbe79be9623
11.642133 75a02b9a-be31-4a78-a3d9-9af72f98cbf9
11.642133 d5aacf16-61fc-4f0c-b05d-3d57c8ab6236
11.642133 30912e1d-4662-4f24-bd5b-5a997e44c290
11.642133 95c28501-66a6-4786-917b-0f1e38707648
11.642133 605f4e11-08c8-4d60-b803-7925cf325cea
11.642133 5dd93a29-e75c-44e3-9f26-bd90e588bc1d
11.642133 84e97af5-4e99-466f-bd82-10cd2b79aa18
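Under the hood, elasticsearch-dsl turns the Search object above into a JSON request body. As a minimal sketch, assuming the plain elasticsearch-py client (6.x/7.x style search(index=..., body=...)) created with the same credentials, e.g. http_auth=("test", "test"), the same request can be written by hand. connectivity_body and run_connectivity_check are hypothetical helper names, not part of either library.

```python
import json


def connectivity_body(prefix="us"):
    """Hypothetical helper: request body equivalent to
    Search().query("match_phrase_prefix", name=prefix).source(['id'])."""
    return {
        "query": {"match_phrase_prefix": {"name": prefix}},
        "_source": ["id"],  # only return the id field of each hit
    }


def run_connectivity_check(client):
    """Hypothetical helper: send the body with the low-level client,
    assuming the client was created with http_auth set."""
    response = client.search(index="my_store_index", body=connectivity_body())
    for hit in response["hits"]["hits"]:
        print(hit["_score"], hit["_source"].get("id"))


if __name__ == "__main__":
    # Show the JSON that would be sent to GET my_store_index/_search
    print(json.dumps(connectivity_body(), indent=2))
```

Printing the body this way is a quick check that a query means what you think before timing it against a cluster.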

2. Performance Test: Returning a Large Result Set at Once with from + size

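Using the code below, fetching 100000 records directly with from + size took 17279 ms, as reported by the took field (the time Elasticsearch spent executing the request, not including network transfer or client-side parsing). Because from + size may not exceed the index setting index.max_result_window, which defaults to 10000, this request only succeeds after that setting has been raised to at least 100000 on my_store_index.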

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

def from_size_query(client):
    s = Search(using=client, index="my_store_index")
    s = s.params(http_auth=["test", "test"], request_timeout=50)
    # match every document whose name does NOT match the phrase prefix "us"
    q = Q('bool', must_not=[Q('match_phrase_prefix', name='us')])
    s = s.query(q)

    s = s.source(['id'])  # only return the id field
    s = s[0:100000]       # from=0, size=100000
    response = s.execute()

    print(f'hit total {response.hits.total}')
    print(f'request time {response.took}ms')

client = Elasticsearch(hosts=['http://127.0.0.1:9200'])
from_size_query(client)

hit total 485070
request time 17279ms
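The cost of from + size grows with the page depth: each shard has to collect its own top from + size hits and send them to the coordinating node, which merges shards × (from + size) entries only to keep size of them. The sketch below does that arithmetic and the window check; the helper names are hypothetical, and the shard count of 5 is an assumption because the article does not say how many shards my_store_index has. The settings body it builds is the kind one could pass to client.indices.put_settings(index="my_store_index", body=...) to raise the window.

```python
# Elasticsearch's default value of the index.max_result_window setting
DEFAULT_MAX_RESULT_WINDOW = 10000


def check_from_size(from_, size, max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """Raise ValueError if from + size exceeds the result window,
    the condition under which Elasticsearch rejects the request."""
    if from_ + size > max_result_window:
        raise ValueError(
            f"from + size = {from_ + size} exceeds "
            f"index.max_result_window = {max_result_window}"
        )


def coordinator_sort_entries(from_, size, num_shards):
    """Each shard returns its own top from + size hits; the coordinating
    node merges all of them before keeping only `size`."""
    return num_shards * (from_ + size)


def max_result_window_settings(window):
    """Settings body for raising the window on an index."""
    return {"index": {"max_result_window": window}}


if __name__ == "__main__":
    try:
        check_from_size(0, 100000)  # the request from this test, default window
    except ValueError as err:
        print(err)
    check_from_size(0, 100000, max_result_window=100000)  # accepted once raised
    # num_shards=5 is an assumption, not taken from the article
    print(coordinator_sort_entries(0, 100000, num_shards=5))
    print(max_result_window_settings(100000))
```

With the assumed 5 shards, the 100000-record request above means merging 500000 entries; a single deep page such as from=10000, size=10 still means merging 50050 entries to return 10 hits.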

3. Performance Test: Paging Through a Large Result Set with search_after

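Using the code below, 100000 records are fetched in total over multiple search_after requests. The results show that once a page holds about 5000 records, the total time changes little (10589 ms at a page size of 5000, and between 9930 ms and 9978 ms at 10000 and above). Since larger pages put more load on CPU and memory, a page size of 3000 or 4000 seems a reasonable choice for this test data.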

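from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q


def search_after_query(client, result):
    s = Search(using=client, index="my_store_index")
    s = s.params(http_auth=["test", "test"], request_timeout=50)
    q = Q('bool', must_not=[Q('match_phrase_prefix', name='us')])
    s = s.query(q)
    if result['after_value']:
        s = s.extra(search_after=[result['after_value']])

    s = s.source(['id'])
    s = s[:result['size']]
    # search_after needs a deterministic sort; 'id' is assumed to be a
    # unique keyword field, otherwise a unique tiebreaker field is needed
    s = s.sort('id')
    response = s.execute()

    fetch = len(response.hits)
    result['total'] += response.took  # server-side execution time in ms
    result['times'] -= 1

    # keep paging while the last page was full and the request budget allows
    while fetch == result['size'] and result['times'] > 0:
        # the sort value of the last hit becomes the next search_after value
        sort_val = response.hits.hits[-1].sort[-1]
        s = s.extra(search_after=[sort_val])
        response = s.execute()

        fetch = len(response.hits)
        result['total'] += response.took
        result['times'] -= 1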

client = Elasticsearch(hosts=['http://127.0.0.1:9200'])

# (page size, number of requests): every pair fetches 100000 records in total
for size, times in [(1000, 100), (2000, 50), (4000, 25), (5000, 20),
                    (10000, 10), (20000, 5), (50000, 2)]:
    result = {"total": 0, "times": times, "size": size, "after_value": None}
    search_after_query(client, result)
    print(f'size {result["size"]}  request  {times} times total {result["total"]}ms ')

size 1000  request  100 times total 14111ms 
size 2000  request  50 times total 11987ms 
size 4000  request  25 times total 11167ms 
size 5000  request  20 times total 10589ms 
size 10000  request  10 times total 9930ms 
size 20000  request  5 times total 9978ms  
size 50000  request  2 times total 9946ms 
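Why search_after avoids the deep-paging cost can be illustrated without a cluster. The sketch below is a simplified in-memory model, not Elasticsearch's implementation: documents are kept in sort order, and each page starts strictly after the sort values of the last hit of the previous page, so no request needs a growing from offset. The data and the seq tiebreaker field are hypothetical. It also shows why the Elasticsearch documentation asks for a sort that is unique per document: when only a repeating field is used, documents that share the last hit's sort value are skipped at page boundaries. If id is unique, as UUIDs normally are, sorting on it alone is enough.

```python
def search_after_page(docs, sort_key, size, after=None):
    """Simplified model of one search_after request: the first `size` docs,
    in sort order, whose sort values are strictly greater than `after`."""
    ordered = sorted(docs, key=sort_key)
    if after is not None:
        ordered = [d for d in ordered if sort_key(d) > after]
    return ordered[:size]


def fetch_all(docs, sort_key, size):
    """Page through every doc, stopping at the first short page,
    like the while loop in search_after_query above."""
    results, after = [], None
    while True:
        page = search_after_page(docs, sort_key, size, after)
        results.extend(page)
        if len(page) < size:
            return results
        # the sort values of the last hit become the next search_after value
        after = sort_key(page[-1])


# Hypothetical data: 'id' repeats across documents, 'seq' is unique.
docs = [{"id": f"k{i // 3}", "seq": i} for i in range(10)]

by_id_only = fetch_all(docs, lambda d: (d["id"],), size=2)
with_tiebreaker = fetch_all(docs, lambda d: (d["id"], d["seq"]), size=2)
print(len(by_id_only), len(with_tiebreaker))  # 7 10
```

Sorting on id alone loses 3 of the 10 documents here, while adding the unique tiebreaker returns all of them exactly once.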

4. Performance Test: Paging Through a Large Result Set with scroll

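Using the code below, 100000 records are fetched in total over multiple scroll requests. The results show that the page size has little effect on scroll's total time, and that scroll was slower than search_after at every page size tested. This matches the official Elasticsearch documentation, which advises against using scroll for real-time user requests and suggests search_after instead.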

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q


def search_scroll_query(client, result):
    s = Search(using=client, index="my_store_index")
    # scroll='1m' keeps the search context alive for one minute between requests
    s = s.params(request_timeout=50, scroll='1m')
    q = Q('bool', must_not=[Q('match_phrase_prefix', name='us')])
    s = s.query(q)

    s = s.source(['id'])
    s = s[:result['size']]
    response = s.execute()

    fetch = len(response.hits)
    result['total'] += response.took  # server-side execution time in ms
    result['times'] -= 1
    scroll_id = response._scroll_id

    while fetch == result['size'] and result['times'] > 0:
        response = client.scroll(scroll_id=scroll_id, scroll='1m', request_timeout=50)
        scroll_id = response['_scroll_id']
        fetch = len(response['hits']['hits'])
        result['total'] += response['took']
        result['times'] -= 1

    # release the search context instead of waiting for the 1m timeout
    client.clear_scroll(scroll_id=scroll_id)

client = Elasticsearch(hosts=['http://127.0.0.1:9200'], http_auth=["test", "test"])

# (page size, number of requests): every pair fetches 100000 records in total
for size, times in [(1000, 100), (2000, 50), (4000, 25), (5000, 20),
                    (10000, 10), (20000, 5), (50000, 2)]:
    result = {"total": 0, "times": times, "size": size}
    search_scroll_query(client, result)
    print(f'size {result["size"]}  request  {times} times total {result["total"]}ms ')

size 1000  request  100 times total 16573ms 
size 2000  request  50 times total 17678ms 
size 4000  request  25 times total 16719ms 
size 5000  request  20 times total 16031ms 
size 10000  request  10 times total 16008ms 
size 20000  request  5 times total 16074ms 
size 50000  request  2 times total 14390ms 
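To put the search_after and scroll results side by side, the sketch below recomputes, from the totals reported above, the average server-side time (took) per request and the ratio of scroll's total to search_after's total at each page size. The numbers are copied from the two result listings; the constant and function names are illustrative.

```python
# (page size, requests) -> total `took` in ms, copied from the results above
SEARCH_AFTER_MS = {
    (1000, 100): 14111, (2000, 50): 11987, (4000, 25): 11167, (5000, 20): 10589,
    (10000, 10): 9930, (20000, 5): 9978, (50000, 2): 9946,
}
SCROLL_MS = {
    (1000, 100): 16573, (2000, 50): 17678, (4000, 25): 16719, (5000, 20): 16031,
    (10000, 10): 16008, (20000, 5): 16074, (50000, 2): 14390,
}


def per_request_ms(total_ms, requests):
    """Average server-side time of one request."""
    return total_ms / requests


def compare():
    """Rows of (page size, search_after ms/request, scroll ms/request,
    scroll total / search_after total)."""
    rows = []
    for (size, requests), sa_total in SEARCH_AFTER_MS.items():
        sc_total = SCROLL_MS[(size, requests)]
        rows.append((
            size,
            round(per_request_ms(sa_total, requests), 1),
            round(per_request_ms(sc_total, requests), 1),
            round(sc_total / sa_total, 2),  # > 1 means scroll was slower
        ))
    return rows


for size, sa, sc, ratio in compare():
    print(f"size {size:>6}: search_after {sa:>8} ms/req, "
          f"scroll {sc:>8} ms/req, scroll/search_after {ratio}")
```

On these figures scroll's total time was between about 1.17 and 1.61 times that of search_after, while the time of a single request grows with the page size for both methods.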

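Article title: Elasticsearch Queries: Performance Testing of Pagination over Large Data Sets
Article link: https://www.dianjilingqu.com/51078.html
This article was sourced from the web and its copyright belongs to the original author. If it infringes your rights, please contact us for removal at saisai#email.cn. Thank you for your understanding and support.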