Elasticsearch 7.x in Depth [10]: Aggregation

1. References

Geek Time (極客時間): "Elasticsearch Core Technology and Practice" by Ruan Yiming
Elasticsearch--Aggregation: a detailed summary (aggregation statistics)
Elasticsearch aggregations: Bucket Aggregations
Elasticsearch aggregations: Metrics Aggregations
Elasticsearch aggregations: Pipeline Aggregations
Official documentation: search-aggregations
Geo-distance filter
Elasticsearch: introduction to aggregation
ES aggregation explained
aggregation explained, part 1 (overview)
aggregation explained, part 2 (metrics aggregations)
aggregation explained, part 3 (bucket aggregation)
aggregation explained, part 4 (pipeline aggregations)
[Elasticsearch] Filtering Queries and Aggregations
Official documentation: search-aggregations-bucket
Official documentation: search-aggregations-metrics
Official documentation: search-aggregations-pipeline
Official documentation: search-aggregations-matrix
Using a bucket script aggregation inside filter aggregation
Question: with a nested query, how do you aggregate inside it and then filter?

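2. Getting started

Data preparation: <Elasticsearch 7.x in Depth: Data Preparation>

Aggregation categories

Aggregations provide aggregated data based on a search query. They fall into the following categories:

  • Bucket
    A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion. When the aggregation runs, every bucket criterion is evaluated against each document in the context; when a criterion matches, the document is considered to "fall into" that bucket. At the end of the aggregation we have a list of buckets, each with the set of documents that "belong" to it.
  • Metric
    Aggregations that track and compute metrics over a set of documents.
  • Matrix
    A family of aggregations that operate on multiple fields and produce a matrix result from the values extracted from the requested document fields. Unlike Bucket and Metric aggregations, this family does not yet support scripting.
  • Pipeline
    Aggregations that aggregate the output of other aggregations and their associated metrics.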

Aggregation syntax

"aggregations" : { // keyword
    "<aggregation_name>" : { // user-defined aggregation name
        "<aggregation_type>" : { // aggregation type
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?  // sub-aggregations
    }
    [,"<aggregation_name_2>" : { ... } ]*  // sibling aggregations
}
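
The grammar above maps directly onto nested dictionaries. Here is a minimal Python sketch that builds such a request body; the helper `agg` and the aggregation names `max_amount` and `avg_amount` are made up for illustration and are not part of any Elasticsearch client.

```python
import json


def agg(agg_type, body, sub_aggs=None, meta=None):
    """Build the value of one named aggregation.

    Hypothetical helper: returns {<type>: <body>} plus optional "meta"
    and "aggs" (sub-aggregations), mirroring the grammar above.
    """
    node = {agg_type: body}
    if meta is not None:
        node["meta"] = meta
    if sub_aggs:
        node["aggs"] = sub_aggs
    return node


request_body = {
    "size": 0,
    "aggs": {
        # <aggregation_name>: <aggregation_type> with one sub-aggregation
        "group_by_goodsName": agg(
            "terms",
            {"field": "goodsName.keyword", "size": 10},
            sub_aggs={"max_amount": agg("max", {"field": "amount"})},
        ),
        # a second aggregation at the same level
        "avg_amount": agg("avg", {"field": "amount"}),
    },
}

print(json.dumps(request_body, indent=2))
```

The printed JSON can be pasted as the body of a GET /<index>/_search request.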

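Let's go through them one by one.

Bucket

The Elasticsearch documentation describes many bucket types; I won't list them all here.

  • Terms
  • Range
  • Date Range
  • Histogram
  • Date Histogram
  • ...
Example 1: terms

As an example, let's see how many kinds of goods appear in the orders.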

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_goodsName": {
      "terms": {
        "field": "goodsName.keyword",
        "size": 10
      }
    }
  }
}

Here is the result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_goodsName" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "IPhone 8 Plus",
          "doc_count" : 2
        },
        {
          "key" : "IPhone 9 Plus",
          "doc_count" : 2
        },
        {
          "key" : "IPhone 10 Plus",
          "doc_count" : 1
        }
      ]
    }
  }
}
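  • Optimizing terms aggregation performance: set eager_global_ordinals to true in the mapping.
    Consider adding this setting when the field is aggregated frequently and new documents are being written continuously.
  • min_doc_count: sets the minimum document count for the aggregation; only terms with at least this many documents are returned.

Meaning of the fields in a terms aggregation response:

Field                          Meaning
doc_count_error_upper_bound    Upper bound on the document count of a term that was left out of the result
sum_other_doc_count            Total document count of all terms not in the returned buckets (overall total minus the returned total)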
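
To make sum_other_doc_count concrete, here is a toy single-shard model in Python. The function `terms_agg` is a made-up stand-in, not Elasticsearch code; it only mimics "keep the top `size` terms by count and report what was left out".

```python
from collections import Counter


def terms_agg(values, size):
    """Toy single-shard model of a terms aggregation (ordered by count, descending)."""
    counts = Counter(values)
    top = counts.most_common(size)
    returned = sum(count for _, count in top)
    return {
        "buckets": [{"key": key, "doc_count": count} for key, count in top],
        # documents whose term did not make it into the returned buckets
        "sum_other_doc_count": sum(counts.values()) - returned,
    }


goods = [
    "IPhone 8 Plus", "IPhone 8 Plus",
    "IPhone 9 Plus", "IPhone 9 Plus",
    "IPhone 10 Plus",
]
print(terms_agg(goods, size=10)["sum_other_doc_count"])  # 0
print(terms_agg(goods, size=2)["sum_other_doc_count"])   # 1
```

With size 10 every term fits, so nothing is left over, which matches the response above. With size 2 the single "IPhone 10 Plus" order ends up in sum_other_doc_count.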
Example 2: sub-aggregation

For each kind of goods, get the single order with the highest price.

# Group by goodsName.keyword first, then sort by amount descending and take the first hit
GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_goodsName": {
      "terms": {
        "field": "goodsName.keyword",
        "size": 10
      },
      "aggs": {
        "more_amount": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "amount": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

Here is the response:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_goodsName" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "IPhone 8 Plus",
          "doc_count" : 2,
          "more_amount" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "aggs_order",
                  "_type" : "_doc",
                  "_id" : "HAOY-SKXIS-LIWN",
                  "_score" : null,
                  "_source" : {
                    "platform" : "IOS",
                    "amount" : 1200,
                    "createTime" : "2020-04-15 10:00",
                    "originatorId" : 2,
                    "originatorName" : "李四",
                    "goodsId" : 1,
                    "goodsName" : "IPhone 8 Plus"
                  },
                  "sort" : [
                    1200
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "IPhone 9 Plus",
          "doc_count" : 2,
          "more_amount" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "aggs_order",
                  "_type" : "_doc",
                  "_id" : "USYX_SJJSUL_XUSYA",
                  "_score" : null,
                  "_source" : {
                    "platform" : "PC",
                    "amount" : 500,
                    "createTime" : "2020-01-20 10:00",
                    "originatorId" : 1,
                    "originatorName" : "張三",
                    "goodsId" : 2,
                    "goodsName" : "IPhone 9 Plus"
                  },
                  "sort" : [
                    500
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "IPhone 10 Plus",
          "doc_count" : 1,
          "more_amount" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "aggs_order",
                  "_type" : "_doc",
                  "_id" : "XXSA-KSUWL-USIA",
                  "_score" : null,
                  "_source" : {
                    "platform" : "PC",
                    "createTime" : "2020-01-20 10:00",
                    "originatorId" : 3,
                    "originatorName" : "王五",
                    "goodsId" : 3,
                    "goodsName" : "IPhone 10 Plus"
                  },
                  "sort" : [
                    -9223372036854775808
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
Example 3: range

Group orders by price range. (This example shows that each range is half-open: from is inclusive and to is exclusive, e.g. [300, 700).)

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "amount_range": {
      "range": {
        "field": "amount",
        "ranges": [
          {
            "to": 300
          },
          {
            "from": 300,
            "to": 700
          },
          {
            "key": "gt 700",
            "from": 700
          }
        ]
      }
    }
  }
}

Here is the result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "amount_range" : {
      "buckets" : [
        {
          "key" : "*-300.0",
          "to" : 300.0,
          "doc_count" : 1
        },
        {
          "key" : "300.0-700.0",
          "from" : 300.0,
          "to" : 700.0,
          "doc_count" : 2
        },
        {
          "key" : "gt 700",
          "from" : 700.0,
          "doc_count" : 1
        }
      ]
    }
  }
}
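
The half-open behaviour can be checked with a few lines of Python. This is only a sketch of the bucketing rule, not of how Elasticsearch implements it; the amounts 100, 300, 500 and 1200 are read off the responses shown in this article, and `range_buckets` is a made-up name.

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    counts = []
    for r in ranges:
        lower = r.get("from", float("-inf"))
        upper = r.get("to", float("inf"))
        counts.append(sum(1 for v in values if lower <= v < upper))
    return counts


amounts = [100, 300, 500, 1200]  # the four orders that have an amount
ranges = [{"to": 300}, {"from": 300, "to": 700}, {"from": 700}]
print(range_buckets(amounts, ranges))  # [1, 2, 1]
```

The order with amount 300 lands in the second bucket rather than the first, which is why the counts are 1, 2, 1.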
Example 4: script

First compute how many years ago each order was created (relative to params.now), then group by that value.

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_year": {
      "range": {
        "script": {
          "source": """
              JodaCompatibleZonedDateTime dateTime = doc['createTime'].value;
              return params.now - dateTime.getYear();
          """,
          "params": {
            "now": 2020
          }
        },
        "ranges": [
          {
            "to": 1
          },
          {
            "from": 1,
            "to": 3
          },
          {
            "from": 3,
            "to": 5
          },
          {
            "from": 5
          }
        ]
      }
    }
  }
}
Example 5: geo_distance

Draw a circle around a given origin and find the documents whose geo coordinates fall inside it.

GET /aggs_hotel/_search
{
  "size": 0, 
  "aggs": {
    "rings_around_amsterdam": {
      "geo_distance": {
        "field": "location",
        "origin": {
          "lon": 109.0000000,
          "lat": 34.0000000
        },
        "ranges": [
          { "to" : 100000 },
          { "from" : 100000, "to" : 300000 },
          { "from" : 300000 }
        ]
      }
    }
  }
}

Let's look at the result:

{
  "took" : 82,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "rings_around_amsterdam" : {
      "buckets" : [
        {
          "key" : "*-100000.0",
          "from" : 0.0,
          "to" : 100000.0,
          "doc_count" : 6
        },
        {
          "key" : "100000.0-300000.0",
          "from" : 100000.0,
          "to" : 300000.0,
          "doc_count" : 0
        },
        {
          "key" : "300000.0-*",
          "from" : 300000.0,
          "doc_count" : 2
        }
      ]
    }
  }
}
  • We can use unit to specify the distance unit; the default is m (meters). The official documentation says:
    By default, the distance unit is m (meters) but it can also accept: mi (miles), in (inches), yd (yards), km (kilometers), cm (centimeters), mm (millimeters).
  • We can set keyed to true so that buckets is returned as a hash keyed by bucket key instead of an array.
Example 6: filter, nested

Let's look up the price information for the hotel "澤蘭雅家酒店" at membership level 001 for a stay of [2020-05-01, 2020-05-03).

# Approach 1: filter directly in the query
GET /aggs_hotel_price/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "name.keyword": "澤蘭雅家酒店"
        }
      }
    }
  },
  "aggs": {
     "prices": {
        "nested": {
          "path": "prices"
        },
        "aggs": {
          "group_by_level": {
            "terms": {
              "field": "prices.level",
              "size": 1,
              "include": "001"
            },
            "aggs": {
              "date_range": {
                "date_range": {
                  "field": "prices.selldate",
                  "ranges": [
                    {
                      "from": "2020-05-01",
                      "to": "2020-05-03"
                    }
                  ]
                },
                "aggs": {
                  "stats": {
                    "stats": {
                      "field": "prices.price"
                    }
                  }
                }
              }
            }
          }
        }
    }
  }
}

# Approach 2: not really recommended, because the hotel name is filtered inside an extra aggregation level instead of in the query
GET /aggs_hotel_price/_search
{
  "size": 0,
  "aggs": {
    "group_by_name": {
      "filter": {
        "term": {
          "name.keyword": "澤蘭雅家酒店"
        }
      },
      "aggs": {
        "prices": {
          "nested": {
            "path": "prices"
          },
          "aggs": {
            "group_by_level": {
              "terms": {
                "field": "prices.level",
                "size": 1,
                "include": "001"
              },
              "aggs": {
                "date_range": {
                  "date_range": {
                    "field": "prices.selldate",
                    "ranges": [
                      {
                        "from": "2020-05-01",
                        "to": "2020-05-03"
                      }
                    ]
                  },
                  "aggs": {
                    "stats": {
                      "stats": {
                        "field": "prices.price"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

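Here is the result of the second approach. (Note that the two approaches return differently shaped responses, because the second one has an extra aggregation level.)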

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_name" : {
      "doc_count" : 1,
      "prices" : {
        "doc_count" : 6,
        "group_by_level" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "001",
              "doc_count" : 2,
              "date_range" : {
                "buckets" : [
                  {
                    "key" : "2020-05-01-2020-05-03",
                    "from" : 1.5882912E12,
                    "from_as_string" : "2020-05-01",
                    "to" : 1.588464E12,
                    "to_as_string" : "2020-05-03",
                    "doc_count" : 2,
                    "stats" : {
                      "count" : 2,
                      "min" : 9.0,
                      "max" : 15.0,
                      "avg" : 12.0,
                      "sum" : 24.0
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
}

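Metric

Metric aggregations come in two kinds: single-value and multi-value. Let's look at each in turn.

Single-value (returns a single result)

The Elasticsearch documentation describes many types; I won't list them all here.

  • min: minimum
  • max: maximum
  • avg: average
  • sum: sum
  • cardinality: number of distinct values

Here is an example.

  • Find the smallest payment amount among the orders: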
GET /aggs_order/_search
{
  "size": 0,  // there is no query here and I don't care about the hits, so size is 0
  "aggs": { // keyword, fixed
    "min_aggs": { // user-defined name of the aggregation
      "min": { // aggregation type; must be a supported type such as those listed above
        "field": "amount" // field to aggregate on
      }
    }
  }
}

The response is:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "min_aggs" : {
      "value" : 100.0 // the minimum value is returned here
    }
  }
}

We can also run several aggregations at once. Here we query the maximum, minimum, and average payment amount of the orders.

GET /aggs_order/_search
{
  "size": 0, 
  "aggs": {
    "min_aggs": {
      "min": {
        "field": "amount"
      }
    },
    "max_aggs": {
      "max": {
        "field": "amount"
      }
    },
    "avg_aggs": {
      "avg": {
        "field": "amount"
      }
    }
  }
}

Let's look at the result:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg_aggs" : { // average
      "value" : 525.0
    },
    "min_aggs" : { // minimum
      "value" : 100.0
    },
    "max_aggs" : { // maximum
      "value" : 1200.0
    }
  }
}

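Multi-value (returns multiple results)

The Elasticsearch documentation describes many types; I won't list them all here.

  • stats
  • extended stats
  • percentile
  • percentile rank
  • top hits
  • ...

As an example, let's get summary statistics for amount across the orders, such as the maximum and minimum.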

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "stats_aggs": {
      "stats": { // use stats, one of the multi-value types
        "field": "amount"
      }
    }
  }
}

Here is the response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "stats_aggs" : {
      "count" : 4, // number of documents that have an amount field
      "min" : 100.0, // minimum
      "max" : 1200.0, // maximum
      "avg" : 525.0, // average
      "sum" : 2100.0 // sum
    }
  }
}
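
The five numbers are easy to reproduce by hand. Below is a small Python sketch, assuming the four amounts 100, 500, 1200 and 300 seen in the responses of this article; the `None` entry stands for an order without an amount field and is my own addition for illustration.

```python
def stats(values):
    """Compute the five numbers a stats aggregation reports, skipping missing values."""
    present = [float(v) for v in values if v is not None]
    total = sum(present)
    return {
        "count": len(present),
        "min": min(present),
        "max": max(present),
        "avg": total / len(present),
        "sum": total,
    }


# None models a document that has no amount field (assumption for illustration)
amounts = [100, 500, None, 1200, 300]
print(stats(amounts))
```

Note that count only covers documents that actually carry the field, which is why it can be smaller than the number of hits. The sketch does not handle the case where no document has the field.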

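Pipeline

Pipeline aggregations aggregate the results of other aggregations.

  • Depending on its type, a pipeline aggregation writes its result to a different place. (The explanation below is taken from 冰鐘, with thanks.)
    1. Sibling: takes the output of a sibling (same-level) aggregation as input and computes a new aggregation from it. The new result appears at the same level as the sibling aggregation.
      max_bucket, min_bucket, avg_bucket, sum_bucket
      stats_bucket, extended_stats_bucket
      percentiles_bucket
      ...
    2. Parent: takes the output of its parent aggregation as input and computes new buckets, or new aggregation results that are added to the existing buckets.
      derivative
      cumulative_sum (running total)
      moving_fn (sliding window)
      ...

A pipeline aggregation must specify buckets_path. Let's look at the syntax of this path.

buckets_path syntax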

# Aggregation separator ">": expresses a parent/child aggregation relationship, e.g. "my_bucket>my_stats"
AGG_SEPARATOR       =  `>` ;

# Metric separator ".": selects a metric value, e.g. "my_stats.avg"
# From my own experiments: use > between two bucket aggregations; either > or . works between a bucket aggregation and a metric aggregation; use . between a metric aggregation and its metric
METRIC_SEPARATOR    =  `.` ;

# Aggregation name: <name of the aggregation>
AGG_NAME            =  <the name of the aggregation> ;

# For a multi-value metric aggregation, the name of the metric
METRIC              =  <the name of the metric (in case of multi-value metrics aggregation)> ;

# For a multi-bucket aggregation, selects the bucket with the given key
# e.g. sale_type['hat']>sales
MULTIBUCKET_KEY     =  `[<KEY_NAME>]`

# The full path grammar is:
PATH                =  <AGG_NAME><MULTIBUCKET_KEY>? (<AGG_SEPARATOR>, <AGG_NAME> )* ( <METRIC_SEPARATOR>, <METRIC> ) ;
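
To get a feel for the grammar, here is a simplified Python sketch that splits a buckets_path into its aggregation steps and the optional metric. `parse_buckets_path` is a made-up helper; it assumes names and keys contain no `>`, `.` or brackets, and it ignores special paths such as `_count`.

```python
import re


def parse_buckets_path(path):
    """Split a buckets_path into (steps, metric).

    Each step is (aggregation name, bucket key or None).
    Simplified sketch: not the parser Elasticsearch uses.
    """
    *parents, last = path.split(">")
    # only the last segment may carry ".<metric>"
    last_agg, _, metric = last.partition(".")
    steps = []
    for segment in parents + [last_agg]:
        match = re.fullmatch(r"([^\[\]]+)(?:\['([^']*)'\])?", segment)
        if match is None:
            raise ValueError(f"cannot parse segment: {segment!r}")
        steps.append((match.group(1), match.group(2)))
    return steps, metric or None


print(parse_buckets_path("my_bucket>my_stats.avg"))
print(parse_buckets_path("sale_type['hat']>sales"))
```

The first call yields two steps and the metric "avg"; the second yields the step ("sale_type", "hat") followed by ("sales", None) with no metric.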
Example 1: min_bucket

Compute each person's average order amount, then pick the smallest of those averages.

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_originatorId": {
      "terms": {
        "field": "originatorName"
      },
      "aggs": {
        "avg_amount": {
          "avg": {
            "field": "amount",
            "missing": 0
          }
        }
      }
    },
    "min_avg_amount": { // user-defined name of the pipeline aggregation
      "min_bucket": { // keyword: the pipeline aggregation type
        "buckets_path": "group_by_originatorId>avg_amount" // path to the aggregation to read
      }
    }
  }
}

Here is the result. Because min_bucket is a sibling pipeline aggregation, its result sits at the same level as the sibling aggregation.

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_originatorId" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "張三",
          "doc_count" : 2,
          "avg_amount" : {
            "value" : 300.0
          }
        },
        {
          "key" : "王五",
          "doc_count" : 2,
          "avg_amount" : {
            "value" : 150.0
          }
        },
        {
          "key" : "李四",
          "doc_count" : 1,
          "avg_amount" : {
            "value" : 1200.0
          }
        }
      ]
    },
    "min_avg_amount" : {
      "value" : 150.0,
      "keys" : [
        "王五"
      ]
    }
  }
}
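
What min_bucket does with the sibling's buckets can be sketched in a few lines of Python. The function name `min_bucket` below is just borrowed from the aggregation for readability; the input is the buckets list from the response above.

```python
def min_bucket(buckets, metric_name):
    """Return the smallest metric value across buckets and the keys that hold it."""
    values = {b["key"]: b[metric_name]["value"] for b in buckets}
    lowest = min(values.values())
    return {"value": lowest, "keys": [k for k, v in values.items() if v == lowest]}


buckets = [
    {"key": "張三", "doc_count": 2, "avg_amount": {"value": 300.0}},
    {"key": "王五", "doc_count": 2, "avg_amount": {"value": 150.0}},
    {"key": "李四", "doc_count": 1, "avg_amount": {"value": 1200.0}},
]
print(min_bucket(buckets, "avg_amount"))
```

The average for 王五 is 150 rather than 300 because the request sets "missing": 0, so the order without an amount counts as 0.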

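Aggregation scope

By default, an aggregation runs over the result set of the query.
The following options change that scope:

post filter

Filters the hits after the aggregations have been computed, so the aggregations themselves are not affected.

# Bucket by name and compute order amount stats for each person (shown under aggregations in the response), then keep only 張三's orders (shown under hits in the response)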
GET /aggs_order/_search
{
  "aggs": {
    "group_by_originatorName": {
      "terms": {
        "field": "originatorName"
      },
      "aggs": {
        "stats": {
          "stats": {
            "field": "amount"
          }
        }
      }
    }
  },
  "post_filter": {
    "term": {
      "originatorName": "張三"
    }
  }
}

Query result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "aggs_order",
        "_type" : "_doc",
        "_id" : "HASA-XSIAN-SIWU",
        "_score" : 1.0,
        "_source" : {
          "platform" : "Android",
          "amount" : 100,
          "createTime" : "2019-05-20 10:00",
          "originatorId" : 1,
          "originatorName" : "張三",
          "goodsId" : 1,
          "goodsName" : "IPhone 8 Plus"
        }
      },
      {
        "_index" : "aggs_order",
        "_type" : "_doc",
        "_id" : "USYX_SJJSUL_XUSYA",
        "_score" : 1.0,
        "_source" : {
          "platform" : "PC",
          "amount" : 500,
          "createTime" : "2020-01-20 10:00",
          "originatorId" : 1,
          "originatorName" : "張三",
          "goodsId" : 2,
          "goodsName" : "IPhone 9 Plus"
        }
      }
    ]
  },
  "aggregations" : {
    "group_by_originatorName" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "張三",
          "doc_count" : 2,
          "stats" : {
            "count" : 2,
            "min" : 100.0,
            "max" : 500.0,
            "avg" : 300.0,
            "sum" : 600.0
          }
        },
        {
          "key" : "王五",
          "doc_count" : 2,
          "stats" : {
            "count" : 1,
            "min" : 300.0,
            "max" : 300.0,
            "avg" : 300.0,
            "sum" : 300.0
          }
        },
        {
          "key" : "李四",
          "doc_count" : 1,
          "stats" : {
            "count" : 1,
            "min" : 1200.0,
            "max" : 1200.0,
            "avg" : 1200.0,
            "sum" : 1200.0
          }
        }
      ]
    }
  }
}

global

Inside this aggregation, the restrictions of the query are ignored.

GET /aggs_order/_search
{
  "size": 0, 
  "query": {
    "range": {
      "amount": {
        "gt": 100
      }
    }
  }, 
  "aggs": {
    "group_by_originatorName": {
      "terms": {
        "field": "originatorName"
      },
      "aggs": {
        "stats": {
          "stats": {
            "field": "amount"
          }
        }
      }
    },
    "all": {
      "global": {},
      "aggs": {
        "group_by_originatorName": {
          "terms": {
            "field": "originatorName"
          },
          "aggs": {
            "stats": {
              "stats": {
                "field": "amount"
              }
            }
          }
        }
      }
    }
  }
}

Let's compare the results:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "all" : {
      "doc_count" : 5,
      "group_by_originatorName" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "張三",
            "doc_count" : 2,
            "stats" : {
              "count" : 2,
              "min" : 100.0,
              "max" : 500.0,
              "avg" : 300.0,
              "sum" : 600.0
            }
          },
          {
            "key" : "王五",
            "doc_count" : 2,
            "stats" : {
              "count" : 1,
              "min" : 300.0,
              "max" : 300.0,
              "avg" : 300.0,
              "sum" : 300.0
            }
          },
          {
            "key" : "李四",
            "doc_count" : 1,
            "stats" : {
              "count" : 1,
              "min" : 1200.0,
              "max" : 1200.0,
              "avg" : 1200.0,
              "sum" : 1200.0
            }
          }
        ]
      }
    },
    "group_by_originatorName" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "張三",
          "doc_count" : 1,
          "stats" : {
            "count" : 1,
            "min" : 500.0,
            "max" : 500.0,
            "avg" : 500.0,
            "sum" : 500.0
          }
        },
        {
          "key" : "李四",
          "doc_count" : 1,
          "stats" : {
            "count" : 1,
            "min" : 1200.0,
            "max" : 1200.0,
            "avg" : 1200.0,
            "sum" : 1200.0
          }
        },
        {
          "key" : "王五",
          "doc_count" : 1,
          "stats" : {
            "count" : 1,
            "min" : 300.0,
            "max" : 300.0,
            "avg" : 300.0,
            "sum" : 300.0
          }
        }
      ]
    }
  }
}

Sorting

Sorting by built-in keys

  • _count
  • _key

Sort by the document count of each bucket and by the bucket key:

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_originatorName": {
      "terms": {
        "field": "originatorName",
        "order": [
          {"_count": "desc"},
          {"_key": "desc"}
        ]
      }
    }
  }
}

Sorting by a single-value sub-aggregation

Use a single-value aggregation such as min, max, or avg as the sort criterion.

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_originatorName": {
      "terms": {
        "field": "originatorName",
        "order": {
          "avg_amount": "desc"
        }
      },
      "aggs": {
        "avg_amount": {
          "avg": {
            "field": "amount"
          }
        }
      }
    }
  }
}

Sorting by a multi-value sub-aggregation

Use one metric of a multi-value aggregation such as stats as the sort criterion.

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_originatorName": {
      "terms": {
        "field": "originatorName",
        "order": {
          "stats_amount.sum": "desc"
        }
      },
      "aggs": {
        "stats_amount": {
          "stats": {
            "field": "amount"
          }
        }
      }
    }
  }
}

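Exercise

With a nested query, how do you aggregate inside it and then filter?

Business scenario: there are 1,000,000 users and 500,000 red-envelope records, and one user can have several red-envelope records. First, 1,000,000 user documents are indexed; each user document has a nested field that stores that user's current list of red envelopes.
Each red-envelope record has an amount and an expiry date.
Requirement: count how many users have, among their red envelopes that are still valid, a cumulative amount that meets a given threshold.

I came across this question while researching, tried it myself, and here is my answer.

  • Approach 1
    This approach needs a size in the terms aggregation. With multiple shards the results can be inaccurate, and a large size uses more memory, so use it with care.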
GET /aggs_user_envelope/_search
{
  "size": 0,
  "aggs": {
    "aggs_nested": {
      "nested": {
        "path": "envelope"
      },
      "aggs": {
        "filter_date": {
          "filters": {
            "filters": {
              "range": {
                "range": {
                  "envelope.until": {
                    "gte": "2020-05-30 00:00"
                  }
                }
              }
            }
          },
          "aggs": {
            "group_by_username": {
              "terms": {
                "field": "envelope.userId",
                "size": 10
              },
              "aggs": {
                "sum_of_money": {
                  "sum": {
                    "field": "envelope.money"
                  }
                },
                "filter_money": {
                  "bucket_selector": {
                    "buckets_path": {
                      "money": "sum_of_money"
                    },
                    "script": "params.money >= 50"
                  }
                },
                "sort": {
                  "bucket_sort": {
                    "sort": [
                      {"sum_of_money": {"order": "desc"}},
                      {"_count": {"order": "desc"}},
                      {"_key": {"order": "desc"}}
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
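  • Approach 2
    The official documentation recommends the composite aggregation for paging through buckets, similar to scroll paging. composite has its own restriction, though: its sources can only be Terms, Histogram, or Date histogram.

First page: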

GET /aggs_user_envelope/_search
{
  "size": 0,
  "aggs": {
    "nested_wrapper": {
      "nested": {
        "path": "envelope"
      },
      "aggs": {
        "group_by_userName": {
          "composite": {
            "size": 2, 
            "sources": [
              {
                "userName": {
                  "terms": {
                    "field": "envelope.userId",
                    "missing_bucket": true
                  }
                }
              }
            ]
          },
          "aggs": {
            "filter_date": {
              "filter": {
                "range": {
                  "envelope.until": {
                    "gte": "2020-05-30 00:00"
                  }
                }
              },
              "aggs": {
                "sum_of_money": {
                  "sum": {
                    "field": "envelope.money"
                  }
                }
              }
            },
            "filter_money": {
              "bucket_selector": {
                "buckets_path": {
                  "money": "filter_date>sum_of_money"
                },
                "script": "params.money >= 50"
              }
            },
            "sort": {
              "bucket_sort": {
                "sort": [
                  {"filter_date>sum_of_money": {"order": "desc"}},
                  {"_count": {"order": "desc"}},
                  {"_key": {"order": "desc"}}
                ]
              }
            }
          }
        }
      }
    }
  }
}

Here is the result of the first page:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "nested_wrapper" : {
      "doc_count" : 9,
      "group_by_userName" : {
        "after_key" : {
          "userName" : "10086" // pass this as after to fetch the next page
        },
        "buckets" : [
          {
            "key" : {
              "userName" : "10086"
            },
            "doc_count" : 3,
            "filter_date" : {
              "doc_count" : 3,
              "sum_of_money" : {
                "value" : 50.0
              }
            }
          }
        ]
      }
    }
  }
}

The second page must specify after:

GET /aggs_user_envelope/_search
{
  "size": 0,
  "aggs": {
    "nested_wrapper": {
      "nested": {
        "path": "envelope"
      },
      "aggs": {
        "group_by_userName": {
          "composite": {
            "size": 2, 
            "sources": [
              {
                "userName": {
                  "terms": {
                    "field": "envelope.userId",
                    "missing_bucket": true
                  }
                }
              }
            ],
            "after": {"userName" : "10086"} // after is set to the previous page's after_key
          },
          "aggs": {
            "filter_date": {
              "filter": {
                "range": {
                  "envelope.until": {
                    "gte": "2020-05-30 00:00"
                  }
                }
              },
              "aggs": {
                "sum_of_money": {
                  "sum": {
                    "field": "envelope.money"
                  }
                }
              }
            },
            "filter_money": {
              "bucket_selector": {
                "buckets_path": {
                  "money": "filter_date>sum_of_money"
                },
                "script": "params.money >= 50"
              }
            },
            "sort": {
              "bucket_sort": {
                "sort": [
                  {"filter_date>sum_of_money": {"order": "desc"}},
                  {"_count": {"order": "desc"}},
                  {"_key": {"order": "desc"}}
                ]
              }
            }
          }
        }
      }
    }
  }
}
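
The paging loop itself can be sketched in Python. Everything here is an illustration under assumptions: `search` is a hypothetical callable that runs the composite request and returns the composite part of the response, `fake_search` is an in-memory stand-in with invented user ids and sums, and I assume (as the first-page response above suggests) that after_key is still reported when bucket_selector drops buckets from a page. That is why the loop stops on a missing after_key rather than on an empty page.

```python
def iterate_composite(search, page_size):
    """Yield all composite buckets by following after_key until it disappears."""
    after = None
    while True:
        page = search(after, page_size)
        yield from page.get("buckets", [])
        after = page.get("after_key")
        if after is None:
            break


def fake_search(after, size):
    """In-memory stand-in for Elasticsearch (invented data, illustration only)."""
    sums = {"10010": 20.0, "10086": 50.0, "10087": 80.0, "10088": 10.0, "10090": 60.0}
    user_ids = sorted(sums)
    start = 0 if after is None else user_ids.index(after["userName"]) + 1
    chunk = user_ids[start:start + size]
    # mimic bucket_selector: drop buckets whose sum is below 50
    buckets = [
        {"key": {"userName": uid}, "sum_of_money": {"value": sums[uid]}}
        for uid in chunk
        if sums[uid] >= 50
    ]
    page = {"buckets": buckets}
    if chunk:
        page["after_key"] = {"userName": chunk[-1]}
    return page


matching = [b["key"]["userName"] for b in iterate_composite(fake_search, page_size=2)]
print(matching, len(matching))
```

Counting the yielded buckets gives the number of users that meet the threshold, which is what the exercise asks for.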

3. Done
