Elasticsearch Powerful Aggregation Queries (Part 1)
Aggregations vs. search
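Put simply: search finds specific documents, while aggregation computes statistics over the documents that a search matched. For example, suppose the data in your Elasticsearch cluster describes needles:
- What is the average needle length?
- Grouped by manufacturer, what is the median needle length?
- How many needles are added in a given region each month?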
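Questions like these are aggregations. Aggregations can also answer more nuanced questions:
- Which needle manufacturer is the most popular?
- Are there any anomalous needles in the data?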
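Aggregations can compute many of the statistics we need. In a relational database, statistics like these can take a long time to compute. In Elasticsearch, aggregation is a different feature from search, but it is built on the same data structures. As a result, aggregation results come back almost as fast as query results, and they are near real-time. This is one of the reasons Elasticsearch is so widely used.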
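High-level concepts
- Buckets: collections of documents that meet a certain criterion.
- Metrics: statistics computed over the documents in a bucket (for example minimum, sum, or maximum).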
Example: aggregations over car sales data (index = cars, type = transactions)
- Step 1: index the sample data
POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
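The `_bulk` body is newline-delimited JSON. Each action line (`{ "index": {}}`, which lets Elasticsearch generate the document ID) is followed by the document source, and the body must end with a newline. Suppose you want to generate this body from code instead of typing it. The plain-Java class below is a minimal sketch that rebuilds the same eight documents. The names `BulkBodySketch` and `Car` are hypothetical. The JSON is assembled by hand only because these values need no escaping; real code would use a JSON library.

```java
import java.util.List;

// Hypothetical helper: builds the NDJSON body for POST /cars/transactions/_bulk
class BulkBodySketch {
    // One sample transaction; field names mirror the documents above
    static final class Car {
        final int price; final String color; final String make; final String sold;
        Car(int price, String color, String make, String sold) {
            this.price = price; this.color = color; this.make = make; this.sold = sold;
        }
    }

    // The same eight documents as the bulk request above, in the same order
    static final List<Car> CARS = List.of(
        new Car(10000, "red", "honda", "2014-10-28"),
        new Car(20000, "red", "honda", "2014-11-05"),
        new Car(30000, "green", "ford", "2014-05-18"),
        new Car(15000, "blue", "toyota", "2014-07-02"),
        new Car(12000, "green", "toyota", "2014-08-19"),
        new Car(20000, "red", "honda", "2014-11-05"),
        new Car(80000, "red", "bmw", "2014-01-01"),
        new Car(25000, "blue", "ford", "2014-02-12"));

    static String bulkBody(List<Car> cars) {
        StringBuilder sb = new StringBuilder();
        for (Car c : cars) {
            // Action line: index into the index/type from the URL, auto-generated ID
            sb.append("{ \"index\": {}}\n");
            // Source line; "\n" (not %n) so the separator is always a bare newline
            sb.append(String.format(
                "{ \"price\" : %d, \"color\" : \"%s\", \"make\" : \"%s\", \"sold\" : \"%s\" }\n",
                c.price, c.color, c.make, c.sold));
        }
        return sb.toString(); // ends with "\n", as the bulk API requires
    }
}
```

Calling `BulkBodySketch.bulkBody(BulkBodySketch.CARS)` yields 16 lines (an action line plus a source line per document), which could be sent as the body of the request above.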
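Note: if color and make are mapped as text fields, the terms aggregations below fail with the following error. The official documentation explains how to enable fielddata: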
Fielddata is disabled on text fields by default. Set fielddata=true on [your_field_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.
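How to set it: with the dynamic mapping of Elasticsearch 5.x and later, string fields such as color and make are mapped as text with a keyword sub-field. You can either aggregate on the sub-field (color.keyword, make.keyword) or update the mapping of those fields to set "fielddata": true. As the message warns, the second option can use significant memory.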
In practice: which car color sells best?
Query via the REST API
GET /cars/transactions/_search
{
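"size" : 0,// we don't need the documents themselves, so set size to 0; this makes the query faster
"aggs" : { // short for "aggregations"; either the abbreviation or the full name works
"popular_colors" : { // a name for this aggregation, chosen by you (like naming a Java method); separating words with '_' is recommended
"terms" : { // the bucket type is terms
"field" : "color" // bucket by the color field, similar to GROUP BY color in SQL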
}
}
}
}
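Query via the Java API
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;

// transportClient: an already-initialized TransportClient field
public void aggsTermsQuery(){
    SearchResponse response = transportClient.prepareSearch("cars")
            .setTypes("transactions")
            .addAggregation(
                    AggregationBuilders.terms("popular_colors") // aggregation name
                            .field("color"))                     // bucket by color
            .setSize(0) // return no hits, only aggregation results
            .get();
    // Read the result as Terms (not the generic Aggregation) to access the buckets
    Terms popularColors = response.getAggregations().get("popular_colors");
    for (Terms.Bucket bucket : popularColors.getBuckets()) {
        System.out.println(bucket.getKeyAsString() + ": " + bucket.getDocCount());
    }
}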
The response
{
...
"hits": {
"hits": [] //because size is 0, no documents are returned here
},
"aggregations": {
"popular_colors": {
"buckets": [
{
"key": "red",
"doc_count": 4 //number of cars in the red bucket
},
{
"key": "blue",
"doc_count": 2
},
{
"key": "green",
"doc_count": 2
}
]
}
}
}
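To see where those counts come from, here is a plain-Java sketch that mimics what the `terms` aggregation does on this small data set. No Elasticsearch is involved, and the class name `TermsAggSketch` is hypothetical. The sketch puts each document into a bucket keyed by `color`, counts the documents per bucket, and orders buckets by `doc_count` descending, breaking ties by key ascending. That ordering reproduces the red, blue, green order in the response above. It ignores real-world details such as per-shard counting and the default limit of 10 buckets (`size`).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a terms aggregation on "color"
class TermsAggSketch {
    // The color of each of the eight sample documents, in indexing order
    static final List<String> COLORS = List.of(
        "red", "red", "green", "blue", "green", "red", "red", "blue");

    // Returns bucket key -> doc_count, in doc_count-descending order
    static LinkedHashMap<String, Long> termsBuckets(List<String> fieldValues) {
        Map<String, Long> counts = new TreeMap<>(); // TreeMap keeps keys ascending
        for (String v : fieldValues) {
            counts.merge(v, 1L, Long::sum); // one more document in this bucket
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so ties keep the key-ascending order from the TreeMap
        entries.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        LinkedHashMap<String, Long> buckets = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : entries) {
            buckets.put(e.getKey(), e.getValue());
        }
        return buckets;
    }
}
```

`TermsAggSketch.termsBuckets(TermsAggSketch.COLORS)` gives red=4, blue=2, green=2, matching the buckets in the response.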
In practice: add a metric to the aggregation above, the average price (avg)
- REST request
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"colors": {
"terms": {
"field": "color"
},
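"aggs": { //add a nested aggs level for the metric
"avg_price": { //the metric's name; the response uses the same name to hold the value
"avg": {//metric type: average
"field": "price" //compute the average of the 'price' field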
}
}
}
}
}
}
- Java API request
@Test
public void setMetricsQuery(){
    SearchResponse response = transportClient.prepareSearch("cars")
            .setTypes("transactions")
            .addAggregation(
                    AggregationBuilders.terms("colors")
                            .field("color")
                            // add the metric as a sub-aggregation of each color bucket
                            .subAggregation(AggregationBuilders
                                    .avg("avg_price")
                                    .field("price")
                            )
            )
            .setSize(0)
            .get();
    Aggregation colors = response.getAggregations().get("colors");
}
- Response
{
...
"aggregations": {
"colors": {
"buckets": [
{
"key": "red",
"doc_count": 4,//always included in every bucket, no setup needed
"avg_price": { //the name we gave our metric above
"value": 32500
}
},
{
"key": "blue",
"doc_count": 2,
"avg_price": {
"value": 20000
}
},
{
"key": "green",
"doc_count": 2,
"avg_price": {
"value": 21000
}
}
]
}
}
...
}
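The `avg_price` values are simply the mean `price` of the documents in each color bucket. For red, (10000 + 20000 + 20000 + 80000) / 4 = 32500. The plain-Java sketch below reproduces the bucket-then-metric computation with `Collectors.groupingBy` and `Collectors.averagingInt`. The class name `AvgMetricSketch` is hypothetical, and Elasticsearch itself is not involved.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory model of terms(color) + avg(price)
class AvgMetricSketch {
    static final class Car {
        final String color; final int price;
        Car(String color, int price) { this.color = color; this.price = price; }
    }

    // color and price of the eight sample documents
    static final List<Car> CARS = List.of(
        new Car("red", 10000), new Car("red", 20000), new Car("green", 30000),
        new Car("blue", 15000), new Car("green", 12000), new Car("red", 20000),
        new Car("red", 80000), new Car("blue", 25000));

    // Bucket by color (the terms aggregation), then average price per bucket (the avg metric)
    static Map<String, Double> avgPriceByColor(List<Car> cars) {
        return cars.stream().collect(Collectors.groupingBy(
            c -> c.color, TreeMap::new, Collectors.averagingInt(c -> c.price)));
    }
}
```

The result is blue=20000.0, green=21000.0, red=32500.0, the same values as `avg_price` in the response.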
In practice: nesting buckets. Building on the query above, first bucket the cars by color, then bucket each color by manufacturer (make)
- REST request
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"colors": {
"terms": {
"field": "color"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
},
"make": { //name of the sub-aggregation
"terms": {
"field": "make" //bucket again, this time by the 'make' field
}
}
}
}
}
}
- Java API request
@Test
public void subMetricsQuery(){
    SearchResponse response = transportClient.prepareSearch("cars")
            .setTypes("transactions")
            .addAggregation(
                    AggregationBuilders.terms("colors")
                            .field("color")
                            .subAggregation(AggregationBuilders
                                    .avg("avg_price")
                                    .field("price")
                            )
                            .subAggregation(AggregationBuilders
                                    .terms("make") // name of the sub-aggregation
                                    .field("make") // field to bucket by
                            )
            )
            .setSize(0)
            .get();
    Aggregation colors = response.getAggregations().get("colors");
}
- Response
{
...
"aggregations": {
"colors": {
"buckets": [
{
"key": "red",
"doc_count": 4,
"make": { //name of the sub-aggregation
"buckets": [
{
"key": "honda",
"doc_count": 3
},
{
"key": "bmw",
"doc_count": 1
}
]
},
"avg_price": {
"value": 32500
}
},
...
}
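Nesting one `terms` aggregation inside another splits each color bucket again by `make`. Below is a plain-Java sketch of that two-level bucketing over the sample documents; the class name `NestedBucketsSketch` is hypothetical. It only counts documents. The `avg_price` metric from the previous step would be computed per outer bucket exactly as before.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of terms(color) > terms(make)
class NestedBucketsSketch {
    static final class Car {
        final String color; final String make;
        Car(String color, String make) { this.color = color; this.make = make; }
    }

    // color and make of the eight sample documents
    static final List<Car> CARS = List.of(
        new Car("red", "honda"), new Car("red", "honda"), new Car("green", "ford"),
        new Car("blue", "toyota"), new Car("green", "toyota"), new Car("red", "honda"),
        new Car("red", "bmw"), new Car("blue", "ford"));

    // Outer bucket: color; inner bucket: make; leaf value: doc_count
    static Map<String, Map<String, Long>> countsByColorThenMake(List<Car> cars) {
        Map<String, Map<String, Long>> buckets = new TreeMap<>();
        for (Car c : cars) {
            buckets.computeIfAbsent(c.color, k -> new TreeMap<>()) // find/create color bucket
                   .merge(c.make, 1L, Long::sum);                   // count within make bucket
        }
        return buckets;
    }
}
```

For the red bucket this gives honda=3 and bmw=1, as in the response above.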
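In practice: building on the result above, add two more metrics: the prices of the most and least expensive cars from each manufacturer (within each color)
- REST request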
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"colors": {
"terms": {
"field": "color"
},
"aggs": {
"avg_price": { "avg": { "field": "price" }
},
"make" : {
"terms" : {
"field" : "make"
},
"aggs" : {
"min_price" : { //a name of your choosing
"min": { //metric type: minimum
"field": "price"
}
},
"max_price" : {
"max": { //metric type: maximum
"field": "price"
}
}
}
}
}
}
}
}
- Java API request
@Test
public void subMetricsMinMaxQuery(){ // renamed: a second subMetricsQuery in the same class would not compile
    SearchResponse response = transportClient.prepareSearch("cars")
            .setTypes("transactions")
            .addAggregation(
                    AggregationBuilders.terms("colors")
                            .field("color")
                            .subAggregation(AggregationBuilders
                                    .avg("avg_price")
                                    .field("price")
                            )
                            .subAggregation(AggregationBuilders
                                    .terms("make")
                                    .field("make")
                                    // min/max metrics nested inside each make bucket
                                    .subAggregation(AggregationBuilders
                                            .max("max_price")
                                            .field("price")
                                    )
                                    .subAggregation(AggregationBuilders
                                            .min("min_price")
                                            .field("price")
                                    )
                            )
            )
            .setSize(0)
            .get();
    Aggregation colors = response.getAggregations().get("colors");
}
- Response
{
...
"aggregations": {
"colors": {
"buckets": [
{
"key": "red",
"doc_count": 4,
"make": {
"buckets": [
{
"key": "honda",
"doc_count": 3,
"min_price": {
"value": 10000
},
"max_price": {
"value": 20000
}
},
{
"key": "bmw",
"doc_count": 1,
"min_price": {
"value": 80000
},
"max_price": {
"value": 80000
}
}
]
},
"avg_price": {
"value": 32500
}
},
...
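Each `make` bucket now carries `min_price` and `max_price`. The plain-Java sketch below computes the same per-color, per-make minimum and maximum using `java.util.IntSummaryStatistics`, which tracks min and max (along with count, sum and average) in a single pass. The class name `MinMaxSketch` is hypothetical, and the sketch is an in-memory illustration, not how Elasticsearch computes metrics internally.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of terms(color) > terms(make) > min/max(price)
class MinMaxSketch {
    static final class Car {
        final int price; final String color; final String make;
        Car(int price, String color, String make) {
            this.price = price; this.color = color; this.make = make;
        }
    }

    // price, color and make of the eight sample documents
    static final List<Car> CARS = List.of(
        new Car(10000, "red", "honda"), new Car(20000, "red", "honda"),
        new Car(30000, "green", "ford"), new Car(15000, "blue", "toyota"),
        new Car(12000, "green", "toyota"), new Car(20000, "red", "honda"),
        new Car(80000, "red", "bmw"), new Car(25000, "blue", "ford"));

    // color -> make -> price statistics; getMin()/getMax() play min_price/max_price
    static Map<String, Map<String, IntSummaryStatistics>> priceStats(List<Car> cars) {
        Map<String, Map<String, IntSummaryStatistics>> buckets = new TreeMap<>();
        for (Car c : cars) {
            buckets.computeIfAbsent(c.color, k -> new TreeMap<>())          // color bucket
                   .computeIfAbsent(c.make, k -> new IntSummaryStatistics()) // make bucket
                   .accept(c.price);                                         // update min/max
        }
        return buckets;
    }
}
```

For red/honda this yields min 10000 and max 20000; for red/bmw both are 80000, matching the response.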
Source: a blog post by ydw_武汉; full article: https://blog.****.net/ydwyyy/article/details/79487995?utm_source=copy