(2) Getting started with Elasticsearch (study notes on the official 7.7 documentation)

Contents:

I. Get Elasticsearch up and running

II. Index some documents

III. Start searching

IV. Analyze results with aggregations

I. Get Elasticsearch up and running

1. Installation tutorial: https://www.elastic.co/guide/en/elastic-stack-get-started/7.7/get-started-elastic-stack.html

If you’re already familiar with Elasticsearch and want to see how it works with the rest of the stack, you might want to jump to the Elastic Stack Tutorial to see how to set up a system monitoring solution with Elasticsearch, Kibana, Beats, and Logstash.

2. Check the cluster status, node count, and shard count

GET /_cat/health?v


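On a single instance the cluster status stays yellow: the node is fully functional, but replica shards have no second node to live on. The official documentation explains the status colors as follows: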

The cluster status will remain yellow if you are only running a single instance of Elasticsearch.

A single node cluster is fully functional, but data cannot be replicated to another node to provide resiliency.

Replica shards must be available for the cluster status to be green. If the cluster status is red, some data is unavailable.
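
As a rough illustration of reading that output, the sketch below parses the column-oriented text that GET /_cat/health?v returns and pulls out the status, node count, and shard count. The sample text is made up for illustration (a single-node cluster), and parse_cat_table is a hypothetical helper, not part of any Elasticsearch client.

```python
# Illustrative sample only; real values depend on your cluster.
SAMPLE_CAT_HEALTH = """\
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1590000000 18:40:00  elasticsearch yellow          1         1      1   1    0    0        1             0                  -                 50.0%
"""


def parse_cat_table(text):
    """Turn whitespace-separated _cat output (requested with ?v) into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


row = parse_cat_table(SAMPLE_CAT_HEALTH)[0]
# Status, number of nodes, number of active shards.
print(row["status"], row["node.total"], row["shards"])
```

Splitting on whitespace is enough here because none of the _cat/health columns contain spaces; other _cat endpoints may need a stricter parser.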

3. The general form of a curl command

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

This example uses the following variables:

<VERB> The appropriate HTTP method or verb. For example, GET, POST, PUT, HEAD, or DELETE.

<PROTOCOL> Either http or https. Use the latter if you have an HTTPS proxy in front of Elasticsearch or you use Elasticsearch security features to encrypt HTTP communications.

<HOST> The hostname of any node in your Elasticsearch cluster. Alternatively, use localhost for a node on your local machine.

<PORT> The port running the Elasticsearch HTTP service, which defaults to 9200.

<PATH> The API endpoint, which can contain multiple components, such as _cluster/stats or _nodes/stats/jvm.

<QUERY_STRING> Any optional query-string parameters. For example, ?pretty will pretty-print the JSON response to make it easier to read.

<BODY> A JSON-encoded request body (if necessary).

The GET /_cat/health?v request above translates to: curl -XGET 'http://localhost:9200/_cat/health?v'
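
To make the template concrete, here is a small sketch that assembles a curl command line from the same pieces. build_curl is a hypothetical helper written for these notes; the Content-Type header is my addition (Elasticsearch 6.0 and later reject request bodies without one), not part of the template above.

```python
import shlex


def build_curl(verb, path, protocol="http", host="localhost", port=9200,
               query_string="", body=None):
    """Assemble a curl command from <VERB>, <PROTOCOL>, <HOST>, <PORT>, <PATH>, <QUERY_STRING>, <BODY>."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query_string:
        url += "?" + query_string.lstrip("?")
    # shlex.quote adds single quotes only when the shell would need them.
    parts = ["curl", f"-X{verb}", shlex.quote(url)]
    if body is not None:
        parts += ["-H", shlex.quote("Content-Type: application/json"),
                  "-d", shlex.quote(body)]
    return " ".join(parts)


print(build_curl("GET", "/_cat/health", query_string="v"))
```

The defaults mirror the values named in the variable list: http, localhost, and port 9200.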

4. If the Elasticsearch security features are enabled, you must pass a user name and password as cURL parameters

If the Elasticsearch security features are enabled, you must also provide a valid user name (and password) that has authority to run the API. For example, use the -u or --user cURL command parameter. For details about which security privileges are required to run each API, see REST APIs.
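
curl's -u option sends the credentials as an HTTP Basic Authorization header. A minimal sketch of the same thing with Python's standard library follows; the user name elastic and the password changeme are placeholders, and the request is only built here, not sent.

```python
import base64
import urllib.request


def authenticated_request(url, username, password):
    """Build a GET request carrying HTTP Basic credentials, like curl -u user:password."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder credentials; substitute a user that may call the API.
request = authenticated_request("http://localhost:9200/_cat/health?v", "elastic", "changeme")
print(request.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would send it; over plain http the credentials travel unencrypted, which is one reason to use https when security is enabled.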

II. Index some documents

1. Index a document

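Indexing a document with ID 1 into the customer index creates the index automatically if it does not exist yet, and stores the document with its name field.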
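
The request described above can be sketched as follows. The value "John Doe" is only an illustrative name, and index_document_request is a hypothetical helper that builds the pieces of the request without sending it.

```python
import json


def index_document_request(index, doc_id, document):
    """Return (method, path, body) for indexing one document under an explicit ID."""
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(document)


method, path, body = index_document_request("customer", 1, {"name": "John Doe"})
print(method, path)
print(body)
```

In Kibana's console the same request is the method and path on one line followed by the JSON body.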

Note: on the 5.4.2 version I am currently running, using _doc as the type name makes the request fail with an exception.

2. Get the document by ID
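
Fetching by ID is a GET on the same path used for indexing. The sketch below builds that path and reads the document out of a response; the response dict is a trimmed, made-up example of the shape Elasticsearch returns (found and _source are the fields used), and both helpers are hypothetical.

```python
# Trimmed, illustrative response; a real one carries more metadata fields.
SAMPLE_GET_RESPONSE = {
    "_index": "customer",
    "_id": "1",
    "found": True,
    "_source": {"name": "John Doe"},
}


def get_document_path(index, doc_id):
    """Path for GET /<index>/_doc/<id>."""
    return f"/{index}/_doc/{doc_id}"


def source_or_none(response):
    """Return the stored document, or None when the ID does not exist."""
    return response["_source"] if response.get("found") else None


print("GET", get_document_path("customer", 1))
print(source_or_none(SAMPLE_GET_RESPONSE))
```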

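3. Indexing documents in bulk

Indexing documents in bulk is faster than indexing them one at a time. The best batch size depends on document size and complexity, indexing and search load, and available resources, so 1,000 to 5,000 documents (roughly 5-15MB) is a starting point rather than an optimum. The official wording: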

If you have a lot of documents to index, you can submit them in batches with the bulk API.

Using bulk to batch document operations is significantly faster than submitting requests individually as it minimizes network roundtrips.

The optimal batch size depends on a number of factors: the document size and complexity, the indexing and search load, and the resources available to your cluster.

A good place to start is with batches of 1,000 to 5,000 documents and a total payload between 5MB and 15MB. From there, you can experiment to find the sweet spot.
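
A minimal sketch of batching along those lines: it groups documents into newline-delimited JSON bodies for POST /_bulk, starting a new batch when either a document count or a byte limit is reached. The limits, the bank index name, and the sample fields are assumptions chosen for illustration; bulk_batches is a hypothetical helper, and nothing is sent to a cluster.

```python
import json


def bulk_batches(index, documents, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON bodies for POST /_bulk, each within the document and byte limits."""
    lines, doc_count, size = [], 0, 0
    for document in documents:
        action = json.dumps({"index": {"_index": index}})
        source = json.dumps(document)
        # Each document contributes an action line and a source line, plus two newlines.
        pair_size = len(action.encode("utf-8")) + len(source.encode("utf-8")) + 2
        if doc_count and (doc_count >= max_docs or size + pair_size > max_bytes):
            yield "\n".join(lines) + "\n"
            lines, doc_count, size = [], 0, 0
        lines += [action, source]
        doc_count += 1
        size += pair_size
    if lines:
        # The bulk API requires the body to end with a newline.
        yield "\n".join(lines) + "\n"


accounts = [{"account_number": n, "balance": 1000 + n} for n in range(2500)]
batches = list(bulk_batches("bank", accounts))
print(len(batches), [body.count("\n") // 2 for body in batches])
```

Each body would be posted with the Content-Type application/x-ndjson; from there the two limits are the knobs to experiment with.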

4. List the indices

GET /_cat/indices?v

III. Start searching

1. A basic query

GET /customer/_search
{
  "query": {
    "match_all": {}
  }
}

Response fields:

The response also provides the following information about the search request:

* took – how long it took Elasticsearch to run the query, in milliseconds

* timed_out – whether or not the search request timed out

* _shards – how many shards were searched and a breakdown of how many shards succeeded, failed, or were skipped.

* max_score – the score of the most relevant document found

* hits.total.value - how many matching documents were found

* hits.sort - the document’s sort position (when not sorting by relevance score)

* hits._score - the document’s relevance score (not applicable when using match_all)
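
To show where those fields sit in the JSON, the sketch below summarizes a search response held as a Python dict. The sample response is made up for illustration and trimmed to the fields listed above; summarize_search is a hypothetical helper.

```python
# Illustrative, trimmed response for a match_all search on the customer index.
SAMPLE_RESPONSE = {
    "took": 3,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {"_index": "customer", "_id": "1", "_score": 1.0,
             "_source": {"name": "John Doe"}},
        ],
    },
}


def summarize_search(response):
    """Collect the fields described above into one flat dict."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "shards_failed": response["_shards"]["failed"],
        "total": hits["total"]["value"],
        "max_score": hits["max_score"],
        "scores": [hit.get("_score") for hit in hits["hits"]],
    }


print(summarize_search(SAMPLE_RESPONSE))
```

Note that max_score and the total live under the top-level hits object, while _score and sort belong to each entry of hits.hits.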

2. Add from (the offset of the first hit to return) and size (how many hits to return), and match documents whose name contains John
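
A sketch of the request body for this step, built as a Python dict. The values 0 and 10 are Elasticsearch's defaults for from and size, used here only as an example; paged_match_query is a hypothetical helper.

```python
import json


def paged_match_query(field, text, start=0, count=10):
    """Body for a match query returning `count` hits starting at offset `start`."""
    return {
        "query": {"match": {field: text}},
        "from": start,
        "size": count,
    }


body = paged_match_query("name", "John", start=0, count=10)
print("GET /customer/_search")
print(json.dumps(body, indent=2))
```

Asking for the second page of ten would mean start=10 with the same count.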

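3. Use match_phrase to match a phrase. Searching for the partial term Joh returns no hits, because matching compares whole analyzed terms rather than prefixes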
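
The sketch below shows a match_phrase body and a toy model of why Joh finds nothing. The toy analyzer (lowercase, split on whitespace) is a deliberate simplification of what Elasticsearch's standard analyzer does, and both helper names are hypothetical.

```python
def match_phrase_query(field, phrase):
    """Body for a match_phrase search on one field."""
    return {"query": {"match_phrase": {field: phrase}}}


def simple_terms(text):
    """Very rough stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def phrase_matches(stored_text, phrase):
    """True when the phrase's terms appear in the stored text, whole and in order."""
    stored, wanted = simple_terms(stored_text), simple_terms(phrase)
    if not wanted:
        return False
    return any(stored[i:i + len(wanted)] == wanted
               for i in range(len(stored) - len(wanted) + 1))


print(match_phrase_query("name", "John Doe"))
print(phrase_matches("John Doe", "John Doe"), phrase_matches("John Doe", "Joh"))
```

Because the index stores the term john and not its prefixes, a prefix search needs a different query type than match or match_phrase.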

4. Compound queries

In a bool query, must clauses have to match, and must_not clauses exclude the documents they match.
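
A sketch of such a body follows. The clauses (age 40, state ID) are example values assumed for the bank sample data; bool_query is a hypothetical helper.

```python
def bool_query(must=(), must_not=()):
    """Body for a bool query combining required and excluded clauses."""
    return {"query": {"bool": {"must": list(must), "must_not": list(must_not)}}}


# Accounts of 40-year-olds who do not live in Idaho (ID).
body = bool_query(
    must=[{"match": {"age": "40"}}],
    must_not=[{"match": {"state": "ID"}}],
)
print(body)
```

Several clauses in must are combined with AND; a document matching any must_not clause is dropped.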

5. Using filter

A range filter that keeps only documents whose balance is between 10000 and 20000.
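
A sketch of the body for that filter. Treating both bounds as inclusive (gte and lte) is my assumption, since the note above does not say; balance_range_query is a hypothetical helper.

```python
def balance_range_query(low, high):
    """Body for a bool query that filters on an inclusive balance range."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {"range": {"balance": {"gte": low, "lte": high}}},
            }
        }
    }


body = balance_range_query(10000, 20000)
print(body)
```

Clauses under filter decide only whether a document is included; they do not contribute to the relevance score.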

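IV. Analyze results with aggregations

1. Using terms: group the bank index by state and return the ten states with the most accounts, in descending order. group_by_state is just a name chosen for the aggregation. The ${fieldName}.keyword sub-field exists only for string fields, so grouping by age.keyword does not work; a numeric field such as age is aggregated on directly.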

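In each bucket, key is the state value and doc_count is the number of documents with that state.

Because size is 0, the response contains only the aggregation results and no hits; with a size greater than 0 it would also return up to that many matching documents.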

The request for this step uses a terms aggregation to group all of the accounts in the bank index by state, and returns the ten states with the most accounts in descending order.

The buckets in the response are the values of the state field. The doc_count shows the number of accounts in each state.

For example, you can see that there are 27 accounts in ID (Idaho). Because the request set size=0, the response only contains the aggregation results.
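
The request and the bucket layout described above can be sketched as follows. The sample response is trimmed to the single bucket mentioned in the text (27 accounts in ID); both helper names are hypothetical.

```python
def group_by_state_request():
    """Body for GET /bank/_search: terms aggregation on state, no hits returned."""
    return {
        "size": 0,
        "aggs": {
            # "group_by_state" is an arbitrary label; terms returns 10 buckets by default.
            "group_by_state": {"terms": {"field": "state.keyword"}},
        },
    }


# Trimmed, illustrative response containing only the bucket quoted in the text.
SAMPLE_AGG_RESPONSE = {
    "hits": {"hits": []},
    "aggregations": {
        "group_by_state": {"buckets": [{"key": "ID", "doc_count": 27}]},
    },
}


def bucket_counts(response, agg_name):
    """Map each bucket key to its doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


print(group_by_state_request())
print(bucket_counts(SAMPLE_AGG_RESPONSE, "group_by_state"))
```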

2. Using avg: nest an avg aggregation inside the terms aggregation to compute an average for each group
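
A sketch of the nested body, assuming the average is taken over the balance field of the bank sample data. average_balance is an arbitrary label, and terms_with_avg is a hypothetical helper.

```python
def terms_with_avg(group_field, value_field):
    """Body with an avg sub-aggregation computed inside every terms bucket."""
    return {
        "size": 0,
        "aggs": {
            "group_by_state": {
                "terms": {"field": group_field},
                "aggs": {
                    "average_balance": {"avg": {"field": value_field}},
                },
            },
        },
    }


body = terms_with_avg("state.keyword", "balance")
print(body)
```

Each bucket in the response then carries an average_balance object with a value next to its key and doc_count.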

3. Using order: sort the buckets by the computed value
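
A sketch of ordering the buckets by a nested metric: the terms aggregation's order option names the sub-aggregation to sort on. The field names and the descending direction are assumptions for illustration; terms_ordered_by_avg is a hypothetical helper.

```python
def terms_ordered_by_avg(group_field, value_field, direction="desc"):
    """Body whose terms buckets are sorted by a nested avg instead of by doc_count."""
    return {
        "size": 0,
        "aggs": {
            "group_by_state": {
                "terms": {
                    "field": group_field,
                    # Refers to the sub-aggregation defined just below.
                    "order": {"average_balance": direction},
                },
                "aggs": {
                    "average_balance": {"avg": {"field": value_field}},
                },
            },
        },
    }


body = terms_ordered_by_avg("state.keyword", "balance")
print(body)
```

The key inside order must match the name of the sub-aggregation exactly, otherwise Elasticsearch rejects the request.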

4. Directions for further study
