Elasticsearch --- Study Notes (2)
These are personal study notes only; for details, please refer to the official Elasticsearch documentation.
9. Notes ------ the SQL plugin
After installing the SQL plugin, there are two ways to query data:
- directly in the URL: use _sql followed by the SQL query string
curl -XPOST http://172.16.150.149:29200/_sql?pretty -d "SELECT * FROM facebook"
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [ {
"_index" : "facebook",
"_type" : "blog",
"_id" : "pretty",
"_score" : 1.0,
"_source" : {
"title" : "website",
"text" : "blog is making",
"date" : "2018/1016"
}
}, {
"_index" : "facebook",
"_type" : "blog",
"_id" : "AWZ668ZcHFL4sAFl7IMI",
"_score" : 1.0,
"_source" : {
"title" : "website",
"text" : "blog is making",
"date" : "2018/1016"
}
}, {
"_index" : "facebook",
"_type" : "blog",
"_id" : "AWZ67I_dHFL4sAFl7IMJ",
"_score" : 1.0,
"_source" : {
"title" : "website",
"text" : "blog is making",
"date" : "2018/1016"
}
}, {
"_index" : "facebook",
"_type" : "blog",
"_id" : "123",
"_score" : 1.0,
"_source" : {
"title" : "change version num",
"text" : "changing...",
"views" : 0,
"tags" : [ "testing" ]
}
} ]
}
}
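As a sketch, the same SQL query can be issued programmatically. The host and port are the ones used in these notes, and `sql_request` is a hypothetical helper; the request is only built here, not sent:

```python
import urllib.request

# Host and port taken from the notes above; adjust for your own cluster.
ES = "http://172.16.150.149:29200"

def sql_request(query: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST to the _sql endpoint."""
    return urllib.request.Request(
        ES + "/_sql?pretty",
        data=query.encode("utf-8"),
        method="POST",
    )

req = sql_request("SELECT * FROM facebook")
# urllib.request.urlopen(req) would execute it against a live cluster.
```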
- via the SQL plugin's visual (web) interface
10. Notes ------ getting multiple documents (mget)
The mget API takes a docs array as a parameter; each element contains the metadata of a document to retrieve, including _index, _type, and _id.
When _index and _type are the same for all documents, you can simply pass an ids array:
curl -i -XGET 'http://172.16.150.149:29200/facebook/blog/_mget?pretty' -d '{"ids":["123","888"]}'
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 504
{
"docs" : [ {
"_index" : "facebook",
"_type" : "blog",
"_id" : "123",
"_version" : 121,
"found" : true,
"_source" : {
"title" : "change version num",
"text" : "changing...",
"views" : 0,
"tags" : [ "testing" ]
}
}, {
"_index" : "facebook",
"_type" : "blog",
"_id" : "888",
"_version" : 1,
"found" : true,
"_source" : {
"title" : "website",
"text" : "new test is made",
"date" : "2018/10/17"
}
} ]
}
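A minimal Python sketch of the two request-body shapes (the short ids form used above, and the full docs form); `mget_body` is a hypothetical helper, not part of any client library:

```python
import json

def mget_body(ids, index=None, type_=None):
    """Hypothetical helper: build an _mget request body.

    When index/type are fixed in the URL, the short "ids" form suffices;
    otherwise each entry in "docs" carries its own metadata.
    """
    if index is None:
        return json.dumps({"ids": list(ids)})
    docs = [{"_index": index, "_type": type_, "_id": i} for i in ids]
    return json.dumps({"docs": docs})

print(mget_body(["123", "888"]))                           # short form
print(mget_body(["123"], index="facebook", type_="blog"))  # full form
```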
11. Notes ------ bulk batch operations
Why does bulk require newline-delimited input?
It comes down to performance: with one action per line, each line can be read and dispatched as an independent operation, without deserializing the entire request into one big structure, which reduces JVM memory overhead.
The bulk API executes in the following order:
The client sends a bulk request to Node 1, the coordinating (master) node.
Node 1 builds a bulk request for each node and forwards these requests, in parallel, to every node hosting an involved primary shard.
The primary shard executes each operation one by one, in order. As each operation succeeds, the primary forwards the new document (or the delete) to its replica shards in parallel, then moves on to the next operation. Once all replica shards report success for all operations, the node reports success to the coordinating node, which collects and assembles the responses and returns them to the client.
From this you can also see that bulk operations are not atomic.
A problem I ran into: how do I enter an actual newline, rather than a line continuation?
I found a solution on GitHub (tested myself on Ubuntu): add -H 'Content-Type: application/json'
curl -H 'Content-Type: application/json' -i -XPOST http://172.16.150.149:29200/_bulk -d '
{"create":{"_index":"twitter","_type":"newtype","_id":970}}
{"title":"bulk test doc"}
{"create":{"_index":"user","_type":"doc","_id":"2"}}
{"title":"another doc"}
'
After that you can insert newlines freely; just remember the closing ' at the end. In fact, if you forget to type it and just hit Enter, the shell simply starts another line while it waits for the closing quote.
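The newline-delimited body format can be sketched as follows. `bulk_body` is a hypothetical helper showing the shape the _bulk endpoint expects: one JSON object per line, a source line after each indexing action, and a trailing newline:

```python
import json

def bulk_body(actions):
    """Hypothetical helper: serialize (action, source) pairs to NDJSON.

    The _bulk endpoint wants exactly one JSON object per line and a
    trailing newline; a delete action carries no source line.
    """
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:  # delete actions have no document body
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

body = bulk_body([
    ({"create": {"_index": "twitter", "_type": "newtype", "_id": 970}},
     {"title": "hello"}),
    ({"delete": {"_index": "user", "_type": "doc", "_id": "2"}}, None),
])
print(body)
```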
12. Understanding ------ what routing does
The documentation explains how ES decides where to store documents; here is just a brief note.
shard = hash(routing) % number_of_primary_shards
routing is a variable value, defaulting to the document's _id, though it can also be set to a custom value. routing is run through a hash function to produce a number, which is then divided by number_of_primary_shards (the number of primary shards) to take the remainder. That remainder, which falls between 0 and number_of_primary_shards - 1, is exactly the shard where the document lives.
This explains why the number of primary shards must be decided when the index is created and can never change afterwards: if the number changed, all previously routed values would become invalid and the documents could never be found again.
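The formula can be sketched in Python. Note this is illustrative only: Elasticsearch actually hashes the routing value with murmur3, and CRC32 merely stands in here as a deterministic hash; the point is the modulo behavior:

```python
from zlib import crc32

def shard_for(routing: str, number_of_primary_shards: int) -> int:
    # Stand-in hash: Elasticsearch really uses murmur3 on the routing value,
    # but any deterministic hash demonstrates the same modulo behavior.
    return crc32(routing.encode("utf-8")) % number_of_primary_shards

# The result always lands in [0, number_of_primary_shards - 1], and the
# same routing value always maps to the same shard. Changing the primary
# shard count changes the divisor, so old routing results go stale --
# which is why that count is fixed at index creation.
print(shard_for("123", 3))
```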
13. Notes ------ the empty search
A search with no query specified:
GET /_search
curl -XGET http://172.16.150.149:29200/facebook/_search?pretty
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [ {
"_index" : "facebook",
"_type" : "blog",
"_id" : "pretty",
"_score" : 1.0,
"_source" : {
"title" : "website",
"text" : "blog is making",
"date" : "2018/1016"
}
}, {
"_index" : "facebook",
"_type" : "blog",
"_id" : "888",
"_score" : 1.0,
"_source" : {
"title" : "website",
"text" : "new test is made",
"date" : "2018/10/17"
}
}, {
"_index" : "facebook",
"_type" : "blog",
"_id" : "AWZ668ZcHFL4sAFl7IMI",
"_score" : 1.0,
"_source" : {
"title" : "website",
"text" : "blog is making",
"date" : "2018/1016"
}
}, {
"_index" : "facebook",
"_type" : "blog",
"_id" : "AWZ67I_dHFL4sAFl7IMJ",
"_score" : 1.0,
"_source" : {
"title" : "website",
"text" : "blog is making",
"date" : "2018/1016"
}
}, {
"_index" : "facebook",
"_type" : "blog",
"_id" : "123",
"_score" : 1.0,
"_source" : {
"title" : "change version num",
"text" : "changing...",
"views" : 0,
"tags" : [ "testing" ]
}
} ]
}
}
Main response fields:
took: how long the query took, in milliseconds.
timed_out: whether the search timed out. A timeout can be set so that each node and shard returns whatever results it has before the deadline, after which the connection is closed.
hits: the main result data: _index, _type, _id, _score, and _source for each matching document.
_shards: shard information: how many shards were searched, and how many succeeded or failed.
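These fields can be read straight off the parsed JSON; a small sketch using the values from the search response shown above (hits truncated for brevity):

```python
# The search response above, as a parsed Python dict (hits list truncated).
response = {
    "took": 1,
    "timed_out": False,
    "_shards": {"total": 3, "successful": 3, "failed": 0},
    "hits": {"total": 5, "max_score": 1.0, "hits": []},
}

print("took:", response["took"], "ms")
print("timed_out:", response["timed_out"])
print("matched", response["hits"]["total"], "docs on",
      response["_shards"]["successful"], "of",
      response["_shards"]["total"], "shards")
```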