Installing the smartcn Chinese analysis plugin for Elasticsearch

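Elasticsearch's default analyzer handles Chinese poorly: it splits Chinese text into individual characters.

Here we introduce the smartcn plugin. It is the officially recommended one, developed by the Chinese Academy of Sciences, and it covers most needs.

There is also the IK analyzer. If you need a custom dictionary, look at IK instead. Its home page is https://github.com/medcl/elasticsearch-analysis-ik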

smartcn is easy to install: just use the plugin command in Elasticsearch's bin directory.

First change into the Elasticsearch bin directory, then run:

sh elasticsearch-plugin install analysis-smartcn

-> Downloading analysis-smartcn from elastic

[=================================================] 100%

-> Installed analysis-smartcn

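The plugin is downloaded and installed automatically.

(Note: in a cluster of, say, 3 nodes, the plugin must be installed on every node. In practice you usually set up one node completely and then clone it to create the others, which is more convenient.)

After installation, the plugins directory contains a new analysis-smartcn folder.

Elasticsearch must be restarted after the plugin is installed.

Then let's test it: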

POST http://192.168.1.111:9200/_analyze/

{"analyzer":"standard","text":"我是中国人"}

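This runs the standard analyzer. In the result (a screenshot in the original post), every Chinese character becomes its own token, which does not meet our needs.

Now let's try smartcn: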

{"analyzer":"smartcn","text":"我是中国人"}

In the result (a screenshot in the original post), 中国 now comes out as a single token.
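The two _analyze requests above can also be sent from Java. The following is only a minimal sketch, assuming Java 11 or later (for java.net.http) and a node reachable at the base URL you pass in. The class name AnalyzeDemo and the helpers escape, buildBody and analyze are hypothetical names introduced for illustration, not part of any Elasticsearch client.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: sends the same text through two analyzers.
class AnalyzeDemo {

    // Minimal JSON string escaping: handles only backslashes and double quotes.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the _analyze request body for the given analyzer and text.
    static String buildBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    // POSTs the body to <baseUrl>/_analyze and returns the raw JSON response.
    static String analyze(String baseUrl, String analyzer, String text) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(analyzer, text), StandardCharsets.UTF_8))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java AnalyzeDemo.java <baseUrl> <text>");
            return;
        }
        // Compare the tokens produced by the two analyzers for the same input.
        System.out.println(analyze(args[0], "standard", args[1]));
        System.out.println(analyze(args[0], "smartcn", args[1]));
    }
}
```

Running it as `java AnalyzeDemo.java http://192.168.1.111:9200 我是中国人` should print the two token lists as JSON, so they can be compared side by side.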

Next we create a new index, film2, and specify the smartcn analyzer in its mapping:

POST http://192.168.1.111:9200/film2/_mapping/dongzuo/

{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "smartcn"
    },
    "publishDate": {
      "type": "date"
    },
    "content": {
      "type": "text",
      "analyzer": "smartcn"
    },
    "director": {
      "type": "keyword"
    },
    "price": {
      "type": "float"
    }
  }
}

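Then run the data-insertion code from earlier to load the documents.

As a result, the earlier film index holds data analyzed with the standard analyzer, where Chinese is split character by character, while film2 uses smartcn, which segments text using its built-in Chinese dictionary.

Now let's run an analyzed search from Java code.

First define a static constant: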

private static final String ANALYZER="smartcn";

/**
 * Match query with an explicit analyzer
 * @throws Exception
 */
@Test
public void search() throws Exception {
    SearchRequestBuilder srb = client.prepareSearch("film2").setTypes("dongzuo");
    // Analyze the query text with smartcn, then match the resulting terms against "title"
    SearchResponse sr = srb.setQuery(QueryBuilders.matchQuery("title", "星球狼").analyzer(ANALYZER))
        .setFetchSource(new String[]{"title", "price"}, null) // return only title and price
        .execute()
        .actionGet();
    SearchHits hits = sr.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}

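Because the query specifies the Chinese analyzer, the search keywords are analyzed first and the resulting terms are then matched. If no analyzer is specified on the query, Elasticsearch uses the analyzer configured for the field in the mapping (smartcn for title here) and falls back to the standard analyzer only when the mapping defines none.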
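For reference, the matchQuery call above corresponds roughly to a match query in the REST query DSL. The sketch below only builds that request body as a string, so the place where the analyzer is set is easy to see. MatchBodyDemo, matchBody and escape are hypothetical names for this illustration, and the body is assumed to be POSTed to a _search endpoint such as http://192.168.1.111:9200/film2/dongzuo/_search.

```java
// Hypothetical helper class: builds the REST body equivalent to the matchQuery call.
class MatchBodyDemo {

    // Minimal JSON string escaping: handles only backslashes and double quotes.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a match query on one field with an explicit analyzer,
    // returning only the title and price fields of each hit.
    static String matchBody(String field, String text, String analyzer) {
        return "{\"query\":{\"match\":{\"" + escape(field) + "\":{"
                + "\"query\":\"" + escape(text) + "\","
                + "\"analyzer\":\"" + escape(analyzer) + "\"}}},"
                + "\"_source\":[\"title\",\"price\"]}";
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: java MatchBodyDemo.java <field> <text>");
            return;
        }
        System.out.println(matchBody(args[0], args[1], "smartcn"));
    }
}
```

Leaving the "analyzer" entry out of the body lets Elasticsearch pick the search analyzer from the field mapping instead.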

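Next, multi-field queries. A search engine such as Baidu searches not only titles but also content, so two fields are involved here.

We use multiMatchQuery for this. Here is the Java code: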

/**
 * Multi-field match query with an explicit analyzer
 * @throws Exception
 */
@Test
public void search2() throws Exception {
    SearchRequestBuilder srb = client.prepareSearch("film2").setTypes("dongzuo");
    // Analyze the query text with smartcn, then match it against both "title" and "content"
    SearchResponse sr = srb.setQuery(QueryBuilders.multiMatchQuery("非洲星球", "title", "content").analyzer(ANALYZER))
        .setFetchSource(new String[]{"title", "price"}, null) // return only title and price
        .execute()
        .actionGet();
    SearchHits hits = sr.getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
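The multi-field search can likewise be written as a multi_match query in the REST query DSL. The following sketch builds only the request body. MultiMatchBodyDemo, multiMatchBody and escape are hypothetical names used for illustration, and the field list is whatever you pass in.

```java
// Hypothetical helper class: builds the REST body equivalent to the multiMatchQuery call.
class MultiMatchBodyDemo {

    // Minimal JSON string escaping: handles only backslashes and double quotes.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a multi_match query over the given fields with an explicit analyzer.
    static String multiMatchBody(String text, String analyzer, String... fields) {
        StringBuilder fieldList = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                fieldList.append(',');
            }
            fieldList.append('"').append(escape(fields[i])).append('"');
        }
        return "{\"query\":{\"multi_match\":{"
                + "\"query\":\"" + escape(text) + "\","
                + "\"fields\":[" + fieldList + "],"
                + "\"analyzer\":\"" + escape(analyzer) + "\"}}}";
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: java MultiMatchBodyDemo.java <text>");
            return;
        }
        // Same fields as the search2() test: title and content.
        System.out.println(multiMatchBody(args[0], "smartcn", "title", "content"));
    }
}
```

A document is returned when the analyzed terms match in either field, which is what makes this suitable for a title-plus-content search.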
