PHP + Sphinx search middleware: overview and basic usage (hands-on tested)

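Once Sphinx is installed, there are two ways to use it from PHP:
1. Install the PHP extension
2. Call the sphinxapi.php file shipped in the package
This post only covers the API approach. Reason: the Sphinx PHP extension is updated very slowly, which makes system upgrades awkward.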

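First, a chart comparing the major search middleware options. None is strictly better; each suits different needs.
Chart source: http://www.sphinxsearch.org/archives/492
That article is fairly thorough and easy to follow, so it is worth a read.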

Sphinx basic usage

Jump to:
Windows
Linux
Common errors

Windows

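http://sphinxsearch.com/downloads/current/
Download Sphinx 3.1.1 (the latest version at the time of writing). Here is a minimal sphinx.conf:

src1: a source. It defines what data Sphinx pulls from the database on each indexing run.
test1: an index. It is built by running src1, and its files are written to the location given by path.
searchd: settings for the search daemon (installed and started below).
The names src1 and test1 are arbitrary and can be changed.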

source src1
{
    type                = mysql
    sql_host            = localhost
    sql_user            = root
    sql_pass            = root
    sql_db              = test
    sql_port            = 3306
    sql_query_pre       = SET NAMES utf8
    sql_query_pre       = SET SESSION query_cache_type = OFF
    sql_query           = SELECT * FROM log
	
    ############ Fields to index; adjust to your own table structure
	sql_field_string    = type
	sql_field_string    = post_data
	sql_field_string    = http_respon
	sql_field_string    = code
	sql_field_string    = add_time
	
    # xmlpipe_field     = post_data   (only meaningful for type = xmlpipe2; not needed for a mysql source)
}

index test1
{
    source          = src1
    ############ Create the directory yourself if it does not exist (same for the paths below)
    path            = E:\sphinx-3.1.1\data\test1  
    morphology      = none
    stopwords       =
}

indexer
{
    mem_limit       = 32M
}

searchd
{
    listen              = 9312
    listen              = 9306:mysql41
    log                 = E:\sphinx-3.1.1\log\searchd.log
    query_log           = E:\sphinx-3.1.1\log\query.log
    read_timeout        = 5
    max_children        = 30
    pid_file            = E:\sphinx-3.1.1\log\searchd.pid
}

Put this file in the bin directory so it is easy to run and test.

Here is my table structure:

CREATE TABLE `log` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `type` varchar(20) NOT NULL,
  `post_data` varchar(255) NOT NULL,
  `http_respon` varchar(100) NOT NULL,
  `code` varchar(20) NOT NULL,
  `add_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

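Open cmd or PowerShell, cd to the bin directory, and run:

# test1 can be replaced with --all. The bracketed -c option is optional (do not type the brackets), because sphinx.conf is the default.
.\indexer.exe [-c sphinx.conf] test1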

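The output shows that one document was indexed. Open the directory set by path in the config file (data in my case) and you will find a set of .sp* files.
These are the index files Sphinx built by tokenizing the target data.
Next, still in the bin directory, run the following (administrator privileges required):

# Install as a Windows service; skip this if it is already installed
.\searchd.exe --install 
# Start searchd
.\searchd.exe [-c sphinx.conf]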

Now let's try a query. Create a file named sphinxSearch.php in the api directory.
The API documentation is on the official wiki: http://sphinxsearch.com/wiki/doku.php?id=sphinx_manual_chinese#通用api方法

<?php

    header("Content-type:text/html;charset=utf-8");
    # Load the Sphinx API client file
    require 'E:\sphinx-3.1.1\sphinxapi.php';
    $keyword = 'create_time';

    $sphinx = new SphinxClient();
    $sphinx->SetServer('localhost', 9312);

    # '*' searches across all indexes
    $result = $sphinx->Query($keyword, '*');
    if ($result === false) {
        # Query() returns false on failure; print the reason
        print_r($sphinx->GetLastError());
    }
    print_r($result);
    die;

?>

Run php sphinxSearch.php
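The results I wanted come back. The usual pattern is to let Sphinx return the ids of all matching documents, then use those ids to fetch the final rows from MySQL.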
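
The id-then-MySQL pattern can be sketched roughly as follows. This is a minimal illustration, not the author's code: the helper names idsFromSphinxResult and buildLogQuery are hypothetical, and it assumes the array returned by Query() keeps its matches under the 'matches' key, indexed by document id, which is the layout sphinxapi.php uses when SetArrayResult is not enabled.

```php
<?php
// Hypothetical helper: collect document ids from a sphinxapi.php result array.
// Assumes the default layout, where $result['matches'] is keyed by document id.
function idsFromSphinxResult($result): array
{
    if (!is_array($result) || empty($result['matches'])) {
        return []; // Query() failed (returned false) or nothing matched
    }
    return array_map('intval', array_keys($result['matches']));
}

// Hypothetical helper: build a parameterized query against the `log` table.
// Returns null when there are no ids, because "IN ()" is not valid SQL.
function buildLogQuery(array $ids): ?string
{
    if ($ids === []) {
        return null;
    }
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    return "SELECT * FROM log WHERE id IN ($placeholders)";
}

// Made-up result in the shape Query() returns, for illustration only.
$result = ['matches' => [3 => ['weight' => 1], 7 => ['weight' => 1]]];
$ids = idsFromSphinxResult($result);
$sql = buildLogQuery($ids);
echo $sql, PHP_EOL;
```

With PDO, the statement could then be run as $pdo->prepare($sql)->execute($ids). MySQL does not return rows in Sphinx's relevance order, so reorder them in PHP by the id list if ranking matters.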
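That is not all. How do you add new data to the index without stopping searchd?
Sphinx offers delta (incremental) indexes and index merging for this.
Below are the source and index used for the delta index; they can go in the same sphinx.conf: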

source src2
{
    type                = mysql
    sql_host            = localhost
    sql_user            = root
    sql_pass            = root
    sql_db              = test
    sql_port            = 3306
    sql_query_pre       = SET NAMES utf8
    sql_query_pre       = SET SESSION query_cache_type = OFF
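    ############ Needs an extra config table that records the last id Sphinx indexed; other ways of tracking new rows work too
    sql_query           = SELECT * FROM log where id > (select value from config where name = 'sphinx_max_id')
    ############ After the data is fetched, record the current max id (restricted to the sphinx_max_id row)
    sql_query_post      = update config set value = (select id from log order by id desc limit 1) where name = 'sphinx_max_id'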
	
	sql_field_string    = type
	sql_field_string    = post_data
	sql_field_string    = http_respon
	sql_field_string    = code
	sql_field_string    = add_time
	
    # xmlpipe_field     = post_data   (only meaningful for type = xmlpipe2; not needed for a mysql source)
}
index test1_1
{
    source              = src2
    path                = E:\sphinx-3.1.1\data\test1_1
    morphology          = none
    stopwords           =
}

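Database screenshot (image not preserved): the config table holds a row whose name is sphinx_max_id and whose value is the last id that was indexed.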
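Put these commands in a scheduled task (administrator privileges required):

# Build the delta index
.\indexer.exe [-c .\sphinx.conf] test1_1 --rotate 
# Merge the delta into the main index (test1_1 into test1). If the delta's .sp* files are deleted, or the delta is rebuilt before merging, its data is lost or overwritten.
.\indexer.exe [-c .\sphinx.conf] --merge test1 test1_1 --rotate 
# With the delta setup above, always merge after every delta run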

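If a delta run has so much data that the merge starts before it finishes, data will be lost.

Windows is mostly used as a local development environment, so this is rarely a concern there.
I recommend Linux for production anyway, since Windows carries a graphical desktop that inevitably costs some performance.

More when I have time; off to sleep for now.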

Linux

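Same as before, start by downloading the package archive from http://sphinxsearch.com/downloads/current/.
I am using Vagrant with a virtual machine, so I can download the archive and drop it straight into the shared directory.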
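Here is the configuration file sphinx.conf; for testing I keep it in the bin directory.
The basic settings are explained in the Windows section above.
Adjust the log and path locations to your needs, and make sure the directories exist.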

source src1
{
    type                = mysql
    sql_host            = localhost
    sql_user            = root
    sql_pass            = root
    sql_db              = test
    sql_port            = 3306
	
    sql_query_pre       = SET NAMES utf8
    sql_query_pre       = SET SESSION query_cache_type = OFF
    sql_query           = SELECT * FROM log
	
	sql_field_string    = type
	sql_field_string    = post_data
	sql_field_string    = http_respon
	sql_field_string    = code
	sql_field_string    = add_time
	
    # xmlpipe_field     = post_data   (only meaningful for type = xmlpipe2; not needed for a mysql source)
}
source src2
{
    type                = mysql
    sql_host            = localhost
    sql_user            = root
    sql_pass            = root
    sql_db              = test
    sql_port            = 3306
    sql_query_pre       = SET NAMES utf8
    sql_query_pre       = SET SESSION query_cache_type = OFF
    sql_query           = SELECT * FROM log where id > (select value from config where name = 'sphinx_max_id')
    sql_query_post      = update config set value = (select id from log order by id desc limit 1) where name = 'sphinx_max_id'
	
	sql_field_string    = type
	sql_field_string    = post_data
	sql_field_string    = http_respon
	sql_field_string    = code
	sql_field_string    = add_time
	
    # xmlpipe_field     = post_data   (only meaningful for type = xmlpipe2; not needed for a mysql source)
}

index test1
{
    source              = src1
    path                = /vagrant/sphinx-3.1.1/data/test1
    morphology          = none
    stopwords           =
}
index test1_1
{
    source              = src2
    path                = /vagrant/sphinx-3.1.1/data/test1_1
    morphology          = none
    stopwords           =
}

indexer
{
    mem_limit           = 32M
}

searchd
{
    listen              = 9312
    listen              = 9306:mysql41
    log                 = /vagrant/sphinx-3.1.1/log/searchd.log
    query_log           = /vagrant/sphinx-3.1.1/log/query.log
    read_timeout        = 5
    max_children        = 30
    pid_file            = /vagrant/sphinx-3.1.1/log/searchd.pid
}

Build the indexes:

# -c sphinx.conf is required here; otherwise indexer looks for /etc/sphinx/sphinx.conf by default
indexer -c sphinx.conf --all

Start:

# Requires sudo
searchd -c sphinx.conf

# Stop
searchd -c sphinx.conf --stop

The search script sphinxSearch.php:

<?php

    header("Content-type:text/html;charset=utf-8");
    # Step 1: load the Sphinx API client file
    require './sphinxapi.php';
    $keyword = '';
    $sphinx = new SphinxClient();
    $sphinx->SetServer('localhost', 9312);
    # '*' searches across all indexes
    $result = $sphinx->Query($keyword, '*');

    # To search for the keyword 'a' in the 'post_data' field only, write it like this:
    //$result = $sphinx->Query("@post_data a", '*');

    print_r($sphinx->GetLastError());
    print_r($result);die;

?>
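
The field-restricted form "@post_data a" uses Sphinx's extended query syntax, where characters such as @, -, | and " act as operators. When the keyword comes from user input, it helps to escape those characters first. The sketch below shows one way to build such a query string; buildFieldQuery is a hypothetical helper written so the block runs on its own, and sphinxapi.php also ships an EscapeString() method on SphinxClient for the same purpose.

```php
<?php
// Hypothetical helper: build an extended-syntax query limited to one field.
// Operator characters in the keyword are backslash-escaped so that user input
// is matched as plain text rather than interpreted as query syntax.
function buildFieldQuery(string $field, string $keyword): string
{
    // The backslash comes first so already-added escapes are not escaped again.
    $special = ['\\', '(', ')', '|', '-', '!', '@', '~', '"', '&', '/', '^', '$', '=', '<'];
    $escaped = $keyword;
    foreach ($special as $ch) {
        $escaped = str_replace($ch, '\\' . $ch, $escaped);
    }
    return '@' . $field . ' ' . $escaped;
}

echo buildFieldQuery('post_data', 'a'), PHP_EOL;
echo buildFieldQuery('post_data', 'a-b'), PHP_EOL;
```

The returned string can be passed as the first argument of Query() in the script above.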

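Other commands

# Build the delta index (run with sudo)
indexer -c sphinx.conf test1_1 --rotate 
# Merge the indexes
indexer -c sphinx.conf --merge test1 test1_1 --rotate

As mentioned above, when the delta is large, merging before the delta run has finished leads to missing data.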

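The fix:
While the delta index is being built, indexer creates temporary .tmp* files in the data directory (in addition to test1_1.tmp.spl). Write a script that checks whether those files exist, and only merge when they do not.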
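
Such a check could look roughly like the sketch below. It is only an illustration under the assumption stated in the text: that indexer leaves <index>.tmp.* files in the data directory while it runs, and that the .tmp.spl file should be ignored. The function name deltaBuildInProgress is hypothetical.

```php
<?php
// Hypothetical helper: report whether a delta rebuild still looks in progress.
// Assumption (from the text above): indexer leaves <index>.tmp.* files in the
// data directory while it runs, and the .tmp.spl file does not count.
function deltaBuildInProgress(string $dataDir, string $index): bool
{
    $files = glob($dataDir . DIRECTORY_SEPARATOR . $index . '.tmp.*') ?: [];
    foreach ($files as $file) {
        if (substr($file, -4) !== '.spl') {
            return true; // a temporary file other than .tmp.spl exists
        }
    }
    return false;
}

// Path and index name taken from the Linux config above; adjust to your layout.
if (deltaBuildInProgress('/vagrant/sphinx-3.1.1/data', 'test1_1')) {
    echo "delta index still building, skip merge\n";
} else {
    echo "safe to merge\n";
}
```

A simpler alternative is to run the delta command and the merge command one after the other in the same scheduled script, so the merge only starts once indexer has exited.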

Common errors

1. Most errors are permission problems, or a directory that cannot be found.

2. Running php sphinxSearch.php on Linux gave me this error:

searchd error: client version is higher than daemon version (client is v.1.32, daemon is v.1.31)

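After a long search I found that, although I had downloaded the Linux 3.* build from the official site, the searchd that actually started was version 2.2.11. I do not know why. One possible cause is that a separately installed searchd on the PATH was started instead of the one in the downloaded bin directory; the banner searchd prints on startup shows which version is running.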
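Since the server is running that version, the client has to match it. The client here is the sphinxapi.php file.

Get the release that matches the running daemon (2.2.11 here), find the api file inside it, and replace the sphinxapi.php currently in use.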
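
To spot this kind of mismatch quickly, the error string from GetLastError() can be checked for the version pair. The helper below is a hypothetical sketch that only relies on the "(client is v.X, daemon is v.Y)" part of the message quoted above; the numbers appear to be protocol versions of the search command rather than Sphinx release numbers.

```php
<?php
// Hypothetical helper: pull the client/daemon versions out of the mismatch
// message returned by GetLastError(); returns null for any other error text.
function parseVersionMismatch(string $error): ?array
{
    $pattern = '/client is v\.(\d+\.\d+), daemon is v\.(\d+\.\d+)/';
    if (preg_match($pattern, $error, $m) !== 1) {
        return null;
    }
    return ['client' => $m[1], 'daemon' => $m[2]];
}

// Tail of the error message shown above, used as sample input.
$error = '(client is v.1.32, daemon is v.1.31)';
$versions = parseVersionMismatch($error);
if ($versions !== null) {
    echo "sphinxapi.php speaks v{$versions['client']}, searchd speaks v{$versions['daemon']}\n";
}
```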
