MongoDB数据导入Elasticsearch

1. 本文测试环境

a) Elasticsearch版本5.2
b) Python版本2.7.13
c) Vmware环境，操作系统SUSE12

2. 安装Elasticsearch

a) 安装java运行环境: sudo zipper in java-1_8_0-openjdk
b) 下载并解压elasticsearch，下载地址https://www.elastic.co/downloads/elasticsearch
c) 修改elasticsearch配置文件config/elasticsearch.yml，添加
network.host: 0.0.0.0
http.cors.enabled: true
http.cors.allow-origin: "*"
d) 修改elasticsearch配置文件config/jvm.options，根据物理机内存大小调整这两个参数（其中的2048m表示2G的堆内存，直接写为2g亦可）
-Xmx2048m（-Xmx2g）
-Xms2048m（-Xms2g）
e) 启动elasticsearch: elasticsearch-5.5.2/bin/elasticsearch
f) 异常：ERROR: bootstrap checks failed

现象

解决办法

max file descriptors [4096] for elasticsearch process likely too low, increase to at least [65536]

修改/etc/security/limits.conf文件，添加或修改如下行：

* hard nofile 65536

* soft nofile 65536

max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144]

修改/etc/sysctl.conf文件，添加

vm.max_map_count = 262144

修改完成后重启系统

3. 配置Elasticsearch集群

暂未研究。

4. 安装mongo-connector

a) Pip2 install mongo-connector
b) Pip2 install elastic2-doc-manager[elastic5]
也有其他同步插件，本文选择mongo-connector做测试。

5. 配置mongodb

开启mongodb副本集（Replica Set）。开启副本集才会产生oplog，副本拷贝主分片的oplog并通过oplog与主分片进行同步。
如果使用mongos作为入口，路由进程、各副本集节点需要配置相同的用户名和密码，此用户需配置backup和readAnyDatabase角色。该用户名和密码在使用mongo-connector导入数据时使用（即-a和-p两个参数）。
NOTE: With a sharded cluster, the mongo-connector user must be created on a mongos AND separately on each shard

6. 安装elasticsearch-head

a) git clone git://github.com/mobz/elasticsearch-head.git
b) sudo zipper in nmp6
c) cd elasticsearch-head
d) npm install 【比较耗时】
e) npm run start
MongoDB数据导入Elasticsearch

f) 浏览器打开http://localhost:9100/验证是否安装成功
MongoDB数据导入Elasticsearch

g) 连接至elasticsearch
MongoDB数据导入Elasticsearch

(有另一款Elastisearch客户端工具：Chrome扩展程序sense)

7. 导入mongodb中的数据到Elasticsearch集群

mongo-connector -o icssa.icsdevice.timestamp --auto-commit-interval=0 -m 127.0.0.1:27017 -t 172.15.227.179:9200 -d elastic2_doc_manager -n icssa.icsdevice -a elastic -p elastic 【该命令表示只同步icssa.icsdevice集合，如果要同步所有数据，去掉-n icssa.icsdevice参数】
mongo-connector –o scan.aiwen.timestamp --auto-commit-interval=0 -m 127.0.0.1:27017 -t 172.15.227.179:9200 -d elastic2_doc_manager -n scan.aiwen -a elastic -p elastic
注意：
① 执行命令后会在当前目录生成两个文件：mongo-connector.log和icssa.icsdevice.timestamp 【如果未指定-o参数，则默认为oplog.timestamp】。前者为同步的日志文件，后者存储了oplog的上一次同步时间戳。如果想重新同步全部数据，删除oplog.timestamp文件、elasticsearch中相关索引，再重新跑一次命令即可
② 当mongodb副本集节点的IP配置为127.0.0.1，如果是在远程主机上进行导入，导入时可能会报如下错误：
MongoDB数据导入Elasticsearch

③ 同步完成后在head工具界面可以查看到相关索引信息:

MongoDB数据导入Elasticsearch

8. 注意事项

使用mongo-connector命令同步数据时，mongo-connector的oplog（参照-o参数）不能随便删除，否则会引起重新同步所有数据的问题。该问题可以通过--no-dump选项关闭。
生产环境下建议将mongo-connector配置为系统服务，运行mongo-connector时采用配置文件的方式。

9. 踩过的坑

1. 数据库A中有多个集合(A1, A2, A3)，且已开启了副本集（Replica Set），但是集合A1可以同步，集合A2不能同步.
原因：oplog中有A1的操作记录，没有A2的操作记录。
结论：开启副本集（Replica Set）并不能保证一定能同步，oplog中必须包含待同步集合的操作记录，才能通过mongo-connector同步到Elasticsearch集群。
2. mongodb3.x版本加强了安全机制，导致了在只拥有某个库的权限时不能同步数据的问题。
原因：拥有某个库的权限，并不能拥有oplog的读取权限，而mongo-connector需要读取oplog的权限。
结论：同步数据至少需要能够读取oplog的权限，确保当前mongodb用户的权限能够操作oplog，或者直接使用mongodb的管理员权限。

以下来自官方文档，包括MongoDB访问权限配置等问题：
• Authenticate with a username and password using the default MONGODB-CR (pre-MongoDB 2.8) or SCRAM-SHA-1 (2.8+) mechanisms:
• mongo-connector -m mongodb://username:[email protected]:27017 ...
• Authenticate using Kerberos (requires the pykerberos package to be installed):
• mongo-connector -m mongodb://[email protected]@localhost/?authMechanism=GSSAPI&authSource=$external&gssapiServiceName=mongodb ...
• Authenticate using the default mechanism over SSL:
• mongo-connector -m mongodb://username:[email protected]:27017/?ssl=true
• Connect over SSL to MongoDB. Require and validate the server's certificate:
mongo-connector --ssl-certfile client.pem --ssl-certificate-policy required ...

Required Permissions
Mongo Connector authenticates only as one user. The simplest way to get mongo-connector running is to create a user with the backup role:
$ mongo
replset:PRIMARY> db.getSiblingDB("admin").createUser({
user: "mongo-connector-username",
pwd: "password",
roles: ["backup"]
})
NOTE: With a sharded cluster, the mongo-connector user must be created on a mongos AND separately on each shard. This enables mongo-connector to authenticate to the cluster as a whole and to each shard individually.
The following table explains what privileges are needed and what the built-in role for MongoDB grants this privilege:
Privilege Needed ReasonBuilt-in Role
find on theconfig.shardscollection Required if the source is a sharded cluster.backup or clusterMonitor

listDatabases
Required to determine what databases to read from if no --namespace-set is given.backup or readAnyDatabase

find on thelocal.oplog.rscollection Required for tailing the oplog.backup or clusterManagerand readAnyDatabase [email protected]

find
Required to read the specified collections given in --namespace-set.backup or read orreadAnyDatabase

Note that in order to read the oplog, the user needs to be defined in the admin database.

MongoDB数据导入Elasticsearch

1. 本文测试环境

2. 安装Elasticsearch

3. 配置Elasticsearch集群

4. 安装mongo-connector

5. 配置mongodb

6. 安装elasticsearch-head

7. 导入mongodb中的数据到Elasticsearch集群

8. 注意事项

9. 踩过的坑

相关推荐