Backing up Elasticsearch cluster data to HDFS

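I. Elasticsearch cluster data backup

Elasticsearch replicas provide high availability at runtime: the cluster can tolerate the failure of a few nodes and the loss of some shard copies without losing any data overall.

Note: replicas do not protect against disasters, such as a data-center power outage that takes every machine down at once.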

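II. Backing up data with snapshots

A snapshot stores the cluster state and data in an external file system such as HDFS. The first snapshot is a full backup; later snapshots are incremental.

1. Install the repository-hdfs plugin on every node of the cluster with bin/elasticsearch-plugin install repository-hdfs, then restart the cluster.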
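Because the plugin has to be present on every node, it can help to check the installation before registering a repository. The following is a minimal sketch, assuming the output of the `_cat/plugins` API restricted to the `name` and `component` columns and node names without spaces; the helper name `nodes_with_hdfs_plugin` is hypothetical.

```shell
# Reads `_cat/plugins` output (columns: node name, component) on stdin and
# prints the names of the nodes that report the repository-hdfs plugin.
nodes_with_hdfs_plugin() {
  awk '$2 == "repository-hdfs" { print $1 }'
}

# Usage against the cluster from this article (not executed here):
#   curl -s 'http://master:9200/_cat/plugins?h=name,component' | nodes_with_hdfs_plugin
```

Any node missing from the printed list still needs the plugin installed and a restart.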

2. Create the backup repository

(1) Give the Elasticsearch user ownership of the HDFS directory

hadoop fs -chown -R esuser /es/

(2) Register the backup repository

curl -XPUT 'http://master:9200/_snapshot/hdfs_repository' -d '
{
"type": "hdfs",
"settings": {
"uri": "hdfs://master:9000",
"path": "/es/repository/es-prod",
"max_snapshot_bytes_per_sec": "50mb",
"max_restore_bytes_per_sec": "50mb"
}}'


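Notes:

1) max_snapshot_bytes_per_sec

Throttles, per node, the rate at which data is written from Elasticsearch to the repository. The default in Elasticsearch 5.x is 40mb per second; the repository above sets 50mb.

2) max_restore_bytes_per_sec

Throttles, per node, the rate at which data is restored from the repository to Elasticsearch. The default in Elasticsearch 5.x is 40mb per second; the repository above sets 50mb.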
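To get a feel for what a throttle value means in practice, here is a rough back-of-the-envelope sketch. It assumes the data is spread evenly across the nodes and that the throttle is the only bottleneck, so it gives a lower bound rather than a prediction; the function name `snapshot_seconds` is hypothetical.

```shell
# Rough lower bound, in seconds, for copying SIZE_GB of data when each of
# NODES nodes is capped at RATE_MB megabytes per second.
# $1 = total size in GB, $2 = per-node throttle in MB/s, $3 = number of nodes
snapshot_seconds() {
  size_gb=$1; rate_mb=$2; nodes=$3
  echo $(( size_gb * 1024 / (rate_mb * nodes) ))
}

# Example: 100 GB at 50 MB/s on a single node
snapshot_seconds 100 50 1   # prints 2048
```

The same estimate applies to restores when max_restore_bytes_per_sec is the limiting factor.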

(3) Add an index

curl -XPUT 'http://master:9200/test_index/test_type/1' -d '
{
"name":"neo"
}'
{"_index":"test_index","_type":"test_type","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}

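3. Back up indices (all indices by default)

(1) Snapshot all indices

A repository can hold multiple snapshots, and each snapshot is a backup of some or all of the indices.

When you create a snapshot you choose which indices it covers; if none are specified, all indices are included.

1) PUT _snapshot/hdfs_repository/snapshot_1

This command backs up the data of all indices into snapshot_1. It returns immediately and the backup continues in the background.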

curl -XPUT http://master:9200/_snapshot/hdfs_repository/snapshot_1
{"accepted":true}


The result of the backup can be checked with:

curl -XGET 'http://master:9200/_snapshot/hdfs_repository/snapshot_1?pretty'
{
"snapshots" : [
{
"snapshot" : "snapshot_1",
"uuid" : "hYN2sWNiTWePxUM3wmS4HQ",
"version_id" : 5060999,
"version" : "5.6.9",
"indices" : [
"test_index"
],
"state" : "SUCCESS",
"start_time" : "2020-03-22T07:13:52.838Z",
"start_time_in_millis" : 1584861232838,
"end_time" : "2020-03-22T07:13:53.530Z",
"end_time_in_millis" : 1584861233530,
"duration_in_millis" : 692,
"failures" : [ ],
"shards" : {
"total" : 5,
"failed" : 0,
"successful" : 5
}
}
]
}

2) PUT _snapshot/hdfs_repository/snapshot_1?wait_for_completion=true

With wait_for_completion=true the request blocks and returns only after the snapshot has finished.
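For large snapshots, holding an HTTP request open may be impractical. An alternative is to start the snapshot in the background and poll its state. The sketch below assumes the response format of the snapshot info request shown earlier; the helper names `snapshot_state` and `wait_for_snapshot` and the 5-second interval are hypothetical choices.

```shell
# Extracts the first "state" value from a snapshot-info response on stdin.
snapshot_state() {
  grep -o '"state" *: *"[A-Z_]*"' | head -n 1 | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Polls until the snapshot is no longer IN_PROGRESS, then prints the final state.
# Prints an empty line if the state could not be read (e.g. curl failed).
# $1 = base URL, $2 = repository, $3 = snapshot name
wait_for_snapshot() {
  while :; do
    state=$(curl -s "$1/_snapshot/$2/$3" | snapshot_state)
    [ "$state" != "IN_PROGRESS" ] && break
    sleep 5
  done
  echo "$state"
}

# Usage (not executed here):
#   wait_for_snapshot http://master:9200 hdfs_repository snapshot_1
```

A final state other than SUCCESS, such as PARTIAL or FAILED, means the "failures" array of the snapshot info is worth inspecting.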

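(2) Snapshot specific indices

All indices are backed up by default. To back up only some of them, list them in the indices field of the snapshot request body.

Notes:

(1) ignore_unavailable: when set to true, indices that do not exist are ignored and simply left out of the backup. It is not set by default, in which case a missing index makes the snapshot request fail.

(2) include_global_state: when set to false, the cluster's global state is not stored as part of the snapshot.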
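The request body for such a partial snapshot can be sketched as follows. This is a minimal example using the `indices`, `ignore_unavailable` and `include_global_state` fields of the snapshot API; the helper name `snapshot_body`, the snapshot name `snapshot_2` and the index name `index_2` are hypothetical.

```shell
# Prints a snapshot request body limited to the comma-separated index list in $1.
snapshot_body() {
  printf '{"indices":"%s","ignore_unavailable":true,"include_global_state":false}\n' "$1"
}

# Usage (not executed here):
#   curl -XPUT 'http://master:9200/_snapshot/hdfs_repository/snapshot_2' \
#        -d "$(snapshot_body 'test_index,index_2')"
```

On Elasticsearch 6.0 and later the request also needs a `-H 'Content-Type: application/json'` header; the 5.6.9 cluster in this article accepts it without one.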

4. Delete a snapshot

curl -XDELETE http://master:9200/_snapshot/hdfs_repository/snapshot_1

III. Restoring data

1. Delete the index

curl -XDELETE http://master:9200/test_index

Querying the document now returns a 404, which confirms the index is gone:

curl -XGET 'http://master:9200/test_index/test_type/1?pretty'
{
"error" : {
"root_cause" : [
{
"type" : "index_not_found_exception",
"reason" : "no such index",
"resource.type" : "index_expression",
"resource.id" : "test_index",
"index_uuid" : "_na_",
"index" : "test_index"
}
],
"type" : "index_not_found_exception",
"reason" : "no such index",
"resource.type" : "index_expression",
"resource.id" : "test_index",
"index_uuid" : "_na_",
"index" : "test_index"
},
"status" : 404
}

2. Restore the data

curl -XPOST http://master:9200/_snapshot/hdfs_repository/snapshot_1/_restore

Once the restore has finished, the document can be read again:

curl -XGET 'http://master:9200/test_index/test_type/1?pretty'
{
"_index" : "test_index",
"_type" : "test_type",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"name" : "neo"
}
}
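The walkthrough above deletes the index before restoring it. If the live index has to stay in place, the restore API can instead restore the data under a different name using its `rename_pattern` and `rename_replacement` fields. The following is a minimal sketch of such a request body; the helper name `restore_body` and the `restored_` prefix are hypothetical.

```shell
# Prints a restore request body that restores the indices in $1 and
# prefixes every restored index name with $2.
restore_body() {
  # "$1" inside the single-quoted format string is literal: it is the regex
  # capture-group reference Elasticsearch expects, not a shell parameter.
  printf '{"indices":"%s","rename_pattern":"(.+)","rename_replacement":"%s$1"}\n' "$1" "$2"
}

# Usage (not executed here):
#   curl -XPOST 'http://master:9200/_snapshot/hdfs_repository/snapshot_1/_restore' \
#        -d "$(restore_body test_index restored_)"
```

With these arguments the data would come back as restored_test_index, leaving test_index untouched.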
