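Elasticsearch Hot-Cold Data Separation Cluster: Offline Data Warehouse Solution
This is a preliminary study of a big-data storage solution. It uses an Elasticsearch cluster to store massive volumes of data and separates hot and cold data by the time the data was produced, which solves the storage problem while keeping real-time queries efficient.
Reference architecture: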
3.1 Deployment plan (built on virtual machines)
| Node | Service deployed | Disk type | JVM heap |
| --- | --- | --- | --- |
| Master1 (2 cores, 4 GB) | Elasticsearch | SATA | 4G |
| Master2 (2 cores, 4 GB) | Elasticsearch | SATA | 4G |
| Master3 (2 cores, 4 GB) | Elasticsearch | SATA | 4G |
| Es_hot1 (8 cores, 16 GB) | Elasticsearch | SSD | 8G |
| Es_hot2 (8 cores, 16 GB) | Elasticsearch | SSD | 8G |
| Es_hot3 (8 cores, 16 GB) | Elasticsearch | SSD | 8G |
| Es_cold1 (4 cores, 16 GB) | Elasticsearch | SATA | 8G |
| Es_cold2 (4 cores, 16 GB) | Elasticsearch | SATA | 8G |
| Es_cold3 (4 cores, 16 GB) | Elasticsearch | SATA | 8G |
3.2 Installation and configuration
Environment: CentOS 7, JDK 1.8+, Elasticsearch 7.5
Installation (omitted)
Master node configuration: /etc/elasticsearch/elasticsearch.yml
cluster.name: anjubao_parking
node.name: master-1
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
network.publish_host: 172.25.2.202
http.port: 9200
discovery.seed_hosts: ["172.25.2.201","172.25.2.202","172.25.2.203","172.25.2.207","172.25.2.208","172.25.2.199","172.25.2.196","172.25.2.197","172.25.2.198"]
cluster.initial_master_nodes: ["172.25.2.201","172.25.2.202","172.25.2.203"]
node.attr.rack: r6
node.master: true
node.data: false
node.ingest: false
node.ml: false
cluster.remote.connect: false
bootstrap.system_call_filter: false
http.cors.enabled: true
http.cors.allow-origin: "*"
cluster.max_shards_per_node: 5000
xpack.monitoring.enabled: false
Hot node configuration: /etc/elasticsearch/elasticsearch.yml
cluster.name: anjubao_parking
node.name: hot-2
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
network.publish_host: 172.25.2.196
http.port: 9200
discovery.seed_hosts: ["172.25.2.202","172.25.2.203","172.25.2.207","172.25.2.208","172.25.2.199","172.25.2.196","172.25.2.197","172.25.2.198"]
cluster.initial_master_nodes: ["172.25.2.202","172.25.2.203"]
node.attr.rack: r1
node.master: false
node.data: true
node.ingest: false
node.ml: false
cluster.remote.connect: false
bootstrap.system_call_filter: false
node.attr.box_type: hot
cluster.max_shards_per_node: 5000
xpack.monitoring.enabled: false
Cold node configuration: /etc/elasticsearch/elasticsearch.yml
cluster.name: anjubao_parking
node.name: cold-2
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
network.publish_host: 172.25.2.207
http.port: 9200
discovery.seed_hosts: ["172.25.2.202","172.25.2.203","172.25.2.207","172.25.2.208","172.25.2.199","172.25.2.196","172.25.2.197","172.25.2.198"]
cluster.initial_master_nodes: ["172.25.2.202","172.25.2.203"]
node.attr.rack: r1
node.master: false
node.data: true
node.ingest: false
node.ml: false
cluster.remote.connect: false
bootstrap.system_call_filter: false
node.attr.box_type: cold
cluster.max_shards_per_node: 5000
xpack.monitoring.enabled: false
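The hot and cold configurations differ mainly in node.attr.box_type, so once the nodes are running it can be worth confirming that each data node reports the attribute it was given. The following is a minimal sketch, not part of the original setup: it uses the `_cat/nodeattrs` API, and the `ES_URL` variable and both function names are assumptions introduced for illustration.

```shell
#!/bin/sh
# Assumption: ES_URL is the HTTP address of any node in the cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Reads `_cat/nodeattrs?h=node,attr,value` rows (node, attribute, value) on
# stdin and prints "<node> <value>" for the box_type attribute only.
filter_box_type() {
  awk '$2 == "box_type" { print $1, $3 }'
}

# Queries the cluster; call this only after the nodes have been started.
show_box_types() {
  curl -s "${ES_URL}/_cat/nodeattrs?h=node,attr,value" | filter_box_type
}
```

Running `show_box_types` against a live cluster should list one line per data node, for example a hot-2 line ending in hot and a cold-2 line ending in cold. Master nodes do not appear because they carry no box_type attribute.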
JVM heap configuration (hot and cold nodes)
cat /etc/elasticsearch/jvm.options
-Xms8g
-Xmx8g
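As a rough cross-check of the heap sizes, Elastic's general guidance is to give the heap no more than half of a node's physical memory and to stay below the compressed-oops threshold. The sketch below encodes that rule of thumb; the function name and the 30 GB cap are assumptions made for illustration, not values taken from this deployment.

```shell
#!/bin/sh
# Prints a suggested heap size in GB for a node with the given RAM in GB:
# half of RAM, at least 1 GB, capped at an assumed 30 GB.
suggested_heap_gb() (
  ram_gb="$1"
  heap=$(( ram_gb / 2 ))
  if [ "$heap" -gt 30 ]; then heap=30; fi
  if [ "$heap" -lt 1 ]; then heap=1; fi
  echo "$heap"
)
```

By this rule the 16 GB hot and cold nodes get 8 GB, which matches the deployment plan. The 4 GB master nodes would get 2 GB rather than the planned 4G, so that value may be worth revisiting.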
- Start and stop the service
systemctl start elasticsearch
systemctl stop elasticsearch
- Verification
1. Cluster status (healthy)
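One way to check the cluster status from the command line is the `_cluster/health` API, which reports green, yellow, or red. This is a hedged sketch rather than the check used in the original write-up; `ES_URL`, `parse_status`, and `cluster_is_green` are names assumed here for illustration.

```shell
#!/bin/sh
# Assumption: ES_URL is the HTTP address of any node in the cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Extracts the value of the "status" field from `_cluster/health` JSON on stdin.
# Works for both compact and ?pretty output.
parse_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Succeeds (exit code 0) only when the cluster reports green.
cluster_is_green() {
  [ "$(curl -s "${ES_URL}/_cluster/health" | parse_status)" = "green" ]
}
```

For a quick manual look, `curl -s "$ES_URL/_cat/nodes?v"` lists every node that has joined, which should show the three master, three hot, and three cold nodes from the deployment plan.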
- Writing data
2.1 Create the index template
curl -H "Content-Type: application/json" -XPUT http://localhost:9200/_template/pk_template -d '
{
  "index_patterns": ["pk_*"],
  "order": 0,
  "settings": {
    "index": {
      "refresh_interval": "5s",
      "number_of_shards": "8",
      "number_of_replicas": "1",
      "routing": {
        "allocation": {
          "require": {
            "box_type": "hot"
          }
        }
      }
    }
  }
}'
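Any new index whose name starts with pk_ matches this template automatically, so its shards are allocated to the hot nodes when it is created.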
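To confirm that the template takes effect, the shard placement of the pk_ indices can be inspected with the `_cat/shards` API. The sketch below flags shards that are not on a hot node. It assumes that hot nodes are named with a hot- prefix, as hot-2 is in the configuration above; `ES_URL` and both function names are also assumptions for illustration.

```shell
#!/bin/sh
# Assumption: ES_URL is the HTTP address of any node in the cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Reads `_cat/shards?h=index,shard,prirep,node` rows on stdin and prints every
# row whose node name does not start with the given prefix. Unassigned shards
# have an empty node column, so they are printed as well.
shards_off_tier() {
  awk -v prefix="$1" 'NF > 0 && index($4, prefix) != 1 { print }'
}

# Lists pk_* shards that are not placed on a node named hot-*.
check_hot_placement() {
  curl -s "${ES_URL}/_cat/shards/pk_*?h=index,shard,prirep,node" | shards_off_tier "hot-"
}
```

An empty result from `check_hot_placement` means every pk_ shard sits on a hot node. Indices that have already been moved to the cold tier will show up in the output, which is expected.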
- Migrating data from hot nodes to cold nodes
curl -H "Content-Type: application/json" -XPUT "http://localhost:9200/pk_xxx_2020-04/_settings?pretty" -d '
{
  "settings": {
    "index.routing.allocation.require.box_type": "cold"
  }
}'
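Because the indices are named by month (pk_xxx_2020-04 in the command above), the migration can be scripted by computing which month has aged out and updating that index. The following is a minimal sketch under stated assumptions: the three-month retention on hot nodes used in the usage line, the `ES_URL` variable, and both function names are hypothetical, and pk_xxx stands in for the real index prefix exactly as it does in the original command.

```shell
#!/bin/sh
# Assumption: ES_URL is the HTTP address of any node in the cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Prints the month (YYYY-MM) that lies N months before the given YYYY-MM.
# Usage: month_minus 2020-07 3
month_minus() (
  year="${1%-*}"
  month="${1#*-}"
  month="${month#0}"   # drop a leading zero so "08" is not read as octal
  total=$(( year * 12 + month - 1 - $2 ))
  printf '%04d-%02d\n' $(( total / 12 )) $(( total % 12 + 1 ))
)

# Sets box_type=cold on one index so that its shards relocate to cold nodes.
move_to_cold() {
  curl -s -H "Content-Type: application/json" -XPUT \
    "${ES_URL}/$1/_settings" \
    -d '{ "index.routing.allocation.require.box_type": "cold" }'
}
```

A monthly cron job could then run `move_to_cold "pk_xxx_$(month_minus "$(date +%Y-%m)" 3)"`. Elasticsearch relocates the shards in the background after the setting changes, and the index stays searchable during the move.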