Elasticsearch Hot/Cold Data Separation Cluster - Offline Data Warehouse Solution

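    This is a preliminary study of a big-data storage solution. It uses an Elasticsearch cluster to store massive volumes of data and separates hot and cold data by the time the data was produced. This addresses the storage problem while keeping real-time queries efficient.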

Reference architecture:

[Figure: reference architecture diagram]

3.1 Deployment plan (built on virtual machines)

Node                        Service          Disk type    JVM heap
Master1 (2 cores, 4 GB)     Elasticsearch    SATA         4G
Master2 (2 cores, 4 GB)     Elasticsearch    SATA         4G
Master3 (2 cores, 4 GB)     Elasticsearch    SATA         4G
Es_hot1 (8 cores, 16 GB)    Elasticsearch    SSD          8G
Es_hot2 (8 cores, 16 GB)    Elasticsearch    SSD          8G
Es_hot3 (8 cores, 16 GB)    Elasticsearch    SSD          8G
Es_cold1 (4 cores, 16 GB)   Elasticsearch    SATA         8G
Es_cold2 (4 cores, 16 GB)   Elasticsearch    SATA         8G
Es_cold3 (4 cores, 16 GB)   Elasticsearch    SATA         8G

3.2 Installation and configuration

Environment: CentOS 7, JDK 1.8+, Elasticsearch 7.5

Installation (omitted)

Master node configuration (/etc/elasticsearch/elasticsearch.yml)

cluster.name: anjubao_parking
node.name: master-1
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
network.publish_host: 172.25.2.202
http.port: 9200
# Addresses contacted to discover the rest of the cluster
discovery.seed_hosts: ["172.25.2.201","172.25.2.202","172.25.2.203","172.25.2.207","172.25.2.208","172.25.2.199","172.25.2.196","172.25.2.197","172.25.2.198"]
# Master-eligible nodes that take part in the first election when the cluster is bootstrapped
cluster.initial_master_nodes: ["172.25.2.201","172.25.2.202","172.25.2.203"]

node.attr.rack: r6
# Dedicated master: no data, ingest, or machine learning roles
node.master: true
node.data: false
node.ingest: false
node.ml: false
cluster.remote.connect: false

bootstrap.system_call_filter: false
http.cors.enabled: true
http.cors.allow-origin: "*"
cluster.max_shards_per_node: 5000
xpack.monitoring.enabled: false

 

 

Hot node configuration (/etc/elasticsearch/elasticsearch.yml)

cluster.name: anjubao_parking
node.name: hot-2
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
network.publish_host: 172.25.2.196
http.port: 9200
# discovery.seed_hosts replaces the deprecated discovery.zen.ping.unicast.hosts in 7.x
discovery.seed_hosts: ["172.25.2.202","172.25.2.203","172.25.2.207","172.25.2.208","172.25.2.199","172.25.2.196","172.25.2.197","172.25.2.198"]
cluster.initial_master_nodes: ["172.25.2.202","172.25.2.203"]
node.attr.rack: r1
# Data-only node
node.master: false
node.data: true
node.ingest: false
node.ml: false
cluster.remote.connect: false
bootstrap.system_call_filter: false
# Custom attribute matched by index.routing.allocation.require.box_type
node.attr.box_type: hot
cluster.max_shards_per_node: 5000
xpack.monitoring.enabled: false

 

 

Cold node configuration (/etc/elasticsearch/elasticsearch.yml)

cluster.name: anjubao_parking
node.name: cold-2
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
network.publish_host: 172.25.2.207
http.port: 9200
discovery.seed_hosts: ["172.25.2.202","172.25.2.203","172.25.2.207","172.25.2.208","172.25.2.199","172.25.2.196","172.25.2.197","172.25.2.198"]
cluster.initial_master_nodes: ["172.25.2.202","172.25.2.203"]
node.attr.rack: r1
# Data-only node
node.master: false
node.data: true
node.ingest: false
node.ml: false
cluster.remote.connect: false
bootstrap.system_call_filter: false
# Custom attribute matched by index.routing.allocation.require.box_type
node.attr.box_type: cold
cluster.max_shards_per_node: 5000
xpack.monitoring.enabled: false
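
Once the nodes are running, the box_type tags set in the hot and cold configurations can be checked through the cat nodeattrs API. The following is a minimal sketch rather than part of the original setup: the function name count_box_types and the ES_URL variable are assumptions made for illustration.

```shell
#!/bin/sh
# Summarise box_type tags from `_cat/nodeattrs?h=node,attr,value` output on stdin.
# Prints one "<box_type> <node count>" line per tag, sorted by tag name.
# count_box_types is a hypothetical helper name, not an Elasticsearch command.
count_box_types() {
  awk '$2 == "box_type" { n[$3]++ } END { for (t in n) print t, n[t] }' | sort
}

# Against a live cluster (ES_URL is an assumed variable, e.g. http://localhost:9200):
#   curl -s "$ES_URL/_cat/nodeattrs?h=node,attr,value" | count_box_types
```

With the nine-node plan in 3.1, the summary should list three cold nodes and three hot nodes. The master nodes carry no box_type attribute and do not appear.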

 

 

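JVM heap configuration

Set the minimum and maximum heap to the same value in /etc/elasticsearch/jvm.options (8g on the hot and cold nodes, 4g on the master nodes, as planned in 3.1):

cat /etc/elasticsearch/jvm.options
-Xms8g
-Xmx8g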

3.3 Starting and stopping

systemctl start elasticsearch
systemctl stop elasticsearch

 

3.4 Verification

1. Cluster status (healthy)

[Figure: cluster status screenshot]
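
The cluster status can also be checked from the command line with the cat health API. The sketch below is illustrative only: check_health and ES_URL are assumed names, and the expected counts (9 nodes in total, 6 of them data nodes) are taken from the deployment plan.

```shell
#!/bin/sh
# Check one line of `_cat/health?h=status,node.total,node.data` output.
# Usage: check_health "<status> <total nodes> <data nodes>" <expected total> <expected data>
# Returns 0 only when the status is green and both node counts match.
# check_health is a hypothetical helper name.
check_health() {
  line=$1 want_total=$2 want_data=$3
  # Split the whitespace-separated health line into its three fields.
  set -- $line
  [ "$1" = "green" ] && [ "$2" = "$want_total" ] && [ "$3" = "$want_data" ]
}

# Against a live cluster (ES_URL is an assumed variable, e.g. http://localhost:9200):
#   check_health "$(curl -s "$ES_URL/_cat/health?h=status,node.total,node.data")" 9 6 \
#     && echo "cluster healthy"
```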

 

3.5 Writing data

3.5.1 Create the template

curl -H "Content-Type: application/json" -XPUT "http://localhost:9200/_template/pk_template" -d '
{
  "index_patterns": ["pk_*"],
  "order": 0,
  "settings": {
    "index": {
      "refresh_interval": "5s",
      "number_of_shards": "8",
      "number_of_replicas": "1",
      "routing": {
        "allocation": {
          "require": {
            "box_type": "hot"
          }
        }
      }
    }
  }
}'

 

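Any new index whose name starts with pk_ matches this template automatically, so its shards are allocated only to nodes tagged box_type: hot.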
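
To confirm that the shards of pk_ indices really sit on hot nodes, the output of the cat shards API can be filtered by node name. This is a rough sketch under two assumptions: all hot nodes are named with a hot- prefix (the source shows only hot-2), and shards_off_tier and ES_URL are hypothetical names.

```shell
#!/bin/sh
# Read `_cat/shards/pk_*?h=index,shard,prirep,node` output on stdin and print
# every shard whose node name does not start with the given prefix
# (default "hot-"). Unassigned shards have an empty node column and are
# reported as UNASSIGNED. shards_off_tier is a hypothetical helper name.
shards_off_tier() {
  awk -v p="${1:-hot-}" \
    'NF && index($4, p) != 1 { print $1, $2, $3, ($4 == "" ? "UNASSIGNED" : $4) }'
}

# Against a live cluster (ES_URL is an assumed variable, e.g. http://localhost:9200):
#   curl -s "$ES_URL/_cat/shards/pk_*?h=index,shard,prirep,node" | shards_off_tier
```

Empty output means every shard is on a hot node. Passing cold- as the argument runs the same check for indices that have been moved to the cold tier.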

 

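3.6 Migrating data from hot nodes to cold nodes

Changing the allocation requirement on an existing index makes Elasticsearch relocate its shards to the cold nodes:

curl -H "Content-Type: application/json" -XPUT "http://localhost:9200/pk_xxx_2020-04/_settings?pretty" -d '
{
  "settings": {
    "index.routing.allocation.require.box_type": "cold"
  }
}'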
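
Because the index name above ends in a year and month, the migration could be scripted by computing the name of an older month and applying the same setting. The sketch below is one possible approach, not the original procedure: the function names, the three-month cutoff in the usage line, and the monthly naming scheme beyond the single pk_xxx_2020-04 example are all assumptions.

```shell
#!/bin/sh
# month_minus YYYY-MM N  ->  prints the month N months earlier, as YYYY-MM.
month_minus() {
  y=${1%-*}                 # year part
  m=${1#*-}; m=${m#0}       # month part, leading zero stripped to avoid octal parsing
  total=$(( y * 12 + m - 1 - $2 ))
  printf '%04d-%02d\n' $(( total / 12 )) $(( total % 12 + 1 ))
}

# cold_settings_body  ->  JSON body that pins an index to box_type: cold.
cold_settings_body() {
  printf '%s\n' '{"index.routing.allocation.require.box_type": "cold"}'
}

# move_to_cold ES_URL INDEX  ->  applies the body above to one index.
move_to_cold() {
  curl -s -H "Content-Type: application/json" \
    -XPUT "$1/$2/_settings" -d "$(cold_settings_body)"
}

# Example (assumed cutoff of three months, assumed cluster URL):
#   move_to_cold http://localhost:9200 "pk_xxx_$(month_minus "$(date +%Y-%m)" 3)"
```

Run monthly from cron, a call like the commented example would move one month of data per run. The month arithmetic is done in plain shell so that it does not depend on a particular date implementation.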