Setting Up a Lightweight Monitoring System on Linux (Telegraf + InfluxDB + Grafana)


 

I. Prepare the installation files (they can be downloaded from the official websites in advance)

 

telegraf-1.12.4-1.x86_64.rpm

influxdb-1.7.8.x86_64.rpm (the single-node version is free; clustering requires the paid Enterprise edition)

grafana-6.4.3-1.x86_64.rpm

kapacitor-1.5.3.x86_64.rpm (the alerting service of the TIGK stack)
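If the server has Internet access, the packages can also be fetched from the command line. The following is a minimal sketch that assumes the vendors' usual release locations (dl.influxdata.com and dl.grafana.com); those URLs are an assumption, so check them against the official download pages before relying on them.

```bash
#!/usr/bin/env bash
# Download the four RPMs listed above.
# ASSUMPTION: the URL patterns below are the vendors' usual release
# locations; verify them on the official download pages.

# Print the assumed download URL for one of the RPM file names.
rpm_url() {
  case "$1" in
    telegraf-*)  echo "https://dl.influxdata.com/telegraf/releases/$1" ;;
    influxdb-*)  echo "https://dl.influxdata.com/influxdb/releases/$1" ;;
    kapacitor-*) echo "https://dl.influxdata.com/kapacitor/releases/$1" ;;
    grafana-*)   echo "https://dl.grafana.com/oss/release/$1" ;;
    *) echo "unknown package: $1" >&2; return 1 ;;
  esac
}

RPMS=(
  telegraf-1.12.4-1.x86_64.rpm
  influxdb-1.7.8.x86_64.rpm
  grafana-6.4.3-1.x86_64.rpm
  kapacitor-1.5.3.x86_64.rpm
)

# Download every RPM into the given directory (default: current directory).
download_all() {
  local dest=${1:-.} f url
  for f in "${RPMS[@]}"; do
    url=$(rpm_url "$f") || return 1
    # -f: fail on HTTP errors, -S: show errors, -L: follow redirects
    curl -fSL -o "$dest/$f" "$url" || return 1
  done
}

if [ "$#" -gt 0 ]; then download_all "$1"; fi
```

Usage: save it under any name and run it with the target directory, for example `bash download_rpms.sh /home/ldw/monitor` (the script name is only an example).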

 

II. Installation

 

1. Create a directory for the packages

mkdir /home/ldw/monitor

Upload the downloaded installation files to the monitor directory on the server.

From the directory that contains monitor, grant permissions:

chmod -R 777 monitor

 

2. Install

Installation commands (for distributed monitoring, Telegraf must also be installed on each of the other client hosts):

rpm -ivh telegraf-1.12.4-1.x86_64.rpm

rpm -ivh influxdb-1.7.8.x86_64.rpm

rpm -ivh grafana-6.4.3-1.x86_64.rpm

rpm -ivh kapacitor-1.5.3.x86_64.rpm
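The four commands can be wrapped in a small loop that skips packages that are already installed, so the step can be re-run safely. This is a sketch that assumes the RPMs sit in /home/ldw/monitor and that the package name is the file name up to the first "-" followed by a digit (true for the four files above); run it as root.

```bash
#!/usr/bin/env bash
# Install every RPM in a directory, skipping packages rpm already knows about.

# Derive the package name from the file name,
# e.g. /path/telegraf-1.12.4-1.x86_64.rpm -> telegraf
pkg_name() {
  local base=${1##*/}          # strip the directory part
  echo "${base%%-[0-9]*}"      # strip everything from the first "-<digit>"
}

install_rpms() {
  local dir=${1:-/home/ldw/monitor} f name
  for f in "$dir"/*.rpm; do
    [ -e "$f" ] || continue    # glob matched nothing
    name=$(pkg_name "$f")
    if rpm -q "$name" >/dev/null 2>&1; then
      echo "skip: $name already installed"
    else
      rpm -ivh "$f" || return 1
    fi
  done
}

if [ "$#" -gt 0 ]; then install_rpms "$1"; fi
```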

 

Installation output (run from the directory that contains the packages):

[root@<host> monitor]# rpm -ivh telegraf-1.12.4-1.x86_64.rpm

Preparing...                          ################################# [100%]

Updating / installing...

   1:telegraf-1.12.4-1                ################################# [100%]

Created symlink from /etc/systemd/system/multi-user.target.wants/telegraf.service to /usr/lib/systemd/system/telegraf.service.

 

[root@<host> monitor]# rpm -ivh influxdb-1.7.8.x86_64.rpm

Preparing...                          ################################# [100%]

Updating / installing...

   1:influxdb-1.7.8-1                 ################################# [100%]

Created symlink from /etc/systemd/system/influxd.service to /usr/lib/systemd/system/influxdb.service.

Created symlink from /etc/systemd/system/multi-user.target.wants/influxdb.service to /usr/lib/systemd/system/influxdb.service.

 

[root@<host> monitor]# rpm -ivh grafana-6.4.3-1.x86_64.rpm

warning: grafana-6.4.3-1.x86_64.rpm: Header V4 RSA/SHA1 Signature, key ID 24098cb6: NOKEY

Preparing...                          ################################# [100%]

Updating / installing...

   1:grafana-6.4.3-1                  ################################# [100%]

### NOT starting on installation, please execute the following statements to configure grafana to start automatically using systemd

 sudo /bin/systemctl daemon-reload

 sudo /bin/systemctl enable grafana-server.service

### You can start grafana-server by executing

 sudo /bin/systemctl start grafana-server.service

POSTTRANS: Running script

 

[root@<host> monitor]# rpm -ivh kapacitor-1.5.3.x86_64.rpm

Preparing...                          ################################# [100%]

Updating / installing...

   1:kapacitor-1.5.3-1                ################################# [100%]

 

Configuration file locations after installation:

/etc/telegraf/telegraf.conf

/etc/influxdb/influxdb.conf

/etc/grafana/grafana.ini

/etc/kapacitor/kapacitor.conf

 

Log file locations after installation:

/var/log/telegraf/telegraf.log

/var/log/influxdb/influxdb.log

/var/log/grafana/grafana.log

 

Grafana plugin directory:

/var/lib/grafana/plugins

 

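InfluxDB data storage locations:

/var/lib/influxdb/meta  # metadata (raft database)

/var/lib/influxdb/data  # directory where the TSM storage engine keeps its TSM files

/var/lib/influxdb/wal   # directory where the TSM storage engine keeps its WAL files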

 

 

III. Configuration

 

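1. Telegraf configuration (/etc/telegraf/telegraf.conf)

[agent]
  # data collection interval
  interval = "5s"

[[outputs.influxdb]]
  # InfluxDB URL: replace the IP with the address of the server where InfluxDB is installed
  urls = ["http://10.67.31.74:8086"]
  # InfluxDB database name. The default "telegraf" is fine; just create a database named telegraf once InfluxDB is running.
  database = "telegraf"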
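Before editing the full default file, it can help to try these settings in a small standalone config and let Telegraf run its inputs once with --test, which prints the collected metrics to stdout instead of writing them to InfluxDB. This is a sketch; the output path and the single [[inputs.cpu]] section are illustrative choices, not part of the original setup.

```bash
#!/usr/bin/env bash
# Write a minimal standalone Telegraf config with the settings discussed
# above, plus one cpu input so that --test has something to gather.
# Arguments: output path, InfluxDB URL, database name (default: telegraf).
write_min_conf() {
  local path=$1 influx_url=$2 db=${3:-telegraf}
  cat > "$path" <<EOF
[agent]
  interval = "5s"

[[outputs.influxdb]]
  urls = ["$influx_url"]
  database = "$db"

[[inputs.cpu]]
  totalcpu = true
EOF
}

if [ "$#" -ge 2 ]; then
  # Gather once, print the metrics and exit; nothing is written to InfluxDB.
  write_min_conf "$1" "$2" "${3:-telegraf}" &&
    telegraf --config "$1" --test
fi
```

Usage: `bash try_telegraf.sh /tmp/telegraf-min.conf http://10.67.31.74:8086` (script and file names are examples).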

 

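2. InfluxDB configuration (/etc/influxdb/influxdb.conf)

[http]
  # Determines whether HTTP endpoint is enabled. The endpoint receives the data sent by Telegraf for storage and serves the API that Grafana queries.
  enabled = true

  # The bind address used by the HTTP service (the port of the HTTP API).
  bind-address = ":8086"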
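To confirm that the HTTP endpoint is really listening, you can query the /ping endpoint of InfluxDB 1.x, which answers with HTTP status 204 when the server is up. A sketch that polls it for a while after startup; the retry count and the one-second delay are arbitrary choices.

```bash
#!/usr/bin/env bash
# Poll InfluxDB's /ping endpoint until it returns HTTP 204 or give up.
# Arguments: base URL (default http://localhost:8086), number of tries (default 30).
wait_for_influxdb() {
  local url=${1:-http://localhost:8086} tries=${2:-30} code i
  for ((i = 0; i < tries; i++)); do
    # -o /dev/null discards the body; -w prints only the status code
    # (curl prints 000 when it cannot connect at all)
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url/ping")
    if [ "$code" = "204" ]; then
      echo "InfluxDB is up at $url"
      return 0
    fi
    sleep 1
  done
  echo "InfluxDB did not answer at $url after $tries tries" >&2
  return 1
}

if [ "$#" -gt 0 ]; then wait_for_influxdb "$@"; fi
```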

 

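3. Grafana configuration (/etc/grafana/grafana.ini, [server] section)

# The public facing domain name used to access grafana from a browser

domain = 10.67.31.74

# The full public facing url you use in browser, used for redirects and emails

root_url = http://10.67.31.74:3000

In grafana.ini a leading ";" comments a setting out, so remove it from these two lines for the values to take effect. The default login user name and password are both admin; they do not need to be changed here.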
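Because a commented-out key is silently ignored, a scripted edit should both set the value and drop the ";". This is a sketch assuming GNU sed (as on CentOS/RHEL), the default HTTP port 3000, and that domain and root_url each appear once at the start of a line, as in the stock file; it keeps a .bak copy. Restart grafana-server afterwards.

```bash
#!/usr/bin/env bash
# Set domain and root_url in grafana.ini, removing a leading ';' if the
# key is still commented out. Keeps a copy of the original as <ini>.bak.
# Arguments: path to grafana.ini, server IP or host name.
# ASSUMPTIONS: GNU sed, default http_port 3000, each key appears once.
set_grafana_server() {
  local ini=$1 host=$2
  cp -p "$ini" "$ini.bak" || return 1
  sed -i -E \
    -e "s|^;?[[:space:]]*domain[[:space:]]*=.*|domain = $host|" \
    -e "s|^;?[[:space:]]*root_url[[:space:]]*=.*|root_url = http://$host:3000|" \
    "$ini"
}

if [ "$#" -eq 2 ]; then set_grafana_server "$1" "$2"; fi
```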

 

 

IV. Starting the services

 

Start:

systemctl start telegraf

systemctl start influxdb

systemctl start grafana-server

 

Check status:

systemctl status telegraf

systemctl status influxdb

systemctl status grafana-server

 

Stop:

systemctl stop telegraf

systemctl stop influxdb

systemctl stop grafana-server
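The three services can also be driven by one small wrapper. A sketch with a hypothetical function name, monitorctl: it starts InfluxDB before Telegraf so that Telegraf has somewhere to write, stops the services in reverse order, and after a start reports whether each unit is active. Kapacitor is left out, as in the commands above.

```bash
#!/usr/bin/env bash
# Start order: InfluxDB first, so Telegraf can write to it immediately.
SERVICES=(influxdb telegraf grafana-server)

monitorctl() {
  local action=$1 rc=0 i svc
  case "$action" in
    start|restart|enable|status)
      for svc in "${SERVICES[@]}"; do
        systemctl "$action" "$svc" || rc=1
      done ;;
    stop)
      # Reverse order: Grafana, then Telegraf, then InfluxDB.
      for ((i = ${#SERVICES[@]} - 1; i >= 0; i--)); do
        systemctl stop "${SERVICES[i]}" || rc=1
      done ;;
    *)
      echo "usage: monitorctl start|stop|restart|status|enable" >&2
      return 2 ;;
  esac
  # After (re)starting, report the state of every unit.
  if [ "$action" = start ] || [ "$action" = restart ]; then
    for svc in "${SERVICES[@]}"; do
      if systemctl is-active --quiet "$svc"; then
        echo "$svc: active"
      else
        echo "$svc: NOT active" >&2
        rc=1
      fi
    done
  fi
  return "$rc"
}

if [ "$#" -gt 0 ]; then monitorctl "$1"; fi
```

Running it once with enable makes all three services start at boot, which the Grafana package explicitly does not do on installation.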

 

 

V. InfluxDB database setup

 

After InfluxDB has started, set up the user and database:

 

[root@<host> ~]# influx

   Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.

   Connected to http://localhost:8086 version 1.0.2

   InfluxDB shell version: 1.0.2

   > create user "telegraf" with password 'telegraf'

   > show users;

   user     admin

   telegraf false

   > create database telegraf

   > show databases

   name: databases

   ---------------

   name

   _internal

   telegraf

 

# switch to the database

>use telegraf

 

# list all measurements (tables) in the database

>show measurements
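The same setup can be scripted with the influx CLI's -execute option, which runs a single statement and exits. A sketch; the defaults are the example values from the session above (pick a real password), and if you later switch Telegraf to another database name, pass that name as the first argument.

```bash
#!/usr/bin/env bash
# Non-interactive version of the influx session above.
# Arguments: database (default telegraf), user (default telegraf),
#            password (default telegraf, example value only).
init_influxdb() {
  local db=${1:-telegraf} user=${2:-telegraf} pass=${3:-telegraf}
  # In InfluxDB 1.x, CREATE DATABASE does nothing if the database exists.
  influx -execute "CREATE DATABASE \"$db\"" || return 1
  influx -execute "CREATE USER \"$user\" WITH PASSWORD '$pass'" || return 1
  influx -execute "SHOW DATABASES"
}

if [ "$#" -gt 0 ]; then init_influxdb "$@"; fi
```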

 

 

VI. Using Grafana

 

Log in to Grafana at:

http://10.67.31.74:3000

User name / password: admin/admin

 

After logging in, add a data source.

 

Data source settings: choose InfluxDB as the type, set the URL to http://10.67.31.74:8086 and the database to telegraf.
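The data source can also be created through Grafana's HTTP API (POST /api/datasources) instead of the web form. A sketch, assuming the default admin/admin login still works; the data source name "InfluxDB" is a hypothetical choice.

```bash
#!/usr/bin/env bash
# Create an InfluxDB data source through Grafana's HTTP API.
# Arguments: Grafana URL, InfluxDB URL, database name.
# ASSUMPTION: admin/admin is still the Grafana admin login.
create_influx_datasource() {
  local grafana=${1:-http://localhost:3000}
  local influx=${2:-http://localhost:8086}
  local db=${3:-telegraf}
  local body
  # access "proxy": Grafana's backend, not the browser, talks to InfluxDB
  body=$(printf '{"name":"InfluxDB","type":"influxdb","access":"proxy","url":"%s","database":"%s"}' \
    "$influx" "$db")
  curl -fsS -u admin:admin \
    -H 'Content-Type: application/json' \
    -X POST "$grafana/api/datasources" \
    -d "$body"
}

if [ "$#" -gt 0 ]; then create_influx_datasource "$@"; fi
```

Usage: `bash create_ds.sh http://10.67.31.74:3000 http://10.67.31.74:8086 telegraf` (the script name is an example).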

I had downloaded a suitable dashboard file in advance and imported it directly, choosing server-single_rev3.json.

 

 

Then give the dashboard a name of your choice, select the InfluxDB data source, and click Import.

 

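Once the import succeeds, the dashboard can be configured.

The server-single_rev3.json dashboard expects particular measurements and fields, so Telegraf has to be reconfigured: edit telegraf.conf on the Linux server as shown below. Note that this configuration writes to the servermonitor database, so the Grafana data source must point to servermonitor rather than telegraf.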

 

Reconfigured telegraf.conf:

--------------------------------------------------------------------------------------------------------

[global_tags]
  host = "$HOSTNAME"
  ## note: each client must be configured with its own host name

[agent]
  interval = "5m"

[[outputs.influxdb]]
  urls = ["http://mydomain.invalid:8086"]
  database = "servermonitor"

[[inputs.cpu]]
  percpu = false
  totalcpu = true
  collect_cpu_time = true
  fielddrop = ["time_guest","time_guest_nice","time_irq","time_nice","time_softirq","time_steal","usage_guest","usage_guest_nice","usage_irq","usage_nice","usage_softirq","usage_steal"]
  interval = "2s"

[[inputs.disk]]
  mount_points = ["/","/var","/data"]
  fielddrop = ["used","inodes_used"]

[[inputs.mem]]
  fielddrop = ["active","buffered","cached","free","inactive","used","used_percent"]

[[inputs.processes]]

[[inputs.swap]]
  fielddrop = ["free","total"]

[[inputs.system]]
  fielddrop = ["n_users","uptime_format"]

[[inputs.nstat]]
  interval = "2s"
  #proc_net_netstat = "" # this is of interest.
  ## note: I was not sure what value this needs, so it is commented out; setting it to an empty string stops telegraf from starting
  fieldpass = ["IpExtOutOctets","IpExtInOctets"]

 

Restart telegraf after editing telegraf.conf.
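To avoid restarting into a broken configuration, the file can first be checked with telegraf --test, which loads the config, gathers the inputs once and exits. A sketch; it only catches the problems that --test itself reports, such as a config that fails to parse.

```bash
#!/usr/bin/env bash
# Check telegraf.conf with --test, then restart telegraf only if the check passed.
# Argument: config path (default /etc/telegraf/telegraf.conf).
apply_telegraf_conf() {
  local conf=${1:-/etc/telegraf/telegraf.conf}
  if ! telegraf --config "$conf" --test > /dev/null; then
    echo "telegraf --test failed for $conf; telegraf not restarted" >&2
    return 1
  fi
  systemctl restart telegraf || return 1
  if systemctl is-active --quiet telegraf; then
    echo "telegraf restarted with $conf"
  else
    echo "telegraf is not active; check /var/log/telegraf/telegraf.log" >&2
    return 1
  fi
}

if [ "$#" -gt 0 ]; then apply_telegraf_conf "$1"; fi
```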

 

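Restart telegraf, influxdb and grafana (with a script or by hand), then log in to Grafana again and the dashboard shows the collected metrics. Keep the panels you want to monitor and delete the rest.

The advantage of this dashboard is that the hostname selector in the top-left corner lets you switch between servers at any time and view each server's metrics.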

 

 

The queries behind each panel of this dashboard, exported:

CPU:

SELECT mean("n_cpus") FROM "system" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(none)

SELECT mean("usage_system") FROM "cpu" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(none)

SELECT mean("usage_user") FROM "cpu" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(none)

SELECT mean("usage_iowait") FROM "cpu" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(none)

 

RAM:

SELECT mean("available") FROM "mem" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)

SELECT mean("total") FROM "mem" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)

 

swap:

SELECT derivative(mean("in"), 1s) FROM "swap" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)

SELECT derivative(mean("out"), 1s) FROM "swap" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)

SELECT mean("used_percent") FROM "swap" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)

 

Disk:

SELECT mean("total") FROM "disk" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "path" fill(null)

SELECT mean("free") FROM "disk" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "path" fill(null)

SELECT mean("inodes_total") FROM "disk" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "path" fill(null)

SELECT mean("inodes_free") FROM "disk" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "path" fill(null)

 

Processes:

SELECT mean("total") FROM "processes" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval) fill(null)

SELECT mean("running") FROM "processes" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval) fill(null)

SELECT mean("blocked") FROM "processes" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval) fill(null)

SELECT mean("stopped") FROM "processes" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval) fill(null)

SELECT max("blocked") FROM "processes" WHERE $timeFilter GROUP BY time($interval), "host" fill(null)
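These queries contain Grafana template variables ($host, $timeFilter, $interval), so they cannot be pasted into the influx shell as they are. A sketch of a helper that substitutes fixed values; rendering $timeFilter as "time > now() - <range>" and the defaults of 1h and 1m are assumptions that only approximate what Grafana sends.

```bash
#!/usr/bin/env bash
# Replace Grafana's template variables in a panel query with fixed values,
# so the query can be run directly with influx -execute.
# Arguments: query, host name, time range (default 1h), GROUP BY interval (default 1m).
expand_grafana_query() {
  local q=$1 host=$2 range=${3:-1h} interval=${4:-1m}
  # Literal variable names to replace (single quotes keep the $ literal).
  local tf='$timeFilter' iv='$interval' hv='$host'
  q=${q//"$tf"/"time > now() - $range"}
  q=${q//"$iv"/"$interval"}
  q=${q//"$hv"/"$host"}
  printf '%s\n' "$q"
}
```

Usage, with a hypothetical host name web01 and the database from the reconfigured telegraf.conf:

    q='SELECT mean("usage_user") FROM "cpu" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(none)'
    influx -database servermonitor -execute "$(expand_grafana_query "$q" web01)"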

 

 

VII. Scripts

 

The attached one-step start and stop scripts, for reference, are in:

/home/ldw/monitor/script

 

Script contents, for reference:

start.sh

ssh root@<server-ip> 'systemctl start telegraf' &
ssh root@<server-ip> 'systemctl start influxdb' &
ssh root@<server-ip> 'systemctl start grafana-server' &
ssh root@<client-ip> 'systemctl start telegraf' &
wait

stop.sh

ssh root@<server-ip> 'systemctl stop telegraf' &
ssh root@<server-ip> 'systemctl stop influxdb' &
ssh root@<server-ip> 'systemctl stop grafana-server' &
ssh root@<client-ip> 'systemctl stop telegraf' &
wait
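The two scripts differ only in the systemctl action, so they can be merged into one script that takes the action as an argument and waits for every ssh session to finish. A sketch: the host names are placeholders under the reserved .example domain, it assumes key-based ssh login as root, and it relies on systemctl accepting several unit names in one call.

```bash
#!/usr/bin/env bash
# One script for start/stop/restart/status on the monitoring server and clients.
# Placeholder hosts: override with SERVER=... and CLIENTS="host1 host2".
SERVER=${SERVER:-root@server.example}
read -r -a CLIENTS <<< "${CLIENTS:-root@client1.example}"

remote_ctl() {
  local action=$1 rc=0 pid host
  local pids=()
  case "$action" in
    start|stop|restart|status) ;;
    *) echo "usage: remote_ctl start|stop|restart|status" >&2; return 2 ;;
  esac
  # The server runs all three services; each client runs only telegraf.
  ssh "$SERVER" "systemctl $action influxdb telegraf grafana-server" &
  pids+=("$!")
  for host in "${CLIENTS[@]}"; do
    ssh "$host" "systemctl $action telegraf" &
    pids+=("$!")
  done
  # Wait for every ssh session and remember whether any of them failed.
  for pid in "${pids[@]}"; do
    wait "$pid" || rc=1
  done
  return "$rc"
}

if [ "$#" -gt 0 ]; then remote_ctl "$1"; fi
```

Usage: `CLIENTS="root@host1 root@host2" bash monitor.sh start` (script name and hosts are examples).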