Using Flume to Collect System Logs and Write Them to HDFS
1. Environment Preparation
A basic Hadoop environment with the HDFS distributed file system has already been set up in a previous post:
https://blog.****.net/lsysafe/article/details/105250714
Download Flume:
http://archive.apache.org/dist/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
2. Installation and Configuration
Installing Flume is just a matter of extracting the archive and setting JAVA_HOME.
[root@host hadoop]# tar -zxvf /shared/app/install/tar.gz/apache-flume-1.8.0-bin.tar.gz -C /hadoop/
[root@host conf]# echo $JAVA_HOME
/usr/local/java
After extraction, go to the conf directory and edit the configuration:
[root@host conf]# cp flume-env.sh.template flume-env.sh
The configured file contents:
[root@host conf]# cat /hadoop/apache-flume-1.8.0-bin/conf/flume-env.sh | grep -v ^# | grep -v ^$
export JAVA_HOME=/usr/local/java
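As a quick sanity check (an optional step, not in the original post), flume-ng can report its version; if the extraction and JAVA_HOME are correct, this should print 1.8.0:

cd /hadoop/apache-flume-1.8.0-bin
bin/flume-ng version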
3. Configure What to Collect and How
Copy the HDFS configuration files core-site.xml and hdfs-site.xml into Flume's conf directory:
[root@host hadoop]# cp core-site.xml /hadoop/apache-flume-1.8.0-bin/conf/
[root@host hadoop]# cp hdfs-site.xml /hadoop/apache-flume-1.8.0-bin/conf/
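Copying the two files lets the HDFS sink read fs.defaultFS and the client settings straight from the cluster configuration. The sink also needs the Hadoop client jars; the flume-ng launcher script looks for a hadoop command on the PATH when building its classpath, so it is worth confirming Hadoop is reachable on this node (assumed here because the previous post installed it on the same machine):

hadoop version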
Then write the configuration file that collects system logs into HDFS:
[root@host conf]# pwd
/hadoop/apache-flume-1.8.0-bin/conf
The finished .conf file simply ships the output of tail -f /var/log/messages, for testing:
[root@host conf]# cat logtohdfs.conf
# Name the agent's source, channel, and sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: run a command and turn each line of its stdout into an event
a1.sources.r1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /var/log/messages

# Channel: buffer events in memory (fast, but lost if the agent dies)
a1.channels.c1.type = memory

# Sink: write events to HDFS
a1.sinks.k1.channel = c1
a1.sinks.k1.type = hdfs
# Timestamped directory; with the 24-hour rounding below this
# effectively becomes one directory per day
a1.sinks.k1.hdfs.path = /message_%y.%m.%d.%H.%M.%S
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 24
a1.sinks.k1.hdfs.roundUnit = hour
# Roll to a new file every 1000 s, 1024 bytes, or 10 events, whichever comes first
a1.sinks.k1.hdfs.rollInterval = 1000
a1.sinks.k1.hdfs.rollSize = 1024
a1.sinks.k1.hdfs.rollCount = 10
# Use the agent host's clock for the %y.%m.%d... escapes
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Plain text files (not SequenceFile), named systems.<counter>.log
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = systems
a1.sinks.k1.hdfs.fileSuffix = .log
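One caveat about this setup: the exec source does not remember how far it has read, so a tail -f pipeline can drop or re-read lines whenever the agent restarts. Flume 1.7+ ships a TAILDIR source that records its read offset in a JSON position file; a minimal sketch that would replace the source lines above (the positionFile location is an arbitrary choice for this install):

a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
a1.sources.r1.positionFile = /hadoop/apache-flume-1.8.0-bin/taildir_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /var/log/messages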
4. Start the Agent to Collect Logs
[root@host apache-flume-1.8.0-bin]# bin/flume-ng agent -c conf -f conf/logtohdfs.conf -n a1 -Dflume.root.logger=INFO,console
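This keeps the agent in the foreground with its log on the console, which is convenient for testing. For a longer-running collector, a common pattern (not from the original post) is to background it and capture the log to a file:

nohup bin/flume-ng agent -c conf -f conf/logtohdfs.conf -n a1 > flume.log 2>&1 &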
5. Verify and Inspect the Generated Files and Their Contents
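The post ends here; one minimal way to verify, assuming the hdfs client is on the PATH and that syslog on this CentOS-style host writes to /var/log/messages (the sink's directory name depends on the rounded timestamp, so wildcards are used below):

logger "flume hdfs sink test"
hdfs dfs -ls /
hdfs dfs -cat /message_*/systems.*.log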