Using Flume to Collect System Logs and Write Them to HDFS

1. Environment Preparation

A basic Hadoop environment with the HDFS distributed filesystem was already set up in an earlier post:

https://blog.****.net/lsysafe/article/details/105250714

Download Flume:

http://archive.apache.org/dist/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
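For example, the tarball can be fetched straight into the staging directory used below (assuming the server has outbound network access; any download location works):

wget http://archive.apache.org/dist/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz -P /shared/app/install/tar.gz/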


2. Installation and Configuration

Installation is simply unpacking the tarball and setting JAVA_HOME:

[[email protected] hadoop]# tar -zxvf /shared/app/install/tar.gz/apache-flume-1.8.0-bin.tar.gz -C /hadoop/

[[email protected] conf]# echo $JAVA_HOME
/usr/local/java

After unpacking, go to the conf directory and adjust the configuration:

[[email protected] conf]#  cp flume-env.sh.template flume-env.sh

The file after configuration:

[[email protected] conf]# cat /hadoop/apache-flume-1.8.0-bin/conf/flume-env.sh | grep -v ^# | grep -v ^$
export JAVA_HOME=/usr/local/java
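A quick way to confirm the unpacked binary runs at this point (this checks nothing about HDFS yet):

bin/flume-ng version        # run from /hadoop/apache-flume-1.8.0-bin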


3. Configure What to Collect and How

Copy the HDFS configuration files core-site.xml and hdfs-site.xml into Flume's conf directory:

[[email protected] hadoop]# cp core-site.xml /hadoop/apache-flume-1.8.0-bin/conf/
[[email protected] hadoop]# cp hdfs-site.xml /hadoop/apache-flume-1.8.0-bin/conf/
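Flume's HDFS sink relies on the Hadoop client libraries, which the flume-ng launcher picks up from the local Hadoop installation, so it is worth confirming from this node that HDFS itself is reachable, e.g.:

hdfs dfs -ls /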


Write the configuration file that collects the system log into HDFS:

[[email protected] conf]# pwd
/hadoop/apache-flume-1.8.0-bin/conf


The finished .conf file below simply uses the output of tail -f /var/log/messages as the source, which is convenient for testing:


[[email protected] conf]# cat logtohdfs.conf
# Name the components of agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: tail the system log and turn each new line into an event
a1.sources.r1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /var/log/messages

# Channel: buffer events in memory (fast, but events are lost on restart)
a1.channels.c1.type = memory

# Sink: write events to HDFS as plain text
a1.sinks.k1.channel = c1
a1.sinks.k1.type = hdfs
# Target directory; the escapes expand to the (rounded) event timestamp
a1.sinks.k1.hdfs.path = /message_%y.%m.%d.%H.%M.%S
# Round the timestamp down to a 24-hour boundary, i.e. one directory per day
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 24
a1.sinks.k1.hdfs.roundUnit = hour
# Roll (close) the current file after 1000 s, 1024 bytes, or 10 events,
# whichever comes first
a1.sinks.k1.hdfs.rollInterval = 1000
a1.sinks.k1.hdfs.rollSize = 1024
a1.sinks.k1.hdfs.rollCount = 10
# Use the local clock for the time escapes (no timestamp header needed)
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Plain text output; file names use the systems prefix and .log suffix
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = systems
a1.sinks.k1.hdfs.fileSuffix = .log
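The three roll* settings are independent triggers: the current file is closed as soon as any one of them fires, so the small values above will produce many tiny files (fine for a test, not for production). As a sketch of a time-only policy, setting the other two triggers to 0 disables them:

a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0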


4. Start the Agent to Collect Logs

[[email protected] apache-flume-1.8.0-bin]# bin/flume-ng agent -c conf -f conf/logtohdfs.conf -n a1  -Dflume.root.logger=INFO,console
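Here -c points at the conf directory (for flume-env.sh and logging settings), -f names the agent configuration file, -n must match the agent name used as the property prefix in that file (a1), and the -D option sends Flume's own log output to the console. For an unattended test run, the same command can be pushed to the background, for example:

nohup bin/flume-ng agent -c conf -f conf/logtohdfs.conf -n a1 > flume.log 2>&1 &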



5. Verify and View the Generated Files and Their Contents

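A minimal check, assuming the path pattern configured above: write a test line into /var/log/messages with the standard logger utility, then list and read what the sink created (files still being written carry a .tmp suffix):

logger "flume hdfs sink test"
hdfs dfs -ls /
hdfs dfs -ls /message_*
hdfs dfs -cat /message_*/systems*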