Apache Flume 1.9: Installation, Configuration, and Testing (log2hive)
Apache Flume: 1.9.0
1. Download
wget https://mirrors.tuna.tsinghua.edu.cn/apache/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
2. Extract
tar -zxvf apache-flume-1.9.0-bin.tar.gz
3. Data flow
4. Create a Hive target table
create table action_log
(id string,
write_date string,
name string)
COMMENT 'click action log'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
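Because the table is a comma-delimited TEXTFILE, every line of every file under its HDFS directory becomes one row, split positionally into (id, write_date, name). A quick local sketch of the expected line format (the sample file path is arbitrary):

```shell
# One CSV line = one table row; commas separate the three columns
echo '1,2019-12-13 00:00:00,Raymond' > /tmp/action_log_sample.csv
# The third comma-separated field maps to the name column
awk -F',' '{print $3}' /tmp/action_log_sample.csv   # prints: Raymond
```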
5. Check the table's HDFS directory
hdfs://bigdata-dev1.nexttao:8020/warehouse/tablespace/managed/hive/flume.db/action_log
6. Insert a test row
insert into action_log values ('1','2019-12-13 00:00:00','Raymond');
Insert another row via HDFS:
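The "insert via HDFS" step can be sketched as follows, assuming the warehouse path from step 5; the local file name test.csv is arbitrary:

```shell
# Write one CSV row locally, then drop the file straight into the
# table's warehouse directory; Hive picks it up on the next query.
echo '2,2019-12-13 00:00:00,Tom' > test.csv
hdfs dfs -put test.csv \
  hdfs://bigdata-dev1.nexttao:8020/warehouse/tablespace/managed/hive/flume.db/action_log/
```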
7. Write a simple Flume configuration file
# agent1 is the agent name
agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1
# Configure source1
agent1.sources.source1.type=TAILDIR
agent1.sources.source1.filegroups = f1
agent1.sources.source1.filegroups.f1 = /data/log/tracy/.*log.*
agent1.sources.source1.channels=channel1
agent1.sources.source1.fileHeader = false
# Add an interceptor
agent1.sources.source1.interceptors = i1
# Timestamp interceptor: adds a timestamp header to each event,
# which the %Y-%m-%d escape in hdfs.filePrefix below depends on
agent1.sources.source1.interceptors.i1.type = timestamp
# Configure channel1
agent1.channels.channel1.type=file
agent1.channels.channel1.checkpointDir=/data/flume/tracy/checkpointDir
agent1.channels.channel1.dataDirs=/data/flume/tracy/dataDirs
# Configure sink1
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=hdfs://bigdata-dev1.nexttao:8020/warehouse/tablespace/managed/hive/flume.db/action_log
# DataStream writes the raw event data, similar to a plain text file
agent1.sinks.sink1.hdfs.fileType=DataStream
# Write only the event body
agent1.sinks.sink1.hdfs.writeFormat=Text
# Seconds before rolling a new HDFS file; 0 disables time-based rolling
# (1 rolls every second: fine for a test, too many small files for production)
agent1.sinks.sink1.hdfs.rollInterval=1
agent1.sinks.sink1.channel=channel1
agent1.sinks.sink1.hdfs.filePrefix=%Y-%m-%d
8. Start flume-ng
./flume-ng agent -n agent1 -c ../conf -f ../conf/log2hive.properties -Dflume.root.logger=DEBUG,console
9. Startup error
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
Fix: replace the bundled guava jar. Flume 1.9 ships guava-11.0.2, which does not have this checkArgument overload; swap in the newer guava used by Hadoop 3.x.
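A sketch of the guava swap, assuming a Hadoop 3.x install; the exact guava version and $HADOOP_HOME layout may differ on your cluster:

```shell
cd apache-flume-1.9.0-bin
# Keep the old jar around in case of rollback
mv lib/guava-11.0.2.jar lib/guava-11.0.2.jar.bak
# Copy in the newer guava shipped with Hadoop 3.x
# (the 27.0-jre version here is an example)
cp $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar lib/
```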
10. Add a new log file in the monitored directory
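For example, appending one CSV row to a file that matches the TAILDIR pattern `/data/log/tracy/.*log.*` from the config:

```shell
# TAILDIR tails matching files and ships each new line as one event
echo '3,2019-12-13 01:00:00,Lucy' >> /data/log/tracy/1.log
```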
11. Check the Flume log
It shows the data has been written to HDFS.
12. Check HDFS
13. Check the Hive data
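Checking the Hive data is a plain query against the target table; the beeline JDBC URL below is an assumption for this cluster:

```shell
# Query the db-qualified table (database flume, table action_log)
beeline -u jdbc:hive2://bigdata-dev1.nexttao:10000 \
  -e 'select * from flume.action_log;'
```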
14. Append another record to 1.log
Check Hive:
Summary: this is a minimal end-to-end walkthrough, from downloading Flume to a working log2hive configuration. It only proves the pipeline runs; tuning, load testing, and so on still need to follow.