Spark Installation and Word Count
I. Standalone Mode Installation
1. Upload and extract the Spark installation package
[user@hadoop101 software]$ tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz -C /usr/local/
[user@hadoop101 software]$ mv /usr/local/spark-2.4.4-bin-hadoop2.7 /usr/local/spark
2. Enter the conf folder under the Spark installation directory
[user@hadoop101 software]$ cd /usr/local/spark/conf/
3. Rename the configuration file templates
[user@hadoop101 conf]$ mv slaves.template slaves
[user@hadoop101 conf]$ mv spark-env.sh.template spark-env.sh
4. Edit the slaves file and add the worker nodes (list only the two worker machines):
[user@hadoop101 conf]$ vim slaves
hadoop102
hadoop103
5. Edit the spark-env.sh file and add the following configuration:
[user@hadoop101 conf]$ vim spark-env.sh
SPARK_MASTER_HOST=hadoop101
SPARK_MASTER_PORT=7077   # service port
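If you also want to cap the resources each worker offers, spark-env.sh accepts further variables. A minimal sketch; the values below are illustrative, not from the original setup:
SPARK_WORKER_CORES=2     # CPU cores a worker may hand to executors
SPARK_WORKER_MEMORY=2g   # total memory a worker may hand to executors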
6. Add the following to the spark-config.sh file in the sbin directory (daemons launched over SSH by start-all.sh do not inherit a login shell's environment, so JAVA_HOME must be set here or the remote workers may fail to start):
export JAVA_HOME=/usr/local/jdk1.8.0_91
7. Distribute the Spark directory to the other nodes
[user@hadoop101 local]$ scp -r spark hadoop102:/usr/local
[user@hadoop101 local]$ scp -r spark hadoop103:/usr/local
8. Start the cluster
[user@hadoop101 spark]$ sbin/start-all.sh
Check the Master web UI at hadoop101:8080
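To confirm the daemons actually came up, jps should list a Master process on hadoop101 and a Worker process on hadoop102 and hadoop103 (the process IDs below are illustrative):
[user@hadoop101 spark]$ jps
3456 Master
[user@hadoop102 ~]$ jps
2345 Worker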
II. Word Count
1. Create the input file
[root@hadoop101 spark]# mkdir input
[root@hadoop101 spark]# cd input
[root@hadoop101 input]# vim hello.txt
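The contents of hello.txt are up to you; as a running example, assume it contains:
hello world
hello spark
hello scala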
[root@hadoop101 input]# scp -r /usr/local/spark/input hadoop102:/usr/local/spark/input
[root@hadoop101 input]# scp -r /usr/local/spark/input hadoop103:/usr/local/spark/input
Note: a file:// prefix indicates the local file system. Make sure the file exists on every node, since each executor reads it from its own local disk.
2. Launch the Spark shell
/usr/local/spark/bin/spark-shell \
--master spark://hadoop101:7077 \
--executor-memory 1g \
--total-executor-cores 2
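The --master URL attaches the shell to the standalone master, while --executor-memory and --total-executor-cores bound what this application consumes. Once the scala> prompt appears, a quick sanity check (not part of the original steps) verifies that executors are reachable:
scala> sc.parallelize(1 to 100).sum
res0: Double = 5050.0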
3. Run tests at the scala> prompt
1) Test against the local file
scala> sc.textFile("input/hello.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect
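With the sample hello.txt assumed above, the result would look roughly like this (pair order may vary across runs):
res1: Array[(String, Int)] = Array((scala,1), (hello,3), (world,1), (spark,1))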
2) Test against a file on HDFS
scala> sc.textFile("hdfs://hadoop101:9000/hello/hello.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect
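To persist the counts instead of collecting them to the driver, the same pipeline can end in saveAsTextFile. A sketch; the output path here is an assumption, and it must not already exist on HDFS:
scala> sc.textFile("hdfs://hadoop101:9000/hello/hello.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortBy(_._2, false).saveAsTextFile("hdfs://hadoop101:9000/hello/output")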