Installing Hadoop 2.7.6 + Spark 2.3.1 on CentOS 7

Prerequisites:
installation packages for JDK 1.8, Scala 2.11.8, Hadoop 2.7.6, and Spark 2.3.1

1. Install the JDK

Extract jdk-8u181-linux-x64.tar.gz into /usr/local/java/.
Edit the configuration file: vi /etc/profile

export JAVA_HOME=/usr/local/java/jdk1.8.0_181
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib/
export PATH=$PATH:$JAVA_HOME/bin

After saving, run source /etc/profile to make the changes take effect.
Verify the installation:
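If the profile took effect, a version check should report the new JDK (the exact string depends on the package, here jdk-8u181):

java -version
# expected to report roughly: java version "1.8.0_181"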

2. Disable the firewall and SELinux

Run the following commands:

systemctl stop firewalld && systemctl disable firewalld
setenforce 0

Edit the SELinux config: vim /etc/selinux/config

SELINUX=disabled

Reboot the server: reboot
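
After the reboot, a quick sanity check confirms both settings stuck (output wording may differ slightly between CentOS releases):

systemctl is-active firewalld   # should print "inactive" (or "unknown")
getenforce                      # should print "Disabled"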

3. Install Scala

Extract scala-2.11.8.tgz into /usr/local/scala/.
Edit /etc/profile:

# Append at the end of the file
export SCALA_HOME=/usr/local/scala/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin

After saving, run source /etc/profile to make the changes take effect.
Verify the installation:
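As with the JDK, a simple version check confirms the Scala setup:

scala -version
# expected to report roughly: Scala code runner version 2.11.8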

4. Change the hostname

Check the current hostname:

hostname

To rename the machine, edit /etc/hostname: vi /etc/hostname

spark1

Reboot.
Clone this machine three times and name the clones spark2, spark3, and spark4.
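
On CentOS 7 the rename can also be done without editing the file, for example (run the matching command on each clone):

hostnamectl set-hostname spark1   # use spark2/spark3/spark4 on the respective clones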

5. Edit the /etc/hosts file

192.168.xxx.xxx	spark1
192.168.xxx.xxx	spark2
192.168.xxx.xxx	spark3
192.168.xxx.xxx	spark4

Use the ifconfig command on each machine to find its IP address.
Ping the machines from one another to verify that they can reach each other.
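
A quick connectivity check from any one node (assumes the four hostnames above are already in /etc/hosts):

for h in spark1 spark2 spark3 spark4; do ping -c 1 $h > /dev/null && echo "$h reachable"; done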

6. Configure passwordless SSH login

Generate a key pair:

ssh-keygen -t rsa

The generated keys are placed in /root/.ssh by default.
Append the public key to authorized_keys:

cat id_rsa.pub >> authorized_keys

Run the following on every virtual machine, once for each of the other nodes (replace sparkx with the target hostname):

ssh-copy-id -i sparkx

Test each machine: apart from the first connection, which asks for a password, subsequent logins should go straight through without a password.

ssh spark2
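
To confirm passwordless login to every node at once, a short loop like the following should print each remote hostname without prompting (assumes the keys were copied to all four machines):

for h in spark1 spark2 spark3 spark4; do ssh $h hostname; done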

7. Hadoop cluster

Extract hadoop-2.7.6.tar.gz into /usr/local/hadoop/.
Edit /etc/profile:

export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.6
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib:$JRE_HOME/lib:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$JAVA_HOME/bin

Create the following directories under /usr/local/hadoop (a one-line equivalent follows the commands):

mkdir tmp
mkdir var
mkdir dfs
mkdir dfs/name
mkdir dfs/data
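
Equivalently, all of them can be created in a single command:

mkdir -p /usr/local/hadoop/{tmp,var,dfs/name,dfs/data}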

Configure the Hadoop files under /usr/local/hadoop/hadoop-2.7.6/etc/hadoop.
Edit hadoop-env.sh:

export JAVA_HOME=/usr/local/java/jdk1.8.0_181

Edit core-site.xml:

<configuration>
     <property>
         <name>fs.defaultFS</name>
         <value>hdfs://spark1:9000</value>
     </property>

     <!-- Directory for files Hadoop generates at runtime -->
     <property>
         <name>hadoop.tmp.dir</name>
         <value>file:/usr/local/hadoop/tmp</value>
     </property>
</configuration>

Edit hdfs-site.xml:

        <!-- Secondary NameNode HTTP address -->
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>spark1:50090</value>
        </property>
        <!-- HDFS replication factor -->
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
         <!-- NameNode storage directory -->
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/usr/local/hadoop/dfs/name</value>
        </property>
         <!-- DataNode storage directory -->
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/usr/local/hadoop/dfs/data</value>
        </property>
        <!-- Secondary NameNode address -->
        <property>
             <name>dfs.secondary.http.address</name>
             <value>spark2:50090</value>
         </property>

Configure the slaves file:

spark2
spark3
spark4

Distribute the configured files to the other nodes.
Note: there is no need to create the hadoop directory on the other virtual machines; just send the entire hadoop directory over, repeating the command for each node (a loop version is sketched below).

scp -r /usr/local/hadoop/ root@spark2:/usr/local/hadoop/
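
The same distribution step as a loop, assuming the root user and the hostnames configured earlier:

for node in spark2 spark3 spark4; do
    scp -r /usr/local/hadoop/ root@$node:/usr/local/hadoop/
done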

Run HDFS.
On spark1, format the file system and start HDFS:

hdfs  namenode  -format
start-dfs.sh

This starts one NameNode, one Secondary NameNode, and three DataNodes.
Verify that startup succeeded (initially only spark3 and spark4 were DataNodes; spark2 was added to the slaves file later).
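
A quick way to confirm the daemons is jps on each node, plus a cluster report from the NameNode:

jps                     # spark1 should list NameNode; the slave nodes should list DataNode
hdfs dfsadmin -report   # run on spark1; should show the live DataNodes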
Configure YARN and MapReduce.
Edit mapred-site.xml:

		 <!-- Tell the MapReduce framework to use YARN -->
         <property>
                 <name>mapreduce.framework.name</name>
                 <value>yarn</value>
         </property>
         <property>
                 <name>mapreduce.jobhistory.address</name>
                 <value>spark1:10020</value>
         </property>
         <property>
                 <name>mapreduce.jobhistory.webapp.address</name>
                 <value>spark1:19888</value>
         </property>

Edit yarn-site.xml:

		<!-- Node that runs the ResourceManager -->
 		<!-- Site specific YARN configuration properties -->
         <property>
          <name>yarn.resourcemanager.hostname</name>
                 <value>spark1</value>
         </property>
          <!-- Reducers fetch data via mapreduce_shuffle -->
         <property>
                 <name>yarn.nodemanager.aux-services</name>
                 <value>mapreduce_shuffle</value>
         </property>

Edit yarn-env.sh:

export JAVA_HOME=/usr/local/java/jdk1.8.0_181

Start YARN; this launches the ResourceManager and the NodeManagers:

start-yarn.sh

Verify: the ResourceManager runs on the master (spark1), and a NodeManager runs on each slave.
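
Besides jps, YARN itself can report the registered NodeManagers (run on spark1):

yarn node -list   # should show spark2, spark3 and spark4 in RUNNING state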
Open the web UIs for details:
spark1:50070
spark1:8088

8. Spark cluster

Extract spark-2.3.1-bin-hadoop2.7.tgz into /usr/local/spark.
Edit /etc/profile:

export SPARK_HOME=/usr/local/spark/spark-2.3.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Run source /etc/profile to apply the changes.
Edit the configuration files under /usr/local/spark/spark-2.3.1-bin-hadoop2.7/conf:

cp spark-env.sh.template spark-env.sh

Edit spark-env.sh:

export JAVA_HOME=/usr/local/java/jdk1.8.0_181
export SCALA_HOME=/usr/local/scala/scala-2.11.8
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.6
export HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-2.7.6/etc/hadoop
export SPARK_HOME=/usr/local/spark/spark-2.3.1-bin-hadoop2.7
export SPARK_WORKER_MEMORY=1G
export SPARK_WORKER_CORES=2
export SPARK_MASTER_IP=spark1
export SPARK_MASTER_PORT=7077

Copy and edit the slaves file:

cp slaves.template slaves

It should list the worker nodes:

spark2
spark3
spark4

Copy the configuration to each node (repeat both commands for spark3 and spark4):

scp -r /usr/local/spark/ root@spark2:/usr/local/spark/
scp -r /etc/profile root@spark2:/etc/profile

Start the Spark cluster (run start-all.sh from $SPARK_HOME/sbin, not Hadoop's script of the same name):

./start-all.sh

Startup succeeded.
Open http://spark1:8080/ in a browser.

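As an optional smoke test, the bundled SparkPi example can be submitted to the new master (a sketch; the jar path below matches the spark-2.3.1-bin-hadoop2.7 layout):

spark-submit --master spark://spark1:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.3.1.jar 100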
Finally, the complete /etc/profile configuration is:

export JAVA_HOME=/usr/local/java/jdk1.8.0_181
export SCALA_HOME=/usr/local/scala/scala-2.11.8
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.6
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
export SPARK_HOME=/usr/local/spark/spark-2.3.1-bin-hadoop2.7
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib:$JRE_HOME/lib:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$JAVA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
