spark_scala

Simple data analysis with Hadoop's hadoop-mapreduce-examples-2.7.6.jar:

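First, install Hadoop, the JDK, and the other required software using Xshell and Xftp, and confirm that everything runs.

In Xshell, run su hadoop to switch to the hadoop user, then create the file data.txt in /home/hadoop.

Edit data.txt, enter the three lines "hello python hadoop", "i am student", and "i am teacher", then save and exit.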


View the file contents:

[[email protected] ~]$ cat data.txt
hello python hadoop 
i am student 
i am teacher
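
If you would rather not open an editor, the same file can be written in one step. This is only a sketch: it assumes you are already in the directory where data.txt should live, and it overwrites any existing data.txt there.

```shell
# Write the three sample lines to data.txt in the current directory
# (overwrites an existing data.txt)
cat > data.txt <<'EOF'
hello python hadoop
i am student
i am teacher
EOF

# Confirm the contents
cat data.txt
```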

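Upload data.txt to HDFS, which can be browsed through the NameNode web UI at python5:50070. MapReduce generally analyzes data stored on the cluster, so the file has to be uploaded first. With no destination given, -put places the file in the user's HDFS home directory, here /user/hadoop: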

[[email protected] ~]$ hadoop fs -put data.txt
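
The upload above relies on the default destination. A slightly more explicit variant, sketched below, names the target directory and lists it afterwards. upload_to_hdfs is a hypothetical helper name, and the sketch assumes the hadoop command is on PATH.

```shell
# Upload a local file to an explicit HDFS directory, then list that directory.
# upload_to_hdfs is a hypothetical helper; it assumes `hadoop` is on PATH.
upload_to_hdfs() {
  local_file=$1
  hdfs_dir=$2
  # create the target directory if it is missing
  hadoop fs -mkdir -p "$hdfs_dir" || return 1
  # -f overwrites an existing copy of the file
  hadoop fs -put -f "$local_file" "$hdfs_dir/" || return 1
  # list the directory to confirm the upload
  hadoop fs -ls "$hdfs_dir"
}

# Usage (on the cluster):
#   upload_to_hdfs data.txt /user/hadoop
```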

Locate the MapReduce jar:

[[email protected] ~]$ cd hadoop-2.7.6/

[[email protected] hadoop-2.7.6]$ ll   (list the directory contents)
total 116
drwxr-xr-x. 2 hadoop hadoop   194 Apr 18 09:39 bin
drwxr-xr-x. 3 hadoop hadoop    20 Apr 18 09:39 etc
drwxr-xr-x. 2 hadoop hadoop   106 Apr 18 09:39 include
drwxr-xr-x. 3 hadoop hadoop    20 Apr 18 09:39 lib
drwxr-xr-x. 2 hadoop hadoop   239 Apr 18 09:39 libexec
-rw-r--r--. 1 hadoop hadoop 86424 Apr 18 09:39 LICENSE.txt
drwxrwxr-x  3 hadoop hadoop  4096 May 12 18:29 logs
-rw-r--r--. 1 hadoop hadoop 14978 Apr 18 09:39 NOTICE.txt
-rw-r--r--. 1 hadoop hadoop  1366 Apr 18 09:39 README.txt
drwxr-xr-x. 2 hadoop hadoop  4096 Apr 18 09:39 sbin
drwxr-xr-x. 4 hadoop hadoop    31 Apr 18 09:39 share

Enter the share directory and find hadoop:

[[email protected] hadoop-2.7.6]$ cd share/
[[email protected] share]$ ll
total 0
drwxr-xr-x. 4 hadoop hadoop 42 Apr 28 18:35 doc
drwxr-xr-x. 9 hadoop hadoop 99 Apr 18 09:39 hadoop

Enter the hadoop directory and find mapreduce:

[[email protected] share]$ cd hadoop/
[[email protected] hadoop]$ ll
total 8
drwxr-xr-x. 6 hadoop hadoop  158 Apr 18 09:39 common
drwxr-xr-x. 7 hadoop hadoop  174 Apr 18 09:39 hdfs
drwxr-xr-x. 3 hadoop hadoop   20 Apr 18 09:39 httpfs
drwxr-xr-x. 3 hadoop hadoop   20 Apr 18 09:39 kms
drwxr-xr-x. 5 hadoop hadoop 4096 Apr 18 09:39 mapreduce
drwxr-xr-x. 5 hadoop hadoop   43 Apr 18 09:39 tools
drwxr-xr-x. 5 hadoop hadoop 4096 Apr 18 09:39 yarn

Enter the mapreduce directory (cd mapreduce/) and list its contents with ll:

[[email protected] mapreduce]$ ll
total 5000
-rw-r--r--. 1 hadoop hadoop  545480 Apr 18 09:39 hadoop-mapreduce-client-app-2.7.6.jar
-rw-r--r--. 1 hadoop hadoop  776862 Apr 18 09:39 hadoop-mapreduce-client-common-2.7.6.jar
-rw-r--r--. 1 hadoop hadoop 1558777 Apr 18 09:39 hadoop-mapreduce-client-core-2.7.6.jar
-rw-r--r--. 1 hadoop hadoop  191915 Apr 18 09:39 hadoop-mapreduce-client-hs-2.7.6.jar
-rw-r--r--. 1 hadoop hadoop   27832 Apr 18 09:39 hadoop-mapreduce-client-hs-plugins-2.7.6.jar
-rw-r--r--. 1 hadoop hadoop   62961 Apr 18 09:39 hadoop-mapreduce-client-jobclient-2.7.6.jar
-rw-r--r--. 1 hadoop hadoop 1562342 Apr 18 09:39 hadoop-mapreduce-client-jobclient-2.7.6-tests.jar
-rw-r--r--. 1 hadoop hadoop   72051 Apr 18 09:39 hadoop-mapreduce-client-shuffle-2.7.6.jar
-rw-r--r--. 1 hadoop hadoop  295833 Apr 18 09:39 hadoop-mapreduce-examples-2.7.6.jar
drwxr-xr-x. 2 hadoop hadoop    4096 Apr 18 09:39 lib
drwxr-xr-x. 2 hadoop hadoop      30 Apr 18 09:39 lib-examples
drwxr-xr-x. 2 hadoop hadoop    4096 Apr 18 09:39 sources
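
Walking down share/hadoop/mapreduce by hand works, but the jar can also be located with find. The sketch below makes two assumptions: find_examples_jar is a hypothetical helper name, and the install directory is taken from HADOOP_HOME if that variable is set, falling back to ~/hadoop-2.7.6 as used in this walkthrough.

```shell
# Print the path of every MapReduce examples jar below a Hadoop install directory.
# find_examples_jar is a hypothetical helper name.
find_examples_jar() {
  find "$1" -type f -name 'hadoop-mapreduce-examples-*.jar'
}

# HADOOP_HOME is an assumption; fall back to the directory used in this walkthrough.
hadoop_dir=${HADOOP_HOME:-$HOME/hadoop-2.7.6}
if [ -d "$hadoop_dir" ]; then
  find_examples_jar "$hadoop_dir"
fi
```

If the sources directory holds similarly named jars, find may print more than one path; the one directly under share/hadoop/mapreduce is the one used here.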

Run the command:

[[email protected] mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.6.jar  wordcount /user/hadoop/data.txt /user/output/ 

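(This uses the examples jar to analyze data.txt and writes the results to /user/output/. The job creates the output directory itself; if the directory already exists, the job fails instead of overwriting it.)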
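
Because of that, running the job a second time needs the old output directory out of the way first. A minimal sketch of one way to handle it, assuming hadoop is on PATH and that losing the earlier results is acceptable; run_wordcount is a hypothetical helper name:

```shell
# Run the wordcount example, clearing a stale output directory first.
# run_wordcount is a hypothetical helper; it assumes `hadoop` is on PATH.
run_wordcount() {
  jar=$1
  input=$2
  output=$3
  # `hadoop fs -test -d` exits 0 when the path is an existing directory
  if hadoop fs -test -d "$output"; then
    # WARNING: this removes the results of the earlier run
    hadoop fs -rm -r "$output" || return 1
  fi
  hadoop jar "$jar" wordcount "$input" "$output"
}

# Usage (on the cluster, from the mapreduce directory):
#   run_wordcount hadoop-mapreduce-examples-2.7.6.jar /user/hadoop/data.txt /user/output/
```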

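After the job succeeds, two files are generated under /user/output (normally an empty _SUCCESS marker and the result file part-r-00000). The results are not displayed on the web page, but they can be viewed with cat: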

[[email protected] mapreduce]$ hadoop fs -cat /user/output/part-r-00000
am 2
hadoop 1
hello 1
i 2
python 1
student 1
teacher 1
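
For a file this small, the result can be cross-checked locally without Hadoop. The sketch below is not what the examples jar runs internally; it is a plain awk count of whitespace-separated words, sorted in byte order so the lines come out in the same order as above. local_wordcount is a hypothetical helper name, and the sample lines are written to a scratch file so the snippet runs on its own.

```shell
# Count whitespace-separated words in a local file, one "word<TAB>count" per line.
# local_wordcount is a hypothetical helper name.
local_wordcount() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) printf "%s\t%d\n", w, count[w] }' "$1" | LC_ALL=C sort
}

# Demo on a scratch copy of the sample input
sample=$(mktemp)
printf '%s\n' 'hello python hadoop' 'i am student' 'i am teacher' > "$sample"
local_wordcount "$sample"
rm -f "$sample"
```

On the sample input this prints the same seven words and counts as the Hadoop job.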


scala:

Start Spark from Xshell by running spark-shell:

[[email protected] mapreduce]$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/05/12 19:45:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/05/12 19:45:57 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://192.168.209.110:4041
Spark context available as 'sc' (master = local[*], app id = local-1526125557763).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.


scala> 

spark-shell started successfully and the scala> prompt is ready.