spark_scala
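Simple data analysis with Hadoop's hadoop-mapreduce-examples-2.7.6.jar:
Hadoop, the JDK, and the other required software had already been installed with Xshell and Xftp and were running correctly.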
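In Xshell, run su hadoop to switch to the hadoop user, then create a file data.txt in /home/hadoop.
Edit data.txt, enter the text hello python hadoop i am student i am teacher as the three lines shown below, then save and exit.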
Check the file contents:
[hadoop@python5 ~]$ cat data.txt
hello python hadoop
i am student
i am teacher
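Upload data.txt to HDFS (the uploaded file can then be browsed in the NameNode web UI at python5:50070). MapReduce jobs normally read their input from HDFS rather than from the local disk, so the file has to be uploaded first:
[hadoop@python5 ~]$ hadoop fs -put data.txt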
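With no destination argument, hadoop fs -put copies the file into the current HDFS working directory, which defaults to the user's home directory /user/<name>; that is why the job below can read /user/hadoop/data.txt. The rule can be sketched as a small path-resolution function. This is a simplified, hypothetical model: HdfsPathSketch, homeDir and resolve are illustrative names, not Hadoop APIs, and ".." segments or non-default home-directory settings are ignored.

```scala
// Hypothetical model of HDFS shell path resolution (not a Hadoop API).
object HdfsPathSketch {
  // Default HDFS home directory for a user in Hadoop 2.x.
  def homeDir(user: String): String = s"/user/$user"

  // Keep absolute paths; join relative ones onto the home directory.
  // "." or an empty path stands for the home directory itself.
  // Simplification: ".." segments are not normalized.
  def resolve(path: String, user: String): String =
    if (path.startsWith("/")) path
    else if (path.isEmpty || path == ".") homeDir(user)
    else s"${homeDir(user)}/$path"

  def main(args: Array[String]): Unit = {
    println(resolve("data.txt", "hadoop"))     // /user/hadoop/data.txt
    println(resolve("/user/output", "hadoop")) // /user/output
  }
}
```

The same rule explains why a relative path such as data.txt in hadoop fs -cat or hadoop jar ... wordcount would also be looked up under /user/hadoop.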
Locate the MapReduce examples jar:
[hadoop@python5 ~]$ cd hadoop-2.7.6/
[hadoop@python5 hadoop-2.7.6]$ ll   (list the directory contents)
total 116
drwxr-xr-x. 2 hadoop hadoop 194 Apr 18 09:39 bin
drwxr-xr-x. 3 hadoop hadoop 20 Apr 18 09:39 etc
drwxr-xr-x. 2 hadoop hadoop 106 Apr 18 09:39 include
drwxr-xr-x. 3 hadoop hadoop 20 Apr 18 09:39 lib
drwxr-xr-x. 2 hadoop hadoop 239 Apr 18 09:39 libexec
-rw-r--r--. 1 hadoop hadoop 86424 Apr 18 09:39 LICENSE.txt
drwxrwxr-x 3 hadoop hadoop 4096 May 12 18:29 logs
-rw-r--r--. 1 hadoop hadoop 14978 Apr 18 09:39 NOTICE.txt
-rw-r--r--. 1 hadoop hadoop 1366 Apr 18 09:39 README.txt
drwxr-xr-x. 2 hadoop hadoop 4096 Apr 18 09:39 sbin
drwxr-xr-x. 4 hadoop hadoop 31 Apr 18 09:39 share
Go into the share directory and find hadoop:
[hadoop@python5 hadoop-2.7.6]$ cd share/
[hadoop@python5 share]$ ll
total 0
drwxr-xr-x. 4 hadoop hadoop 42 Apr 28 18:35 doc
drwxr-xr-x. 9 hadoop hadoop 99 Apr 18 09:39 hadoop
Go into the hadoop directory and find mapreduce:
[hadoop@python5 share]$ cd hadoop/
[hadoop@python5 hadoop]$ ll
total 8
drwxr-xr-x. 6 hadoop hadoop 158 Apr 18 09:39 common
drwxr-xr-x. 7 hadoop hadoop 174 Apr 18 09:39 hdfs
drwxr-xr-x. 3 hadoop hadoop 20 Apr 18 09:39 httpfs
drwxr-xr-x. 3 hadoop hadoop 20 Apr 18 09:39 kms
drwxr-xr-x. 5 hadoop hadoop 4096 Apr 18 09:39 mapreduce
drwxr-xr-x. 5 hadoop hadoop 43 Apr 18 09:39 tools
drwxr-xr-x. 5 hadoop hadoop 4096 Apr 18 09:39 yarn
Go into mapreduce (cd mapreduce/) and list its contents with ll:
[hadoop@python5 mapreduce]$ ll
total 5000
-rw-r--r--. 1 hadoop hadoop 545480 Apr 18 09:39 hadoop-mapreduce-client-app-2.7.6.jar
-rw-r--r--. 1 hadoop hadoop 776862 Apr 18 09:39 hadoop-mapreduce-client-common-2.7.6.jar
-rw-r--r--. 1 hadoop hadoop 1558777 Apr 18 09:39 hadoop-mapreduce-client-core-2.7.6.jar
-rw-r--r--. 1 hadoop hadoop 191915 Apr 18 09:39 hadoop-mapreduce-client-hs-2.7.6.jar
-rw-r--r--. 1 hadoop hadoop 27832 Apr 18 09:39 hadoop-mapreduce-client-hs-plugins-2.7.6.jar
-rw-r--r--. 1 hadoop hadoop 62961 Apr 18 09:39 hadoop-mapreduce-client-jobclient-2.7.6.jar
-rw-r--r--. 1 hadoop hadoop 1562342 Apr 18 09:39 hadoop-mapreduce-client-jobclient-2.7.6-tests.jar
-rw-r--r--. 1 hadoop hadoop 72051 Apr 18 09:39 hadoop-mapreduce-client-shuffle-2.7.6.jar
-rw-r--r--. 1 hadoop hadoop 295833 Apr 18 09:39 hadoop-mapreduce-examples-2.7.6.jar
drwxr-xr-x. 2 hadoop hadoop 4096 Apr 18 09:39 lib
drwxr-xr-x. 2 hadoop hadoop 30 Apr 18 09:39 lib-examples
drwxr-xr-x. 2 hadoop hadoop 4096 Apr 18 09:39 sources
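Run the wordcount job:
[hadoop@python5 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.6.jar wordcount /user/hadoop/data.txt /user/output/
(This runs the wordcount program from the examples jar on data.txt and writes the result to /user/output/. The job creates the output directory itself, and it must not already exist: if it does, the job fails with a FileAlreadyExistsException, so remove it with hadoop fs -rm -r /user/output before running the job again.)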
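What wordcount does internally can be sketched in plain Scala. This is a teaching model, not the Hadoop implementation: WordCountPhases and its functions are hypothetical names, everything runs in memory, and the shuffle is reduced to a group-by plus a sort by key. With a single reducer the keys arrive sorted, which is why part-r-00000 lists the words in alphabetical order.

```scala
// Teaching model of the wordcount job's phases (not Hadoop code).
object WordCountPhases {
  // Map phase: emit (word, 1) for every whitespace-separated token.
  def mapPhase(lines: Seq[String]): Seq[(String, Int)] =
    lines.flatMap(_.split("\\s+").filter(_.nonEmpty).toList).map(word => (word, 1))

  // Shuffle phase: group the pairs by word and sort by key, so the
  // reducer sees each word once, in sorted order.
  def shufflePhase(pairs: Seq[(String, Int)]): Seq[(String, Seq[Int])] =
    pairs.groupBy(_._1).toSeq.sortBy(_._1).map { case (word, ps) => (word, ps.map(_._2)) }

  // Reduce phase: sum the 1s collected for each word.
  def reducePhase(grouped: Seq[(String, Seq[Int])]): Seq[(String, Int)] =
    grouped.map { case (word, ones) => (word, ones.sum) }

  // Whole pipeline, formatted as "word<TAB>count" lines.
  def run(lines: Seq[String]): Seq[String] =
    reducePhase(shufflePhase(mapPhase(lines))).map { case (word, n) => s"$word\t$n" }

  def main(args: Array[String]): Unit =
    run(Seq("hello python hadoop", "i am student", "i am teacher")).foreach(println)
}
```

Running main on the contents of data.txt prints the same seven word/count lines as the part-r-00000 file, with a tab between each word and its count.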
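When the job succeeds, /user/output contains two files, _SUCCESS and part-r-00000. The counts cannot be viewed directly in the web UI, so print them with cat:
[hadoop@python5 mapreduce]$ hadoop fs -cat /user/output/part-r-00000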
am 2
hadoop 1
hello 1
i 2
python 1
student 1
teacher 1
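Each line of part-r-00000 holds one word and its count; Hadoop's default TextOutputFormat separates the two with a tab. To reuse the result in a program, the printed lines can be loaded into a Map. The parser below is a hypothetical helper, not part of Hadoop; it splits on any whitespace so that it accepts both the tab in the real file and spaces, and it skips lines that do not look like "word count".

```scala
// Hypothetical parser for wordcount output lines (not a Hadoop API).
object PartFileReader {
  // Split on any whitespace (the real file uses a tab); keep only
  // lines made of exactly one word and one non-negative integer.
  def parse(lines: Seq[String]): Map[String, Int] =
    lines.flatMap { line =>
      line.trim.split("\\s+") match {
        case Array(word, count) if count.nonEmpty && count.forall(_.isDigit) =>
          List(word -> count.toInt)
        case _ => Nil
      }
    }.toMap

  def main(args: Array[String]): Unit = {
    val output = Seq("am\t2", "hadoop\t1", "hello\t1", "i\t2",
                     "python\t1", "student\t1", "teacher\t1")
    val counts = parse(output)
    println(counts.values.sum) // 9, the number of words in data.txt
    println(counts.filter(_._2 == counts.values.max).keys.toList.sorted) // List(am, i)
  }
}
```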
Scala:
Start Spark from Xshell with spark-shell:
[hadoop@python5 mapreduce]$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/05/12 19:45:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/05/12 19:45:57 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://192.168.209.110:4041
Spark context available as 'sc' (master = local[*], app id = local-1526125557763).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
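spark-shell is now running. The two WARN lines are harmless: the NativeCodeLoader message only means the native Hadoop libraries were not found, so built-in Java classes are used instead, and the SparkUI message means port 4040 was already in use (typically by another running Spark application), so the web UI moved to port 4041.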
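At the scala> prompt, the same word count is commonly written as an RDD chain such as sc.textFile(path).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _), where path is the HDFS file uploaded earlier (this assumes spark-shell picks up the cluster's Hadoop configuration). The sketch below is not Spark code: it is a hypothetical local stand-in for reduceByKey, written with plain Scala collections so it runs without Spark, which merges the values for each key with an associative function instead of first collecting every value into a list.

```scala
// Local stand-in for RDD.reduceByKey using plain collections (not Spark).
object ReduceByKeySketch {
  // Fold each (key, value) pair into a Map, merging values that share
  // a key with f. No per-key list of values is ever built.
  def reduceByKey[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k).map(f(_, v)).getOrElse(v))
    }

  def main(args: Array[String]): Unit = {
    val lines = Seq("hello python hadoop", "i am student", "i am teacher")
    // Same shape as flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    val pairs = lines.flatMap(_.split(" ").toList).map(word => (word, 1))
    val counts = reduceByKey(pairs)(_ + _)
    // Most frequent first, ties broken alphabetically
    counts.toSeq.sortBy { case (word, n) => (-n, word) }.foreach(println)
  }
}
```

Because f is only ever applied to two values at a time, partial counts can be pre-combined before they are merged, which is the usual reason reduceByKey is preferred over grouping all values per key when counting.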