Connecting Eclipse on Windows to Hadoop on Ubuntu

1. Install Eclipse on Windows

2. Extract hadoop-2.7.1.tar.gz to a directory of your choice on Windows

3. Install Hadoop-Eclipse-Plugin

To compile and run MapReduce programs in Eclipse, you need hadoop-eclipse-plugin. You can download it from the hadoop2x-eclipse-plugin project on GitHub:

https://raw.githubusercontent.com/winghc/hadoop2x-eclipse-plugin/master/release/hadoop-eclipse-plugin-2.6.0.jar

After downloading, copy hadoop-eclipse-plugin-2.6.0.jar into the plugins folder of your Eclipse installation directory. If Eclipse is running, restart it so that the plugin is loaded.

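4. Download hadoop.dll and winutils.exe

Download hadoop-common-2.7.1-bin-master.zip and extract it. Copy everything in the bin folder of hadoop-common-2.7.1-bin-master into the bin directory of the Hadoop distribution you extracted on Windows in step 2.

Also copy hadoop.dll from that bin folder to C:\Windows\SysWOW64 (on a 32-bit system, copy it to C:\Windows\System32). Note that on 64-bit Windows, System32 is the directory that holds 64-bit DLLs and SysWOW64 holds 32-bit ones, so if a 64-bit JVM cannot load the library, place a copy in C:\Windows\System32 as well.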

Then configure the JAVA_HOME and Path environment variables.

Restart the computer.
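A quick way to confirm that step 4 took effect is to check that the two native helper files are where Hadoop expects them. The following is a minimal sketch, not part of Hadoop: the class name HadoopHomeCheck and its methods are hypothetical, and it assumes the Hadoop directory is passed as the first program argument.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): reports which native helper
// files are missing from <hadoopHome>/bin.
final class HadoopHomeCheck {

    // The two files that step 4 copies into the bin directory.
    static final String[] REQUIRED = {"winutils.exe", "hadoop.dll"};

    static List<String> missingFiles(Path hadoopHome) {
        List<String> missing = new ArrayList<>();
        Path bin = hadoopHome.resolve("bin");
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(bin.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: the first argument is the directory where
        // hadoop-2.7.1.tar.gz was extracted on Windows.
        if (args.length != 1) {
            System.err.println("Usage: HadoopHomeCheck <hadoop-home>");
            System.exit(2);
        }
        List<String> missing = missingFiles(Paths.get(args[0]));
        System.out.println(missing.isEmpty()
                ? "bin directory looks complete"
                : "missing from bin: " + missing);
    }
}
```

If the check reports missing files, repeat the copy in step 4 before continuing; the plugin configuration in the next step points at this same directory.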


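5. Configure Hadoop-Eclipse-Plugin

Before starting Hadoop, make sure that localhost in core-site.xml has been replaced with the IP address of the Ubuntu machine, and add the following property to hdfs-site.xml. It turns off HDFS permission checking, so that the Windows user can write to HDFS: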

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
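As a sanity check on core-site.xml, the sketch below parses the text of a Hadoop configuration file and reports whether the file system URI still points at localhost. It assumes the property is named fs.defaultFS (the usual name in Hadoop 2.x; the text above does not name it), and CoreSiteCheck is a hypothetical helper, not a Hadoop class.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: inspects the XML text of a Hadoop *-site.xml file.
final class CoreSiteCheck {

    // Returns the <value> of the <property> with the given <name>, or null if absent.
    static String propertyValue(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if (name.equals(childText(prop, "name"))) {
                    return childText(prop, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid Hadoop configuration file", e);
        }
    }

    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // True if the URI has no host or its host is the loopback machine,
    // which a remote Eclipse instance cannot reach.
    static boolean pointsAtLocalhost(String uri) {
        String host = URI.create(uri).getHost();
        return host == null || host.equals("localhost") || host.startsWith("127.");
    }
}
```

Calling propertyValue with the contents of core-site.xml and the name "fs.defaultFS", then passing the result to pointsAtLocalhost, shows whether the file still needs the IP address change described above.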

Before continuing with the configuration, make sure Hadoop is running.

cd /usr/local/hadoop
./sbin/start-all.sh   # start Hadoop

After starting Eclipse, DFS Locations appears in the Project Explorer on the left. (If you see the Welcome screen instead, click the x in its upper-left corner to close it.)

Figure: Eclipse after installing Hadoop-Eclipse-Plugin

The plugin needs some further configuration.

Step 1: Choose Preferences from the Window menu.

Figure: Opening Preferences

A dialog opens with a new Hadoop Map/Reduce entry on the left. Click it and select the Hadoop installation directory (the directory where you extracted Hadoop on Windows).

Figure: Selecting the Hadoop installation directory

Step 2: Switch to the Map/Reduce perspective. Choose Window -> Open Perspective -> Other, and select Map/Reduce in the dialog that appears.

Figure: Switching to the Map/Reduce perspective

Step 3: Create a connection to the Hadoop cluster. Click the Map/Reduce Locations panel in the lower-right corner of Eclipse, right-click inside the panel, and choose New Hadoop Location.

Figure: Creating a connection to the Hadoop cluster

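In the General tab of the dialog that appears, the settings must match your Hadoop configuration. The two Host values are normally the same: both are the IP address of the Ubuntu system (mine is 192.168.122.112). Change the DFS Master port to 9000; the Map/Reduce(V2) Master port can keep its default. Location Name can be anything. For User name, enter the Ubuntu user name (mine is hadoop).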
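If the location later fails to connect, it helps to rule out basic network problems first. The sketch below, using only the Java standard library, builds the HDFS URI from a host and port and tries a plain TCP connection to it. NameNodeProbe is a hypothetical helper; the default host and port are the example values from this article, and a successful connection only shows that the port is open, not that HDFS itself is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks that the DFS Master host and port are reachable.
final class NameNodeProbe {

    // Builds the URI form of the DFS Master address.
    static String defaultFsUri(String host, int port) {
        return "hdfs://" + host + ":" + port;
    }

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are the example values used in this article.
        String host = args.length > 0 ? args[0] : "192.168.122.112";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        System.out.println(defaultFsUri(host, port)
                + (isPortOpen(host, port, 3000) ? " is reachable" : " is NOT reachable"));
    }
}
```

If the port is not reachable from Windows, check that Hadoop is running and that core-site.xml uses the machine's IP address rather than localhost.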

Figure: The finished Hadoop Location settings

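The Advanced parameters tab configures Hadoop parameters; in effect, you fill in the Hadoop configuration items (from the configuration files in /usr/local/hadoop/etc/hadoop). For example, if you configured hadoop.tmp.dir, you would need to change it here accordingly. Doing this by hand is tedious, so we will instead copy the configuration files into the project (described in step 8).

In short, configuring the General tab is enough. Click Finish, and the Map/Reduce Location is created.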

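6. Work with HDFS files in Eclipse

Once the location is configured, click the MapReduce Location in Project Explorer on the left (click the triangle to expand it) to browse the HDFS file list directly. HDFS must already contain files for anything to appear, for example the output of a WordCount run. Double-click a file to view its contents, or right-click to upload, download, or delete files in HDFS, without resorting to commands such as hdfs dfs -ls.

If the list does not appear, right-click the Location and try Reconnect, or restart Eclipse.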

Tips

Eclipse does not refresh automatically when the contents of HDFS change. To see the changes, right-click the MapReduce Location in Project Explorer and choose Refresh.

7. Create a MapReduce project in Eclipse

From the File menu, choose New -> Project...

Figure: Creating a project

Select Map/Reduce Project and click Next.

Figure: Creating a MapReduce project

Enter WordCount as the Project name and click Finish to create the project.

Figure: Entering the project name

The new project now appears in Project Explorer on the left.

Figure: Project created

Next, right-click the WordCount project and choose New -> Class.

Figure: Creating a new class

Two fields need to be filled in: enter org.apache.hadoop.examples for Package and WordCount for Name.

Figure: Entering the class details

Once the class is created, WordCount.java appears under src in the project. Copy the following WordCount code into that file.

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options and keep the remaining program arguments.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // The reducer also serves as the combiner, pre-summing counts on the map side.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Every argument except the last is an input path; the last is the output path.
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    // Sums the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Splits each input line on whitespace and emits (word, 1) for every token.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }
}
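To see what this job computes without involving a cluster, the same tokenize-and-sum logic can be mimicked in plain Java. This is only an illustrative sketch: LocalWordCount is a hypothetical class that runs in a single JVM and uses no Hadoop APIs, so it says nothing about how the work is distributed.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-JVM imitation of the WordCount job's logic.
final class LocalWordCount {

    static Map<String, Integer> count(Iterable<String> lines) {
        // A TreeMap keeps the words sorted, similar to the sorted reducer output.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // "Map" step: split on whitespace, as TokenizerMapper does.
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // "Reduce" step: add 1 to the running sum for this word.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {eclipse=1, hadoop=1, hello=2}
        System.out.println(count(Arrays.asList("hello hadoop", "hello eclipse")));
    }
}
```

The real job produces the same word and count pairs, written as tab-separated lines in the output directory on HDFS.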

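8. Run MapReduce from Eclipse

Before running the MapReduce program, one more important step is required (this is the configuration-file copy mentioned above, which avoids setting the Advanced parameters by hand). Copy the configuration files you modified in /usr/local/hadoop/etc/hadoop (for a pseudo-distributed setup, core-site.xml and hdfs-site.xml), together with log4j.properties, into the src folder of the WordCount project (~/workspace/WordCount/src).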

After copying, be sure to right-click WordCount and choose Refresh; Eclipse does not pick up the new files automatically. The copied files then appear under src in the project.

Figure: WordCount project file structure
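Hadoop's Configuration class loads core-site.xml from the classpath, which is why the files must sit in src. A small check like the following can confirm that Eclipse actually put them there after the refresh. ClasspathConfigCheck is a hypothetical helper that uses only the standard library; to be meaningful it should be placed in the WordCount project itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists which of the given resources are NOT on the classpath.
final class ClasspathConfigCheck {

    static List<String> missingResources(String... names) {
        ClassLoader loader = ClasspathConfigCheck.class.getClassLoader();
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            if (loader.getResource(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The three files that step 8 copies into src.
        System.out.println("missing from classpath: "
                + missingResources("core-site.xml", "hdfs-site.xml", "log4j.properties"));
    }
}
```

An empty list means all three files are visible to the program; any name that is listed was probably copied to the wrong folder or not refreshed.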

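Click the Run icon in the toolbar, or right-click WordCount.java in Project Explorer and choose Run As -> Run on Hadoop, to run the MapReduce program. Because no arguments have been specified yet, the program prints "Usage: wordcount <in> [<in>...] <out>" and exits, so the run arguments must be set in Eclipse.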

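Note: my Windows user is Administrator, so I need to create a matching directory for the Administrator user in HDFS. On the Ubuntu machine, run:

/usr/local/hadoop/bin/hdfs dfs -mkdir -p /user/Administrator/input

Then put some input files into it:

/usr/local/hadoop/bin/hdfs dfs -put *** /user/Administrator/input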

Next, right-click WordCount.java and choose Run As -> Run Configurations to set the run-time arguments (if WordCount is not listed under Java Application, double-click Java Application first). Switch to the Arguments tab and enter "input output" under Program arguments.

Figure: Setting the program arguments
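The arguments "input" and "output" are relative paths, and HDFS resolves relative paths against the current user's home directory, /user/<user name>. That is why the input files were placed under /user/Administrator. The sketch below models this rule in plain Java; HdfsRelativePath is a hypothetical, simplified stand-in and not the Hadoop implementation.

```java
// Hypothetical, simplified model of how HDFS resolves a path argument.
final class HdfsRelativePath {

    // Absolute paths are kept as-is; relative paths are placed under
    // the user's HDFS home directory, /user/<userName>.
    static String resolve(String userName, String path) {
        if (path.startsWith("/")) {
            return path;
        }
        return "/user/" + userName + "/" + path;
    }

    public static void main(String[] args) {
        // "Administrator" is the Windows user from the note above.
        System.out.println(resolve("Administrator", "input"));   // prints /user/Administrator/input
        System.out.println(resolve("Administrator", "output"));  // prints /user/Administrator/output
    }
}
```

If your Windows user name is different, the input directory must be created under the corresponding /user/<user name> path, or the arguments must be given as absolute paths.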

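Run the program again. (Note: if an output folder already exists from an earlier run, delete it first in the panel at the upper left; otherwise the job fails with an error.) You should see a message that the job succeeded, and after refreshing the DFS Location the output folder is visible.

Figure: WordCount run result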

You can now use Eclipse to develop MapReduce programs conveniently.
