In general, after developing a MapReduce program we have to package it into a JAR, upload it to the Hadoop cluster, and run it from the command line, which is very inconvenient. To improve development efficiency, it is well worth setting up a local Hadoop development environment. The steps are briefly described below:
1. Copy the entire Hadoop installation folder from the cluster to the local machine.
2. Set the Hadoop environment variables locally. My local Hadoop directory is D:\hadoop-2.6.0-cdh5.14.0, and the variables are set as follows:
# New system variables
HADOOP_HOME=D:\hadoop-2.6.0-cdh5.14.0
HADOOP_PREFIX=D:\hadoop-2.6.0-cdh5.14.0
HADOOP_BIN_PATH=%HADOOP_HOME%\bin
# Add to the Path environment variable
%HADOOP_HOME%\bin
%HADOOP_HOME%\sbin
3. Deploying Hadoop on Windows also requires winutils.exe and hadoop.dll. Download winutils.exe and the hadoop.dll matching your Hadoop version, copy hadoop.dll to C:\Windows\System32, and copy both hadoop.dll and winutils.exe into the local Hadoop bin directory.
Here are winutils.exe and hadoop.dll for hadoop-2.6.0:
https://pan.baidu.com/s/1VDD8k-9RBl1E5mSZXJO37w
4. A reboot may be needed at this point; on my machine the changes only took effect after restarting.
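After the restart, you can quickly confirm from Java that HADOOP_HOME is visible to your process and that winutils.exe is where Hadoop expects it. This is a minimal sketch (the `HadoopHomeCheck` class name is mine, not part of any Hadoop API):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class HadoopHomeCheck {

    // The path where Hadoop's Windows shims look for winutils.exe:
    // %HADOOP_HOME%\bin\winutils.exe
    static Path winutilsPath(String hadoopHome) {
        return Paths.get(hadoopHome, "bin", "winutils.exe");
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            System.err.println("HADOOP_HOME is not set -- a restart may still be needed");
            return;
        }
        Path winutils = winutilsPath(home);
        System.out.println(winutils.toFile().exists()
                ? "winutils.exe found: " + winutils
                : "winutils.exe missing: " + winutils);
    }
}
```

If this prints "missing", recheck step 3 before trying to submit a job.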
Now you can submit MapReduce jobs to the cluster directly from the local machine. The job-submission code is configured as follows:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Load the cluster's config files (copied into the project's hadoop/ directory)
Configuration conf = new Configuration();
conf.addResource("hadoop/core-site.xml");
conf.addResource("hadoop/hdfs-site.xml");
conf.addResource("hadoop/mapred-site.xml");
conf.addResource("hadoop/yarn-site.xml");

// Point the client at the remote HDFS NameNode and YARN ResourceManager
conf.set("fs.defaultFS", "hdfs://192.168.199.100:9000");
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.address", "192.168.199.100:8032");
conf.set("yarn.resourcemanager.scheduler.address", "192.168.199.100:8030");
conf.set("yarn.resourcemanager.hostname", "192.168.199.100");
// Required when submitting from Windows to a Linux cluster
conf.set("mapreduce.app-submission.cross-platform", "true");

Job job = Job.getInstance(conf, "MRJob_1");
// Explicit path to the exported JAR -- the local client cannot package it automatically
job.setJar("G:\\idea-workplace\\movie_hadoop.jar");
job.setJarByClass(MRJob_1.class);
job.setMapperClass(MRJob_1_Map.class);
job.setReducerClass(MRJob_1_Reduce.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(Text.class);
job.setPartitionerClass(UserIdPartition.class);

// "map" holds the input/output path strings (defined elsewhere in the project)
FileInputFormat.addInputPath(job, new Path(map.get("MR1_input")));
Path outputPath = new Path(map.get("MR1_output"));
FileOutputFormat.setOutputPath(job, outputPath);

// Block until the job finishes: 0 on success, 1 on failure
int flag = job.waitForCompletion(true) ? 0 : 1;
The above is just an example. Note: before submitting, the MR project must be exported as a JAR, because the local client cannot package it automatically; then set the JAR's location via the job's setJar method.
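The UserIdPartition class registered above is the author's own; its implementation is not shown, but a custom Partitioner's core logic is typically a hash-mod routing rule that keeps each user's records on the same reducer. Here is a minimal sketch of that routing function in plain Java (the `PartitionSketch` name is mine, and Hadoop types are omitted so the logic stands alone):

```java
public class PartitionSketch {

    // Same routing rule a hash-based Partitioner uses: mask the sign bit
    // so negative hash codes still map to a valid partition index.
    static int partitionFor(int userIdHash, int numPartitions) {
        return (userIdHash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // Result is always in [0, numPartitions), even for negative hashes
        System.out.println(partitionFor(-7, 4));
    }
}
```

In a real Partitioner you would extend org.apache.hadoop.mapreduce.Partitioner and apply this rule to the key inside getPartition.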