How to install a Hadoop single-node cluster on Windows 10

This is a short guide on how to install a Hadoop single-node cluster on a Windows computer without Cygwin. The goal of this little exercise is to have a local Hadoop test environment in your own Windows environment.

The process is straightforward. First, we need to download and install the following software:

Java

Download Java 1.8 from https://java.com/en/download/

Once installed, confirm that you're running the correct version by issuing the 'java -version' command from the command line:
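For reference, here's the command and the shape of the output you can expect; the exact version and build numbers will of course depend on the release you installed:

java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)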

WinRAR

I downloaded and installed the 64-bit WinRAR release from http://www.rarlab.com/download.htm, which will later allow me to decompress Linux-style tar.gz packages on Windows.

Hadoop

The next step was to install a Hadoop distribution. I decided to download the most recent release, Hadoop 3.0.0-alpha2 (25 January 2017), in binary form from an Apache download mirror linked at http://hadoop.apache.org/releases.html

Once hadoop-3.0.0-alpha2.tar.gz (about 250 MB) had downloaded, I extracted it using WinRAR (installed in the previous step) into the C:\hadoop-3.0.0-alpha2 folder:

Now that Hadoop was downloaded, it was time to configure and start the single-node cluster.

Set Up Environment Variables

In Windows 10, I opened the System Properties window and clicked the Environment Variables button:

Then I created a new HADOOP_HOME variable and pointed it to the C:\hadoop-3.0.0-alpha2\bin folder on my PC:

The next step was to add the Hadoop bin directory to the PATH variable. I clicked on Path and pressed Edit:

Then I added the 'C:\hadoop-3.0.0-alpha2\bin' path and pressed OK:
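If you prefer the command line over the dialogs, the same HADOOP_HOME variable can be created with setx, using the same path as above. Note that setx writes a user-level variable and truncates values longer than 1024 characters, so the PATH edit itself is safer to do through the dialog as shown:

setx HADOOP_HOME "C:\hadoop-3.0.0-alpha2\bin"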

Edit Hadoop Configuration

Next, I configured Hadoop to use localhost and port 9000 for its default filesystem by editing the C:\hadoop-3.0.0-alpha2\etc\hadoop\core-site.xml file, just like this:
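In practice this comes down to a single property, the default filesystem URI. A minimal core-site.xml looks roughly like this (fs.defaultFS is the current property name; older guides use the deprecated fs.default.name):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>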

Next, I went to the C:\hadoop-3.0.0-alpha2\etc\hadoop folder and renamed mapred-site.xml.template to mapred-site.xml.

Then I edited the mapred-site.xml file, adding the following YARN configuration for MapReduce:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

This is what the file looked like once configured:

The next step was to create a new 'data' folder in Hadoop's home directory (C:\hadoop-3.0.0-alpha2\data).

Once that was done, the next step was to configure the NameNode and DataNode storage locations by editing the C:\hadoop-3.0.0-alpha2\etc\hadoop\hdfs-site.xml file.

I added the following configuration to this XML file:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>C:/hadoop-3.0.0-alpha2/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>C:/hadoop-3.0.0-alpha2/data/datanode</value>
  </property>
</configuration>

In the step above, I had to make sure I was pointing to the location of my newly created data folder, with the namenode and datanode subfolders appended as shown in the example.
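If you want to create those two subfolders up front rather than letting Hadoop create them on first use, a couple of mkdir commands will do it:

mkdir C:\hadoop-3.0.0-alpha2\data\namenode
mkdir C:\hadoop-3.0.0-alpha2\data\datanode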

This is what the hdfs-site.xml file looked like once completed:

The next step was to add site-specific YARN configuration properties by editing C:\hadoop-3.0.0-alpha2\etc\hadoop\yarn-site.xml, like this:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

This is what the yarn-site.xml file looked like once completed:

Then I continued by editing C:\hadoop-3.0.0-alpha2\etc\hadoop\hadoop-env.cmd. I changed the JAVA_HOME=%JAVA_HOME% line to point to my Java folder: C:\PROGRA~1\Java\JDK18~1.0_1

It's usually better to use Windows short (8.3) names here, because the default Java path contains spaces ('Program Files'), which the Hadoop scripts don't handle well.

So I went to C:\Program Files\Java\jdk1.8.0_111, where my Java JDK is installed, and converted the long path to a Windows short name:
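One way to look up the short name is the dir /x command, which lists the generated 8.3 names next to the long ones:

dir /x "C:\Program Files\Java"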

The next step was to open hadoop-env.cmd and add the short path there, as shown in this screenshot:
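With the short name from the previous step, the edited line in hadoop-env.cmd ends up as:

set JAVA_HOME=C:\PROGRA~1\Java\JDK18~1.0_1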

Next, from C:\hadoop-3.0.0-alpha2\bin, I opened a Windows command prompt as administrator and ran the 'hdfs namenode -format' command.
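In other words:

cd C:\hadoop-3.0.0-alpha2\bin
hdfs namenode -format

Toward the end of the output you should see a message confirming that the NameNode storage directory has been successfully formatted.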

The output looked like this:

Then I finally started Hadoop. I opened a command prompt as administrator in C:\hadoop-3.0.0-alpha2\sbin and ran start-dfs.cmd followed by start-yarn.cmd, like this:
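That is, from an administrator command prompt:

cd C:\hadoop-3.0.0-alpha2\sbin
start-dfs.cmd
start-yarn.cmd

Each script opens additional command windows for the daemons it launches: the NameNode and DataNode for start-dfs.cmd, and the ResourceManager and NodeManager for start-yarn.cmd.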

Open Hadoop GUI

Once all of the above steps were completed, I opened a browser and navigated to the YARN ResourceManager web UI at http://localhost:8088/cluster

Next Steps?

This section won't go into the details of setting up IntelliJ, etc. But very briefly: if you want to play with WordCount.java and Hadoop's MapReduce, you can get the example code from https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html, and it'll look like this:
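As a rough sketch, compiling and packaging WordCount.java from the command line follows the MapReduce tutorial; the jar name wc.jar is just an example, and HADOOP_CLASSPATH needs to include the JDK's tools.jar so the Java compiler class is on the classpath:

set HADOOP_CLASSPATH=%JAVA_HOME%\lib\tools.jar
hadoop com.sun.tools.javac.Main WordCount.java
jar cf wc.jar WordCount*.class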

Once you have the code working, you can use an online generator, as I did, at https://www.randomlists.com/random-words to create a list of random words:

After I did so, I saved my words to words.txt, but to make it a little more fun, I replaced some of them with my last name, for a total of 96 unique words plus 4 repeats of my last name.

Running Wordlist against Hadoop’s MapReduce

Once I ran my code, it started processing the words.txt file, which had been copied into the input folder beforehand (I created the input folder earlier, together with the output folder for the result files).
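Assuming HDFS directories named /input and /output and a local copy of the word list at C:\temp\words.txt (all of these names are illustrative, as is the wc.jar from the earlier sketch), copying the file in, running the job, and reading the result back looks roughly like this:

hdfs dfs -mkdir /input
hdfs dfs -put C:\temp\words.txt /input
hadoop jar wc.jar WordCount /input /output
hdfs dfs -cat /output/*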

The following was the result of Hadoop's processing job:

I was able to follow the job's progress in the browser as well:

RESULTS

Once done, I looked in the output folder and voila! words.txt had been processed by Hadoop. Here is the result. It's of course not the most sophisticated Hadoop word count, but I wanted all of the words to be unique except my last name, to make sure that the word count and Hadoop worked as intended, which the result proved. The output is sorted and the counts are correct.

I've numbered the lines so the counts are clearly visible, and my last name shows up on line 41:

  1. abusive 1
  2. achiever    1
  3. action  1
  4. admit   1
  5. air 1
  6. angry   1
  7. approval    1
  8. awesome 1
  9. bashful 1
  10. basket  1
  11. black   1
  12. bleach  1
  13. blood   1
  14. blot    1
  15. border  1
  16. bounce  1
  17. broad   1
  18. bubble  1
  19. changeable  1
  20. cloth   1
  21. colour  1
  22. command 1
  23. confess 1
  24. cough   1
  25. cross   1
  26. dark    1
  27. decorate    1
  28. defeated    1
  29. descriptive 1
  30. elite   1
  31. expect  1
  32. flap    1
  33. flock   1
  34. fold    1
  35. friction    1
  36. gabby   1
  37. hollow  1
  38. identify    1
  39. inform  1
  40. irritating  1
  41. jarosciak   5
  42. jobless 1
  43. jumpy   1
  44. kindly  1
  45. lackadaisical   1
  46. lake    1
  47. license 1
  48. male    1
  49. marble  1
  50. mark    1
  51. measure 1
  52. moor    2
  53. mug 1
  54. neat    1
  55. noisy   1
  56. object  1
  57. orange  1
  58. peck    1
  59. pot 1
  60. pour    1
  61. preach  1
  62. pull    1
  63. quarrelsome 1
  64. radiate 1
  65. reach   1
  66. rebel   1
  67. release 1
  68. reply   1
  69. request 1
  70. right   1
  71. scold   1
  72. seed    1
  73. sheet   1
  74. shoes   1
  75. smash   1
  76. sore    1
  77. squash  1
  78. stir    1
  79. stop    1
  80. strengthen  1
  81. succinct    1
  82. suggest 1
  83. tart    1
  84. taste   1
  85. thumb   1
  86. trip    1
  87. trite   1
  88. understood  1
  89. unique  1
  90. unkempt 1
  91. untidy  1
  92. van 1
  93. wasteful    1
  94. watch   1
  95. well-to-do  1
  96. wood    1

References

Apache Software Foundation (2008) 'Example: WordCount v1.0', MapReduce Tutorial [Online]. Available at: https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Example%3A+WordCount+v1.0 (Accessed: 17 October 2014).

Sharma, S. (no date) Installing Hadoop-2.6.x on windows 10. Available at: http://www.ics.uci.edu/~shantas/Install_Hadoop-2.6.0_on_Windows10.pdf (Accessed: 1 February 2017).

Running Hadoop on Cygwin in windows (Single-Node cluster) (2013) Available at: https://sundersinghc.wordpress.com/2013/04/08/running-hadoop-on-cygwin-in-windows-single-node-cluster/ (Accessed: 1 February 2017).

Cloudera (2016) Running WordCount v1.0. Available at: https://www.cloudera.com/documentation/other/tutorial/CDH5/topics/ht_usage.html (Accessed: 2 February 2017).

K, R. (2016) TecAdmin.net. Available at: http://tecadmin.net/hadoop-running-a-wordcount-mapreduce-example/# (Accessed: 2 February 2017).
