Step By Step Hadoop Installation Guide
Setting up a Single-Node Hadoop Cluster in a VM on Windows
Contents
- Objective
- Current Environments
- Download VMware Player and Ubuntu 14.04
- Install Ubuntu on VM
- Install Hadoop 2.7.2 on Ubuntu 14.04
Objective: This document will help you set up Hadoop 2.7.2 on Ubuntu 14.04 running in a virtual machine on a Windows host.
Current environment includes:
- Windows XP/7 – 32 bit
- VMware Player (non-commercial use only)
- Ubuntu 14.04 32 bit
- Java 1.7
- Hadoop 2.7.2
Download and install VMware Player from the link: https://www.vmware.com/tryvmware/?p=player
Download Ubuntu 14.04 iso file from the link: http://www.ubuntu.com/download/desktop
Download the list of Hadoop commands for reference from the following link: http://hadoop.apache.org/docs/r1.0.4/commands_manual.pdf (Don't be intimidated by this file; it is just a reference to help you learn the most important Hadoop commands.)
Install Ubuntu in VM:
- Click on Create a New Virtual Machine
- Browse and select the Ubuntu iso file.
- Personalize Linux by providing appropriate details.
- Follow through the wizard steps to finish installation.



Install Hadoop 2.7.2 on Ubuntu 14.04
Step 1: Open Terminal

Step 2: Download the Hadoop tar file by running the following command in the terminal:
wget http://mirror.fibergrid.in/apache/hadoop/common/stable/hadoop-2.7.2.tar.gz
Step 3: Extract the tar file with the command: tar -xzf hadoop-2.7.2.tar.gz
Step 4: Let’s move everything into a more appropriate directory:
sudo mv hadoop-2.7.2/ /usr/local
cd /usr/local
sudo ln -s hadoop-2.7.2/ hadoop
Let's also create a directory for Hadoop to store its data later:
mkdir /usr/local/hadoop/data
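To confirm the move and the symlink, you can list the directory; you should see the hadoop-2.7.2 folder and a hadoop link pointing to it:
ls -l /usr/local/ | grep hadoop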
Step 5: Set up a user and permissions (replace manohar with your user ID)
sudo addgroup hadoop
sudo adduser --ingroup hadoop manohar
If that user already exists on the system, just add it to the new group instead: sudo adduser manohar hadoop
sudo chown -R manohar:hadoop /usr/local/hadoop/
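As an optional check, you can confirm the group membership and the new ownership (manohar is the example user ID from above):
groups manohar
ls -ld /usr/local/hadoop/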
Step 6: Install SSH and set up key-based (passwordless) login to localhost:
sudo apt-get install ssh
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
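Before moving on, you can test that passwordless SSH to localhost works (accept the host key fingerprint the first time, then type exit to return to your shell):
ssh localhost
exit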
Step 7: Install Java:
sudo apt-get update
sudo apt-get install default-jdk
sudo gedit ~/.bashrc
This will open the .bashrc file in a text editor. Go to the end of the file and paste/type the following content in it:
#HADOOP VARIABLES START
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
export HADOOP_PREFIX=$HADOOP_INSTALL
export HADOOP_CMD=$HADOOP_INSTALL/bin/hadoop
export HADOOP_STREAMING=$HADOOP_INSTALL/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar
#HADOOP VARIABLES END

After saving and closing the .bashrc file, execute the following command so that your system recognizes the newly created environment variables:
source ~/.bashrc
Putting these lines in the .bashrc file ensures that the variables are set in every new terminal session on the VM.
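A quick way to confirm the variables took effect is to echo one of them and check that the hadoop binary is now on your PATH:
echo $HADOOP_HOME
which hadoop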
Step 8: Disable IPv6
Hadoop and IPv6 do not play well together, so we will disable IPv6. Open /etc/sysctl.conf with:
sudo gedit /etc/sysctl.conf
and add the following lines to the end of the file:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Save the file, then apply the changes with sudo sysctl -p (or reboot the VM).
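After reloading (or after a reboot), you can confirm that IPv6 is disabled; the following command should print 1:
cat /proc/sys/net/ipv6/conf/all/disable_ipv6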

Step 9: Editing /usr/local/hadoop/etc/hadoop/hadoop-env.sh:
sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
In this file, locate the line that exports the JAVA_HOME variable and change it to match the JAVA_HOME you set in your .bashrc (in our case, export JAVA_HOME=/usr).
Also, change this line:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
to:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_PREFIX/lib"
And finally, add the following line:
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
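With hadoop-env.sh updated, a simple sanity check is to print the Hadoop version; it should report 2.7.2 without any JAVA_HOME errors:
hadoop version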
Step 10: Editing /usr/local/hadoop/etc/hadoop/core-site.xml:
sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
In this file, enter the following content in between the <configuration></configuration> tag:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/data</value>
</property>
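For reference, the complete core-site.xml would then look roughly like the sketch below (your copy may also contain a stylesheet declaration and license comments); the same <configuration> wrapper applies to the yarn-site.xml, mapred-site.xml, and hdfs-site.xml edits in the following steps:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/data</value>
</property>
</configuration>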

Step 11: Editing /usr/local/hadoop/etc/hadoop/yarn-site.xml:
sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
In this file, enter the following content in between the <configuration></configuration> tag:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8050</value>
</property>
Step 12: Creating and Editing /usr/local/hadoop/etc/hadoop/mapred-site.xml:
By default, the /usr/local/hadoop/etc/hadoop/ folder contains a mapred-site.xml.template file, which has to be copied to a file named mapred-site.xml. This file is used to specify which framework is used for MapReduce.
This can be done using the following command:
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
Once this is done, open the newly created file with following command:
sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
In this file, enter the following content in between the <configuration></configuration> tag:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Step 13: Editing /usr/local/hadoop/etc/hadoop/hdfs-site.xml:
The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file has to be configured for each host in the cluster. It specifies the directories used by the NameNode and the DataNode on that host.
Before editing this file, we need to create the two directories that will hold the NameNode and DataNode data for this Hadoop installation, and give our user ownership of them (again, replace manohar with your user ID):
sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
sudo chown -R manohar:hadoop /usr/local/hadoop_store
Open the /usr/local/hadoop/etc/hadoop/hdfs-site.xml file with following command:
sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
In this file, enter the following content in between the <configuration></configuration> tag:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
Step 14: Format the New Hadoop Filesystem:
After completing all the configuration outlined in the above steps, the Hadoop filesystem needs to be formatted so that it can start being used. This is done by executing the following command:
hdfs namenode -format
Note: This only needs to be done once before you start using Hadoop. If this command is executed again after Hadoop has been used, it’ll destroy all the data on the Hadoop filesystem.
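If the format succeeds, the NameNode directory created earlier is populated; as a quick check, listing it should show a current subdirectory containing files such as VERSION and an fsimage file:
ls /usr/local/hadoop_store/hdfs/namenode/current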
Step 15: Start Hadoop
All that remains to be done is starting the newly installed single node cluster:
start-dfs.sh
While executing this command, you’ll be prompted twice with a message similar to the following:
Are you sure you want to continue connecting (yes/no)?
Type in yes for both these prompts and press the enter key. Once this is done, execute the following command:
start-yarn.sh
Executing the above two commands will get Hadoop up and running. You can verify this by typing in the following command:
jps
Executing this command should list the running Hadoop daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and Jps itself.
If you see these processes, you now have a functional single-node Hadoop instance running on your VM.
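You can also check the web interfaces that Hadoop 2.x exposes by default, and run a small HDFS smoke test (the directory name below is just an example):
- NameNode UI: http://localhost:50070
- ResourceManager UI: http://localhost:8088
hdfs dfs -mkdir -p /user/manohar
hdfs dfs -ls /user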