Zaw Htut Win

Installing Hadoop single node cluster in AWS EC2

Environment: Ubuntu 18 on an m3.large instance with 8 GB of memory.

Install OpenJDK (the full JDK, not just the JRE)

sudo apt-get install openjdk-8-jdk
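If you are unsure where the package landed, the path can be derived from the javac binary (javac ships only with the JDK, so this also confirms you did not install just the JRE). A minimal sketch:

```shell
# javac is only present in the JDK, so resolving its real path both
# confirms the JDK install and reveals the install directory.
JAVAC_PATH=$(readlink -f "$(which javac)")
# Strip the trailing /bin/javac to get the JDK home, e.g.
# /usr/lib/jvm/java-8-openjdk-amd64
echo "${JAVAC_PATH%/bin/javac}"
```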

Get Hadoop 2.9.0
wget https://archive.apache.org/dist/hadoop/core/hadoop-2.9.0/hadoop-2.9.0.tar.gz

Extract Hadoop in the home folder

tar -xvf hadoop-2.9.0.tar.gz

Create a folder for Hadoop

sudo mkdir /usr/lib/hadoop

Move the extracted folder into /usr/lib/hadoop (sudo is needed because the directory is root-owned) and give the ubuntu user ownership so Hadoop can write its logs there:

sudo mv hadoop-2.9.0 /usr/lib/hadoop/
sudo chown -R ubuntu:ubuntu /usr/lib/hadoop

Find the JDK 8 install path and note it down; on Ubuntu 18 with OpenJDK 8 it is usually:

/usr/lib/jvm/java-1.8.0-openjdk-amd64

Open ~/.bashrc and add the following lines at the end of the file. JAVA_HOME points at the JDK, HADOOP_HOME at the folder moved above, and the PATH entry makes the hdfs and hadoop commands (and the sbin scripts) available:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_HOME=/usr/lib/hadoop/hadoop-2.9.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Load the new environment variables

source ~/.bashrc
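A quick sanity check that the new variables are visible in the current shell (a sketch; `hadoop version` only resolves once the PATH entry above is in place):

```shell
# Both variables should print non-empty paths after sourcing ~/.bashrc.
echo "JAVA_HOME=$JAVA_HOME"
echo "HADOOP_HOME=$HADOOP_HOME"
# hadoop resolves via $HADOOP_HOME/bin on the PATH and prints a version banner.
command -v hadoop >/dev/null && hadoop version || echo "hadoop not on PATH yet"
```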

Set up passwordless SSH to localhost (the start/stop scripts use SSH to reach each daemon host, which on a single-node cluster is the local machine)

ssh-keygen -t rsa

cd ~

cat .ssh/id_rsa.pub >> .ssh/authorized_keys

Alternatively, ssh-copy-id achieves the same thing:

ssh-copy-id -i .ssh/id_rsa.pub ubuntu@localhost
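Before moving on, it is worth confirming the passwordless login actually works, since the start scripts give little feedback when it does not; sshd also rejects keys whose files have loose permissions. A sketch:

```shell
# sshd ignores authorized_keys unless the directory and file are private.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# This must print "ok" with no password prompt before starting Hadoop.
ssh -o StrictHostKeyChecking=no localhost 'echo ok' || echo "passwordless ssh not working yet"
```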

Create a hadoopdata folder in the home directory (this is where the HDFS name and data directories configured below will live)
cd ~

mkdir hadoopdata

Go to the configuration XML files

cd /usr/lib/hadoop/hadoop-2.9.0/etc/hadoop

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/ubuntu/hadoopdata/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/ubuntu/hadoopdata/hdfs/data</value>
    </property>
</configuration>

mapred-site.xml
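Note that in Hadoop 2.x this file does not exist yet; the distribution ships only a template, so create it first (assuming you are still in the etc/hadoop directory):

```shell
# mapred-site.xml is shipped only as a template in Hadoop 2.x;
# copy it before editing.
cp mapred-site.xml.template mapred-site.xml
```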

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

Format the name node
hdfs namenode -format

Go to the sbin directory of Hadoop:
cd $HADOOP_HOME/sbin

Start the name node
./hadoop-daemon.sh start namenode

Start HDFS components
./start-dfs.sh

Stop all
./stop-all.sh

Start all
./start-all.sh
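A quick way to confirm everything came up is jps, which ships with the JDK and lists running JVM processes by name:

```shell
# On a healthy single-node cluster jps should list NameNode, DataNode,
# SecondaryNameNode, ResourceManager, NodeManager, and Jps itself.
command -v jps >/dev/null && jps || echo "jps not found - is the JDK installed?"
```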

Then access the Hadoop web UIs at the following addresses.

NameNode – aws_ip_address:50070

DataNode – aws_ip_address:50075

SecondaryNameNode – aws_ip_address:50090

ResourceManager – aws_ip_address:8088
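If a page does not load in the browser, first check from the instance itself whether the daemon is listening; if it responds locally, the problem is usually the EC2 security group, whose inbound rules must allow these ports. A sketch using curl against the NameNode UI:

```shell
# 200 means the NameNode UI is serving; 000 means nothing is listening
# on the port (daemon not started, or wrong port).
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070 || true
```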

In the next tutorial, we will install Sqoop.
https://dev.to/zawhtutwin/installing-sqoop-on-hadoop-14n8
