Zaw Htut Win
Installing Hadoop single node cluster in AWS EC2

Environment: Ubuntu 18.04 on an m3.large EC2 instance, 8GB memory

Install OpenJDK 8 (the full JDK, not just the JRE)

sudo apt-get install openjdk-8-jdk
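To confirm a full JDK was installed (not only the JRE) and to find the path you will need for JAVA_HOME later, a quick check like this can help; the sed-based path derivation assumes Ubuntu's standard update-alternatives layout:

```shell
# javac only ships with the JDK, so this confirms a full JDK install.
javac -version

# Derive the JDK home from the resolved javac path, e.g.
# /usr/lib/jvm/java-8-openjdk-amd64/bin/javac -> /usr/lib/jvm/java-8-openjdk-amd64
JDK_HOME=$(readlink -f "$(command -v javac)" | sed 's|/bin/javac$||')
echo "JDK home: $JDK_HOME"
```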

Download Hadoop 2.9.0
wget https://archive.apache.org/dist/hadoop/core/hadoop-2.9.0/hadoop-2.9.0.tar.gz

Extract Hadoop in the home folder

tar -xvf hadoop-2.9.0.tar.gz

Create a folder for Hadoop

sudo mkdir /usr/lib/hadoop

Move the extracted Hadoop folder to /usr/lib/hadoop (sudo is needed because the directory is root-owned)

sudo mv hadoop-2.9.0 /usr/lib/hadoop/

Find the JDK 8 path and note it down; on Ubuntu it is typically:

/usr/lib/jvm/java-1.8.0-openjdk-amd64

Open ~/.bashrc and add the following lines at the end of the file (HADOOP_HOME and the PATH entries are used by the format and start commands later):

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_HOME=/usr/lib/hadoop/hadoop-2.9.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Reload the environment

source ~/.bashrc

Generate an SSH key pair and enable passwordless login to localhost

ssh-keygen -t rsa

cd ~

cat .ssh/id_rsa.pub >> .ssh/authorized_keys

ssh-copy-id -i .ssh/id_rsa.pub ubuntu@localhost
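Hadoop's start scripts log in to localhost over SSH, so passwordless login must work before continuing. A check along these lines can confirm it; BatchMode makes ssh fail instead of prompting if key authentication is broken:

```shell
# Should print the message without asking for a password.
ssh -o BatchMode=yes -o StrictHostKeyChecking=no localhost 'echo SSH to localhost OK'
```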

Create a hadoopdata folder in the home directory
cd ~

mkdir hadoopdata
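hdfs-site.xml below points the NameNode and DataNode at subdirectories of hadoopdata; creating them up front avoids permission or missing-directory failures at startup (the paths match the values in hdfs-site.xml):

```shell
# Create the directories referenced by dfs.namenode.name.dir and dfs.datanode.data.dir.
mkdir -p ~/hadoopdata/hdfs/name ~/hadoopdata/hdfs/data
ls -R ~/hadoopdata
```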

Go to the Hadoop configuration XML files

cd /usr/lib/hadoop/hadoop-2.9.0/etc/hadoop

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/ubuntu/hadoopdata/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/ubuntu/hadoopdata/hdfs/data</value>
    </property>
</configuration>

mapred-site.xml (in Hadoop 2.x this file does not exist by default; copy mapred-site.xml.template to mapred-site.xml first)

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

Format the name node
hdfs namenode -format

Go to the sbin directory of Hadoop:
cd $HADOOP_HOME/sbin

Start the name node
./hadoop-daemon.sh start namenode

Start HDFS components
./start-dfs.sh

Stop all
./stop-all.sh

Start all
./start-all.sh
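After start-all.sh, the running daemons can be verified with jps (part of the JDK); on a healthy single-node cluster, all five daemons should show up:

```shell
# Filter jps output down to the Hadoop/YARN daemons; expect NameNode, DataNode,
# SecondaryNameNode, ResourceManager, and NodeManager to be listed.
jps | grep -E 'NameNode|DataNode|SecondaryNameNode|ResourceManager|NodeManager'
```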

Then access the Hadoop web UIs at the following addresses (these ports must be open in the EC2 security group):

NameNode – http://aws_ip_address:50070

DataNode – http://aws_ip_address:50075

SecondaryNameNode – http://aws_ip_address:50090

ResourceManager – http://aws_ip_address:8088

In the next tutorial, we will install sqoop.
https://dev.to/zawhtutwin/installing-sqoop-on-hadoop-14n8
