Nishkarsh Raj
The Ultimate Hadoop Installation Cheat Sheet

1. Install Java

$ sudo apt-get update && sudo apt-get -y upgrade
$ sudo apt install -y default-jdk
$ java --version
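
On recent Ubuntu releases, default-jdk resolves to OpenJDK 11, which matches the JAVA_HOME path used in the configuration below; if your release installs a different version, adjust that path accordingly.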

2. Create a dedicated Hadoop user

$ sudo addgroup [groupname]
$ sudo adduser --ingroup [groupname] [username]
$ sudo adduser [username] sudo # Add the user to the sudoers group
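
For example, using the conventional (hypothetical) names hadoop for the group and hduser for the user:

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
$ sudo adduser hduser sudo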

3. Set up passwordless SSH for local and HDFS connections

$ sudo apt-get install openssh-client openssh-server
$ su - [username]
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
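
The key is only trusted if its permissions are strict, so tighten them and confirm that a passwordless login works:

$ chmod 0600 $HOME/.ssh/authorized_keys
$ ssh localhost # should connect without asking for a password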

4. Download the Hadoop tar file from the official Apache release archive

Grab the latest stable release from the Apache Hadoop releases page (https://hadoop.apache.org/releases.html).

$ cd [download directory]
$ sudo tar xvzf [tar file name]
$ sudo mv [extracted folder] /usr/local/hadoop
$ sudo chown -R [username] /usr/local/hadoop
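
As a concrete example, assuming Hadoop 3.3.6 and the hypothetical user hduser (check the releases page for the current version; the Apache archive keeps every past release):

$ cd ~/Downloads
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
$ sudo tar xvzf hadoop-3.3.6.tar.gz
$ sudo mv hadoop-3.3.6 /usr/local/hadoop
$ sudo chown -R hduser /usr/local/hadoop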

5. Edit the configuration files

1. ~/.bashrc

Add the following lines at the end of the file:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Source the file to apply the changes:
$ source ~/.bashrc
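
To confirm the new variables took effect in the current shell:

$ echo $HADOOP_HOME # should print /usr/local/hadoop
$ hadoop version # should report the installed release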

2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
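
If you are not sure which path to use for JAVA_HOME, resolve the java binary through its symlinks and drop the trailing /bin/java:

$ readlink -f $(which java) # example output: /usr/lib/jvm/java-11-openjdk-amd64/bin/java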

3. /usr/local/hadoop/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
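
With Hadoop's bin directory on the PATH (from the ~/.bashrc step), you can ask Hadoop which filesystem URI it picked up:

$ hdfs getconf -confKey fs.defaultFS # should print hdfs://localhost:9000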

4. /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_space/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_space/hdfs/datanode</value>
  </property>
</configuration>

5. /usr/local/hadoop/etc/hadoop/yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

6. /usr/local/hadoop/etc/hadoop/mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
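
On Hadoop 3.x, MapReduce jobs submitted to YARN can fail to find the MRAppMaster class unless the framework's home directory is passed to the containers. If you hit that, a commonly needed addition (not part of the minimal setup above) is to place these properties inside the same <configuration> element:

  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>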

6. Create directories for the name node and data node

$ sudo mkdir -p /usr/local/hadoop_space
$ sudo mkdir -p /usr/local/hadoop_space/hdfs/namenode
$ sudo mkdir -p /usr/local/hadoop_space/hdfs/datanode
$ sudo chown -R [username] /usr/local/hadoop_space

7. Run Hadoop

i. Format the name node

$ hdfs namenode -format
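
Formatting initializes a fresh filesystem in the name node directory configured earlier. Run it only once: reformatting an existing installation wipes the HDFS metadata.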

ii. Start the HDFS daemons

$ start-dfs.sh

iii. Start YARN

$ start-yarn.sh

iv. Check which components are up

$ jps
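
If everything came up cleanly, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager alongside Jps itself. As a final smoke test, create your HDFS home directory and list the filesystem root (again assuming the hypothetical user hduser):

$ hdfs dfs -mkdir -p /user/hduser
$ hdfs dfs -ls /

On Hadoop 3.x the NameNode web UI is at http://localhost:9870 and the YARN ResourceManager at http://localhost:8088.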
