Zaw Htut Win

Installing Sqoop on Hadoop in AWS EC2

Prerequisite: how to install Hadoop
https://dev.to/zawhtutwin/installing-hadoop-single-node-cluster-in-aws-ec2-o39

We already installed Hadoop with OpenJDK 1.8 in the previous guide.
JDK 1.8 is required because later versions of MySQL Server do not work well with the JDK 1.7 MySQL connector drivers. Cloudera ships its Docker image with JDK 1.7, so the objective of this manual installation is to let Sqoop run on JDK 8, which supports a wide range of MySQL Server versions, especially on RDS. So instead of using the Cloudera Docker image, we will install everything manually.

Go to home folder

cd ~

Download Sqoop from the Apache archive

wget https://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

Extract the file in the home folder

tar -xvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

Create a sqoop directory in /usr/lib

cd /usr/lib

sudo mkdir sqoop

Move the sqoop-1.4.7.bin__hadoop-2.6.0 folder into the /usr/lib/sqoop folder

sudo mv ~/sqoop-1.4.7.bin__hadoop-2.6.0 /usr/lib/sqoop/

Add $SQOOP_HOME environment variable in ~/.bashrc

nano ~/.bashrc

export SQOOP_HOME=/usr/lib/sqoop/sqoop-1.4.7.bin__hadoop-2.6.0

Then add $SQOOP_HOME/bin to the $PATH variable too

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SQOOP_HOME/bin

Save .bashrc and source it

source ~/.bashrc
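
If you would rather script the two .bashrc edits than make them by hand, something like the following could work. It is only a sketch: append_once is a hypothetical helper name, not part of Sqoop or Hadoop, and it assumes the file ends with a newline. It adds a line only when that exact line is not already in the file, so running it twice does not duplicate the exports.

```shell
# Hypothetical helper: append LINE to FILE only if FILE does not already contain it.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example calls (commented out so that nothing is changed by accident):
# append_once 'export SQOOP_HOME=/usr/lib/sqoop/sqoop-1.4.7.bin__hadoop-2.6.0' ~/.bashrc
# append_once 'export PATH=$PATH:$SQOOP_HOME/bin' ~/.bashrc
```

The single quotes keep $PATH and $SQOOP_HOME unexpanded, so they are written to .bashrc literally and resolved each time the file is sourced.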

Then download the MySQL Connector/J jar file from Maven Central

cd ~

wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.30/mysql-connector-java-8.0.30.jar

Then copy mysql-connector-java-8.0.30.jar to the $SQOOP_HOME/lib folder

cp mysql-connector-java-8.0.30.jar $SQOOP_HOME/lib
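
Sqoop cannot connect to MySQL without the driver jar in its lib folder, so it may be worth confirming the copy before going further. The check below is a minimal sketch; has_mysql_driver is a hypothetical helper name, and it only looks for a file whose name matches mysql-connector-*.jar.

```shell
# Hypothetical helper: succeed if a MySQL Connector/J jar is present in DIR.
has_mysql_driver() {
    for jar in "$1"/mysql-connector-*.jar; do
        # If nothing matches, the pattern stays literal and -f fails.
        [ -f "$jar" ] && return 0
    done
    return 1
}

# Example call (commented out):
# has_mysql_driver "$SQOOP_HOME/lib" && echo "driver found" || echo "driver missing"
```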

Go to the $SQOOP_HOME/conf folder and rename sqoop-env-template.sh to sqoop-env.sh

cd $SQOOP_HOME/conf

mv sqoop-env-template.sh sqoop-env.sh

Then edit the file as follows

export HADOOP_COMMON_HOME=/usr/lib/hadoop/hadoop-2.9.0
export HADOOP_MAPRED_HOME=/usr/lib/hadoop/hadoop-2.9.0
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
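
The three paths above depend on where Hadoop and the JDK were installed in the previous guide, so a typo here is easy to make. A small check along these lines could catch it early. This is a sketch under that assumption; check_paths is a hypothetical helper name that takes NAME=PATH pairs and reports which directories exist.

```shell
# Hypothetical helper: report whether each directory given as NAME=PATH exists.
# Returns non-zero if at least one directory is missing.
check_paths() {
    rc=0
    for pair in "$@"; do
        name="${pair%%=*}"   # text before the first '='
        path="${pair#*=}"    # text after the first '='
        if [ -d "$path" ]; then
            echo "ok: $name=$path"
        else
            echo "missing: $name=$path"
            rc=1
        fi
    done
    return $rc
}

# Example call (commented out), using the values from sqoop-env.sh:
# check_paths "HADOOP_COMMON_HOME=/usr/lib/hadoop/hadoop-2.9.0" \
#             "HADOOP_MAPRED_HOME=/usr/lib/hadoop/hadoop-2.9.0" \
#             "JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64"
```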

Then check the sqoop installation version

sqoop version

22/10/05 04:50:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017

Then you can start importing from RDS or any remote MySQL database as follows.

sqoop import --connect jdbc:mysql://your_rds_dns_address/yourdatabase --table hr_users --username something --password 'something'
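
Passing --password on the command line leaves the password in the shell history. Sqoop also accepts -P, which prompts for the password on the console, and --password-file, which reads it from a file. The wrapper below is a sketch of the -P variant; import_table is a hypothetical function name, and its four arguments are placeholders for your own host, database, table and user.

```shell
# Hypothetical wrapper around the import above: same connection arguments,
# but -P makes Sqoop prompt for the password instead of taking it as an argument.
import_table() {
    host="$1"; database="$2"; table="$3"; user="$4"
    sqoop import \
        --connect "jdbc:mysql://$host/$database" \
        --table "$table" \
        --username "$user" \
        -P
}

# Example call (commented out):
# import_table your_rds_dns_address yourdatabase hr_users something
```

If the table has no primary key, Sqoop cannot split the work across mappers on its own; adding -m 1 or --split-by with a suitable column to the command is the usual way around that.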

After the import, the data is saved as CSV part files in HDFS under /user/ubuntu/hr_users. You can verify this as follows.

hdfs dfs -ls /user/ubuntu/hr_users

To see the content of one of the part files:
hdfs dfs -cat /user/ubuntu/hr_users/part-m-00001
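
Each map task writes its own part file, numbered from part-m-00000, so a single file holds only part of the table. To compare the import against the source table, you could count the lines across all part files. The helper below is a sketch; count_imported_rows is a hypothetical name, and it assumes one row per line, which holds for the default text output as long as no field contains a newline.

```shell
# Hypothetical helper: count the lines Sqoop wrote under an HDFS directory.
count_imported_rows() {
    # The pattern is quoted so that hdfs, not the local shell, expands it.
    hdfs dfs -cat "$1/part-m-*" | wc -l
}

# Example call (commented out):
# count_imported_rows /user/ubuntu/hr_users
```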

Then you are ready to install Apache Hive
https://dev.to/zawhtutwin/installing-hive-in-aws-ec2-2g35/