
Saša Zejnilović


Building Hadoop native libraries on Mac in 2019

TL;DR to be found at the end

Recently I found myself in a situation where I "needed" Hadoop native libraries. Well, when I say "needed", I mean I was just getting fed up with constant warnings like this one:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

So I thought I would build my own Hadoop native libraries. How hard can it be, right? The honest answer? Less than an hour if you don't have a tutorial, fifteen minutes if you do, and most of that is compilation time. In my search I found that many tutorials and guides were either outdated or didn't cover everything needed for a full compilation and installation, which is why I wrote my own. I tested it on two independent Macs, so it should be "tested enough".

Why do it

There was no real-world issue I was hoping to solve; I just had a few minutes on my hands and used them to learn something new. That said, I did read that there are cases of speed improvements, which is good if you are developing or testing something locally, because local machines tend to be slow and any improvement is more than welcome. I also saw two articles a while back reporting issues with the pure-Java libraries, but the chances of you hitting the same issues are really small.

Dependencies

First of all, we need to install the dependencies for the build. I am including links so you can check exactly what you are going to install:

(Please note I am skipping Maven, Java, and others that I assume you already have; if I am wrong, tell me and let's update the article. The same goes for the Hadoop installation itself: there is a beautiful article about installing Hadoop on a Mac by Zhang Hao here.)

For the installation of most of these, I will be using Homebrew. It's a good tool with a one-liner installation and a very short path to being productive with it. As the link provides everything you need, I am skipping its installation here.

If this is not your first time using Homebrew, update and upgrade your tools first. If you have been using it for a while and want to keep some formulae at their current versions, use brew pin.

# Update
brew update
brew upgrade

# Then the installation
brew install wget gcc autoconf automake libtool cmake snappy gzip bzip2 zlib openssl

As you may have noticed, one of the dependencies is missing from the list above. Yes! It is protobuf, whose required version has been deprecated and can't be easily installed from Homebrew. So let's build our own. It's cleaner that way and much more fun than it sounds. We first need to get it from GitHub and unpack it somewhere; you can delete the folder right afterwards, so you don't need any special folder structure.

wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar -xzf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0

Then comes the process of building and making sure everything went smoothly. It takes some time, and I advise you to run it step by step so you can see what is happening. Some warnings here and there are normal, so you can ignore those.

./configure
make
make check
make install
# And just to check if everything is ok.
# This should print libprotoc 2.5.0
protoc --version
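The version check can also be scripted as a guard before the Hadoop build. A minimal sketch, where `check_protoc_version` is a helper name of my own:

```shell
# Sketch: parse the output of `protoc --version` (e.g. "libprotoc 2.5.0")
# and confirm it is the release Hadoop expects.
check_protoc_version() {
  [ "$(printf '%s' "$1" | awk '{print $2}')" = "2.5.0" ]
}

# Usage, once protoc is on the PATH:
# check_protoc_version "$(protoc --version)" || echo "wrong protoc version" >&2
```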

OpenSSL setup

Now we link the OpenSSL headers by hand, because Homebrew refuses to link OpenSSL and the compiler needs them. This is a known "feature" and is handled by running ln.

cd /usr/local/include
ln -s ../opt/openssl/include/openssl .
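If you re-run your setup often, the symlink step can be made idempotent. A minimal sketch, assuming Homebrew's /usr/local prefix; `link_openssl_headers` is a helper name of my own:

```shell
# Sketch: create the openssl include symlink only if nothing is at the
# destination yet, so repeated runs are harmless.
link_openssl_headers() {
  src="$1"   # e.g. /usr/local/opt/openssl/include/openssl
  dst="$2"   # e.g. /usr/local/include/openssl
  if [ ! -e "$dst" ] && [ ! -L "$dst" ]; then
    ln -s "$src" "$dst"
  fi
}

# Usage:
# link_openssl_headers /usr/local/opt/openssl/include/openssl /usr/local/include/openssl
```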

This will solve an error that looks something like the output below.

[exec] CMake Error at /usr/local/Cellar/cmake/3.14.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
[exec]   Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
[exec]   system variable OPENSSL_ROOT_DIR (missing: OPENSSL_INCLUDE_DIR)
[exec] Call Stack (most recent call first):
[exec]   /usr/local/Cellar/cmake/3.14.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
[exec]   /usr/local/Cellar/cmake/3.14.3/share/cmake/Modules/FindOpenSSL.cmake:413 (find_package_handle_standard_args)
[exec]   CMakeLists.txt:20 (find_package)
[exec]
[exec] -- Configuring incomplete, errors occurred!
[exec] See also /Users/user/github/hadoop/hadoop-tools/hadoop-pipes/target/native/CMakeFiles/CMakeOutput.log.

Building native libraries

And finally: building the libraries. Again, this will create a folder that you can delete at the end. This is probably the first place you will need to modify something, namely the version of Hadoop you will be using.

git clone https://github.com/apache/hadoop.git
cd hadoop
# Change the version as needed
git checkout branch-<VERSION>
# And just package.
mvn package -Pdist,native -DskipTests -Dtar
# After build, move your newly created libraries.
cp -R hadoop-dist/target/hadoop-<VERSION>/lib $HADOOP_HOME
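It's worth confirming the copy actually landed before moving on. A minimal sketch; `check_native_libs` is a hypothetical helper, and I assume $HADOOP_HOME is already set by your Hadoop installation:

```shell
# Sketch: confirm the native directory exists and contains a libhadoop
# library before pointing the JVM at it.
check_native_libs() {
  dir="$1"   # e.g. "${HADOOP_HOME}/lib/native"
  [ -d "$dir" ] && ls "$dir" | grep -q '^libhadoop'
}

# Usage:
# check_native_libs "${HADOOP_HOME}/lib/native" || echo "native libs missing" >&2
```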

Setting up environment variables

Now the critical part: making your shell see the libraries. Whatever shell you are using, put this into your shell profile (.bashrc, .zshrc, etc.):

export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:${HADOOP_HOME}/lib/native
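If you script your machine setup, the exports can be appended to the profile idempotently. A minimal sketch; `append_hadoop_env` and the marker comment are my own conventions:

```shell
# Sketch: append the three exports to a shell profile exactly once.
# The marker line keeps repeated runs from duplicating the block.
append_hadoop_env() {
  profile="$1"   # e.g. "$HOME/.zshrc" or "$HOME/.bashrc"
  marker='# >>> hadoop native libs >>>'
  if ! grep -qF "$marker" "$profile" 2>/dev/null; then
    {
      echo "$marker"
      echo 'export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"'
      echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native'
      echo 'export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:${HADOOP_HOME}/lib/native'
    } >> "$profile"
  fi
}

# Usage:
# append_hadoop_env "$HOME/.zshrc"
```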

This will point everything at the right path and make it all fall into place. The last thing we need is to check that everything is OK (and by everything I mean almost everything, because bzip2 is acting up and I still have not found a way to solve it; when I do, I will update this).

hadoop checknative -a

#The output should be something like this.
19/05/17 19:00:14 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
19/05/17 19:00:14 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /usr/local/Cellar/hadoop/2.7.5/lib/native/libhadoop.dylib
zlib:    true /usr/lib/libz.1.dylib
snappy:  true /usr/local/lib/libsnappy.1.dylib
lz4:     true revision:99
bzip2:   false
openssl: true /usr/lib/libcrypto.35.dylib
19/05/17 19:00:14 INFO util.ExitUtil: Exiting with status 1

Afterword

Hopefully everything is running smoothly and you no longer get those warnings. If I helped even one person with this, I am glad, because if there is no added value for the reader, then it is just me talking to my wall. On the other hand, if you did find issues in the code or the article, please do tell me and I will fix everything I am capable of.

TL;DR

This is just a step-by-step shell script extracted from the text above.

Top comments (18)

Cameron Hudson

Depending on which Hadoop version you want to install, you may need to use an earlier Java version to package it. This can be done by temporarily changing the JAVA_HOME environment variable before running mvn package.

In my case, instead of

mvn package -Pdist,native -DskipTests -Dtar

I had to run

JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home mvn package -Pdist,native -DskipTests -Dtar
001027261_Shunfang Lan

JDK11 is still in progress cwiki.apache.org/confluence/displa...

Saša Zejnilović

Yes, you are right. I didn't think of that use case; I just assumed people would already have a compliant Java version installed or set as the default.

ajndesai

Thanks for this article. I get this error when compiling the package with the command below:

mvn package -Pdist,native -DskipTests -Dtar

branch: branch-3.2

[INFO] Apache Hadoop MapReduce NativeTask ................. FAILURE [ 1.766 s]

[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.2.2-SNAPSHOT:cmake-compile (cmake-compile) on project hadoop-mapreduce-client-nativetask: make failed with error code 2 -> [Help 1]

001027261_Shunfang Lan

It's because Hadoop doesn't currently support native building on macOS.

See github.com/apache/hadoop/blob/trun...

Note that building Hadoop 3.1.1/3.1.2/3.2.0 native code from source is broken
on macOS. For 3.1.1/3.1.2, you need to manually backport YARN-8622. For 3.2.0,
you need to backport both YARN-8622 and YARN-9487 in order to build native code.
Saša Zejnilović

Hello, I don't see enough of the message. Do you have Java 8? Can you share more of the error (the line starting with "Caused by")?

I have tried building this now and it works for me.

ajndesai

Yes, I have Java 8 and that's my JAVA_HOME. Hadoop and Hive are set up and working, though. I read that some of the compression codecs work only with native libraries and jobs will fail with the Java libraries.

I tried this on macOS Mojave 10.14.6.

imasli

What version of OpenSSL did you use?

I used OpenSSL 1.1 from Brew and I got this error:

error: variable has incomplete type 'HMAC_CTX' (aka 'hmac_ctx_st')
Saša Zejnilović

This seems more like an issue with the OpenSSL installation than with the version you are using. Anyway, this is my current machine (I am not sure which version I built with, but the libs work):

╰─$ openssl version
LibreSSL 2.8.3
╰─$ brew list --versions | grep ssl
openssl 1.0.2t
openssl@1.1 1.1.1d
imasli

According to
issues.apache.org/jira/browse/HADO...

OpenSSL 1.1 broke the compilation. They patched it but didn't include the fix in the version 2 builds. Your tutorial used OpenSSL 1.0 (with OpenSSL 1.1, openssl@1.1 will be on the path).

Too bad that Homebrew already deprecated OpenSSL 1.0

Saša Zejnilović

Thank you very much! Will update the post.

imasli

This issue discusses how to forcefully install OpenSSL 1.0 using Homebrew
github.com/Homebrew/homebrew-core/...

krikoon73

Hi Sasa,
Thank you for this! I'm a bit new to Hadoop and I'm trying to fix the "native library" thing based on your article. Everything is OK until the "Apache Hadoop Common" build:

Mac OSX highSierra
Hadoop version 2.9.2

~/hadoop/tmp/hadoop   branch-2.9.2  brew list
autoconf cmake gettext go libidn2 libunistring openjdk pcre2 pyenv-virtualenv sshpass wget
automake direnv git gzip libmpc maven openssl@1.1 pkg-config readline telnet zlib
bzip2 gcc gmp isl libtool mpfr pandoc pyenv snappy tree

openssl : ok
protoc : 2.5.0 ok

Error message :

[INFO] Apache Hadoop Common ............................... FAILURE [02:08 min]

I have :

[WARNING] /Users/ccompain/hadoop/tmp/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpensslCipher.c:256:14: error: incomplete definition of type 'struct evp_cipher_ctx_st'
[WARNING] if (context->flags & EVP_CIPH_NO_PADDING) {
[WARNING] ~~~~~~~^
[WARNING] /usr/local/include/openssl/ossl_typ.h:90:16: note: forward declaration of 'struct evp_cipher_ctx_st'
[WARNING] typedef struct evp_cipher_ctx_st EVP_CIPHER_CTX;
[WARNING] ^
[WARNING] /Users/ccompain/hadoop/tmp/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpensslCipher.c:262:20: error: incomplete definition of type 'struct evp_cipher_ctx_st'
[WARNING] int b = context->cipher->block_size;
[WARNING] ~~~~~~~^
[WARNING] /usr/local/include/openssl/ossl_typ.h:90:16: note: forward declaration of 'struct evp_cipher_ctx_st'
[WARNING] typedef struct evp_cipher_ctx_st EVP_CIPHER_CTX;
[WARNING] ^
[WARNING] /Users/ccompain/hadoop/tmp/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpensslCipher.c:263:16: error: incomplete definition of type 'struct evp_cipher_ctx_st'
[WARNING] if (context->encrypt) {
[WARNING] ~~~~~~~^
[WARNING] /usr/local/include/openssl/ossl_typ.h:90:16: note: forward declaration of 'struct evp_cipher_ctx_st'
[WARNING] typedef struct evp_cipher_ctx_st EVP_CIPHER_CTX;
[WARNING] ^
[WARNING] /Users/ccompain/hadoop/tmp/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpensslCipher.c:310:14: error: incomplete definition of type 'struct evp_cipher_ctx_st'
[WARNING] if (context->flags & EVP_CIPH_NO_PADDING) {
[WARNING] ~~~~~~~^
[WARNING] /usr/local/include/openssl/ossl_typ.h:90:16: note: forward declaration of 'struct evp_cipher_ctx_st'
[WARNING] typedef struct evp_cipher_ctx_st EVP_CIPHER_CTX;
[WARNING] ^
[WARNING] /Users/ccompain/hadoop/tmp/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpensslCipher.c:313:20: error: incomplete definition of type 'struct evp_cipher_ctx_st'
[WARNING] int b = context->cipher->block_size;
[WARNING] ~~~~~~~^
[WARNING] /usr/local/include/openssl/ossl_typ.h:90:16: note: forward declaration of 'struct evp_cipher_ctx_st'
[WARNING] typedef struct evp_cipher_ctx_st EVP_CIPHER_CTX;
[WARNING] ^
[WARNING] 5 errors generated.
[WARNING] make[2]: *** [CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/crypto/OpensslCipher.c.o] Error 1
[WARNING] make[2]: *** Waiting for unfinished jobs....
[WARNING] make[1]: *** [CMakeFiles/hadoop_static.dir/all] Error 2
[WARNING] make[1]: *** Waiting for unfinished jobs....
[WARNING] make[1]: *** [CMakeFiles/hadoop.dir/all] Error 2
[WARNING] make: *** [all] Error 2

Any idea ?
Thks,
Christophe

Saša Zejnilović

Hello, in all honesty, I don't remember having this problem, but it seems you aren't alone. Apparently it is an OpenSSL "feature". This apache.org Jira seems to be about precisely your problem; they even point out the places in the C code you need to fix.

Dale Schaffer

When is $HADOOP_HOME defined?

When I run line 29 in the script, it fails and prints out the usage text for cp.

Saša Zejnilović

$HADOOP_HOME is not defined by me. It is part of a proper Hadoop installation, which I am not covering in this article.

Sahir Azeddin

Hello, I am trying to install the Hadoop native library on a Mac, but I ran into some problems I haven't been able to solve. Please let me know if you have any idea. My problem looks like this:

[screenshot of the error]

Arulselvan Madhavan

Thanks for writing this. Helped me a lot