Installing and Running Hadoop and Spark on Windows

Andrew (he/him) on November 05, 2018
David Camilo Serrano

Hi Andrew,
I am getting this error when I try to execute start-yarn.cmd:

This file does not have an app associated with it for performing this action. Please install an app or, if one is already installed, create an association in the default apps settings page.

It seems like yarn is not a command known to Windows.

Here are my environment variables:
thepracticaldev.s3.amazonaws.com/i...

Andrew (he/him)

Hi David,

It sounds like you're trying to run this program by double-clicking on it. You should run it in the cmd prompt like:

C:\> start-yarn.cmd

Let me know if that works for you.

David Camilo Serrano

No, I am using the cmd console.
For example, if I type just hadoop in the console, it shows me some options. But if I type yarn, it says:

'yarn' is not recognized as an internal or external command,
operable program or batch file.

I am attaching images where this can be seen: thepracticaldev.s3.amazonaws.com/i...

Andrew (he/him)

These error messages are giving you hints about what's going wrong. It looks like your %PATH% is set up correctly and hadoop is on it, but you can't run the hadoop command by itself. That's what the error message is telling you. You need to include additional command-line arguments.

Try running hadoop version and see if you get any output.

David Camilo Serrano

When I execute the command "hadoop version" I get this:

Hadoop 2.9.1
Subversion github.com/apache/hadoop.git -r e30710aea4e6e55e69372929106cf119af06fd0e
Compiled by root on 2018-04-16T09:33Z
Compiled with protoc 2.5.0
From source with checksum 7d6d2b655115c6cc336d662cc2b919bd
This command was run using /C:/BigData/hadoop-2.9.1/share/hadoop/common/hadoop-common-2.9.1.jar

But if I try to execute just "yarn" I get:

'yarn' is not recognized as an internal or external command,
operable program or batch file.

Andrew (he/him)

Right, so hadoop is working fine. yarn isn't usually a command that you run yourself; YARN (Yet Another Resource Negotiator) is the resource manager that Hadoop uses behind the scenes to manage everything.

If you successfully ran start-yarn.cmd and start-dfs.cmd, you're good to go! Try uploading a file to HDFS with:

C:\> hadoop fs -put <file name here> /

...and checking that it's been uploaded with

C:\> hadoop fs -ls /
David Camilo Serrano

Hi,
Thanks for your answer.
But the problem is exactly that. When I run the command start-yarn.cmd I get:

This file does not have an app associated with it for performing this action. Please install an app or, if one is already installed, create an association in the default apps settings page.

So, I looked at the contents of the start-yarn.cmd file, and it has a call to the yarn command. I tried calling yarn in an independent console and got the same error. That is why I think the problem is the yarn command itself.

Andrew (he/him)

Okay, I think we're getting close. Can you echo %PATH% and share the result?

start-yarn.cmd should be within the Hadoop /sbin directory. If you haven't added it to your path correctly, maybe that's why you can't access it.
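A quick way to confirm is Windows' built-in where command, which prints the file the shell would resolve a name to:

C:\> where start-yarn.cmd

If where reports that it could not find any file for that name, then the sbin directory isn't actually on the %PATH% of that console session.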

David Camilo Serrano

Thanks for the answer.

Here it is: echo %path%

Result:
C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\ProgramData\Oracle\Java\javapath;E:\app\dserranoa\product\11.2.0\client_1;E:\app\dserranoa\product\11.2.0\client_1\bin;C:\oraclexe\app\oracle\product\11.2.0\server\bin;;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Program Files\TortoiseGit\bin;C:\Program Files\PuTTY\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files\Microsoft\Web Platform Installer\;C:\Program Files (x86)\Microsoft SDKs\Azure\CLI\wbin;C:\Program Files (x86)\Microsoft SQL Server\110\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\120\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\130\DTS\Binn\;C:\Program Files\Microsoft SQL Server\110\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn\ManagementStudio\;C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn\;C:\Program Files\nodejs\;C:\Program Files\Microsoft SQL Server\110\DTS\Binn\;C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\PrivateAssemblies\;C:\Program Files (x86)\Bitvise SSH Client;C:\Program Files\dotnet\;C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code;C:\Program Files\Microsoft SDKs\Service Fabric\Tools\ServiceFabricLocalClusterManager;C:\Program Files (x86)\Brackets\command;C:\WINDOWS\System32\OpenSSH\;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\;C:\Program Files\Microsoft SQL Server\140\Tools\Binn\;C:\Program Files\Microsoft SQL Server\140\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\150\DTS\Binn\;C:\Program Files\Java\jdk1.8.0_121\bin;C:\Program Files\MySQL\MySQL Shell 8.0\bin;C:\Users\dserranoa\AppData\Local\Microsoft\WindowsApps;C:\Progra~1\Java\jdk1.8.0_121;C:\BigData\hadoop-2.9.1;C:\BigData\hadoop-2.9.1\bin;C:\BigData\hadoop-2.9.1\sbin

I have attached the image of my environment variables.

Andrew (he/him)

Huh. Can you run:

C:\> dir C:\BigData\hadoop-2.9.1\sbin

...and give the result?

David Camilo Serrano

Sure,
Here it is:

Volume in drive C has no label.
Volume Serial Number is 8276-D962

Directory of C:\BigData\hadoop-2.9.1\sbin

11/09/2019  09:55 a.m.    <DIR>          .
11/09/2019  09:55 a.m.    <DIR>          ..
16/04/2018  06:52 a.m.           2.752 distribute-exclude.sh
11/09/2019  09:55 a.m.    <DIR>          FederationStateStore
16/04/2018  06:52 a.m.           6.475 hadoop-daemon.sh
16/04/2018  06:52 a.m.           1.360 hadoop-daemons.sh
16/04/2018  06:52 a.m.           1.640 hdfs-config.cmd
16/04/2018  06:52 a.m.           1.427 hdfs-config.sh
16/04/2018  06:52 a.m.           3.148 httpfs.sh
16/04/2018  06:52 a.m.           3.677 kms.sh
16/04/2018  06:52 a.m.           4.134 mr-jobhistory-daemon.sh
16/04/2018  06:52 a.m.           1.648 refresh-namenodes.sh
16/04/2018  06:52 a.m.           2.145 slaves.sh
16/04/2018  06:52 a.m.           1.779 start-all.cmd
16/04/2018  06:52 a.m.           1.471 start-all.sh
16/04/2018  06:52 a.m.           1.128 start-balancer.sh
16/04/2018  06:52 a.m.           1.401 start-dfs.cmd
16/04/2018  06:52 a.m.           3.734 start-dfs.sh
16/04/2018  06:52 a.m.           1.357 start-secure-dns.sh
16/04/2018  06:52 a.m.           1.571 start-yarn.cmd
16/04/2018  06:52 a.m.           1.347 start-yarn.sh
16/04/2018  06:52 a.m.           1.770 stop-all.cmd
16/04/2018  06:52 a.m.           1.462 stop-all.sh
16/04/2018  06:52 a.m.           1.179 stop-balancer.sh
16/04/2018  06:52 a.m.           1.455 stop-dfs.cmd
16/04/2018  06:52 a.m.           3.206 stop-dfs.sh
16/04/2018  06:52 a.m.           1.340 stop-secure-dns.sh
16/04/2018  06:52 a.m.           1.642 stop-yarn.cmd
16/04/2018  06:52 a.m.           1.340 stop-yarn.sh
16/04/2018  06:52 a.m.           4.295 yarn-daemon.sh
16/04/2018  06:52 a.m.           1.353 yarn-daemons.sh
              28 File(s)          61.236 bytes
               3 Dir(s) 101.757.034.496 bytes free
Andrew (he/him)

So start-dfs.cmd works, but start-yarn.cmd doesn't? Weird. They're both in the same directory. That doesn't make much sense.

I'm not sure how I can help further without being at your terminal. I'd say maybe try starting from scratch? Sometimes, it's easy to miss a small step or two.

David Camilo Serrano

Mmm, well, I tried to do the same process on another machine and it happened again. The same error. The yarn daemons are not running.

I have checked different options but I have not been able to find any solution yet.

I don't know if yarn needs some additional installation or something like that, or if there is another environment variable that I am not setting up.

I am really lost here.
What kind of command would you use in my console?

Andrew (he/him)

I would start from scratch, and make sure the correct version (8) of Java is installed, and re-install Hadoop. Then, I would double-check all of the environment variables.

Can you try adding the environment variables as system environment variables, rather than user environment variables? You may need to be an Administrator to do this.

If all of that checks out, and the %PATH% is correct, and all of the .cmd files are on the path, I'm not sure what else I would do. There's no reason why those commands shouldn't work if they're on the %PATH%.
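One more check that might help, using the C:\BigData\hadoop-2.9.1\sbin location from your %PATH% above: invoke the script by its full path, which bypasses %PATH% lookup entirely:

C:\> C:\BigData\hadoop-2.9.1\sbin\start-yarn.cmd

If even that fails with the "no app associated" error, the problem may be the .cmd file-type association on that machine rather than Hadoop itself.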

David Camilo Serrano

I appreciate your help.

I have already added the variables to the system but the problem is still there.

I would really appreciate it if you could tell me whether you have any other ideas to solve this issue.

I also think it is weird, but it seems like something related to yarn. I will look for more info and more tricks, and if I solve it I will post here.

Thanks so much.

David Camilo Serrano

Hi Andrew,
It's me again. Now I am testing on my personal machine, but I'm having another problem. On my local machine my user is "David Serrano". As you can see, it has a space in it. When I try to format the namenode with "hdfs namenode -format" I get this error:

Error: Could not find or load main class Serrano
Caused by: java.lang.ClassNotFoundException: Serrano

So, I guess the problem is the space in my user name. What can I do in this case?

Thanks in advance!

Andrew (he/him)

Hadoop doesn't like spaces in paths. I think the only thing you can do is put Java, Hadoop, and Spark in locations where there are no spaces in the path. I usually use:

C:\Java
C:\Hadoop
C:\Spark
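For what it's worth, with those locations the environment variables would look something like this (the version numbers here are only examples, not requirements):

JAVA_HOME=C:\Java\jdk1.8.0_201
HADOOP_HOME=C:\Hadoop\hadoop-2.9.1
PATH=...;%JAVA_HOME%\bin;%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin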
David Camilo Serrano

Hi,
Well, all the files are in paths without spaces. However, Hadoop is executing something using my user "David Serrano", and that is generating the problem. I have not found the root cause of this.

Andrew (he/him)

Are there any spaces on your %PATH% at all?

David Camilo Serrano

Hi,
Here is my path:
C:\Users\David Serrano>echo %path%
C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\Program Files\Microsoft MPI\Bin\;C:\Program Files (x86)\Intel\iCLS Client\;C:\Program Files\Intel\iCLS Client\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\DAL;C:\Program Files\Intel\Intel(R) Management Engine Components\DAL;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\IPT;C:\Program Files\Intel\Intel(R) Management Engine Components\IPT;C:\Program Files\dotnet\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files\Git\cmd;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files\Microsoft SQL Server\140\Tools\Binn\;C:\Program Files\Microsoft SQL Server\140\DTS\Binn\;C:\Program Files\Java\jdk-12.0.1\bin;C:\Program Files\MySQL\MySQL Shell 8.0\bin\;C:\Progra~1\Java\jdk-12.0.1;C:\BigData\hadoop-3.1.2;C:\BigData\hadoop-3.1.2\bin;C:\BigData\hadoop-3.1.2\sbin;

As you can see, there are a lot of spaces; however, in the configuration of the variables I am using C:\Progra~1 ... in order to avoid space problems. But the problem is with my user "David Serrano". The error says:

Error: Could not find or load main class Serrano
Caused by: java.lang.ClassNotFoundException: Serrano

As you can see, there is no "Serrano" in the PATH, so my conclusion is that the problem is my user name. But I don't know how I can avoid this.

Andrew (he/him)

Maybe it's doing something with your working directory path? Try cd-ing to C:\ first, then running Hadoop. I'm really not sure, though.

David Camilo Serrano

I already did that:

C:>hadoop version
Error: Could not find or load main class Serrano
Caused by: java.lang.ClassNotFoundException: Serrano

Do you know which Hadoop script calls the user profile? Do you know whether Hadoop has some way to set up the user profile name in the scripts?

Andrew (he/him)

I don't, sorry, David. I'm not sure why that should be hard-coded anywhere, if it's not in your %PATH% and you're not in that directory.

David Camilo Serrano

Well, here is some info (although it is a little bit old) that could give some clue about the problem:

blog.benhall.me.uk/2011/01/install...

I think I can do something similar to the advice in the above blog. However, I need to know which variable Hadoop uses to call Java, in order to change it in the config files.
If you have any info about it, please post it here so we can try to solve the problem.
Thanks in advance.

Andrew (he/him)

Hadoop uses JAVA_HOME to determine where your Java distribution is installed. In a Linux installation, there's a file called hadoop/etc/hadoop/hadoop-env.sh. It might be .cmd instead of .sh on Windows, but I'm not sure.

Check out my other article on installing Hadoop on Linux. (Search for "JAVA_HOME" to find the relevant bit.)
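On Windows the file should indeed be hadoop-env.cmd, and the relevant line looks something like the sketch below (the 8.3 short path shown is just an example, taken from the %PATH% output earlier in this thread; it sidesteps the space in "Program Files" - adjust it to your own JDK):

rem In %HADOOP_HOME%\etc\hadoop\hadoop-env.cmd
set JAVA_HOME=C:\Progra~1\Java\jdk1.8.0_121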

David Camilo Serrano

Yes, the JAVA_HOME variable is fine on my laptop. However, Hadoop must be using the %USERNAME% or %USERPROFILE% variable in another part of its code. Those variables are the problematic thing. I need to locate that part of Hadoop and try to change it in some config file (if that is possible). Actually, I have another machine with Ubuntu where Hadoop works normally. The idea was to install on Windows to do some specific work on both systems.

I appreciate your attention, and if you get any new info about these kinds of problems (user names with spaces, and yarn problems on Windows), please don't hesitate to post it here.

Thanks a lot.

Pranay

Hey guys, I am also having the same problem in my system due to a space in my system user name. Did you find any solution?
Thanks in advance.

ParixitOdedara

Thanks for putting this together and sharing knowledge. I tried to get Hadoop up and running on my Windows machine last year, and it was painful! Anywho, it encouraged me to put together a blog just like yours - exitcondition.com/install-hadoop-w...

Keep Exploring!

پنوں پاکستانی

Hi Andrew,

When I run the start-dfs.cmd and start-yarn.cmd commands, I get an error:

C:\Java\jdk1.8.0_201\bin\java -Xmx32m -classpath "C:\Hadoop\hadoop-3.1.2\etc\hadoop;C:\Hadoop\hadoop-3.1.2\share\hadoop\common;C:\Hadoop\hadoop-3.1.2\share\hadoop\common\lib*;C:\Hadoop\hadoop-3.1.2\share\hadoop\common*" org.apache.hadoop.util.PlatformName' is not recognized as an internal or external command,
operable program or batch file.
The system cannot find the file C:\Windows\system32\cmd.exe\bin.
The system cannot find the file C:\Windows\system32\cmd.exe\bin.

Please help me

Andrew (he/him)

Hi پنوں,

It looks like your system variables are misconfigured. The path

C:\Windows\system32\cmd.exe\bin

doesn't make any sense, as cmd.exe is an executable, not a directory. Double-check that you have the environment variables set correctly and let me know if you continue to have issues.
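When in doubt, it can help to print the variables Hadoop's scripts depend on and eyeball each one (each should be a real directory, with no executable names like cmd.exe embedded in the middle of a path):

C:\> echo %JAVA_HOME%
C:\> echo %HADOOP_HOME%
C:\> echo %PATH%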

پنوں پاکستانی

There was a problem with the environment variables. I was using C:\Windows\system32\cmd.exe\bin, which was prompting an error, but when I changed the system variable to C:\Windows\system32\cmd.exe it ran fine.

Thank you, BOSS, for your help. Stay blessed.

Andrew (he/him)

Happy to help!

Nebrod666

Just signed up to thank you for this tutorial. Well explained and very clear. Also, thanks for the link to the patch for the bin files. I was previously only able to work with older versions of Hadoop and was almost tempted to try to build the bins on my own. Cheers!

Michèle

Thank you so much, after 4 tutorials and 3 days of trying it finally worked! Yay!!!

For those who might have the same problem as I did: when I used start-dfs.cmd and start-yarn.cmd, it said the command couldn't be found. After a quick internet search, I figured out that I needed to go to the sbin directory, because the scripts are in there, and start them from there. It worked fine then.

Andrew (he/him)

Glad it worked! I actually went back to follow this guide again recently and skipped over the part where I say to add \sbin to the PATH, too. No worries!

Chinanu

Thanks Andrew for this tutorial, it was very helpful.

How does one address this encryption error:

INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false

I get it after I put a file with hadoop fs -put.

Andrew (he/him)

Hi Chinanu. I haven't encountered an error like this before, so unfortunately your guess is as good as mine. This site seems to suggest that it might be an issue with missing jar files? I'm not sure.

sekhart505

Hi, I am following a similar tutorial (joe0.com/2017/02/02/how-to-install...).
Earlier, when I ran the start-dfs.cmd command, the Hadoop clusters came up with no issues, but after I installed Flume, Sqoop, and Pig (by following the official websites), when I enter start-dfs.cmd I get the output below. Can you please help me solve it?

C:\hadoop\hadoop-3.1.2\sbin>start-dfs.cmd
The system cannot find the file hadoop.
The system cannot find the file hadoop.

Felicitas Pojtinger

But ... why? Just get Fedora and done ;)

Andrew (he/him)

Client-specified software that only runs on Windows Server :/

Felicitas Pojtinger

Well, that's sad. Have you thought about using something like an IIS container for those proprietary blobs?

Andrew (he/him)

I haven't, no... how would that work? Can you point me to any good resources?

Felicitas Pojtinger

See Docker Hub for more info, although I don't use it personally (I use & write FLOSS exclusively).

anhle16

Hello Andrew,

Thank you so much for your tutorial. I just have some questions I hope you can help with:

  • After running start-dfs.cmd and start-yarn.cmd in cmd (the boot HDFS step), I noticed that yarn is working fine, but the NameNode and DataNode started for a few seconds and then both stopped working for some reason. Any idea what might cause this issue?

  • During the path-setting process, I couldn't run the command hdfs -version (I cd'd out to C:/User but I still got the same error, Error: Could not find or load main class Last Name), so I edited /etc/hadoop/hadoop-env.cmd and changed this line:

set HADOOP_IDENT_STRING=%USERNAME%

to

set HADOOP_IDENT_STRING=myuser

This allows me to run hdfs -version, but I don't know whether this change will affect anything; could you please clarify? Does this change make my NameNode and DataNode stop working?

Karthik

Hi Andrew,

Try the Syncfusion BigData Studio and Syncfusion Cluster Manager products. They have built-in Hadoop ecosystems for the Windows platform.

They make it much easier to install and configure Hadoop ecosystems on Windows.

Mark Ferrall

My god, I've spent an insane amount of time on this for an assignment, and this was the only thing I've gotten to work. Thank you for putting this together.

Andrew (he/him)

Happy to help!

Rodrigo Herrán

Hi Andrew,

I can't fix this problem:
issues.apache.org/jira/browse/YARN...

The Hadoop version is 3.1.3.

When I start yarn, this folder gets created with insufficient permissions: /usercache. Running every script with sufficient permissions doesn't help.

Thanks a lot in advance!

Rodrigo

Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Permissions incorrectly set for dir c:/Hadoop/hadoop-3.1.3/yarn/tmp-nm/usercache, should be rwxr-xr-x, actual value = rw-rw-rw-
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1665)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkAndInitializeLocalDirs(ResourceLocalizationService.java:1633)

Moshe

Hi Andrew
This was so clear.
I had been having problems installing Hadoop for a week, and this just made it a breeze.
Thank you indeed

Elizabethgithub

Hi Andrew,
Thank you for this tutorial. It has really been helpful.
I have been able to run the start-yarn.cmd command successfully, but whenever I run start-dfs.cmd it gives me an error message:
"WARN datanode.DataNode: Problem connecting to server: localhost/127.0.0.1:9000"
Can you please tell me what to do to resolve this issue?
Thank you.

Nikhil01ranjan

Hi Andrew,

Thanks a lot for this.
This is the only thing that worked for me.

Andrew (he/him)

Happy to help!

ceperezegma

Simply great, it worked like a charm. Thanks for this tutorial, Andrew!!

Funso Iyaju

Thanks Andrew. This post was really helpful.

پنوں پاکستانی

Hi Andrew,

When I run the start-dfs.cmd and start-yarn.cmd commands, I get an error message.

OrakXaii

Hi Andrew, thank you.
I have followed all the steps on Windows 7, but when I run hdfs -version I get an error: hdfs is not recognized.
Please help.

Andrew (he/him)

Can you give me the exact error message you get? I haven't tried this guide on Windows 7 -- I'm not sure it will work on that OS.

rfks

Thanks for the guide! Just noticed a small typo with one port number:
localhost:9087 instead of localhost:9870 (I should have looked at the image :)

Andrew (he/him)

Thanks for pointing that out! Typo is fixed :)

Maritzapg

Hi,

I'm getting this error when I execute start-yarn.cmd:
thepracticaldev.s3.amazonaws.com/i...

Help me, please.