There are reports of an Apache Hadoop YARN “vulnerability”, and I want to share some details that the few articles I’ve come across have missed. Here are a few of the articles/links:
The key point I want to make is that the reports mislead the reader into assuming that all Apache Hadoop YARN environments are insecure. This is false. The clusters described have no security enabled, which is akin to leaving your front door unlocked. Kerberized clusters are secure since they require a valid user account to be usable. Furthermore, clusters should not be exposed to the internet for most use cases (especially not endpoints that allow for remote job submission).
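To make the difference between an open and a secured cluster concrete, here is a minimal Python sketch that sends a single unauthenticated request to the ResourceManager REST endpoint `/ws/v1/cluster/info` and classifies the answer. It rests on a few assumptions: that a cluster whose web endpoints require Kerberos (SPNEGO) authentication rejects an anonymous request with 401 or 403, that an unsecured cluster answers with 200, and that the helper names `classify_status` and `probe` are purely illustrative. Treat it as a rough check to run against your own cluster, not as a security audit.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status from the ResourceManager REST API to a rough verdict."""
    if status == 200:
        return "open"        # answered without any credentials
    if status in (401, 403):
        return "protected"   # authentication or authorization was demanded
    return "unknown"         # anything else needs a closer look


def probe(base_url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET to the cluster info endpoint.

    base_url is an assumption supplied by the caller, for example the
    ResourceManager web address of a cluster you operate.
    """
    url = base_url.rstrip("/") + "/ws/v1/cluster/info"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is still useful.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return "unreachable"
```

For example, `probe("http://rm.example.internal:8088")` (a hypothetical hostname on the default ResourceManager web port) returning `"open"` from a machine outside your network would suggest the front door is unlocked.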
Imagine that one day you get home and find a whole bunch of extra lamps plugged into your outlets. You are annoyed because the lamps are using your electricity. Then you remember that you forgot to lock your door when you went on vacation. Instead of stealing stuff from your home, someone decided to plug in lamps.
Now you might be thinking that something bad is to be expected if you leave your door unlocked while you are on vacation. An unsecured Apache Hadoop YARN cluster is exactly the same situation. No one should leave their cluster unsecured and exposed to the outside world.
There have been multiple reports of “big data” endpoints being exposed to the internet without being secured. This has affected Elasticsearch, MongoDB, and others. There is no reason to expose a cluster to the internet without security. Cloudera also wrote a blog post covering the same topic here.