DEV Community

Shaine Ismail
Understanding Hadoop proxy users

On a secure Hadoop cluster, services need to be able to authenticate and run applications on behalf of a user.

Suppose a service user named service_runner wants to submit a YARN job and access HDFS as user bob. service_runner has Kerberos credentials but bob does not, and service_runner does not have the level of access that bob has.

The connection to the NameNode or the job tracker (the ResourceManager on YARN clusters) is authenticated with service_runner's Kerberos credentials, but the work has to run as bob. In other words, service_runner needs to impersonate bob.

Host level control

   <property>
     <name>hadoop.proxyuser.service_runner.hosts</name>
     <value>10.222.0.0/16,10.113.221.221</value>
   </property>
   <property>
     <name>hadoop.proxyuser.service_runner.users</name>
     <value>bob</value>
   </property>

service_runner can impersonate bob, but only from hosts in 10.222.0.0/16 (10.222.0.0 through 10.222.255.255) or from 10.113.221.221.
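A /16 prefix covers 65,536 addresses. If only the sixteen addresses 10.222.0.0 through 10.222.0.15 are meant to be trusted, a /28 prefix would express that instead. A minimal sketch, reusing the property name and addresses from the example above (the narrower range is an assumption for illustration, not a recommendation for your network):

```xml
<!-- Assumed variant: trust only 10.222.0.0 through 10.222.0.15 (a /28), plus the single host -->
<property>
  <name>hadoop.proxyuser.service_runner.hosts</name>
  <value>10.222.0.0/28,10.113.221.221</value>
</property>
```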

User level control: too open

   <property>
     <name>hadoop.proxyuser.service_runner.hosts</name>
     <value>*</value>
   </property>
   <property>
     <name>hadoop.proxyuser.service_runner.users</name>
     <value>*</value>
   </property>

service_runner can impersonate any user on the cluster, from any host.

Group level control: closed

   <property>
     <name>hadoop.proxyuser.service_runner.hosts</name>
     <value>*</value>
   </property>
   <property>
     <name>hadoop.proxyuser.service_runner.groups</name>
     <value>service_runner_execute</value>
   </property>

service_runner can impersonate any user in the service_runner_execute group, from any host.

So when you are onboarding a new product to your Hadoop cluster, adding the settings it recommends as-is might open your cluster to security issues, especially from malicious insiders. Consider limiting impersonation to a known group of users, so that access is granted by explicit inclusion rather than being open to all by default.
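One way to apply that advice is to restrict both the hosts and the group at the same time. The following is only a sketch that combines values already used in this post (the 10.222.0.0/16 range, the single host, and the service_runner_execute group); the right hosts and group for your cluster are assumptions you would need to replace:

```xml
<!-- Sketch: explicit inclusion on both axes, no wildcards -->
<property>
  <!-- Only accept impersonation requests coming from these hosts -->
  <name>hadoop.proxyuser.service_runner.hosts</name>
  <value>10.222.0.0/16,10.113.221.221</value>
</property>
<property>
  <!-- Only users in this group can be impersonated by service_runner -->
  <name>hadoop.proxyuser.service_runner.groups</name>
  <value>service_runner_execute</value>
</property>
```

With this in place, adding a user to the service_runner_execute group becomes the explicit step that grants the service the ability to act on that user's behalf.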

See the Apache Hadoop documentation: "Proxy user - Superusers Acting On Behalf Of Other Users".
