Hadoop Security notes

Key Concepts

Authentication – Verifying credentials to reliably identify a user
Authorization – Limiting the user’s access to a given resource
User – Individual identified by underlying authentication system
Group – A set of users, maintained by the authentication system
Privilege – An instruction or rule that allows access to an object
Role – A set of privileges; a template to combine multiple access rules
Authorization model – Defines the objects subject to authorization rules and the granularity of the actions allowed. For example, in the SQL model, the objects can be databases or tables, and the actions are SELECT, INSERT, and CREATE. In the Search model, the objects are indexes, collections, and documents; the access modes are query and update.
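
The terms above fit together roughly as follows. This is only a minimal sketch of a hypothetical in-memory model: the names `Privilege`, `ROLES`, `GROUP_ROLES`, `USER_GROUPS`, and `is_allowed`, as well as the sample data, are assumptions made for illustration and are not part of any Hadoop API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Privilege:
    """A rule that allows one action on one object (e.g. SELECT on a table)."""
    obj: str
    action: str


# Hypothetical sample data, for illustration only.
# A role is a set of privileges.
ROLES = {
    "analyst": {Privilege("db.sales", "SELECT")},
    "loader": {Privilege("db.sales", "SELECT"), Privilege("db.sales", "INSERT")},
}
# Roles are granted to groups; users belong to groups.
GROUP_ROLES = {"execs": {"analyst"}, "etl": {"loader"}}
USER_GROUPS = {"alice": {"execs"}, "ben": {"etl"}}


def is_allowed(user: str, obj: str, action: str) -> bool:
    """Authorization check: does any role of any of the user's groups grant it?"""
    wanted = Privilege(obj, action)
    for group in USER_GROUPS.get(user, set()):
        for role in GROUP_ROLES.get(group, set()):
            if wanted in ROLES.get(role, set()):
                return True
    return False


print(is_allowed("alice", "db.sales", "SELECT"))
print(is_allowed("alice", "db.sales", "INSERT"))
```

Authentication is deliberately left out of the sketch: it assumes the user name has already been verified by the underlying authentication system.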

HDFS ACLs overview:

The user identity mechanism is extrinsic to HDFS itself. There is no provision within HDFS for creating user identities, establishing groups, or processing user credentials.


If you’ve ever used POSIX ACLs on a Linux file system, then you already know how ACLs work in HDFS.  Best practice is to rely on traditional permission bits to implement most permission requirements, and define a smaller number of ACLs to augment the permission bits with a few exceptional rules.

To set and get file access control lists (ACLs), use the file system shell commands, setfacl and getfacl.




<!-- To list all ACLs for the file located at /user/hdfs/file -->
sudo -u hdfs hdfs dfs -getfacl /user/hdfs/file

-R: Use this option to recursively list ACLs for all files and directories.
sudo -u hdfs hdfs dfs -getfacl -R /user/hdfs/file


hdfs dfs -setfacl [-R] [-b|-k -m|-x <acl_spec> <path>]|[--set <acl_spec> <path>]

<path>: Path to the file or directory for which ACLs should be set.
-R: Use this option to recursively set ACLs for all files and directories.
-b: Remove all entries except the base ACL entries for user, group, and others.
-k: Remove the default ACL.
-m: Modify the ACL. New entries are added, and existing entries are retained.
-x: Remove only the ACL entries specified.
<acl_spec>: Comma-separated list of ACL entries.
--set: Use this option to completely replace the existing ACL for the path specified. 
       Previous ACL entries will no longer apply.
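
To make the shape of an <acl_spec> concrete, here is a small sketch that splits a spec into its entries and checks each one. It covers only the entry forms used in these notes (an optional `default:` prefix, a type, an optional name, and a three-character permission string); the names `AclEntry` and `parse_acl_spec` are hypothetical, and the real parser in Hadoop may accept more forms than this.

```python
import re
from typing import List, NamedTuple


class AclEntry(NamedTuple):
    default: bool  # True if the entry belongs to the default ACL
    kind: str      # "user", "group", "mask" or "other"
    name: str      # empty string for unnamed (base) entries
    perms: str     # e.g. "rw-"


# [default:]type:[name]:perms, with perms written as three characters.
_ENTRY = re.compile(
    r"^(default:)?(user|group|mask|other):([^:,]*):([r-][w-][x-])$"
)


def parse_acl_spec(spec: str) -> List[AclEntry]:
    """Split a comma-separated ACL spec into validated entries."""
    entries = []
    for part in spec.split(","):
        match = _ENTRY.match(part.strip())
        if match is None:
            raise ValueError(f"malformed ACL entry: {part!r}")
        entries.append(
            AclEntry(bool(match.group(1)), match.group(2),
                     match.group(3), match.group(4))
        )
    return entries


for entry in parse_acl_spec("user::rw-,user:hadoop:rw-,group::r--,other::r--"):
    print(entry)
```

Validating a spec like this before passing it to setfacl can catch typos early, but it is no substitute for the checks HDFS itself performs.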


<!-- To give user ben read & write permission over /user/hdfs/file -->
hdfs dfs -setfacl -m user:ben:rw- /user/hdfs/file

<!-- To remove user alice's ACL entry for /user/hdfs/file -->
hdfs dfs -setfacl -x user:alice /user/hdfs/file

<!-- To give user hadoop read & write access, and group or others read-only access -->
hdfs dfs -setfacl --set user::rw-,user:hadoop:rw-,group::r--,other::r-- /user/hdfs/file

For the following path, we will give the “execs” group read permission:

-rw-r-----   3 bruce sales          0 2014-03-04 16:31 /sales-data

hdfs dfs -setfacl -m group:execs:r-- /sales-data

  • Check results by running getfacl.

 hdfs dfs -getfacl /sales-data
# file: /sales-data
# owner: bruce
# group: sales

 Default ACLs define the ACL that newly created child files and directories receive automatically.

  • Set default ACL on parent directory.

> hdfs dfs -setfacl -m default:group:execs:r-x /monthly-sales-data

  • Make sub-directories.

> hdfs dfs -mkdir /monthly-sales-data/JAN
> hdfs dfs -mkdir /monthly-sales-data/FEB

  • Verify HDFS has automatically applied default ACL to sub-directories.

 hdfs dfs -getfacl -R /monthly-sales-data

# file: /monthly-sales-data/FEB
# owner: bruce
# group: sales

The default ACL is copied from the parent directory to the child file or child directory at time of creation.  Subsequent changes to the parent directory’s default ACL do not alter the ACLs of existing children.
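
The copy-at-creation behaviour can be illustrated with a toy model. This is a sketch only: the `Node` class is hypothetical, ACL entries are kept as plain strings, and the filtering by the requested mode that real HDFS applies is ignored.

```python
class Node:
    """Toy stand-in for an HDFS directory; models only default-ACL copying."""

    def __init__(self, name, default_acl=None):
        self.name = name
        self.default_acl = list(default_acl or [])  # inherited by new children
        self.acl = []                               # this node's access ACL
        self.children = {}

    def mkdir(self, name):
        # The parent's default ACL is copied once, at creation time. A new
        # sub-directory receives it as its access ACL and as its own default ACL.
        child = Node(name, default_acl=self.default_acl)
        child.acl = list(self.default_acl)
        self.children[name] = child
        return child


parent = Node("/monthly-sales-data", default_acl=["group:execs:r-x"])
jan = parent.mkdir("JAN")

# Changing the parent's default ACL later does not touch existing children.
parent.default_acl = ["group:execs:---"]
feb = parent.mkdir("FEB")

print(jan.acl)
print(feb.acl)
```

Here JAN keeps the entry it received when it was created, while FEB, created after the change, picks up the new default.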


For more information on using HDFS ACLs, see the HDFS Permissions Guide on the Apache website.


LDAP and AD concepts:


What are CN, OU, DC?

From RFC 2253 (UTF-8 String Representation of Distinguished Names):

String  X.500 AttributeType
CN      commonName
L       localityName
ST      stateOrProvinceName
O       organizationName
OU      organizationalUnitName
C       countryName
STREET  streetAddress
DC      domainComponent
UID     userid

What does a distinguished name string mean?

The string ("CN=Dev-India,OU=Distribution Groups,DC=gp,DC=gl,DC=google,DC=com") is a path in a hierarchical structure (the DIT, or Directory Information Tree) and should be read from right (root) to left (leaf).

It is a DN (Distinguished Name) (a series of comma-separated key/value pairs used to identify entries uniquely in the directory hierarchy). The DN is actually the entry’s fully qualified name.
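
Reading a DN from right to left can be sketched in a few lines. The helper `parse_dn` below is hypothetical and deliberately simplified: it splits on every comma, so it does not handle escaped commas or multi-valued RDNs, which the RFC allows.

```python
from typing import List, Tuple


def parse_dn(dn: str) -> List[Tuple[str, str]]:
    """Split a DN into (attribute, value) pairs, leaf first.

    Simplification: no support for escaped commas or multi-valued RDNs.
    """
    pairs = []
    for rdn in dn.split(","):
        attr, _, value = rdn.partition("=")
        pairs.append((attr.strip().upper(), value.strip()))
    return pairs


dn = "CN=Dev-India,OU=Distribution Groups,DC=gp,DC=gl,DC=google,DC=com"
pairs = parse_dn(dn)

# Walking the pairs in reverse goes from the root of the tree to the leaf.
for attr, value in reversed(pairs):
    print(attr, value)

# The DC components, joined with dots, give the DNS-style domain name.
domain = ".".join(value for attr, value in pairs if attr == "DC")
print(domain)
```

For anything beyond illustration, an LDAP library's own DN parser is the safer choice, since it deals with escaping properly.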

[Figure: example directory tree with some additional possible entries; the path for the DN above is highlighted in green.]

The following paths are all DNs; which one you use depends on what you want the query to return:

  • "DC=gp,DC=gl,DC=google,DC=com"
  • "OU=Distribution Groups,DC=gp,DC=gl,DC=google,DC=com"
  • "OU=People,DC=gp,DC=gl,DC=google,DC=com"
  • "OU=Groups,DC=gp,DC=gl,DC=google,DC=com"
  • "CN=QA-USA,OU=Distribution Groups,DC=gp,DC=gl,DC=google,DC=com"
  • "CN=Dev-India,OU=Distribution Groups,DC=gp,DC=gl,DC=google,DC=com"
  • "CN=Ted Owen,OU=People,DC=gp,DC=gl,DC=google,DC=com"
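
Several of the DNs listed above are simply suffixes of longer ones: dropping the leftmost component of a DN gives the DN of its parent. A minimal sketch, using a hypothetical helper `ancestor_dns` and the same simplification of splitting on every comma (so escaped commas are not supported):

```python
from typing import List


def ancestor_dns(dn: str) -> List[str]:
    """Return the DN itself followed by each of its ancestors, leaf to root."""
    rdns = [rdn.strip() for rdn in dn.split(",")]
    return [",".join(rdns[i:]) for i in range(len(rdns))]


chain = ancestor_dns("CN=Ted Owen,OU=People,DC=gp,DC=gl,DC=google,DC=com")
for ancestor in chain:
    print(ancestor)
```

Each suffix is a syntactically valid DN, but whether an entry actually exists at every level (for example at "DC=com") depends on how the directory is set up.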














