ViPR 2.1 - Configure a secure Hadoop cluster to use ViPR HDFS

Configure a secure Hadoop cluster to use ViPR HDFS

This article describes how to configure your existing Hadoop distribution to use the data in your ViPR storage infrastructure with ViPR HDFS. Use this step-by-step procedure if your Hadoop cluster is configured to use Kerberos authentication.

If your Hadoop cluster is configured for simple authentication, follow the steps described here. This procedure has not been qualified on the ECS Appliance.

To perform this integration procedure, you must have:

Plan the ViPR HDFS and secure Hadoop cluster integration

Use this list to verify that you have the information necessary to ensure a successful integration. It is a best practice to get your Hadoop cluster working with Kerberos before you configure ViPR HDFS.

In addition, verify that a Kerberos KDC is installed and configured to handle authentication of the Hadoop service principals. If you are using Active Directory to authenticate ViPR users, you must set up a cross-realm trust between the Kerberos realm and the ViPR user realm.

To integrate ViPR HDFS with your secure Hadoop cluster, complete the following tasks:

  1. Obtain the ViPR HDFS installation and support package
  2. Deploy the ViPR HDFS JAR
  3. Add a ViPR Service Principal
  4. Configure ViPR data nodes with the ViPR Service Principal
  5. Configure Java security on Hadoop nodes
  6. Edit core-site.xml
  7. Restart the HDFS and MapReduce services
  8. Confirm the services restart correctly
  9. Verify that you have file system access

Obtain the ViPR HDFS installation and support package

The ViPR HDFS JAR and HDFS support tools are provided in a ZIP file, vipr-hdfs-<version>.zip, that you can download from the ViPR support pages on support.EMC.com.

The ZIP file contains \client and \tools\bin directories. Before you unzip the file, create a directory to hold the ZIP contents (your unzip tool might do this for you), then extract the contents to that directory. After you extract the files, the directories will contain the following:

Deploy the ViPR HDFS JAR

Use this procedure to put the ViPR HDFS JAR on the Hadoop classpath of each ViPR client node; an example follows the steps.

Before you begin

Obtain the ViPR HDFS JAR for your Hadoop distribution from the EMC Support site for ViPR, as described in Obtain the ViPR HDFS installation and support package.

Procedure

  1. Log in to a ViPR client node.
  2. Run the classpath command to get the list of directories in the classpath:
    # hadoop classpath
  3. Copy the ViPR HDFS JAR to one of the folders listed by the classpath command that occurs after the /conf folder.
    Hadoop distribution   Classpath location (suggested)
    Pivotal HD            /usr/lib/gphd/hadoop/lib
    Cloudera              /usr/lib/hadoop/lib
    Apache                /opt/hadoop/hadoop/lib/native
  4. Repeat this procedure on each ViPR client node.
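For example, on a Cloudera node the copy might look like the following sketch, assuming the JAR was extracted from the ZIP's \client directory (the JAR file name is illustrative):

    # hadoop classpath
    # cp vipr-hdfs-<version>.jar /usr/lib/hadoop/lib/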

Add a ViPR Service Principal

To enable ViPR services to authenticate with a Kerberos server, you must define a service principal and a keytab file to store the ViPR service's authentication credentials.

Before you begin

Procedure

  1. Log in to the KDC as a user with admin privileges.
  2. Create a directory for the ViPR keytab file, for example:
    # mkdir -p /etc/keytabs
  3. Create a principal using the addprinc command.
    Use the -randkey parameter to set the key of the principal to a random value, and the -e parameter to specify the encryption type; for example:
    # kadmin
    kadmin: addprinc -randkey -e "des-cbc-crc:normal" principal@REALM.NAME
    
  4. Add the new principal to its keytab by using the ktadd command.
    The following example command adds the principal to a keytab file called vipr.keytab.
    kadmin: ktadd -k /etc/keytabs/vipr.keytab principal@REALM.NAME
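To confirm that the principal's key was written to the keytab, you can exit kadmin and list the keytab entries, for example:

    kadmin: quit
    # klist -kt /etc/keytabs/vipr.keytab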

Configure ViPR data nodes with the ViPR Service Principal

The ViPR service principal and its corresponding keytab file must reside on each ViPR data node. Use the setupViPRKerberosConfiguration.sh script to automate these steps.

Before you begin

You must have the following items before you can complete this procedure:

Procedure

  1. Log in to the KDC as a user with admin privileges.
  2. Using a text editor, create a text file that contains the IP addresses of each of the ViPR data nodes.
    The file can have any name. Put each IP address on its own line, for example:
    203.0.113.10
    203.0.113.11
    203.0.113.12
  3. Run the setupViPRKerberosConfiguration.sh script, passing in the following parameters:
    • -s for the service principal in principal@REALM format.
    • -o for the name and path of the IP address file.
    • -k for the path to the keytab file.
    The following example copies the ViPRServices principal and the ViPR keytab file to each IP address listed in ViPRNodes.txt:
    #  setupViPRKerberosConfiguration.sh -s ViPRServices/ACME.COM@ACME.COM -o ViPRNodes.txt -k /etc/keytabs/vipr.keytab
  4. Enter the root password for the data nodes when prompted.
    The script configures the ViPR data nodes specified in the text file.
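Optionally, you can verify that the principal and keytab work against the KDC by performing a keytab-based login, for example:

    # kinit -k -t /etc/keytabs/vipr.keytab ViPRServices/ACME.COM@ACME.COM
    # klist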

Configure Java security on Hadoop nodes

You must define how the ViPR HDFS service interacts with Kerberos by specifying the login module to use and how Kerberos should handle the login credentials. These settings enable the ViPR HDFS software to determine which user executed the kinit command. If you do not set these parameters, the ViPR HDFS software cannot determine who the current user is, and requests fail.

Before you begin

Determine if $JRE_HOME is set on your system. If it is not, you can determine where the JRE is installed by doing a directory listing of /etc/alternatives/jre, for example:
# ls -al /etc/alternatives/jre
lrwxrwxrwx. 1 root root 26 Oct  7 08:09 /etc/alternatives/jre -> /usr/lib/jvm/jre-1.5.0-gcj
In this example, JRE_HOME is /usr/lib/jvm/jre-1.5.0-gcj.
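If $JRE_HOME is not set, you can set it for the current shell session from that link target, for example:
# export JRE_HOME=/usr/lib/jvm/jre-1.5.0-gcj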

When you make changes to the java.security file, you must restart Hadoop services.

Procedure

  1. Create a file named /etc/hadoop/conf/vipr.jaas.config.
  2. Add the following content to /etc/hadoop/conf/vipr.jaas.config.
    hadoop-user-kerberos {
    com.sun.security.auth.module.Krb5LoginModule required useTicketCache="true"
    doNotPrompt="true";
    };
  3. Add a line to $JRE_HOME/lib/security/java.security to point to the vipr.jaas.config file you created:
    login.config.url.n=file://<filepath>/vipr.jaas.config
    For example:
    login.config.url.1=file:///etc/hadoop/conf/vipr.jaas.config
  4. Restart the Hadoop services.
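To confirm that the vipr.jaas.config entry was added to java.security, you can search for it, for example:

    # grep login.config.url $JRE_HOME/lib/security/java.security
    login.config.url.1=file:///etc/hadoop/conf/vipr.jaas.config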

Edit core-site.xml

Use this procedure to update core-site.xml with the properties that are required when using ViPR HDFS with a Hadoop cluster that uses Kerberos authentication mode.

Before you begin

Obtain the credentials that enable you to log in to ViPR nodes and modify core-site.xml.

See the core-site.xml property reference for more information about each property you need to set.

core-site.xml resides on each node in the Hadoop cluster, and you must modify the same properties in each instance. You can make the change on one node, and then use the secure copy command (scp) to copy the file to the other nodes in the cluster. As a best practice, back up core-site.xml before you start the configuration procedure. A consolidated example of the completed properties appears after this procedure.

The location of core-site.xml depends on the distribution you are using.

Procedure

  1. Log in to one of the HDFS nodes where core-site.xml is located.
  2. Make a backup copy of core-site.xml.
    cp core-site.xml core-site.backup
  3. Using the text editor of your choice, open core-site.xml for editing.
  4. Add the following properties and values to define the Java classes that implement the ViPR HDFS file system:
    <property>
    <name>fs.viprfs.impl</name>
    <value>com.emc.hadoop.fs.vipr.ViPRFileSystem</value>
    </property> 
    <property> 
    <name>fs.AbstractFileSystem.viprfs.impl</name>
    <value>com.emc.hadoop.fs.vipr.ViPRAbstractFileSystem</value>
    </property>
  5. Add the fs.vipr.installations property. In the following example, the value is set to Site1.
    <property>
      <name>fs.vipr.installations</name>
      <value>Site1</value>
    </property>
  6. Add the fs.vipr.installation.[installation_name].hosts property as a comma-separated list of ViPR data nodes or load balancer IP addresses. In the following example, the installation_name is set to Site1.
    <property>
      <name>fs.vipr.installation.Site1.hosts</name>
      <value>203.0.113.10,203.0.113.11,203.0.113.12</value>
    </property>
  7. Add the fs.vipr.installation.[installation_name].resolution property, and set it to one of the following values:
    Option    Description
    dynamic   Use when accessing ViPR data nodes directly without a load balancer.
    fixed     Use when accessing ViPR data nodes through a load balancer.
    In the following example, installation_name is set to Site1.
    <property>
      <name>fs.vipr.installation.Site1.resolution</name>
      <value>dynamic</value>
    </property>
    1. If you set fs.vipr.installation.[installation_name].resolution to dynamic, add the fs.vipr.installation.[installation_name].resolution.dynamic.time_to_live_ms property to specify how often to query ViPR for the list of active nodes.
      In the following example, installation_name is set to Site1.
      <property>
      <name>fs.vipr.installation.Site1.resolution.dynamic.time_to_live_ms</name>
      <value>900000</value>
      </property>
  8. Locate the fs.defaultFS property and modify the value to specify the ViPR file system URI, using the following format: viprfs://<bucket_name>.<namespace>.<installation_name>/
    Where
    • bucket_name: The name of the bucket that contains the data you want to use when you run Hadoop jobs. If running in simple authentication mode, the owner of the bucket must grant permission to Everybody. In the following example, the bucket_name is set to mybucket.
    • namespace: The tenant namespace where bucket_name resides. In the following example, the namespace is set to mynamespace.
    • installation_name: The value specified by the fs.vipr.installations property. In the following example, installation_name is set to Site1.
    <property>
      <name>fs.defaultFS</name>
      <value>viprfs://mybucket.mynamespace.Site1/</value>
    </property>
  9. Locate fs.permissions.umask-mode, and set the value to 027.
    In some configurations, this property might not already exist. If it does not, then add it.
    <property>
    		<name>fs.permissions.umask-mode</name>
    		<value>027</value>
    </property>
  10. Add the fs.viprfs.auth.anonymous_translation property; use it to specify whether to map anonymously owned objects to the current user so that the current user has permission to modify them.
    Option           Description
    NONE (default)   Do not map anonymously owned objects to the current user.
    CURRENT_USER     Map anonymously owned objects to the current Unix user.
    <property>
      <name>fs.viprfs.auth.anonymous_translation</name>
      <value>CURRENT_USER</value>
    </property>
  11. Add the fs.viprfs.auth.identity_translation property, and set it to CURRENT_USER_REALM, which maps to the realm of the user signed in via kinit.
    <property>
    		<name>fs.viprfs.auth.identity_translation</name>
    		<value>CURRENT_USER_REALM</value>
    </property>
  12. Add the viprfs.security.principal property. This property tells the KDC who the ViPR user is.
    <property>
    		<name>viprfs.security.principal</name>
    		<value>ViPRServices@<realm></value>
    </property>
  13. Add the fs.vipr.auth.service.users property. Use it to map the Unix name of each ViPR service you are using to a service principal.
    <property>
    		<name>fs.vipr.auth.service.users</name>
    		<value>mapred,hive,HBase</value>
    </property>
  14. Add one fs.vipr.auth.service.[user].principal property for each service user you entered in the previous step. This property maps each service user to its Kerberos principal.
    <property>
    		<name>fs.vipr.auth.service.mapred.principal</name>
    		<value>mapred/_HOST@ACME.COM</value>
    </property>
  15. Add the fs.vipr.auth.service.[user].keytab property for each service user; it specifies the path to that user's keytab file.
    <property>
    		<name>fs.vipr.auth.service.mapred.keytab</name>
    		<value>/etc/hadoop/conf/mapred/keytab</value>
    </property>
  16. Restart the HDFS and MapReduce services.
  17. Test the configuration by running the following command to get a directory listing:
    # kinit <service principal>
    # hdfs dfs -ls viprfs://mybucket.mynamespace.Site1/
    13/12/13 22:20:37 INFO vipr.ViPRFileSystem: Initialized ViPRFS for viprfs://mybucket.mynamespace.Site1/
    

    If you have set fs.defaultFS, you can use:

    # hdfs dfs -ls /
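For reference, putting the properties from steps 4 through 15 together yields a core-site.xml fragment like the following sketch. All values (bucket mybucket, namespace mynamespace, installation name Site1, realm ACME.COM, service users, and keytab paths) are the illustrative values used in the examples above; substitute the values for your environment.

  <property>
    <name>fs.viprfs.impl</name>
    <value>com.emc.hadoop.fs.vipr.ViPRFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.viprfs.impl</name>
    <value>com.emc.hadoop.fs.vipr.ViPRAbstractFileSystem</value>
  </property>
  <property>
    <name>fs.vipr.installations</name>
    <value>Site1</value>
  </property>
  <property>
    <name>fs.vipr.installation.Site1.hosts</name>
    <value>203.0.113.10,203.0.113.11,203.0.113.12</value>
  </property>
  <property>
    <name>fs.vipr.installation.Site1.resolution</name>
    <value>dynamic</value>
  </property>
  <property>
    <name>fs.vipr.installation.Site1.resolution.dynamic.time_to_live_ms</name>
    <value>900000</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>viprfs://mybucket.mynamespace.Site1/</value>
  </property>
  <property>
    <name>fs.permissions.umask-mode</name>
    <value>027</value>
  </property>
  <property>
    <name>fs.viprfs.auth.anonymous_translation</name>
    <value>CURRENT_USER</value>
  </property>
  <property>
    <name>fs.viprfs.auth.identity_translation</name>
    <value>CURRENT_USER_REALM</value>
  </property>
  <property>
    <name>viprfs.security.principal</name>
    <value>ViPRServices@ACME.COM</value>
  </property>
  <property>
    <name>fs.vipr.auth.service.users</name>
    <value>mapred,hive,HBase</value>
  </property>
  <property>
    <name>fs.vipr.auth.service.mapred.principal</name>
    <value>mapred/_HOST@ACME.COM</value>
  </property>
  <property>
    <name>fs.vipr.auth.service.mapred.keytab</name>
    <value>/etc/hadoop/conf/mapred/keytab</value>
  </property>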
