Configure a secure Hadoop cluster to use ViPR HDFS


This article describes how to configure your existing Hadoop distribution to use the data in your ViPR storage infrastructure with ViPR HDFS. Use this step-by-step procedure if your Hadoop distribution is configured to use Kerberos authentication.

This article applies to EMC ViPR 2.0.

If your Hadoop distribution is configured for simple authentication, follow the corresponding procedure for simple authentication instead.


Planning the ViPR HDFS and secure Hadoop cluster integration

Before you start the integration process, use this list to verify that you have the information necessary to ensure a successful integration. It is a best practice to get your Hadoop cluster working with Kerberos first; you can then integrate ViPR HDFS.

Verify that:

To enable the configuration to use Kerberos, complete the following procedures:

  1. Add a ViPR service principal.
  2. Configure the ViPR data nodes with the ViPR service principal.
  3. Edit core-site.xml.
  4. Restart the HDFS and MapReduce services.
  5. Confirm that the services restart correctly.
  6. Verify that you have file system access.

To perform this integration procedure, you must have a working knowledge of your Hadoop distribution, and credentials that enable you to log in to Hadoop nodes, modify Hadoop system files, and start and stop Hadoop services.


Obtaining the ViPR HDFS installation and support package

The ViPR HDFS JAR and HDFS support tools are provided in a ZIP file, vipr-hdfs-<version>.zip, that you can download from the ViPR support pages on support.EMC.com.

The ZIP file contains \client and \tools\bin directories. Before you unzip the file, create a directory to hold the ZIP contents (your unzip tool might do this for you), then extract the contents to that directory. After you extract the files, the directories will contain the following:


Adding a ViPR Service Principal

To enable ViPR services to authenticate with a Kerberos server, you must define a service principal and a keytab file to store the ViPR service's authentication credentials.

Before you begin

  • You must have access to the KDC server as a user with administrative permissions who can run kadmin.
  • The Kerberos realm must already exist.
  • The service principal name must follow the required naming convention: principal@REALM.NAME. Principal names are case-sensitive.

Procedure

  1. Log in to the KDC as a user with admin privileges.
  2. Create a directory for the ViPR keytab file, for example:

    # mkdir -p /etc/keytabs

  3. Create a principal using the addprinc command.
    Use the -randkey parameter to set the key of the principal to a random value, and the -e parameter to specify the encryption type; for example:

    # kadmin
    kadmin: addprinc -randkey -e "des-cbc-crc:normal" principal@REALM.NAME

  4. Add the new principal to its keytab by using the ktadd command.
    The following example command adds the principal to a keytab file called vipr.keytab.

    kadmin: ktadd -k /etc/keytabs/vipr.keytab principal@REALM.NAME


Configuring ViPR data nodes with the ViPR Service Principal

The ViPR service principal and its corresponding keytab file must reside on each ViPR data node. Use the setupViPRKerberosConfiguration.sh script to automate these steps.

Before you begin

You must have the following items before you can complete this procedure:
  • The setupViPRKerberosConfiguration.sh script. Obtain this script from the ViPR HDFS software package as described in Obtaining the ViPR HDFS installation and support package, and copy it to the KDC.
  • The ViPR principal and keytab files copied to the KDC.
  • The list of ViPR data services node IP addresses.

Procedure

  1. Log in to the KDC as a user with admin privileges.
  2. Using a text editor, create a text file that contains the IP addresses of each of the ViPR data nodes.
    The file can have any name. Put each IP address on its own line, for example:

    203.0.113.10
    203.0.113.11
    203.0.113.12

  3. Run the setupViPRKerberosConfiguration.sh script, passing in the following parameters:
    • -s: the service principal, in principal@REALM format.
    • -o: the name and path of the IP address file.
    • -k: the path to the keytab file.
    The following example copies the ViPRServices principal and the ViPR keytab file to each IP address listed in ViPRNodes.txt:

    # setupViPRKerberosConfiguration.sh -s ViPRServices/ACME.COM@ACME.COM -o ViPRNodes.txt -k /etc/keytabs/vipr.keytab

  4. Enter the root password for the data nodes when prompted.
    The script configures the ViPR data nodes specified in the text file.
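The node list file from step 2 can be created with a one-liner. The following is a sketch; the file name ViPRNodes.txt and the addresses are examples only:

```shell
# Write one ViPR data node IP address per line to ViPRNodes.txt.
# Replace these example addresses with your actual data node IPs.
printf '%s\n' 203.0.113.10 203.0.113.11 203.0.113.12 > ViPRNodes.txt

# Show the result.
cat ViPRNodes.txt
```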

Editing core-site.xml

Use this procedure to update core-site.xml with the properties that are required to use ViPR HDFS with a Hadoop cluster that is configured for Kerberos authentication.

Before you begin

Obtain the credentials that enable you to log in to Hadoop cluster nodes and modify core-site.xml.

See the core-site.xml property reference for more information about each property you need to set.

core-site.xml resides on each node in the Hadoop cluster, and you must modify the same properties in each instance. You can make the change in one node, and then use the secure copy command (scp) to copy the file to the other nodes in the cluster. As a best practice, back up core-site.xml before you start the configuration procedure.
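One way to distribute the edited file is a simple loop over a host list. This is a sketch: the file hadoop-nodes.txt, the hosts, and the destination path are hypothetical (the actual location of core-site.xml depends on your distribution), and the leading echo makes it a dry run so you can inspect the commands before executing them:

```shell
# Example host list: one Hadoop node per line (addresses are illustrative).
printf '%s\n' 203.0.113.20 203.0.113.21 > hadoop-nodes.txt

# Dry run: print the scp command for each node.
# Remove the leading "echo" to actually perform the copies.
# Adjust the destination path to wherever your distribution keeps core-site.xml.
while read -r host; do
  echo scp core-site.xml "root@${host}:/etc/hadoop/conf/core-site.xml"
done < hadoop-nodes.txt
```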

The location of core-site.xml depends on the distribution you are using.

Procedure

  1. Log in to one of the HDFS nodes where core-site.xml is located.
  2. Make a backup copy of core-site.xml.
    cp core-site.xml core-site.backup
  3. Using the text editor of your choice, open core-site.xml for editing.
  4. Add the following properties and values to define the Java classes that implement the ViPR HDFS file system:

    <property>
      <name>fs.viprfs.impl</name>
      <value>com.emc.hadoop.fs.vipr.ViPRFileSystem</value>
    </property>

    <property>
      <name>fs.AbstractFileSystem.viprfs.impl</name>
      <value>com.emc.hadoop.fs.vipr.ViPRAbstractFileSystem</value>
    </property>

  5. Add the fs.vipr.installations property. In the following example, the value is set to Site1.

    <property>
      <name>fs.vipr.installations</name>
      <value>Site1</value>
    </property>

  6. Add the fs.vipr.installation.[installation_name].hosts property as a comma-separated list of ViPR data nodes or load balancer IP addresses. In the following example, the installation_name is set to Site1.

    <property>
      <name>fs.vipr.installation.Site1.hosts</name>
      <value>203.0.113.10,203.0.113.11,203.0.113.12</value>
    </property>

  7. Add the fs.vipr.installation.[installation_name].resolution property, and set it to one of the following values:
    • dynamic: Use when accessing ViPR data nodes directly, without a load balancer.
    • fixed: Use when accessing ViPR data nodes through a load balancer.
    In the following example, installation_name is set to Site1.

    <property>
      <name>fs.vipr.installation.Site1.resolution</name>
      <value>dynamic</value>
    </property>

    1. If you set fs.vipr.installation.[installation_name].resolution to dynamic, add the fs.vipr.installation.[installation_name].resolution.dynamic.time_to_live_ms property to specify how often to query ViPR for the list of active nodes.
      In the following example, installation_name is set to Site1.

      <property>
        <name>fs.vipr.installation.Site1.resolution.dynamic.time_to_live_ms</name>
        <value>900000</value>
      </property>
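For reference, the example value of 900000 ms works out to 15 minutes; a quick sanity check of the arithmetic:

```shell
# Convert the example time_to_live value from milliseconds to minutes.
ttl_ms=900000
echo "$((ttl_ms / 60000)) minutes"
```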

  8. Locate the fs.defaultFS property and modify the value to specify the ViPR file system URI, using the following format: viprfs://<bucket_name>.<namespace>.<installation_name>/
    Where
    • bucket_name: The name of the bucket that contains the data you want to use when you run Hadoop jobs. If running in simple authentication mode, the owner of the bucket must grant permission to Everybody. In the following example, the bucket_name is set to mybucket.
    • namespace: The tenant namespace where bucket_name resides. In the following example, the namespace is set to mynamespace.
    • installation_name: The value specified by the fs.vipr.installations property. In the following example, installation_name is set to Site1.

    <property>
      <name>fs.defaultFS</name>
      <value>viprfs://mybucket.mynamespace.Site1/</value>
    </property>

  9. Locate fs.permissions.umask-mode, and set the value to 027.
    In some configurations, this property might not already exist. If it does not, then add it.

    <property>
      <name>fs.permissions.umask-mode</name>
      <value>027</value>
    </property>
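To see what a umask of 027 means in practice, the same masking rule can be demonstrated on a local POSIX file system (this is an illustration of the umask semantics, not a ViPR-specific command):

```shell
# Demonstrate the effect of umask 027 on newly created files and directories.
tmp=$(mktemp -d)
cd "$tmp"
umask 027

touch newfile   # 666 & ~027 -> 640 (rw-r-----)
mkdir newdir    # 777 & ~027 -> 750 (rwxr-x---)

stat -c '%a %n' newfile newdir
```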

  10. Add the fs.viprfs.auth.anonymous_translation property; use it to specify whether anonymously owned objects are mapped to the current user, so that the current user has permission to modify them.
    • NONE (default): Do not map anonymously owned objects to the current user.
    • CURRENT_USER: Map anonymously owned objects to the current Unix user.

    <property>
      <name>fs.viprfs.auth.anonymous_translation</name>
      <value>CURRENT_USER</value>
    </property>

  11. Add the fs.viprfs.auth.identity_translation property, and set it to CURRENT_USER_REALM, which maps to the realm of the user signed in via kinit.

    <property>
      <name>fs.viprfs.auth.identity_translation</name>
      <value>CURRENT_USER_REALM</value>
    </property>

  12. Add the viprfs.security.principal property. This property tells the KDC who the ViPR user is.

    <property>
      <name>viprfs.security.principal</name>
      <value>ViPRServices@<realm></value>
    </property>

  13. Add the fs.vipr.auth.service.users property. Use it to map the Unix name of each ViPR service you are using to a service principal.

    <property>
      <name>fs.vipr.auth.service.users</name>
      <value>mapred,hive,hbase</value>
    </property>

  14. Add one fs.vipr.auth.service.[user].principal property for each service user you entered in the previous step. This property maps each service user to its Kerberos principal.

    <property>
      <name>fs.vipr.auth.service.mapred.principal</name>
      <value>mapred/_HOST@ACME.COM</value>
    </property>

  15. Add the fs.vipr.auth.service.[user].keytab property.

    <property>
      <name>fs.vipr.auth.service.mapred.keytab</name>
      <value>/etc/hadoop/conf/mapred.keytab</value>
    </property>
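By analogy, if hive is listed in fs.vipr.auth.service.users, you would add the corresponding hive properties as well. The realm and keytab path below are illustrative assumptions; substitute the values from your own Kerberos setup:

```xml
<!-- Hypothetical values: adjust the realm and keytab path to your environment -->
<property>
  <name>fs.vipr.auth.service.hive.principal</name>
  <value>hive/_HOST@ACME.COM</value>
</property>
<property>
  <name>fs.vipr.auth.service.hive.keytab</name>
  <value>/etc/hadoop/conf/hive.keytab</value>
</property>
```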

  16. Restart the HDFS and MapReduce services.
  17. Test the configuration by running the following commands to get a directory listing:

    # kinit <service principal>

    # hdfs dfs -ls viprfs://mybucket.mynamespace.Site1/

    13/12/13 22:20:37 INFO vipr.ViPRFileSystem: Initialized ViPRFS for viprfs://mybucket.mynamespace.Site1/

    If you have set fs.defaultFS, you can use:

    # hdfs dfs -ls /