ViPR 2.1 - Configure ViPR HDFS

Configure ViPR HDFS

This article describes how to configure your existing Hadoop distribution to use the data in your ViPR storage infrastructure with ViPR HDFS. Use this step-by-step procedure if your Hadoop distribution is configured to use simple authentication and not Kerberos authentication.

If your Hadoop distribution is configured for Kerberos authentication, follow the steps described here.

To perform this integration procedure, you must have:


Plan the ViPR HDFS and Hadoop integration

Use this list to verify that you have the information necessary to ensure a successful integration.

To integrate ViPR HDFS with your Hadoop cluster, perform the following tasks:

  1. Obtain the ViPR HDFS installation and support package
  2. Deploy the ViPR HDFS JAR
  3. If using Pivotal HAWQ, replace the Pivotal HDFS lib with the ViPR HDFS lib
  4. Edit core-site.xml
  5. Restart the following services:
    • HDFS
    • MapReduce
    • Pivotal HAWQ (only if using this service)
  6. Confirm the services restart correctly.
  7. Verify that you have file system access.

When using HBase, perform these additional tasks:

  1. Edit HBASE hbase-site.xml.
  2. Restart the HBase services.

Obtain the ViPR HDFS installation and support package

The ViPR HDFS JAR and HDFS support tools are provided in a ZIP file, vipr-hdfs-<version>.zip, that you can download from the ViPR support pages on support.EMC.com.

The ZIP file contains \client and \tools\bin directories. Before you unzip the file, create a directory to hold the ZIP contents (your unzip tool might do this for you), then extract the contents to that directory. After you extract the files, the directories contain the following:


Deploy the ViPR HDFS JAR

Use this procedure to put the ViPR HDFS JAR on the classpath of each client node in the Hadoop cluster.

Before you begin

Obtain the ViPR HDFS JAR for your ViPR distribution from the EMC Support site for ViPR as described in Obtain the ViPR HDFS installation and support package.

Procedure

  1. Log in to a ViPR client node.
  2. Run the classpath command to get the list of directories in the classpath:
    # hadoop classpath
  3. Copy the ViPR HDFS JAR to one of the folders listed by the classpath command that occurs after the /conf folder.
    ViPR distribution Classpath location (suggested)
    Pivotal HD /usr/lib/gphd/hadoop/lib
    Cloudera /usr/lib/hadoop/lib
    Apache /opt/hadoop/hadoop/lib/native
  4. Repeat this procedure on each ViPR client node.
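As a rough illustration of choosing the destination, the sketch below picks the first classpath entry after the /conf entry and strips any trailing /* glob. The sample classpath string is an assumption for the sketch; on a real node you would set CP="$(hadoop classpath)" and then copy the JAR into "$target".

```shell
# Sample classpath for illustration only; on a real node use:
#   CP="$(hadoop classpath)"
CP="/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*"

target=""
seen_conf=0
set -f                       # disable globbing so the /* entries stay literal
old_ifs=$IFS; IFS=':'
for entry in $CP; do
    case $entry in
        */conf) seen_conf=1 ;;                       # found the conf folder
        *) if [ "$seen_conf" -eq 1 ] && [ -z "$target" ]; then
               target=${entry%/\*}                   # strip a trailing /* glob
           fi ;;
    esac
done
IFS=$old_ifs
set +f

echo "$target"
# On a real node you would then run:
#   cp vipr-hdfs-<version>.jar "$target"
```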

Configure Pivotal HAWQ

To use the Pivotal HAWQ service with ViPR HDFS, you must replace Pivotal's HDFS lib with the ViPR HDFS lib.

Every time you reconfigure, deploy, or upgrade using icm_client, verify that the libhdfs3 symlinks still point to libvipr-<version>.so. The HAWQ configuration file, hdfs-client.xml, is not used by ViPR HDFS. When the system is configured to use ViPR HDFS, the HDFS NameNode fails to start because fs.defaultFS points to viprfs://vipr-endpoint/.

Procedure

  1. Copy the libvipr-<version>.so you extracted from the ViPR HDFS ZIP file to a local directory on each HAWQ Master and Segment node in the Pivotal cluster.
    For example: /usr/local/vipr/libvipr-<version>.so
  2. Update the libhdfs3 symlinks in each Pivotal master and segment node's HAWQ installation directory (<HAWQ_INSTALL_DIR>).
    For example:
    <HAWQ_INSTALL_DIR>/lib/libhdfs3.so -> /usr/local/vipr/libvipr-<version>.so
             unlink <HAWQ_INSTALL_DIR>/lib/libhdfs3.so
             ln -s /usr/local/vipr/libvipr-<version>.so  <HAWQ_INSTALL_DIR>/lib/libhdfs3.so
    
    <HAWQ_INSTALL_DIR>/lib/libhdfs3.so.1 -> /usr/local/vipr/libvipr-<version>.so
             unlink <HAWQ_INSTALL_DIR>/lib/libhdfs3.so.1
             ln -s /usr/local/vipr/libvipr-<version>.so  <HAWQ_INSTALL_DIR>/lib/libhdfs3.so.1
    
  3. Update the symlink on each node in the Pivotal cluster.
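The unlink/ln pair from step 2 can be driven by a loop over both symlink names. In the sketch below, HAWQ_INSTALL_DIR and the library path are placeholders, and a scratch layout is created first so the commands can be tried safely; on a real node, skip the setup lines and set the variables to the actual paths.

```shell
# Placeholder paths for the sketch; on a real node use the actual
# <HAWQ_INSTALL_DIR> and /usr/local/vipr/libvipr-<version>.so.
HAWQ_INSTALL_DIR=/tmp/hawq-demo
VIPR_LIB=/tmp/hawq-demo/libvipr-demo.so

# Scratch setup so the sketch is self-contained; real nodes already
# have these files in place.
mkdir -p "$HAWQ_INSTALL_DIR/lib"
touch "$VIPR_LIB" "$HAWQ_INSTALL_DIR/lib/libhdfs3.so" "$HAWQ_INSTALL_DIR/lib/libhdfs3.so.1"

# The symlink swap itself: remove each existing libhdfs3 link and point
# it at the ViPR library.
for link in libhdfs3.so libhdfs3.so.1; do
    unlink "$HAWQ_INSTALL_DIR/lib/$link"
    ln -s "$VIPR_LIB" "$HAWQ_INSTALL_DIR/lib/$link"
done

# Both symlinks should now resolve to the ViPR library.
ls -l "$HAWQ_INSTALL_DIR/lib/"
```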

Edit Hadoop core-site.xml file

Use this procedure to update core-site.xml with the properties needed to integrate ViPR HDFS with a Hadoop cluster that uses simple authentication mode.

Before you begin

You must have a set of user credentials that enable you to log in to Hadoop nodes and modify core-site.xml.

The location of core-site.xml depends on the distribution you are using.

core-site.xml resides on each node in the Hadoop cluster. You must modify the same properties in each instance. You can make the change on one node, and then use the secure copy command (scp) to copy the file to the other nodes in the cluster.
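That scp distribution can be scripted as a simple loop. The hostnames and conf path below are hypothetical, and the loop echoes the commands as a dry run; drop the echo to perform the copies.

```shell
# Hypothetical node names and conf path; adjust for your cluster and
# distribution (the location of core-site.xml varies).
NODES="hadoop-node2 hadoop-node3 hadoop-node4"
CONF=/etc/hadoop/conf/core-site.xml

# Dry run: print the scp command for each node. Remove 'echo' to copy.
for node in $NODES; do
    echo scp "$CONF" "$node:$CONF"
done
```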

See core_site.xml property reference for more information about each property you need to set.

Procedure

  1. Log in to one of the HDFS nodes where core-site.xml is located.
  2. Make a backup copy of core-site.xml.
    cp core-site.xml core-site.backup
  3. Using the text editor of your choice, open core-site.xml for editing.
  4. Add the following properties and values to define the Java classes that implement the ViPR HDFS file system:
    <property>
    <name>fs.viprfs.impl</name>
    <value>com.emc.hadoop.fs.vipr.ViPRFileSystem</value>
    </property> 
    <property> 
    <name>fs.AbstractFileSystem.viprfs.impl</name>
    <value>com.emc.hadoop.fs.vipr.ViPRAbstractFileSystem</value>
    </property>
  5. Add the fs.vipr.installations property. In the following example, the value is set to Site1.
    <property>
      <name>fs.vipr.installations</name>
      <value>Site1</value>
    </property>
  6. Add the fs.vipr.installation.[installation_name].hosts property as a comma-separated list of ViPR data nodes or load balancer IP addresses. In the following example, the installation_name is set to Site1.
    <property>
      <name>fs.vipr.installation.Site1.hosts</name>
      <value>203.0.113.10,203.0.113.11,203.0.113.12</value>
    </property>
  7. Add the fs.vipr.installation.[installation_name].resolution property, and set it to one of the following values:
    Option Description
    dynamic Use when accessing ViPR data nodes directly without a load balancer.
    fixed Use when accessing ViPR data nodes through a load balancer.
    In the following example, installation_name is set to Site1.
    <property>
      <name>fs.vipr.installation.Site1.resolution</name>
      <value>dynamic</value>
    </property>
    1. If you set fs.vipr.installation.[installation_name].resolution to dynamic, add the fs.vipr.installation.[installation_name].resolution.dynamic.time_to_live_ms property to specify how often to query ViPR for the list of active nodes.
      In the following example, installation_name is set to Site1.
      <property>
      <name>fs.vipr.installation.Site1.resolution.dynamic.time_to_live_ms</name>
      <value>900000</value>
      </property>
  8. Locate the fs.defaultFS property and modify the value to specify the ViPR file system URI using the following format: viprfs://<bucket_name>.<namespace>.<installation_name>/.
    Where
    • bucket_name: The name of the bucket that contains the data you want to use when you run Hadoop jobs. If running in simple authentication mode, the owner of the bucket must grant permission to Everybody. In the following example, the bucket_name is set to mybucket.
    • namespace: The tenant namespace where bucket_name resides. In the following example, the namespace is set to mynamespace.
    • installation_name: The value specified by the fs.vipr.installations property. In the following example, installation_name is set to Site1.
    <property>
      <name>fs.defaultFS</name>
      <value>viprfs://mybucket.mynamespace.Site1/</value>
    </property>
  9. Locate fs.permissions.umask-mode, and set the value to 022.
    In some configurations, this property might not already exist. If it does not, then add it.
    <property>
      <name>fs.permissions.umask-mode</name>
      <value>022</value>
    </property>
  10. Add the fs.viprfs.auth.anonymous_translation property; use it to specify whether to map anonymously owned objects to the current user so that the current user has permission to modify them.
    Option Description
    NONE (default) Do not map anonymously owned objects to the current user.
    CURRENT_USER Map anonymously owned objects to the current Unix user.
    <property>
      <name>fs.viprfs.auth.anonymous_translation</name>
      <value>CURRENT_USER</value>
    </property>
  11. Add the fs.viprfs.auth.identity_translation property. It provides a way to assign users to a realm when Kerberos is not present.
    Option Description
    FIXED_REALM When specified, ViPR HDFS gets the realm name from the value of the fs.vipr.auth.realm property.
    NONE (default) ViPR HDFS does no realm translation.
    <property>
      <name>fs.viprfs.auth.identity_translation</name>
      <value>NONE</value>
    </property>
  12. If you set the fs.viprfs.auth.identity_translation property to FIXED_REALM, add the fs.viprfs.auth.realm property.
  13. If you want to use the Pivotal HAWQ service, add the hawq.vipr.endpoint property. Specify the value using the following format: bucket_name.namespace.installation_name.
    Where:
    • bucket_name: The name of the bucket that contains the data you want to use when you run Hadoop jobs. If running in simple authentication mode, the owner of the bucket must grant permission to Everybody. In the following example, bucket_name is set to mybucket.
    • namespace: The tenant namespace where bucket_name resides. In the following example, the namespace is set to mynamespace.
    • installation_name: The value specified by the fs.vipr.installations property. In the following example, the installation_name is set to Site1.
    You must be running a version of ViPR that supports Pivotal HAWQ. For more information, see the ViPR Support Matrix.
    <property>
    <name>hawq.vipr.endpoint</name>
    <value>mybucket.mynamespace.Site1</value>
    </property> 
  14. Save core-site.xml.
  15. Update the core-site.xml on the required nodes in your Hadoop cluster.
  16. If you are using a Cloudera distribution, use Cloudera Manager to update the core-site.xml safety valve with the same set of properties and values.
  17. Restart the Hadoop services.
    Hadoop Distribution Commands
    Pivotal HD Compute Master:

    # service hadoop-yarn-resourcemanager restart

    Data Nodes:

    # service hadoop-hdfs-datanode restart

    # service hadoop-yarn-nodemanager restart

    NameNode:

    # service hadoop-hdfs-namenode restart

    If you are using the Pivotal HAWQ service, restart it by running the following commands:

    # service hawq stop

    # service hawq start

    When you configure the Pivotal Hadoop cluster to use ViPR HDFS as the default file system (specified by fs.defaultFS in core-site.xml), you cannot use the icm_client cluster start/stop functionality; instead, you must start all cluster services (except HDFS) individually. For example:

    icm_client start -s yarn
    icm_client start -s zookeeper 
    and so on.
    Cloudera Use Cloudera Manager to restart the HDFS and MapReduce services
    Apache # stop-all.sh

    # start-all.sh

  18. Test the configuration by running the following command to get a directory listing:
    # hdfs dfs -ls viprfs://mybucket.mynamespace.Site1/
    13/12/13 22:20:37 INFO vipr.ViPRFileSystem: Initialized ViPRFS for viprfs://mybucket.mynamespace.Site1/
    
    If you have set fs.defaultFS, you can use:

    # hdfs dfs -ls /
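Taken together, steps 4 through 13 produce a core-site.xml fragment like the following. The values repeat the Site1/mybucket examples used above; substitute your own bucket, namespace, hosts, and resolution settings.

```xml
<property>
  <name>fs.viprfs.impl</name>
  <value>com.emc.hadoop.fs.vipr.ViPRFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.viprfs.impl</name>
  <value>com.emc.hadoop.fs.vipr.ViPRAbstractFileSystem</value>
</property>
<property>
  <name>fs.vipr.installations</name>
  <value>Site1</value>
</property>
<property>
  <name>fs.vipr.installation.Site1.hosts</name>
  <value>203.0.113.10,203.0.113.11,203.0.113.12</value>
</property>
<property>
  <name>fs.vipr.installation.Site1.resolution</name>
  <value>dynamic</value>
</property>
<property>
  <name>fs.vipr.installation.Site1.resolution.dynamic.time_to_live_ms</name>
  <value>900000</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>viprfs://mybucket.mynamespace.Site1/</value>
</property>
<property>
  <name>fs.permissions.umask-mode</name>
  <value>022</value>
</property>
<property>
  <name>fs.viprfs.auth.anonymous_translation</name>
  <value>CURRENT_USER</value>
</property>
<property>
  <name>fs.viprfs.auth.identity_translation</name>
  <value>NONE</value>
</property>
```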


Edit HBASE hbase-site.xml

When you use HBase with ViPR HDFS, you must set the hbase.rootdir property in hbase-site.xml to the same value as the core-site.xml fs.defaultFS property.

hbase-site.xml is located in one of the following locations:

Procedure

  1. Open hbase-site.xml.
  2. Set the hbase.rootdir property to the same value as fs.defaultFS, adding /hbase as the suffix.
  3. Save your changes.
    1. On Cloudera, add the hbase.rootdir property to the HBase Service Configuration Safety Valve for hbase-site.xml.
  4. Restart the services for your distribution.
    Hadoop Distribution Commands
    Pivotal HD Run this command on the HBase master node:
    # service hbase-master restart

    Run this command on each HBase region server:

    # service hbase-regionserver restart
    Cloudera Use Cloudera manager to restart the HBase service.
    Apache
    # bin/start-hbase.sh

hbase.rootdir entry

<property>
  <name>hbase.rootdir</name>
  <value>viprfs://testbucket.s3.testsite/hbase</value>
</property>
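The value is simply fs.defaultFS with /hbase appended, as the sketch below shows. The fs.defaultFS value used here is the mybucket/mynamespace example from the core-site.xml section.

```shell
# Derive hbase.rootdir from fs.defaultFS: same URI plus /hbase.
DEFAULT_FS="viprfs://mybucket.mynamespace.Site1/"
HBASE_ROOTDIR="${DEFAULT_FS%/}/hbase"   # strip trailing slash, append /hbase
echo "$HBASE_ROOTDIR"                   # prints viprfs://mybucket.mynamespace.Site1/hbase
```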
