Create Buckets to Support ViPR Data Services


Introduction

Containers are required to store object data and to provide storage for HDFS data. In S3 these containers are called buckets, and this term has been adopted as the general term in ViPR. In Atmos, the equivalent of a bucket is a subtenant; in Swift, the equivalent is a container. This article provides information on how to create buckets to support the main use cases.

This article applies to EMC ViPR 2.0.

Buckets can be created for each object protocol using its API, and all bucket types can be configured to support HDFS functionality. An S3, Atmos, Swift, or CAS bucket configured for HDFS access can be read and written using its object protocol and using the HDFS protocol.

Where you are writing clients that use the ViPR object APIs, you create buckets to hold object data using the programming SDKs. Where you want to create buckets from an environment other than a development environment, the ViPR UI provides support for creating S3 buckets. In addition, the ViPRAdminTool.sh script enables HDFS buckets to be created from a Hadoop cluster without the user needing to interact with ViPR.

In ViPR, a bucket is associated with a project, which is used for metering purposes. All data in a bucket is accounted against the project associated with that bucket.


Creating buckets using the object APIs

When creating buckets using the object APIs or using tools that call the object APIs, there are a number of headers that determine the behavior.

The following x-emc headers are provided:

x-emc-vpool
Determines the data stores which will be used to store the objects associated with this bucket. If you do not specify a virtual pool using the x-emc-vpool header, ViPR will choose the default virtual pool associated with the namespace.
x-emc-file-system-access-enabled
Configures the bucket for HDFS access. The header must not conflict with the interface that is being used. That is, a createBucket request from HDFS cannot specify x-emc-file-system-access-enabled=false.
x-emc-namespace
Specifies the namespace to be used for this bucket. If the namespace is not specified using the S3 convention of a host/path style request, it can be specified using the x-emc-namespace header. If the namespace is not specified using this header, the namespace associated with the user is used.
x-emc-project-id
Specifies the project ID to associate with the new bucket. This is relevant when reporting metering data for object usage. If this header is not present, the default project ID defined for the associated tenant is used.
An example of using the s3curl tool to create a bucket is provided in the next topic.

Create a bucket using the S3 API (with s3curl)

You can use the S3 API to create a bucket in an object virtual pool. Because ViPR uses custom headers (x-emc), the string to sign must be constructed to include these headers. In this procedure the s3curl tool is used; there are also a number of programmatic clients you can use, for example, the S3 Java client.

Before you begin

  • Data Services must have at least one object virtual pool.
  • Perl must be installed on the Linux machine on which you will run s3curl.
  • curl must be installed, along with the s3curl module, which acts as a wrapper around curl.

To use s3curl with x-emc headers, minor modifications must be made to the s3curl script. These modifications are described in the procedure.

Procedure

  1. Obtain a secret key for the user who will create the bucket.
    Refer to the article: Generate a Secret Key to Access ViPR Object Storage for details.
  2. Obtain the identity of the data services virtual pool in which you want the bucket to be created.
    You can obtain the virtual pool by using the ViPR REST API:

    GET https://<ViPR Controller IP Address>:4443/vdc/data-service/vpools

    The response provides the name and identity of all data services virtual pools. For example:

    <data_service_vpools>
      <data_service_vpool>
        <creation_time>1403519186936</creation_time>
        <id>urn:storageos:ReplicationGroupInfo:8fc8e19b-edf0-4e81-bee8-79accc867f64:global</id>
        <inactive>false</inactive>
        <tags/>
        <description>IsilonVPool1</description>
        <name>IsilonVPool1</name>
        <varrayMappings>
          <name>urn:storageos:VirtualDataCenter:1de0bbc2-907c-4ede-b133-f5331e03e6fa:vdc1</name>
          <value>urn:storageos:VirtualArray:793757ab-ad51-4038-b80a-682e124eb25e:vdc1</value>
        </varrayMappings>
      </data_service_vpool>
    </data_service_vpools>

    Here the ID is urn:storageos:ReplicationGroupInfo:8fc8e19b-edf0-4e81-bee8-79accc867f64:global.
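    As a sketch, the same query can be issued with curl. This assumes the standard ViPR REST API authentication flow, in which a token is obtained from the /login endpoint and passed in the X-SDS-AUTH-TOKEN header; adjust the credentials and addresses for your deployment.

    # Obtain an auth token; /login returns it in the X-SDS-AUTH-TOKEN response header
    TOKEN=$(curl -sik -u root:<password> "https://<ViPR Controller IP Address>:4443/login" \
        | grep -i '^X-SDS-AUTH-TOKEN' | awk '{print $2}' | tr -d '\r')

    # List the data services virtual pools
    curl -sk -H "X-SDS-AUTH-TOKEN: $TOKEN" \
        "https://<ViPR Controller IP Address>:4443/vdc/data-service/vpools"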

  3. Set up s3curl by creating a .s3curl file in which to enter the user credentials.
    The .s3curl file must have permissions 0600 (rw-/---/---) when s3curl.pl is run.
    In the example below, the profile "my_profile" is used to reference the user credentials for the "user@yourco.com" account, and "root_profile" references the credentials for the root account.

    %awsSecretAccessKeys = (
        my_profile => {
            id  => 'user@yourco.com',
            key => 'sZRCTZyk93IWukHEGQ3evPJEvPUq4ASL8Nre0awN'
        },
        root_profile => {
            id  => 'root',
            key => 'sZRCTZyk93IWukHEGQ3evPJEvPUq4ASL8Nre0awN'
        },
    );
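    The credentials file lives in the home directory of the user running s3curl.pl; a quick way to set the required permissions:

    chmod 600 ~/.s3curl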

  4. Add the endpoint that you want to use s3curl against to the .s3curl file.
    This will be the address of your data node or the load balancer that sits in front of your data node.
    For example:

    push @endpoints, (
        '203.0.113.10',
        'lglw3183.lss.emc.com',
    );

  5. Modify the s3curl.pl script so that it includes the x-emc headers in its "string to sign".
    Replace the following lines:

    elsif ($header =~ /^(?'header'[Xx]-(([Aa][Mm][Zz])|([Ee][Mm][Cc]))-[^:]+): *(?'val'.+)$/) {
        my $name = lc $+{header};
        my $value = $+{val};

    with:

    elsif ($header =~ /^([Xx]-(?:(?:[Aa][Mm][Zz])|(?:[Ee][Mm][Cc]))-[^:]+): *(.+)$/) {
        my $name = lc $1;
        my $value = $2;

  6. Create the bucket using s3curl.pl.
    Specify the following:
    • Profile of the user.
    • Identity of the virtual pool in which to create the bucket (<vpool_id>). This must be set using the x-emc-vpool header.
    • x-emc-file-system-access-enabled header to enable file system access.
    • Name of the bucket (<BucketName>).
    The fully specified command looks like this:

    ./s3curl.pl --debug --id=my_profile --acl public-read-write --createBucket -- -H 'x-emc-file-system-access-enabled:true' -H 'x-emc-vpool:<vpool_id>' http://<DataNodeIP>:9020/<BucketName>

    Note that the --acl public-read-write argument is optional, but is needed if you plan to access the bucket in an anonymous environment: for example, if you intend to access the bucket as HDFS from an environment that is not secured using Kerberos.
    If successful (with --debug on) you should see output similar to the following:

    s3curl: Found the url: host=203.0.113.10; port=9020; uri=/S3B4; query=;
    s3curl: ordinary endpoint signing case
    s3curl: StringToSign='PUT\n\n\nThu, 12 Dec 2013 07:58:39 +0000\nx-amz-acl:public-read-write\nx-emc-file-system-access-enabled:true\nx-emc-vpool:urn:storageos:ReplicationGroupInfo:8fc8e19b-edf0-4e81-bee8-79accc867f64:global:\n/S3B4'
    s3curl: exec curl -H Date: Thu, 12 Dec 2013 07:58:39 +0000 -H Authorization: AWS root:AiTcfMDhsi6iSq2rIbHEZon0WNo= -H x-amz-acl: public-read-write -L -H content-type: --data-binary -X PUT -H x-emc-file-system-access-enabled:true -H x-emc-vpool:urn:storageos:ObjectStore:e0506a04-340b-4e78-a694-4c389ce14dc8: http://203.0.113.10:9020/S3B4

    You can then list the buckets through the S3 interface with:

    ./s3curl.pl --debug --id=my_profile http://<DataNodeIP>:9020/


Creating buckets using the UI and controller REST API

The controller REST API provides the ability to create buckets. This functionality is leveraged by the Create Bucket service provided in the ViPR UI service catalog. Creating buckets from the ViPR UI is useful where you want a simple bucket creation mechanism to support object ingestion, or where you want to create a bucket for use with Hadoop HDFS (with simple security).

The following topics describe the use of the ViPR UI and the controller REST API to create buckets.

Create bucket for Data Services from the UI

The ViPR UI provides a service for creating buckets within a specified object virtual pool.

Before you begin

  • Data Services must have been configured, and at least one object virtual pool must have been created with a data store assigned in which to create the bucket.
  • In this procedure a bucket must be assigned to a project. For a user to be able to assign a bucket to the default project, a default project must have been configured by a Tenant Administrator.

Procedure

  1. Select User > Service Catalog > Data Services > Create Bucket for Data Services.
  2. Enter a name for the bucket that will be created.
  3. Select the virtual pool in which to create the bucket.
  4. Select the project to which the bucket should be assigned, or select the default project.
    You are offered the projects to which you have been assigned, as well as the default project selection.
  5. If you want to enable the bucket to support HDFS access, select the Enable File System Support checkbox.
  6. Select Order to start creation.

Create a bucket using the ViPR Controller REST API

The ViPR Controller REST API provides methods for performing object storage configuration, including creating buckets.

Before you begin

  • ViPR object storage must have been configured using data services nodes or commodity nodes.
  • Object storage must have at least one object virtual pool which is backed by one or more virtual arrays.

A bucket must be created in an object virtual pool and must be associated with a project.

Procedure

  1. Get the UID for the project with which the bucket will be associated.
    The UIDs for all projects can be obtained using GET /projects/bulk. For example:

    GET https://<ViPR IP address>:4443/projects/bulk

    The response returns the identity of each project; in this case there is only one.

    <ids>
      <id>urn:storageos:Project:489e8831-2a9b-4a59-9dc6-1f9660a07f5f:global</id>
    </ids>

    You can then obtain the details of each project using GET /projects/{id}.
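    For example, a sketch with curl, assuming an authentication token previously obtained from the ViPR /login endpoint and passed in the X-SDS-AUTH-TOKEN header:

    # List all project IDs, then fetch the details of the project returned above
    curl -sk -H "X-SDS-AUTH-TOKEN: $TOKEN" "https://<ViPR IP address>:4443/projects/bulk"
    curl -sk -H "X-SDS-AUTH-TOKEN: $TOKEN" \
        "https://<ViPR IP address>:4443/projects/urn:storageos:Project:489e8831-2a9b-4a59-9dc6-1f9660a07f5f:global"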
  2. Get the UID of the object virtual pool in which you want to create the bucket.
    The UIDs for all object virtual pools can be obtained using GET /vdc/data-service/vpools. For example:

    GET https://<ViPR IP address>:4443/vdc/data-service/vpools

    In this example, one pool is returned which has a single varrayMapping and so will have the data stores contributed by the single virtual array specified in that mapping:

    <data_service_vpools>
      <data_service_vpool>
        <creation_time>1403531485559</creation_time>
        <id>urn:storageos:ReplicationGroupInfo:0f360f6b-98ca-40b6-8b84-bac0f9934c0e:global</id>
        <inactive>false</inactive>
        <tags/>
        <description>Vpool with single varray mapping</description>
        <name>varray567</name>
        <varrayMappings>
          <name>urn:storageos:VirtualDataCenter:1de0bbc2-907c-4ede-b133-f5331e03e6fa:vdc1</name>
          <value>urn:storageos:VirtualArray:465a3ef1-5ac5-46fc-8e1a-6fb4aefce7fd:vdc1</value>
        </varrayMappings>
      </data_service_vpool>
    </data_service_vpools>

  3. Create the bucket using POST /object/bucket and specify the protocol that will be allowed to access the bucket: S3, ATMOS, SWIFT, or CAS.
    In addition to the virtual pool and the project, you also need to specify whether you want the bucket to be capable of supporting HDFS operation. To support HDFS operation, <filesystem_enabled> must be set to true. For example:

    POST https://<ViPR IP address>:4443/object/bucket

    with the following payload:

    <object_bucket_create>
      <name>MyBucket</name>
      <project>urn:storageos:Project:489e8831-2a9b-4a59-9dc6-1f9660a07f5f:global</project>
      <vpool>urn:storageos:ReplicationGroupInfo:0f360f6b-98ca-40b6-8b84-bac0f9934c0e:global</vpool>
      <filesystem_enabled>true</filesystem_enabled>
      <head_type>S3</head_type>
      <namespace>rootnamespace</namespace>
    </object_bucket_create>

    If you omit the namespace, the namespace of the tenant with which the user making the create bucket request is associated will be used.

    The response is of the form shown below.

    <object_bucket>
      <id>rootnamespace.MyBucket</id>
      <inactive>false</inactive>
      <name>MyBucket</name>
      <tags/>
    </object_bucket>
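    As a sketch, the complete request can be issued with curl, again assuming the X-SDS-AUTH-TOKEN header for authentication, with the payload saved to a local file:

    # bucket.xml contains the <object_bucket_create> payload shown above
    curl -sk -X POST \
        -H "X-SDS-AUTH-TOKEN: $TOKEN" \
        -H "Content-Type: application/xml" \
        -d @bucket.xml \
        "https://<ViPR IP address>:4443/object/bucket"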


Creating buckets for HDFS

Hadoop HDFS support in ViPR uses the ViPR object store. Buckets for use by HDFS can be created in a number of ways, but must be marked for HDFS access.

Where you are creating buckets to support a Hadoop cluster that uses simple security (non-Kerberos), you can use the ViPR UI or the object APIs, as described in the preceding topics.
Where the Hadoop cluster is secured using Kerberos, and you want users authenticated against the Kerberos domain to be able to create buckets, you can use the ViPR Admin tool, described in the following topics.
Note:

Do not use underscores in bucket names; they are not supported by the URI Java class. For example, viprfs://my_bucket.ns.site/ will not work because it is an invalid URI and is therefore not understood by Hadoop.


Create a bucket for HDFS using the ViPRAdminTool

From a Hadoop cluster secured using Kerberos, you can use the ViPRAdminTool.sh script to create a bucket for use by HDFS, or by both Object and HDFS, without requiring knowledge of the ViPR REST API or object APIs. The tool is a wrapper around a Java class within the ViPR HDFS JAR, and so can be run once the Hadoop cluster has been configured to use ViPR HDFS.

Before you begin

  • Obtain the ViPRAdminTool.sh script as described in Obtaining the ViPR HDFS installation and support package.
  • Hadoop must be installed, the machine on which you run the tool must have the ViPR HDFS JAR installed, and the Hadoop cluster must be configured to access ViPR HDFS.
  • Kerberos security must be configured. If you do not have Kerberos security configured, you will need to create a bucket using the S3 API or the ViPR REST API.
  • Data Services must have a virtual pool that supports the type of bucket required.

Reference information for the tool is provided in ViPR Data Services administration tool reference.

Procedure

  1. Run the ViPRAdminTool.sh script, specifying the createbucket command, the URI of the ViPR node and namespace, the name of the new bucket, and the permissions to set on the bucket.
    The following command creates a bucket called "newbucket" and sets its permissions to 0755 (rwx/r-x/r-x). The bucket is created in the default object virtual pool and assigned to the default project.

    ViPRAdminTool.sh createbucket viprfs://<ViPR DataVM Address>/myNamespace newbucket 0755

    If you want to specify a project and a virtual pool, you can use the controller REST API or CLI to obtain these values, and pass them as shown in the sketch below.
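    A sketch of a fully specified command, with placeholder IDs (note that all arguments before the last one you specify must be supplied, so the permission value cannot be omitted here):

    ViPRAdminTool.sh createbucket viprfs://<ViPR DataVM Address>/myNamespace newbucket 0755 <vpoolId> <projectId>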

ViPR Data Services administration tool reference

The ViPR Data Services administration tool (ViPRAdminTool.sh) enables HDFS-only or Object + HDFS buckets to be created from a Hadoop cluster without needing to interact with the ViPR UI or API. The tool can also be used to list, delete, and obtain the status of buckets.

Obtaining the tool

The ViPRAdminTool.sh tool is a wrapper around a Java class within the ViPR HDFS JAR, and so can be run once the Hadoop cluster has been configured to use ViPR HDFS. It can be obtained as described in Obtaining the ViPR HDFS installation and support package.

Usage

  • To run the tool, use:

    ViPRAdminTool.sh <COMMAND> [ARGUMENTS]

  • You can also run the tool without the script, by calling the class directly:

    hadoop com/emc/hadoop/fs/vipr/ViPRAdminClient <COMMAND> [ARGUMENTS]

A command, denoted by <>, is always required; arguments, denoted by [], are optional.

Note:
You must supply all arguments up to the last one that you want to specify.

Commands

createbucket <uri> <name> [permission] [vpoolId] [projectId] [objectType]
Creates a bucket.
statbucket <uri> <name>
Gets the status of the specified bucket.
deletebucket <uri> <name>
Deletes the specified bucket.
listbucket <uri>
Lists all buckets that the current user has permission to read.

Arguments

name
Bucket name to create/stat/delete.
uri
A pointer to the ViPR deployment, in <scheme>://<DataNode Address>/<namespace> format.
permission
Valid for createbucket only. Specifies POSIX permissions in octal format, such as '0755'. Default value: 0777.
vpoolId
Valid for createbucket only. Identity of the object virtual pool to be used. Default object virtual pool is used if not specified.
projectId
Valid for createbucket only. Project ID for the bucket. The default project is used if not specified.
objectType
Valid for createbucket only. Specifies the objectType allowed for this bucket: S3, ATMOS, or SWIFT. Only one object type is allowed for a bucket.
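
For example, a sketch of the remaining commands using placeholder addresses:

    # Check the status of a bucket, list all readable buckets, then delete the bucket
    ViPRAdminTool.sh statbucket viprfs://<DataNode Address>/myNamespace newbucket
    ViPRAdminTool.sh listbucket viprfs://<DataNode Address>/myNamespace
    ViPRAdminTool.sh deletebucket viprfs://<DataNode Address>/myNamespace newbucket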


Obtaining the ViPR HDFS installation and support package

The ViPR HDFS JAR and HDFS support tools are provided in a ZIP file, vipr-hdfs-<version>.zip, that you can download from the ViPR support pages on support.EMC.com.

The zip file contains \client and \tools\bin directories. Before you unzip, create a directory to hold the zip contents (your unzip tool might do this for you), and unzip to that directory. The directories contain the following:
  • \tools\bin: Contains the following tools.
    • setupViPRKerberosConfiguration.sh: Used to configure the ViPR data VMs with a Kerberos service key to enable Hadoop access to the ViPR HDFS service. This script is used from the machine hosting the KDC.
    • ViPRAdminTools.sh: Enables the creation of buckets that support HDFS access, without needing to use the ViPR object protocols or the ViPR UI.
  • \client: Contains the JAR files required by each supported Hadoop distribution. The appropriate JAR file must be copied to an appropriate location on each node in the cluster, as described in the procedures within this guide.
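
For example, a hedged sketch of distributing a client JAR to the cluster nodes; the node names, the layout under \client, and the target library path are assumptions that depend on your Hadoop distribution:

    # Copy the ViPR HDFS client JAR to each Hadoop node (target path is an assumption)
    for node in node1 node2 node3; do
        scp client/<distribution>/vipr-hdfs-<version>.jar "$node":/usr/lib/hadoop/lib/
    done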