ViPR 2.1 - Create Buckets to Support ViPR Data Services

Introduction

Containers are required to store object data and to provide storage for HDFS data. In S3 these containers are called buckets and this term has been adopted as a general term in ViPR. In Atmos, the equivalent of a bucket is a subtenant, in Swift the equivalent of a bucket is a container, and for CAS a bucket is a CAS pool. This article provides information on how to create buckets to support the main use cases.

Buckets can be created for each object protocol using its API. S3, Atmos, or Swift buckets can be configured to support HDFS and a bucket configured for HDFS access can be read and written using its object protocol and using the HDFS protocol.

Where you are writing clients that use the ViPR object APIs, you create buckets to hold object data using the programming SDKs. Where you want to create buckets outside a development environment, the ViPR UI provides support for creating S3 buckets. In addition, the ViPRAdminTool.sh script enables HDFS buckets to be created from a Hadoop cluster without the user needing to interact with ViPR.

In ViPR, a bucket is associated with a project, which is used for metering purposes. All data in a bucket is accounted against the project associated with that bucket.

Back to Top

Bucket and key naming conventions

Bucket and object/key names must conform to the specification presented here.

Note:

If you want to use a bucket for HDFS, do not use underscores in the bucket name, as they are not supported by the URI Java class. For example, viprfs://my_bucket.ns.site/ will not work because it is an invalid URI and is therefore not understood by Hadoop.


Namespace name

The following rules apply to the naming of ViPR namespaces:
  • Cannot be null or an empty string
  • Length range is 1..255 (Unicode char)
  • Valid characters are defined by regex /[a-zA-Z0-9\-_]+/. Hence:
    • Alphanumeric characters
    • Special characters: hyphen (-) and underscore (_).
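
For example, you can check a candidate namespace name against these rules from a shell. This is only a local sketch of the rule, not a ViPR command, and the name shown is hypothetical:

echo "finance-dept_01" | grep -Eq '^[A-Za-z0-9_-]{1,255}$' && echo valid || echo invalid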

Back to Top

S3 bucket and object naming in ViPR

This topic details the rules that apply to the naming of buckets and objects when using the ViPR S3 Object API.

Bucket name

The following rules apply to the naming of S3 buckets in ViPR:

  • Names must be between 1 and 255 characters in length.
  • Names can include alphanumeric characters ([a-zA-Z0-9]) and dot (.), hyphen (-), and underscore (_) characters.
  • Names can start with a hyphen (-) or an alphanumeric character.
  • Names must not:
    • Start with a dot (.)
    • Contain a double dot (..)
    • End with a dot (.)
    • Be formatted as an IPv4 address.

You can compare these rules with the naming restrictions specified in the S3 documentation: http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html.

Object Name

The following rules apply to the naming of ViPR S3 objects:
  • Cannot be null or an empty string
  • Length range is 1..255 (Unicode char)
  • No validation on characters.

Back to Top

OpenStack Swift container and object naming in ViPR

This topic details the rules that apply to the naming of buckets and objects when using the ViPR OpenStack Swift Object API.

Container Name

The following rules apply to the naming of Swift containers:
  • Cannot be null or an empty string
  • Length range is 1..255 (Unicode char)
  • Valid characters are defined by regex /[a-zA-Z0-9\.\-_]+/
    • Alphanumeric characters
    • Special characters: dot (.), hyphen (-), and underscore (_).

Object Name

The following rules apply to the naming of Swift objects:
  • Cannot be null or an empty string
  • Length range is 1..255 (Unicode char)
  • No validation on characters.

Back to Top

Atmos bucket and object naming in ViPR

This topic details the rules that apply to the naming of buckets and objects when using the ViPR Atmos Object API.

Subtenant (bucket)

The subtenant is created by the server, so the client does not need to know the naming scheme.

Object name

The following rules apply to the naming of Atmos objects:
  • Cannot be null or an empty string
  • Length range is 1..255 (Unicode char)
  • No validation on characters.

Name should be percent-encoded UTF-8.

Back to Top

CAS pool and object naming in ViPR

This topic details the rules that apply to the naming of CAS pools and objects ('clips' in CAS terminology) when using the CAS API.

CAS pool naming

The following rules apply to the naming of CAS pools in ViPR:
  • a maximum of 255 characters
  • cannot contain: ' " / & ? * < > <tab> <newline> or <space>

Clip naming

There are no user-defined keys in the CAS API. When an application using the CAS API creates a clip, it opens a pool, creates a new clip, and adds tags, attributes, streams, and so on. After a clip is complete, it is written to a device.

A corresponding clip ID is returned by the CAS engine and can be referred to using <pool name>/<clip id>.

Back to Top

Creating buckets using the object APIs

When creating buckets using the object APIs or using tools that call the object APIs, there are a number of headers that determine the behavior.

The following x-emc headers are provided:

x-emc-vpool
Determines the data stores which will be used to store the objects associated with this bucket. If you do not specify a virtual pool using the x-emc-vpool header, ViPR will choose the default virtual pool associated with the namespace.
x-emc-file-system-access-enabled
Configures the bucket for HDFS access. The header must not conflict with the interface that is being used. That is, a createBucket request from HDFS cannot specify x-emc-file-system-access-enabled=false.
x-emc-namespace
Specifies the namespace to be used for this bucket. If the namespace is not specified using the S3 convention of a host/path style request, it can be specified using the x-emc-namespace header. If the namespace is not specified in either way, the namespace associated with the user is used.
x-emc-project-id
Specifies the project ID to associate with the new bucket. This is relevant when reporting metering data for object usage. If this header is not present, the default project id defined for the associated tenant is used.
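For illustration, a create-bucket request that carries these headers might look like the following sketch. The bucket name, namespace, URNs, date, and credentials are placeholders, not values from a real deployment:

PUT /mybucket HTTP/1.1
Host: <DataNodeIP>:9020
Date: <date>
x-emc-vpool: urn:storageos:ReplicationGroupInfo:<vpool-uuid>:global
x-emc-file-system-access-enabled: true
x-emc-namespace: mynamespace
x-emc-project-id: urn:storageos:Project:<project-uuid>:global
Authorization: AWS <user>:<signature>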
A full example of using the s3curl tool to create a bucket is provided in the next topic.
Back to Top

Create a bucket using the S3 API (with s3curl)

You can use the S3 API to create a bucket in an object virtual pool. Because ViPR uses custom headers (x-emc), the string to sign must be constructed to include these headers. In this procedure the s3curl tool is used; there are also a number of programmatic clients you can use, for example, the S3 Java client.

Before you begin

  • Data Services must have at least one object virtual pool.
  • Perl must be installed on the Linux machine on which you will run s3curl.
  • You will need to have curl installed and you will need the s3curl module, which acts as a wrapper around curl.

To use s3curl with x-emc headers, minor modifications must be made to the s3curl script. These modifications are described in the procedure.

Procedure

  1. Obtain a secret key for the user who will create the bucket.
    Refer to the article: Generate a Secret Key to Access ViPR Object Storage for details.
  2. Obtain the identity of the data services virtual pool in which you want the bucket to be created.
    You can obtain the virtual pool by using the ViPR REST API:
    GET https://<ViPR Controller IP Address>:4443/vdc/data-service/vpools
    
    The response provides the name and identity of all data services virtual pools. For example:
    <data_service_vpools>
    <data_service_vpool>
        <creation_time>1403519186936</creation_time>
        <id>urn:storageos:ReplicationGroupInfo:8fc8e19b-edf0-4e81-bee8-79accc867f64:global</id>
        <inactive>false</inactive>
        <tags/>
        <description>IsilonVPool1</description>
        <name>IsilonVPool1</name>
        <varrayMappings>
            <name>urn:storageos:VirtualDataCenter:1de0bbc2-907c-4ede-b133-f5331e03e6fa:vdc1</name>
            <value>urn:storageos:VirtualArray:793757ab-ad51-4038-b80a-682e124eb25e:vdc1</value>
        </varrayMappings>
    </data_service_vpool>
    </data_service_vpools>

    Here the ID is urn:storageos:ReplicationGroupInfo:8fc8e19b-edf0-4e81-bee8-79accc867f64:global.

  3. Set up s3curl by creating a .s3curl file in which to enter the user credentials.
    The .s3curl file must have permissions 0600 (rw-/---/---) when s3curl.pl is run.
    In the example below, the profile "my_profile" is used to reference the user credentials for the "user@yourco.com" account, and "root_profile" references the credentials for the root account.
    %awsSecretAccessKeys = (
        my_profile => {
            id  => 'user@yourco.com',
            key => 'sZRCTZyk93IWukHEGQ3evPJEvPUq4ASL8Nre0awN'
        },
       root_profile => {
            id  => 'root',
            key => 'sZRCTZyk93IWukHEGQ3evPJEvPUq4ASL8Nre0awN'
        },
    );
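    For example, assuming the .s3curl file is in your home directory (the default location where s3curl looks for it), you can set the required permissions with:
    chmod 600 ~/.s3curl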
    
  4. Add the endpoint that you want to use s3curl against to the .s3curl file.
    This will be the address of your data node or the load balancer that sits in front of your data node.
    For example:
    push @endpoints , (
        '203.0.113.10',  'lglw3183.lss.emc.com',
    );
    
  5. Modify the s3curl.pl script so that it includes the x-emc headers in its "string to sign".
    Replace the following lines:
    elsif ($header =~ /^(?'header'[Xx]-(([Aa][Mm][Zz])|([Ee][Mm][Cc]))-[^:]+): *(?'val'.+)$/) {
    
        my $name = lc $+{header};
        my $value = $+{val};
    
    
    with:
     
    
    elsif ($header =~ /^([Xx]-(?:(?:[Aa][Mm][Zz])|(?:[Ee][Mm][Cc]))-[^:]+): *(.+)$/) {
       
        my $name = lc $1;
        my $value = $2;
    
    
  6. Create the bucket using s3curl.pl.
    Specify the following:
    • Profile of the user.
    • Identity of the virtual pool in which to create the bucket (<vpool_id>). This must be set using the x-emc-vpool header.
    • x-emc-file-system-access-enabled header to enable file system access.
    • Name of the bucket (<BucketName>).
    The fully specified command looks like this:
    ./s3curl.pl --debug --id=my_profile --acl public-read-write 
    --createBucket -- -H 'x-emc-file-system-access-enabled:true' 
    -H 'x-emc-vpool:<vpool_id>' http://<DataNodeIP>:9020/<BucketName>
    
    
    Note that the --acl public-read-write argument is optional, but is needed if you plan to access the bucket in an anonymous environment, for example, if you intend to access the bucket as HDFS from an environment that is not secured using Kerberos.
    If successful (with --debug on) you should see output similar to the following:
    s3curl: Found the url: host=203.0.113.10; port=9020; uri=/S3B4; query=;
    s3curl: ordinary endpoint signing case
    s3curl: StringToSign='PUT\n\n\nThu, 12 Dec 2013 07:58:39 +0000\nx-amz-acl:public-read-write
    \nx-emc-file-system-access-enabled:true\nx-emc-vpool:
    urn:storageos:ReplicationGroupInfo:8fc8e19b-edf0-4e81-bee8-79accc867f64:global:\n/S3B4'
    s3curl: exec curl -H Date: Thu, 12 Dec 2013 07:58:39 +0000 -H Authorization: AWS 
    root:AiTcfMDhsi6iSq2rIbHEZon0WNo= -H x-amz-acl: public-read-write -L -H content-type:  
    --data-binary  -X PUT -H x-emc-file-system-access-enabled:true 
    -H x-emc-vpool:urn:storageos:ObjectStore:e0506a04-340b-4e78-a694-4c389ce14dc8: http://203.0.113.10:9020/S3B4
    
    You can list the buckets using the S3 interface, using:
    ./s3curl.pl --debug --id=my_profile http://<DataNodeIP>:9020/
Back to Top

Creating buckets using the UI and controller REST API

The controller REST API provides the ability to create buckets. This functionality is leveraged by the Create Bucket service provided in the ViPR UI service catalog. Creating buckets from the ViPR UI is useful where you want a simple bucket creation mechanism to support object ingestion, or where you want to create a bucket for use with Hadoop HDFS (with simple security).

The following topics describe the use of the ViPR UI and the controller REST API to create buckets:
Back to Top

Create bucket for Data Services from the UI

The ViPR UI provides a service for creating buckets within a specified object virtual pool.

Before you begin

  • Data Services must have been configured, and at least one object virtual pool with an assigned data store must be available in which to create the bucket.
  • In this procedure a bucket must be assigned to a project. For a user to be able to assign a bucket to the default project, a default project must have been configured by a Tenant Administrator.

Procedure

  1. Select User > Service Catalog > Data Services > Create Bucket for Data Services.
  2. Enter a name for the bucket that will be created.
  3. Select the virtual pool in which to create the bucket.
  4. Select the project to which the bucket should be assigned or select the default project that has been assigned.
    You will be offered projects to which you have been assigned and the default project selection.
  5. If you want to enable the bucket to support HDFS access, select the Enable File System Support checkbox.
  6. Select Order to start creation.
Back to Top

Create a bucket using the ViPR Controller REST API

The ViPR Controller REST API provides methods for performing object storage configuration, including creating buckets.

Before you begin

  • ViPR object storage must have been configured using data services nodes or commodity nodes.
  • Object storage must have at least one object virtual pool which is backed by one or more virtual arrays.

A bucket must be created in an object virtual pool and must be associated with a project.

Procedure

  1. Get the UID for the project with which the bucket will be associated.
    The UIDs for all projects can be obtained using GET /projects/bulk. For example:
    GET https://<ViPR IP address>:4443/projects/bulk
    
    The response returns the identity of each project. In this case there is only one.
    <ids>
        <id>urn:storageos:Project:489e8831-2a9b-4a59-9dc6-1f9660a07f5f:global
        </id>
    </ids>
    
    You can then obtain the details of each project using GET /projects/{id}.
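    For example, using the identity returned above:
    GET https://<ViPR IP address>:4443/projects/urn:storageos:Project:489e8831-2a9b-4a59-9dc6-1f9660a07f5f:global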
  2. Get the UID of the object virtual pool in which you want to create the bucket.
    The UIDs for all object virtual pools can be obtained using GET /vdc/data-service/vpools. For example:
    GET https://<ViPR IP address>:4443/vdc/data-service/vpools
    
    In this example, one pool is returned which has a single varrayMapping and so will have the data stores contributed by the single virtual array specified in that mapping:
    <data_service_vpools>
    <data_service_vpool>
        <creation_time>1403531485559</creation_time>
        <id>urn:storageos:ReplicationGroupInfo:0f360f6b-98ca-40b6-8b84-bac0f9934c0e:global</id>
        <inactive>false</inactive>
        <tags/>
        <description>Vpool with single varray mapping</description>
        <name>varray567</name>
        <varrayMappings>
        <name>
        urn:storageos:VirtualDataCenter:1de0bbc2-907c-4ede-b133-f5331e03e6fa:vdc1
        </name>
        <value>
        urn:storageos:VirtualArray:465a3ef1-5ac5-46fc-8e1a-6fb4aefce7fd:vdc1
        </value>
        </varrayMappings>
    </data_service_vpool>
    </data_service_vpools>
  3. Create the bucket using POST /object/bucket and specify the protocol that will be allowed to access the bucket: S3, ATMOS, SWIFT, or CAS.
    In addition to the virtual pool and the project, you also need to specify whether you want the bucket to be capable of supporting HDFS operation. To support HDFS operation, <filesystem_enabled> must be set to true. For example:
    POST https://<ViPR IP address>:4443/object/bucket
    
    with the following payload:
    <object_bucket_create>
       <name>MyBucket</name>
       <project>urn:storageos:Project:489e8831-2a9b-4a59-9dc6-1f9660a07f5f:global
       </project>
       <vpool> 
       urn:storageos:ReplicationGroupInfo:0f360f6b-98ca-40b6-8b84-bac0f9934c0e:global</vpool>
       <filesystem_enabled>true</filesystem_enabled>
       <head_type>S3</head_type>
       <namespace>rootnamespace</namespace>
    </object_bucket_create>

    If you omit the namespace, the namespace of the tenant associated with the user making the create bucket request is used.

    The response is of the form shown below.
    <object_bucket>
        <id>rootnamespace.MyBucket</id>
        <inactive>false</inactive>
        <name>MyBucket</name>
        <tags/>
    </object_bucket>
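    As a sketch, this request could be submitted with curl. The X-SDS-AUTH-TOKEN header and the payload file name shown here are assumptions for illustration; use the authentication mechanism configured in your deployment:
    curl -k -X POST \
         -H "X-SDS-AUTH-TOKEN: <auth_token>" \
         -H "Content-Type: application/xml" \
         -d @create_bucket.xml \
         https://<ViPR IP address>:4443/object/bucket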
    
Back to Top

Creating buckets for HDFS

Hadoop HDFS support in ViPR uses the ViPR object store. Buckets for use by HDFS can be created in a number of ways, but must be marked for HDFS access.

Where you are creating buckets to support a Hadoop cluster that uses simple security (non-Kerberos), you can use the ViPR UI or the object APIs, as described in the following topics:
Where the Hadoop cluster is secured using Kerberos, and you want users authenticated against the Kerberos domain to be able to create buckets, you can use the ViPR Admin tool described in the following topics:
Note:

Do not use underscores in bucket names, as they are not supported by the URI Java class. For example, viprfs://my_bucket.ns.site/ will not work because it is an invalid URI and is therefore not understood by Hadoop.


Back to Top

Create a bucket for HDFS using the ViPRAdminTool

From a Hadoop cluster secured using Kerberos, you can use the ViPRAdminTool.sh to create a bucket for use by HDFS, or Object and HDFS, without requiring knowledge of the ViPR REST API or Object APIs. The tool is a wrapper around a Java class within the ViPR HDFS JAR, so can be run once the Hadoop cluster has been configured to use ViPR HDFS.

Before you begin

  • Obtain the ViPRAdminTool.sh as described in Obtaining the ViPR HDFS installation and support package
  • Hadoop must be installed and the machine on which you are running the tool must have the ViPR HDFS JAR installed and the Hadoop cluster configured to access the ViPR HDFS.
  • Kerberos security must be configured. If you do not have Kerberos security configured, you will need to create a bucket using the S3 API or the ViPR REST API.
  • Data Services must have a virtual pool that supports the type of bucket required.

Reference information for the ViPRAdminTool tool is provided in ViPR Data Services administration tool reference.

Procedure

  1. Use the ViPRAdminTool.sh script and specify: the createBucket command, the path to the ViPR node and namespace, the name of the new bucket, and the permissions to set for the bucket.
    The following command creates a bucket called "newbucket" and sets its permissions as 0755 (rwx/r-x/r-x). The object virtual pool in which the bucket is created is the default pool and the bucket is assigned to the default project.
    • ViPRAdminTool.sh createbucket viprfs://<ViPR DataVM Address>/myNamespace newbucket 0755
      				
    If you want to specify a project and a virtual pool, you can use the controller REST API or CLI to obtain these values.
Back to Top

ViPR Data Services administration tool reference

The ViPR Data Services administration tool (ViPRAdminTool.sh) enables HDFS-only or Object + HDFS buckets to be created from a Hadoop cluster without needing to interact with the ViPR UI or API. The tool can also be used to list, delete, and obtain the status of buckets.

Obtaining the tool

The ViPRAdminTool.sh tool is a wrapper around a Java class within the ViPR HDFS JAR, so can be run once the Hadoop cluster has been configured to use ViPR HDFS. It can be obtained as described in Obtaining the ViPR HDFS installation and support package.

Usage

  • To run the tool, use:
    ViPRAdminTool.sh <COMMAND> [ARGUMENTS]
  • You can also run the underlying class directly, without the script, using:
    hadoop com/emc/hadoop/fs/vipr/ViPRAdminClient <COMMAND> [ARGUMENTS]

A command is always required (denoted by <>); arguments are optional (denoted by []).

Note:
You must supply all arguments up to the last one that you want to specify (see the example after the argument descriptions below).

Commands

createbucket <uri> <name> [permission] [vpoolId] [projectId] [objectType]
Creates a bucket.
statbucket <uri> <name>
Gets the status of the specified bucket.
deletebucket <uri> <name>
Deletes the specified bucket.
listbucket <uri>
Lists all buckets the current user has permission to read.

Arguments

name
Bucket name to create/stat/delete.
uri
A pointer to the ViPR deployment, in <scheme>://<DataNode Address>/<namespace> format.
permission
Valid for createbucket only. Specify POSIX permissions in octal format, such as '0755'. Default value: 0777.
vpoolId
Valid for createbucket only. Identity of the object virtual pool to be used. Default object virtual pool is used if not specified.
projectId
Valid for createbucket only. Project Id for the bucket. Default project is used if not specified.
objectType
Valid for createbucket only. Specifies the objectType allowed for this bucket: S3, ATMOS, or SWIFT. Only one object type is allowed for a bucket.
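
Because the arguments are positional, supplying a later argument requires all earlier ones as well. For example, the following sketch (the virtual pool and project URNs are placeholders) creates a bucket in a specific virtual pool and project, and therefore also supplies the permission argument:

ViPRAdminTool.sh createbucket viprfs://<ViPR DataVM Address>/myNamespace newbucket 0755 urn:storageos:ReplicationGroupInfo:<vpool-uuid>:global urn:storageos:Project:<project-uuid>:global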

Back to Top

Obtaining the ViPR HDFS installation and support package

The ViPR HDFS JAR and HDFS support tools are provided in a ZIP file, vipr-hdfs-<version>.zip that you can download from the ViPR support pages on support.EMC.com.

The ZIP file contains \client and \tools\bin directories. Before you unzip the file, create a directory to hold the ZIP contents (your unzip tool might do this for you), then extract the contents to that directory. After you extract the files, the directories will contain the following:

  • \tools\bin: Contains the following tools.
    • setupViPRKerberosConfiguration.sh: Configures the ViPR data nodes with a Kerberos service key to enable Hadoop access to the ViPR HDFS service. Run this script from the machine hosting the KDC.
    • ViPRAdminTool.sh: Enables the creation of buckets that support HDFS access without needing to use the ViPR object protocols or the ViPR UI.
  • \client: Contains the following files:
    • ViPR JAR files: Used to configure different Hadoop distributions.
    • libvipr-<version>.so: Used to configure Pivotal HAWQ for use with ViPR HDFS.
Back to Top