ViPR 2.1 - Understanding Commodity and ECS Appliance data protection in a single site

ViPR implements end-to-end protection of object data for node and disk failures on Commodity and ECS Appliance systems through a combination of local replication and erasure coding. ViPR ensures data durability, reliability, and availability of objects by creating and distributing three copies of objects and their metadata across the set of nodes and disks in the local site. After the three copies are successfully written, ViPR erasure-codes the object copies to reduce storage overhead.

Single site data protection requires no initial setup, maintenance, or additional backup software or devices. The system handles failure and recovery operations automatically. For multi-site deployments, ViPR supports a geo-protection strategy. To learn more about geo-protection, see the ViPR Data Services Geo-protection and Multizone Access article.


Data protection components

The ViPR storage engine and data fabric support data availability and data protection.

About the storage engine

The ViPR storage engine is the primary software component that ensures data availability and protection against data corruption, hardware failures, and data center disasters. It manages transactions and persists data to commodity nodes.

The storage engine writes object-related data (such as user data, metadata, and object location data) to logical containers of contiguous disk space known as chunks. A container is either open and accepting writes, or full. After a container fills, it is closed to further writes, and the storage engine protects its contents by erasure-coding it.

The storage engine writes data in an append-only pattern so that existing data is never overwritten or modified. This strategy improves performance because locking and cache validation are not required for I/O operations. All nodes can process write requests for the same object simultaneously while writing to different sets of disks.

The storage engine tracks object location through an index that records object name, chunk ID, and offset. The object location index contains three location pointers before erasure coding, and multiple location pointers after erasure coding. The storage engine performs all of the storage operations (such as erasure coding and object recovery) on chunk containers.
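A minimal sketch of such an index follows. The class and field names here are illustrative, not ViPR's actual internal types; each entry maps an object name to its location pointers (node, chunk ID, offset).

```python
# Hypothetical sketch of an object location index. It maps an object
# name to the location pointers (node, chunk ID, offset) of its data.
class ObjectLocationIndex:
    def __init__(self):
        self._entries = {}  # object name -> list of (node, chunk_id, offset)

    def record(self, name, locations):
        # Store the location pointers for an object: three before
        # erasure coding, more afterwards.
        self._entries[name] = list(locations)

    def locate(self, name):
        # Return the location pointers for an object.
        return self._entries[name]

# Before erasure coding: three replica pointers on different nodes.
index = ObjectLocationIndex()
index.record("photo.jpg", [("node1", "chunk-17", 0),
                           ("node5", "chunk-42", 4096),
                           ("node8", "chunk-03", 8192)])
```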

About the data fabric

The data fabric is installed on each node. At the node level, the data fabric acts as a local element manager; combined across nodes, the instances form a distributed cluster manager that monitors node and disk health and coordinates recovery.


Data protection on object writes

When a client submits a write request through one of the supported interfaces, the storage engine stores the object's data and metadata on nodes and disks within the single site. ViPR responds to object write requests as follows:

  1. An application submits a request to store an object.
  2. The storage engine receives the request, and it writes three copies of the object to chunk containers on different nodes in parallel. For example, the storage engine might write the object to chunk containers on nodes 1, 5, and 8.
  3. The storage engine writes the location of the chunks to the object location index.
  4. When all of the chunks are written successfully, ViPR acknowledges the write to the requesting application.

After ViPR acknowledges the write, it erasure-codes the chunk containers.
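The four write steps above can be sketched as follows. The node names and helper functions are hypothetical stand-ins; real ViPR internals are not reproduced here.

```python
# Illustrative sketch of the write path: three parallel replica
# writes, an index update, then a single acknowledgment.
import random
from concurrent.futures import ThreadPoolExecutor

NODES = [f"node{i}" for i in range(1, 9)]

def write_chunk(node, data):
    # Stand-in for appending the data to an open chunk container on
    # a node; returns a location pointer (node, chunk ID, offset).
    return (node, f"chunk-on-{node}", 0)

def write_object(name, data, index, replicas=3):
    # Step 2: write three copies in parallel to different nodes.
    targets = random.sample(NODES, replicas)
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        locations = list(pool.map(lambda n: write_chunk(n, data), targets))
    # Step 3: record the chunk locations in the object location index.
    index[name] = locations
    # Step 4: acknowledge only after all copies are written.
    return "ACK"
```

For example, `write_object("report.pdf", b"...", {})` returns the acknowledgment only once all three replicas have landed on distinct nodes.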

The storage engine also writes three copies of the object location index to three different nodes. The storage engine chooses the index locations independently from the object replica locations. It does not erasure code the three copies of the object location index.


Erasure coding

ViPR uses erasure coding because it provides better storage efficiency than maintaining three full copies, without compromising data protection.

The storage engine implements the Reed-Solomon 12/4 erasure coding scheme, in which an object is broken into 12 data fragments and 4 coding fragments. The resulting 16 fragments are dispersed across the nodes in the local site. The storage engine can reconstruct an object from any 12 of the 16 fragments.

ViPR requires a minimum of four nodes running the object service in a single site. The number of simultaneous failures the system tolerates depends on how many nodes the 16 fragments are dispersed across.

When an object is erasure coded, the original chunk data is present as a single copy that consists of 16 fragments dispersed throughout the cluster. After an object has been erasure-coded, ViPR can read it directly without any decoding or reconstruction. ViPR uses the coding fragments for object reconstruction only after a hardware failure.
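The arithmetic behind the 12/4 scheme is easy to check: any 12 of the 16 fragments suffice, so up to 4 fragments can be lost, and the storage overhead drops from 3x (three full copies) to 16/12, roughly 1.33x.

```python
# Storage arithmetic for the Reed-Solomon 12/4 scheme described above.
data_fragments = 12
coding_fragments = 4
total_fragments = data_fragments + coding_fragments  # 16 fragments dispersed

# Any 12 of the 16 fragments suffice, so up to 4 can be lost.
tolerated_losses = total_fragments - data_fragments  # 4

# Overhead relative to raw data: 16/12 ~= 1.33x, versus 3x for the
# three full copies kept before erasure coding.
ec_overhead = total_fragments / data_fragments
print(tolerated_losses, round(ec_overhead, 2))  # 4 1.33
```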


Data protection on object reads and object updates

Object reads

When a client submits a read request, the storage engine uses the object location index to find the chunk containers that store the object. It retrieves the object's fragments from multiple storage nodes in parallel, reassembles them, and returns the object to the client.
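A sketch of that read path, with illustrative names (`fetch` stands in for a node-level read):

```python
from concurrent.futures import ThreadPoolExecutor

def read_object(name, index, fetch):
    # Consult the object location index for the chunk locations,
    # fetch the pieces from their nodes in parallel, and reassemble.
    locations = index[name]
    with ThreadPoolExecutor() as pool:
        pieces = list(pool.map(fetch, locations))
    return b"".join(pieces)

# Hypothetical index entries for one object, spread across three nodes.
index = {"report.pdf": [("node1", "c1", 0),
                        ("node5", "c2", 0),
                        ("node8", "c3", 0)]}
data = read_object("report.pdf", index, lambda loc: b"part-")
```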

Object updates

When an application updates an object, the storage engine writes a new object (following the principles described earlier). The storage engine then updates the object location index to point to the new location. Because the old location is no longer referenced by an index, the original object is available for garbage collection.
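The update path can be sketched in a few lines; the locations below are illustrative stand-ins for real writes.

```python
def update_object(name, new_data, index, garbage):
    # Append-only: the new object is written elsewhere; existing data
    # is never modified in place.
    old_locations = index.get(name, [])
    new_locations = [("node2", "chunk-new", 0)]  # stand-in for a real write
    index[name] = new_locations           # index now points to the new data
    garbage.extend(old_locations)         # old, unreferenced copy awaits GC

index = {"doc": [("node1", "c1", 0)]}
to_collect = []
update_object("doc", b"v2", index, to_collect)
```

After the call, the index references only the new location, and the original location sits in the garbage-collection queue.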


Data recovery on disk and node failures

ViPR continuously monitors the health of the nodes, their disks, and objects stored in the cluster. Since ViPR disperses data protection responsibilities across the cluster, it is able to automatically re-protect at-risk objects when nodes or disks fail.

Disk health

The data fabric reports disk health as Good, Suspect, or Bad.

ViPR writes only to disks in good health; it does not write to disks in suspect or bad health. ViPR reads from both good and suspect disks. When two of an object's copies are located on suspect disks, ViPR writes two new copies of the object.

Node health

The data fabric reports node health as Good, Suspect, Degraded, or Bad.

ViPR writes only to nodes in good health; it does not write to nodes in suspect, degraded, or bad health. ViPR reads from good and suspect nodes. When two of an object's copies are located on suspect nodes, ViPR writes two new copies of the object. When a node is reported as suspect or bad, all of the disks it manages are also considered suspect or bad.
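The read/write rules above amount to a small policy table. The state names match the document; the function names are illustrative.

```python
# Placement policy implied by the health states described above.
WRITABLE = {"Good"}
READABLE = {"Good", "Suspect"}

def can_write(health):
    return health in WRITABLE

def can_read(health):
    return health in READABLE

def needs_reprotection(copy_healths):
    # An object gets two new copies when two of its existing copies
    # sit on suspect disks or nodes.
    return sum(1 for h in copy_healths if h == "Suspect") >= 2
```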

Data recovery

When a node or disk in the site fails, ViPR:

  1. Identifies the objects affected by the failure.
  2. Reconstructs the affected objects.
  3. Writes the new object copies to a node that does not currently have a copy of the object.
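The three recovery steps can be sketched as follows, assuming hypothetical `reconstruct` and `write_copy` helpers and a simple list-of-pointers index.

```python
def recover(failed_node, index, reconstruct, write_copy, nodes):
    # Step 1: identify objects with a copy on the failed node.
    for name, locations in index.items():
        if any(node == failed_node for node, _, _ in locations):
            # Step 2: reconstruct the affected object.
            data = reconstruct(name)
            # Step 3: write the new copy to a node that holds none
            # of the object's existing copies.
            used = {node for node, _, _ in locations}
            target = next(n for n in nodes if n not in used)
            new_loc = write_copy(target, data)
            index[name] = [loc for loc in locations
                           if loc[0] != failed_node] + [new_loc]
```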
