ViPR 2.1 - Understanding Commodity and ECS Appliance data protection in a single site
Single-site data protection requires no initial setup, maintenance, or additional backup software or devices. The system handles failure and recovery operations automatically. For multi-site deployments, ViPR supports a geo-protection strategy. To learn more about geo-protection, see the ViPR Data Services Geo-protection and Multizone Access article.
The ViPR storage engine and data fabric support data availability and data protection.
About the storage engine
The ViPR storage engine is the primary software component that ensures data availability and protection against data corruption, hardware failures, and data center disasters. It manages transactions and persists data to commodity nodes.
The storage engine writes object-related data (such as user data, metadata, and object location data) to logical containers of contiguous disk space known as chunks. A chunk is either open and accepting writes, or full. When a chunk is full, it is closed to further writes, and the storage engine protects its contents by erasure-coding it.
The storage engine writes data in an append-only pattern, so existing data is never overwritten or modified. This strategy improves performance because locking and cache validation are not required for I/O operations. All nodes can process write requests for the same object simultaneously while writing to different sets of disks.
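The chunk lifecycle described above (open, append-only writes, closed when full) can be sketched as a small class. This is an illustrative model, not ViPR code: the `Chunk` class, `ChunkFullError`, and the `CHUNK_CAPACITY` value are hypothetical, and the sketch simply rejects a write that does not fit instead of modelling how a real engine would open a new chunk.

```python
from dataclasses import dataclass, field

# Hypothetical capacity for illustration only; the real chunk size is not stated here.
CHUNK_CAPACITY = 64


class ChunkFullError(Exception):
    """Raised when a chunk is closed or has no room for a write."""


@dataclass
class Chunk:
    chunk_id: str
    capacity: int = CHUNK_CAPACITY
    data: bytearray = field(default_factory=bytearray)
    sealed: bool = False  # True once the chunk is full and closed to writes

    def append(self, payload: bytes) -> int:
        """Append payload at the end and return the offset where it starts.

        Existing bytes are never modified: new data only goes after old data.
        """
        if self.sealed:
            raise ChunkFullError(f"chunk {self.chunk_id} is closed")
        if len(self.data) + len(payload) > self.capacity:
            raise ChunkFullError(f"chunk {self.chunk_id} has no room")
        offset = len(self.data)
        self.data.extend(payload)
        if len(self.data) == self.capacity:
            self.seal()
        return offset

    def seal(self) -> None:
        # In ViPR, a closed chunk is then protected by erasure coding.
        self.sealed = True

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.data[offset:offset + length])


chunk = Chunk("chunk-1", capacity=16)
first = chunk.append(b"hello")   # written at offset 0
second = chunk.append(b"world")  # written at offset 5, after the first write
```

Because every write lands after the previous one, a reader holding an (offset, length) pair never sees those bytes change, which is the property that lets the engine skip locking.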
The storage engine tracks object locations through an index that records the object name, chunk ID, and offset. Before erasure coding, the object location index contains three location pointers for an object; after erasure coding, it contains multiple location pointers. The storage engine performs all storage operations (such as erasure coding and object recovery) on chunk containers.
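A minimal sketch of the object location index, assuming a simple in-memory map from object name to a list of chunk locations. The `Location` and `ObjectLocationIndex` names are hypothetical, and the `length` field is an added assumption so that a reader knows how many bytes to read at each offset.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    chunk_id: str
    offset: int
    length: int  # assumption: not listed in the text, added so reads know their size


class ObjectLocationIndex:
    """Maps an object name to the chunk locations that hold its data (illustrative)."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[Location]] = {}

    def record(self, name: str, locations: List[Location]) -> None:
        # Replace any previous pointers for this name with the new set.
        self._entries[name] = list(locations)

    def lookup(self, name: str) -> List[Location]:
        return list(self._entries.get(name, []))


index = ObjectLocationIndex()
# Before erasure coding: three copies of the object, so three location pointers.
index.record("photos/cat.jpg", [
    Location("chunk-17", 0, 2048),
    Location("chunk-42", 4096, 2048),
    Location("chunk-88", 512, 2048),
])
```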
About the data fabric
The data fabric is installed on each node. On an individual node, the data fabric acts as a local element manager; together, the data fabric instances on all nodes form a distributed cluster manager. The data fabric is responsible for:
- Cluster health: The data fabric aggregates node-specific hardware faults, and reports on the overall health of the cluster.
- Node health: The data fabric monitors the physical state of the nodes, and detects and reports faults.
- Disk health: The data fabric monitors the health of the disks and file systems. It provides raw, fast, lock-free read/write operations to the storage engine and exposes information about the individual disk drives and their status, so that the storage engine can place data across the drives according to its built-in data protection algorithms.
- Software management: The data fabric has tools for installing and running services, and for installing and upgrading the fabric software on nodes in the cluster.
When a client submits a write request through one of the supported interfaces, the storage engine stores the object's data and metadata on nodes and disks within the single site. ViPR responds to object write requests as follows:
- An application submits a request to store an object.
- The storage engine receives the request, and it writes three copies of the object to chunk containers on different nodes in parallel. For example, the storage engine might write the object to chunk containers on nodes 1, 5, and 8.
- The storage engine writes the location of the chunks to the object location index.
- When all of the chunks are written successfully, ViPR acknowledges the write to the requesting application.
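The write steps above can be sketched as follows, assuming in-memory stand-ins for nodes and for the index. `NODES`, `INDEX`, `write_replica`, and `put_object` are hypothetical names, and the random choice of three distinct nodes is an assumption, since the text does not describe ViPR's placement policy.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

REPLICAS = 3

# Hypothetical in-memory "nodes": node id -> {chunk key -> bytes}.
NODES: Dict[int, Dict[str, bytes]] = {n: {} for n in range(1, 9)}
# Hypothetical object location index: object name -> [(node id, chunk key)].
INDEX: Dict[str, List[Tuple[int, str]]] = {}


def write_replica(node_id: int, key: str, data: bytes) -> Tuple[int, str]:
    """Stand-in for writing one copy of the object to a chunk on one node."""
    NODES[node_id][key] = data
    return node_id, key


def put_object(name: str, data: bytes) -> bool:
    # Pick three distinct nodes (assumed policy: uniform random choice).
    targets = random.sample(sorted(NODES), REPLICAS)
    key = f"chunk-for-{name}"
    with ThreadPoolExecutor(max_workers=REPLICAS) as pool:
        futures = [pool.submit(write_replica, n, key, data) for n in targets]
        # result() re-raises any write failure, so a failed copy means no ack.
        locations = [f.result() for f in futures]
    INDEX[name] = locations  # record where the copies live
    return True  # acknowledge only after every copy is written and indexed
```

The acknowledgement is returned only after all three futures have completed, which mirrors the rule that ViPR acknowledges a write once every chunk write has succeeded.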
After ViPR acknowledges the write, it erasure-codes the chunk containers.
The storage engine also writes three copies of the object location index to three different nodes. The storage engine chooses the index locations independently of the object replica locations. It does not erasure-code the three copies of the object location index.
The storage engine implements the Reed-Solomon 12/4 erasure coding scheme, in which an object is broken into 12 data fragments and 4 coding fragments. The resulting 16 fragments are dispersed across the nodes in the local site, and the storage engine can reconstruct the object from any 12 of them.
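The arithmetic behind the 12/4 scheme is worth spelling out: with any 12 of 16 fragments sufficient, up to 4 fragments can be lost, and the raw capacity consumed is 16/12 of the user data. The comparison with three full copies (the state of the data before erasure coding) is included only for illustration.

```python
DATA_FRAGMENTS = 12
CODING_FRAGMENTS = 4
TOTAL = DATA_FRAGMENTS + CODING_FRAGMENTS  # 16 fragments in total

# Any 12 of the 16 fragments suffice, so up to 4 fragments can be lost.
max_lost_fragments = TOTAL - DATA_FRAGMENTS

# Raw capacity consumed per byte of user data.
ec_overhead = TOTAL / DATA_FRAGMENTS  # 16 / 12, about 1.33
replica_overhead = 3.0                # three full copies before erasure coding

print(f"fragments that can be lost: {max_lost_fragments}")
print(f"erasure-coded overhead: {ec_overhead:.2f}x")
print(f"three-copy overhead: {replica_overhead:.2f}x")
```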
ViPR requires a minimum of four nodes running the object service in a single site. The number of failures the system can tolerate depends on the number of nodes.
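To see why node count matters, the sketch below assumes that the 16 fragments are spread as evenly as possible across the nodes and that, in the worst case, the nodes that fail are the ones holding the most fragments. This layout is an assumption for illustration, not ViPR's documented placement or published fault-tolerance figures; `fragments_per_node` and `node_failures_tolerated` are hypothetical helpers.

```python
from typing import List

TOTAL_FRAGMENTS = 16
MAX_LOSS = 4  # Reed-Solomon 12/4: any 4 fragments may be lost


def fragments_per_node(nodes: int) -> List[int]:
    """Spread 16 fragments as evenly as possible (assumed layout)."""
    base, extra = divmod(TOTAL_FRAGMENTS, nodes)
    return [base + 1 if i < extra else base for i in range(nodes)]


def node_failures_tolerated(nodes: int) -> int:
    """Worst case: the failing nodes are those holding the most fragments."""
    loads = sorted(fragments_per_node(nodes), reverse=True)
    lost = 0
    tolerated = 0
    for load in loads:
        lost += load
        if lost > MAX_LOSS:
            break
        tolerated += 1
    return tolerated


for n in (4, 8, 16):
    print(n, "nodes ->", node_failures_tolerated(n), "node failure(s)")
```

Under this assumption, four nodes hold four fragments each, so losing one node uses the whole 4-fragment margin; spreading the same 16 fragments over more nodes lets the cluster absorb more node failures.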
After an object is erasure-coded, the original chunk data is present as a single copy that consists of 16 fragments dispersed throughout the cluster. ViPR can read erasure-coded objects directly, without any decoding or reconstruction; it uses the coding fragments to reconstruct an object only when there is a hardware failure.
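The property described above, that reads need no decoding unless something is lost, comes from the code being systematic: the data fragments are stored unchanged alongside the coding fragments. The sketch below swaps in a single XOR parity fragment (RAID-5 style) as a much simpler stand-in for Reed-Solomon 12/4; it can repair only one lost fragment rather than four, but it shows the same read behavior. `encode` and `read` are hypothetical helpers.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal data fragments plus one XOR parity fragment.

    The data fragments are stored unchanged, which makes the code systematic.
    """
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def read(fragments: List[Optional[bytes]], k: int, length: int) -> bytes:
    """Return the original bytes, decoding only if a data fragment is missing."""
    data_frags = list(fragments[:k])
    missing = [i for i, f in enumerate(data_frags) if f is None]
    if missing:
        if len(missing) > 1 or fragments[k] is None:
            raise ValueError("too many fragments lost for a single-parity code")
        # Rebuild the one missing data fragment from the parity and the survivors.
        present = [f for f in data_frags if f is not None] + [fragments[k]]
        data_frags[missing[0]] = reduce(xor_bytes, present)
    # Healthy path: the data fragments are simply concatenated, with no decoding.
    return b"".join(data_frags)[:length]
```

When every data fragment is present, `read` never touches the parity fragment; the parity is consulted only on the failure path, as the text describes for ViPR's coding fragments.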
When a client submits a read request, the storage engine uses the object location index to find the chunk containers that store the object, retrieves the erasure-coded fragments from multiple storage nodes in parallel, and assembles and returns the object to the client.
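The read path can be sketched as an index lookup followed by parallel fragment fetches. `INDEX`, `NODE_STORE`, `fetch_fragment`, and `get_object` are hypothetical; a dictionary lookup stands in for the network read from each storage node.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical layout: object name -> ordered list of (node id, fragment key).
INDEX: Dict[str, List[Tuple[int, str]]] = {}
# Hypothetical per-node storage: node id -> {fragment key -> bytes}.
NODE_STORE: Dict[int, Dict[str, bytes]] = {}


def fetch_fragment(location: Tuple[int, str]) -> bytes:
    node_id, key = location
    return NODE_STORE[node_id][key]  # stands in for a read from that node


def get_object(name: str) -> bytes:
    locations = INDEX[name]  # 1. consult the object location index
    with ThreadPoolExecutor(max_workers=max(1, len(locations))) as pool:
        # 2. read the fragments in parallel; map() returns results in input order
        fragments = list(pool.map(fetch_fragment, locations))
    return b"".join(fragments)  # 3. assemble the fragments in order


NODE_STORE.update({1: {"f0": b"hel"}, 2: {"f1": b"lo "}, 3: {"f2": b"vipr"}})
INDEX["greeting"] = [(1, "f0"), (2, "f1"), (3, "f2")]
```

`Executor.map` is used because it preserves the order of its inputs even though the fetches run concurrently, so the fragments can be joined directly.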
When an application updates an object, the storage engine writes a new object (following the principles described earlier) and then updates the object location index to point to the new location. Because the old location is no longer referenced by the index, the original object becomes available for garbage collection.
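The update-and-collect behavior can be sketched as "write new, repoint the index, collect what is unreferenced". The `put` and `garbage` helpers and the `written` set are hypothetical; the sketch only identifies collectable locations and does not model how ViPR actually reclaims chunk space.

```python
from typing import Dict, List, Set, Tuple

Location = Tuple[str, int]  # (chunk id, offset)

index: Dict[str, List[Location]] = {}
written: Set[Location] = set()  # every location the engine has written so far


def put(name: str, new_locations: List[Location]) -> None:
    """Write-new-then-repoint: the old data is never overwritten in place."""
    written.update(new_locations)
    index[name] = new_locations


def garbage(written_locs: Set[Location], idx: Dict[str, List[Location]]) -> Set[Location]:
    """Locations that no index entry points to are eligible for garbage collection."""
    live = {loc for locs in idx.values() for loc in locs}
    return written_locs - live


put("report.pdf", [("chunk-1", 0), ("chunk-5", 0), ("chunk-8", 0)])
put("report.pdf", [("chunk-2", 128), ("chunk-6", 64), ("chunk-9", 0)])  # update
```

After the update, the three original locations are no longer referenced by the index, so `garbage(written, index)` reports them as collectable.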
ViPR continuously monitors the health of the nodes, their disks, and objects stored in the cluster. Since ViPR disperses data protection responsibilities across the cluster, it is able to automatically re-protect at-risk objects when nodes or disks fail.
The data fabric reports disk health as Good, Suspect, or Bad.
- Good: The disk’s partitions can be read from and written to.
- Suspect: The disk has not yet met the threshold for being considered bad.
- Bad: The failure threshold has been met, and no data can be read from or written to the disk.
ViPR writes only to disks in good health; it does not write to disks in suspect or bad health. ViPR reads from good disks and from suspect disks. When two of an object’s copies are located on suspect disks, ViPR writes two new copies of it.
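The disk rules above can be written as small predicates. The enum and function names are hypothetical; `triggers_two_new_copies` encodes only the one re-protection case the text states (two copies on suspect disks), and deliberately says nothing about other combinations.

```python
from enum import Enum
from typing import List


class DiskHealth(Enum):
    GOOD = "Good"
    SUSPECT = "Suspect"
    BAD = "Bad"


def can_write(disk: DiskHealth) -> bool:
    # Writes go only to disks in good health.
    return disk is DiskHealth.GOOD


def can_read(disk: DiskHealth) -> bool:
    # Reads are served from good and suspect disks.
    return disk in (DiskHealth.GOOD, DiskHealth.SUSPECT)


def triggers_two_new_copies(copy_disks: List[DiskHealth]) -> bool:
    """True when exactly two of an object's copies sit on suspect disks (stated rule)."""
    suspect = sum(1 for d in copy_disks if d is DiskHealth.SUSPECT)
    return suspect == 2
```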
The data fabric reports node health as Good, Suspect, Degraded, or Bad.
- Good: The node is available and responding to I/O requests in a timely manner. Internal health monitoring indicates that it is in good health.
- Suspect: The node is available but is reporting internal health problems, such as a fan failure (if there are multiple fans) or a single power supply failure (if there are redundant power supplies). A node is also reported as suspect when it is unreachable by the other nodes in the array but is visible to BMC probes and is in an unknown state.
- Degraded: The node is available but is reporting bad or suspect disks.
- Bad: The node is reachable, but internal health monitoring indicates poor health; for example, the node's fans are offline, the CPU temperature is too high, or there are too many memory errors. A node is also reported as bad when it is offline and BMC probes indicate that its health is not acceptable.
ViPR writes only to nodes in good health; it does not write to nodes in suspect, degraded, or bad health. ViPR reads from good and suspect nodes. When two of an object’s copies are located on suspect nodes, ViPR writes two new copies of it. When a node is reported as suspect or bad, all of the disks it manages are also considered suspect or bad.
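The node rules, including the way node health overrides the health of the disks a node manages, can be sketched as follows. All names are hypothetical. The read predicate follows the text literally (good and suspect nodes only), and a disk that is already bad stays bad regardless of its node.

```python
from enum import Enum


class NodeHealth(Enum):
    GOOD = "Good"
    SUSPECT = "Suspect"
    DEGRADED = "Degraded"
    BAD = "Bad"


class DiskHealth(Enum):
    GOOD = "Good"
    SUSPECT = "Suspect"
    BAD = "Bad"


def node_accepts_writes(node: NodeHealth) -> bool:
    return node is NodeHealth.GOOD


def node_serves_reads(node: NodeHealth) -> bool:
    return node in (NodeHealth.GOOD, NodeHealth.SUSPECT)


def effective_disk_health(node: NodeHealth, disk: DiskHealth) -> DiskHealth:
    """A suspect or bad node makes every disk it manages suspect or bad."""
    if node is NodeHealth.BAD:
        return DiskHealth.BAD
    if node is NodeHealth.SUSPECT and disk is DiskHealth.GOOD:
        return DiskHealth.SUSPECT
    return disk
```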
When there is a failure of a node or drive in the site, ViPR:
- Identifies the objects affected by the failure.
- Reconstructs the affected objects.
- Writes the new object copies to a node that does not currently have a copy of the object.
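The three recovery steps can be sketched over a simple placement map. The `placement` map, `healthy_nodes`, and `recover_from_node_failure` are hypothetical. Reconstruction of the data itself is omitted, and choosing the lowest-numbered eligible node is a deterministic simplification rather than ViPR's placement policy.

```python
from typing import Dict, Set

# Hypothetical placement map: object name -> nodes that hold a copy of it.
placement: Dict[str, Set[int]] = {
    "a": {1, 5, 8},
    "b": {2, 3, 4},
    "c": {1, 2, 6},
}
healthy_nodes: Set[int] = set(range(1, 9))


def recover_from_node_failure(failed: int) -> Dict[str, int]:
    """Return, for each affected object, the node chosen for its new copy."""
    healthy_nodes.discard(failed)
    # 1. Identify the objects affected by the failure.
    affected = sorted(name for name, nodes in placement.items() if failed in nodes)
    moves: Dict[str, int] = {}
    for name in affected:
        placement[name].discard(failed)
        # 2. Reconstruct the object's data here (omitted in this sketch).
        # 3. Pick a healthy node that does not already hold a copy.
        candidates = sorted(healthy_nodes - placement[name])
        if not candidates:
            raise RuntimeError(f"no eligible node for {name}")
        target = candidates[0]  # simplest deterministic choice
        placement[name].add(target)
        moves[name] = target
    return moves
```

For example, if node 1 fails, objects "a" and "c" are affected, and each receives a new copy on a healthy node that did not already hold one, while object "b" is left untouched.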