Introduction to RAID5
One of the most common techniques for protecting against failed disks is RAID5. RAID5 uses an extra disk to store parity information, which can be used to recover from a single disk failure. If there are N data disks and 1 parity disk, the RAID5 parity stripe spans N+1 disks. The storage efficiency, which measures the fraction of raw disk space that is usable, is N / (N+1) for a RAID5 system. The storage efficiency of a RAID10 system (one that mirrors each data block across 2 disks) with 2N disks is 1/2. A RAID5 system therefore has higher storage efficiency than a RAID10 system with the same number of disks (whenever N > 1). Figure 1 shows a typical RAID5 parity computation. D1 to D4 denote blocks of data, one from each disk. Parity block P is computed by XOR'ing blocks D1 to D4 together.
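The parity computation and the efficiency formula above can be sketched in a few lines of Python. This is only an illustration: the helper names `xor_blocks` and `raid5_efficiency` and the two-byte block contents are assumptions made for the example, not part of any RAID implementation.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def raid5_efficiency(n_data_disks: int) -> float:
    """Usable fraction of raw capacity with N data disks and 1 parity disk."""
    return n_data_disks / (n_data_disks + 1)


# Hypothetical two-byte blocks standing in for D1..D4.
d1, d2, d3, d4 = b"\x0f\x01", b"\xf0\x02", b"\x33\x04", b"\x55\x08"

# Parity block P = D1 XOR D2 XOR D3 XOR D4.
p = xor_blocks(d1, d2, d3, d4)

print(p.hex())              # parity bytes in hex
print(raid5_efficiency(4))  # 4 data disks + 1 parity disk
```

With four data disks the efficiency is 4/5, compared with 1/2 for mirroring.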
Figure 1: RAID5 Parity Computation
When a disk fails, parity information and data from the remaining disks are used to reconstruct the data from the failed disk. In Figure 2, to recover the data block D1, blocks D2-D4 and P are used. This computation must be performed whenever a read request arrives for data on the failed disk. If the parity is consistent, the data from the failed disk will be correctly reconstructed.
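As a rough sketch of the recovery step, the lost block is simply the XOR of the surviving data blocks and the parity block. The helper `xor_blocks` and the block contents below are hypothetical, chosen only to make the example runnable.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical stripe: four data blocks and their (consistent) parity.
d1, d2, d3, d4 = b"\x0f\x01", b"\xf0\x02", b"\x33\x04", b"\x55\x08"
p = xor_blocks(d1, d2, d3, d4)

# The disk holding D1 fails: rebuild it from D2, D3, D4 and P.
recovered_d1 = xor_blocks(d2, d3, d4, p)

print(recovered_d1 == d1)
```

This works because XOR'ing a value twice cancels it out, so D2, D3 and D4 drop out of P and only D1 remains.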
Figure 2: RAID5 Data Recovery
It is important to ensure that data and parity remain consistent under all conditions. If inconsistent parity is used to reconstruct data from a failed disk, the reconstructed data will be incorrect, leading to data corruption. In a conventional RAID5 implementation, it is difficult to guarantee that data and the corresponding parity are updated simultaneously. In the ideal case, the update of data and corresponding parity is atomic (done as a single operation, so that either both succeed or both fail). Figure 3 shows the ideal case for an update that replaces block D2 with block D2'. In a conventional RAID5 implementation this process is called Read-Modify-Write: (1) the old data D2 and old parity P are read, (2) the new parity P' is computed from D2, D2' and the old parity P (P' = P XOR D2 XOR D2'), (3) D2' and P' are written atomically to the disks. Read-Modify-Write requires four disk accesses for a block update: two reads and two writes.
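The Read-Modify-Write steps can be sketched as follows, assuming an in-memory list stands in for the disks of one stripe. The function `read_modify_write` and its parameters are hypothetical names for this example. Note that the two assignments at the end are not atomic on real hardware; the sketch shows only the ideal case.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def read_modify_write(stripe: list, target: int, parity_index: int,
                      new_data: bytes) -> None:
    """Update one data block and the parity block of a stripe in place."""
    old_data = stripe[target]            # read access 1: old data
    old_parity = stripe[parity_index]    # read access 2: old parity
    # New parity: remove the old data's contribution, add the new data's.
    new_parity = xor_blocks(old_parity, old_data, new_data)
    stripe[target] = new_data            # write access 1: new data
    stripe[parity_index] = new_parity    # write access 2: new parity


# Hypothetical stripe laid out as [D1, D2, D3, D4, P].
d1, d2, d3, d4 = b"\x0f\x01", b"\xf0\x02", b"\x33\x04", b"\x55\x08"
stripe = [d1, d2, d3, d4, xor_blocks(d1, d2, d3, d4)]

read_modify_write(stripe, target=1, parity_index=4, new_data=b"\xaa\xbb")

# The updated parity matches a full recomputation over the data blocks.
print(stripe[4] == xor_blocks(*stripe[:4]))
```

Only the target block and the parity block are touched, which is why the update costs four disk accesses regardless of how many data disks the stripe spans.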
Figure 3: RAID5 Read-Modify-Write
In practice, the parity may be written either before or after the data. Parity can become inconsistent if a failure in the storage system, such as a power failure or hardware failure, occurs after the data has been written to disk but before the parity has been updated. Figure 4 shows an example: D2' is written to disk, but the system crashes before P' is written. In this case, the parity is no longer consistent with the data.
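To illustrate why this matters, the following sketch simulates the crash scenario: the new data block is written, the parity is left stale, and a later disk failure is then "recovered" from the inconsistent parity. The helper name and block values are assumptions for the example only.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical stripe with consistent parity.
d1, d2, d3, d4 = b"\x0f\x01", b"\xf0\x02", b"\x33\x04", b"\x55\x08"
p = xor_blocks(d1, d2, d3, d4)

# D2' reaches the disk, then the system crashes before P' is written.
d2_new = b"\xaa\xbb"
stale_parity = p  # still reflects the old D2

# Later the disk holding D1 fails and is rebuilt from the stale parity.
rebuilt_d1 = xor_blocks(d2_new, d3, d4, stale_parity)

print(rebuilt_d1 == d1)  # the rebuilt block does not match the original D1
```

The rebuilt block equals D1 XOR D2 XOR D2', so it is wrong whenever D2' differs from D2, even though D1 itself was never written.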
Figure 4: System crash before parity update
To ensure that parity is always consistent, high-end SCSI / Fibre Channel RAID5 systems use battery-backed non-volatile RAM (NVRAM). NVRAM provides persistent storage and holds its content even after a power failure. Information sufficient to recover the parity (e.g., which disks, sectors, etc.) is logged into the NVRAM before executing each write operation. The information needs to be kept in the NVRAM only until the next write operation.
In implementations that do not have access to NVRAM (such as software implementations or low-cost ATA RAID controllers), it is generally not possible to protect against such failures without a significant performance penalty. One way is to log the needed information onto disk, which incurs an additional disk access for each write operation. The Linux software RAID5 implementation enforces parity consistency by recalculating all parity after an unclean shutdown (you can see this in operation by checking /proc/mdstat on Linux). If a disk fails during this parity regeneration window (which can be over one hour for a 4-disk RAID5 system), data corruption may occur.
SR5 and RAID5
SR5 is a software RAID5 technology that guarantees atomic parity updates without using expensive NVRAM or disk-based logging techniques that hurt performance. In addition, SR5 provides high-performance XOR computations without requiring special XOR engines. Please check out the following links to learn more about SR5.
References
Last update: October 27, 2003. Copyright © 2003 Boon Storage Technologies, Inc. All Rights Reserved.