On the pages of this website, I’ve lamented the fact that although RAID seems to be on the back side of its useful life, nothing else seems to be on the horizon as a suitable replacement. Drive capacities are ever-increasing, as are RAID rebuild times. At some point, RAID will become intolerable as a device protection method.
But when? Probably long before mechanical storage devices disappear – can you imagine having to rebuild a 10TB or 20TB disk drive? Oh, and by the way, most SSD vendors still use RAID as the primary protection method – consuming up to 50% of the cost of these expensive devices.
In my attempt to discover what’s next for RAID, I decided to do some research into the original godfathers of RAID – Patterson, Katz, and Gibson – to see what they are up to today, some 23 years after the Berkeley RAID research paper (which changed the face of storage) was published. Here is what I found:
David Patterson and Randy Katz are still at Berkeley as professors of computer science. Although both are distinguished faculty members, neither appears to have continued research on RAID or any other means of storage device protection. It appears that both viewed the original project (funded by the National Science Foundation) as a research idea that was interesting at the time, but nothing more than another project to tinker with.
Garth Gibson, however, seems to have kept his RAID interests alive. Now on the faculty at Carnegie Mellon University, Gibson is also the founder and CTO of Panasas, a company that specializes in high performance parallel storage arrays.
Looking deeper, I found that Garth and Panasas were indeed taking a different approach to RAID with their Tiered Parity Architecture (TPA):
Panasas Tiered Parity is a comprehensive architecture that enhances system reliability and availability. The three tiers are complementary to each other and collectively provide the most comprehensive and scalable reliability architecture available for high performance storage today.
Vertical Parity maintains the reliability of the individual disk drive. It addresses the challenge of ever increasing numbers of media errors by isolating and repairing them at the disk level before they are seen by the RAID array.
Horizontal Parity maintains the reliability of the RAID group across multiple drives. It addresses the challenges associated with reconstruction times by using ObjectRAID to more quickly and efficiently complete reconstructions.
Network Parity maintains the integrity of the data path between the storage system and the clients. It addresses the challenge of silent data corruption introduced by the network infrastructure by performing data integrity verification at the client node itself.
Sounds like RAID on steroids, and in some respects it is. According to the white paper published this year (available in the storage-brain library), Panasas RAID differs from traditional RAID in three ways:
1) If a drive sector media error is detected, Panasas uses “vertical parity” to rebuild the data in that sector only, preventing a complete drive rebuild.
2) If a complete drive rebuild is required, Panasas uses an “objectRAID” approach, where only the useful data on the drive is reconstructed, i.e. not the unused space on the drive.
3) When a drive must be rebuilt, Panasas uses multiple RAID controllers operating in parallel (“horizontal parity”) to dramatically reduce RAID rebuild times.
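To make those three points a little more concrete, here is a rough Python sketch of the ideas behind them. It is not Panasas code, and the white paper does not spell out the on-disk layout, so everything here is an assumption for illustration: the function names are hypothetical, the sector repair uses plain XOR parity over a small group of sectors, and the rebuild estimate assumes a sustained 100 MB/s per rebuilder with perfect parallel scaling.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def repair_sector(sectors, parity, bad_index):
    """Recompute one unreadable sector from the surviving sectors plus parity.

    Assumption: the sectors belong to one parity group protected by a single
    XOR parity sector, so only the bad sector is rebuilt, not the whole drive.
    """
    survivors = [s for i, s in enumerate(sectors) if i != bad_index]
    return xor_blocks(survivors + [parity])


def rebuild_hours(capacity_tb, used_fraction=1.0, rebuilders=1, mb_per_sec=100.0):
    """Idealized rebuild time: bytes to reconstruct / aggregate throughput.

    used_fraction models rebuilding only the live data (point 2);
    rebuilders models several controllers working in parallel (point 3).
    """
    bytes_to_rebuild = capacity_tb * 1e12 * used_fraction
    seconds = bytes_to_rebuild / (rebuilders * mb_per_sec * 1e6)
    return seconds / 3600


# Point 1: repair a single bad sector from parity.
sectors = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(sectors)
recovered = repair_sector(sectors, parity, bad_index=1)

# Points 2 and 3: compare rebuild estimates for a 10TB drive.
full = rebuild_hours(10)
used_only = rebuild_hours(10, used_fraction=0.4)
parallel = rebuild_hours(10, used_fraction=0.4, rebuilders=8)

print(recovered)
print(f"full: {full:.1f} h, used only: {used_only:.1f} h, parallel: {parallel:.1f} h")
```

Under these made-up numbers, a full rebuild of a 10TB drive takes roughly 28 hours, rebuilding only the 40% that holds data takes about 11 hours, and spreading that work across eight rebuilders brings it down to about 1.4 hours. Real rebuilds will be slower, since they compete with production I/O and never scale perfectly, but the direction of the savings is the point.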
Is Panasas on to something here? Time will tell, but it is nice to know that at least one of the founding fathers of RAID is seeking new and better ways to offer protection against drive failures.
In the future I’ll investigate other innovations that are emerging to replace or augment traditional RAID. Stay tuned!
Larry
