Over the past 30 years, online storage capacity for the average IT shop grew from about 10GB to well over 100TB, a 10,000X increase. Extending the same math over the next 30 years, we can expect the average data center to store 1 million TB, or 1 exabyte, by 2040.
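The projection above is simple compounding. Here is a minimal Python sketch, assuming decimal (SI) units and taking 100TB as the starting point for the next 30 years. It reproduces the exabyte estimate and shows the annual growth rate that a 10,000X jump over 30 years implies.

```python
# Back-of-the-envelope check of the storage-growth projection.
# Assumes decimal (SI) units: 1 TB = 1,000 GB and 1 EB = 1,000,000 TB.
GB_PER_TB = 1_000
TB_PER_EB = 1_000_000


def growth_factor(start: float, end: float) -> float:
    """Total multiplier between two capacities expressed in the same unit."""
    return end / start


def annual_rate(factor: float, years: int) -> float:
    """Compound annual growth rate that produces `factor` over `years`."""
    return factor ** (1 / years) - 1


start_gb = 10               # "about 10GB" 30 years ago
today_gb = 100 * GB_PER_TB  # "well over 100TB" today; 100TB is used as the floor
years = 30

factor = growth_factor(start_gb, today_gb)      # 10,000x over 30 years
rate = annual_rate(factor, years)               # implied yearly growth
projected_tb = (today_gb / GB_PER_TB) * factor  # apply the same multiplier again

print(f"growth factor: {factor:,.0f}x")
print(f"implied annual growth: {rate:.1%}")
print(f"projected: {projected_tb:,.0f} TB = {projected_tb / TB_PER_EB:g} EB")
```

Put another way, capacity has to grow by roughly 36% a year, every year, for the 2040 figure to hold.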
Will we be able to store all that data? My guess is yes, easily. Individual storage devices will have grown to hundreds of terabytes by then, probably storing data bits at the atomic level.
Will we be able to access all that data quickly? Yes, I believe so. By 2040, data will not be housed on slow spinning platters but on silicon or some other exotic material, and it will travel close to the speed of light (or maybe faster than the speed of light?).
Will we be able to manage all that data? Well, now, that’s the tricky part, isn’t it? If you think managing a petabyte of data is tough today, multiply that toughness by 1,000: that’s what the largest data centers will be dealing with, and sooner than you think. If the average data center will have an exabyte in 2040, what will the largest data center have? A zettabyte? A yottabyte? We are going to run out of names to describe the magnitude of data we’ll have…
Because of the continual growth of data, I believe that data management will quickly become the biggest problem facing IT shops. Racking up a petabyte of storage is every bit as hard as it sounds, but provisioning that storage, protecting it, and serving it up with any reasonable speed is darn near impossible.
For the answer to this looming problem, I think we need to look back, all the way back to the mainframe. You see, the mainframe folks realized long ago that lots of data usually meant lots of headaches for administrators, and they came up with the idea that machines could manage these datasets much better than humans could. A notable development in this area was IBM’s DFSMS, described below:
“Data Facility Storage Management Subsystem (DFSMS™) is a software suite that automatically manages data from creation to expiration. DFSMS provides allocation control for availability and performance, backup/recovery and disaster recovery services, space management, tape management, and reporting and simulation for performance and configuration tuning.”
By classifying storage devices and data, DFSMS made policy-based storage possible. IBM’s classifications covered five areas: data class, storage class, management class, storage group, and aggregate group.
Here is how IBM describes what they had created:
“The concept of policy-based storage management involves defining policies that allow the system to take over many storage management tasks that were previously performed manually.
“DFSMS separates the logical view of data from the physical view of data. The logical view of data is concerned with what the data look like and what services the data require. The physical view is concerned with where the data actually reside. The policy types that specify the logical view of data are: data class, storage class, and management class. Storage group is the single policy type for specifying physical storage. An aggregate-group policy specifies a grouping of data for purposes of backup and recovery in case of a disaster.”
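To make the logical/physical split concrete, here is a minimal Python sketch of rule-driven classification. It is not DFSMS code: real DFSMS makes these assignments through its own Automatic Class Selection (ACS) routines, and every rule, class name, and pool name below is hypothetical. Data class and aggregate group are left out for brevity. The sketch shows only the idea: ordered rules map a dataset name to logical policies (a storage class and a management class), and a separate table maps each storage class to a physical storage group.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ServiceLevel:
    """Logical view: what the data needs (hypothetical fields)."""
    storage_class: str     # performance / availability target
    management_class: str  # backup, retention and migration rules


@dataclass(frozen=True)
class Rule:
    pattern: str           # case-sensitive dataset-name filter, e.g. "PAYROLL.*"
    level: ServiceLevel


# Physical view: which pool backs each storage class (hypothetical names).
STORAGE_GROUPS = {
    "FAST": "SSD_POOL",
    "STANDARD": "DISK_POOL",
    "ARCHIVE": "TAPE_POOL",
}

# Ordered policy table; the first matching rule wins, "*" is the catch-all.
RULES = [
    Rule("PAYROLL.*", ServiceLevel("FAST", "DAILY_BACKUP_7Y")),
    Rule("*.LOG", ServiceLevel("STANDARD", "DELETE_AFTER_90D")),
    Rule("*.HIST", ServiceLevel("ARCHIVE", "KEEP_FOREVER")),
    Rule("*", ServiceLevel("STANDARD", "WEEKLY_BACKUP")),
]


def classify(dataset_name: str) -> tuple[ServiceLevel, str]:
    """Pick logical policies from the rules, then resolve the physical pool."""
    for rule in RULES:
        if fnmatchcase(dataset_name, rule.pattern):
            return rule.level, STORAGE_GROUPS[rule.level.storage_class]
    # Only reachable if the catch-all rule is removed from RULES.
    raise ValueError(f"no policy matches {dataset_name!r}")


for name in ["PAYROLL.MASTER", "WEB.ACCESS.LOG", "SALES.2010.HIST", "TEMP.WORK"]:
    level, pool = classify(name)
    print(f"{name:16} -> {level.storage_class:8} {level.management_class:16} {pool}")
```

Because the rules never name a device, an administrator could move the FAST class from one pool to another by editing a single entry in STORAGE_GROUPS, which is the kind of separation the quote describes.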
So if IBM developed the idea of policy-based storage decades ago, why isn’t everyone using it today? There are a couple of reasons. First, the mainframe has a big advantage over client/server environments: with a mainframe, you know who the boss is. This is a boss who micromanages everything. One operating system, one set of peripheral devices, and nobody better get out of line, or else. A client/server architecture, by contrast, is far more autonomous and largely peer-based. Disk storage systems are given free rein over whatever it is they do, and no one complains unless they can’t store or retrieve their data. It’s much more difficult to impose an overarching authority in this environment.
Another reason policy-based storage isn’t so popular today is that it hasn’t really been needed. Individual storage systems could scale large enough to support individual applications, resulting in storage silos that were highly functional, albeit inefficient.
Enter 2011, the era of convergence between growing data and shrinking budgets. The recession we experienced during the latter part of the 2000s still stings today. Budgets are tight and inefficiency has left the building. Meanwhile, data keeps growing like the pails of water that the Sorcerer’s Apprentice couldn’t control. The constant flow of data can’t be allowed to overflow; it must be stored somewhere.
Storage silos are yesterday’s news. Virtual storage pools with automated provisioning and protection policies are the future. Expect storage vendors to put more and more emphasis on the design of these features, and expect users suffering under the weight of their data to start asking for them.
Larry
