Cover V13, i11


Enter the Storage Administrator

Greg (shoe) Schuweiler

The tasks of a systems administrator's job are changing. In small and large environments, systems administrators used to spend their days (and nights) managing any number of servers with locally attached disks with an occasional timeout for a game of X-Pilot, Doom, etc. We tuned the servers, managed the disks and the data, and backed up the data to locally attached tape drives.

The failure of a disk usually meant an outage while the failed disk was replaced and formatted, file systems were built, and the data was restored from tape. The outage was a fact of life that both the users and administrators accepted. These were not always the good old days. I do not miss trying to justify to management a 540-MB hard drive that cost $3500. I do miss running scripts via cron that automatically sent an email stating the top five disk hogs to everyone with an account on the system and letting peer pressure do the rest. Alas, one must be a gentler systems administrator nowadays. Sigh.
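For readers who never ran such a job, here is a minimal sketch of the reporting half of a disk-hog script. It is an illustration, not the script I ran back then: the function names are made up for this example, usage is grouped by the owning UID of each regular file, and the mailing step is left out on purpose.

```python
import os
import stat
from collections import Counter


def usage_by_owner(root):
    """Sum the sizes of regular files under root, grouped by owner UID."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode):
                totals[info.st_uid] += info.st_size
    return totals


def top_hogs(totals, count=5):
    """Return the `count` largest (uid, bytes) pairs, biggest first."""
    return totals.most_common(count)


def format_report(hogs):
    """Render the list as plain text suitable for a mail body."""
    lines = ["Top disk consumers:"]
    for uid, size in hogs:
        lines.append(f"  uid {uid}: {size / 1_000_000:.1f} MB")
    return "\n".join(lines)
```

Calling `format_report(top_hogs(usage_by_owner("/home")))` from a nightly cron entry would produce the text; the `/home` path is only an example, and how the report reaches its audience is up to you and your sense of diplomacy.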

Now I have a filing cabinet drawer full of 4.5-GB, 9-GB, 18-GB, and some 36-GB SCSI drives that are perfectly serviceable but too small to be used in production. I just cannot bear to part with them. We now purchase wads of storage that are racked into datacenters in 0.5 and 1-TB clumps (a techie term) or disk arrays that are monolithic blobs (another techie term) that snap together like Lego building blocks and add storage by the tens of terabytes.

Our businesses' quest for acquiring data makes a salmon's run upstream to its spawning place look like a Sunday stroll around a pond. For reasons of regulatory compliance, business policy, or both, data is being stored longer and more copies are being kept. We are slicing, dicing, reassembling, and stuffing the results into databases of sizes unimaginable just 10 years ago. Queries and processing against this data take place 24x7, shrinking our backup windows from all night to nothing.

A good part of a sys admin's time is now spent, if not directly managing storage, answering requests for more space. As server hardware and operating systems improve in reliability, the management of these servers will soon be superseded by the management of storage and the data that is on that storage.

Systems administrators are trying to keep up with servers, networks, and attached storage equipment such as switches, GBICs, and arrays, along with operating system changes, firmware levels, boot PROMs, virtualization, consolidation, user demands, and scores of other things that are involved in the data acquisition, processing, and storage environment. While we are overwhelmed with requests from customers and managers, our management and executives are inundated with ads and salespeople and are making decisions that will ultimately affect us and the systems for which we're responsible.

The time has come for us to look seriously at a whole new career path that is opening up in the world of systems administration. This new path is the management of storage and the data contained on it. I propose we name this position right up front. Let's call it "System Storage Administrator" or maybe "System Storage Architect". Either will do, but I am betting that the latter will get you at least $6K more a year and possibly an office (it might be next to the furnace room but, hey, won't Mom be proud).

So, one of the first things on your agenda should be to convince management, all the way up to the CIO, that a dedicated storage expert or team is necessary to ensure business continuity. Depending on the size of your organization and the number of levels between you and the CIO, this idea may become someone else's along the way. Don't fret, because in this case the end justifies the means. It may be poor judgment on my part, but I am going to assume here that various levels of management have already heard from vendors or elsewhere of the concept of storage virtualization, of the savings from storage consolidation, and a host of reasons why they (the vendor) can build your business a Stepford Storage Area Network. If they haven't, then this will be another task on your list of things to do.

Most of the discussions you will have about forming a storage administration position or team with your management should be centered around two things:

1. Saving the business money in the long run

2. Providing a high level of data reliability (i.e., ensuring that the data will be there when it needs to be)

To do this, start by running a bit of history past them. For much of the history of computing, storage has been seen as an intrinsic part of a computer system. It was regarded as a "peripheral". In more recent years, it has become thought of as a storage subsystem but is still uniquely associated with a computer. Exceptions are the mainframe and some computer clusters where a modest number of cooperating computer systems share a common set of storage devices.

As companies have become more dependent upon computing, they have also become more dependent upon data, the life-blood of computing. While a failed processor can readily be replaced and operations continued almost immediately after the replacement, a failed storage resource requires replacement followed by typically time-consuming restoration of data. This restoration all too often involves some loss of recent changes to that data that requires recovery action before operations can continue. As a result, storage and the disciplines of caring for data and the storage on which it resides have grown in visibility and importance.

Additionally, the fraction of the purchase price of a computer system that is represented by the storage component has grown to the point that now the storage component of a computer system is often in the vicinity of half or more of the total price. Beyond the purchase price of storage, the total cost of owning storage has become a significant part of the cost of maintaining the computing environment. In other words, the acquisition cost is a small portion of the total cost of ownership of storage over the lifetime of the storage.

Computing environments necessarily have grown as we have become increasingly reliant on computing. Thus, the computer systems we manage have grown both in size and in number. Because the traditional computing model associates storage uniquely with a computer system, a computing environment with many computer systems has many storage and storage management environments to maintain and operate -- one per system.

Responding to these trends, the IT community has begun to view storage as a resource that should be purchased and managed independently from the computer systems that it serves. The IT community has also increasingly come to view storage as a resource that should be shared among computer systems. These changes allow more focused attention on storage, which leads to reduced costs, higher levels of service, and more flexibility through the sharing of the storage resource. This, in turn, allows a storage administrator or storage team to provide improved quality and response time as business needs change. As the storage system for a computing environment becomes a shared, independent resource, additional requirements emerge:

  • Reliability -- As required of any large, shared critical resource.
  • Scalability -- To match the size, performance, and physical and geographic placement of computing environments.
  • Manageability -- To provide high levels of service and achieve the expected reduction in operational expenses.
  • Standards-based interoperation -- To avoid excessive vendor dependence in a large, critical component of data centers.

When all these aspects are considered, a structure emerges that achieves the goals of reliability, scalability, and manageability. That structure is a storage system comprising many computer systems that are the consumers of the storage system, many storage devices, and extensive management capabilities. These systems are richly interconnected and demand high performance. These are the characteristics of a shared storage environment, and these are the benefits that a storage administrator or team can bring to the business.

If I have convinced you, and you in turn have convinced management that a storage administrator or team is needed, the next step is to determine that storage administrator's responsibilities. This administrator should have three primary goals that are intertwined with each other:

  • Ensuring data protection (including security)
  • Reducing total cost of ownership (TCO) through optimization
  • Ensuring data availability

Loss of data in any business should be considered intolerable. Depending on the importance and the content of the data, the storage team will need to work with the application owners, legal departments, and even, dare I say it, marketing to determine the level of protection that different types of data will require. Data critical to the operation of the business may sit on RAID5 with replication to a remote datacenter, whereas views from databases that can be rebuilt may sit on RAID0 for quick access. Additionally, the data must be protected from accidental or malicious access by host systems or individuals not permitted to see it.
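The trade-off between those two RAID levels can be put in a few lines. The sketch below is illustrative only: the class, the tier names, and the policy table are hypothetical, and it covers just the two levels mentioned above. It relies on the standard properties that RAID0 stripes with no redundancy while RAID5 gives up one disk's worth of capacity to parity and survives a single disk failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidLayout:
    level: str      # "RAID0" or "RAID5" (the only levels sketched here)
    disks: int      # number of member disks
    disk_gb: float  # capacity of each member disk in GB

    def usable_gb(self):
        """Capacity left for data after redundancy overhead."""
        if self.level == "RAID0":
            return self.disks * self.disk_gb
        if self.level == "RAID5":
            if self.disks < 3:
                raise ValueError("RAID5 needs at least three disks")
            return (self.disks - 1) * self.disk_gb
        raise ValueError(f"unsupported level: {self.level}")

    def disk_failures_tolerated(self):
        """How many member disks can fail without losing data."""
        return {"RAID0": 0, "RAID5": 1}[self.level]


# Hypothetical policy table: data class -> (RAID level, replicate off site?)
PROTECTION_POLICY = {
    "business-critical": ("RAID5", True),
    "rebuildable-view": ("RAID0", False),
}
```

With eight 36-GB drives, `RaidLayout("RAID5", 8, 36).usable_gb()` gives 252 GB and the same drives as RAID0 give 288 GB, so the protected tier costs one drive of capacity. A real policy would come out of the conversations with application owners and the legal department, not out of a dictionary.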

The storage team will reduce the total cost of ownership by optimizing hardware utilization, backup software, and the management and management software of the storage network components. Because the storage team's concern is storage only, they will be able to be computer system vendor neutral. This will allow the OS and storage administrators to concentrate on their core areas of responsibility.

The availability of the data must meet the needs of the business. Storage administrators will need to work with systems administrators to provide the required data availability for the applications and departments as needed to provide business continuity.
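When you sit down with the application owners, it helps to translate an availability target into hours of permitted outage. The following is a small sketch of that arithmetic; the function name is made up for illustration, and it assumes a 365-day year with no separate allowance for planned maintenance.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent):
    """Hours per year a service may be down and still meet the target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)
```

A 99 percent target leaves about 87.6 hours a year, while 99.9 percent leaves about 8.76 hours, which is less than a single restore from tape used to take.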

So, now that we have a new title, what exactly should be the job description? Where should the duties between systems administrator and storage architect be divided? What should the storage administrator be in charge of? Depending on the makeup of your organization, there will be some grey areas of responsibility among the systems, storage, and network administrator teams or individuals. Here is a suggested checklist for the storage administrators:

  • Backup, recovery, and disaster recovery (DR) schemes, schedules, and test plans
  • Backup servers, backup managers, APIs resident on slave servers
  • Enterprise storage arrays, Gigabit Interface Converters (GBICs)
  • Enterprise tape drives, libraries, Automated Tape Library (ATL), Vtape, optical libraries
  • External SAN management software (Logical Unit Number (LUN) mapping, management) for both in-band and out-of-band implementations
  • Fibre Channel (FC) extenders, dense wavelength division multiplexing (DWDM), and other native FC multiplexing or amplification devices used for long-distance extension
  • Fibre Channel hubs, FC switches, FC directors
  • Fibre Channel-asynchronous transfer mode (FC-ATM) or other gateway devices that use the WAN to tunnel FC or otherwise extend FC through WAN access
  • Fibre Channel cables, plant management, patch panels
  • Fibre Channel/Internet Protocol (FC/IP) or Internet Small Computer Systems Interface (iSCSI) gateway device
  • Fibre Channel-Small Computer Systems Interface (FC-SCSI) bridges
  • HBA utilities and software
  • Host or storage data replication, snapshot, redundant arrays of independent disks (RAID), or data mover software
  • Host volume management software
  • Just A Bunch Of Disks (JBOD) connected to pooled fabric architecture
  • NAS appliance, NAS filer, or other general-purpose server used for Common Internet File System (CIFS) or Network File System (NFS) file serving
  • SAN management software embedded within switches or directors
  • Server Host Bus Adapter (HBA) selection
  • Storage virtualization software
  • Switch and director OS software

Additionally, storage administrators would be responsible for selecting product standards, defining standard configurations, collaborating on component architecture design principles with other teams as they architect storage systems for new requirements, and planning and executing projects for data storage.

After reading the above list, I can picture some administrators climbing over their piles of antiquated hardware and manuals for non-existent software to get out the door and start shouting that I am a heretic. As I said previously, there will definitely be some grey areas where administrators from different areas will need to work together. Additionally, backups may not be part of storage administration in your company. Some companies already have people dedicated to backups and restores.

Where do you go from here? You might look at the Association of Storage Network Professionals (ASNP) at http://www.asnp.org. It's a user organization put together by users; membership is free, and there are some great discussion forums on the Web site. Be wary of vendors who believe that they can provide you with a Stepford Storage Area Network. Nothing is perfect. I recommend reading Designing Storage Area Networks by Tom Clark (Addison-Wesley). It provides a very good introduction to this subject. White papers from vendors are very helpful, but remember that they are almost always vendor-specific.

Greg (shoe) Schuweiler has worked in the friendly Midwest for the past 20 years as a consultant, an embedded software designer, Oracle DBA, and a host of other strange titles. He has been in the Unix systems administration area for the past 8 years. He is one of the early joiners of the Association of Storage Network Professionals and can be reached at: gregs@asnp.org.