Spend a little time using personal computers, and you will soon find that faster processing and more memory are the name of the game. "Bigger and faster" are bywords in the supercomputing world as well.
As computers have become more powerful and able to perform ever more complex tasks, data files have rapidly expanded. Unfortunately, data transfer rates and storage system capacities have not kept pace with the expansion of processing and memory capabilities. When preliminary work began at several Department of Energy national laboratories on the Accelerated Strategic Computing Initiative (ASCI), improving high-speed storage became a high priority. ASCI, a major component of DOE's science-based Stockpile Stewardship and Management Program and one of the largest supercomputing projects of all time, will be used to assess the safety, security, and reliability of our nuclear stockpile.
For work in creating an entirely new approach to data storage management and transfer, Lawrence Livermore, Los Alamos, Sandia, and Oak Ridge national laboratories, together with IBM Global Government Industry, recently won an R&D 100 Award. They designed and implemented the High-Performance Storage System (HPSS), which is capable of managing the storage and data transfer needs of even the most demanding supercomputers and high-speed networks.
The collaboration is based on the premise that no single organization has the ability to confront all of the issues that must be resolved for significant advances in high-performance storage system technology. Lawrence Livermore and its sister DOE laboratories have played leadership roles in this work because of their long history of development and innovation in high-end computing in order to accomplish their national defense and scientific missions.

High-Performance Storage System
The HPSS software is designed to manage data storage capacity into the petabyte range (a quadrillion, or 10^15, bytes) and data transfer rates in the gigabyte-per-second range (a billion, or 10^9, bytes per second). It can move very large data files among high-performance computers, workstation clusters, and storage libraries at speeds 100 to 1,000 times faster than are possible with conventional storage software systems. These speeds support new large-scale applications in high-performance computing, data collection and analysis, and imaging. For example, high-definition digitized video in real time is now a possibility. The key breakthroughs are a network-centered design, parallel data input/output, and servers that can be distributed and replicated.
All competing products were designed at a time when terabyte (a trillion, or 10^12, bytes) storage capacities and megabyte- (million-) per-second transfer rates were the target. Unlike most large-scale data storage systems preceding it, HPSS has a network-centered design. In conventional systems, general-purpose computers act as storage servers that connect to storage units such as disks and tapes. The servers act as intermediaries in passing data to client systems. As data rates increase for storage devices and communications links, the size of the server must also increase to provide the required capacity and total data throughput bandwidth. These high data rates and capacity demands tend to drive the storage server into the mainframe class, which can be expensive to purchase and maintain.
If the storage software system and storage devices are instead distributed over a network, control of the storage system can be separated from the flow of data. The bottleneck is removed, allowing more rapid data transmission and expanded performance and capacity. Workstation-class systems used as storage servers provide the high performance required and reduce the cost for storage-server hardware in the bargain.

Focus on the Network
Operating on a high-performance network, the HPSS is designed to allow data to be transferred directly from one or more disk or tape controllers to a client. The HPSS accommodates the simultaneous transfer of parallel data streams from multiple storage devices to computers running parallel applications. As the speeds and capacities of storage media and devices increase, this software will easily assimilate them to produce transfer rates in the range of multiple gigabytes per second. For example, if a system has a storage device that can deliver 100 megabytes per second but a gigabyte per second is needed, then 10 devices in parallel, controlled by HPSS software, can be used to expand, or "scale up," to the new requirement. With this design, the HPSS will be able to handle almost unlimited storage capacity, data transfer rates of billions of bytes per second and beyond, virtually unlimited file sizes, millions of naming directories, and hundreds to thousands of simultaneous clients.
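The scale-up arithmetic in the example above, and the idea of splitting one file into parallel streams, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names (devices_needed, stripe, reassemble) and the round-robin block layout are hypothetical and are not taken from the HPSS software, whose actual striping scheme is not described here.

```python
import math


def devices_needed(target_mb_per_s: float, device_mb_per_s: float) -> int:
    """Smallest number of equal-rate devices whose combined rate meets the target.

    Assumes ideal scaling: n devices in parallel deliver n times one device's rate.
    """
    if target_mb_per_s <= 0 or device_mb_per_s <= 0:
        raise ValueError("rates must be positive")
    return math.ceil(target_mb_per_s / device_mb_per_s)


def stripe(data: bytes, n_devices: int, block_size: int) -> list:
    """Deal fixed-size blocks round-robin across n_devices (hypothetical layout)."""
    stripes = [[] for _ in range(n_devices)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        stripes[block_index % n_devices].append(data[offset:offset + block_size])
    return stripes


def reassemble(stripes: list) -> bytes:
    """Interleave the per-device blocks back into their original order."""
    blocks = []
    longest = max((len(s) for s in stripes), default=0)
    for row in range(longest):
        for device_blocks in stripes:
            if row < len(device_blocks):
                blocks.append(device_blocks[row])
    return b"".join(blocks)


# The article's example: 100-megabyte-per-second devices, 1 gigabyte per second needed.
n = devices_needed(1000, 100)
payload = bytes(range(256)) * 10
parts = stripe(payload, n, block_size=64)
restored = reassemble(parts)
```

With a gigabyte taken as 1,000 megabytes, devices_needed(1000, 100) gives the 10 devices cited in the text. Real devices rarely scale perfectly, so the count is a lower bound rather than a sizing rule.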
HPSS uses several mechanisms, including "transactions," to ensure data reliability and integrity. Transactions are groups of operations that either take place together or not at all. The problem with distributed servers working together on a common job is that one server may fail or not be able to do its part. Transactions ensure that either all servers successfully complete their jobs or the function is aborted. Transactional integrity is common in relational data management systems, but it is new in storage systems.
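The all-or-nothing behavior of a transaction can be sketched in a few lines of Python. This is a simplified two-phase (prepare, then commit) pattern chosen for illustration; the article does not say which protocol HPSS uses, and the Server class and run_transaction function are hypothetical names, not part of HPSS.

```python
class Server:
    """A toy storage server; set fail=True to simulate one that cannot do its part."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.committed = []   # operations that have taken effect
        self.pending = None   # operation staged but not yet applied

    def prepare(self, op):
        """Stage the operation; return False if this server cannot take part."""
        if self.fail:
            return False
        self.pending = op
        return True

    def commit(self):
        """Make the staged operation take effect."""
        self.committed.append(self.pending)
        self.pending = None

    def abort(self):
        """Discard the staged operation, leaving committed state untouched."""
        self.pending = None


def run_transaction(servers, op):
    """Apply op on every server, or on none of them."""
    prepared = []
    for server in servers:
        if server.prepare(op):
            prepared.append(server)
        else:
            # One server could not do its part: undo the staging everywhere.
            for p in prepared:
                p.abort()
            return False
    # Every server is ready, so the operation takes effect on all of them.
    for server in servers:
        server.commit()
    return True


healthy = [Server("a"), Server("b"), Server("c")]
ok = run_transaction(healthy, "write file-1")

mixed = [Server("a"), Server("b", fail=True), Server("c")]
aborted = run_transaction(mixed, "write file-2")
```

In the first call every server records the write; in the second, the failure of one server leaves all three unchanged. A production system would also have to handle a crash between the prepare and commit steps, which this sketch leaves out.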
HPSS is designed to support a range of supercomputing and multiprocessor client platforms, operate on many vendors' platforms, and use industry-standard storage hardware. HPSS is built on the Open Software Foundation's Distributed Computing Environment, which was chosen because of its wide adoption among vendors and its almost universal acceptance by the computer industry.
IBM began marketing the system commercially in the fall of 1996. The HPSS has already been adopted by the California Institute of Technology/Jet Propulsion Laboratory, Cornell Theory Center, Fermi National Accelerator Laboratory, Maui High-Performance Computer Center, NASA Langley Research Center, San Diego Supercomputer Center, and the University of Washington, as well as by the participating Department of Energy laboratories. Lawrence Berkeley National Laboratory has also adopted the system through its National Energy Research Supercomputer Center, which was an original participant in the HPSS work.
There are a number of other prospective users of the HPSS. A possible customer is considering digitizing its entire film archive to produce several petabytes of data. Other applications might include oil company databases, the finance and insurance sector, the Human Genome Project, medical imaging and records, and high-energy physics. In combination with computers that can produce and manipulate huge amounts of data at ever-increasing rates, the expandable, parallel, network-based design of the HPSS gives users the capability to solve problems and manage information more easily than ever.

--Katie Walter

Key Words: Accelerated Strategic Computing Initiative (ASCI), computer network, hierarchical storage management, High-Performance Storage System (HPSS), large-scale computer storage, parallel computing, R&D 100 Award, supercomputing.

For further information, contact Dick Watson (510) 422-9216 (dwatson@llnl.gov), or visit the HPSS Internet home page at http://www.sdsc.edu/hpss/.

Back to October 1997