High-Tech Computer Experts
Your source for computing technology
                                  Storage


Introduction
Three technologies address the ever-growing requirements for storage by incorporating multiple units into a larger system. Optical jukeboxes manipulate many optical discs to provide hundreds of gigabytes of optical storage. Tape libraries, or autochangers, use multiple drives and robotic arms to manipulate as many as 18,000 tape cartridges [1]. RAID combines two or more magnetic hard disks into a system that provides data protection, as well as the potential for increased capacity and speed.

Databases have been created for each of these technology areas. Included in each database are the true Mass Storage systems--defined by the Goddard Conference as greater than 0.1 terabyte. The following sections will discuss each mass storage type more fully.
 

Optical Jukebox
An optical jukebox is a storage system that has one or more drives, slots to hold (store) the disks, and robotics that move the disks between slots and drives. Information is written to or read from marks on the surface of a disk, or platter, using a laser-based optical stylus. Each of the optical drive technologies discussed here--WORM (write-once-read-many), CD-ROM, rewritable, and multifunction--can be found incorporated into optical jukeboxes.
 

Write-Once-Read-Many
The original jukeboxes used WORM technology on 12- or 14-inch platters. Today's 12-inch disks can hold as much as 10.2 gigabytes on a single platter. One reference [2] lists a Filenet jukebox containing 340 platters with a capacity of over 2 terabytes, run on up to 6 drives. The same reference, however, notes that having too few drives results in a sluggish system, and it recommends a ratio of disks to drives of 50:1.

The Navy Research Lab's Ruth H. Hooker Research Library is utilizing SONY WDA-610 optical jukeboxes to convert research papers into stored digital images [3]. The collection of 600,000 titles, dating back to the 1940s, is expected to fit into 2 jukeboxes. Additional jukeboxes can be daisy-chained together to provide up to 1.3 terabytes of storage through one SCSI interface. After scanning and verification that a good image has been obtained, the digital image is stored to 12-inch WORM disks, and the paper documents and duplicates are disposed of. Each WORM disk can hold up to 130,000 pages (6.5 GB) of information.
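As a rough check on the figures above, the following Python sketch computes the average storage per scanned page implied by 130,000 pages on a 6.5 GB disk. It assumes decimal gigabytes (10^9 bytes), which the text does not specify, and the function name is purely illustrative.

```python
def bytes_per_page(disk_bytes: int, pages: int) -> int:
    """Average bytes available per page when `pages` pages fill the disk."""
    return disk_bytes // pages


# Assumption: 1 GB = 10**9 bytes (the text does not say which convention it uses).
WORM_DISK_BYTES = 6_500_000_000
PAGES_PER_DISK = 130_000

print(bytes_per_page(WORM_DISK_BYTES, PAGES_PER_DISK))  # 50000
```

Under that assumption, each page image averages about 50 KB.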
 

CD-ROM
CD-ROM, the first commercial optical-storage technology, is a data version of the audio compact disc. Roughly 650 MB of data can be stored on a disc. CD-ROM drives have slow average seek times of 300 to 500 msec, with transfer rates of 300 to 600 KB/sec. CD-ROM jukeboxes are generally small 6-disc minichangers with a capacity of less than 6 GB. Pioneer recently introduced an 18-disc jukebox with about 11 GB capacity. However, the single reader allows access to only one disc at a time and limits the utility of the jukebox to small networks or a single user.

The low cost (under $200) of CD-ROM drives, low media cost, media removability, and read-only format make CD-ROM a natural publishing and distribution medium. Some manufacturers are gearing up to participate in the mass storage market. Kubik Enterprizes Inc. sells a 240-disc CD-ROM jukebox containing up to four drives. JVC Information Products Co. of America has announced the DOS-compatible Pro-CD Library, which consists of a 100-disc jukebox, double-speed CD-ROM drive and storage-management software, at a price of $8995. In addition, Pioneer has just announced a 500-disc CD-ROM changer that should be available this fall, along with a version that can write to blank discs.
 

Rewritable
Rewritable disks employ one of two non-interchangeable technologies to store information [4]. The first is phase-change: data is stored to phase-change media by a high-power laser which changes the media's crystallinity. The read laser detects the difference in reflectivity between the crystalline spots and the amorphous surroundings. Data is erased by returning the crystalline spots to the amorphous phase.

The second type of rewritable technology is magneto-optical (M-O). The write laser heats spots on the disk to the temperature at which an electromagnet can change the magnetic polarity of the spot. Light from the read laser is rotated right or left depending upon the polarity of the spots. This rotation is detected by the M-O drive. Once an M-O disk has been recorded, two passes are required to rewrite the disk: one pass to realign all the domains north-pole-down and one pass to write the new data.
 

Multifunction
Jukeboxes that contain multifunction drives can handle both WORM and rewritable media. These systems provide flexibility for the user who needs some unalterable storage and also requires rewritability. Some multifunction drives combine WORM and phase-change technology while others combine WORM and M-O capability. Therefore, a user moving into a jukebox from a single drive should be careful to purchase a multifunction drive compatible with previously recorded media.
 

Tape Library
Tape libraries are automated tape handling systems with the potential to provide "near-line" access to many terabytes of storage. Autochangers are available in a variety of tape formats including: DAT, 8 mm, QIC, 3480/3490, DLT, VHS, and DD-2. A listing of tape library vendors and information about the products they offer appears at the end of this section in 9.4 Tape Library Vendors Database.

The nation's new weather surveillance radar (Nexrad) is utilizing 8 mm jukeboxes from Exabyte to store the data gathered [5]. The Doppler sensing system of Nexrad (Next Generation Weather Radar) can produce more than 4 GB of data per day at each radar installation. Nine sites are operational and up to 140 are planned by 1996.

Each remote site has a series 8200 or 8500 tape drive with a cartridge-handling jukebox to archive all the data generated from the radar on 8 mm tape cartridges. Originally, a 12-inch optical platter system was planned, but each optical disc would hold only 1 GB of data (or about 6 hours for a typical radar site). Each 8 mm cartridge holds 5 GB, and the jukebox will store about 300 hours of data. Approximately every two weeks, maintenance workers visit each site to replace the tapes with 10 blank cartridges. The filled tapes are delivered to the National Climatic Data Center in Asheville, N.C., for analysis and archiving.
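The capacity figures quoted for the Nexrad sites can be reproduced with a little arithmetic. The Python sketch below assumes a steady rate of exactly 4 GB per day (the text says "more than 4 GB") and a jukebox holding the 10 cartridges that are swapped at each visit. The constant and function names are illustrative only.

```python
GB_PER_DAY = 4      # data produced per radar site per day (from the text)
CARTRIDGE_GB = 5    # capacity of one 8 mm cartridge (from the text)
CARTRIDGES = 10     # cartridges replaced per maintenance visit (from the text)


def hours_of_data(capacity_gb: float, gb_per_day: float = GB_PER_DAY) -> float:
    """Hours of radar output that fit in `capacity_gb` at a steady data rate."""
    return capacity_gb / gb_per_day * 24


print(hours_of_data(1))                               # 6.0   (one 1 GB optical disc)
print(hours_of_data(CARTRIDGE_GB * CARTRIDGES))       # 300.0 (full jukebox)
print(hours_of_data(CARTRIDGE_GB * CARTRIDGES) / 24)  # 12.5  (days between visits)
```

The 12.5 days of capacity is consistent with a maintenance visit roughly every two weeks.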
 

RAID
RAID (Redundant Array of Inexpensive Disks) is a storage technology that groups multiple hard drives into what appears to be one logical volume. The term RAID was introduced in a late-1987 paper by Patterson, Gibson, and Katz of the University of California, Berkeley, entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)." The paper compared RAID to SLED (Single Large Expensive Disk) and described five disk array architectures, or levels. RAID technology is currently the hottest mass storage topic in the literature.

Disk arrays generally improve system performance by supporting multiple simultaneous read and/or write operations as well as by increasing capacity and providing fault tolerance. The use of multiple drives in an array actually increases the chances that a drive failure will occur. However, the data redundancy of RAID allows the array to tolerate a drive failure. A basic description of each of the levels of RAID follows.
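To see why adding drives raises the chance of a failure somewhere in the array, consider a simplified model in which each drive fails independently with the same probability over some period. Both the independence assumption and the 3% figure used below are illustrative and are not taken from the text.

```python
def p_any_failure(n_drives: int, p_single: float) -> float:
    """Probability that at least one of n independent drives fails."""
    return 1.0 - (1.0 - p_single) ** n_drives


# Hypothetical per-drive failure probability of 3% over the period of interest.
for n in (1, 2, 5, 8):
    print(n, round(p_any_failure(n, 0.03), 4))
```

The probability grows with every drive added, which is why redundancy matters more as arrays get larger.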
 

RAID 0

This form of RAID is not RAID as described in the Berkeley paper because there is no data redundancy. Most disk arrays use striping, or distribution of data across multiple drives. RAID 0 implements striping without redundancy and is, therefore, less reliable than a single drive. The only advantage is increased speed.
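Striping can be pictured as dealing consecutive logical blocks out to the drives in turn. The sketch below shows one simple round-robin mapping with a stripe depth of one block. Real controllers may use larger stripe units, and the function name is hypothetical.

```python
def locate_block(logical_block: int, n_drives: int) -> tuple[int, int]:
    """Map a logical block number to (drive index, block offset on that drive)."""
    return logical_block % n_drives, logical_block // n_drives


# Eight consecutive logical blocks on a hypothetical 4-drive array.
for block in range(8):
    drive, offset = locate_block(block, 4)
    print(f"logical {block} -> drive {drive}, offset {offset}")
```

Because neighboring blocks land on different drives, a large sequential transfer keeps all drives busy at once. Since there is no redundancy, the loss of any one drive loses part of every large file.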
 

RAID 1

RAID 1 implements "mirroring," or shadowing, of disks. Each drive in the system has a copy, or "mirror," of itself. If a drive fails, the duplicate drive keeps working with no lost data or downtime.
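A minimal in-memory sketch of mirroring follows. The two "drives" are just Python dictionaries, and the class and method names are invented for illustration. It models only the redundancy, not controller behavior or timing.

```python
class MirroredPair:
    """Two in-memory 'drives' (dicts of block number -> bytes) kept identical."""

    def __init__(self) -> None:
        self.drives = [{}, {}]
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to each working drive.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block] = data

    def read(self, block: int) -> bytes:
        # Any working drive can serve the read.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                return drive[block]
        raise IOError("both drives have failed")

    def fail(self, index: int) -> None:
        """Simulate the loss of one drive."""
        self.failed[index] = True


pair = MirroredPair()
pair.write(0, b"payload")
pair.fail(0)             # lose the first drive
print(pair.read(0))      # b'payload'
```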

Since there are two sources of data, the average access time for a read request will be faster than that for a single drive. For a write request, which is almost always preceded by a read, the decrease in read seek time of RAID 1 is offset by the increase in write seek time (since the data has to be written to two disks). A read and two writes take the same time as a read/write for a single drive. RAID 1, with an optimized controller, has slightly lower overall access times than a single drive.

The main advantage of RAID 1 over other RAID architectures is simplicity. It only requires a dual channel controller or a minimal device driver using one or two controllers to implement. No change to the operating system is needed. RAID 1 is relatively expensive to implement because only half the available disk space is used for data storage. In addition, the necessity of duplicate drives requires more power and more space for the same storage capacity.

RAID 0/1 (sometimes also called 10) is a hybrid of RAID 0 and RAID 1. The data is striped across the drives in the array as in RAID 0. In addition, each striped drive group is "mirrored" by a duplicate drive group attached to a second drive controller.
 

RAID 2

RAID 2 is an architecture that succeeds in reducing disk overhead (the cost of storage space lost to redundancy) by using Hamming codes to detect and correct errors. Check disks are required in addition to the data disks. The data is striped across the disks along with an interleaved Hamming code. Seek times are very slow compared to a single drive. For a read, all of the data disks must seek before the read starts. For a write, the data disks must seek and read data, and then all drives (including the check disks) must seek again before the data is written. However, once the seek is completed, data transfer rates are very high. For example, an array with 8 data drives transmits data from all 8 drives in parallel, so the transfer rate of the array is 8 times that of a single drive.
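The idea behind Hamming-coded check disks can be illustrated with the smallest common Hamming code, Hamming(7,4), which protects 4 data bits with 3 check bits. Treating each of the 7 bit positions as one disk is an assumption made here for illustration. The text does not say which code size a given RAID 2 product uses, and the function names are hypothetical.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword (positions 1..7, checks at 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # check bit for positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # check bit for positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # check bit for positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(code: list[int]) -> tuple[list[int], int]:
    """Return (corrected codeword, 1-based position of the flipped bit, or 0 if none)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # the syndrome spells out the bad position
    if position:
        c[position - 1] ^= 1
    return c, position


def hamming74_data(code: list[int]) -> list[int]:
    """Extract the 4 data bits from a 7-bit codeword."""
    return [code[2], code[4], code[5], code[6]]


word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1                       # flip the bit at position 5
fixed, where = hamming74_correct(damaged)
print(where, hamming74_data(fixed))   # 5 [1, 0, 1, 1]
```

The check bits both locate and repair a single bad bit, which is more than is needed when the drives themselves can report which one has failed.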

RAID 2 is best for reading and writing large data blocks at high data transfer rates. In the microcomputer environment, the drives' existing error detection/correction features make the error isolation data of RAID 2 redundant, so RAID 2 is impractical for microcomputers. By letting the drives manage error detection, it is possible to implement RAID requiring only one check disk for error correction.
 

RAID 3

By assuming that each disk drive in the array can detect and report errors, the RAID system only has to maintain redundancy in the data necessary to correct errors. RAID 3 employs a single check disk (parity disk) for each group of drives. Data is striped across each of the data disks. The check disk receives the XOR (exclusive OR) of all the data written to the data drives. Data for a failed drive can be reconstructed by computing the XOR of all the remaining drives. This approach reduces disk overhead relative to RAID 1 and RAID 2. For a five-disk array of 1 GB drives, four of the drives store data, providing 4 GB of data storage in a 5 GB array. RAID 3 also has the same high transfer rates as RAID 2. However, because every data drive is involved in every read or write, a penalty is paid.
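The XOR relationship described above can be demonstrated in a few lines. This sketch uses short byte strings as stand-ins for the contents of four data drives. The helper name and sample data are illustrative.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data drives plus one check disk, as in the five-disk example.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# Suppose drive 2 fails: XOR the surviving data drives with the check disk.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[2])  # True
```

The reconstruction works because XOR-ing a value with itself cancels it out, leaving only the missing drive's contents.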

RAID 3 can process only one I/O transaction at a time. In addition, the minimum amount of data that can be written to or read from a RAID 3 array is the number of data drives multiplied by the number of bytes per sector, referred to as a transfer unit. A typical five-drive array would have four data disks, one parity disk, and might have a 512-byte sector size on each disk. The transfer unit would be 2048 bytes (4 x 512). When a data read is smaller than the transfer unit, the entire unit is read anyway, increasing the length of a read operation. For a data write smaller than the transfer unit, although only a portion of a sector of each disk needs to be modified, the array must still deal with complete transfer units. A complete unit must be read from the array; the data must be rewritten where necessary; and the modified data must be written back to the data disks and the check disk updated. RAID 3 works well in applications that process large chunks of data.
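The cost of small requests follows directly from the transfer unit. The sketch below computes the unit for the five-drive example and the number of bytes the array must actually move for a request of a given size. It assumes the request starts on a unit boundary, and the function names are illustrative.

```python
def transfer_unit(data_drives: int, sector_bytes: int) -> int:
    """Smallest amount of data a RAID 3 array can read or write."""
    return data_drives * sector_bytes


def bytes_moved(request_bytes: int, unit: int) -> int:
    """Bytes actually transferred for a request, rounded up to whole transfer units."""
    whole_units = -(-request_bytes // unit)   # ceiling division
    return whole_units * unit


unit = transfer_unit(4, 512)
print(unit)                     # 2048
print(bytes_moved(100, unit))   # 2048
print(bytes_moved(5000, unit))  # 6144
```

A 100-byte read still moves a full 2048-byte unit, which is why RAID 3 favors large transfers.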
 

RAID 4

RAID 4 addresses the problems associated with bit-striping a transfer block of data across the array. As in RAID 3, one drive in the array is reserved as the check disk. This architecture, however, utilizes block or sector striping to the drives, resulting in read transactions involving only one drive and timing comparable to a single drive. In addition, multiple read requests can be handled at the same time. However, since every write accesses the parity disk, only one write at a time is possible. RAID 4 is most useful in an environment where the ratio of reads to writes is very high.
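The difference between reads and writes in RAID 4 comes down to which drives each operation touches. The following sketch assumes a simple round-robin placement of blocks over the data drives with the check disk as the last drive. That layout and the function names are assumptions for illustration.

```python
def raid4_read_drives(logical_block: int, data_drives: int) -> set[int]:
    """A block read touches only the one data drive that holds the block."""
    return {logical_block % data_drives}


def raid4_write_drives(logical_block: int, data_drives: int) -> set[int]:
    """A block write touches its data drive and the dedicated check disk."""
    check_disk = data_drives   # check disk is numbered after the data drives
    return {logical_block % data_drives, check_disk}


def can_run_in_parallel(a: set[int], b: set[int]) -> bool:
    """Two operations can overlap only if they share no drive."""
    return a.isdisjoint(b)


# Four data drives (0-3) and one check disk (4).
print(can_run_in_parallel(raid4_read_drives(0, 4), raid4_read_drives(1, 4)))    # True
print(can_run_in_parallel(raid4_write_drives(0, 4), raid4_write_drives(1, 4)))  # False
```

Every write needs drive 4, so writes queue up behind one another even when their data lives on different drives.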
 

RAID 5

Because RAID levels 2 through 4 each use a dedicated check disk, only one write transaction is possible at any time. RAID 5 overcomes the write bottleneck by distributing the error correcting codes (ECC) across each of the disks in the array. Therefore, each disk in the array contains both data and check-data.

Distributing check-data across the array allows reads and writes to be done in parallel. Data recovery and seek times are comparable to RAID 4.
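One way to picture distributed check-data is a layout in which the check block moves to a different drive on each successive stripe. The rotation used below is only one of several possible layouts and is not taken from the text. The function names are hypothetical.

```python
def raid5_check_drive(stripe: int, n_drives: int) -> int:
    """Drive holding the check-data for a stripe (one possible rotation)."""
    return (n_drives - 1 - stripe) % n_drives


def raid5_write_drives(stripe: int, data_index: int, n_drives: int) -> set[int]:
    """Drives touched by a one-block write: its data drive plus the stripe's check drive."""
    check = raid5_check_drive(stripe, n_drives)
    # Data blocks occupy the remaining drives in order, skipping the check drive.
    data_drives = [d for d in range(n_drives) if d != check]
    return {data_drives[data_index], check}


# Check-data location for the first five stripes of a 5-drive array.
print([raid5_check_drive(s, 5) for s in range(5)])   # [4, 3, 2, 1, 0]

first = raid5_write_drives(0, 0, 5)    # touches drives 0 and 4
second = raid5_write_drives(1, 1, 5)   # touches drives 1 and 3
print(first.isdisjoint(second))        # True
```

Because the two writes touch disjoint pairs of drives, they can proceed at the same time, which a single dedicated check disk would not allow.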
 

Disk Array Implementations
Actual disk array implementations are not always as simple or straightforward as described above. Some manufacturers combine features of different RAID levels to create a hybrid, as in RAID 0/1. RAID implementations that are extremely fault-tolerant provide redundancy beyond that of the drives. Additional redundancy is accomplished by providing a system with redundant drive controllers, redundant power supplies, redundant SCSI controllers, and so on. Some manufacturers offer "hot swappability," the ability to replace a failed drive (or other hardware units) without shutting the system down. Other manufacturers offer a spare drive that is automatically put into use rebuilding the failed drive as soon as the system senses a failure.

Still other RAID manufacturers offer only software-based RAID. The RAID architecture is contained in software that the customer implements with his own hardware. And finally, some RAID systems offer more than one level of RAID in the same package to handle mixed applications more efficiently.
 

Conclusions
The information contained in this section of the report provides an overview of the vendors of mass storage systems and their products. When determining which system is best suited to an application, the quantity and type of information to be stored and the frequency of access to it must be considered, in addition to the dollars available to do the job. The selection of a mass storage system depends upon the user's requirements, and no one system will satisfy every user's needs. As discussed previously, even within a media class, such as RAID or optical disc, the media choice (CD-ROM, WORM, rewritable, and so on) or storage protocol (RAID level) must be selected with factors such as file size and frequency of access in mind.

For applications with a mix of requirements, systems that offer hierarchical storage management (HSM) present a solution. HSM automatically migrates files from hard drive to optical disk or magnetic tape depending upon how recently the file was last accessed. A record of all migrated files is maintained and all information is available to the system, although not all information is "on-line."
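The migration policy of an HSM system can be sketched as a rule that assigns each file to a storage tier according to how long ago it was last accessed. The thresholds of 30 and 180 days, the record fields, and the function name below are hypothetical. Actual HSM products define their own policies.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str
    days_since_access: int
    tier: str = "hard disk"


def migrate(catalog: list[FileRecord],
            optical_after_days: int = 30,
            tape_after_days: int = 180) -> list[FileRecord]:
    """Assign each file to a tier by age; every file stays listed in the catalog."""
    for record in catalog:
        if record.days_since_access >= tape_after_days:
            record.tier = "magnetic tape"
        elif record.days_since_access >= optical_after_days:
            record.tier = "optical disk"
        else:
            record.tier = "hard disk"
    return catalog


catalog = [
    FileRecord("current_report.doc", 2),
    FileRecord("last_quarter.dat", 45),
    FileRecord("archive_1990.dat", 400),
]
for record in migrate(catalog):
    print(record.name, "->", record.tier)
```

The catalog keeps an entry for every file regardless of tier, mirroring the record of migrated files that lets the system locate information that is no longer on-line.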
