Databases have been created for each of these technology areas. Included
in each database are the true mass storage systems, defined by the Goddard
Conference as systems with a capacity greater than 0.1 terabyte. The following
sections discuss each mass storage type more fully.
Optical Jukebox
An optical jukebox is a storage system that has one or more drives,
slots to hold (store) the disks, and robotics that move the disks between
the slots and the drives. Information is written to or read from marks on the
surface of a disk, or platter, using a laser-based optical stylus. Each
of the optical drive technologies discussed here (WORM, or write-once-read-many;
CD-ROM; rewritable; and multifunction) can be found incorporated into optical
jukeboxes.
Write-Once-Read-Many
The original jukeboxes used WORM technology on 12- or 14-inch platters.
Today's 12-inch disks can hold as much as 10.2 gigabytes on a single platter.
One reference [2] lists a FileNet jukebox containing 340 platters with a capacity
of over 2 terabytes, served by up to 6 drives. The same reference, however,
notes that having too few drives results in a sluggish system, and it
recommends a ratio of disks to drives of 50:1.
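The 50:1 guideline translates directly into a drive count. The following is a minimal Python illustration, assuming the ratio is treated as an upper limit on platters per drive; the function names are hypothetical.

```python
import math


def drives_needed(platters: int, platters_per_drive: int = 50) -> int:
    """Smallest number of drives that keeps platters per drive at or below the target."""
    if platters_per_drive <= 0:
        raise ValueError("platters_per_drive must be positive")
    return math.ceil(platters / platters_per_drive)


def disk_to_drive_ratio(platters: int, drives: int) -> float:
    """Actual platter-to-drive ratio of a given configuration."""
    return platters / drives


# FileNet configuration cited above: 340 platters, up to 6 drives.
print(round(disk_to_drive_ratio(340, 6), 1))  # 56.7
print(drives_needed(340))                     # 7
```

Under this guideline the 340-platter configuration would call for 7 drives, slightly more than the 6 it supports.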
The Naval Research Laboratory's Ruth H. Hooker Research Library is using
SONY WDA-610 optical jukeboxes to convert research papers into stored digital
images [3]. The collection of 600,000 titles, dating back to the 1940s, is
expected to fit into 2 jukeboxes. Additional jukeboxes can be daisy-chained
to provide up to 1.3 terabytes of storage through one SCSI interface.
After each document is scanned and the image is verified as good, the
digital image is stored on 12-inch WORM disks, and the paper documents
and duplicates are discarded. Each WORM disk can hold up to 130,000 pages
(6.5 GB) of information.
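The NRL figures also give the average space occupied by one scanned page. A small Python calculation, assuming decimal gigabytes (1 GB = 10^9 bytes) and using a hypothetical 1,000,000-page collection to show disk counts:

```python
import math

DISK_BYTES = 6.5e9        # 6.5 GB per 12-inch WORM disk (decimal GB assumed)
PAGES_PER_DISK = 130_000  # pages per disk quoted for the NRL system


def bytes_per_page(disk_bytes: float = DISK_BYTES, pages: int = PAGES_PER_DISK) -> float:
    """Average space one scanned page image occupies."""
    return disk_bytes / pages


def disks_for_pages(total_pages: int, pages_per_disk: int = PAGES_PER_DISK) -> int:
    """Number of WORM disks needed for a collection, rounding up."""
    return math.ceil(total_pages / pages_per_disk)


print(bytes_per_page())            # 50000.0  (about 50 KB per page)
print(disks_for_pages(1_000_000))  # 8  (hypothetical 1,000,000-page collection)
```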
CD-ROM
CD-ROM, the first commercial optical-storage technology, is a data
version of the audio compact disc. About 650 MB of data can be stored
on one disc. CD-ROM drives have slow average seek times of 300 to 500 msec,
with transfer rates of 300 to 600 KB/sec. CD-ROM jukeboxes are generally
small 6-disc minichangers with a capacity of less than 6 GB. Pioneer recently
introduced an 18-disc jukebox with about 11 GB capacity. However, the single
reader allows access to only one disc at a time, which limits the utility
of the jukebox to small networks or a single user.
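The capacity and speed figures above can be combined to estimate what a single-reader jukebox delivers. A rough Python sketch, assuming about 650 MB per disc, decimal units (1 MB = 1000 KB), and purely sequential reads with seek time ignored:

```python
DISC_MB = 650  # assumed data capacity of one CD-ROM


def jukebox_capacity_gb(discs: int, disc_mb: float = DISC_MB) -> float:
    """Total jukebox capacity in (decimal) gigabytes."""
    return discs * disc_mb / 1000


def full_disc_read_minutes(rate_kb_per_s: float, disc_mb: float = DISC_MB) -> float:
    """Time to read one entire disc sequentially at a given transfer rate."""
    return disc_mb * 1000 / rate_kb_per_s / 60


print(jukebox_capacity_gb(6))                 # 3.9
print(jukebox_capacity_gb(18))                # 11.7
print(round(full_disc_read_minutes(300), 1))  # 36.1
print(round(full_disc_read_minutes(600), 1))  # 18.1
```

Because the single reader is tied up for the whole of each request, transfers of this length help explain why such jukeboxes suit a single user or a small network.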
The low cost (under $200) of CD-ROM drives, low media cost, media removability,
and read-only format make CD-ROM well suited as a publishing and distribution medium.
Some manufacturers are gearing up to participate in the mass storage market.
Kubik Enterprises Inc. sells a 240-disc CD-ROM jukebox containing up to
four drives. JVC Information Products Co. of America has announced the
DOS-compatible Pro-CD Library, which consists of a 100-disc jukebox, a double-speed
CD-ROM drive, and storage-management software, at a price of $8995. In addition,
Pioneer has just announced a 500-disc CD-ROM changer that should be
available this fall, along with a version that can write
to blank discs.
Rewritable
Rewritable disks employ one of two non-interchangeable technologies
to store information [4]. In phase-change media, a high-power laser
switches small spots between the crystalline and amorphous phases. The read
laser detects the difference in reflectivity between the two phases. Data is
erased by returning the spots to their original phase.
The second type of rewritable technology is magneto-optical (M-O). The
write laser heats spots on the disk to the temperature at which an electromagnet
can change the magnetic polarity of the spot. The polarization of light from
the read laser is rotated right or left depending upon the polarity of the spot,
and this rotation is detected by the M-O drive. Once an M-O disk has been recorded,
two passes are required to rewrite data: one pass to realign all the
domains in the area north-pole-down (erasing it) and one pass to write the new data.
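A toy model shows why a single pass cannot overwrite old data. The Python sketch below assumes that north-pole-down domains read as 0 and north-pole-up domains as 1, and that the electromagnet's field is held in one direction for an entire pass, so heated spots can only be flipped toward that direction; the function names are hypothetical.

```python
from typing import List


def erase_pass(sector: List[int]) -> None:
    """Pass 1: field points down; every spot is heated, so every domain becomes 0."""
    for i in range(len(sector)):
        sector[i] = 0


def write_pass(sector: List[int], data: List[int]) -> None:
    """Pass 2: field points up; only spots that should hold a 1 are heated.
    Unheated spots keep whatever polarity they already had."""
    for i, bit in enumerate(data):
        if bit:
            sector[i] = 1


def rewrite(sector: List[int], data: List[int]) -> None:
    """Two-pass M-O rewrite: erase first, then write."""
    erase_pass(sector)
    write_pass(sector, data)


old, new = [1, 0, 1, 1], [0, 1, 1, 0]

one_pass = old.copy()
write_pass(one_pass, new)
print(one_pass)  # [1, 1, 1, 1]  (stale 1s survive)

two_pass = old.copy()
rewrite(two_pass, new)
print(two_pass)  # [0, 1, 1, 0]
```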
Multifunction
Jukeboxes that contain multifunction drives can handle both WORM and
rewritable media. These systems provide flexibility for the user who needs
some unalterable storage as well as some rewritable storage. Some multifunction
drives combine WORM and phase-change technology while others combine WORM
and M-O capability. Therefore, a user moving from a single drive to a jukebox
should be careful to purchase a multifunction drive compatible with
previously recorded media.
Tape Library
Tape libraries are automated tape handling systems with the potential
to provide "near-line" access to many terabytes of storage. Autochangers
are available in a variety of tape formats, including DAT, 8 mm, QIC, 3480/3490,
DLT, VHS, and DD-2. A listing of tape library vendors and information about
the products they offer appears at the end of this section in 9.4 Tape
Library Vendors Database.
The nation's new weather surveillance radar, Nexrad (Next Generation Weather Radar), is using 8 mm jukeboxes from Exabyte to store the data it gathers [5]. Nexrad's Doppler sensing system can produce more than 4 GB of data per day at each radar installation. Nine sites are operational, and up to 140 are planned by 1996.
Each remote site has an Exabyte 8200- or 8500-series tape drive with a
cartridge-handling jukebox to archive all the data generated by the radar on
8 mm tape cartridges. A 12-inch optical platter system was originally planned,
but each optical disc would hold only 1 GB of data (about 6 hours for a typical
radar site). Each 8 mm cartridge holds 5 GB, and the jukebox will store about
300 hours of data. Approximately every two weeks, maintenance workers visit
each site to replace the tapes with 10 blank cartridges. The filled tapes are
delivered to the National Climatic Data Center in Asheville, N.C., for analysis
and archiving.
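The figures in this example are internally consistent, as a short calculation shows. The Python sketch assumes a constant output of 4 GB per day (the text says "more than 4 GB," so actual hours per cartridge would be somewhat lower):

```python
GB_PER_DAY = 4.0  # assumed constant radar output per site


def hours_of_data(capacity_gb: float, gb_per_day: float = GB_PER_DAY) -> float:
    """Hours of radar output that fit in a given capacity."""
    return capacity_gb / gb_per_day * 24


optical_disc_hours = hours_of_data(1)  # one 1 GB optical disc
jukebox_hours = hours_of_data(10 * 5)  # ten 5 GB cartridges
print(optical_disc_hours)              # 6.0
print(jukebox_hours)                   # 300.0
print(jukebox_hours / 24)              # 12.5 (days, roughly the two-week service interval)
```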
RAID
RAID (Redundant Array of Inexpensive Disks) is a storage technology
that groups multiple hard drives into what appears to be one logical volume.
The term RAID was introduced in a late-1987 paper by Patterson, Gibson,
and Katz of the University of California, Berkeley, entitled "A Case for Redundant
Arrays of Inexpensive Disks (RAID)." The paper compared RAID to SLED (Single
Large Expensive Disk) and described five disk-array architectures, or levels.
RAID technology is currently the hottest mass storage topic in the literature.
Disk arrays generally improve system performance by supporting multiple
simultaneous read and/or write operations; they can also increase capacity
and provide fault tolerance. The use of multiple drives in an array actually
increases the chance that some drive in the array will fail. However, the data
redundancy of RAID allows the array to tolerate a drive failure. A basic
description of each of the levels of RAID follows.
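The claim that more drives mean more failures can be quantified. A minimal Python sketch, assuming drive failures are independent and occur at a constant rate (exponentially distributed lifetimes); the 30,000-hour drive MTTF (mean time to failure) is a hypothetical figure chosen for illustration:

```python
import math


def array_mttf_hours(disk_mttf_hours: float, n_disks: int) -> float:
    """Mean time until the first drive failure anywhere in the array."""
    return disk_mttf_hours / n_disks


def prob_any_failure(disk_mttf_hours: float, n_disks: int, hours: float) -> float:
    """Probability that at least one of n drives fails within the given period."""
    return 1 - math.exp(-n_disks * hours / disk_mttf_hours)


MTTF = 30_000     # hypothetical single-drive MTTF, in hours
YEAR = 24 * 365   # hours in a year

print(array_mttf_hours(MTTF, 10))                  # 3000.0
print(round(prob_any_failure(MTTF, 1, YEAR), 2))   # 0.25 (single drive)
print(round(prob_any_failure(MTTF, 10, YEAR), 2))  # 0.95 (ten-drive array)
```

Redundancy is what makes this higher failure rate acceptable: a redundant array loses data only if a second drive fails before the first has been replaced and rebuilt.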
RAID 0
This form of RAID is not RAID as described in the Berkeley paper because
there is no data redundancy. Most disk arrays use striping, or distribution
of data across multiple drives. RAID 0 implements striping without redundancy
and is, therefore, less reliable than a single drive: the failure of any one
drive loses data from the whole array. Its only advantage is increased speed.
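Striping amounts to a simple address calculation. The Python sketch below maps a logical block number to a drive and an offset on that drive; the stripe-unit size (in blocks) and the function name are assumptions for illustration, not a particular vendor's layout.

```python
from typing import Tuple


def locate(block: int, n_disks: int, stripe_unit: int = 1) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk) for RAID 0."""
    unit = block // stripe_unit      # which stripe unit the block falls in
    disk = unit % n_disks            # stripe units are dealt round-robin to the disks
    row = unit // n_disks            # number of full stripes before this unit
    offset = row * stripe_unit + block % stripe_unit
    return disk, offset


print([locate(b, 4) for b in range(8)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```

Consecutive blocks land on different drives, so a large transfer keeps all four drives busy at once. The same layout shows the reliability cost: if drive 2 fails, every fourth block of the volume is lost.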
RAID 1
RAID 1 implements "mirroring," or shadowing, of disks. Each drive in the system has a copy, or "mirror," of itself. If a drive fails, the duplicate drive keeps working with no lost data or downtime.
Since there are two sources of data, the average access time for a read request will be faster than that for a single drive. For a write request, which is almost always preceded by a read, the shorter read seek time of RAID 1 is offset by the longer write seek time (since the data has to be written to two disks), so a read and two writes take about the same time as a read and a write on a single drive. With an optimized controller, RAID 1 has slightly lower overall access times than a single drive.
The main advantage of RAID 1 over other RAID architectures is simplicity. It requires only a dual-channel controller, or a minimal device driver using one or two controllers, and no change to the operating system. RAID 1 is relatively expensive to implement, however, because only half the available disk space is used for data storage. In addition, the duplicate drives require more power and more space for the same storage capacity.
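The mirroring behavior described above can be modeled in a few lines. This Python sketch is a simplified, hypothetical model: every write goes to both drives, reads alternate between healthy copies (a real controller might instead pick the drive whose head is nearer the data), and a single drive failure loses nothing.

```python
from typing import List, Optional


class Mirror:
    """Toy RAID 1 pair: two drives holding identical copies of every block."""

    def __init__(self, n_blocks: int) -> None:
        self.drives: List[List[Optional[bytes]]] = [[None] * n_blocks, [None] * n_blocks]
        self.failed = [False, False]
        self._turn = 0  # round-robin read pointer

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to each working drive.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block] = data

    def read(self, block: int) -> Optional[bytes]:
        # Alternate reads between whichever drives are still healthy.
        healthy = [i for i in range(2) if not self.failed[i]]
        if not healthy:
            raise OSError("both drives have failed")
        i = healthy[self._turn % len(healthy)]
        self._turn += 1
        return self.drives[i][block]


m = Mirror(n_blocks=4)
m.write(0, b"payroll")
m.failed[0] = True  # simulate losing one drive
print(m.read(0))    # b'payroll'
```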
RAID 0/1 (sometimes also called RAID 10) is a hybrid of RAID 0 and RAID 1.
The data is striped across the drives in the array as in RAID 0. In addition,
each striped drive group is "mirrored" by a duplicate drive group attached
to a second drive controller.
RAID 2
RAID 2 reduces disk overhead (the cost of storage space lost to redundancy) compared with mirroring by using Hamming codes to detect and correct errors. Check disks are required in addition to the data disks, and the data is striped across the disks along with an interleaved Hamming code. Seek times are very slow compared with a single drive: for a read, all of the data disks must seek before the read starts; for a write, the data disks must seek and read the data, and then all drives (including the check disks) must seek again before the data is written. Once the seek is completed, however, data transfer rates are very high. In an array with 8 data drives, the drives transmit data in parallel, so the transfer rate of the array is 8 times that of a single drive.
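The Hamming-code idea can be illustrated with the classic Hamming(7,4) code, in which three check bits protect four data bits and the pattern of failed checks (the syndrome) gives the position of the bad bit. In a RAID 2 arrangement each bit position would live on a separate drive, so this Python sketch corresponds to four data disks and three check disks; that configuration is chosen only for illustration.

```python
from typing import List, Tuple


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits as a 7-bit code word (positions 1..7, check bits at 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: List[int]) -> Tuple[List[int], int]:
    """Correct up to one flipped bit; return (data bits, error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                   # bad bit from the drive holding position 5
print(hamming74_decode(word))  # ([1, 0, 1, 1], 5)
```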
RAID 2 is best for reading and writing large data blocks at high data
transfer rates. In the microcomputer environment, however, disk drives already
provide their own error detection and correction, which makes the error isolation
data of RAID 2 redundant and RAID 2 impractical for microcomputers. By letting
the drives manage error detection, it is possible to implement RAID with only
one check disk for error correction.
RAID 3
By assuming that each disk drive in the array can detect and report its own errors, the RAID system only has to maintain the redundancy needed to correct errors. RAID 3 employs a single check disk (parity disk) for each group of drives. Data is striped across the data disks, and the check disk receives the XOR (exclusive OR) of all the data written to the data drives. Data for a failed drive can be reconstructed by computing the XOR of all the remaining drives. This approach reduces disk overhead compared with RAID 1 and RAID 2. In a five-disk array of 1 GB drives, four of the drives store data, providing 4 GB of data storage in a 5 GB array. RAID 3 also has the same high transfer rates as RAID 2. However, because every data drive is involved in every read or write, a penalty is paid.
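The parity computation and reconstruction described above fit in a few lines of Python. This sketch assumes equal-sized blocks, one per drive, and uses a hypothetical function name; it shows a five-disk group (four data drives plus the check disk) losing one data drive.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"ABCD", b"EFGH", b"IJKL", b"MNOP"]  # blocks on the four data drives
parity = xor_blocks(data)                    # block written to the check disk

# Drive 2 fails: the XOR of everything that survives recreates its contents.
survivors = data[:2] + data[3:] + [parity]
print(xor_blocks(survivors))                 # b'IJKL'
```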
RAID 3 can process only one I/O transaction at a time. In addition,
the minimum amount of data that can be written or read from a RAID 3 array
is the number of data drives multiplied by the number of bytes per sector,
referred to as a transfer unit. A typical five-drive array would have four
data disks, one parity disk, and might have a 512-byte sector size on each
disk. The transfer unit would be 2048 bytes (4 x 512). When a data read
is smaller than the transfer unit, the entire unit is read anyway, increasing
the length of a read operation. For a data write smaller than the transfer
unit, even though only part of a sector on each disk needs to be modified,
the array must still deal with complete transfer units. A complete unit
must be read from the array, the data must be modified where necessary,
and the modified data must be written back to the data disks while the check
disk is updated. RAID 3 works well in applications that process large chunks
of data.
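The transfer-unit arithmetic can be written out directly. The Python sketch below assumes the five-drive example above (four data disks, 512-byte sectors) and uses a hypothetical function name; it returns the byte range the array must actually touch for a request at a given offset and length.

```python
from typing import Tuple


def transfer_span(offset: int, length: int,
                  data_disks: int = 4, sector_bytes: int = 512) -> Tuple[int, int]:
    """Return (start, size) of the whole transfer units covering a request."""
    unit = data_disks * sector_bytes            # 4 x 512 = 2048 bytes
    start = (offset // unit) * unit             # round down to a unit boundary
    end = -(-(offset + length) // unit) * unit  # round up to a unit boundary
    return start, end - start


print(transfer_span(0, 100))     # (0, 2048): a 100-byte read costs a full unit
print(transfer_span(1800, 600))  # (0, 4096): the request straddles two units
```

For a write, every byte in the returned range must be read, modified, and written back, with the parity recomputed, even though the request itself is much smaller.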
RAID 4
RAID 4 addresses the problems associated with bit-striping a transfer
block of data across the array. As in RAID 3, one drive in the array is
reserved as the check disk. This architecture, however, uses block
or sector striping, so a small read involves only one drive and takes
about as long as it would on a single drive. In addition, multiple
read requests can be handled at the same time. However, since every write
accesses the parity disk, only one write at a time is possible. RAID 4
is most useful in an environment where the ratio of reads to writes is
very high.
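Why every write involves the parity disk follows from how parity is kept current for a one-block write. Rather than rereading every data drive, a controller can read the old data and old parity and compute the new parity as old parity XOR old data XOR new data. The Python sketch below illustrates that identity; it is a common technique assumed here rather than one described in the text, and the names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """New parity after changing one data block; only that block and the parity are touched."""
    return xor_blocks([old_parity, old_data, new_data])


data = [b"ABCD", b"EFGH", b"IJKL", b"MNOP"]
parity = xor_blocks(data)

new_block = b"WXYZ"
parity = updated_parity(parity, data[1], new_block)  # read-modify-write on the parity disk
data[1] = new_block

print(parity == xor_blocks(data))  # True
```

Each small write thus costs two reads and two writes, one pair of which always lands on the check disk; in RAID 4 that single disk serializes all writes.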
RAID 5
Because RAID levels 2 through 4 each use dedicated check disks, only one write transaction is possible at any time. RAID 5 overcomes this write bottleneck by distributing the check data (parity) across all of the disks in the array, so each disk in the array contains both data and check data.
Distributing the check data across the array allows multiple reads and writes
to be done in parallel. Data recovery and seek times are comparable to RAID 4.
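One simple way to distribute the check data is to rotate the parity block from drive to drive on successive stripes. The Python sketch below assumes that rotation rule (parity starts on the last drive and moves one drive to the left per stripe); actual products may place parity differently.

```python
from typing import List, Tuple


def stripe_layout(stripe: int, n_disks: int) -> Tuple[int, List[int]]:
    """Return (parity drive, data drives in order) for one stripe."""
    parity_disk = (n_disks - 1 - stripe) % n_disks
    data_disks = [d for d in range(n_disks) if d != parity_disk]
    return parity_disk, data_disks


for s in range(5):
    print(s, stripe_layout(s, 5))
# 0 (4, [0, 1, 2, 3])
# 1 (3, [0, 1, 2, 4])
# 2 (2, [0, 1, 3, 4])
# 3 (1, [0, 2, 3, 4])
# 4 (0, [1, 2, 3, 4])
```

A small write to stripe 0 on drive 0 touches drives 0 and 4, while one to stripe 1 on drive 1 touches drives 1 and 3. Because the two writes share no drive, they can proceed at the same time, which is impossible when all parity sits on one check disk.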
Disk Array Implementations
Actual disk array implementations are not always as simple or straightforward
as described above. Some manufacturers combine features of different RAID
levels to create a hybrid, as in RAID 0/1. RAID implementations that are
extremely fault-tolerant provide redundancy beyond that of the drives.
Additional redundancy is accomplished by providing a system with redundant
drive controllers, redundant power supplies, redundant SCSI controllers,
and so on. Some manufacturers offer "hot swappability," the ability to
replace a failed drive (or other hardware units) without shutting the system
down. Other manufacturers provide a standby spare drive onto which the data
from a failed drive is automatically rebuilt as soon as the system senses the failure.
Still other RAID manufacturers offer only software-based RAID: the RAID
architecture is contained in software that customers run on
their own hardware. Finally, some RAID systems offer more than one level
of RAID in the same package to handle mixed applications more efficiently.
Conclusions
The information contained in this section of the report provides an
overview of the vendors of mass storage systems and their products. When
determining which system is best suited to an application, the quantity
and type of information to be stored and how often it will be accessed
must be considered along with the budget available for the job. The selection
of a mass storage system depends upon the user's requirements; no one
system will satisfy every user's needs.
As was discussed previously, even within a media class, such as RAID or
optical disc, the media choice (such as CD-ROM, WORM, rewritable, etc.)
or storage protocol (RAID level) must be selected with things like file
size and frequency of access in mind.
For applications with a mix of requirements, systems that offer hierarchical storage management (HSM) present a solution. HSM automatically migrates files from hard drive to optical disk or magnetic tape depending upon how recently the file was last accessed. A record of all migrated files is maintained and all information is available to the system, although not all information is "on-line."
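A migration policy of this kind can be sketched as a rule on time since last access. In the Python example below, the 30-day and 180-day thresholds, the tier names, and the FileRecord type are hypothetical; recalling a file to disk when it is accessed again is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class FileRecord:
    name: str
    last_access: datetime
    tier: str = "disk"


def migrate(files: List[FileRecord], now: datetime,
            to_optical: timedelta = timedelta(days=30),
            to_tape: timedelta = timedelta(days=180)) -> Dict[str, str]:
    """Move idle files down the hierarchy and return a catalog of where each file lives."""
    catalog: Dict[str, str] = {}
    for f in files:
        idle = now - f.last_access
        if idle >= to_tape:
            f.tier = "tape"
        elif idle >= to_optical and f.tier == "disk":
            f.tier = "optical"
        catalog[f.name] = f.tier  # the record that keeps every file reachable
    return catalog


now = datetime(1994, 6, 1)
files = [FileRecord("a.dat", now - timedelta(days=2)),
         FileRecord("b.dat", now - timedelta(days=45)),
         FileRecord("c.dat", now - timedelta(days=400))]
print(migrate(files, now))  # {'a.dat': 'disk', 'b.dat': 'optical', 'c.dat': 'tape'}
```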