HTCE Document Management Terms

This page contains definitions of a host of commonly used document imaging, document management and workflow terms. If you can't find the definition you're looking for, drop us an email and we'll send you the definition you need.



Annotation and Markup Features - allow you to add comments to an electronic document in much the same way that you would use highlighters or Post-it® notes to draw attention to specific areas of a printed document.

Aperture Card - is a standard Hollerith encoded IBM-style punch card that acts as a transport for a 35mm transparency. Typically, aperture cards are used to store blueprints and engineering drawings.

Aperture Card Scanner - is a type of scanner that allows aperture cards to be converted into electronic documents.

APRP (Adaptive Pattern Recognition Processing) - is one of the most sophisticated technologies currently available in modern text retrieval software. APRP automatically indexes the binary patterns in digital information, creating a pattern-based memory that is optimized for the content of the data. It eliminates the costly labor of manually defining keywords and sorting and labeling information in database fields. APRP has a high tolerance for input data errors, eliminating the need for OCR cleanup.

Cache - Pronounced "cash." Small portion of high-speed computer memory used for temporary storage of frequently used data. Reduces the time it would take to access that data, since it no longer has to be retrieved from the disk.
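The idea can be sketched in a few lines of Python. This is only an illustration, not how any particular product implements its cache: the class name SmallCache, the slow_read callback standing in for a disk read, and the least-recently-used eviction rule are all assumptions made for the example.

```python
from collections import OrderedDict


class SmallCache:
    """Hypothetical cache: keeps recently used items in memory, up to a fixed capacity."""

    def __init__(self, capacity, slow_read):
        self.capacity = capacity
        self.slow_read = slow_read      # stand-in for a slow disk read
        self.items = OrderedDict()      # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            # Cache hit: no trip to the disk is needed.
            self.hits += 1
            self.items.move_to_end(key)
            return self.items[key]
        # Cache miss: fetch from the slow source and remember the result.
        self.misses += 1
        value = self.slow_read(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            # Evict the least recently used entry to stay within capacity.
            self.items.popitem(last=False)
        return value
```

Asking for the same key twice triggers only one slow read; the second request is answered from memory.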

CD-ROM (Compact Disk-Read Only Memory) - is a circular disk used to store large amounts of electronic data. CD-ROMs can hold up to 680 MB of computer data. The media is low cost and durable, and in large scale applications can be inexpensively duplicated into thousands of copies. Unlike optical disks which can be written to many times, a CD-ROM cannot be written to once it has been mastered.

Client-Server Based System - is a system that stores electronic documents on one computer (a server) while making those documents available to other computers (clients) via a network.

COD (Computer Originated Document) - refers to any document that was originally created on a computer.

COLD (Computer Output to Laser Disk) - software allows you to transfer documents from expensive mainframe storage onto an inexpensive, long-term optical disk storage system.

Collection - refers to two or more electronic documents containing related information that have been grouped together to facilitate retrieval.

Compression - is a process that reduces the number of bytes required to define a document in order to save disk space or transmission time. Compression is achieved by replacing commonly occurring sequences of pixels with shorter codes. Some compression methods, like JPEG, throw away some data, seeking only to preserve the appearance of the image. Others, like Group-IV, preserve all of the original information.
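As a rough illustration of replacing repeated pixels with shorter codes, here is a minimal run-length encoding sketch in Python. Run-length encoding is a much simpler technique than either JPEG or Group-IV and is used here only because it is easy to follow; the function names are made up for this example. Like Group-IV, it preserves all of the original information.

```python
def rle_encode(pixels):
    """Collapse each run of identical pixels into a (count, value) pair."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][1] == pixel:
            runs[-1][0] += 1            # extend the current run
        else:
            runs.append([1, pixel])     # start a new run
    return [(count, value) for count, value in runs]


def rle_decode(runs):
    """Rebuild the original pixel row from (count, value) pairs."""
    pixels = []
    for count, value in runs:
        pixels.extend([value] * count)
    return pixels
```

A scanned row of 12 white pixels, 3 black pixels and 5 white pixels is stored as three pairs instead of 20 values, and decoding returns the row unchanged.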

Cross-Platform - software enables you to share information between computers running different operating systems, such as Macintosh and Windows workstations.

DAT - Digital Audio Tape. A technology that records noise-free digital data on magnetic tape. Generally used for audio, a DAT cassette can hold more than a gigabyte when adapted for data storage.

Database - is an organized collection of electronic documents stored on a computer.  The database is structured to facilitate the search and retrieval of information contained in the database.

Database Field - refers to the searchable items in a database. Database fields can be customized according to your needs. A typical database will include fields such as: 'customer name,' 'address,' and 'account number.'  Together the fields make up what is called a database record.

Database Publishing - enables you to publish a select group of documents from a large-scale document database to laptops and CD-ROMs, allowing you to create miniature, portable databases.

Database Query Screen - is a computer generated form which allows you to search for information contained in a database. By entering information in defined text fields, you instruct the computer to search the database for documents which contain that information.  Some document management systems allow you to customize the query screens to accept information that is applicable to the database you wish to search.

Database Record - is a collection of database fields.

Digital Documents - are documents that are stored on a computer. The documents may have been created on a computer, as with word-processing files and spreadsheets, or they may have been converted into digital documents by means of document imaging.  Digital documents are also referred to as electronic documents.

Document - is a broadly used term that refers to word-processing files, e-mail messages, spreadsheets, database tables, faxes, business forms, images, or any other collection of organized data. Documents are also referred to as 'records.'

Document Imaging - is the process by which print and film documents are fed into a scanner and converted into electronic documents.  During the scanning process documents can be OCRed and indexed to ensure quick retrieval at a later date.

Document Management Systems - enable you to store documents electronically. This facilitates the process of retrieving, sharing, tracking, revising, and distributing documents and the information they contain. A complete Electronic Document Management System (EDMS) provides you with all the software and hardware required to ensure that you maintain control over all your documents, both scanned images and files that were created on a computer, like spreadsheets, word processing documents and graphics. A complete EDMS includes document imaging, OCR, text retrieval, workflow, and Computer Output to Laser Disk capabilities.

Document Retrieval - is the process by which you can search and 'retrieve' an archived document from a database. This is done by entering information in a database query screen.

Domain Name System (DNS) - The DNS is a static, hierarchical name service used with TCP/IP hosts, and is housed on a number of servers on the Internet. Basically, it maintains a database for figuring out and finding (or resolving) host names and IP addresses on the Internet. This allows users to specify remote computers by host names rather than numerical IP addresses (if you’ve used UNIX, you may have heard the DNS referred to as the BSD UNIX BIND service). For example, go to a DOS prompt in Windows 95, the % prompt in UNIX, or use a ping client for Windows 3.1 or Mac, and type "PING UTW.COM". This will check the DNS server you have configured, look up the numerical IP address for UTW.COM, and then ping UTW’s IP address. The advantage of the DNS is that you don’t have to remember numerical IP addresses for all the Internet sites you want to access.

EDMS - is an acronym for Electronic Document Management System.

Electronic Documents - are documents that are stored on a computer. The documents may have been created on a computer, as with word-processing files and spreadsheets, or they may have been converted into digital documents by means of document imaging.  Electronic documents are also referred to as digital documents.

Electronic Mail (E-mail) - A method by which computer users can exchange messages with each other over a network. E-mail is probably the most widely used communications tool on the Internet. There are many quirky conventions to E-mail, but most entail a "To:", "From:", and "Subject:" line. One of E-mail's advantages is its ability to be forwarded and replied to easily. If an E-mail is badly received by a group or user, the sender is likely to get "flamed."

E-mail Address - Your E-mail address is made up of several parts. By convention, addresses use lowercase letters with no spaces. The first part of the address, the user name, identifies a unique user on a server. The "@" (pronounced "at") separates the user name from the host name. The host name uniquely identifies the server computer and is the last part of the Internet E-mail address. The host names of large servers, such as those used at universities or large companies, sometimes contain multiple parts, called sub domains. Sub domains and the host name are separated by a "." (pronounced "dot"). The three-letter suffix in the host name identifies the kind of organization operating the server (some locations use a two-letter geographical suffix). The most common suffixes are: .com (commercial), .edu (educational), .gov (government), .mil (military), .net (networking), and .org (non-commercial). Addresses outside of the U.S. sometimes use a two-letter suffix that identifies the country in which the server is located. Some examples are: .jp (Japan), .nl (The Netherlands), .uk (United Kingdom), .ca (Canada), and .tw (Taiwan).
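The parts described above can be pulled out of an address with a short Python sketch. The function name, the dictionary keys, and the sample address jane@mail.example.edu are hypothetical and chosen only for illustration; real-world address validation is considerably more involved.

```python
def split_email_address(address):
    """Split an address into user name, host name, dotted parts and suffix."""
    user_name, at_sign, host_name = address.partition("@")
    if not at_sign or not user_name or not host_name:
        raise ValueError("expected an address of the form user@host")
    parts = host_name.split(".")
    return {
        "user_name": user_name,      # identifies a user on the server
        "host_name": host_name,      # everything after the "@"
        "sub_domains": parts[:-1],   # dotted parts before the suffix
        "suffix": parts[-1],         # e.g. com, edu, or a country code
    }
```

For the made-up address jane@mail.example.edu, the user name is "jane", the host name is "mail.example.edu", and the suffix is "edu".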

Ethernet - A standard and probably the most popular connection type for Local Area Networks (LANs). It was first developed by Xerox, and later refined by Digital, Intel and Xerox (see also "DIX"). In an Ethernet configuration, computers are connected by coaxial or twisted-pair cable where they contend for network access using a Carrier Sense Multiple Access with Collision Detection (CSMA/CD) paradigm. Ethernet can transfer information at up to 10 megabits per second (Mb/s).

File Transfer Protocol (FTP) - The most widely used way of downloading and uploading (getting and putting) files across an Internet connection. The File Transfer Protocol is a standardized way to connect computers so that files can be shared between them easily. There is a set of commands in FTP for making and changing directories, transferring, copying, moving, and deleting files. Formerly, all FTP connections were text based, but graphical applications are now available that make FTP commands as easy as dragging and dropping. Numerous FTP clients exist for a number of platforms.

Full-Text Retrieval - is a capability that enables you to search for documents stored in a database based on the text contained in the documents. It can be used in conjunction with index-based searching which relies on a description of the document entered by a scan operator.
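One common way to support this kind of search is an inverted index, which maps each word to the documents that contain it. The Python sketch below is an assumption about how such a search might be built, not a description of any specific product; the function names and document names are invented for the example.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map every word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return matches
```

Searching for "kitchen claim" returns only the documents whose text contains both words, regardless of how a scan operator described them.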

GUI - Graphical User Interface. Computer control system that allows the user to command the computer by "pointing-and-clicking," usually with a mouse, at pictures, or "icons," rather than by typing in commands.

Graphical Route Developer Tools - enable you to easily create, and modify, workflow routes by letting you 'draw' a workflow route on the screen, in much the way you would draw a picture with a computerized drawing program. In effect, you draw a map of how you want documents to flow through your organization.

Group-IV - is a compression method designed by CCITT for use with Group IV fax machines. This method is optimized for compressing scanned text.

HTTP - HyperText Transfer Protocol. The protocol that tells the server what to send the client (text, images, other documents, and so on). The underlying system whereby documents with Hyperlinks are transferred over the Internet, and whereby Hyperlinks in those documents trigger the subsequent transfer of other documents and other activities.

Hyperlinks - allow you to 'link' any document stored in a database with any other document.  You can link a spreadsheet to an image, a database to a graphic, or a word processing file to a site on the World Wide Web. You can then navigate from one related document to another, simply by clicking on the hyperlinks. Sophisticated hyperlinking technology supports "drag and drop" technology, so that you can link one document to another simply by clicking and dragging a document's image on top of another document.

Hypertext Markup Language (HTML) - The standard way to mark text documents for publishing on the World Wide Web. HTML is marked up using "tags" surrounded by angle brackets. To see what tagged HTML text looks like, select the View Source feature from the menus in the program you are using to view this document now, and you’ll see a display of the HTML text used to create this page.

Index - refers to the information contained in an electronic document that enables you to retrieve it from a database. The index can include physical location information (e.g., where the document is stored) and document identification information (e.g., date archived, creator, and contents).

Internet Protocol (IP) - An industry standard, connectionless, best-effort packet switching protocol used as the network layer in the TCP/IP Protocol Suite.

Internet Protocol Address (IP Address) - The 32-bit address defined by the Internet Protocol. Every resource on the Internet has a unique numerical IP address, represented in dotted decimal notation. IP addresses are the closest thing the Internet has to phone numbers. When you "call" that number (using any number of connection methods such as FTP, HTTP, Gopher, etc.) you get connected to the computer that "owns" that IP address.
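Dotted decimal notation is simply the 32-bit number written as four 8-bit values. The Python sketch below converts between the two forms; the function names are made up for this example, and 192.0.2.1 is an address reserved for documentation rather than a real host.

```python
def dotted_to_int(address):
    """Convert dotted decimal notation (a.b.c.d) to a 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(o < 0 or o > 255 for o in octets):
        raise ValueError("expected four numbers between 0 and 255")
    value = 0
    for octet in octets:
        value = (value << 8) | octet    # each octet fills the next 8 bits
    return value


def int_to_dotted(value):
    """Convert a 32-bit integer back to dotted decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

For example, 192.0.2.1 corresponds to the single number 3221225985, and converting that number back gives 192.0.2.1 again.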

Internet Service Provider (ISP) - An ISP is a company that maintains a network that is linked to the Internet via a dedicated communication line, usually a high-speed link known as a T1. An ISP offers use of its dedicated communication lines to companies or individuals (like me) who can’t afford $1,300 a month for a direct connection. Using a modem, you can dial up to a service provider whose computers will connect you to the Internet, typically for a fee.

JPEG (Joint Photographic Experts Group) - is a standard image compression mechanism.  JPEG compression is "lossy," meaning that the compression scheme sacrifices some image quality in exchange for a reduction in the file's size.

Life Cycle - refers to the period of time between when a document is archived and when it is destroyed.

Magnetic Disk - is a piece of hardware used for the storage of computerized information. It consists of a flat circular platter covered with a magnetized surface layer.

Microfilm/Microfiche Scanner - is a type of scanner that converts microfilm or microfiche documents into electronic documents.

Network - refers to two or more computers that have been linked together to enable them to communicate with each other, exchange information, and share resources.

OCR (Optical Character Recognition) - refers to the process by which scanned images are electronically "read" to convert them into editable text. This conversion is performed after scanning, and may output formatted text or text-only files (flat ASCII files). Text generated by OCR is often input into text search databases, allowing retrieval of the original scanned image based on its content.

Optical Disk - is a 5 1/4-inch, 12-inch, or 14-inch circular disk coated with a recording alloy that can store large numbers of electronic documents. Binary information is written to, and read from, the disk by optical means (usually laser). CD-ROMs are often mistakenly referred to as optical disks since they too are read using lasers, but unlike optical disks, CD-ROMs cannot be written to once they have been mastered.

Optical Disk Jukebox - is a piece of hardware that stores, and provides rapid access to, multiple optical disks.

Patch Card - is a document that contains scanner and indexing instructions in the form of a bar code. Patch Cards can be inserted at specific points in a 'scan batch' where you desire new scanner or indexing settings to begin or end. Patch cards can instruct document imaging software to store a document in a specific database, assign the document an incremental sequence number, assign a job name, or record the scan date of a document. Patch cards are also capable of adjusting scanner settings and performing image enhancement operations such as 'deskew,' 'rotate,' and 'despeckle'.

RAID (Redundant Array of Inexpensive Disks) - is a storage technique that enables you to obtain increased storage reliability and performance by writing data to a series of disks referred to as a logical volume. Data reliability is achieved with error correction techniques or data duplication. Disk performance is achieved by parallel data transfers to a set of disks, a technique known as 'data striping.'
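Data striping and one simple form of error correction can be sketched in Python as follows. This is a toy model under stated assumptions: the "disks" are plain lists of byte chunks, the parity is a byte-wise XOR kept in a separate list, and the function names are invented. Actual RAID levels differ in how they place data and parity.

```python
def xor_chunks(chunks):
    """XOR equally sized byte chunks together, byte by byte."""
    result = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            result[i] ^= byte
    return bytes(result)


def stripe_with_parity(data, data_disks, chunk_size):
    """Spread data across the disks in chunks and keep one parity chunk per stripe."""
    disks = [[] for _ in range(data_disks)]
    parity = []
    stripe_bytes = data_disks * chunk_size
    for start in range(0, len(data), stripe_bytes):
        # Pad the final stripe with zero bytes so every chunk has the same size.
        stripe = data[start:start + stripe_bytes].ljust(stripe_bytes, b"\0")
        chunks = [stripe[i * chunk_size:(i + 1) * chunk_size]
                  for i in range(data_disks)]
        for disk, chunk in zip(disks, chunks):
            disk.append(chunk)
        parity.append(xor_chunks(chunks))
    return disks, parity


def rebuild_chunk(surviving_chunks, parity_chunk):
    """Recover the chunk of a failed disk from the other chunks and the parity."""
    return xor_chunks(list(surviving_chunks) + [parity_chunk])
```

Because each stripe is split across several disks, the chunks can be transferred in parallel, and if one disk is lost its chunk can be rebuilt from the remaining chunks and the parity.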

Record Retention Schedule - is a form that details the categories of records an organization is required to store. It outlines the length of time different categories of records should be stored, and when they can be deleted.

Resolution - refers to the 'image-sharpness' of a document, usually measured in dots per inch (dpi). Documents can be scanned at various resolutions depending on your needs; the higher the resolution, the greater the image-sharpness and the larger the file size will be. Resolution also refers to the image-sharpness that printers and monitors are capable of reproducing.
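The link between resolution and file size is simple arithmetic, sketched below in Python. The function name is made up for this example, and the result is a rough estimate for an uncompressed image: it ignores file headers, row padding and compression.

```python
def uncompressed_image_bytes(width_inches, height_inches, dpi, bits_per_pixel=1):
    """Estimate the raw size of a scanned page before any compression."""
    width_pixels = round(width_inches * dpi)
    height_pixels = round(height_inches * dpi)
    total_bits = width_pixels * height_pixels * bits_per_pixel
    return (total_bits + 7) // 8        # round up to whole bytes
```

An 8.5 x 11 inch page scanned in black and white (1 bit per pixel) at 300 dpi is 2,550 x 3,300 pixels, or 1,051,875 bytes. Doubling the resolution to 600 dpi quadruples the size, because both the width and the height in pixels double.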

Retention Period - is the length of time documents must be stored and maintained to satisfy business or legal requirements.
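A retention schedule can be thought of as a table of record categories and retention periods. The Python sketch below shows how the earliest deletion date might be worked out from such a table. The categories and the number of years are invented for illustration only and are not legal guidance; the function name is also hypothetical.

```python
from datetime import date

# Hypothetical schedule: record category -> retention period in years.
RETENTION_YEARS = {
    "invoice": 7,
    "correspondence": 2,
}


def earliest_deletion_date(category, archived_on):
    """Return the first date on which a record of this category may be deleted."""
    target_year = archived_on.year + RETENTION_YEARS[category]
    try:
        return archived_on.replace(year=target_year)
    except ValueError:
        # Archived on 29 February and the target year is not a leap year.
        return date(target_year, 3, 1)
```

A piece of correspondence archived on 15 May 1997 under this made-up schedule could be deleted from 15 May 1999 onward.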

Scaleable - refers to the ability to enlarge or reduce the size of an image. A document management system is said to be 'scaleable' if its capabilities can be increased to support additional users or platforms.

Scan Batch - is a collection of documents that are fed into a scanner for the purpose of being converted into digital or electronic documents.

Scanner Interface Board - is a piece of hardware that enables software programs to communicate with various models of scanners.

Scriptable and Recordable Software - enables you to automate repetitive computer tasks. You can instruct a 'script' to open one program, carry out a task, close that program, open a new program, carry out a new task, and so on until the project is completed. Or, you can 'record' a series of steps as you perform them, and save those steps as a single script.

Semantic Network Technology - is an underlying technology of sophisticated text retrieval software. It offers you a built-in 'dictionary' of 400,000 word meanings and over 1.6 million word relationships. It recognizes phrases like 'real estate' and 'kangaroo court' as single units of meaning, not individual words. It also recognizes words with multiple meanings such as 'concrete'. To choose the meaning appropriate for your query, you simply click on the meaning you intend. Semantic Network Technology helps to ensure that you find the documents you are looking for quickly and easily.

SQL (Structured Query Language) - is a database access language that originated on mainframes and minicomputers, and which is now popular on PCs.
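A short SQL query can be tried out with the sqlite3 module that ships with Python. In this sketch the table name customers, the column names, and the sample rows are all invented for illustration; the columns simply mirror typical fields such as customer name, address and account number.

```python
import sqlite3


def make_demo_database():
    """Create a small in-memory database with made-up customer records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers "
        "(customer_name TEXT, address TEXT, account_number TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            ("Ada Example", "1 Sample Street", "A-1001"),
            ("Ben Placeholder", "2 Sample Street", "A-1002"),
        ],
    )
    return conn


def find_by_account(conn, account_number):
    """Run a SQL SELECT that matches records on the account number field."""
    cursor = conn.execute(
        "SELECT customer_name, address FROM customers WHERE account_number = ?",
        (account_number,),
    )
    return cursor.fetchall()
```

Calling find_by_account(make_demo_database(), "A-1002") returns the one record whose account number field matches; an unknown account number returns an empty list.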

Text Retrieval Software - enables you to retrieve electronic documents from databases by entering 'key' words in a text search field.  Documents containing the text you entered are retrieved from the database, and presented to you in a list ranked by relevancy.

TIFF (Tagged Image File Format) - is an industry standard file format developed for the purpose of storing high-resolution bit-mapped, gray-scale, and color images.

TWAIN - is a scanning interface standard developed to address the need for consistent, easy integration of scanners with document imaging programs. Software programs that are written to support the TWAIN standard are capable of controlling any TWAIN compliant scanner.

Workflow Software - allows businesses to move electronic documents along a user-defined 'routing' path, from one workstation to the next, around a local or wide-area network. Once the document arrives at any given workstation, the receiver can add notations to, or modify, the document as they see fit. An insurance company might use workflow software to route claim forms through their organization. A user at one step might wish to review the forms and add a new document to the electronic 'package' before sending it to the next workstation. The next user might wish to add several notations to the forms before sending them on to the final workstation for approval. The route can be as simple or as complex as a business process requires.
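The insurance example can be sketched in Python as a list of workstations, each with a step that acts on the package. This is a simplified model rather than the design of any workflow product; the class, function and workstation names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical electronic 'package' that travels along the route."""
    documents: list
    notations: list = field(default_factory=list)
    visited: list = field(default_factory=list)


def review(package):
    # The first user reviews the forms and adds a new document.
    package.documents.append("adjuster report")


def annotate(package):
    # The next user adds a notation to the forms.
    package.notations.append("verify policy number")


def approve(package):
    # The final workstation records the approval.
    package.notations.append("approved")


# The route is an ordered list of (workstation, step) pairs.
CLAIM_ROUTE = [("review", review), ("annotation", annotate), ("approval", approve)]


def run_route(package, route):
    """Send the package to each workstation in order and record the visit."""
    for workstation, step in route:
        step(package)
        package.visited.append(workstation)
    return package
```

Changing the route is a matter of adding, removing or reordering entries in the list, which is the kind of flexibility a user-defined routing path is meant to provide.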
