Published online by Cambridge University Press: 05 May 2010
ABSTRACT
Document storage and retrieval systems should possess fast string search capabilities. The access paths needed to reduce search times require substantial amounts of storage, in addition to the very large storage requirements of the documents themselves. In this paper we investigate a technique that supports access paths on compressed documents, so that the total storage requirements for the access paths and the compressed documents are less than those for the original documents.
Introduction
Advances in hardware technology are unlikely to keep pace with the growth of on-line document storage. In an environment where the trend is towards local and wide area networks (with the promise of an interconnected society around the corner), large numbers of documents would be transmitted between nodes. If a satisfactory service is to be provided at reasonable cost, the storage of documents and their communication along network paths and between peripherals and processors require that the documents be held more compactly than at present. Because natural language is highly redundant, a suitable encoding scheme could be used, with the resulting compression reducing both storage and communication costs. In an on-line environment the compression and decompression schemes must not involve excessive overheads in either time or space. Since a document needs to be compressed only once for storage but is decompressed (or retrieved) more often, higher overheads can be tolerated during the compression stage.
Document retrieval requires fast string search capabilities, and it is usual to provide additional access paths to reduce search times, e.g. inverted lists on words. In [Goyal83] a scheme was proposed that made use of inverted indexes associated with compressed documents.
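To make the idea of inverted lists over compressed documents concrete, the following is a minimal sketch only. It is not the scheme of [Goyal83], whose details are not given here. It assumes a simple word-level coding in which each distinct word is replaced by an integer code, and it builds the inverted lists on those codes. The names `build`, `search` and `decode` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build(documents):
    """Encode documents as integer word codes and build inverted lists.

    Assumption: a naive word-to-integer coding stands in for a real
    compression scheme; it is used here only to illustrate the idea.
    """
    dictionary = {}               # word -> integer code
    encoded = []                  # one list of codes per document
    inverted = defaultdict(set)   # code -> ids of documents containing it
    for doc_id, text in enumerate(documents):
        codes = []
        for word in text.lower().split():
            # Assign the next unused code the first time a word is seen.
            code = dictionary.setdefault(word, len(dictionary))
            codes.append(code)
            inverted[code].add(doc_id)
        encoded.append(codes)
    return dictionary, encoded, inverted


def search(word, dictionary, inverted):
    """Return the sorted ids of documents containing the word.

    The lookup uses only the dictionary and the inverted lists, so no
    document has to be decompressed to answer the query.
    """
    code = dictionary.get(word.lower())
    if code is None:
        return []
    return sorted(inverted[code])


def decode(codes, dictionary):
    """Recover the (lower-cased) text of one encoded document."""
    words = {code: word for word, code in dictionary.items()}
    return " ".join(words[c] for c in codes)


if __name__ == "__main__":
    docs = [
        "fast string search",
        "compressed documents need fast search",
        "inverted lists on words",
    ]
    dictionary, encoded, inverted = build(docs)
    print(search("fast", dictionary, inverted))
    print(decode(encoded[0], dictionary))
```

In this sketch a query touches only the dictionary and the inverted lists; a document is decoded only when it is actually retrieved, which matches the observation above that decompression happens more often than compression and should therefore be cheap.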