With the rapid development and adoption of cloud document storage services, a large number of documents are now stored on cloud platforms to provide users with convenient access. To save storage space and reduce transmission costs, these documents are stored in compressed form. However, storing files as compressed archives makes them difficult to retrieve: an index must be built from the documents' keywords after decompression, which takes considerable time and degrades the service experience. This paper therefore presents a smart similarity search framework based on simhash over compressed data. Dimensionality reduction lowers the computational cost of comparing feature vectors, and retrieving by document rather than by keyword reduces the dependence on selected keywords. Users can retrieve the documents they need quickly, without fully decompressing the archives. In our tests, compared with building the index after decompressing the archives, the framework reduced the time cost by nearly 44.26 percent, a substantial improvement in retrieval efficiency.
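As a rough illustration of the simhash idea the framework builds on, the following sketch reduces a document to a 64-bit fingerprint and compares two fingerprints by Hamming distance. It is a minimal, generic version under stated assumptions and not the framework's actual implementation: the tokenizer, the unweighted tokens, the use of MD5 as the per-token hash, and the names `simhash` and `hamming_distance` are all illustrative choices, and the sketch works on plain text rather than on compressed data.

```python
import hashlib
import re

BITS = 64  # fingerprint width (an assumption; other widths are possible)


def simhash(text: str) -> int:
    """Return a 64-bit simhash fingerprint of `text` (illustrative sketch)."""
    weights = [0] * BITS
    # Assumed tokenizer: lowercase word tokens, each with weight 1.
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:  # a positive vote total sets the bit
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Documents whose fingerprints differ in only a few bits are treated as similar, so a comparison costs one XOR and a bit count instead of a comparison of full keyword vectors. The distance threshold that counts as "similar" is a tuning choice that this sketch leaves open.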