Smart Similarity Search Based on Simhash over Compressed Data in Cloud Computing

张斌  杨嵘  张鹏  刘庆云  杨威 



With the rapid development and popularization of cloud document storage services, a large number of documents are stored on cloud platforms to provide users with convenient access. To save storage space and reduce transmission costs, these documents are stored in compressed form. However, storing files as compressed archives makes them difficult to retrieve: an index must be built from the documents' keywords after decompression, which takes a lot of time and degrades the service experience. This paper therefore presents a smart similarity search framework based on simhash over compressed data. Dimensionality reduction of the feature vectors lowers the computational complexity of similarity comparison, and retrieving by document rather than by keyword reduces the dependence on selected keywords. Users can quickly retrieve the documents they need without fully decompressing the archives. In experimental tests, compared with building the index after decompressing the archives, the framework reduces the time cost by nearly 44.26 percent, greatly improving retrieval efficiency.
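To make the dimensionality-reduction idea concrete, the following is a minimal sketch of the generic simhash technique, not the framework described in the paper. It reduces a document's weighted term vector to a single fixed-width fingerprint, so similarity can be estimated from the Hamming distance between two integers. The function names, the MD5-based token hash, the regular-expression tokenizer, and the 64-bit width are all assumptions made for illustration; the compressed-domain processing that the paper addresses is not shown.

```python
import hashlib
import re
from collections import Counter


def simhash(text, bits=64):
    """Return a simhash fingerprint of `text` as an integer (hypothetical helper).

    Assumes `bits` is a multiple of 8 and at most 128, since each token
    hash is taken from the leading bytes of an MD5 digest.
    """
    # Tokenize into lowercase words; term frequency serves as the weight.
    weights = Counter(re.findall(r"\w+", text.lower()))
    acc = [0] * bits
    for token, weight in weights.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each bit of the token hash votes +weight (bit set) or -weight (bit clear).
        for i in range(bits):
            if (h >> i) & 1:
                acc[i] += weight
            else:
                acc[i] -= weight
    # Keep only the sign of each accumulated component.
    fingerprint = 0
    for i in range(bits):
        if acc[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

In a sketch like this, two documents would be treated as similar when `hamming_distance(simhash(doc_a), simhash(doc_b))` falls below a chosen threshold; the threshold is application-specific and is not given in the abstract.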



