Limited Dictionary Builder: An Approach to Select Representative Tokens for Malicious URLs Detection

沙泓州  周舟  刘庆云  柳厅文  郑超 



Abstract: Cybercriminals use malicious Uniform Resource Locators (URLs) as the entry point for a variety of web attacks, such as phishing, spamming, and malware distribution, which may lead to huge financial and data losses. Malicious URLs should therefore be detected as accurately and quickly as possible. Heuristic-based detection is among the most popular ways to meet these goals, and its results depend on a large number of heuristic features. However, the enormous number of new pages and meaningless tokens makes the feature set explode and eventually exhausts memory. In this paper, we address the problem by selecting representative members from the initial feature set, chosen so that they have the best predictive ability among any selection of the same size. For each feature, we give an evaluation method of O(1) complexity to measure its predictive ability. We then make the selection from all the measured values with linear complexity. Experimental results show that, compared with prior approaches, our approach achieves almost the same false negative rate for malicious URL detection while using only 8.3% of the features. Moreover, our approach may work efficiently in the big data era, as it handles 20 thousand URLs per second on average in our experiments.
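
The two steps named in the abstract (score each token cheaply, then keep a fixed number of the best ones) can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual evaluation measure, so everything here is an assumption made for illustration: the tokenizer, the function names `tokenize` and `build_limited_dictionary`, and the score, which is simply the absolute difference between a token's document frequency in malicious and in benign URLs. Selection uses `heapq.nlargest`, which costs O(n log k) rather than strictly linear time; it stands in for whatever linear procedure the authors use.

```python
import heapq
import re
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase tokens on non-alphanumeric characters (assumed tokenizer)."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]


def build_limited_dictionary(malicious_urls, benign_urls, k):
    """Return the k tokens with the highest illustrative score.

    The score is a stand-in, NOT the paper's measure: the absolute difference
    between the fraction of malicious URLs and the fraction of benign URLs
    that contain the token.
    """
    mal_counts = Counter()
    ben_counts = Counter()
    # Count each token at most once per URL (document frequency).
    for url in malicious_urls:
        mal_counts.update(set(tokenize(url)))
    for url in benign_urls:
        ben_counts.update(set(tokenize(url)))

    n_mal = max(len(malicious_urls), 1)
    n_ben = max(len(benign_urls), 1)

    def score(token):
        # Two dictionary lookups and one subtraction: O(1) per token.
        return abs(mal_counts[token] / n_mal - ben_counts[token] / n_ben)

    vocabulary = set(mal_counts) | set(ben_counts)
    # Ties are broken by the token string so the result is deterministic.
    return heapq.nlargest(k, vocabulary, key=lambda t: (score(t), t))


malicious = [
    "http://paypal-login.example.com/verify",
    "http://secure-login.example.net/verify",
]
benign = [
    "http://example.com/index",
    "http://example.org/about",
]
selected = build_limited_dictionary(malicious, benign, 2)
```

With these toy inputs, tokens such as `http` and `example` appear equally often in both classes and score zero, while `login` and `verify` appear only in the malicious URLs, so those two make up the size-2 dictionary.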


