Improving Data Quality of Proxy Logs for Intrusion Detection

沙泓州  柳厅文  秦鹏  孙永  刘庆云 



Log correlation analysis plays an important role in many areas of information security. For example, it can help find abnormal navigation behaviors in insider threat detection, and it can serve as a data source for intrusion detection [1]. However, the original logs are filled with noise. Data cleaning is therefore an indispensable preprocessing step in log correlation analysis, both to improve detection efficiency and to reduce storage space.
Many methods have been proposed to improve data quality by removing irrelevant items, such as JPEG and GIF files, sound files, and accesses generated by spider navigation. Most of them are designed for web servers (such as e-commerce web sites) and work by inspecting the user-agent, HTTP status, and URL suffix fields of web requests. However, they do not address the problem of improving the data quality of proxy logs (which record web requests passing through an intermediary) very well, because proxy logs show different features from server logs. The biggest difference is that proxy logs must be cleaned without knowing anything about the web site a request accesses, such as its structure and content types. This makes traditional data cleaning methods incapable of filtering noise specific to proxy logs, such as software updates and requests from network behavior analyzers. Moreover, proxy logs grow rapidly, with web requests generated by an unbounded number of web sites and users, which makes the problem more difficult to tackle.
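To make the traditional, server-oriented cleaning rules concrete, the following is a minimal sketch of a filter that inspects the URL suffix, user-agent, and HTTP status of each request. It illustrates the conventional approach only, not the method of this paper. The entry layout (`url`, `status`, `user_agent`), the suffix and spider-marker lists, and the function names are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Illustrative lists (assumptions); real deployments tune these per site.
NOISE_SUFFIXES = (".jpg", ".jpeg", ".gif", ".png", ".css", ".js", ".ico", ".mp3", ".wav")
SPIDER_MARKERS = ("bot", "spider", "crawler")


def is_noise(entry):
    """Return True if a log entry counts as noise under the traditional rules."""
    # Rule 1: embedded resources, recognised by the URL path suffix.
    path = urlparse(entry["url"]).path.lower()
    if path.endswith(NOISE_SUFFIXES):
        return True
    # Rule 2: spider navigation, recognised by the user-agent string.
    agent = entry.get("user_agent", "").lower()
    if any(marker in agent for marker in SPIDER_MARKERS):
        return True
    # Rule 3: failed requests, recognised by the HTTP status code.
    return entry.get("status", 200) >= 400


def clean(entries):
    """Keep only the entries that none of the rules flags as noise."""
    return [e for e in entries if not is_noise(e)]


# Hypothetical log entries, already parsed into dictionaries.
sample = [
    {"url": "http://example.com/index.html", "status": 200, "user_agent": "Mozilla/5.0"},
    {"url": "http://example.com/logo.gif", "status": 200, "user_agent": "Mozilla/5.0"},
    {"url": "http://example.com/page", "status": 404, "user_agent": "Mozilla/5.0"},
    {"url": "http://example.com/a.html", "status": 200, "user_agent": "Googlebot/2.1"},
    {"url": "http://update.example.com/check?v=1.2", "status": 200, "user_agent": "Updater/1.0"},
]
kept = clean(sample)
kept_urls = [e["url"] for e in kept]
```

In this sketch the image, the failed request, and the spider access are removed, but the hypothetical software-update request survives because none of the three fields marks it as noise. That is the kind of proxy-specific noise these rules miss.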


