With the Internet becoming the dominant channel for business and daily life, web proxies are increasingly used for illegal purposes such as propagating malware, hosting phishing pages that impersonate legitimate sites to steal sensitive data, or redirecting victims to other malicious targets.
In this paper, using thousands of crawled web proxy URLs, we performed a large-scale study of their DOM (Document Object Model) structural features. Our study reveals the existence of a dedicated web proxy DOM among hosts that play orchestrating roles in proxy activities. Motivated by the distinctive DOM and URL features of these hosts, we developed an automatic stepping-stone detection system, ProxyDetector. Specifically, we explored the potential benefits of DOM-based features, which improved the recall rate by 25% compared with detection without them. We extensively evaluated ProxyDetector with four methods on a diverse spectrum of corpora containing 2,068 web proxy sites and 26,066 legitimate sites. Achieving over 95% precision on web proxy sites with a high average recall rate of 96.5%, ProxyDetector is shown to be an effective solution for detecting web proxy sites.
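To illustrate what DOM-based structural features could look like in practice, the following minimal Python sketch counts tags and measures nesting depth with the standard-library `html.parser` module. The `dom_features` helper and the chosen feature names are hypothetical and serve only as an example; they are not the feature set used by ProxyDetector.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DomFeatureParser(HTMLParser):
    """Collects tag counts and the maximum nesting depth of a page."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag not in VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def dom_features(html):
    """Return a small, illustrative (hypothetical) DOM feature dictionary."""
    parser = DomFeatureParser()
    parser.feed(html)
    parser.close()
    counts = parser.tag_counts
    return {
        "num_tags": sum(counts.values()),
        "num_forms": counts["form"],    # pages often expose a URL entry form
        "num_inputs": counts["input"],
        "num_links": counts["a"],
        "max_depth": parser.max_depth,
    }


sample = ('<html><body><form><input name="url"><input type="submit"></form>'
          '<a href="#">x</a></body></html>')
features = dom_features(sample)
```

A feature dictionary of this kind could be combined with URL-based features and passed to any standard classifier; which features are actually discriminative would have to be established empirically.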