Adoption of SSL/TLS to protect the privacy of web users has become increasingly common. In fact, as of September 2015, more than 68% of top-1M websites deploy SSL/TLS to encrypt their traffic. The transition from HTTP to HTTPS has brought a new challenge for network operators who need to understand the hostnames of encrypted web traffic for various reasons. To meet the challenge, this work develops a novel framework called SFMap, which estimates names of HTTPS servers by analyzing precedent DNS queries/responses in a statistical way. The SFMap framework introduces domain name graph, which can characterize highly dynamic and diverse nature of DNS mechanisms. Such complexity arises from the recent deployment and implementation of DNS ecosystems; i.e., canonical name tricks used by CDNs, the dynamic and diverse nature of DNS TTL settings, and incomplete and unpredictable measurements due to the existence of various DNS caching instances. First, we demonstrate that SFMap establishes good estimation accuracies and outperforms a state-of-the-art approach. We also aim to identify the optimized setting of the SFMap framework. Next, based on the preliminary analysis, we introduce techniques to make the SFMap framework scalable to large-scale traffic data. We validate the effectiveness of the approach using large-scale Internet traffic.
ASJC Scopus subject areas
- Computer Networks and Communications