Most modern Internet services are delivered over the web. A significant share of web transactions is now encrypted, and this shift has made it difficult for network operators to understand their traffic mix. The goal of this study is to enable network operators to infer hostnames within HTTPS traffic, because hostname information is useful for understanding the breakdown of encrypted web traffic. The proposed approach correlates HTTPS flows with DNS queries and responses. Although this approach may appear trivial, recent deployment and implementation practices in the DNS ecosystem have made it a challenging research problem: CDNs use canonical name (CNAME) tricks, DNS TTL settings are dynamic and diverse, and measurements are incomplete because of various caching mechanisms. To tackle these challenges, we introduce the domain name graph (DNG), a formal expression that characterizes the highly dynamic and diverse nature of DNS mechanisms. Furthermore, we have developed a framework called Service-Flow map (SFMap) that works on top of the DNG. SFMap statistically estimates the hostname of an HTTPS server, given a pair of client and server IP addresses. We evaluate the performance of SFMap through extensive analysis using real packet traces collected from two locations with different scales. We demonstrate that SFMap achieves good estimation accuracy and outperforms a state-of-the-art approach.
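To make the idea of correlating DNS responses with HTTPS flows more concrete, the following is a minimal sketch and not the SFMap algorithm itself. It assumes DNS responses have already been parsed into a query name, a CNAME chain, and a list of addresses. The class `NameGraphSketch` and its methods are hypothetical names introduced for illustration. The sketch ignores TTLs and caching effects, which the abstract identifies as the hard part, and ranks candidate hostnames simply by how often they were observed.

```python
from collections import Counter, defaultdict


class NameGraphSketch:
    """Toy per-client graph of DNS answers (hypothetical, not the paper's DNG)."""

    def __init__(self):
        # edges[client][target] counts the names seen pointing at target,
        # where target is either a CNAME alias or a server IP address.
        self.edges = defaultdict(lambda: defaultdict(Counter))

    def observe_dns(self, client_ip, qname, cnames, addresses):
        """Record one DNS response delivered to client_ip.

        cnames is the ordered CNAME chain (possibly empty);
        addresses holds the A/AAAA answers at the end of the chain.
        """
        chain = [qname, *cnames]
        per_client = self.edges[client_ip]
        # Link every name to the next alias in the chain.
        for src, dst in zip(chain, chain[1:]):
            per_client[dst][src] += 1
        # Link the last name in the chain to each resolved address.
        for addr in addresses:
            per_client[addr][chain[-1]] += 1

    def estimate(self, client_ip, server_ip):
        """Return candidate hostnames for a flow, most frequently seen first."""
        per_client = self.edges.get(client_ip)
        if per_client is None:
            return []
        scores = Counter()
        seen = set()
        stack = [server_ip]
        while stack:
            node = stack.pop()
            if node in seen:  # guard against CNAME loops
                continue
            seen.add(node)
            for parent, count in per_client.get(node, {}).items():
                if parent in per_client and per_client[parent]:
                    # parent is itself an alias target: keep walking backward.
                    stack.append(parent)
                else:
                    # parent has no predecessors: treat it as a queried hostname.
                    scores[parent] += count
        return [name for name, _ in scores.most_common()]


# Usage with documentation-range addresses and example domains.
graph = NameGraphSketch()
for _ in range(2):
    graph.observe_dns("10.0.0.1", "www.example.com",
                      ["cdn.example.net"], ["203.0.113.5"])
graph.observe_dns("10.0.0.1", "img.example.org",
                  ["cdn.example.net"], ["203.0.113.5"])
candidates = graph.estimate("10.0.0.1", "203.0.113.5")
print(candidates)
```

Both hostnames share the same CDN alias and server address, so the flow alone cannot distinguish them. The sketch breaks the tie by observation count, which is only a stand-in for the statistical estimation described above.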


  • T. Mori, T. Inoue, A. Shimoda, K. Sato, K. Ishibashi, and S. Goto, “SFMap: Inferring Services over Encrypted Web Flows using Dynamical Domain Name Graphs,” Proceedings of IFIP Traffic Monitoring and Analysis workshop (TMA 2015), LNCS 9053, pp. 126–139, Apr. 2015. (to appear) DOI: 10.1007/978-3-319-17172-2_9
  • T. Mori, T. Inoue, A. Shimoda, K. Sato, K. Ishibashi, and S. Goto, “Statistical Estimation of the Names of HTTPS Servers with Domain Name Graphs,” Computer Communications, vol. 94, pp. 104–113, Nov. 2016.