With the rapid increase in link speeds in recent years, packet sampling has become an attractive and scalable means of collecting flow statistics; however, it also makes inferring the original flow characteristics much more difficult. In this paper, we develop techniques and schemes to identify flows with a very large number of packets (also known as heavy-hitter flows) from sampled flow statistics. Our approach follows a two-stage strategy. We first parametrically estimate the original flow length distribution from the sampled flows. We then identify heavy-hitter flows using Bayes' theorem, with the flow length distribution estimated in the first stage serving as the a priori distribution. We validate and evaluate our approach with publicly available packet traces. We show that our approach provides a flexible framework for striking an appropriate balance between false positives and false negatives at a given sampling frequency.
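The second stage can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state: each packet is sampled independently with probability `p`, so the sampled count follows a binomial distribution; the prior is a truncated discrete power law whose exponent `alpha` stands in for the output of the first stage; and the names `power_law_prior`, `heavy_hitter_posterior`, `threshold`, and `min_posterior` are hypothetical. A flow is flagged when the posterior probability that its original length reaches `threshold` is at least `min_posterior`.

```python
import math


def binom_pmf(y, x, p):
    """P(Y = y | X = x), assuming independent per-packet sampling with 0 < p < 1.

    Computed in log space so that large flow lengths do not overflow.
    """
    if y < 0 or y > x:
        return 0.0
    log_pmf = (
        math.lgamma(x + 1) - math.lgamma(y + 1) - math.lgamma(x - y + 1)
        + y * math.log(p) + (x - y) * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def power_law_prior(alpha, x_max):
    """Hypothetical prior: P(X = x) proportional to x**(-alpha) for x = 1..x_max.

    In the two-stage approach, alpha would come from the first-stage estimate;
    here it is simply passed in.
    """
    weights = [x ** (-alpha) for x in range(1, x_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]  # index i holds P(X = i + 1)


def heavy_hitter_posterior(y, p, prior, threshold):
    """P(X >= threshold | Y = y) by Bayes' theorem over the truncated support."""
    joint = [binom_pmf(y, x, p) * prior[x - 1] for x in range(1, len(prior) + 1)]
    evidence = sum(joint)
    if evidence == 0.0:
        return 0.0
    return sum(joint[threshold - 1:]) / evidence


def is_heavy_hitter(y, p, prior, threshold, min_posterior):
    """Flag a flow when the posterior mass above the threshold is large enough."""
    return heavy_hitter_posterior(y, p, prior, threshold) >= min_posterior


if __name__ == "__main__":
    prior = power_law_prior(alpha=2.0, x_max=2000)  # illustrative values only
    for sampled in (1, 5, 10):
        post = heavy_hitter_posterior(sampled, p=0.01, prior=prior, threshold=500)
        print(sampled, round(post, 4))
```

In this sketch, raising `min_posterior` flags fewer flows, which reduces false positives at the cost of more false negatives, and lowering it does the opposite. This is one way to picture the balance described above for a fixed sampling probability.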
ASJC Scopus subject areas
- Computer Networks and Communications
- Electrical and Electronic Engineering