Estimating top N hosts in cardinality using small memory resources

Keisuke Ishibashi, Tatsuya Mori, Ryoichi Kawahara, Yutaka Hirokawa, Atsushi Kobayashi, Kimihiro Yamamoto, Hitoaki Sakamoto

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Citations (Scopus)

Abstract

We propose a method for finding the N hosts with the highest cardinalities, where cardinality is the number of distinct items associated with a host, such as flows, ports, or peer hosts. The method also estimates those cardinalities. Existing algorithms for finding the top N frequent items can be applied directly to find the N hosts that send the most packets in a packet data stream. Finding the hosts with the N highest cardinalities, however, requires a table of previously seen items for each host in order to check whether the item in an arriving packet is new, which consumes a large amount of memory. Even with existing cardinality estimation methods, cardinality information must still be kept for each host. In this paper, we exploit a property of cardinality estimation: the cardinality of the intersection of multiple data sets can be estimated from the cardinality information of each data set. Using this property, we propose an algorithm that maintains tables not for each host but only for partitioned addresses of hosts, and that estimates the cardinality of a host as the cardinality of the intersection of the sets for its partitioned addresses. We also propose a method for finding the top N hosts in cardinality, which can be monitored to detect anomalous behavior in networks. We evaluate our algorithm using actual backbone traffic data. Although the estimation accuracy of our scheme degrades for small cardinalities, for the top 100 hosts the accuracy of our algorithm with 4,096 tables is almost the same as that obtained with a table for every host.
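The intersection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the sketch type, the partitioning, or the table sizes, so everything below is an assumption made for illustration. The sketch assumes linear-counting bitmaps, a split of a 32-bit address into its high and low 16 bits, a bitmap size `M`, and hashing of the (host, item) pair so that the two partition sets intersect exactly on one host's items. The intersection is then estimated by inclusion-exclusion, |A ∩ B| ≈ |A| + |B| − |A ∪ B|, where the union is the bitwise OR of the two bitmaps.

```python
import hashlib
import math
from collections import defaultdict

M = 1 << 14  # bits per bitmap; an assumed size, not taken from the paper


def bit_index(host: int, item: int) -> int:
    """Hash the (host, item) pair to a bit position in [0, M)."""
    digest = hashlib.blake2b(f"{host}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % M


def estimate(bitmap: int) -> float:
    """Linear-counting estimate: n ~ -M * ln(zero_bits / M)."""
    zeros = M - bin(bitmap).count("1")
    if zeros == 0:
        return float("inf")  # bitmap saturated; M is too small for this load
    return -M * math.log(zeros / M)


class PartitionedCounter:
    """One bitmap per address part instead of one per host (hypothetical class)."""

    def __init__(self) -> None:
        self.high = defaultdict(int)  # keyed by the upper 16 address bits
        self.low = defaultdict(int)   # keyed by the lower 16 address bits

    def add(self, host: int, item: int) -> None:
        bit = 1 << bit_index(host, item)
        self.high[host >> 16] |= bit
        self.low[host & 0xFFFF] |= bit

    def estimate_host(self, host: int) -> float:
        a = self.high.get(host >> 16, 0)
        b = self.low.get(host & 0xFFFF, 0)
        # Inclusion-exclusion; the union bitmap is the bitwise OR.
        return estimate(a) + estimate(b) - estimate(a | b)
```

Under these assumptions, at most 2 × 65,536 bitmaps are needed regardless of how many hosts appear, and a candidate list of hosts could be ranked by `estimate_host` to approximate a top-N list. Because the estimate is a difference of noisy terms, its relative error grows as the true cardinality shrinks, which is consistent with the abstract's remark that accuracy degrades for small cardinalities.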

Original language: English
Title of host publication: ICDEW 2006 - Proceedings of the 22nd International Conference on Data Engineering Workshops
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 0769525717, 9780769525716
DOIs: https://doi.org/10.1109/ICDEW.2006.56
Publication status: Published - 2006
Externally published: Yes
Event: 22nd International Conference on Data Engineering Workshops, ICDEW 2006 - Atlanta, United States
Duration: 2006 Apr 3 to 2006 Apr 7

Other

Other: 22nd International Conference on Data Engineering Workshops, ICDEW 2006
Country: United States
City: Atlanta
Period: 06/4/3 to 06/4/7

Fingerprint

Data storage equipment
Resources
Data streams
Peers

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Information Systems and Management

Cite this

Ishibashi, K., Mori, T., Kawahara, R., Hirokawa, Y., Kobayashi, A., Yamamoto, K., & Sakamoto, H. (2006). Estimating top N hosts in cardinality using small memory resources. In ICDEW 2006 - Proceedings of the 22nd International Conference on Data Engineering Workshops [1623824]. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDEW.2006.56

@inproceedings{73cc7ff6f89b4358883c35fdb949e776,
title = "Estimating top N hosts in cardinality using small memory resources",
author = "Keisuke Ishibashi and Tatsuya Mori and Ryoichi Kawahara and Yutaka Hirokawa and Atsushi Kobayashi and Kimihiro Yamamoto and Hitoaki Sakamoto",
year = "2006",
doi = "10.1109/ICDEW.2006.56",
language = "English",
booktitle = "ICDEW 2006 - Proceedings of the 22nd International Conference on Data Engineering Workshops",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Estimating top N hosts in cardinality using small memory resources

AU - Ishibashi, Keisuke

AU - Mori, Tatsuya

AU - Kawahara, Ryoichi

AU - Hirokawa, Yutaka

AU - Kobayashi, Atsushi

AU - Yamamoto, Kimihiro

AU - Sakamoto, Hitoaki

PY - 2006

Y1 - 2006

UR - http://www.scopus.com/inward/record.url?scp=70349655967&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70349655967&partnerID=8YFLogxK

U2 - 10.1109/ICDEW.2006.56

DO - 10.1109/ICDEW.2006.56

M3 - Conference contribution

AN - SCOPUS:70349655967

BT - ICDEW 2006 - Proceedings of the 22nd International Conference on Data Engineering Workshops

PB - Institute of Electrical and Electronics Engineers Inc.

ER -