Time-decaying Bloom Filters for data streams with skewed distributions

Kai Cheng, Limin Xiang, Haiyan Xu, Mizuho Iwaihara, Mukesh M. Mohania

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

35 Citations (Scopus)

Abstract

Bloom Filters are space-efficient data structures for membership queries over sets. To enable queries for multiplicities of multi-sets, the bitmap in a Bloom Filter is replaced by an array of counters whose values are incremented on each occurrence. In a data stream model, however, data items arrive at varying rates and recent occurrences are often regarded as more significant than past ones. In most data stream applications, it is critical to handle this "time-sensitivity". Furthermore, data streams with skewed distributions are common in many emerging applications, e.g., traffic engineering and billing, intrusion detection, trading surveillance and outlier detection. For such applications, it is inefficient to allocate counters of uniform size to all buckets. In this paper, we present Time-decaying Bloom Filters (TBF), a Bloom Filter that maintains the frequency count for each item in a data stream and in which the value of each counter decays with time. For data streams with highly skewed distributions, we propose a further optimization that dynamically allocates free counters to the "large" items. We performed preliminary experiments to verify the optimization.
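
The idea described in the abstract, an array of counters in place of the bitmap with each counter's value decaying over time, can be sketched as follows. This is only a minimal illustration and not the construction from the paper: the exponential decay rule, the `half_life` parameter, the lazy per-counter timestamps, the salted SHA-256 hashing, and all class and method names are assumptions made for the example. The dynamic allocation of free counters to "large" items is not covered.

```python
import hashlib


class DecayingCountingBloomFilter:
    """Counting Bloom filter whose counters decay exponentially over time.

    Illustrative only: the decay rule, hashing scheme, and parameter names
    are assumptions, not the construction from the paper.
    """

    def __init__(self, num_counters=1024, num_hashes=3, half_life=60.0):
        self.m = num_counters
        self.k = num_hashes
        self.half_life = half_life          # time for a counter to halve (assumed)
        self.counters = [0.0] * self.m
        self.last_update = [0.0] * self.m   # time each counter was last decayed

    def _indexes(self, item):
        # Derive k bucket indexes by salting SHA-256 with the hash number.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def _decay(self, idx, now):
        # Lazily apply the decay accumulated since this counter was last touched.
        elapsed = now - self.last_update[idx]
        if elapsed > 0:
            self.counters[idx] *= 0.5 ** (elapsed / self.half_life)
            self.last_update[idx] = now

    def add(self, item, now):
        # Record one occurrence of `item` at time `now`.
        for idx in set(self._indexes(item)):
            self._decay(idx, now)
            self.counters[idx] += 1.0

    def estimate(self, item, now):
        # The smallest of the item's counters is the least inflated by
        # collisions, so it is returned as the decayed frequency estimate.
        values = []
        for idx in set(self._indexes(item)):
            self._decay(idx, now)
            values.append(self.counters[idx])
        return min(values)
```

With a half-life of 10 time units and no other items in the filter, two insertions of the same item at time 0 give an estimate of 2 at time 0 and an estimate of 1 at time 10. Decaying lazily on access avoids sweeping the whole counter array at every time step, at the cost of storing one timestamp per counter.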

Original language: English
Title of host publication: Proceedings of the IEEE International Workshop on Research Issues in Data Engineering
Editors: J. Han, H. Kawano
Pages: 63-69
Number of pages: 7
Publication status: Published - 2005
Externally published: Yes
Event: 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications, RIDE-SDMA 2005 - Tokyo
Duration: 2005 Apr 3 - 2005 Apr 4

Other

Other: 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications, RIDE-SDMA 2005
City: Tokyo
Period: 05/4/3 - 05/4/4

Fingerprint

Intrusion detection
Data structures
Experiments

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
  • Engineering (miscellaneous)

Cite this

Cheng, K., Xiang, L., Xu, H., Iwaihara, M., & Mohania, M. M. (2005). Time-decaying Bloom Filters for data streams with skewed distributions. In J. Han & H. Kawano (Eds.), Proceedings of the IEEE International Workshop on Research Issues in Data Engineering (pp. 63-69).

Time-decaying Bloom Filters for data streams with skewed distributions. / Cheng, Kai; Xiang, Limin; Xu, Haiyan; Iwaihara, Mizuho; Mohania, Mukesh M.

Proceedings of the IEEE International Workshop on Research Issues in Data Engineering. ed. / J. Han; H. Kawano. 2005. p. 63-69.

Cheng, K, Xiang, L, Xu, H, Iwaihara, M & Mohania, MM 2005, Time-decaying Bloom Filters for data streams with skewed distributions. in J Han & H Kawano (eds), Proceedings of the IEEE International Workshop on Research Issues in Data Engineering. pp. 63-69, 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications, RIDE-SDMA 2005, Tokyo, 05/4/3.

Cheng K, Xiang L, Xu H, Iwaihara M, Mohania MM. Time-decaying Bloom Filters for data streams with skewed distributions. In Han J, Kawano H, editors, Proceedings of the IEEE International Workshop on Research Issues in Data Engineering. 2005. p. 63-69.

Cheng, Kai ; Xiang, Limin ; Xu, Haiyan ; Iwaihara, Mizuho ; Mohania, Mukesh M. / Time-decaying Bloom Filters for data streams with skewed distributions. Proceedings of the IEEE International Workshop on Research Issues in Data Engineering. editor / J. Han ; H. Kawano. 2005. pp. 63-69.

@inproceedings{bdf2e3d681f94bc28203c54a43b21ff8,
title = "Time-decaying Bloom Filters for data streams with skewed distributions",
abstract = "Bloom Filters are space-efficient data structures for membership queries over sets. To enable queries for multiplicities of multi-sets, the bitmap in a Bloom Filter is replaced by an array of counters whose values increment on each occurrence. In a data stream model, however, data items arrive at varying rates and recent occurrences are often regarded as more significant than past ones. In most data stream applications, it is critical to handle this {"}time-sensitivity{"}. Furthermore, data streams with skewed distributions are common in many emerging applications, e.g., traffic engineering and billing, intrusion detection, trading surveillance and outlier detection. For such applications, it is inefficient to allocate counters of uniform size to all buckets. In this paper, we present Time-decaying Bloom Filters (TBF), a Bloom Filter that maintains the frequency count for each item in a data stream, and the value of each counter decays with time. For data streams with highly skewed distributions, we proposed further optimization by allowing dynamically allocating free counters to the {"}large{"} items. We performed preliminary experiments to verify the optimization.",
author = "Kai Cheng and Limin Xiang and Haiyan Xu and Mizuho Iwaihara and Mohania, {Mukesh M.}",
year = "2005",
language = "English",
pages = "63--69",
editor = "J. Han and H. Kawano",
booktitle = "Proceedings of the IEEE International Workshop on Research Issues in Data Engineering",

}

TY - GEN

T1 - Time-decaying Bloom Filters for data streams with skewed distributions

AU - Cheng, Kai

AU - Xiang, Limin

AU - Xu, Haiyan

AU - Iwaihara, Mizuho

AU - Mohania, Mukesh M.

PY - 2005

Y1 - 2005

N2 - Bloom Filters are space-efficient data structures for membership queries over sets. To enable queries for multiplicities of multi-sets, the bitmap in a Bloom Filter is replaced by an array of counters whose values increment on each occurrence. In a data stream model, however, data items arrive at varying rates and recent occurrences are often regarded as more significant than past ones. In most data stream applications, it is critical to handle this "time-sensitivity". Furthermore, data streams with skewed distributions are common in many emerging applications, e.g., traffic engineering and billing, intrusion detection, trading surveillance and outlier detection. For such applications, it is inefficient to allocate counters of uniform size to all buckets. In this paper, we present Time-decaying Bloom Filters (TBF), a Bloom Filter that maintains the frequency count for each item in a data stream, and the value of each counter decays with time. For data streams with highly skewed distributions, we proposed further optimization by allowing dynamically allocating free counters to the "large" items. We performed preliminary experiments to verify the optimization.

UR - http://www.scopus.com/inward/record.url?scp=27144534828&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=27144534828&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:27144534828

SP - 63

EP - 69

BT - Proceedings of the IEEE International Workshop on Research Issues in Data Engineering

A2 - Han, J.

A2 - Kawano, H.

ER -