Abstract
This paper addresses the challenge of identifying super spreaders within large, high-speed data streams. In these streams, data is segmented into flows, with each flow’s spread defined as the number of distinct items it contains. A super spreader is a flow with a notably large spread. Current compact solutions, known as sketches, are designed to fit within the constrained memory of online devices. However, they struggle to track the spread of all flows accurately, because monitoring even a single flow requires substantial memory, a problem that worsens as the number of flows grows. To overcome these limitations, this study proposes a more precise sketch-based approach. Our solution introduces a non-duplicate sampler that eliminates duplicates, so that flow spread can be counted accurately after sampling using only counters. It also incorporates an exponential-weakening decay technique to highlight large flows, markedly enhancing the accuracy of super spreader identification. We offer a comprehensive theoretical analysis of our method. Trace-driven experiments validate that our approach statistically surpasses existing state-of-the-art solutions in identifying super spreaders. It also requires the least time to restore super spreaders and reduces bandwidth consumption by an order of magnitude when offline restoration is conducted remotely.
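To make the definitions in the abstract concrete, the following minimal sketch computes flow spread exactly and reports super spreaders. It is not the sketch-based method proposed in the paper: it is a naive baseline that stores every distinct item per flow, which is the memory cost that sketches are meant to avoid. The function names, the `(flow_id, item)` stream format, and the `threshold` cutoff are assumptions made for illustration.

```python
from collections import defaultdict


def flow_spreads(stream):
    """Map each flow to its spread, i.e. its number of distinct items.

    `stream` is assumed to be an iterable of (flow_id, item) pairs.
    """
    items_per_flow = defaultdict(set)
    for flow_id, item in stream:
        # Adding to a set discards duplicate items within the same flow.
        items_per_flow[flow_id].add(item)
    return {flow_id: len(items) for flow_id, items in items_per_flow.items()}


def super_spreaders(stream, threshold):
    """Return flows whose spread is at least `threshold` (a hypothetical cutoff)."""
    return {
        flow_id: spread
        for flow_id, spread in flow_spreads(stream).items()
        if spread >= threshold
    }


# Toy stream: flow "a" has 3 distinct items, flow "b" has 1.
stream = [("a", 1), ("a", 2), ("a", 2), ("a", 3), ("b", 1), ("b", 1)]
spreads = flow_spreads(stream)
heavy = super_spreaders(stream, threshold=3)
```

Because this baseline keeps one set per flow, its memory grows with the total number of distinct items, which illustrates why compact approximations are needed on memory-constrained devices.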
Original language | English |
---|---|
Pages (from-to) | 3124-3137 |
Number of pages | 14 |
Journal | Proceedings of the VLDB Endowment |
Volume | 17 |
Issue number | 11 |
DOIs | |
State | Published - 2024 |
Event | 50th International Conference on Very Large Data Bases, VLDB 2024 - Guangzhou, China. Duration: Aug 24 2024 → Aug 29 2024 |
Bibliographical note
Publisher Copyright: © 2024, VLDB Endowment. All rights reserved.
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- General Computer Science