Enhancing Accuracy for Super Spreader Identification in High-Speed Data Streams

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

This paper addresses the challenge of identifying super spreaders within large, high-speed data streams. In these streams, data is segmented into flows, with each flow’s spread defined as the number of distinct items it contains. A super spreader is a flow with a notably large spread. Current compact solutions, known as sketches, are designed to fit within the constrained memory of online devices. However, they struggle to accurately track the spread of all flows because monitoring even a single flow requires substantial memory, a problem exacerbated when numerous flows are involved. To overcome these limitations, this study proposes a more precise sketch-based approach. Our solution introduces an innovative non-duplicate sampler that effectively eliminates duplicates, allowing for an accurate post-sampling count of flow spread using only counters. Additionally, it incorporates an exponential-weakening decay technique to highlight large flows, markedly enhancing the accuracy of super spreader identification. We offer a comprehensive theoretical analysis of our method. Trace-driven experiments validate that our approach statistically surpasses existing state-of-the-art solutions in identifying super spreaders. It also requires the least time to restore super spreaders and reduces bandwidth consumption by an order of magnitude when offline restoration is conducted remotely.
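
To make the definitions in the abstract concrete, the following is a minimal sketch of an exact baseline. It is not the paper's sketch-based method. It keeps one set of distinct items per flow, which is precisely the memory cost that compact sketches try to avoid. The function names, the `(flow, item)` pair representation, and the `threshold` cut-off are assumptions made for illustration only.

```python
from collections import defaultdict


def exact_spreads(stream):
    """Return {flow: number of distinct items} for an iterable of (flow, item) pairs."""
    seen = defaultdict(set)
    for flow, item in stream:
        # A set absorbs repeated (flow, item) pairs, so duplicates are not counted twice.
        seen[flow].add(item)
    return {flow: len(items) for flow, items in seen.items()}


def super_spreaders(stream, threshold):
    """Return the flows whose spread is at least `threshold` (a hypothetical cut-off)."""
    return {flow: s for flow, s in exact_spreads(stream).items() if s >= threshold}


# Toy stream: flow "10.0.0.1" has 3 distinct items, flow "10.0.0.2" has 1.
packets = [
    ("10.0.0.1", "a"), ("10.0.0.1", "b"), ("10.0.0.1", "a"),
    ("10.0.0.1", "c"), ("10.0.0.2", "x"), ("10.0.0.2", "x"),
]
print(super_spreaders(packets, threshold=3))  # prints {'10.0.0.1': 3}
```

Memory in this baseline grows with the total number of distinct (flow, item) pairs, which illustrates why an exact approach does not fit the constrained memory the abstract describes.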

Original language: English
Pages (from-to): 3124-3137
Number of pages: 14
Journal: Proceedings of the VLDB Endowment
Volume: 17
Issue number: 11
State: Published - 2024
Event: 50th International Conference on Very Large Data Bases, VLDB 2024 - Guangzhou, China
Duration: Aug 24 2024 to Aug 29 2024

Bibliographical note

Publisher Copyright:
© 2024, VLDB Endowment. All rights reserved.

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science
