Comparing synopsis techniques for approximate spatial data analysis

A. B. Siddique, Ahmed Eldawy, Vagelis Hristidis

Research output: Contribution to journal › Conference article › peer-review

14 Scopus citations

Abstract

The increasing amount of spatial data calls for new scalable query processing techniques. One of the techniques that are getting attention is data synopsis, which summarizes the data using samples or histograms and computes an approximate answer based on the synopsis. This general technique is used in selectivity estimation, clustering, partitioning, load balancing, and visualization, among others. This paper experimentally studies four spatial data synopsis techniques for three common data analysis problems, namely, selectivity estimation, k-means clustering, and spatial partitioning. We run an extensive experimental evaluation on both real and synthetic datasets of up to 2.7 billion records to study the trade-offs between the synopsis methods and their applicability in big spatial data analysis. For each of the three problems, we compare with baseline techniques that operate on the whole dataset and evaluate the synopsis generation time, the time for computing an approximate answer on the synopsis, and the accuracy of the result. We present our observations about when each synopsis technique performs best.
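As a rough illustration of the synopsis idea described in the abstract, the sketch below builds two synopses of a 2D point set (a uniform random sample and a uniform grid histogram) and uses each to approximate the selectivity of a rectangular range query. It is not the paper's implementation. All function names, the unit-square data domain, the sample ratio, and the grid resolution are assumptions made for this example.

```python
import random

def sample_synopsis(points, ratio, seed=0):
    """Uniform random sample of the points (hypothetical helper)."""
    rng = random.Random(seed)
    k = max(1, int(len(points) * ratio))
    return rng.sample(points, k)

def histogram_synopsis(points, cells):
    """cells x cells grid of point counts; assumes points lie in the unit square."""
    grid = [[0] * cells for _ in range(cells)]
    for x, y in points:
        i = min(int(x * cells), cells - 1)
        j = min(int(y * cells), cells - 1)
        grid[i][j] += 1
    return grid

def in_range(p, q):
    """True if point p = (x, y) lies inside query rectangle q = (x1, y1, x2, y2)."""
    x, y = p
    x1, y1, x2, y2 = q
    return x1 <= x <= x2 and y1 <= y <= y2

def exact_selectivity(points, q):
    """Baseline: scan the whole dataset."""
    return sum(in_range(p, q) for p in points) / len(points)

def estimate_from_sample(sample, q):
    """Fraction of sampled points that fall inside the query."""
    return sum(in_range(p, q) for p in sample) / len(sample)

def estimate_from_histogram(grid, q, total):
    """Weight each cell count by the fraction of the cell covered by the query,
    assuming points are spread uniformly inside each cell."""
    cells = len(grid)
    size = 1.0 / cells
    x1, y1, x2, y2 = q
    estimate = 0.0
    for i in range(cells):
        overlap_x = max(0.0, min(x2, (i + 1) * size) - max(x1, i * size))
        if overlap_x == 0.0:
            continue
        for j in range(cells):
            overlap_y = max(0.0, min(y2, (j + 1) * size) - max(y1, j * size))
            estimate += grid[i][j] * (overlap_x * overlap_y) / (size * size)
    return estimate / total

# Synthetic uniform data, used only to exercise the functions above.
_rng = random.Random(42)
points = [(_rng.random(), _rng.random()) for _ in range(20000)]
query = (0.2, 0.2, 0.6, 0.7)

sample = sample_synopsis(points, ratio=0.05)
grid = histogram_synopsis(points, cells=20)

exact = exact_selectivity(points, query)
from_sample = estimate_from_sample(sample, query)
from_histogram = estimate_from_histogram(grid, query, len(points))
```

Both estimators read only the synopsis, so their cost depends on the sample size or the number of grid cells rather than on the size of the full dataset. The trade-off is accuracy, which is the kind of comparison the abstract says the paper evaluates.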

Original language: English
Pages (from-to): 1583-1596
Number of pages: 14
Journal: Proceedings of the VLDB Endowment
Volume: 12
Issue number: 11
DOIs
State: Published - 2019
Event: 45th International Conference on Very Large Data Bases, VLDB 2019 - Los Angeles, United States
Duration: Aug 26 2019 → Aug 30 2019

Bibliographical note

Publisher Copyright:
© 2019, is held by the owner/author(s).

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science
