A framework for management of semistructured probabilistic data

Wenzhong Zhao, Alex Dekhtyar, Judy Goldsmith

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

This paper describes the theoretical framework and implementation of a database management system for storing and manipulating diverse probability distributions of discrete random variables with finite domains, and associated information. A formal Semistructured Probabilistic Object (SPO) data model and a Semistructured Probabilistic Query Algebra (SP-algebra) are proposed. The SP-algebra supports standard database queries as well as some specific to probabilities, such as conditionalization and marginalization. Thus, the Semistructured Probabilistic Database may be used as a backend to any application that involves the management of large quantities of probabilistic information, such as building stochastic models. The implementation uses XML encoding of SPOs to facilitate communication with diverse applications. The database management system has been implemented on top of a relational DBMS. The translation of SP-algebra queries into relational queries is discussed here, and the results of initial experiments evaluating the system are reported.
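
The two probability-specific operations named in the abstract can be illustrated on a small joint distribution. The sketch below uses the standard probability-theoretic definitions of marginalization and conditionalization; it is not the paper's SP-algebra, whose operators act on SPOs and may differ in detail. The variable names, the values in `joint`, and the helper functions `marginalize` and `conditionalize` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical joint distribution over two discrete variables with finite
# domains. Keys are assignments in the order given by `variables`.
variables = ("weather", "traffic")
joint = {
    ("sun", "light"): 0.4,
    ("sun", "heavy"): 0.1,
    ("rain", "light"): 0.2,
    ("rain", "heavy"): 0.3,
}


def marginalize(table, variables, keep):
    """Sum out every variable that is not listed in `keep`."""
    idx = [variables.index(v) for v in keep]
    out = defaultdict(float)
    for assignment, p in table.items():
        out[tuple(assignment[i] for i in idx)] += p
    return dict(out)


def conditionalize(table, variables, var, value):
    """Return the distribution over the remaining variables given var == value."""
    i = variables.index(var)
    selected = {a: p for a, p in table.items() if a[i] == value}
    total = sum(selected.values())
    if total == 0:
        raise ValueError("conditioning event has probability zero")
    # Drop the conditioned variable from each key and renormalize.
    return {a[:i] + a[i + 1:]: p / total for a, p in selected.items()}


weather_marginal = marginalize(joint, variables, ("weather",))
traffic_given_rain = conditionalize(joint, variables, "weather", "rain")
```

Here `weather_marginal` is a distribution over `weather` alone, and `traffic_given_rain` is a distribution over `traffic` restricted to the rows where `weather` is `"rain"` and rescaled to sum to one.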

Original language: English
Pages (from-to): 293-332
Number of pages: 40
Journal: Journal of Intelligent Information Systems
Volume: 25
Issue number: 3
State: Published - Nov 2005

Bibliographical note

Funding Information:
This work was partially supported by NSF grants CCR-0100040, ITR-0325063, and ITR-0219924. We’d like to thank the anonymous reviewers of our conference papers, whose suggestions improved this paper. We also want to thank V.S. Subrahmanian for helpful comments and the students working in the Bayesian Advisor group for their input at various stages, and their fabulous energy.

Keywords

  • Data models
  • Probabilistic databases
  • Query algebras
  • Semistructured data

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence
