This paper describes the theoretical framework and implementation of a database management system for storing and manipulating diverse probability distributions of discrete random variables with finite domains, together with associated information. A formal Semistructured Probabilistic Object (SPO) data model and a Semistructured Probabilistic Query Algebra (SP-algebra) are proposed. The SP-algebra supports standard database queries as well as operations specific to probabilities, such as conditionalization and marginalization. Thus, the Semistructured Probabilistic Database may be used as a backend to any application that involves the management of large quantities of probabilistic information, such as building stochastic models. The implementation uses an XML encoding of SPOs to facilitate communication with diverse applications. The database management system has been implemented on top of a relational DBMS. The translation of SP-algebra queries into relational queries is discussed, and the results of initial experiments evaluating the system are reported.
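The two probability-specific operations named in the abstract can be illustrated on a small joint distribution over discrete variables with finite domains. The following is a minimal sketch using the textbook meaning of marginalization and conditionalization; it is not the paper's SP-algebra definition or its implementation. The function names, the table representation (a dict from value tuples to probabilities), and the example variables `weather` and `traffic` are all assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def marginalize(variables, table, keep):
    """Sum out every variable not listed in `keep` (hypothetical helper)."""
    idx = [variables.index(v) for v in keep]
    out = defaultdict(Fraction)  # Fraction() is an exact zero
    for assignment, p in table.items():
        out[tuple(assignment[i] for i in idx)] += p
    return list(keep), dict(out)


def conditionalize(variables, table, var, value):
    """Condition on var == value: select matching rows, drop the column,
    and renormalize (textbook conditioning; an assumption, not necessarily
    the exact SP-algebra semantics)."""
    pos = variables.index(var)
    rest = [v for v in variables if v != var]
    selected = {
        a[:pos] + a[pos + 1:]: p
        for a, p in table.items()
        if a[pos] == value
    }
    total = sum(selected.values())
    if total == 0:
        raise ValueError("conditioning event has probability zero")
    return rest, {a: p / total for a, p in selected.items()}


# Illustrative joint distribution P(weather, traffic); values are made up.
variables = ["weather", "traffic"]
joint = {
    ("sun", "light"): Fraction(2, 5),
    ("sun", "heavy"): Fraction(1, 5),
    ("rain", "light"): Fraction(1, 10),
    ("rain", "heavy"): Fraction(3, 10),
}

marg_vars, marg = marginalize(variables, joint, ["weather"])
cond_vars, cond = conditionalize(variables, joint, "traffic", "heavy")
```

Exact fractions are used so that the resulting distributions sum to exactly 1. A system like the one described would additionally carry the associated context information along with each distribution, which this sketch omits.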
| Number of pages | 40 |
| Journal | Journal of Intelligent Information Systems |
| State | Published - Nov 2005 |
Bibliographical note
Funding Information:
This work was partially supported by NSF grants CCR-0100040, ITR-0325063, and ITR-0219924. We’d like to thank the anonymous reviewers of our conference papers, whose suggestions improved this paper. We also want to thank V.S. Subrahmanian for helpful comments and the students working in the Bayesian Advisor group for their input at various stages, and their fabulous energy.
Keywords
- Data models
- Probabilistic databases
- Query algebras
- Semistructured data
ASJC Scopus subject areas
- Information Systems
- Hardware and Architecture
- Computer Networks and Communications
- Artificial Intelligence