Abstract
Comparing and contrasting subtle historical patterns is central to time series analysis. Here we introduce a new approach to quantify deviations in the underlying hidden stochastic generators of sequential discrete-valued data streams. The proposed measure is universal in the sense that we can compare data streams without any feature engineering step and without the need for any hyperparameters. Our core idea is the generalization of the Kullback–Leibler divergence, often used to compare probability distributions, to a notion of divergence between finite-valued ergodic stationary stochastic processes. Using this notion of process divergence, we craft a measure of deviation on finite sample paths, which we call the sequence likelihood divergence (SLD), that approximates a metric on the space of the underlying generators within a well-defined class of discrete-valued stochastic processes. We compare the performance of SLD against state-of-the-art approaches, e.g., dynamic time warping (Petitjean et al. in Pattern Recognit 44(3):678–693, 2011), on synthetic data, on real-world applications with electroencephalogram data and in gait recognition, and on diverse time-series classification problems from the University of California, Riverside time series classification archive (Thanawin Rakthanmanon and Westover). We demonstrate that the new tool is on par or better in classification accuracy, while being significantly faster in comparable implementations. With SLD released in the public domain, we are hopeful that it will enhance the standard toolbox used in classification, clustering and inference problems in time series analysis.
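
For orientation, the classical Kullback–Leibler divergence that the abstract takes as its starting point can be sketched as below. This is only the standard distribution-level quantity applied to symbol frequencies, not the SLD itself: it ignores temporal order, which is exactly what the process-level generalization is meant to capture. The function names `kl_divergence` and `empirical_distribution` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def empirical_distribution(sequence):
    """Relative frequency of each symbol in a discrete-valued sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    return {symbol: count / n for symbol, count in counts.items()}


def kl_divergence(p, q):
    """Classical KL divergence D(p || q) in bits for two distributions given as dicts.

    Illustrative only: this compares symbol frequencies and is NOT the
    sequence likelihood divergence described in the abstract.
    """
    total = 0.0
    for symbol, p_x in p.items():
        if p_x == 0.0:
            continue  # terms with p(x) = 0 contribute nothing
        q_x = q.get(symbol, 0.0)
        if q_x == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_x * math.log2(p_x / q_x)
    return total


# Two short symbol streams over the alphabet {a, b}
stream_1 = "abababab"   # frequencies: a = 0.50, b = 0.50
stream_2 = "abbbabbb"   # frequencies: a = 0.25, b = 0.75

p = empirical_distribution(stream_1)
q = empirical_distribution(stream_2)
d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
```

Note that `d_pq` and `d_qp` differ, since the KL divergence is not symmetric; this is one reason a divergence has to be adapted before it can approximate a metric.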
Original language | English |
---|---|
Pages (from-to) | 3079-3098 |
Number of pages | 20 |
Journal | Knowledge and Information Systems |
Volume | 65 |
Issue number | 7 |
DOIs | |
State | Published - Jul 2023 |
Bibliographical note
Publisher Copyright: © 2023, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
Keywords
- Dynamic time warping
- Probabilistic finite state automata
- Time series clustering
- Universal metric
ASJC Scopus subject areas
- Software
- Information Systems
- Human-Computer Interaction
- Hardware and Architecture
- Artificial Intelligence