Schäfer, Jörg Peter
(2022)
*Efficient Nearest Neighbor Search on Metric Time Series.*
Dissertation, Humboldt-Universität zu Berlin.

PDF
1MB |

## Abstract

While Deep-Learning approaches beat Nearest-Neighbor classifiers in an increasing number of areas, searching existing uncertain data remains an exclusive task for similarity search. Numerous specific solutions exist for different types of data and queries. This thesis aims at finding fast and general solutions for searching and indexing arbitrarily typed time series. A time series is considered a sequence of elements where the elements' order matters but not their actual time stamps. Since this thesis focuses on measuring distances between time series, the metric space is the most appropriate concept where the time series' elements come from. Hence, this thesis mainly considers metric time series as data type. Simple examples include time series in Euclidean vector spaces or graphs. For general similarity search solutions in time series, two primitive comparison semantics need to be distinguished, the first of which compares the time series' trajectories ignoring time warping. A ubiquitous example of such a distance function is the Dynamic Time Warping distance (DTW) developed in the area of speech recognition. The Dog Keeper distance (DK) is another time-warping distance that, opposed to DTW, is truly invariant under time warping and yields a metric space. After canonically extending DTW to accept multi-dimensional time series, this thesis contributes a new algorithm computing DK that outperforms DTW on time series in high-dimensional vector spaces by more than one order of magnitude. An analytical study of both distance functions reveals the reasons for the superiority of DK over DTW in high-dimensional spaces. The second comparison semantic compares time series in Euclidean vector spaces regardless of their position or orientation. This thesis proposes the Congruence distance that is the Euclidean distance minimized under all isometric transformations; thus, it is invariant under translation, rotation, and reflection of the time series and therefore disregards the position or orientation of the time series. A proof contributed in this thesis shows that there can be no efficient algorithm computing this distance function (unless P=NP). Therefore, this thesis contributes the Delta distance, a metric distance function serving as a lower bound for the Congruence distance. While the Delta distance has quadratic time complexity, the provided evaluation shows a speedup of more than two orders of magnitude against the Congruence distance. Furthermore, the Delta distance is shown to be tight on random time series, although the tightness can be arbitrarily bad in corner-case situations. Orthogonally to the previous mentioned comparison semantics, similarity search on time series consists of two different types of queries: whole sequence matching and subsequence search. Metric index structures (e. g., the M-Tree) only provide whole matching queries natively. This thesis contributes the concept of metric subset spaces and the SuperM-Tree for indexing metric subset spaces as a generic solution for subsequence search. Examples for metric subset spaces include subsequence search regarding the distance functions from the comparison semantics mentioned above. The provided evaluation shows that the SuperM-Tree outperforms a linear search by multiple orders of magnitude.

Item URL in elib: | https://elib.dlr.de/193908/ | ||||||||
---|---|---|---|---|---|---|---|---|---|

Document Type: | Thesis (Dissertation) | ||||||||

Title: | Efficient Nearest Neighbor Search on Metric Time Series | ||||||||

Authors: |
| ||||||||

Date: | December 2022 | ||||||||

Refereed publication: | Yes | ||||||||

Open Access: | Yes | ||||||||

Status: | Published | ||||||||

Keywords: | Time Series, DTW, DK, Metric Spaces, Metric Index Structures | ||||||||

Institution: | Humboldt-Universität zu Berlin | ||||||||

Department: | Mathematisch-Naturwissenschaftliche Fakultät | ||||||||

HGF - Research field: | other | ||||||||

HGF - Program: | other | ||||||||

HGF - Program Themes: | other | ||||||||

DLR - Research area: | no assignment | ||||||||

DLR - Program: | no assignment | ||||||||

DLR - Research theme (Project): | no assignment | ||||||||

Location: | Berlin-Adlershof | ||||||||

Institutes and Institutions: | Institute of Transportation Systems | ||||||||

Deposited By: | Schäfer, Jörg Peter | ||||||||

Deposited On: | 23 Feb 2023 14:16 | ||||||||

Last Modified: | 23 Feb 2023 14:19 |

Repository Staff Only: item control page