TSseek: Regular Expression-Based Similarity Search for Distributed Time Series Datasets

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Xiaoshuai Li, Khalid Alnuaim, Mohamed Y. Eltabakh, Elke A. Rundensteiner

Key Points

arXiv:2606.09824v1 (new submission)

Abstract: Similarity search is a fundamental operation in time series analysis. Most existing techniques, however, require users to supply a precise sequence of values (typically an entire time series object) as the query input. This rigid requirement limits real-world use, where users often want to express patterns, trends, or value ranges instead. Flexible, pattern-based search has been explored in text retrieval and complex event processing, but remains underexplored for large-scale distributed time series. To close this gap, we propose TSseek, a regular-expression-powered search framework for distributed time series datasets. TSseek's query language lets users compose patterns from trends, value ranges, and wildcard segments. We show that conventional approximation techniques (e.g., PAA and SAX) and their index structures are ill-suited for such queries because they cannot operate on regular-expression query constructs. TSseek instead maps time series objects and query constructs into the same space: it approximates each time series object as a sequence of line segments that retain both trend (slope direction) and value range, and translates each query construct into a bounding rectangle. To support efficient processing, we build TSseek-X, a distributed spatial index over the time series segments. TSseek supports two fundamental query types: whole-matching queries (over entire series) and subsequence-matching queries (over arbitrary windows within a series). Across benchmark and real-world datasets, full-scan, model-based, and SAX-based baselines all sacrifice either accuracy or speed, whereas TSseek returns exact answers efficiently. For subsequence workloads, it also achieves significant speedups over state-of-the-art subsequence-matching engines.
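The core idea in the abstract, putting data segments and query constructs into one shared space and then testing them for overlap, can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration and is not TSseek's implementation. The fixed-length segmentation, the (slope, value) plane, the `WILDCARD` marker, and the helpers `segment_rects`, `trend`, `whole_match` and `subsequence_match` are all hypothetical names. In place of the distributed TSseek-X index, the sketch uses a plain rectangle-overlap test and memoized backtracking over segments. Overlap serves only as a candidate filter, so an exact system would still need to verify candidates against the raw values.

```python
import math
from dataclasses import dataclass
from functools import lru_cache
from typing import List, Optional, Sequence

INF = math.inf


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in a hypothetical (slope, value) plane."""
    slope_lo: float
    slope_hi: float
    val_lo: float
    val_hi: float

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles overlap iff their intervals overlap on both axes.
        return (self.slope_lo <= other.slope_hi and other.slope_lo <= self.slope_hi
                and self.val_lo <= other.val_hi and other.val_lo <= self.val_hi)


def segment_rects(series: Sequence[float], seg_len: int) -> List[Rect]:
    """Approximate a series by fixed-length segments (an assumed, simplified
    segmentation). Each segment keeps its slope (trend) and its min/max values."""
    if seg_len < 2:
        raise ValueError("seg_len must be at least 2")
    rects = []
    for start in range(0, len(series) - seg_len + 1, seg_len):
        window = series[start:start + seg_len]
        slope = (window[-1] - window[0]) / (seg_len - 1)
        # A segment becomes a degenerate rectangle: one slope, a range of values.
        rects.append(Rect(slope, slope, min(window), max(window)))
    return rects


def trend(direction: str, lo: float = -INF, hi: float = INF, eps: float = 1e-9) -> Rect:
    """Translate a construct such as 'up within [lo, hi]' into a rectangle."""
    slopes = {"up": (eps, INF), "down": (-INF, -eps), "any": (-INF, INF)}
    s_lo, s_hi = slopes[direction]
    return Rect(s_lo, s_hi, lo, hi)


WILDCARD = None  # hypothetical marker: zero or more arbitrary segments


def whole_match(pattern: Sequence[Optional[Rect]], rects: Sequence[Rect]) -> bool:
    """True if the pattern consumes the entire segment sequence."""
    pattern, rects = tuple(pattern), tuple(rects)

    @lru_cache(maxsize=None)
    def go(p: int, r: int) -> bool:
        if p == len(pattern):
            return r == len(rects)
        if pattern[p] is WILDCARD:
            # Let the wildcard absorb 0, 1, 2, ... of the remaining segments.
            return any(go(p + 1, k) for k in range(r, len(rects) + 1))
        return r < len(rects) and pattern[p].intersects(rects[r]) and go(p + 1, r + 1)

    return go(0, 0)


def subsequence_match(pattern: Sequence[Optional[Rect]], rects: Sequence[Rect]) -> bool:
    """Segment-granularity stand-in for subsequence matching: allow any
    segments before and after the pattern."""
    return whole_match([WILDCARD, *pattern, WILDCARD], rects)


if __name__ == "__main__":
    series = [0, 1, 2, 3, 4, 4, 4, 4, 3, 2, 1, 0]  # rise, plateau, fall
    rects = segment_rects(series, seg_len=4)
    print(whole_match([trend("up", 0, 5), WILDCARD, trend("down", 0, 5)], rects))  # True
    print(subsequence_match([trend("up", 10, 20)], rects))  # False
```

Wrapping a pattern in leading and trailing wildcards is a simple way to express subsequence search at segment granularity. Matching windows that start in the middle of a segment, which "arbitrary windows" implies, would need a finer decomposition than this sketch uses.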

Originally published by arXiv CS.
