Kairos: Towards Adaptive and Generalizable Time Series Foundation Models

Preprint

¹ShanghaiTech University   ²Ant Group

Kairos achieves superior performance with fewer parameters on zero-shot forecasting benchmarks.

Kairos Performance vs. Parameters

(a) The trade-off between performance (MASE) and the number of parameters on GIFT-Eval: the Kairos series (red) achieves better performance with significantly fewer parameters than other models. (b) and (c) highlight the large variation in information density, both across datasets and within individual time series. (d) contrasts different tokenization strategies.

Abstract

Time series foundation models (TSFMs) have emerged as a powerful paradigm for time series analysis, driven by large-scale pretraining on diverse data corpora. However, time series inherently exhibit heterogeneous information density over time, influenced by system states and signal complexity, which poses significant modeling challenges, especially in zero-shot scenarios. Current TSFMs rely on non-adaptive processing pipelines that fail to capture this dynamic nature. For example, common tokenization strategies such as fixed-size patching enforce a rigid observational granularity, limiting their ability to adapt to varying information densities. Similarly, conventional positional encodings impose a uniform temporal scale, making it difficult to model diverse periodicities and trends across series. To overcome these limitations, we propose Kairos, a flexible TSFM framework that integrates a dynamic patching tokenizer and an instance-adaptive positional embedding. Kairos adaptively selects tokenization granularity and tailors positional encodings to the unique characteristics of each time series instance. Trained on a large-scale Predictability-Stratified Time Series (PreSTS) corpus comprising over 300 billion time points and adopting a multi-patch prediction strategy at inference, Kairos achieves superior performance with far fewer parameters on two common zero-shot benchmarks, GIFT-Eval and the Time-Series-Library (TSLib) benchmark, consistently outperforming established methods across diverse tasks.
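To make the fixed-granularity limitation concrete, the sketch below (an illustration, not code from the paper) tokenizes a series with a fixed patch size and measures a crude proxy for local information density, a rolling standard deviation; both the proxy and the example series are assumptions made for this demonstration.

```python
import numpy as np

def fixed_patch_tokenize(series: np.ndarray, patch_size: int) -> np.ndarray:
    """Standard fixed-size patching: split a 1-D series into equal-length patches,
    so every region receives the same observational granularity."""
    usable = (len(series) // patch_size) * patch_size
    return series[:usable].reshape(-1, patch_size)

def local_variability(series: np.ndarray, window: int = 32) -> np.ndarray:
    """Rolling standard deviation as a crude proxy for local information density
    (an assumption of this illustration, not a definition from the paper)."""
    padded = np.pad(series, (window - 1, 0), mode="edge")
    return np.array([padded[i:i + window].std() for i in range(len(series))])

# A series whose complexity changes halfway through: a slow sine, then a fast noisy sine.
rng = np.random.default_rng(0)
t = np.arange(1024)
series = np.where(t < 512, np.sin(t / 20.0), np.sin(t / 4.0) + 0.5 * rng.standard_normal(t.size))

patches = fixed_patch_tokenize(series, patch_size=32)
density = local_variability(series)
print(patches.shape)                               # (32, 32): identical granularity everywhere
print(density[:512].mean(), density[512:].mean())  # low vs. high local variability
```

A dynamic tokenizer would instead be free to use coarser patches in the smooth first half and finer patches in the volatile second half.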

Methodology

The Kairos architecture consists of three main components. First, the input time series is tokenized by a Mixture-of-Size Dynamic Patching (MoS-DP) module, which extracts multi-granularity local information. The resulting token embeddings are then processed by a Transformer encoder that uses our proposed Instance-Adaptive Rotary Position Embedding (IAROPE) to model complex temporal relationships. Finally, a Transformer decoder applies a multi-patch prediction strategy to produce the forecast.

Kairos Architecture

The overall architecture of Kairos, showing the MoS-DP and IAROPE modules.
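The following PyTorch sketch illustrates the two adaptive ingredients at a conceptual level: a router that mixes patch embeddings computed at several candidate sizes, and a rotary position embedding whose frequencies are rescaled per instance. All module names, shapes, and the routing/rescaling details here are illustrative assumptions, not Kairos's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfSizePatchTokenizer(nn.Module):
    """Illustrative mixture-of-size patching: a lightweight router scores several
    candidate patch sizes for each region and mixes the corresponding embeddings.
    This sketches the general idea only; it is not the MoS-DP module from the paper."""

    def __init__(self, patch_sizes=(8, 16, 32), d_model=128, region_len=32):
        super().__init__()
        assert all(region_len % p == 0 for p in patch_sizes)
        self.patch_sizes, self.region_len = patch_sizes, region_len
        self.embeds = nn.ModuleList(nn.Linear(p, d_model) for p in patch_sizes)
        self.router = nn.Linear(region_len, len(patch_sizes))

    def forward(self, x):                                     # x: (batch, length)
        B, L = x.shape
        regions = x.view(B, L // self.region_len, self.region_len)
        weights = F.softmax(self.router(regions), dim=-1)     # (B, R, n_sizes)
        mixed = torch.zeros(B, regions.shape[1], self.embeds[0].out_features)
        for i, p in enumerate(self.patch_sizes):
            patches = regions.reshape(B, -1, p)               # finer split of each region
            tokens = self.embeds[i](patches)                  # (B, R * region_len // p, d_model)
            tokens = tokens.view(B, regions.shape[1], -1, tokens.shape[-1]).mean(dim=2)
            mixed = mixed + weights[..., i:i + 1] * tokens    # weight each granularity per region
        return mixed                                          # (B, R, d_model)

def instance_adaptive_rope(q, k, scale):
    """Rotary-style position embedding whose rotation frequencies are rescaled by a
    per-instance factor `scale` (e.g., derived from each series); q, k: (B, H, T, D)."""
    B, H, T, D = q.shape
    base = 1.0 / (10000.0 ** (torch.arange(0, D, 2, dtype=torch.float32) / D))
    freqs = base[None, :] * scale[:, None]                    # (B, D/2) per-series frequencies
    angles = torch.arange(T, dtype=torch.float32)[None, :, None] * freqs[:, None, :]
    cos, sin = angles.cos()[:, None], angles.sin()[:, None]   # (B, 1, T, D/2)

    def rotate(x):
        x1, x2 = x[..., 0::2], x[..., 1::2]                   # rotate each (even, odd) pair
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

    return rotate(q), rotate(k)

# Usage sketch: tokenize a batch of series, then rotate queries/keys with a per-series scale.
tokens = MixtureOfSizePatchTokenizer()(torch.randn(4, 512))   # -> (4, 16, 128)
q = k = torch.randn(4, 8, tokens.shape[1], 64)
q_rot, k_rot = instance_adaptive_rope(q, k, scale=torch.rand(4) + 0.5)
```

In this sketch the patch-size embeddings are mixed softly; a hard (top-1) selection per region is an equally valid design choice and closer in spirit to picking a single granularity.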

GIFT-Eval Benchmark Results

Performance on the GIFT-Eval benchmark using normalized MASE and CRPS, where lower values indicate higher forecasting accuracy. The baseline models fall into three categories: statistical methods, deep learning (DL) models, and time series foundation models (TSFMs). TSFMs are further subdivided according to whether their training data included the test data (test-data leakage) or not (zero-shot).

|  | Seasonal Naïve | DLinear | PTST. | TTM | Chronos | Chronos Bolt | TimesFM | Moirai | VisionTS | Ying. | Toto | Sundial | Kairos (ours) | Kairos (ours) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| #Params | - | - | - | 5M | 709M | 205M | 500M | 311M | 112M | 300M | 151M | 128M | 23M | 50M |
| MASE | 1.000 | 1.061 | 0.849 | 1.020 | 0.870 | 0.808 | 0.758 | 0.875 | 0.863 | 0.798 | 0.750 | 0.750 | 0.748 | 0.742 |
| CRPS | 1.000 | 0.846 | 0.587 | 0.873 | 0.574 | 0.574 | 0.550 | 0.599 | 0.755 | 0.548 | 0.517 | 0.559 | 0.554 | 0.548 |
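For reference, the snippet below shows how MASE is typically computed and one common way per-dataset scores are normalized so that the Seasonal Naïve baseline reads 1.000. The geometric-mean aggregation is an assumption made for illustration; consult the GIFT-Eval protocol for the exact procedure.

```python
import numpy as np

def mase(y_true, y_pred, y_insample, season: int = 1) -> float:
    """Mean Absolute Scaled Error: forecast MAE divided by the in-sample MAE of a
    seasonal-naive forecaster, so 1.0 means parity with that naive baseline."""
    scale = np.mean(np.abs(y_insample[season:] - y_insample[:-season]))
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))) / scale)

def normalize_to_baseline(model_scores, baseline_scores) -> float:
    """Aggregate per-dataset scores as a geometric mean of ratios to a baseline
    (assumed aggregation for illustration; the baseline itself then reads 1.000)."""
    ratios = np.asarray(model_scores, dtype=float) / np.asarray(baseline_scores, dtype=float)
    return float(np.exp(np.mean(np.log(ratios))))

# Example: per-dataset MASE for a model vs. Seasonal Naive; < 1.0 means better than the baseline.
print(normalize_to_baseline([0.60, 1.10, 0.90], [0.80, 1.20, 1.00]))
```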

TSLib Benchmark Results

Kairos demonstrates remarkable zero-shot forecasting capability, outperforming both recent advanced TSFMs and the majority of full-shot deep learning models on the TSLib benchmark. The charts below show the Mean Absolute Error (MAE) and Mean Squared Error (MSE) averaged across datasets, where lower values indicate better performance.


Zero-shot forecasting performance on TSLib. Kairos (Ours) models consistently achieve the lowest error rates.
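A minimal sketch of how such cross-dataset averages can be computed in a zero-shot setting is shown below. Here `forecast_fn`, the dataset dictionary, and the single fixed horizon are hypothetical placeholders (not Kairos's actual API or the TSLib evaluation script), since TSLib evaluates several horizons and splits per dataset.

```python
import numpy as np

def mae(y_true, y_pred) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

def mse(y_true, y_pred) -> float:
    return float(np.mean((y_true - y_pred) ** 2))

def evaluate_zero_shot(forecast_fn, datasets, horizon: int):
    """Average MAE/MSE across datasets for a zero-shot forecaster.

    forecast_fn(context, horizon) is a hypothetical prediction call standing in for
    any pretrained model; datasets maps a name to (context, target) arrays.
    """
    per_dataset = {}
    for name, (context, target) in datasets.items():
        pred = forecast_fn(context, horizon)
        per_dataset[name] = (mae(target[:horizon], pred), mse(target[:horizon], pred))
    avg_mae = float(np.mean([m for m, _ in per_dataset.values()]))
    avg_mse = float(np.mean([s for _, s in per_dataset.values()]))
    return per_dataset, avg_mae, avg_mse

# Usage sketch with a trivial last-value forecaster as a stand-in model.
naive = lambda context, h: np.repeat(context[-1], h)
data = {"toy": (np.sin(np.arange(200) / 10.0), np.sin(np.arange(200, 296) / 10.0))}
print(evaluate_zero_shot(naive, data, horizon=96))
```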