Kairos: Toward Adaptive and Parameter-Efficient Time Series Foundation Models

Preprint

¹ShanghaiTech University ²Ant Group
*Equal contribution
Corresponding author

Kairos achieves superior performance with fewer parameters on zero-shot forecasting benchmarks.


Figure 1. (a) Comparison on the GIFT-Eval benchmark demonstrates that Kairos achieves superior zero-shot forecasting performance (lower normalized MASE) while requiring significantly fewer parameters than existing TSFMs. (b)(c) Significant variation exists in information density (signal complexity) across and within different time series datasets. (d) Unlike existing TSFMs that use point-wise or fixed-size patching, Kairos uses Mixture-of-Size Tokenization to dynamically adapt to information density.

Abstract

Inherent temporal heterogeneity, such as varying sampling densities and periodic structures, has posed substantial challenges in zero-shot generalization for Time Series Foundation Models (TSFMs). Existing TSFMs predominantly rely on massive parameterization to absorb such heterogeneity, as their static tokenization and positional encoding schemes entangle diverse temporal patterns into a fixed representation space, encouraging memorization rather than adaptation. To address this limitation, we propose Kairos, a flexible and parameter-efficient TSFM that decouples temporal heterogeneity from model capacity through a novel tokenization perspective. Kairos introduces a dynamic patching tokenizer and a mixture-of-size encoding that adapt observational granularity to local information density, enabling fine-grained temporal abstraction without increasing model width or depth. In addition, we design a multi-granularity positional embedding based on dynamic rotary encodings, which conditions on instance-level spectral features and temporal structure induced by dynamic patching tokenization, allowing robust modeling of diverse temporal dependencies. Trained on a novel Predictability-Stratified Time-Series (PreSTS) corpus, Kairos achieves superior zero-shot performance with substantially fewer parameters on two mainstream benchmarks, GIFT-Eval and Time-Series-Library.

Methodology

The Kairos architecture is designed to handle temporal heterogeneity efficiently through three key components:

  • Mixture-of-Size Encoder: A dynamic patching tokenizer that adaptively models time series at multiple granularities. It uses a Top-K Granularity Router with null experts to sparsely select optimal patch sizes based on local information density.
  • Heterogeneity-Aware Transformer: The encoder output is processed by a Transformer equipped with Dynamic Rotary Position Embedding (DRoPE). Unlike standard RoPE, DRoPE modulates temporal scales using instance-level spectral features (via FFT) and calibrates positions to account for the varying physical durations of dynamic patches.
  • Multi-Patch Decoder: To mitigate error accumulation in autoregressive generation, the decoder uses learnable forecast tokens to predict multiple future patches in parallel, offering flexibility for variable-length prediction horizons.
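The Top-K Granularity Router with null experts can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the function name, shapes, and softmax routing are illustrative; the key idea shown is that the router scores candidate patch sizes plus "null" experts, and selecting a null expert lets the model abstain from adding a granularity in low-information regions.

```python
import numpy as np

def top_k_granularity_router(region_feat, w, patch_sizes, k=2):
    """Illustrative Top-K router (hypothetical helper, not Kairos's code):
    scores candidate patch sizes plus trailing 'null' experts and sparsely
    keeps the top-k granularities for one local region of the series.

    region_feat: (d,) summary of a local window
    w: (d, num_experts) router weights; columns beyond len(patch_sizes)
       are null experts, i.e. "no extra granularity here".
    """
    logits = region_feat @ w                      # (num_experts,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                          # softmax over experts
    top = np.argsort(probs)[::-1][:k]             # indices of top-k experts
    # Null experts are dropped: choosing them means the router abstains,
    # so fewer (or zero) patch sizes are materialized for this region.
    chosen = [patch_sizes[i] for i in top if i < len(patch_sizes)]
    weights = {patch_sizes[i]: probs[i] for i in top if i < len(patch_sizes)}
    return chosen, weights

rng = np.random.default_rng(0)
feat = rng.normal(size=8)
w = rng.normal(size=(8, 4 + 2))                   # 4 patch sizes + 2 null experts
sizes, wts = top_k_granularity_router(feat, w, patch_sizes=[8, 16, 32, 64])
```

Regions whose top experts are all null contribute no extra tokens, which is how tokenization adapts to local information density without widening the model.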

Figure 2. The architecture of Kairos. (i) The Mixture-of-Size Encoder adaptively tokenizes input. (ii) The Heterogeneity-Aware Transformer processes tokens using DRoPE. (iii) The Multi-Patch Decoder predicts future patches in parallel.
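The two ingredients of DRoPE described above can be sketched concretely. This is a hedged illustration under assumptions, not the paper's exact formulation: rotary frequencies are rescaled by an instance-level spectral feature (the dominant FFT period), and token positions are taken from the cumulative physical durations of the variable-size patches rather than token indices.

```python
import numpy as np

def drope_angles(series, patch_sizes, d_head=8):
    """Sketch of DRoPE-style rotation angles (illustrative, not the
    paper's equations). Frequencies are standard RoPE frequencies scaled
    by the instance's dominant spectral period; positions are patch
    midpoints in raw time steps, so a 64-step patch and a 16-step patch
    occupy positions proportional to their physical durations."""
    # Instance-level spectral feature: dominant period via FFT.
    spec = np.abs(np.fft.rfft(series - series.mean()))
    dom = np.argmax(spec[1:]) + 1                 # skip the DC bin
    period = len(series) / dom                    # dominant period in steps
    # Position of each token = midpoint of its patch in raw time steps.
    ends = np.cumsum(patch_sizes)
    pos = ends - np.asarray(patch_sizes) / 2.0
    # Standard RoPE inverse frequencies, rescaled by the instance period.
    freqs = 1.0 / (10000 ** (np.arange(d_head // 2) * 2 / d_head))
    angles = np.outer(pos / period, freqs)        # (num_tokens, d_head//2)
    return angles

t = np.arange(256)
series = np.sin(2 * np.pi * t / 32) + 0.1 * np.random.default_rng(1).normal(size=256)
angles = drope_angles(series, patch_sizes=[16, 16, 32, 64, 64, 64])
```

In a full attention layer these angles would rotate query/key pairs exactly as in standard RoPE; the sketch only shows how the angles are conditioned on spectral content and dynamic patch durations.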

GIFT-Eval Benchmark Results

Performance on the GIFT-Eval benchmark is evaluated with normalized MASE and CRPS, where lower values indicate higher forecasting accuracy. The baselines fall into three categories: statistical methods, deep learning (DL) models, and Time Series Foundation Models (TSFMs); TSFMs are further subdivided by whether their training data overlapped with the test data (TestData Leakage).
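For reference, MASE can be stated concretely. The sketch below uses the textbook definition (forecast MAE divided by the in-sample error of a seasonal-naive forecast); GIFT-Eval's exact per-dataset aggregation and seasonality choices may differ.

```python
import numpy as np

def mase(y_true, y_pred, y_train, season=1):
    """Mean Absolute Scaled Error: forecast MAE divided by the MAE of a
    seasonal-naive one-step forecast on the training series (standard
    definition; benchmark-specific aggregation may differ)."""
    scale = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return np.mean(np.abs(y_true - y_pred)) / scale

# Toy series: the in-sample naive error sets the scale, so a perfect
# forecast scores 0 and a last-value forecast scores its scaled MAE.
y_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_true = np.array([6.0, 7.0])
score_naive = mase(y_true, np.array([5.0, 5.0]), y_train)  # last-value forecast
score_perfect = mase(y_true, y_true, y_train)              # 0.0
```

The table's normalized scores additionally divide each model's score by the Seasonal Naïve baseline's, which is why Seasonal Naïve appears as 1.000.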

| Model | Type | #Params | MASE ↓ | CRPS ↓ |
|---|---|---|---|---|
| Seasonal Naïve | Statistical | – | 1.000 | 1.000 |
| DLinear | DL (Full-Shot) | – | 1.061 | 0.846 |
| PatchTST | DL (Full-Shot) | – | 0.849 | 0.587 |
| TTM | TSFM (TestData Leakage) | 5M | 1.020 | 0.873 |
| Chronos | TSFM (TestData Leakage) | 709M | 0.870 | 0.574 |
| Chronos-Bolt | TSFM (TestData Leakage) | 205M | 0.808 | 0.574 |
| TimesFM | TSFM (TestData Leakage) | 500M | 0.758 | 0.550 |
| Moirai | TSFM (TestData Leakage) | 311M | 0.875 | 0.599 |
| VisionTS | TSFM (Zero-Shot) | 112M | 0.863 | 0.755 |
| YingLong | TSFM (Zero-Shot) | 300M | 0.798 | 0.548 |
| Toto | TSFM (Zero-Shot) | 151M | 0.750 | **0.517** |
| Sundial | TSFM (Zero-Shot) | 128M | 0.750 | 0.559 |
| Kairos-S (Ours) | TSFM (Zero-Shot) | 23M | 0.748 | 0.554 |
| Kairos-B (Ours) | TSFM (Zero-Shot) | 53M | **0.738** | 0.548 |

Baseline results are those officially reported by GIFT-Eval; best results are shown in bold.

TSLib Benchmark Results

Kairos demonstrates strong zero-shot forecasting capability, outperforming both recent TSFMs and the majority of full-shot deep learning models on the TSLib benchmark. The charts below report the average Mean Absolute Error (MAE) and Mean Squared Error (MSE), where lower values indicate better performance.
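The aggregation protocol (scores averaged over the four prediction lengths) can be sketched as follows; the function and toy arrays are illustrative assumptions, not TSLib's actual API.

```python
import numpy as np

def avg_over_horizons(y_true_by_h, y_pred_by_h):
    """Sketch of the TSLib-style aggregation described above: compute MAE
    and MSE per prediction length, then average across horizons (the
    benchmark's exact per-dataset weighting may differ)."""
    maes, mses = [], []
    for h in sorted(y_true_by_h):
        err = y_true_by_h[h] - y_pred_by_h[h]
        maes.append(np.mean(np.abs(err)))
        mses.append(np.mean(err ** 2))
    return np.mean(maes), np.mean(mses)

# Toy example with two of the four horizons, e.g. {96, 192}.
y_true_by_h = {96: np.array([1.0, 2.0]), 192: np.array([0.0, 0.0])}
y_pred_by_h = {96: np.array([1.0, 2.0]), 192: np.array([1.0, 1.0])}
avg_mae, avg_mse = avg_over_horizons(y_true_by_h, y_pred_by_h)
```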


Figure 3. Zero-shot performance on TSLib averaged over prediction lengths {96, 192, 336, 720}. Lower AVG MAE and MSE indicate better performance.