Kalman State Space Models, Simply Explained

AI, But Simple Issue #100


For decades, the Kalman filter has been a workhorse of estimation theory, providing a mathematically elegant way to track hidden states through noisy observations.
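To make that idea concrete, here is a minimal sketch of a scalar Kalman filter tracking a constant hidden value from noisy readings. The constant-state model and the noise variances q and r are illustrative assumptions for this sketch, not details from the KOSS paper.

```python
import numpy as np

def kalman_1d(measurements, q=1e-3, r=0.25):
    """Minimal scalar Kalman filter for a (roughly) constant hidden value.

    q: assumed process-noise variance, r: assumed measurement-noise variance.
    """
    x, p = 0.0, 1.0            # initial state estimate and its uncertainty
    estimates = []
    for z in measurements:
        # Predict: the state is assumed to stay put, so only uncertainty grows
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain
        k = p / (p + r)        # gain near 1 means the measurement is trusted more
        x = x + k * (z - x)    # correct the estimate toward the observation
        p = (1 - k) * p        # uncertainty shrinks after seeing data
        estimates.append(x)
    return estimates

# Noisy readings of a hidden value of 1.0
rng = np.random.default_rng(0)
noisy = 1.0 + 0.5 * rng.standard_normal(50)
print(kalman_1d(noisy)[-1])    # converges near 1.0
```

Each step balances what the model already believes against what it just observed, which is exactly the kind of "what should I trust and keep?" decision that comes up again below.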

At the same time, deep sequence models have evolved from simple recurrent networks to transformer architectures and, most recently, selective state space models.

Kalman filtering and deep sequence modeling have largely developed side by side, sometimes borrowing from each other but never fully combining.

The Kalman-Optimal Selective State Space Model (KOSS) (Wang et al., 2025) changes that.

KOSS offers a principled mathematical unification of these two lines of work.

  • By using Kalman filtering to decide which details of a sequence are worth remembering (the selection problem), KOSS grounds its selection mechanism in estimation theory rather than heuristic design choices.

The result is a model that is not only theoretically motivated but also practically stronger.

It consistently outperforms state-of-the-art baselines across nine forecasting benchmarks while using fewer parameters and less memory than its competitors.

What Are State Space Models?

When a model reads a sequence (a sentence, a time series, or a stream of sensor data), it should do two things simultaneously: remember what happened earlier and decide what's worth remembering.

A State Space Model (SSM) is one way to do this. Think of it like a conveyor belt with a memory box.

As each new token or data point arrives, the model updates its memory box based on what just came in, then uses that memory to make predictions or pass information forward.

The "state" is just a compact summary of everything the model has seen so far, like a running tally. The "space" refers to the mathematical framework that governs how that summary evolves over time.

SSMs became popular in machine learning because they are much faster than Transformers on long sequences.

Transformers compare every token to every other token (which gets expensive fast), while SSMs compress the past into a fixed-size state and update it step by step, like reading a book and keeping a rolling summary rather than re-reading every page from the start.
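A rough operation-count sketch makes the gap visible; the dimensions below (model width 64, state size 16) are arbitrary choices for illustration:

```python
def attention_cost(seq_len, d):
    # Every token is compared against every other token: pairwise scores dominate
    return seq_len * seq_len * d

def ssm_cost(seq_len, d, state_size):
    # One fixed-size state update per token, regardless of how long the sequence is
    return seq_len * state_size * (state_size + d)

for L in (1_000, 10_000, 100_000):
    print(f"L={L:>7}: attention ~{attention_cost(L, 64):.2e} ops, "
          f"ssm ~{ssm_cost(L, 64, 16):.2e} ops")
```

The attention-style cost grows with the square of the sequence length, while the SSM-style cost grows only linearly, which is why SSMs scale so well to long sequences.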
