state space model

https://goombalab.github.io/blog/2025/tradeoffs/

State space models are well suited to tasks that do not require (long) history (time) dependence.

This is because the entire series $x_{t}$ can be processed with a parallelizable scan: the state update is linear in the hidden state, with no nonlinearity applied between time steps, so consecutive updates can be composed in any grouping instead of strictly one after another, as opposed to RNNs or LSTMs. The price is that the whole past has to be summarized in a fixed-size state.
An example task would be character-level language modelling.
If such dependence is necessary, one can stack layers of SSMs. Within a layer, the hidden state depends on the previous hidden states of the same layer only through a linear recurrence, but it can depend nonlinearly on the outputs of the previous layer. The layers then have to be processed sequentially, but the sequence elements (of which there are usually many more) can still be processed in parallel. This way, information about the past is implicitly passed via the hidden state, which is computed as $h_{t}^{\ell} = h_{t - 1}^{\ell} \cdot a_{t} + b_{t}$, where $a_{t}, b_{t}$ are the coefficients of layer $\ell$ at time $t$ (they may depend on the layer's input at time $t$, but not on $h_{t - 1}^{\ell}$).
This is akin to convolutions, whose limited, local receptive field is implicitly expanded by stacking layers (also similar to the segment-level recurrence of Transformer-XL).
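
The recurrence $h_{t} = h_{t - 1} \cdot a_{t} + b_{t}$ above can be evaluated with a scan because composing two such affine updates gives another affine update, and that composition is associative. The following is a minimal sketch in plain Python, assuming scalar coefficients and an initial state `h0`. The function names are illustrative and not taken from any library, and the doubling loop only simulates the parallel rounds one after another:

```python
def combine(left, right):
    """Compose two affine maps h -> a*h + b; `left` is applied first."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)


def sequential_scan(a, b, h0=0.0):
    """Reference: evaluate h_t = h_{t-1} * a_t + b_t one step at a time."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = h * a_t + b_t
        out.append(h)
    return out


def doubling_scan(a, b, h0=0.0):
    """Inclusive scan by recursive doubling (Hillis-Steele style).

    Each of the roughly log2(T) rounds updates every position
    independently of the others, so on parallel hardware a round
    could run as a single step.
    """
    elems = list(zip(a, b))
    T = len(elems)
    shift = 1
    while shift < T:
        elems = [
            combine(elems[i - shift], elems[i]) if i >= shift else elems[i]
            for i in range(T)
        ]
        shift *= 2
    # elems[t] now holds the affine map that takes h0 directly to h_t.
    return [a_cum * h0 + b_cum for a_cum, b_cum in elems]
```

Both functions return the same hidden states. The second one needs a number of rounds that grows logarithmically with the sequence length, which is what makes long sequences cheap to process on parallel hardware.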

SSMs

Many SSMs fall under the following framework:
Given a length $L$ sequence of inputs $x_{1 : L} \in \mathbb{R}^{L \times D}$, a general class of linear recurrences with hidden states $h_{1 : L} \in \mathbb{R}^{L \times N}$ and outputs $y_{1 : L} \in \mathbb{R}^{L \times D}$ can be computed as shown:
$h_{k} = A_{k} h_{k - 1} + B_{k} x_{k}, \quad y_{k} = g (h_{k}, x_{k})$
where $A_{k} \in \mathbb{R}^{N \times N}$ is the state transition matrix, $B_{k} \in \mathbb{R}^{N \times D}$ is the input matrix, and $g(\cdot)$ is the output function. [[time-invariance|Time invariant]] SSMs have static dynamics parameters across time, i.e. $A_{k} = A$ and $B_{k} = B \quad \forall k$ (for example S4 (SSM)).
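
In the time-invariant case the recurrence can be unrolled into a convolution over the inputs. The sketch below is a plain-Python illustration under two assumptions that are not part of the framework above: the output function is linear, $g(h_{k}, x_{k}) = C h_{k}$ with a hypothetical matrix $C \in \mathbb{R}^{D \times N}$, and the initial state is zero. The helper names are illustrative only:

```python
def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(P, Q):
    """Matrix-matrix product for nested lists."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def lti_recurrent(A, B, C, xs):
    """h_k = A h_{k-1} + B x_k, y_k = C h_k, starting from a zero state."""
    h = [0.0] * len(A)
    ys = []
    for x in xs:
        h = add(matvec(A, h), matvec(B, x))
        ys.append(matvec(C, h))
    return ys


def lti_convolutional(A, B, C, xs):
    """Same outputs via y_k = sum_j K_j x_{k-j}, with K_j = C A^j B."""
    kernels = []
    A_pow_B = B  # holds A^j B, starting at j = 0
    for _ in xs:
        kernels.append(matmul(C, A_pow_B))
        A_pow_B = matmul(A, A_pow_B)
    ys = []
    for k in range(len(xs)):
        y = [0.0] * len(C)
        for j in range(k + 1):
            y = add(y, matvec(kernels[j], xs[k - j]))
        ys.append(y)
    return ys
```

The kernels $K_{j}$ depend only on $A$, $B$ and $C$, not on the inputs, so they can be computed once and reused for every sequence. This only works because $A_{k}$ and $B_{k}$ are constant in $k$.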

References

https://huggingface.co/blog/lbourdois/get-on-the-ssm-train
Birdie - Advancing State Space Models with Reward-Driven Objectives and Curricula

Max Wolf's Second Brain
