Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
by Community
While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match
OSS
Added 2 June 2026
Overview
This paper introduces the State Space Duality (SSD) framework, which reveals deep theoretical connections between state-space models (SSMs) like Mamba and transformer attention mechanisms through structured semiseparable matrices. It generalizes both families and leads to a new architecture, Mamba-2, whose SSD layer keeps the linear-time recurrent form of an SSM while doing most of its computation as attention-style matrix multiplications.
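The duality can be seen on a toy case: the same sequence map can be evaluated as a linear-time recurrence (the SSM view) or as a masked, attention-like quadratic form (the attention view). The sketch below assumes a single channel, a scalar decay `a_t` per step (the scalar-times-identity state matrix used for SSD), input-dependent state vectors `B_t` and `C_t`, and plain Python lists. The function names are hypothetical; this is not the paper's listing or the `mamba` library's code.

```python
# Toy sketch (assumptions: one channel, scalar decay a_t per step, pure Python).

def ssd_recurrent(a, B, C, x):
    """SSM view, linear time: h_t = a_t * h_{t-1} + B_t * x_t,  y_t = <C_t, h_t>."""
    h = [0.0] * len(B[0])  # state of dimension N, starts at zero
    ys = []
    for a_t, B_t, C_t, x_t in zip(a, B, C, x):
        h = [a_t * h_i + b_i * x_t for h_i, b_i in zip(h, B_t)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(C_t, h)))
    return ys


def ssd_quadratic(a, B, C, x):
    """Attention view, quadratic time: y = (L * (C B^T)) x with causal decay mask
    L[t][s] = a_{s+1} * ... * a_t for s <= t (and 0 above the diagonal)."""
    ys = []
    for t in range(len(x)):
        y_t = 0.0
        decay = 1.0  # L[t][s], built up as s moves left from t
        for s in range(t, -1, -1):
            score = sum(c * b for c, b in zip(C[t], B[s]))  # (C B^T)[t][s]
            y_t += decay * score * x[s]
            decay *= a[s]
        ys.append(y_t)
    return ys


if __name__ == "__main__":
    a = [0.9, 0.5, 0.8, 1.0]
    B = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0], [1.0, 1.0]]
    C = [[1.0, 1.0], [0.0, 1.0], [1.0, 0.5], [2.0, 0.0]]
    x = [1.0, -1.0, 0.5, 2.0]
    r, q = ssd_recurrent(a, B, C, x), ssd_quadratic(a, B, C, x)
    assert all(abs(u - v) < 1e-9 for u, v in zip(r, q))  # both views agree
```

Setting every `a_t = 1` turns `L` into the plain lower-triangular all-ones mask, and the quadratic form becomes causal linear attention, with `C` playing the role of queries, `B` of keys, and `x` of values.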
Best for
Researchers and advanced engineers designing efficient sequence models with structured state spaces
Use cases
- Understanding theoretical foundations linking SSMs and attention for model design
- Implementing Mamba-2 for efficient sequence modeling with linear-time inference
- Analyzing and comparing different attention and SSM variants using the SSD lens
Notes
Pros
- Provides a unified theoretical framework that clarifies design choices
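- Enables more efficient algorithms by exploiting both the linear-time recurrent form and the quadratic, attention-like form of the same computation
- Directly informs the architecture of Mamba-2, whose core layer the paper reports to be 2-8x faster than Mamba's selective SSM while remaining competitive with Transformers on language modeling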
Cons
- Theoretical focus may require significant background to apply practically
- Mamba-2 is still new and adoption is limited compared to established architectures
- The paper itself provides only a minimal reference listing of the SSD algorithm, not production-ready code
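The efficiency point rests on mixing the two views blockwise: split the sequence into chunks, use the quadratic form inside each chunk, and pass a recurrent state between chunks. The sketch below is a rough pure-Python illustration in the spirit of that block decomposition, under the same assumptions as a toy single-channel, scalar-decay SSM. The names `ssd_recurrent`, `ssd_chunked`, and the `chunk` parameter are hypothetical, and real kernels batch these steps as matrix multiplications on accelerators rather than Python loops.

```python
# Rough sketch (assumptions: one channel, scalar decay a_t, pure Python lists).

def ssd_recurrent(a, B, C, x):
    """Reference scan: h_t = a_t * h_{t-1} + B_t * x_t,  y_t = <C_t, h_t>."""
    h = [0.0] * len(B[0])
    ys = []
    for a_t, B_t, C_t, x_t in zip(a, B, C, x):
        h = [a_t * h_i + b_i * x_t for h_i, b_i in zip(h, B_t)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(C_t, h)))
    return ys


def ssd_chunked(a, B, C, x, chunk=2):
    """Same outputs, computed chunk by chunk (hypothetical helper)."""
    if chunk < 1:
        raise ValueError("chunk must be >= 1")
    T, n = len(x), len(B[0])
    h = [0.0] * n  # state carried in from the previous chunk
    ys = [0.0] * T
    for start in range(0, T, chunk):
        end = min(start + chunk, T)
        for t in range(start, end):
            # Diagonal block: attention-like masked quadratic form inside the chunk.
            decay = 1.0  # running product a_{s+1} * ... * a_t
            for s in range(t, start - 1, -1):
                score = sum(c * b for c, b in zip(C[t], B[s]))
                ys[t] += decay * score * x[s]
                decay *= a[s]
            # Off-diagonal part: incoming state decayed by a_start * ... * a_t
            # (exactly what `decay` holds now), read out through C_t.
            ys[t] += decay * sum(c * h_i for c, h_i in zip(C[t], h))
        # State passing: advance the carried state to the last step of this chunk.
        new_h = [0.0] * n
        decay = 1.0  # running product a_{s+1} * ... * a_{end-1}
        for s in range(end - 1, start - 1, -1):
            new_h = [nh + decay * b * x[s] for nh, b in zip(new_h, B[s])]
            decay *= a[s]
        # Here decay = a_start * ... * a_{end-1}.
        h = [nh + decay * h_i for nh, h_i in zip(new_h, h)]
    return ys
```

For a fixed chunk size Q the work is roughly O(T·Q·N) instead of the O(T²·N) of the full quadratic form, so it stays linear in sequence length while most of the arithmetic sits in small dense blocks.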
Indexed from awesome-llm and enriched against its public facts.