Attention Is All You Need

by Community

Transformers

OSS

Added 1 June 2026

Overview

The seminal 2017 paper that introduced the Transformer architecture, replacing the recurrent layers of earlier sequence transduction models with multi-head self-attention. It showed that attention alone, without recurrence or convolution, could reach state-of-the-art machine translation quality at the time of publication, and the architecture became the foundation of modern large language models.
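
The core operation behind the self-attention described above is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The following is a minimal sketch in plain Python, not the paper's reference code: the function names and the list-of-lists matrix layout are assumptions made for illustration, and the learned projections and multiple heads of the full model are left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for list-of-lists matrices.

    Q is n_q x d_k, K is n_k x d_k, V is n_k x d_v.
    Returns (output, weights), where weights is n_q x n_k.
    """
    d_k = len(K[0])
    scale = 1.0 / math.sqrt(d_k)
    weights = []
    for q in Q:
        # One scaled dot product per key, then normalise across the keys.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights.append(softmax(scores))
    d_v = len(V[0])
    # Each output row is a weighted average of the value rows.
    output = [
        [sum(w[j] * V[j][c] for j in range(len(V))) for c in range(d_v)]
        for w in weights
    ]
    return output, weights


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0]]
    K = [[1.0, 0.0], [0.0, 1.0]]
    V = [[1.0, 2.0], [3.0, 4.0]]
    out, w = scaled_dot_product_attention(Q, K, V)
    print(w)
    print(out)
```

Multi-head attention runs several of these operations in parallel on separately projected copies of Q, K and V and concatenates the results; a practical implementation would use a tensor library rather than nested lists.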

Best for

Researchers and engineers building or modifying transformer-based models for NLP and beyond

Use cases

Foundational reference for implementing Transformer-based NLP models
Understanding self-attention and positional encoding for sequence tasks
Building encoder-decoder architectures for machine translation and summarization
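
For the positional encoding mentioned in the use cases above, the paper's fixed sinusoidal scheme is PE(pos, 2i) = sin(pos / 10000^(2i / d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model)). A minimal sketch in plain Python follows; the function name and the list-of-lists return type are assumptions chosen for illustration.

```python
import math


def sinusoidal_positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(max_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


if __name__ == "__main__":
    pe = sinusoidal_positional_encoding(max_len=4, d_model=6)
    for row in pe:
        print([round(x, 4) for x in row])
```

Each row is added to the token embedding at the same position, so the model can tell positions apart even though attention itself ignores order.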

Notes

Pros

Introduced a highly parallelizable architecture that processes all positions of a sequence at once instead of step by step, making training on large datasets practical
Established attention as a core building block for countless follow-up models
Simple yet powerful concept that generalizes beyond NLP to vision and other modalities

Cons

Self-attention has no inherent notion of token order, so explicit positional encodings are required
Self-attention cost grows quadratically with sequence length, which limits long-context efficiency
Original results require large compute and data; not a drop-in beginner tutorial
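
The quadratic cost noted above can be illustrated with a rough count of query-key scores: every token attends to every other token, so one self-attention layer computes seq_len × seq_len scores per head. The sketch below is only an illustration; the function name is hypothetical, and the default of 8 heads matches the paper's base model.

```python
def attention_score_entries(seq_len, num_heads=8):
    """Number of query-key scores computed in one self-attention layer."""
    # Each head builds a seq_len x seq_len score matrix.
    return num_heads * seq_len * seq_len


if __name__ == "__main__":
    for n in (512, 1024, 2048):
        print(n, attention_score_entries(n))
```

Doubling the sequence length quadruples the count, which is why long contexts become expensive in both compute and memory.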

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one (4 entries).

PyTorch

Community

Tensors and Dynamic neural networks in Python with strong GPU acceleration

★ 100,318 updated 23d ago

TensorFlow

Community

An Open Source Machine Learning Framework for Everyone

★ 195,356 updated 23d ago

JAX

Community

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

★ 35,725 updated 23d ago

Keras

Community

Deep Learning for humans

★ 64,079 updated 23d ago
