

Towards 131k-Context dLLMs

Diffusion language models (dLLMs) have attention patterns that differ significantly from those of autoregressive LLMs, which makes scaling dLLMs to long contexts challenging. We propose several techniques to reliably extend the context window of open-source dLLMs up to 131k tokens.
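To make the difference concrete, here is a minimal sketch (not taken from this post, and not reproducing its specific techniques) contrasting the causal mask used by autoregressive LLMs with the full bidirectional mask a dLLM typically applies while denoising a sequence. Every position in a dLLM can attend to every other position, so long-context tricks built around causal attention do not carry over directly.

```python
# Illustrative sketch only: causal (autoregressive) vs. bidirectional (diffusion) attention masks.
import numpy as np

seq_len = 8

# Autoregressive LLM: token i may only attend to positions <= i (lower-triangular mask).
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Diffusion LLM: every position attends to every other position in the denoised block,
# so the mask is all ones (fully bidirectional).
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

print("causal attention entries:       ", causal_mask.sum())         # 36 = n(n+1)/2
print("bidirectional attention entries:", bidirectional_mask.sum())  # 64 = n^2
```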