CLT and diffusion models
Connections between the Central Limit Theorem (CLT) and why the reverse process of diffusion models can be initialized from a normal (Gaussian) distribution.
Concise comparison between encoder-decoder and decoder-only transformer models.
High-level explanation of transformers, how they work (no math), and the history leading up to them.
How does the model manage to separate the two parts of positional encoding?
Visual intuition for the Lottery Ticket Hypothesis.
A summary of techniques for scaling transformers efficiently.