Machine Learning

CLT and diffusion models

How the Central Limit Theorem (CLT) relates to the reverse process of diffusion models being able to start from a normal (Gaussian) distribution.

Encoder-decoder vs decoder-only transformer models

Concise comparison between encoder-decoder and decoder-only transformer models.

Overview of transformers

High-level explanation of transformers, how they work (no math), and the history leading up to them.

Separating positional encoding and semantic information

How does the model manage to separate positional encoding from semantic information?

The Lottery Ticket Hypothesis

Visual intuition for the Lottery Ticket Hypothesis.

Scaling Transformers Efficiently

A summary of techniques for scaling transformers efficiently.