CLT and diffusion models

How the Central Limit Theorem (CLT) relates to the reverse process of diffusion models, and why that process can be initialized from a normal (Gaussian) distribution.

August 13, 2024 · 5 min · 1055 words · Victor Liu

Encoder-decoder vs decoder-only transformer models

A concise comparison of encoder-decoder and decoder-only transformer models.

August 13, 2024 · 3 min · 441 words · Victor Liu

Overview of transformers

High-level explanation of transformers, how they work (no math), and the history leading up to them.

August 13, 2024 · 3 min · 478 words · Victor Liu

Separating positional encoding and semantic information

How does the model manage to separate the two parts of its input, positional encoding and semantic information?

August 13, 2024 · 3 min · 611 words · Victor Liu

The Lottery Ticket Hypothesis

Visual intuition for the Lottery Ticket Hypothesis.

August 13, 2024 · 3 min · 551 words · Victor Liu

Scaling Transformers Efficiently

A summary of techniques for scaling transformers efficiently.

September 5, 2024 · 3 min · 433 words · Victor Liu