Machine Learning

Teaching Machines How to Draw

Teaching Machines to Draw: A Problem Solver’s Journey. This blog post is about using machine learning for image generation. However, it is not meant to be a comprehensive guide to the field or a formal treatise on the subject. It is not even intended to be historically accurate, nor to teach you the mathematical formulas behind the techniques. Instead, it is a fictional story written from the perspective of a curious explorer who is attempting to solve a challenging problem....

Attention Mechanism in Transformers Derived with Statistical Mechanics Interpretation

A full explanation and interpretation of scaled dot-product attention via the Boltzmann distribution and probability theory.

Stochastic Gradient Descent

An overview of stochastic gradient descent (SGD).

Learning to Do Math With LLMs

I want to comment on the recent results achieved by OpenAI’s new o1 model family. Specifically, I will be discussing formal verification, which can be applied to mathematics and computer programming. The name of this post is inspired by their September 12, 2024 blog post titled “Learning to Reason with LLMs”. The abstract reads: “We are introducing OpenAI o1, a new large language model trained with reinforcement learning to perform complex reasoning....

Overview of Diffusion Models

An overview of diffusion models, including their mathematical foundations, key concepts, and practical applications.

Machine Learning as Interpolation

Exploring how machine learning models generalize by interpolating on data manifolds, with concrete examples from image and language processing.

Multi-Head Latent Attention

A short post on Multi-Head Latent Attention as presented in the DeepSeek-V2 paper.

Meaningful Feature Learning in Models

An overview of the challenges and solutions for learning meaningful features in machine learning models.

Expressiveness of Sparse Matrices

An evaluation of the expressiveness of sparse matrices in machine learning.

Strassen's Algorithm in Ternary Matrices

An explanation of the potential benefits of Strassen’s algorithm in ternary matrices.