Motivation and Historical Context

The Fast Fourier Transform (FFT) is a highly efficient algorithm for computing the Discrete Fourier Transform (DFT), which is a foundational tool in the analysis of signals, enabling us to break down signals into their frequency components. The Fourier Transform (FT) and its discrete counterpart, the DFT, allow us to move between the time and frequency domains, helping us understand how different frequencies contribute to a signal over time.

The problem, however, is that the direct computation of the DFT for a signal of size $ N $ has a computational complexity of $ O(N^2) $. This means that for a large dataset, performing the DFT can become computationally expensive and time-consuming, as it requires computing $ N^2 $ complex multiplications and additions. For example, if you double the number of data points in a signal, the computational time increases fourfold.

In 1965, mathematicians James Cooley and John Tukey introduced the FFT, an algorithm that dramatically reduces the complexity of calculating the DFT from $ O(N^2) $ to $ O(N \log N) $. This breakthrough made it feasible to apply Fourier analysis in real-world applications, such as telecommunications, image processing, and audio analysis, where handling large datasets efficiently is critical. The FFT is now an indispensable tool in digital signal processing.

The central problem addressed by the FFT is this: Given a sequence of N complex numbers, how can we compute its DFT more efficiently, in less than $ O(N^2) $ operations?

The Cooley-Tukey FFT algorithm is a solution to this problem. It is a divide-and-conquer approach that recursively splits the input sequence into smaller subproblems, computing the DFT of each part independently.

1. Discrete Fourier Transform (DFT) Definition

For a sequence $ x_0, x_1, \dots, x_{N-1} $, the DFT is defined as:

$$ X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i kn/N}, \quad k = 0, 1, \dots, N-1 $$

Each $ X_k $ (the $ k $-th frequency component) requires a summation over all $ N $ data points, and for each $ n $, we need to compute a complex exponential $ e^{-2\pi i kn/N} $. This results in $ N^2 $ total computations: $ N $ values of $ X_k $, each requiring $ N $ terms in the sum.

2. FFT and the Radix-2 Cooley-Tukey Algorithm

The key insight behind the FFT is dividing the DFT into smaller, more manageable subproblems, specifically by breaking down the problem recursively using a divide-and-conquer approach. Here’s how it works for the Radix-2 FFT:

Step-by-Step Example for $ N = 8 $:

  1. Divide the sequence into even and odd indexed elements: Let the input sequence be $ x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7 $.

    • Even-indexed terms: $ x_0, x_2, x_4, x_6 $
    • Odd-indexed terms: $ x_1, x_3, x_5, x_7 $
  2. Recursively apply the DFT to both halves: Now, instead of computing the DFT for all 8 points at once, we compute the DFT of the even-indexed sequence and the odd-indexed sequence separately, both of which have length 4.

    We apply the same divide-and-conquer principle to these length-4 sequences, splitting them into even and odd indexed elements again. For example:

    • Even part: $ x_0, x_4 $
    • Odd part: $ x_2, x_6 $

    Repeat this process until you reach sequences of length 1 (where the DFT is trivial since $ X_k = x_k $).

  3. Combine the results using the “butterfly operation”: Once the DFTs of the smaller sequences are computed, we combine them. This is where the efficiency of the FFT comes in. Instead of calculating each $ X_k $ from scratch, we reuse results from the smaller DFTs by applying twiddle factors $ e^{-2\pi i kn/N} $ (the complex exponentials) to mix the results.

    Specifically, the results from the DFT of the even-indexed terms and the odd-indexed terms are combined using the formula:

    $$ X_k = E_k + W_N^k O_k, \quad X_{k+N/2} = E_k - W_N^k O_k $$

    where $ E_k $ is the DFT of the even-indexed sequence, $ O_k $ is the DFT of the odd-indexed sequence, and $ W_N^k = e^{-2\pi i k/N} $ is the twiddle factor.

  4. Recursive Combination: For each level of recursion, you combine the results using the butterfly operation. The key is that the symmetry and periodicity of the twiddle factors reduce redundant calculations.

3. Numerical Example: Computing an FFT for $ N = 8 $

Let’s take a concrete example with a sequence of 8 real numbers:

$$ x = [1, 2, 3, 4, 5, 6, 7, 8] $$

Step 1: Split into even and odd parts

  • Even-indexed sequence: $ [1, 3, 5, 7] $
  • Odd-indexed sequence: $ [2, 4, 6, 8] $

Step 2: Compute the FFT of each part recursively

Even part $ [1, 3, 5, 7] $:
  • Split again:

    • Even: $ [1, 5] $
    • Odd: $ [3, 7] $

    Compute the FFT of $ [1, 5] $ and $ [3, 7] $. Then combine.

Odd part $ [2, 4, 6, 8] $:
  • Split again:

    • Even: $ [2, 6] $
    • Odd: $ [4, 8] $

    Compute the FFT of $ [2, 6] $ and $ [4, 8] $. Then combine.

Step 3: Apply butterfly operations

At the lowest level, the FFT of sequences of length 1 is just the sequence itself. As you move up the recursion tree, you combine these using the butterfly operation. At each step, you use the twiddle factors to efficiently combine the results.

For example, the first butterfly step would look like:

$$ X_0 = x_0 + W_N^0 x_1, \quad X_1 = x_0 - W_N^0 x_1 $$ $$ X_2 = x_2 + W_N^1 x_3, \quad X_3 = x_2 - W_N^1 x_3 $$

… and so on, combining the DFTs of the smaller sequences at each level.

4. Computational Complexity

Each recursive step splits the problem into two smaller DFTs of size $ N/2 $, and at each level, the butterfly operation requires $ O(N) $ work. Since the number of recursion levels is $ \log_2 N $, the total complexity of the FFT algorithm is:

$$ O(N \log N) $$

This is a significant improvement over the $ O(N^2) $ complexity of the naive DFT approach, especially for large datasets.

Summary of Key Steps in the FFT:

  1. Divide: Split the input sequence into even and odd indexed parts.
  2. Conquer: Recursively compute the DFT of the smaller parts.
  3. Combine: Use the butterfly operation and twiddle factors to combine the results efficiently.
  4. Efficiency: The FFT reduces complexity to $ O(N \log N) $ by avoiding redundant calculations and reusing intermediate results.

Cooley-Tukey Radix-2 DIT FFT Algorithm

Input

  • A complex-valued sequence $x[n]$ of length $N$, where $N = 2^m$ for some integer $m$.

Output

  • The Discrete Fourier Transform (DFT) $X[k]$ of the input sequence.

Algorithm Steps

  1. Base case: If $N = 1$, return $X = x$.

  2. Divide the input sequence into two subsequences:

    • Even-indexed elements: $x_e[n] = x[2n]$ for $n = 0, 1, …, N/2 - 1$
    • Odd-indexed elements: $x_o[n] = x[2n + 1]$ for $n = 0, 1, …, N/2 - 1$
  3. Recursively compute the $N/2$-point DFT of $x_e[n]$ and $x_o[n]$:

    • $$X_e[k] = \text{FFT}(x_e[n])$$
    • $$X_o[k] = \text{FFT}(x_o[n])$$
  4. Combine the results using the butterfly operation: For $k = 0, 1, …, N/2 - 1$:

    $$ \begin{aligned} X[k] &= X_e[k] + W_N^k \cdot X_o[k] \\ X[k + N/2] &= X_e[k] - W_N^k \cdot X_o[k] \end{aligned} $$

    Where $W_N = e^{-2\pi i/N}$ is the twiddle factor.

Mathematical Formulation

The DFT is defined as:

$$ X[k] = \sum_{n=0}^{N-1} x[n] \cdot e^{-2\pi i kn/N}, \quad k = 0, 1, ..., N-1 $$

The FFT algorithm exploits the symmetry and periodicity of the complex exponential factors to reduce the computational complexity. By splitting the sum into even and odd indices, we get:

$$ \begin{aligned} X[k] &= \sum_{n=0}^{N/2-1} x[2n] \cdot e^{-2\pi i k(2n)/N} + \sum_{n=0}^{N/2-1} x[2n+1] \cdot e^{-2\pi i k(2n+1)/N} \\ &= \sum_{n=0}^{N/2-1} x_e[n] \cdot e^{-2\pi i kn/(N/2)} + W_N^k \sum_{n=0}^{N/2-1} x_o[n] \cdot e^{-2\pi i kn/(N/2)} \\ &= X_e[k] + W_N^k \cdot X_o[k] \end{aligned} $$

Complexity Analysis

  • Time complexity: $O(N \log N)$

    • The algorithm divides the problem into two subproblems of size $$N/2$$ at each step.
    • There are $\log_2(N)$ levels of recursion.
    • At each level, $N$ complex multiplications and additions are performed.
  • Space complexity: $O(N)$

    • The algorithm requires additional space for the recursive calls and temporary arrays.

Key Insights

  1. The algorithm exploits the periodicity of the twiddle factors:

    $$W_N^{k+N/2} = -W_N^k$$
  2. The butterfly operation allows for in-place computation, reducing memory requirements.

  3. The divide-and-conquer approach reduces the number of operations from $O(N^2)$ to $O(N \log N)$.

Matrix Form of the FFT

The Fast Fourier Transform (FFT) can be represented more abstractly as a factorization of matrices. This representation provides a powerful mathematical framework for understanding the algorithm’s structure and efficiency.

The Discrete Fourier Transform (DFT) can be expressed as a matrix multiplication:

$$X = F_N x$$

Where:

  • $X$ is the output vector of length N
  • $x$ is the input vector of length N
  • $F_N$ is the N × N DFT matrix

The elements of the DFT matrix $F_N$ are given by:

$$F_N[k,n] = W_N^{kn}$$

Where $W_N = e^{-2\pi i/N}$ is the Nth root of unity.

The key insight of the FFT algorithm is that this DFT matrix can be factored into a product of sparse matrices. For the radix-2 Cooley-Tukey FFT algorithm, the factorization takes the form:

$$F_N = P_N (F_{N/2} \oplus F_{N/2}) D_N (I_{N/2} \otimes F_2)$$

Where:

  • $P_N$ is a permutation matrix that performs the bit-reversal permutation
  • $F_{N/2}$ is the DFT matrix of size N/2
  • $\oplus$ denotes the direct sum of matrices
  • $D_N$ is a diagonal matrix of twiddle factors
  • $I_{N/2}$ is the identity matrix of size N/2
  • $\otimes$ denotes the Kronecker product
  • $F_2$ is the 2×2 DFT matrix

This factorization corresponds to the recursive structure of the FFT algorithm:

  1. The $(I_{N/2} \otimes F_2)$ term represents the butterfly operations at the lowest level.
  2. The $D_N$ term applies the twiddle factors.
  3. The $(F_{N/2} \oplus F_{N/2})$ term represents the recursive application of the FFT to the even and odd subsequences.
  4. The $P_N$ term reorders the results.

The power of this representation lies in its ability to explain the algorithm’s efficiency:

  1. Each factor is a sparse matrix, containing many zero elements.
  2. The number of non-zero elements in each factor is O(N).
  3. There are log₂(N) factors in the product.

This structure leads directly to the O(N log N) complexity of the FFT algorithm, as multiplying a vector by each sparse factor requires O(N) operations, and there are log₂(N) such multiplications.

Moreover, this matrix factorization view:

  • Provides a unified framework for understanding different FFT algorithms
  • Facilitates the development of parallel and vectorized implementations
  • Allows for a deeper mathematical analysis of the algorithm’s properties

In summary, representing the FFT as a matrix factorization not only provides an elegant mathematical description of the algorithm but also offers valuable insights into its structure, efficiency, and potential optimizations.