SPACEKITCAT

Fast Fourier Transform Notes

Lisa Burton — Sat, 02 Jan 2021 21:03:00 GMT

A signal can contain multiple periodic functions with different amplitudes and frequencies. The Fourier Transform is able to decompose a signal down into the individual sine functions that constitute the overall signal.

Let's say your signal is a 262hz (or middle C) sine wave, encoded as a Pulse Code Modulation (PCM) stream with a sample rate of 44100 samples per second. According to the Nyquist–Shannon sampling theorem, a signal can represent a frequency range of half of the sampling rate. A PCM sample rate of 44100 samples per second translates to a frequency range between 0-22050hz.

A DFT configured with a window size of 256 means it will operate on a block of 256 chronologically ordered signal samples. The maximum resolution of the DFT is dictated by the window size and the frequency range of the input signal. This DFT operating on the PCM signal from the previous paragraph would have a resolution of $\frac{44100}{256} = 172.266hz$ and would detect a dominant frequency range between 172.266hz and 344.531hz.

The Discrete Fourier Transform (DFT) is defined by the following equation.

\begin{equation}
X_{k} = \sum_{n=0}^N x_{n} e^{\frac{-2 i \pi n k}{N}}
\end{equation}

$X_{k}$ represents the fourier transform for frequency bin $k$, where $0 \le k \le N$ and each transform for k represents a slice of the frequency spectrum from $k \cdot \frac{y}{N}$ to $k \cdot \frac{y}{N} + \frac{y}{N}$, where $y$ represents the original sample rate of the signal and $N$ represents the window size of the DFT.

n.b. Raising e to an imaginary number represents Euler's formula, i.e. $e^{ix} = cos(x) - isin(x)$. If using a language without built-in complex number support, you would calculate the real and imaginary components separately by passing $\frac{-2 \pi n k}{N}$ into the cosine and sine functions, respectively. Complex numbers are denoted by the letter j in Python, so np.exp(1j * np.pi) is the same as cos(π) - isin(π).

Surfing On Sine Waves

The code below generates a 262hz sine wave represented as a 1 second long, 16-bit floating point PCM stream with a sample rate of 44100 samples per second. The generated graph only shows the time domain of the signal from 0 up to windows_size because this is the signal subset the subsequent Discrete Fourier Transforms will operate on. All of the subsequent code samples depend on this one.

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np

window_size = 256 # Experiment with this value, it should always be a power of two.
sample_rate = 44100
frequency = 262
seconds = 1
sample_count = sample_rate * seconds
pcm_stream_16bit = list(map(lambda x : np.float16(0.8 * np.sin(2.0 * np.pi * 
frequency * x / sample_rate)), np.arange(sample_count)))
s = pd.Series(pcm_stream_16bit[:window_size], index=np.arange(sample_count)[:window_size])

s.plot.line(figsize=(20,10))
plt.xlabel('Time Domain')
plt.ylabel('Amplitude')
plt.title(f'{frequency}hz sine wave')
plt.show()

step = sample_rate / window_size
print(f'Minimum unit size: {step}hz\n')

pcm = pcm_stream_16bit[:window_size]
pivot = len(pcm) // 2

The first 256 samples (out of 44100 per second) of a PCM signal containing a 262hz Sine wave. It has two peaks, one at around the 42nd sample and one at around the 214th sample. It dips at the centre point between the two peaks and moves smoothly between the three points.

Brute Force

The next code snippet demonstrates how a brute force DFT on samples 0 to window_size from the PCM stream works. It outputs the sum of each frequency bin and determines which one is dominant. The outer loop has one iteration for each frequency bin / k-bin, the inner loop has one iteration for each PCM sample.

As you can see from the code, we only care about the first half of the DFT for figuring out the dominant frequency because when the DFT input is a sequence of real numbers (instead of complex), the output will have conjugate symmetry. In practical terms, the real valued DFT output mirrors the first half of the output sequence in the second half.

import numpy as np

column_width = 45
pcm_column_width = 20

iterations = 0
largest = 0j
largest_index = 0

for k in range(0, len(pcm)):
    frequency_bin = [0j] * len(pcm)
    for i in range(0, len(pcm)):
        exponent = np.exp(-2.0j * np.pi * i * k / len(pcm))
        result = pcm[i] * exponent
        iterations = iterations + 1
        frequency_bin[i] = frequency_bin[i] + result

    total = sum(frequency_bin)
    print(f'TOTAL (FREQUENCY BIN / K-BIN {k}): {total}')
    
    # We only care about the first half of the DFT because
    # of Conjugate symmetry, as explained earlier.
    if total > largest and k < len(pcm) // 2:
        largest = total
        largest_index = k

print(f'\nMinimum unit size: {step}hz\n')        
print(f'Iterations: {iterations}')
print(f'Largest: {largest} {largest_index}')
print(f'hz: {largest_index * step} - {(largest_index * step) + step}')

Perfect Specimen

Numpy provides a production level Fast Fourier Transform (FFT), it's a good point of reference.

print(list(np.fft.fft(pcm, len(pcm))))

The outputted values are virtually the same, although rounding can make them appear less alike than they really are. The specifics of rounding complex floating point numbers are a rabbit hole for another day! If you doubt what I'm saying, stick the outputs on a graph.

Cooley-Tukey For Programmers

I'm going to walk through how the Cooley-Tukey FFT equation is derived from the DFT equation (shown earlier) using code examples to provide a sort of "inductive walkthrough for programmers". Each snippet depends on the first code snippet and I'd recommend running them all locally and comparing them to get the most from this guide.

The DFT can be calculated from the sum of two smaller DFTs. If you perform a DFT on the even and odd parts of the input, for example, you end up doing two $\frac{N}{2}$ sized FFTs.

\begin{equation}
X_{k} = \sum_{n=0}^N x_{n} e^{\frac{-2 i \pi n k}{N}} = \sum_{n=0}^{N/2} x_{2 n} e^{\frac{-2 i \pi (2 n) k}{N}} + \sum_{n=0}^{N/2} x_{2 n + 1} e^{\frac{-2 i \pi (2 n + 1) k}{N}}
\end{equation}

The code below demonstrates that the equality relationship in the above formula is true.

iterations = 0
transform_vector = [0j] * len(pcm)
for k in range(0, len(pcm)):
    for m in range(0, pivot):
        iterations = iterations + 1
        transform_vector[k] = transform_vector[k] + (pcm[2 * m] * np.exp(-2j * np.pi * (2 * m) * k / len(pcm)))

for k in range(0, len(pcm)):
    for m in range(0, pivot):
        iterations = iterations + 1
        transform_vector[k] = transform_vector[k] + (pcm[2 * m + 1] * np.exp(-2j * np.pi * (2 * m + 1) * k / len(pcm)))

print(f'Iterations: {iterations}\n')
print(list(transform_vector))

The even and odd side of the equation are almost identical, the only difference is that the odd DFT exponent has $2n + 1$ instead of $2n$. If we could make both exponents identical, we would be able to calcualte $e^{\frac{-2 \pi i (2 n) k}{N}}$ once for both sides of the equation.

Fortunately, all we have to do is factor a common multiplier out of the odd side of the equation.

\begin{equation}
X_{k} = \sum_{n=0}^{N/2} x_{2 n} e^{\frac{-2 i \pi n k}{N/2}} + e^{\frac{-2 i \pi k}{N}} \sum_{n=0}^{N/2} x_{2 n + 1} e^{\frac{-2 i \pi n k}{N/2}}
\end{equation}

The odd DFT can now reuse the exponent from the even DFT, we just have to multiply it by the common multiplier. The rearranged formula reduces the total number of calculations required for the full DFT and the number of iterations is also easier to reduce from $N^2$ to $\frac{N^2}{2}$. The order of complexity is still $O(N^2)$.

The code below implements the rearranged equation.

iterations = 0
even_transform_vector = [0j] * len(pcm)
odd_transform_vector = [0j] * len(pcm)
for k in range(0, len(pcm)):
    for m in range(0, pivot):
        iterations = iterations + 1
        e = np.exp(-2j * np.pi * m * k / pivot)
        even_transform_vector[k] = even_transform_vector[k] + (pcm[2 * m] * e)
        odd_transform_vector[k] = odd_transform_vector[k] + (pcm[2 * m + 1] * e)
    odd_transform_vector[k] = odd_transform_vector[k] * np.exp(-2j * np.pi * k / len(pcm)) 

print(f'Iterations: {iterations}\n')
print(list(np.add(even_transform_vector, odd_transform_vector)))

We can calculate $X_{k+\frac{N}{2}}$ using the equation for $X_{k}$ if we add $\frac{N}{2}$ to $k$.

\begin{equation}
X_{k+\frac{N}{2}} = \sum_{n=0}^{N/2} x_{2 n} e^{\frac{-2 i \pi n (k+\frac{N}{2})}{N/2}} + e^{\frac{-2 i \pi (k+\frac{N}{2})}{N}} \sum_{n=0}^{N/2} x_{2 n + 1} e^{\frac{-2 i \pi n (k+\frac{N}{2})}{N/2}}
\end{equation}

This can halve the number of iterations of the k-loop because we can calculate $X_{k+\frac{N}{2}}$ while we're calculating $X_{k}$

The implementation below demonstrates that the iteration count is a quarter of what is was in the previous step.

iterations = 0
even_transform_vector = [0j] * len(pcm)
odd_transform_vector = [0j] * len(pcm)
for k in range(0, pivot):
    for m in range(0, pivot):
        iterations = iterations + 1
        
        # X(k)         
        e = np.exp(-2j * np.pi * m * k / pivot)
        even_transform_vector[k] = even_transform_vector[k] + (pcm[2 * m] * e)
        odd_transform_vector[k] = odd_transform_vector[k] + (pcm[2 * m + 1] * e)

        # X(k + N/2)
        e = np.exp(-2j * np.pi * m * (k + pivot) / pivot)
        even_transform_vector[k + pivot] = even_transform_vector[k + pivot] + (pcm[2 * m] * e)
        odd_transform_vector[k + pivot] = odd_transform_vector[k + pivot] + (pcm[2 * m + 1] * e)
        
    # X(k)
    odd_transform_vector[k] = odd_transform_vector[k] * np.exp(-2j * np.pi * k / len(pcm)) 
    
    # X(k + N/2)
    odd_transform_vector[k + pivot] = odd_transform_vector[k + pivot] * np.exp(-2j * np.pi * (k + pivot) / len(pcm)) 
    
print(f'Iterations: {iterations}\n')
print(list(np.add(even_transform_vector, odd_transform_vector)))

The only real difference between $X_{k}$ and $X_{k+\frac{N}{2}}$ is that the latter adds $\frac{N}{2}$ to k. If we could make them more alike, we could calculate one exponent and use it twice.

Fortunately, we can factor $\frac{N}{2}$ out of the $k+\frac{N}{2}$ side by applying the common multiplier, $e^{-2 \pi i m}$.

\begin{equation}
X_{k+\frac{N}{2}} = \sum_{n=0}^{N/2} x_{2 n} e^{\frac{-2i \pi n k}{N/2}} e^{-2i \pi m} +
e^{\frac{-2 i \pi k}{N}} e^{-\pi i} \sum_{n=0}^{N/2} x_{2 n + 1} e^{\frac{-2 i \pi n k}{N/2}} e^{-2i \pi m}
\end{equation}

The code snippet below demonstrates an implementation of the rearranged equation.

iterations = 0
even_transform_vector = [0j] * len(pcm)
odd_transform_vector = [0j] * len(pcm)
for k in range(0, pivot):
    for m in range(0, pivot):
        iterations = iterations + 1
        
        # X(k)         
        e1 = np.exp(-2j * np.pi * m * k / pivot)
        even_transform_vector[k] = even_transform_vector[k] + (pcm[2 * m] * e1)
        odd_transform_vector[k] = odd_transform_vector[k] + (pcm[2 * m + 1] * e1)

        # X(k + N/2)
        e2 = e1 * np.exp(-2j * np.pi * m)
        even_transform_vector[k + pivot] = even_transform_vector[k + pivot] + (pcm[2 * m] * e2)
        odd_transform_vector[k + pivot] = odd_transform_vector[k + pivot] + (pcm[2 * m + 1] * e2)
        
    # X(k)
    e3 = np.exp(-2j * np.pi * k / len(pcm))
    odd_transform_vector[k] = odd_transform_vector[k] * e3
    
    # X(k + N/2)
    e4 = e3 * np.exp(np.pi * 1j)
    odd_transform_vector[k + pivot] = odd_transform_vector[k + pivot] * e4
    
print(f'Iterations: {iterations}\n')
print(list(np.add(even_transform_vector, odd_transform_vector)))

The snippet below demonstrates why $e^{\frac{-2i \pi n k}{N/2}} e^{-2i \pi m}$ could be replaced by $e^{\frac{-2i \pi n k}{N/2}}$.

sum_one = 0j
sum_two = 0j
for k in range(0, len(pcm)):
    for n in range(0, len(pcm)):
        sum_one = sum_one + np.exp(-2j * np.pi * n * k / pivot)
        sum_two = sum_two + np.exp(-2j * np.pi * n * k / pivot) * np.exp(-2j * np.pi * n)
        
print(round(sum_one, 10))
print(round(sum_two, 10))
print(round(sum_two - sum_one, 10))

So, based on this, we can simply the equation further like so.

\begin{equation}
X_{k+\frac{N}{2}} = \sum_{n=0}^{N/2} x_{2 n} e^{\frac{-2i \pi n k}{N/2}} +
e^{\frac{-2 i \pi k}{N}} e^{-\pi i} \sum_{n=0}^{N/2} x_{2 n + 1} e^{\frac{-2 i \pi n k}{N/2}}
\end{equation}

The snippet below demonstrates why $e^{\frac{-2i \pi k}{N}} e^{-\pi i}$ could be replaced by $-e^{\frac{-2i \pi n k}{N}}$

sum_one = 0j
sum_two = 0j
for k in range(0, pivot):
    sum_one = sum_one + np.exp(-2j * np.pi * k / len(pcm))
    sum_two = sum_two - np.exp(-2j * np.pi * k / len(pcm)) * np.exp(-1j * np.pi)
        
print(sum_one)
print(sum_two)

The final equation is as follows and it essentially describes the butterfly operation for recombining the sub-transforms.

\begin{equation}
X_{k+\frac{N}{2}} = \sum_{n=0}^{N/2} x_{2 n} e^{\frac{-2i \pi n k}{N/2}} -
e^{\frac{-2 i \pi k}{N}} \sum_{n=0}^{N/2} x_{2 n + 1} e^{\frac{-2 i \pi n k}{N/2}}
\end{equation}

The code below demonstrates the basic (poor performance) idea of the above equation. The code snippet after it is a recursive implementation with an order of complexity of $O(n log(n)$

iterations = 0
even_transform_vector = [0j] * len(pcm)
odd_transform_vector = [0j] * len(pcm)
for k in range(0, pivot):
    for m in range(0, pivot):
        iterations = iterations + 1
        
        # X(k)         
        e1 = np.exp(-2j * np.pi * m * k / pivot)
        even_transform_vector[k] = even_transform_vector[k] + (pcm[2 * m] * e1)
        odd_transform_vector[k] = odd_transform_vector[k] + (pcm[2 * m + 1] * e1)


transform = [0j] * len(pcm)
for k in range(0, pivot):
    iterations = iterations + 1
    e = np.exp(-2j * np.pi * k / len(pcm))
    transform[k] = even_transform_vector[k] + e * odd_transform_vector[k]
    transform[k + pivot] = even_transform_vector[k] - e * odd_transform_vector[k]

    
print(f'Iterations: {iterations}\n')
print(list(transform))

def fft(sample_set):
  if len(sample_set) == 1:
    return sample_set

  size = len(sample_set)
  even_fft = fft(sample_set[0::2])
  odd_fft = fft(sample_set[1::2])

  freq_bins = [0.0] * size
  for k in range(0, int(size / 2)):
    freq_bins[k] = even_fft[k]
    freq_bins[k + int(size / 2)] = odd_fft[k]

  for k in range(0, int(size / 2)):
    e = np.exp(-2j * np.pi * k / size)
    t = freq_bins[k]
    freq_bins[k] = t + e * freq_bins[k + int(size / 2)]
    freq_bins[k + int(size / 2)] = t - e * freq_bins[k + int(size / 2)]

  return freq_bins

print(fft(pcm))

Extra Resources

Here's a list of resources that might help you get unstuck if some my post is unclear. Some of the resources are much more in depth and I'm still learning a lot from them.

Interactive Guide To The Fourier Transform
Fourier Series Intro
Cooley-Tukey Fast Fourier Transform
Butterfly diagram
Fourier Analysis by Georgi P. Tolstov
Nyquist-Shannon Sampling Theorem
Pulse Code Modulation
Euler's Formula
Complex Number Intro
Introduction to Imaginary Numbers
De Moivre Formula

Web Audio clock demo

Lisa Burton — Mon, 10 Feb 2020 19:44:00 GMT

This is a little demo web app that uses the Web Audio clock to correctly control the timing of audio playback. I'm contemplating the idea of implementing a web based drum machine and this is background research.

The app starts by fetching a JSON manifest file (from S3). The manifest tells the app where to download a big list of drum samples from. This initialisation is hidden behind a loading screen. The manifest fetch request explicitly busts client side caching, while the sample fetch requests force client side caching.

The running app looks like a Windows 98 pop up dialog and when you click the button, it plays a short string of random TB-808 samples, equally spaced with perfect time.

The importance of the audio clock is that it allows you to schedule the playback of audio for some time in the future exactly while using setInterval or setTimeout simply isn't accurate enough to do audio playback for something like a drum machine.

http://static.spacekitcat.com/wbdm3/webapp/index.html

spacekitcat/wbdm-proto-3

Little experiment with the Web Audio clock. Contribute to spacekitcat/wbdm-proto-3 development by creating an account on GitHub.

spacekitcatGitHub

Source code

Conway's game of life

Lisa Burton — Thu, 08 Aug 2019 16:43:00 GMT

Edit (2021): A better solution: https://github.com/spacekitcat/convolve2d-conways-game-of-life/blob/main/main.py

This is a fairly simple web based implementation of Conway's game of life. I wrote it as a Kata and for interview preparation and thought the result was kinda neat.

It starts randomised, sometimes it will halt almost immediately, other times it will seem to go on forever. That's the point of Conway's game of life in my view. It should behave different each time you reload.

The rendering could be much faster with better layout fluidity. There isn't a Git repository for this as such, it's a quick and dirty bit of JS hacked onto my legacy site.

NES (6502) Assembly Notes

Lisa Burton — Thu, 01 Aug 2019 16:39:00 GMT

The following Gist demonstrates how to create a 16-bit counter using 6502 assembly. It increments the Least Significant Byte (LSB) and the Most Significant Byte (MSB) separately in such a way that the MSB is incremented once for every 256 increments of the LSB, totalling 65536 iterations.

The following Gist demonstrates how to get the 8-bit Nintendo (NES) to render a 4 part sprite of /sprites/ with 6502 pure assembly. I generally shadow SPR-RAM in general purpose RAM from address $0200 and when the VBlank NMI occurs, I ask the Direct Memory Access controller at $4014 to transfer RAM to SPR-RAM.

You cannot write to the Picture Processing Unit’s internal RAM when it’s busy working (unless you want visible rendering glitches), you have to wait for the V-Blank Non-Maskable Interrupt (NMI); V-Blank means that the last scan-line for the current frame has been drawn to the screen and that the PPU will be idle for approximately 2250 (exact number varies depending on the exact NES model) CPU cycles.

LZ77

Lisa Burton — Mon, 08 Apr 2019 21:32:00 GMT

LZ77 is a lossless, streaming compression algorithm invented in 1977. The central concept is its use of a Sliding Window, which is a fixed size, sequentially ordered buffer; sometimes referred to as the history buffer. The LZ77 compression algorithm starts by reading in a string of bytes (from a stream) the size of the sliding window, it then checks if the history buffer contains an identical sequence of bytes. With a match, the algorithm reads the next byte from the stream and uses it as the token field, together with the position and length of the matching byte block (in the history buffer) as the prefix field; without a match, the algorithm will check each substring that always begins at byte zero in the byte stream to see if any of those match. If none of the substrings match, byte zero (from the byte stream) becomes the token field in the next output compression packet. (the prefix is empty, which might be expressed as null)

Decompression starts by checking the prefix field of the current packet and if it finds an entry, it reads the indicated number of bytes (starting from the position indicated). Any prefix byte string is outputted, followed by token immediately after it.

Because it's only the tokens that get pushed onto the history buffer, you guarantee that the buffer content will be identical whether its generating (compression) or consuming (decompression) the packet.

I've implemented a small demonstration of the compression process here, I haven't bothered with decompression because it's super easy to implement. (I did implement decompression in this prototype here, but just know I consider it to be a low quality prototype).

The first git repository implements the sliding window via a custom library (I created) called tiny-toatie-cache as its storage engine. The second repository is a vanilla implementation of LZ77.

tiny-toatie-cache (TTC) is a fixed size storage engine and it allows you to append new bytes to the front and search for specific byte strings. When the internal storage is full, it deletes the oldest items first when it has to make space for new bytes at the front. The find method proxies requests through its internal caching system. The cache is designed to remember the offset and length of matching byte strings it previously found in its internal storage (which behaves exactly like a sliding window). The offsets of cached records are automatically corrected when new bytes push the buffer contents back. It also transparently invalidates cache records when offsets "fall off the end" of the buffer to make space for new bytes at the front.

The caching idea turned out to be a bit of a bust because you don't get too many duplicate prefixes, unless it's a really low entropy source file or something like that (say, if you used dd to generate a zeroed file). Consider this input the the the the the and the output packets:

Token	Prefix
't'	null
'h'	null
'e'	null
' '	null
't'	'the '
'h'	'he t'
'e'	'e th'

If you look at the last three packets, it gives an idea of how the prefixes go through a sort of 'phase shift' of sorts. This leads to very frequent cache misses on prefix search caching. In retrospect, it seems so obvious, but doesn't it always?

The table below is the verbatim output of lz77-nodejs-streams

{ token: , prefix: null }
{ token: , prefix: null }
{ token: , prefix: null }
{ token: , prefix: null }
{
  token: ,
  prefix: { offset: 3, value: , length: 4 }
}
{
  token: ,
  prefix: { offset: 3, length: 4, value:  }
}
{
  token: ,
  prefix: { offset: 3, length: 4, value:  }
}

Verbatim packet generation from compressing 'the the the the the' through lz77-nodejs-streams

Hacker Text JS

Lisa Burton — Tue, 15 May 2018 16:13:00 GMT

HackerTextJS is a small library that renders widgets like the three examples below:

It can render multiple widgets, each with a different frame rate, size, text, renderer strategy and text filter. In an HTML document, you define an object called hacker_text_config in the head section. The hacker text config object has the property targets, an array of widget descriptor objects. Each widget descriptor specifies at a minimum the htmlId (the ID of the target element), text (the display text) and rows (the number of rows to render). Below is an example of what the configuration object might look like for rendering three elements.

After the hacker_text_config object has been defined, hackertext.js can be imported. It will only start initialisation if it can find this config descriptor.

For each widget, Hackertext.js calculates the maximum character length of each line for the current style sheet, so if this is 116, the specified rows field is 15 and the framerate is 10, it will fill the element with 1740 (116 times 15) characters 10 times per second. The renderer sets up a circular iterator over the specified text for each widget. Frames are built sequentially, one character at a time. For each character it can choose to either take the next character from the iterator or output a noise character. The probability is controlled by a dynamically determined noise ratio. When the RandomizedFrameRenderStrategy renderer strategy is used without any parameters, the noise ratio is fixed at 50%; when the SinePhaseFrameRenderStrategy, the noise ratio is controlled by a fixed frequency sine wave. There are other settings, but the documentation covers it adequately I think.

HTML elements aren't really designed to operate like this and I was fully aware of this at the time, but I wanted to know if you could reliably use JS to completely fill a div with text in such a way that it didn't overflow or underflow and to then make it animate. The first widget uses a sine wave renderer, the middle widget uses a completely randomised renderer and the third widget uses a cosine renderer.

At runtime it injects a hidden element (no margin or padding) with a test string. It uses the width of the hidden element to calculate the average width of a character, then it looks at the width of the target element for each widget, dividing it by the width of a character to work out the char count for each line.

It also watches for window resize events, so that it can dynamically adjust the number of characters the renderer spits out for each line.