I’ve always thought that simply bit-reducing a sample was enough to give it that “2A03/2A07” sound, and Skrasoft has done some digging into why that isn’t quite the case.
The NES’s designers didn’t care much about audio fidelity, but they did want some type of digital audio playback. You’ve probably played a Nintendo game at some point that warned you, through a wall of half-intelligible fuzz, to “skate or die die die die” or “double dibl.” It took special audio encoding to sound so terrible.
Instead of encoding the amplitude at each point in time, many NES games stored a sequence of amplitude differences, a handy format known as Differential PCM (DPCM). If you have audio data that looks like this in PCM form:
1, 2, 4, 6, 3, 2
in DPCM form it becomes
+1, +1, +2, +2, -3, -1
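To make that concrete, here’s a toy encoder in Python (a sketch of the idea, not the NES’s actual on-chip format; it assumes the level starts at zero, which is why the first delta is +1):

```python
def dpcm_encode(pcm):
    """Store each sample as its difference from the previous one (level starts at 0)."""
    deltas = []
    prev = 0
    for sample in pcm:
        deltas.append(sample - prev)
        prev = sample
    return deltas

print(dpcm_encode([1, 2, 4, 6, 3, 2]))  # [1, 1, 2, 2, -3, -1]
```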
To get the original data back, you keep a running sum, starting from zero and adding each term of the DPCM stream in turn. As a breakdown:
n[0] = 0 + 1 = 1
n[1] = n[0] + 1 = 2
n[2] = n[1] + 2 = 4
n[3] = n[2] + 2 = 6…
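In code, decoding is just that running sum (again a sketch, matching the zero starting level above):

```python
def dpcm_decode(deltas):
    """Rebuild the PCM samples by accumulating the differences from a level of 0."""
    pcm = []
    level = 0
    for delta in deltas:
        level += delta
        pcm.append(level)
    return pcm

print(dpcm_decode([1, 1, 2, 2, -3, -1]))  # [1, 2, 4, 6, 3, 2]
```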
We get our original data back. The advantage here is that what matters is no longer the largest value (6) but the largest *difference* (3). Most audio signals have relatively small sample-to-sample differences compared to their highest and lowest values, so a high compression ratio is possible. Instead of needing a whole byte per sample, you could likely get away with a nibble. Of course, the Nintendo didn’t have that many bits to waste! It used 1-bit DPCM, where each bit can only mean “step up one” or “step down one.” That same PCM stream of numbers, converted to 1-bit DPCM and back, goes like this:
DPCM: +1, +1, +1, +1, -1, -1
PCM: 1, 2, 3, 4, 3, 2
That’s not what we started with! This creates a distortion different from traditional bit-crushing, one that adds noise and filters the signal at the same time. Essentially, the 1-bit DPCM output “chases” the incoming audio: it can only step up or down by a fixed amount each sample, so anything that moves faster gets flattened into a ramp. High frequencies end up distorted into triangle waves.
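Here’s a sketch of that chasing behavior (a simplification using ±1 steps; the real 2A03 DMC channel steps a 7-bit level by ±2 and clamps at the ends, but the failure mode is the same):

```python
def dpcm_1bit_roundtrip(pcm):
    """Encode to 1-bit DPCM and decode again: the output can only chase the input."""
    level = 0
    out = []
    for sample in pcm:
        # One bit per sample: step the level up if we're below the target, else down.
        level += 1 if sample > level else -1
        out.append(level)
    return out

print(dpcm_1bit_roundtrip([1, 2, 4, 6, 3, 2]))  # [1, 2, 3, 4, 3, 2] -- not what we started with
```

Feed it anything that moves more than one step per sample and the output falls behind, which is exactly the slope-overload distortion that turns high frequencies into triangles.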
Check out the rest of the discussion, complete with graphs and audio examples, over on his blog.