This page is a mirror of Tepples' nesdev forum mirror (URL TBD).

# Question about audio bit depth

I'm programming a basic audio synthesizer and I want it to sound authentic to the NES without actually writing an emulator.

So far I have the basics down pat for creating a WAV file and filling it with a square wav, changing the frequency, etc. I am creating WAV files at 44,100 hz, 16 bits, and they sound nice and chippy

To be authentic, however, I would like to generate 4 bit audio, since I understand that is the NES bit depth for most channels.

What I don't understand is how to quantize (am I using the correct term?) a square wave or other waveform into 4 bits.

My synthesizer will have a value like "50% volume" at any given point.

4 bits gives 16 different values. My first question is which position is silence?

My second question is how should I convert the wave form depending on which part of the duty cycle it is on? For example if the square wave is to be written out "low" at 50%, which might mean a value of 4, but on the "high" part of the duty cycle would it then be 12 or 13?

My last question is how would I up-convert the resulting 4-bit audio into a 16-bit WAV file?

I'm sorry if I'm using the wrong terminology. Does anyone understand what I'm looking to achieve?

Also for further info, when I am writing out my 16-bit audio I take the max value (65536), subtract 1 and divide by 2. So complete silence would be 32768, and there are 32767 positions above or below. I'm not even sure if I'm doing that correctly but that's how I've seen quantinization code online so I stole an example from somebody's c++ app.

Thanks in advance for any guidance.
Quantization is not a path to authenticity. Yes, at some stage of the NES audio path, the square/triangle units are a 4-bit digital signal, but that's not what they output.

The output of the chip is analog. The digital signals are converted to analog via individual DACs and the mixing is nonlinear. If you were to use a 4-bit quantization you would be ignoring that nonlinearity.

Also, silence is not really a particular position. Silence is what you get when the output voltage stops changing. If you're asking what is output by each channel when it's in a "silent" state, that's different on a per-channel basis (squares output 0-15 and the silent state is 0, the triangle is 0-15 and its silent state is to hold on its last value, etc.).

Technically, you could achieve greater authenticity by increasing your bit depth, not decreasing it, but for practical purposes 16-bit is a large amount of fidelity to work with. The more important factors are getting your emulation correct. The underlying digital signals are various bit-depths, and you must emulate them this way, but the output stage is a completely different beast (i.e. analog) and requires an emulation of the analog factors.
16-bit audio is signed short, not unsigned, so it has range -32767 to 32768 with 0 as 'silence', not 0 to 65536 with 32768 as 'silence'.

You square generator should always output either 0 or 1. It does not matter if it has variable duty or not. Then you need current volume of channel, which is 0 (silence) to 15 (max). You need a table that contains 16 values of actual output level of chip's DAC, you can find one in an emulator source. To apply volume to the square wave generator you just use 0 if the generato output is 0, or a value from the table that correspons to current volume if the generator output is 1.

Like, if(!output) sample=0; else sample=dac_table[volume];.
The main effect of the limited bit depth is to give only 16 volume levels for the pulse channels. The 4-bit DAC also causes the triangle channel to be stair-stepped (and in a very particular pattern that gives a 32x harmonic).

Also, resampling from 1.8MHz to 44kHz requires a higher bit depth than the source, so even if the NES only had a one channel linear 4-bit DAC, you'd need more than 4 bits at PC sample rates to properly represent it. In general, you can represent any bit depth as a one-bit stream at a much higher sampling rate.

Finally, look into audio resampling so you don't end up with a scratchy sound. Even linear interpolation is better than nothing.
I'm a little confused about how the NES channels work.

Currently I am writing out a 16bit WAV file playing notes that are generated by user input for values of volume and pitch, with a resolution of 60fps, which sounds like the NES when I do 1/60th arpeggios.

The part that I feel is not accurate is that the channels have a 16 bit resolution, so 65536 total values (I understand this goes negative when signed), when from my reading the real channels only has 16 steps.

Right now I take the volume and divide by 32767 in each direction. So for example 100% volume is 32767 and -32767. I never use 32768 so basically I throw one digit out (I can't tell the difference with such a high bit depth and I saw similar code from a c++ emulator do the same thing).

But for such a low bit depth like 4-bit if I just throw out one integer I know it will sound way off.

So, I'm just curious how the real NES hardware handles this as I'd like to mimick it as closely as possible without writing a full-fledge emulator because that's not really what I'm after.
What are you really trying to do? If you want something that just sounds like the NES, just treat the two pulse channels as 8-sample-long loops containing 0 and 15. When controlling for volume, quantize that to 4 bits, so that you only multiply this signal by 0/15 through 15/15.
For the triangle wave, it's just a 32-sample-long loop containing 0,1,2,3...14,15,15,14,13...2,1,0 and has no volume control. Scale that up until it's 1/5 of full scale or something.
We should ignore the noise channel until after you're happy with these first three.

The actual NES itself doesn't "produce 4 bit audio". It has a set of five separate DACs, one for each channel, four of which are 4-bit and one of which is 7-bit, all of which have different intrinsic volumes and are mixed in analog, followed by highpass and lowpass filters.
The NES runs at a ~1.8 MHz sampling rate. The pulse waves are basically on for n cycles, off for n cycles, where n determines the frequency. When on, they output volume/15.0, where volume is a whole number from 0 through 15 (only 16 distinct values). So n=10 and volume=1 outputs 10 samples of 1.0, 10 of 0.0, 10 of 1.0, etc.

Above the duty cycle is 50%; the NES also supports 25% and 12.5%. For 25%, have if on for n*3 samples and off for n samples. For 12%, on for n*7 samples and on for n samples. To normalize the frequencies across duties, use n*4,n*4 for 50%, n*6,n*2 for 25%, and n*7,n for 12.5%.
lidnariq wrote:
treat the two pulse channels as 8-sample-long loops containing 0 and 15. When controlling for volume, quantize that to 4 bits, so that you only multiply this signal by 0/15 through 15/15.

This is what I'm having a hard time understanding. Let me give you a specific example. The user has entered a "stop" command at frame 5 for pulse channel 1. Currently I would write out a "0" for 16-bit sample. From my understanding I would need to write out a "7" under 4-bit unsigned. No problem so far, the channel will be silenced.

Now on the next frame the user says they want a middle-c note played at 50% volume. Don't worry about how many steps I let the user pick from. Let's just say its half-volume to keep it simple. I have the same problem regardless of the resolution I use for volume.

Currently I would write out a 16384 for the first half of the duty cycle (a 50% duty), and a -16384 for the second half. I have also been able to change my code to support 25% duty cycles and it sounds great.

Now I want to change my code to render 4-bit audio (that I'll later up-sample to 16-bit audio but let's handle one problem at a time!)

I have no idea what to write out. If I use 0-6 for the first part of the duty cycle half volume would be 2.5, so I guess I'd use 3.

For the second half I'm not sure if I'd use 8-16 or 8-15. The first would have a different resolution (8 versus 7), the second would probably not sound right.

I lose one digit of resolution in my current algorithm (I never write out 32768 on the high cycle, I just cap it at 32767) but I don't notice it because my resolution is so high.

Now that I'm switching resolutions I'm not sure how to get the math to sound accurate.

Does this make any sense what I'm doing??
As someone else said, run your waveforms in whatever the output resolution is (16 bits), but first just quantize the user-supplied volume into 16 distinct levels (0/15, 1/15 ... 15/15) before you convert that to a 16-bit volume.

Also, the NES channels are all unsigned; they go from 0 to some magnitude in one direction only. Apply a high-pass filter after this to get the slow centering effect (you can't do it at the synthesis level as it affects how volume changes sound).
blargg wrote:
quantize the user-supplied volume into 16 distinct levels (0/15, 1/15 ... 15/15) before you convert that to a 16-bit volume

But I'm not going to ask the user to quantize it, I'm going to say something between max and min. Then *I* have to convert that using a range of numbers which isn't 0-15 but actually half of that, right? And I'm saying it's different based on which part of the duty cycle it is, right?
This picture might illustrate the problem:

The volume isn't 0-15. That's the output resolution for the entire waveform, right? The volume would shrink both the top AND bottom of the waveform, right?
If you want accurate sound you shouldn't assume that volume acts symmetrically, no.

You should check documentation (or experiment with the hardware) to determine what the hardware does. In general the NES sound hardware does does not apply volume in a symmetrical way. The signals go from 0 to positive only.
A NES pulse wave only has two states: 0 or the current volume (0 to 15). So quantizing the volume works just as well. If the NES had sine waves output on a 4-bit DAC, then yeah, quantizing the volume parameter wouldn't be sufficient.

The demo_chip.c example in blip_buf has some very concise pulse, triangle, and noise oscillators that are true to the NES without being cumbersome. I modified the main() in that example to play a falling pitch on the pulse wave:

Code:
int main( void )
{
init_sound();
wave_open( sample_rate, "out.wav" );

write_chan( 0, 0, timbre, 4 ); // 50%
float p = 200; // really half-period
float v = 1.0;

// generate 4 seconds of sound
while ( wave_sample_count() < 4 * sample_rate )
{
int time = 0; // writes at beginning of frames

// slow adjust period and volume
p *= 1.01;
v *= 0.99;

// 0 = first pulse wave
write_chan( time, 0, period, (int) p );

// quantize volume (it's scaled later by gain to the 16-bit range)
write_chan( time, 0, volume, (int) (v * 15) );

end_frame( clock_rate / 60 ); // 1/60-second frames
}
wave_close();
blip_delete( blip );

return 0;
}