This page is a mirror of Tepples' nesdev forum mirror (URL TBD).

# Efficient number scaling?

This specifically is for APU volumes, but in the end, this is general 6502 and math, so that's why I'm posting it here instead of the music forum.

Basically, I want to be able to scale volumes linearly.

For example, at scale 0 (full scale, i.e. identity), we have:
[code]0123456789ABCDEF[/code]

At scale "F" (silent), we have:
[code]0000000000000000[/code]

At scale "E", I would like:
[code]0111111111111111[/code]

so at scale 7, I want:
[code]0112233445566778[/code]
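
One formula that reproduces all four example rows is out = ceil(V * (15 - scale) / 16). This is an inference from the examples, not something stated in the post, and the names `target_volume` and `row` are made up for illustration. A minimal Python sketch under that assumption:

```python
def target_volume(volume, scale):
    """Attenuate a 4-bit volume; scale 0 is identity, scale 15 is silent.

    Assumed formula: ceil(volume * (15 - scale) / 16). It was chosen only
    because it matches the example rows; it is not from the original post.
    """
    gain = 15 - scale
    return -(-(volume * gain) // 16)  # integer ceiling division


def row(scale):
    """Render the outputs for volumes 0..15 as one hex digit each."""
    return "".join(format(target_volume(v, scale), "X") for v in range(16))


if __name__ == "__main__":
    for scale in (0x0, 0x7, 0xE, 0xF):
        print(format(scale, "X"), row(scale))
```

Because the gain never exceeds 15/16, the output rises by at most one per input step, which is what makes a one-bit-per-step table possible.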

The method I currently have mapped out for this task works by integration (to keep the table small). First, if the volume is 0, we just leave it at 0. Second, if the scale is 0, we leave the volume untouched. That leaves a table with 15 entries, each entry having 15 bits (plus an extra bit for padding), so that's a 30-byte LUT.

So if I have volume V, and I want to scale V to scale 7, I look at entry 7 in my LUT, which is:
[code]010101010101010x - x being the padding[/code]

Now I need to do my integration, and I'll do it like this:

[code]Copy LUT entry to scratch memory
OUT = V
Do {
    ASL copy of LUT entry
    If (Carry)
        Subtract 1 from OUT
} Loop (V-1) Times
OUT = Volume[/code]

This gives me the result I want, but in the worst case scenario, this code will be running 3 times (squares + noise) per frame, and that loop will have up to 14 iterations. Yes, I do realize there are ways to optimize the pseudo code for 6502, but I want to know if there's a more efficient way to scale my volumes.
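
To make the bit-table idea concrete, here is a small Python model of it. Several things in it are assumptions rather than details from the post: the target curve ceil(V * (15 - scale) / 16), the bit encoding (a 1 bit wherever the output fails to rise from one volume step to the next), and the loop running V times instead of V - 1. Under those assumptions the scale-7 entry comes out with the same 0101... bit pattern quoted above.

```python
def target_volume(volume, scale):
    # Assumed target curve: ceil(volume * (15 - scale) / 16).
    return -(-(volume * (15 - scale)) // 16)


def build_entry(scale):
    """Pack 15 step bits (volumes 1..15) plus one padding bit into 16 bits.

    A bit is 1 where the output does not rise from volume-1 to volume,
    i.e. where the result falls one further behind the input.
    """
    bits = 0
    for volume in range(1, 16):
        stalled = target_volume(volume, scale) == target_volume(volume - 1, scale)
        bits = (bits << 1) | int(stalled)
    return bits << 1  # lowest bit is the unused padding


LUT = [build_entry(scale) for scale in range(16)]  # one 16-bit word per scale


def scale_by_integration(volume, scale):
    entry = LUT[scale]          # "copy LUT entry to scratch memory"
    out = volume
    for _ in range(volume):     # this sketch consumes one bit per volume step
        carry = entry & 0x8000  # ASL moves the top bit into the carry
        entry = (entry << 1) & 0xFFFF
        if carry:
            out -= 1
    return out


if __name__ == "__main__":
    print(format(LUT[7], "016b"))
    print([scale_by_integration(v, 7) for v in range(16)])
```

The model also shows where the cost comes from: the loop length grows with the input volume, so the loudest notes are the slowest to scale.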

I realize I can simply subtract the scale from the input volume, but that's not linear, and the channel volumes would be out of proportion with each other nearly all the time.

Is there a better way to do this, or am I being too confusing?

If you don't mind a 256-byte lookup table, you can just do a 4-bit by 4-bit multiplication and shift down:

[code]table_4x4:
    .byte 0*0, 0*1, 0*2, ... 0*15
    .byte 1*0, 1*1, 1*2, ... 1*15
    ; ... rows 2 through 15 follow the same pattern

    lda volume
    asl a
    asl a
    asl a
    asl a
    ora scale
    tax
    lda table_4x4,x
    lsr a
    lsr a
    lsr a
    lsr a

    ; a is now your new volume[/code]

You can remove the LSRs if you make the table specific to volumes (each product stored already shifted right by 4) instead of a general 4x4 multiplication table.

And you can get rid of the ASLs if you make another 16-byte shifter table:

[code]table:
    .byte 0*0 >> 4, ...
    .byte 1*0 >> 4, ...

shifttable:
    .byte 0<<4, 1<<4, 2<<4, ...

    ldx volume
    lda shifttable,x
    ora scale
    tax
    lda table,x

    ; a is now your volume[/code]
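
Both table variants can be modelled in a few lines of Python to check that they agree. This is only a sketch: the table names are illustrative, and `scale` is treated as a gain (0 = silent, 15 = loudest), which is what the multiplication implies.

```python
# General 4x4 multiplication table, indexed by (volume << 4) | scale.
MUL_TABLE = bytes(v * s for v in range(16) for s in range(16))

# Volume-specific table with the ">> 4" baked in, so no LSRs are needed.
SCALED_TABLE = bytes((v * s) >> 4 for v in range(16) for s in range(16))

# 16-byte helper table that replaces the four ASLs.
SHIFT_TABLE = bytes(v << 4 for v in range(16))


def scale_general(volume, scale):
    """First variant: shift, OR, look up the product, shift down."""
    index = ((volume << 4) | scale) & 0xFF
    return MUL_TABLE[index] >> 4


def scale_prescaled(volume, scale):
    """Second variant: two lookups, no shifts at run time."""
    return SCALED_TABLE[SHIFT_TABLE[volume] | scale]


if __name__ == "__main__":
    print(scale_general(15, 15), scale_prescaled(15, 15))
```

One property worth knowing: 15 * 15 is 225, and 225 >> 4 is 14, so the maximum scale does not pass a full-volume note through unchanged.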
That's a good method, but I would like to avoid large LUTs if I can.

You can do n-bit by n-bit multiplication using a 2^(n + 1)-entry lookup table. Start with some algebra:

a^2 + b^2 + 2ab = (a + b)^2
2ab = (a + b)^2 - a^2 - b^2
ab = (a + b)^2/2 - a^2/2 - b^2/2

So you'd have a table of a^2/2 for a = 0 to 30, and then look up (a + b)^2/2, a^2/2, and b^2/2 and subtract the last two from the first. It could be as easy as this (untested):
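[code]scale_volume:
    clc
    lda env_volume
    adc note_volume     ; a + b, the index of (a + b)^2/2
    tax
    lda squared,x
    sec
    ldx env_volume
    sbc squared,x       ; subtract a^2/2
    sec                 ; in case the previous SBC borrowed
    ldx note_volume
    sbc squared,x       ; subtract b^2/2

    ; divide result by 16, discarding the fraction
    lsr a
    lsr a
    lsr a
    lsr a
    rts[/code]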
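
Since the routine is marked untested, here is a Python model of the half-square idea with 8-bit arithmetic. It assumes the table holds floor(n^2/2) and keeps only the low byte of each entry, because the largest entry (30^2/2 = 450) does not fit in a byte; the name `HALF_SQUARES` is illustrative. The wrap-around is harmless because the final product fits in 8 bits. With truncated entries the result is one too high when both inputs are odd.

```python
# Low byte of floor(n*n/2) for n = 0..30 (31 entries).
HALF_SQUARES = bytes(((n * n) // 2) & 0xFF for n in range(31))


def multiply_4x4(a, b):
    """Return a*b for 4-bit inputs, or a*b + 1 when both are odd."""
    acc = HALF_SQUARES[a + b]
    acc = (acc - HALF_SQUARES[a]) & 0xFF  # 8-bit subtract, borrow ignored
    acc = (acc - HALF_SQUARES[b]) & 0xFF
    return acc


def scale_volume(env_volume, note_volume):
    """Product divided by 16, fraction discarded (four LSRs)."""
    return multiply_4x4(env_volume, note_volume) >> 4


if __name__ == "__main__":
    print(multiply_4x4(6, 7), multiply_4x4(15, 15), scale_volume(15, 15))
```

In the model the borrow is simply dropped between the two subtractions; on the 6502 that corresponds to having the carry set before each SBC.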
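How about a simple 4-bit fixed-point multiply?
[code]    lda note_volume
    lsr env_volume
    bcs :+
    lsr a
:   lsr env_volume
    bcc :+
    adc note_volume     ; carry is set here, so this adds note_volume + 1
:   lsr a
    lsr env_volume
    bcc :+
    adc note_volume
:   lsr a
    lsr env_volume
    bcc :+
    adc note_volume
:   lsr a[/code]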

Looks to work fine:
[code]  0123456789ABCDEF
  ----------------
0|0000000000000000
1|0000000011111111
2|0000011111222222
3|0000111122223333
4|0001112223334444
5|0011122233444555
6|0011223334455666
7|0011223344556677
8|0112233455667788
9|0112334456678899
A|01223445667889AA
B|0122345567889ABB
C|012344567889ABCC
D|01234566789ABCDD
E|0123456789ABCDEE
F|0123456789ABCDEF[/code]
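
The table can be regenerated with a short Python model of the routine. The model assumes each set bit of `env_volume` triggers an `adc note_volume` executed with the carry set (so it adds one extra), followed by a shift right, and that rows are indexed by `env_volume`. The function names are illustrative.

```python
def scale_shift_add(note_volume, env_volume):
    """Model of the shift-and-add routine for 4-bit inputs (0..15)."""
    a = note_volume
    # Lowest bit: LSR env_volume / BCS skip / LSR A
    carry = env_volume & 1
    env_volume >>= 1
    if not carry:
        a >>= 1
    # Remaining three bits: LSR env_volume / BCC skip / ADC / LSR A
    for _ in range(3):
        carry = env_volume & 1
        env_volume >>= 1
        if carry:
            a += note_volume + 1  # ADC with the carry set adds one extra
        a >>= 1
    return a


def table_row(env_volume):
    """One row of the table: results for note volumes 0..15 as hex digits."""
    return "".join(format(scale_shift_add(n, env_volume), "X") for n in range(16))


if __name__ == "__main__":
    print("  0123456789ABCDEF")
    print("  ----------------")
    for env in range(16):
        print(format(env, "X") + "|" + table_row(env))
```

The extra one contributed by the carry is what lets row F come out as an exact identity instead of topping out at E.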

Edit: whoops, didn't read your specs closely. You'd have to invert one of them, since you use an attenuation rather than gain, and add some kind of rounding so that anything non-zero comes out as non-zero unless fully attenuated.
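
A rough Python sketch of that adjustment, under stated assumptions: the gain is taken as 15 minus the attenuation, the multiply is modelled as shift-and-add where each add also picks up a set carry, and "rounding" is reduced to forcing a minimum of 1. The names `fixed_point_scale` and `attenuate` are hypothetical.

```python
def fixed_point_scale(volume, gain):
    """4-bit fixed-point multiply modelled as shift-and-add."""
    a = volume if gain & 1 else volume >> 1
    for bit in (1, 2, 3):
        if (gain >> bit) & 1:
            a += volume + 1  # models ADC executed with the carry set
        a >>= 1
    return a


def attenuate(volume, attenuation):
    """Attenuation 0 leaves the volume alone; 15 silences it."""
    gain = 15 - attenuation  # invert attenuation into a gain
    out = fixed_point_scale(volume, gain)
    if out == 0 and volume != 0 and gain != 0:
        out = 1  # keep non-zero input audible unless fully attenuated
    return out


if __name__ == "__main__":
    for attenuation in (0x0, 0x7, 0xE, 0xF):
        digits = "".join(format(attenuate(v, attenuation), "X") for v in range(16))
        print(format(attenuation, "X"), digits)
```

With these choices attenuation E gives 1 for every non-zero input and attenuation F gives silence. Intermediate rows can differ by one step from a pure ceiling-based curve.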

Thanks Blargg! I just tried the multiplication like you said, and it works like a charm. I knew that integer multiplication works by shifting and adding, but I didn't know a variation of it works for fractions too. I went with your solution because it doesn't need any LUTs, and it looks cheap enough to get away without one.

I think I'm going to throw out the rounding; the result sounds good enough without it, and it would add extra cycles without making much of a difference.

I also made sure the routine is simply skipped if the note volume is 0 or F (0 uses a constant instead; F just leaves the value untouched).

It definitely sounds a whole lot better than the simple subtraction-based method I was using before! 