This page is a mirror of Tepples' nesdev forum mirror (URL TBD).

# Assembly routine help wanted: fixed-point 16x16 multiply

In our game, I am currently going with a 12.4 fixed point number format for position and velocity. (I've been thinking of 16.8 for speed of fetching and better resolution but I figure I can save memory for now and upgrade it later)

How would we do a fixed-point multiply on 6502? I figure the general procedure would be to multiply 16x16 with a 32-bit result, and then throw away the bottom 2 bytes. But there should be a more efficient way on 6502, right? This puzzle is a little too advanced for me at the moment so I'm hoping to get some additional heads in on this.

I'm separating the LSB and MSB by a constant, FIXP_STRIDE. Game object properties are stored in interleaved arrays indexed by X/Y

edit: actually, a 16x7 (7 bit signed scalar, both with 4-bit fractional portion) multiply would probably be sufficient? i would be using it for accel/decel, angles to things, distances between things...
On the bright side, bit shifts of multiples of 8 are free. On the downside, if you skip any lower bits, you risk being off by however many carry bits there would be.

There are tricks you can do if you've got shorter hardware-accelerated multiplication (e.g. 8×8→16), but the 6502 doesn't usually give you that.

In practice, I fear just doing long binary multiplication is the best you're going to manage.
Every time I had to do this I used the old "shift and add + throw away lower byte(s)" method. The only way I know to actually speed up multiplications is to use the method explained here. The idea is to build a table using this formula: f(x) = x^2 / 4 and then do f(a+b) - f(a-b) to calculate the result, so instead of the multiplication you do 1 addition and 1 subtraction, 2 table lookups, and finally one more subtraction.
If you can spare having 2KB worth of tables, check this out for some "seriously fast multiplications":

http://codebase64.org/doku.php?id=base: ... iplication

The routines will need modifications (and will be slower) if you don't have WRAM because they use self modifying code.
tokumaru wrote:
Every time I had to do this I used the old "shift and add + throw away lower byte(s)" method. The only way I know to actually speed up multiplications is to use the method explained here. The idea is to build a table using this formula: f(x) = x^2 / 4 and then do f(a+b) - f(a-b) to calculate the result, so instead of the multiplication you do 1 addition and 1 subtraction, 2 table lookups, and finally one more subtraction.

This looks promising, just 2KB seems like a lot to give up. We are currently well on our way to filling up a NROM-256.

Starting to think I'll see how far we can get before needing multiplication and then maybe go down that road you posted. I knew math on 6502 would be tricky, but I have a feeling I'm in over my head! I can code just fine with algebraic math, I just hate coding low level binary math...
sonder wrote:
We are currently well on our way to filling up a NROM-256.

Yeah, when you only have 32KB of PRG-ROM, you shouldn't use a lot of look-up tables.

Quote:
I knew math on 6502 would be tricky, but I have a feeling I'm in over my head! I can code just fine with algebraic math, I just hate coding low level binary math...

Most games made for old consoles don't have such complex physics, most of it can be done with additions and subtractions...
tokumaru wrote:
Yeah, when you only have 32KB of PRG-ROM, you shouldn't use a lot of look-up tables.

We are going to upgrade to UnROM. Maybe. I flip flop a lot internally on what I want the scope of this project to be. We are in that awkward stage between deciding what to do and experimenting with actual code and discovering the actual gameplay through that.

Quote:
Most games made for old consoles don't have such complex physics, most of it can be done with additions and subtractions...

This is what I'm thinking for now. We might be able to get by.

Still it would be neat to come up with a custom multiply for 12.4 x 3.4 ... i have a penchant for optimization For finding distances, angles, and the like, you can use 8x8 multiplies and divides only on the integer part of the pixel coordinates. That's what Thwaite does when aiming missiles. In PAL mode, it also occasionally uses inline code to multiply by 19/16 or 13/16 (approximations of 6/5 and 5/6) to change a speed or period value.