This page is a mirror of Tepples' nesdev forum mirror (URL TBD).

# Game math library

I began a library of 6502 game math subroutines when building the game Thwaite. And with the announcement that uc65 would likely ship without multiply and divide operators, I'm considering offering it as a self-contained library. Here's what I have so far:
• Multiply A by Y, 8 bits, about 150 cycles
• Divide floor(256 * A / Y), for converting rise and run to an 0.8 fixed-point slope
• Arctangent, producing the angle of the vector from (a, b) to (c, d) in units of 1/32 turn, within 380 cycles
• Square root of a 16-bit number within 520 cycles
• Unrolled binary to decimal conversion, 8 bits to 3 digits, within 80 cycles
• Looping binary to decimal conversion, 16 bits to 5 digits, within 652 cycles
• Percentage calculator, producing the decimal expansion of a/b with 16-bit numerator and denominator to up to five digits at 230 cycles per digit. Useful for accuracy counters like the one on Galaga's debrief screen.

I built it alongside a basic 6502 simulator written in Python so that I could exhaustively verify the subroutines' correctness. This way I make sure that what the subroutine thinks is 23 * 45 matches what Python thinks is 23 * 45. I tested the simulator itself using nestest, including the official instructions and the useful unofficial instructions, to make sure it matches the register values on each line of the Nintendulator golden log.

There are two ways to calculate the length of a vector (a, b). Normally, you'd do sqrt(a * a + b * b). But Thwaite uses a shortcut to skip the square root. If you already have the angle theta, the length is two table lookups and two multiplies: a * cos(theta) + b * sin(theta).

Who is interested in having a look at this library? Are there any additions you think others might use?
Is the multiplication the difference-of-squares method, or binary long multiplication?
In light of the types supported by uc65, it might be nice to provide the full set of up-to-32-bit-result multiplications (8x8, 8x16, 8x24, 16x16).
I tried difference of squares years ago, and it took about 90 cycles and a lot more bytes than the 8x8 long mul that Thwaite uses. I was trying to keep routines short enough to keep in a fixed bank as opposed to swapping all the time, seeing as how binary code size is a common complaint against cc65. But I could consider adding difference of squares for completeness, along with mul subroutines for larger integers that call this one in a loop.
tepples wrote:
I tried difference of squares years ago, and it took about 90 cycles and a lot more bytes than the 8x8 long mul that Thwaite uses.

It does take a lot of bytes both for tables and on zp but it's a lot faster than 90 cycles.
in the version attributed to George Taylor (who said he got it from someone else)
in (A+B)^2-(A-B)^2 the A+B and A-B are done using the indexing mechanism
it's basically just a 16 bit indirect indexed addition with some set up of pointers
(and negation of B) that only needs to be done once if B doesn't change
(without checking) my recollection is it's about 28 cycles
I believe there're slightly faster versions that build more into the tables
tepples wrote:
[*]Arctangent, producing the angle of the vector from (a, b) to (c, d) in units of 1/32 turn, within 380 cycles

This should be much faster, although it uses relatively large tables.