Keldon's 119 cycle multiply with 16-bit output

1. 512 bytes ROM.
2. 4 zero page registers for input and return.

The way it works is really simple: it is basically long multiplication. At 118-119 cycles it is slower than Damien's 90 cycle fmul, but on average faster than eurorusty's method.
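As a rough illustration of the long-multiplication idea (a Python model, not the 6502 routine itself), an 8-bit by 8-bit product can be assembled from four 4-bit by 4-bit partial products. The name `mul8_nibbles` is made up for this sketch, and the real routine may order or combine the partial products differently.

```python
def mul8_nibbles(a: int, b: int) -> int:
    """Multiply two 8-bit values via four nibble products (illustrative model)."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    a_hi, a_lo = a >> 4, a & 0x0F
    b_hi, b_lo = b >> 4, b & 0x0F
    # Four 4x4-bit partial products; each fits in one byte (max 15 * 15 = 225).
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi
    # Shift each partial product to its place value and add them up,
    # exactly as in pen-and-paper long multiplication in base 16.
    return ll + ((lh + hl) << 4) + (hh << 8)
```

The result always fits in 16 bits, since the largest case is 255 * 255 = 65025.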

The lookup tables are pretty reusable for low-level bit manipulation. One table is a nibble flip and the other is a nibble multiplication table, which can also double as a 16x16 transpose table.
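Here is a sketch of what two such 256-byte tables might look like, again as a Python model. It assumes the flip table swaps the high and low nibbles of a byte and that the multiplication table is indexed by a byte holding one 4-bit operand in each nibble. The names `NIBBLE_FLIP`, `NIBBLE_MUL` and `lookup_product`, and the exact index layout, are assumptions for illustration and are not taken from the routine.

```python
# Assumed index layout: index = (x << 4) | y, with x and y each a 4-bit value.

# Nibble flip: swap the high and low nibbles of the index byte.
NIBBLE_FLIP = bytes(((i << 4) | (i >> 4)) & 0xFF for i in range(256))

# Nibble multiply: product of the two nibbles packed in the index byte.
# The largest entry is 15 * 15 = 225, so every entry fits in one byte.
NIBBLE_MUL = bytes((i >> 4) * (i & 0x0F) for i in range(256))

def lookup_product(x: int, y: int) -> int:
    """Multiply two 4-bit values with a single table lookup (hypothetical helper)."""
    assert 0 <= x <= 0x0F and 0 <= y <= 0x0F
    return NIBBLE_MUL[(x << 4) | y]
```

Under this layout, flipping the index leaves the product unchanged (`NIBBLE_MUL[NIBBLE_FLIP[i]] == NIBBLE_MUL[i]`), because multiplication is commutative.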

It's written for the online 6502 compiler so you can try it in your browser too.


I also found this post with a 38 cycle multiplication that takes 8-bit inputs and gives a 16-bit result.

The signed and fixed routines each require:
1. 38 bytes in RAM or 8 bytes in zero page (38 cycles either way).
2. 2 KB of ROM.