This page is a mirror of Tepples' nesdev forum mirror (URL TBD).

# Multiplication Routine Lengths

I was just curious, how long is an average routine that multiplies one 8bit number by another? I just made one that simulates the lattus method, and it is around 400 cycles (I really can't tell if this is really bad or really good). But by looking at how long it would take to multiply \$FF by \$FF by just adding, I think it serves a bit of a good purpose. So what do you guys think?
Are you looking for byte lengths or cycle lengths? I know of a lookup-table-based method that uses a lot of ROM but multiplies two arbitrary 1-byte integers in about 100 cycles.

Start with the formula relating the square of the sum to the sum of the squares:
(a + b)^2 = a^2 + 2*a*b + b^2

Subtract (a + b)^2 from both sides:
(a + b)^2 - a^2 - b^2 = 2*a*b

Divide both sides by 2:
(a + b)^2/2 - a^2/2 - b^2/2 = a*b

Now make a lookup table for x^2/2, and you can multiply 8-bit numbers using one addition, three table lookups, and two subtractions.
I drafted a loop-based one earlier that should run in about 280 clocks worst-case. I haven't tested it though. It just goes through each bit of one of the multipliers and shifts the other left each time, adding it to the product when the bit is 1.
Hmm, that seems like a kind of complicated routine, tepples. Have any of you used the lattice (Sorry, I spelled it wrong in the post above) method when doing pen and paper multiplication? If not, you can read about it here.

So I see that my routine is a bit longer (Like 3 times as long ) than the routines mentioned above. How about division? I have a really simple idea for division that might even be easier than the multiplication one. I could make my multiplication routine just a tad bit longer to make 16 bit by 16 bit multiplication if I wanted to. I'm just trying to make a math routine library of my own, so I won't need to make complicated routines on the fly when coding a game.
Here's the basic division algorithm.

A = dividend
B = divisor
Q = quotient (initialize to zero)
R = remainder (initialize to zero)

1. Perform ASL on A.
2. Perform ROL on R.
3. If R >= B, then set R to (R - B) and set the carry flag; otherwise, clear the carry flag.
4. Perform ROL on Q.

Repeat until all bits of A are shifted out (e.g. if A is 16 bits, run the algorithm 16 times).

Hopefully I got it right.
I fail to see the lattice method as being any different than usual...
Code:
469
x  37
---
63
420
2800
270
1800
12000
-----
17353

In binary, it'd be something like this (using smaller numbers):
Code:
1101
x   101
---
1101  *1
00000  *0
110100  *1
-------
1000001

Which is what I described above; add one of the shifted multipliers when the corresponding bits are 1.
Well, it may not be different, but I don't recall saying that it was. Well, whatever.

I'd have to think about you're algorithm, dvdmth. I mean, to really understand why it would work. While I'm doing that, I'll be thinking about how to go about doing my routine.
Celius wrote:
I'd have to think about you're algorithm, dvdmth.

There seems to be a good explanation here. They also have something on multiplication.
tepples wrote:
Now make a lookup table for x^2/2, and you can multiply 8-bit numbers using one addition, three table lookups, and two subtractions.

You can get by with one less table lookup by using the identity (a+b)^2 - (a-b)^2 = 4ab (thus, lookup table for x^2/4).

Also, either both (a+b)^2 and (a-b)^2 will be 0 or 1 (mod 4), so any error from integer division will be cancelled out when subtracting the two. Compare this with using (a+b)^2/2 - a^2/2 - b^2/2, where an extra 1 must be subtracted when both a and b are odd to get the correct answer.
Oh! I think I get both the multiplication and division routines now. That's something I actually would never have thought of, for some reason. I was more looking at whole hex values. It may not seem like there's a difference at first, but there really is, with dividing. That multiplication is totally simple, and a lot more than the lattice method. So I think I will throw my routines away, and just go with the binary idea, it's better. The whole time, I thought that multiplication and division were huge processes, and no body really knew good ways to multiply. I felt it was a lot like the binary to decimal issue. I guess I need to wake up and smell the coffee, huh? Thanks again! Multiplication with the MMC5 :
lda #Multipliand
sta \$5206
lda #Multiplier
sta \$5205
lda \$5205 ;Low result
sta xxx
lda \$5206 ;High result
sta xxx

Can you get faster ??
Does this include the time it takes to find an MMC5 board?
Isn't Castlevania 3: Dracula's Curse an easy one to come by? I could be wrong, but I'd say it's not too uncommon.
tepples wrote:
Does this include the time it takes to find an MMC5 board?

Luckly that's a one time operation and could be considered part of the loading screen tepples wrote:
Does this include the time it takes to find an MMC5 board?

So true, lol !

CV3 may be the less rare of them, but it's still rare.
Though it's not as common as eg. Castlevania 2, I wouldn't consider it rare at all, there are plenty of NES CV3 carts on eBay every day.