This page is a mirror of Tepples' nesdev forum mirror (URL TBD).

# [lazy] Can't get this floating point code

This code: http://6502.org/source/floats/wozfp1.txt
I wandered around aimlessly as usual, and stumbled upon this. Happy at the possibility of finally improving my math-fu, I started reading. Then confusion, then laziness, then this thread.

You can either discuss this somehow, or answer questions below made exclusively for you.

I want to look at this code in FCEUX's debugger. Does anyone have this or similar code in some demo? Running code in my head isn't a pleasant experience. Or maybe it is...

How is it possible to compute a logarithm on a processor that doesn't even support multiplication? I lack the proper math background to understand it.

Aaaand, the most stupid question of them all......! What the hell IS a logarithm, and in what game can it be useful? Yes, the 6502 core hasn't been used only in the NES, I know, but blargh.
michikaze wrote:
How is it possible to compute a logarithm on a processor that doesn't even support multiplication? I lack the proper math background to understand it.

Probably CORDIC, a way of computing transcendental functions (sine, cosine, exponential, logarithm, etc.) using only adds and shifts. It is derived from a bunch of trigonometric identities.

Sine and cosine and arctangent are useful for things like determining which way a player's unit is pointed.
michikaze wrote:
I want to look at this code in fceux's debugger.

That would require you to build a valid NES ROM around the code. May I suggest this 6502 Simulator instead? It has some nice debugging features and you can easily get code running with just an ORG statement indicating where the code should be located in memory.
tepples wrote:
CORDIC

Nice, thank you!
From the Wikipedia page I only got how to calculate sine and cosine. But it also mentioned that it can be used for exponential functions and logarithms. Huh...

tokumaru wrote:
6502 Simulator

Nice, thank you! Going to try soon.
"CORDIC" in the Digital Circuits textbook on Wikibooks explains how it's generalized to exp and log.
Looks more like a couple of iterations of some other numerical method to me, though I must confess to being too lazy (and/or stupid) to decipher it. In any event Feynman's algorithm is probably faster and certainly easier to understand than CORDIC, though it boils down to a suspiciously similar shift-and-add sequence.

Let's say you want to compute the base-2 logarithm of 1≤x<2. This is easily generalized to any binary floating-point number by computing the logarithm of the normalized mantissa and adding the exponent (plus one) to the result. The trick, then, is to factorize x into a series of 1+2^-i factors. That is, we start with y=1 and from i=0 on down successively multiply by 1+2^-i, keeping the new factor if the result does not exceed x. Note that these multiplications may be implemented through shift-and-add, e.g. y' = y + (y >> i).

The logarithm y is then simply the sum of the logarithms of all individual 1+2^-i factors which may be pre-calculated and stored in a table.

Here's a quick-and-dirty C implementation in 1.15 fixed-point:
Code:
unsigned int feynlog(unsigned int x) {
    /* tab[i - 1] = round(log2(1 + 2^-i) * 2^15) */
    static const unsigned int tab[] = {
        /* i=1  */ 19168, /* i=2  */ 10549,
        /* i=3  */ 5568,  /* i=4  */ 2866,
        /* i=5  */ 1455,  /* i=6  */ 733,
        /* i=7  */ 368,   /* i=8  */ 184,
        /* i=9  */ 92,    /* i=10 */ 46,
        /* i=11 */ 23,    /* i=12 */ 12,
        /* i=13 */ 6,     /* i=14 */ 3,
        /* i=15 */ 1,     /* i=16 */ 1
    };
    unsigned int y = 32768; /* running product, starts at 1.0 */
    unsigned int z = 0;     /* accumulated logarithm */
    for(unsigned int i = 1; i < 16; ++i) {
        unsigned int t = y + (y >> i); /* y * (1 + 2^-i) */
        if(t <= x) {
            y = t;
            z += tab[i - 1];
        }
    }
    return z;
}

As for uses, the most frequent on the 6502 is probably to speed up multiplication and division, much as a human would use a slide rule, though typically you'd store pre-calculated logarithm and exponent tables rather than computing them at start-up.
doynax wrote:
Feynman's algorithm

......Please let me stare at this for a few more days, I'll get this for sure......
Don't get what those numbers in the array are.

Quote:
The Feynman Algorithm:

Write down the problem.
Think real hard.
Write down the solution.

The Feynman algorithm was facetiously suggested by Murray Gell-Mann, a colleague of Feynman, in a New York Times interview.

Now retyping the code to get it running in the emulator, please watch warmly.
michikaze wrote:
......Please let me stare at this for a few more days, I'll get this for sure......
Don't get what those numbers in the array are.
Perhaps an example might serve to illustrate the scheme.

The factorization here is analogous to the positional number system. Imagine that you wanted to encode a number 1≤x<2 by hand as a binary fraction. You might start with 1, then successively try adding further, smaller terms. If the sum does not exceed x then keep the term (a one bit), otherwise drop it and move on (a zero bit).

Let's pick x=1.4142:
Code:
1+2^-1                          = %1.1      = 1.5 > x, so omit the 2^-1 term
1     +2^-2                     = %1.01     = 1.25 < x, so include the 2^-2 term
1     +2^-2+2^-3                = %1.011    = 1.375 < x, so include the 2^-3 term
1     +2^-2+2^-3+2^-4           = %1.0111   = 1.4375 > x, so omit the 2^-4 term
1     +2^-2+2^-3     +2^-5      = %1.01101  = 1.40625 < x, so include the 2^-5 term
1     +2^-2+2^-3     +2^-5+2^-6 = %1.011011 = 1.421875 > x, so omit the 2^-6 term
This is a variant of binary search. At each step the approximation grows more precise, rapidly converging on the true value one bit at a time.

Instead of using a sum of 2^-i terms Feynman's algorithm uses a product of 1+2^-i factors to encode the number:
Code:
1·(1+2^-1)                                               = 1.5 > x, so omit the 1+2^-1 factor
1         ·(1+2^-2)                                      = 1.25 < x, so include the 1+2^-2 factor
1         ·(1+2^-2)·(1+2^-3)                             = 1.40625 < x, so include the 1+2^-3 factor
1         ·(1+2^-2)·(1+2^-3)·(1+2^-4)                    = 1.494140625 > x, so omit the 1+2^-4 factor
1         ·(1+2^-2)·(1+2^-3)         ·(1+2^-5)           = 1.4501953125 > x, so omit the 1+2^-5 factor
1         ·(1+2^-2)·(1+2^-3)                  ·(1+2^-6)  = 1.42822265625 > x, so omit the 1+2^-6 factor
In this case it is less apparent that any x in the range is reachable. The base 2 used here is in fact sub-optimal but is close enough to the ideal one (~2.2656) while being easily evaluated through shifts and adds. In general smaller bases are finer grained, meaning that the results converge more slowly, whereas larger bases have unreachable gaps in the output.

The point of all this is to make use of the logarithmic identity which lowers multiplication into addition, i.e. log(p·q) = log p + log q. Therefore if we can factorize x into a series of 1+2^-i factors then log x is simply the sum of the logarithms of the included factors.
In our example y = 1·(1+2^-2)·(1+2^-3) ≈ x, therefore log x ≈ log y = log(1·(1+2^-2)·(1+2^-3)) = log 1 + log(1+2^-2) + log(1+2^-3).

This is in fact what the table in the code above is all about: a table of pre-calculated logarithms of 1+2^-i. Binary logarithms in fixed-point to be precise, e.g. log2(1+2^-1) · 2^15 ≈ 19168.

michikaze wrote:
Quote:
The Feynman Algorithm:
Write down the problem.
Think real hard.
Write down the solution.
Good advice. He was active in a different field, so I suppose I should have called it Feynman's logarithm algorithm to help your Google efforts, but "logarithm algorithm" sounds somehow redundant to me, even though I know the etymologies are quite distinct.

I apologize for these half-on-topic ramblings, but I'm a sucker for elegant algorithms and this one certainly qualifies, despite its original application.
First - you guys are awesome.

Second - I'm playing with the "float" function now, and trying to comment it. Look if you have nothing better to do.

The emulator's editor and this forum's formatter have different tab lengths, so it's not pretty...

The first part is commented out because, with it, execution starts from address $0.

Code:
;   .org $0
;   .db 0,0,0   ;not used?
;sign:   .db $ea
;x2:   .db $ea      ;exponent 2
;m2:   .db 0,0,0   ;mantissa 2
;x1:   .db $ea
;m1:   .db 0,0,0
;e:   .db 0,0,0,0   ;scratch
;z:   .db 0,0,0,0
;t:   .db 0,0,0,0
;sexp:   .db 0,0,0,0
;int:   .db 0

sign = $3
x2 = $4
m2 = $5
x1 = $8
m1 = $9
e = $c
z = $10
t = $14
sexp = $18
int = $1c

.org $600

io_putc   = $e001
io_posx   = $e005
io_posy   = $e006

;LDA #9
;STA io_posx
;LDA #2
;STA io_posy

; print 'aaa'
;LDA #'a'
;STA io_putc
;STA io_putc
;STA io_putc

; SET 16 BIT INTEGER
LDA #$0
STA m1      ;HIGH
LDA #$03
STA m1+1   ;low

; CONVERT 16 BIT INTEGER TO 32 BIT FLOAT
JSR float

BRK

;;;;;;;;;;;;;;;;;;;;FLOAT;;;;;;;;;;;;;;;;;;
;CONVERT 16 BIT INTEGER IN M1(HIGH) AND M1+1(LOW) TO F.P.
;RESULT IN EXP/MANT1.  EXP/MANT2 UNAFFECTED

;So... how it works:
;it sets the exponent to $e (maximal possible for 16-bit input)
;and [dec exponent,
;     shift left 3-byte mantissa bit by bit]
;until done.

;examples of output:
; 00 01 -> 80 40 00 00
; 00 02 -> 81 40 00 00
; 00 03 -> 81 60 00 00
; 00 41 -> 86 41 00 00
; 41 00 -> 8e 41 00 00
; 00 81 -> 87 40 80 00
; 40 00 -> 8e 40 00 00

; NEGATIVE VALUES!
; 80 00 -> 8e 80 00 00 ;finally mant. starts from 8
; ff ff -> 7f 80 00 00 ;negative?
; 8f ff -> 8e 8f ff 00 ;didn't quite get this
;            ;"complement" mantissa
; 00 00 -> 00 00 00 00 ;and it loops a lot
;(until exp decs to 0)

float:   LDA #$8e   ;init $08 and $0b ($09 and $0a should've been set by caller)
STA x1      ;set exponent to 14dec ($0e in hex)
LDA #0      ;clear rightmost byte
STA m1+2   ;/init
BEQ norm   ;normalize result
; Jump is always taken: zero flag is set by LDA #0.
; They used BEQ instead of JMP because it supports relative
; addressing mode, and so takes one byte less.
norm1:   DEC x1      ;decrease exponent by one
; SHIFT!
ASL m1+2   ;     carry <- [$0b] <- 0
;WAIT, isn't [$0b] always zero?
;*proud* *proud*
ROL m1+1   ;     carry <- [$0a] <- carry
ROL m1      ; who cares <- [$09] <- carry
norm:   LDA m1      ; huh...
ASL      ; ...continue until...
EOR m1      ; ...left two bits of mantissa are...
BMI rts1   ; ...either 10 or 01.
; 10 is for negative floats
LDA x1
BNE norm1   ; or until we got all zeroes
rts1:   RTS      ;return! Finally!
And a new version, this time with add. Wait a few years, and I will get to the interesting stuff.

God, retyping stuff is so hard.
Now with fsub, fmul, fdiv and fix.

I think these kind of things are better to read this way, from short to complex. So, I will post them slowly growing up, step by step.
Does anyone know how to make these work in the 6502 simulator? It doesn't support the "DCM" instruction, it seems.

Code:
1DCD  7E 6F     LN10   DCM 0.4342945
1DCF  2D ED
1DD1  80 5A     R22    DCM 1.4142136   SQRT(2)
1DD3  82 7A
1DD5  7F 58     LE2    DCM 0.69314718  LOG BASE E OF 2
1DD7  B9 0C
1DD9  80 52     A1     DCM 1.2920074
1DDB  B0 40
1DDD  81 AB     MB     DCM -2.6398577
1DDF  86 49
1DE1  80 6A     C      DCM 1.6567626
1DE3  08 66
1DE5  7F 40     MHLF   DCM 0.5
1DE7  00 00
To use undocumented instructions you usually have to use .db and .dw statements. Look up the opcode for the instruction (considering the addressing mode) in a document like this, and use .db followed by that value. Then comes the address or immediate value, which can be 8 or 16 bits long, so use .db or .dw accordingly.
tokumaru wrote:
To use undocumented instructions you usually have to use .db and .dw statements. Look up the opcode for the instruction (considering the addressing mode) in a document like this, and use .db followed by that value. Then comes the address or immediate value, which can be 8 or 16 bits long, so use .db or .dw accordingly.

Those aren't undocumented instructions, they are floating-point values. I'm not aware of any assembler that supports them. Maybe some macro hackery based on a string could be possible in some assemblers.
I just realised those numbers are in a format that is probably used only by this code. Still curious whether whatever assembler those guys at Apple used actually supported this keyword.

Ok, put it in as a .db chain.
Not sure if any assemblers support this format, but Applesoft BASIC uses a 5-byte floating point format.
And now with log, log10, exp. Surprisingly, they are much simpler than fmul and fdiv, just a bunch of jsr calls and loops. Also, at the beginning of log, the program isolates the exponent and converts it to float, and at the end of exp it does the opposite.

I'm not sure it even works correctly. I ran a large enough number through log and exp back and forth; it's not supposed to change, is it? e^ln(x) = x, right? Probably a typo somewhere.

Still don't know what actually goes on in there, a bunch of random fadd and fmul and fdiv; it doesn't make sense to me.

Edit: I feel like this is the time when I should start reading Knuth.

Edit2:
So, according to this code, if
x1 = 1.mnt * 2^exp
then
T=(mnt-SQRT(2))/(mnt+SQRT(2))
ln(x1)=( (-2.6398577/(T*T-1.6567626)+1.2920074)*T +0.5 +exp )*LN(2)
Does it make any sense to you? For me it surely doesn't.
And why did they prefer to use FSUB with C, instead of placing the negative of C in the code and using FADD? They only use C once anyway.

Edit3:
http://maxwellsci.com/print/rjaset/v6-119-122.pdf
>The logarithm is widely implemented with: Table lookup method, Taylor series expansion method and linear approximation method.
This surely isn't CORDIC, since it doesn't restrict itself to adds and shifts. And it isn't the table lookup method or the Taylor series expansion method, since it doesn't use tables.

Edit4:
http://math.stackexchange.com/questions ... -algorithm
I love them.