This page is a mirror of Tepples' nesdev forum mirror (URL TBD).

# CMP setting N flag when it shouldn't?

Hi everyone,

This is probably a newbie question, but I don't fully understand the behavior of CMP in 6502 assembly. When comparing two numbers (say accumulator with something in memory), I thought it set the N flag when the accumulator is smaller than memory. However, in the following example, it sets N even if A > #\$22

Code:
LDA #\$AA
CMP #\$22

Can anyone explain what is happening here?

Thanks!

(Edited to correct a typo)
The N flag is the negative flag. It is set if bit 7 of result is set, and cleared if bit 7 of result is clear (so always a copy of bit 7 of result). Bit 7 is used as the sign bit in signed numbers, but the CPU doesn't really care if you treat a number as signed or unsigned, it will simply always set the flag the same as bit 7 of result.

CMP is actually a subtraction that doesn't affect the accumulator; it affects the N, Z, and C flags the same way a SEC, SBC sequence does.

To check if A is smaller than the memory after a CMP, you check the C flag, not the N flag:
C=0: A < M
C=1: A >= M
This only works for unsigned numbers though.

You use the Z flag to check for equality (works with both unsigned and signed numbers):
Z=0: A != M
Z=1: A == M
So with a combination of C and Z flags you can do all types of comparisons of unsigned numbers with CMP (and CPX and CPY).
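As a sanity check, the flag rules above can be simulated outside the 6502. This is just a Python sketch (the helper name cmp_flags is made up), mirroring what CMP computes:

```python
def cmp_flags(a, m):
    """Simulate N, Z, C after a 6502 CMP: compute A - M as 8-bit."""
    result = (a - m) & 0xFF
    n = (result >> 7) & 1        # N is always a copy of bit 7 of the result
    z = 1 if result == 0 else 0  # Z answers equality
    c = 1 if a >= m else 0       # C answers the unsigned A >= M question
    return n, z, c

# The example from the question: A = $AA, M = $22.
print(cmp_flags(0xAA, 0x22))  # -> (1, 0, 1): N set even though A > M unsigned
```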
Thanks for the explanation, that makes sense!
Ringeru wrote:
However, in the following example, it sets N even if A > #\$22

Code:
LDA #\$AA
CMP #\$22

In this case, the accumulator is so large that it becomes negative. Since you're using the N flag, that means you're dealing with signed numbers. 8-bit signed numbers are in the range -128 to 127, so the \$AA you have there actually represents the number -86; 170 can't be represented as a signed 8-bit number. Since -86 IS less than \$22 (34), the result is correct.

Keep in mind that there's no difference at all between signed and unsigned numbers as far as the 6502 is concerned. The bit representation is the same for both, and all results are correct for both, what changes is how the programmer interprets all the bits and status. Some flags are meant for signed operations, others for unsigned operations, and it's your job to respect the valid numerical ranges that are supported for each type and to interpret the results according to these definitions as well.

If you really need to compare signed values as large as 170, you have to bump your numbers to 16-bit, and bump the comparison to a 16-bit subtraction:

Code:
lda #\$AA ;low byte of \$00AA
cmp #\$22 ;low byte of \$0022
lda #\$00 ;high byte of \$00AA
sbc #\$00 ;high byte of \$0022

Note that while CMP can be used to compare the lower 8 bits, SBC is needed for the upper 8 bits because it takes the carry from the previous operation into account, while CMP doesn't. With this you can safely compare numbers in the range -32768 to 32767.

If you don't need to work with signed numbers at all, don't use the N flag, use the carry flag instead, and you can compare numbers between 0 and 255 using just the one CMP instruction (no need to bump the math to 16 bits).
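The carry chaining between the low-byte CMP and the high-byte SBC can be sketched in Python (cmp16_carry is an invented name; this is a model of the flag flow, not how you'd do it on the 6502 itself):

```python
def cmp16_carry(a, b):
    """Model CMP on the low byte then SBC on the high byte of two
    16-bit unsigned values; return the final carry."""
    c = 1 if (a & 0xFF) >= (b & 0xFF) else 0  # CMP low: C = no borrow
    hi = (a >> 8) - (b >> 8) - (1 - c)        # SBC high: subtracts the borrow
    return 1 if hi >= 0 else 0                # C=1 means a >= b, C=0 means a < b
```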
According to this, the N flag is "NOT the signed comparison result".

tokumaru wrote:
Note that while CMP can be used to compare the lower 8 bits, SBC is needed for the upper 8 bits because it takes the carry from the previous operation into comparison, while CMP doesn't.
Ah yes, I said "CMP is like SBC that does not affect A". It should really be "CMP is like a SEC, SBC sequence that does not affect A". I corrected my post above.

BTW Ringeru, one thing to remember is that the name of the flags reflects their most common usage, but not their only usage. The carry flag for example is used for arithmetic carry or borrow but also used for other totally unrelated things, and the N, Z and V flags can also be used for various not so obvious things.
Pokun wrote:
According to this, the N flag is "NOT the signed comparison result".

I never realized this (haven't done much signed math), but that's correct. In the example they gave, the math is 127 - (-128), which is effectively 127 + 128, which causes an overflow because a signed 8-bit number can only go up to 127. So yeah, I guess that comparing against a negative number can easily cause an overflow, rendering the N flag useless. Hopefully the V flag will signal if this is the case (yeah, looking at the section about signed comparisons it says that you need both the N and V flags to find the result of a signed comparison).

Quote:
one thing to remember is that the name of the flags reflects their most common usage, but not their only usage.

This is true for many other things in assembly as well, not just the flags. There are instructions like JSR and RTS which are normally used for calling and returning from subroutines, as their names imply, but when combined with a bit of stack manipulation they can be used for other purposes. There's even the simple case of BEQ/BNE, which mean "branch if equal/not equal", but you don't have to use them only after comparisons, seeing as comparisons are only one of the things that affect the Z flag, which is what ultimately controls the behavior of those instructions. Instructions, flags, addressing modes, etc. in assembly can often be used in more ways than their names imply.
There's a similar edge case in how arithmetic shift right on a signed number is not equivalent to dividing that signed number by a power of 2. (-1 / 2) == 0, but (-1 >> 1) == -1.

EDIT: See below.
Good to know! Are there any other cases where shifting doesn't equal multiplication/division by a power of 2?

tokumaru wrote:
I guess that comparing against a negative number can easily cause an overflow, rendering the N flag useless. Hopefully the V flag will signal if this is the case (yeah, looking at the section about signed comparisons it says that you need both the N and V flags to find the result of a signed comparison).
Yes, the signed comparison result is in N XOR V after the subtraction. That's another difference between CMP and SBC I failed to mention: CMP does not affect V, unlike a SEC, SBC sequence (also, CMP subtractions are not affected by the D flag, but that's irrelevant on the NES). I've corrected my post again. For that reason CMP cannot be used for signed comparison; a SEC, SBC sequence is used instead so that V is affected.

In order to do a signed comparison you can use a SEC, SBC-sequence and then use a trick to get the signed comparison result (N XOR V) into N:
Code:
;8-bit signed comparison
SEC
SBC NUM    ;subtract NUM from A to compare them
BVC label1 ;if V = 0 then V XOR N = N
EOR #\$80   ;flip N so that N = V XOR N
label1:
BMI label2 ;if N = 1, A < NUM, goto label2
BPL label3 ;if N = 0, A >= NUM, goto label3
label2:
label3:
Details are explained in the above-linked tutorial. Basically, if V is clear after the subtraction, then N already equals V XOR N. If V is set, EOR with \$80 (N is bit 7) to get N = 1 XOR N = V XOR N. Now that the result is in N, BMI or BPL can be used to branch.
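Here's a Python model of that 8-bit trick (function name invented; the V formula is the standard two's-complement subtraction-overflow rule), checked against ordinary signed comparison:

```python
def signed_less_than(a, num):
    """Model SEC/SBC then the N-XOR-V fix-up: is A < NUM as signed 8-bit?"""
    diff = (a - num) & 0xFF
    n = (diff >> 7) & 1
    # V is set when A and NUM have different signs and the result's
    # sign differs from A's sign.
    v = ((a ^ num) & (a ^ diff) & 0x80) >> 7
    return (n ^ v) == 1  # the branch BMI takes after the EOR #$80 fix-up
```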

Code:
;16-bit signed comparison
LDA NUM1_L
CMP NUM2_L  ;compare low byte using CMP
LDA NUM1_H
SBC NUM2_H  ;compare high byte using SBC to include C and V
BVC label   ;if V = 0 then V XOR N = N
EOR #\$80    ;flip N so that N = V XOR N
label:
BMI label2 ;if N = 1, NUM1 < NUM2, goto label2
BPL label3 ;if N = 0, NUM1 >= NUM2, goto label3
label2:
label3:
Comparisons wider than 8 bits are done the same way; each byte must be compared and the C flag carried along. Only the low byte can use CMP; the rest all have to use SBC (without a SEC) so that the carry is included and the overflow flag is affected.
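And the 16-bit version modeled the same way (again a Python sketch with an invented name, mirroring CMP on the low byte and SBC with borrow on the high byte):

```python
def signed16_less_than(a, b):
    """Is a < b as signed 16-bit, via CMP low / SBC high and N XOR V?"""
    borrow = 0 if (a & 0xFF) >= (b & 0xFF) else 1   # CMP low: C = not borrow
    ah, bh = (a >> 8) & 0xFF, (b >> 8) & 0xFF
    diff = (ah - bh - borrow) & 0xFF                # SBC high includes the borrow
    n = (diff >> 7) & 1
    v = ((ah ^ bh) & (ah ^ diff) & 0x80) >> 7       # overflow of the high byte
    return (n ^ v) == 1
```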

Signed comparison is useful if you are making an action game with acceleration-based movement. That way you can use positive and negative acceleration and velocity to move objects with.
Nicole wrote:
There's a similar edge case in how arithmetic shift right on a signed number is not equivalent to dividing that signed number by a power of 2. (-1 / 2) == 0, but (-1 >> 1) == -1.

Actually it does, but similarly to when dividing positive numbers by 2, the result is always rounded down. -1/2 = -0.5, which rounded down makes -1, so the result is correct.

If you want the result rounded to the nearest integer instead, you can do an ADC #\$00 right after the shift, while the carry still holds the bit that was shifted out (this works for both signed and unsigned numbers). (edit: I actually use this in my NES music engine to handle octave shifts of frequencies - without this the pitch tends to sound wrong!)
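A quick Python model of that shift-plus-ADC #\$00 trick (asr_round is an invented name): the carry left by the shift is the dropped half, so adding it back rounds the result:

```python
def asr_round(value):
    """Arithmetic shift right by 1, then ADC #0: divide by 2, rounded."""
    carry = value & 1               # the shift leaves the dropped bit in carry
    result = (value & 0xFF) >> 1
    if value & 0x80:
        result |= 0x80              # arithmetic shift keeps the sign bit
    return (result + carry) & 0xFF  # ADC #0 adds the dropped half back
```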

When it comes to the logic of V and N for signed numbers, I've never fully understood it despite years of 6502 coding, but Tokumaru explained it greatly. Basically, if V=1 the result stops being meaningful for signed numbers, and when V=0 then N is the sign of the result. But what happens when adding an unsigned 8-bit number to a signed 8-bit number? That situation is not really rare in a game, for example when moving objects whose coordinates are always positive but whose speed can be negative, or when mapping a metasprite where the coordinates are always positive but the position relative to the hotpoint can be negative.
Nicole wrote:
There's a similar edge case in how arithmetic shift right on a signed number is not equivalent to dividing that signed number by a power of 2. (-1 / 2) == 0, but (-1 >> 1) == -1.

Another way to put this is that an arithmetic right shift on a negative number rounds down rather than toward zero. (C and C++ implement round toward zero with their division operator, while some languages like Python round down in this way. There are arguments for doing it that way, in particular how the modulo operator corresponds, but the differing standards make it a point of confusion.)

This can be fixed with an increment before the right shift.
(-1 + 1) >> 1 = 0

For a signed arithmetic shift right you can detect sign and correct the rounding with an ADC #0. A pseudo operation for signed divide by two might look like:
Code:
CMP #\$80 ; copy the sign into carry
ADC #0 ; +1 if negative
CMP #\$80 ; load the new carry
ROR ; right shift

If you need to do more than one shift, you need to reload the carry each time. The rounding bias, on the other hand, can be applied in one step (i.e. for >> 3 you can add +7 up front rather than incrementing before each shift). Code to divide a signed number by a larger power of two will probably want to branch on the sign bit and have different code for the negative and positive sides.
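The one-step bias can be sketched in Python (trunc_div_pow2 is an invented name; Python's >> already floors, which stands in for the repeated arithmetic shifts):

```python
def trunc_div_pow2(value, n):
    """Divide a signed value by 2**n, rounding toward zero: bias
    negatives by 2**n - 1 once, then use a floored shift."""
    if value < 0:
        value += (1 << n) - 1  # e.g. add 7 for a divide by 8 (>> 3)
    return value >> n
```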

Many signed operations on the 6502 are a bit more complex than their unsigned counterparts in similar ways.

Edit: bregalad got to it while I was writing this, heh. I guess this is not entirely redundant though.
Ah, yeah, you're right. Same goes for stuff like -3, -5, etc. so this isn't really an "edge case" at all.
rainwarrior wrote:
Nicole wrote:
For a signed arithmetic shift right you can detect sign and correct the rounding with an ADC #0. A pseudo operation for signed divide by two might look like:
Code:
CMP #\$80 ; copy the sign into carry
ADC #0 ; +1 if negative
CMP #\$80 ; load the new carry
ROR ; right shift

If you need to do more than one shift, you need to reload the carry each time.

Or else handle positive and negative values separately.
Code:
; Get value/4, rounded toward zero into accumulator
lda value
bpl positive
negative:
clc
adc #3       ; bias so the truncation rounds toward zero
bpl positive ; -3 through -1 became 0 through 2; plain shifts suffice
lsr
lsr
ora #\$C0     ; restore the sign bits cleared by lsr
bmi done     ; always taken: A is negative here
positive:
lsr
lsr
done:

To use floored division rather than truncating, drop the bias (the "clc / adc #3 / bpl positive" lines) from the negative case.
When it comes to the logic of V and N for signed numbers I've never fully understood it despite years of 6502 coding, but Tokumaru explained it greatly. Basically if V=1 the result stops being meaningful for signed numbers, and when V=0 then N is the sign of the result.
Yes, if there was an overflow, a negative sign no longer means that the minuend of the subtraction is smaller, but actually the reverse:
Code:
V XOR N
0 XOR 0 = 0: no overflow, positive difference, minuend is bigger or equal
0 XOR 1 = 1: no overflow, negative difference, minuend is smaller
1 XOR 0 = 1: overflow, positive difference, minuend is smaller
1 XOR 1 = 0: overflow, negative difference, minuend is bigger or equal
So I guess basically if there was no overflow, positive means the minuend can't be smaller than the subtrahend, and negative means it must be smaller. If there was an overflow, however, the reverse is true.
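That table can be verified by brute force in Python over every signed 8-bit pair (the V rule below is the usual two's-complement subtraction-overflow formula; this is a sketch, not cycle-accurate emulation):

```python
# Check: after a SEC/SBC, N XOR V equals "minuend < subtrahend" (signed),
# and all four rows of the table actually occur.
seen = set()
for a in range(256):
    for m in range(256):
        diff = (a - m) & 0xFF
        n = (diff >> 7) & 1
        v = ((a ^ m) & (a ^ diff) & 0x80) >> 7
        sa = a - 256 if a >= 128 else a
        sm = m - 256 if m >= 128 else m
        assert (n ^ v) == (1 if sa < sm else 0)
        seen.add((v, n))

print(sorted(seen))  # all four V/N combinations show up
```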

But what happens when adding an unsigned 8-bit with a signed 8-bit, a situation that is less really rare in a game for example when moving objects and the coordinates are always positive but the speed can be negative ? Or when mapping a metasprite where the coordinates are always positive but relative position to hotpoint can be negative.
If mixing unsigned and signed numbers is a problem, I guess you may convert both 8-bit numbers to signed 16-bit numbers (so that the unsigned number fits) first. Then you can do 16-bit signed comparisons. I'm not sure adding is a problem though. I'm adding an object's velocity value to its position value each frame, and if the velocity is negative it will simply work like a subtraction and the object will move backwards.
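The "adding a negative velocity just works" point is plain two's-complement wraparound; a tiny Python sketch (names invented):

```python
def move(position, velocity):
    """8-bit wraparound add: a 'negative' velocity byte acts as a subtraction."""
    return (position + velocity) & 0xFF

# $FE is -2 as a signed byte, so adding it moves the object 2 pixels back.
```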
My opinion is that bit shifting only makes sense for unsigned numbers.

When you work with an 8 bit CPU... bit shifting is more a means to calculate PPU addresses, or like the example above, music code, where the data you're working with only makes sense as a positive unsigned value.

And shifting 1 right (to get 1/2) should give 0. You're past the granularity of 1 pixel; just drop it to zero.
dougeff wrote:
My opinion is that bit shifting only makes sense for unsigned numbers.

Bit shifting signed numbers left is equivalent to multiplication in cases where the arithmetical value of the product would fit in the result type. Bit shifting of signed numbers right is equivalent to floored division. In what way do those not make sense?
dougeff wrote:
My opinion is that bit shifting only makes sense for unsigned numbers.

Even with the floor behaviour on signed numbers, that's still useful for many purposes just by itself. What makes sense or not is how you apply it.

Whether you need round toward zero depends on the situation. It might be what you expect for division on signed numbers, as it is the default in several programming languages, and is how we might truncate a written decimal number.

If you want the negative and positive to be symmetrical, you want round toward zero. On 6502 that means signed division is more expensive, requiring a few more operations, but that's true of so many of its signed operations.

If you want a continuous modulo across positive and negative space, you want round down instead. In some cases, this may also help balance rounding errors and increase numerical stability, e.g. if generating a sine wave this way, negative rounding errors may cancel positive rounding errors in a way they wouldn't if you had rounded toward zero.

In other cases the rounding error is small enough to be unimportant, and you might just go with whatever's fastest. In this case, you can probably accept the round down behaviour for efficiency's sake. (On a ones' complement platform, round to zero might be faster.)

There's a time and place for both of these behaviours. Both of them make sense, but I think the round down behaviour is a little bit unexpected when you're used to round to zero (C, C++, x86 IDIV, etc.). I know I didn't think of it right away.
I understand it. I just don't use it.

For me, the way I code, it doesn't make sense.

Maybe also worth noting that a left shift (i.e. multiply by 2) doesn't have this ambiguity.

<< 1 is x 2 for both positive and negative, unsigned and signed numbers alike.
rainwarrior wrote:
Maybe also worth noting that a left shift (i.e. multiply by 2) doesn't have this ambiguity.

<< 1 is x 2 for both positive and negative, unsigned and signed numbers alike.

Are you kidding? If you shift a negative number left, you'll get a completely wrong result if it's not adjusted. Even if the result is adjusted, you have to be watchful for all overflows and bits shifting into the sign bit in all cases.
Are you kidding? If you shift a negative number left, you'll get a completely wrong result if it's not adjusted. Even if the result is adjusted, you have to be watchful for all overflows and bits shifting into the sign bit in all cases.
Overflow is exactly equally as problematic regardless of whether the number is signed or unsigned.

(all 8 bit types:)
-120 << 1 = 16
136 << 1 = 16
-30 << 1 = -60
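Those three examples are easy to reproduce in Python with an 8-bit mask (shl8 is an invented helper that reinterprets the result as signed):

```python
def shl8(value):
    """Left shift an 8-bit value with wraparound, read back as signed."""
    r = ((value & 0xFF) << 1) & 0xFF
    return r - 256 if r >= 128 else r

print(shl8(-120), shl8(136), shl8(-30))  # -> 16 16 -60
```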
Sure, but the thing is that overflow can not only give a wrong result, but also one with the wrong sign, which is tricky.
Also, this is why there's no difference between arithmetic shift left and logical shift left. I wonder why the 6502's instruction is called ASL and not LSL, since the latter would be the same but more symmetrical to LSR. Oh well.
Are you kidding?

No. ?

If the result of the multiply fits in the range of the number, the left shift is a valid x2.

Otherwise there's no possible valid result. Overflow. That's completely normal. Same problem applies to unsigned multiply.

If you shift a negative number left, you'll get a completely wrong result if it's not adjusted.

What does "adjusted" mean? Unlike shifting right, there is no loss of precision from rounding. A left shift bringing in a zero is a multiply by 2 in two's complement.

you have to be watchful for all overflows and bits shifting into the sign bit in all cases.

No, you only get an invalid sign bit in the overflow case. In all other cases, the sign bit remains valid.

For overflow, you have to be as watchful for that whether it's signed or unsigned.

Sure, but the thing is that overflow can not only give a wrong result, but also with the wrong sign, which is tricky.

Well, if you're trying to detect overflow at runtime, signed is slightly different. Before doing the shift, the overflow indication would be XOR of the top 2 bits, rather than just the top 1 bit for unsigned. Kinda similar to the signed comparison requiring 2 bits of info.
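That two-bit test can be modeled in Python (invented name) and checked exhaustively against the arithmetic definition:

```python
def shl_overflows(value):
    """Will value << 1 overflow as signed 8-bit? True iff bits 7 and 6 differ."""
    return bool(((value >> 7) ^ (value >> 6)) & 1)

# Exhaustive check against the range of signed 8-bit results
for v in range(256):
    sv = v - 256 if v >= 128 else v
    assert shl_overflows(v) == (not (-128 <= 2 * sv <= 127))
```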

Otherwise you can prevent overflow by knowing your ranges and keeping the numbers small enough, but that principle is the same for unsigned.

...this is why there's no difference between arithmetic shift left and logical shift left.

Yes. This is very strongly correlated with the fact that <<1 is x2 for both unsigned and signed.
Also, this is why there's no difference between arithmetic shift left and logical shift left. I wonder why the 6502's instruction is called ASL and not LSL, since the latter would be the same but more symmetrical to LSR. Oh well.

Zilog Z80 has a "logical shift left" at CB 60 through CB 67 that's like SCF then RL (or like SEC then ROL in 6502 syntax). This turned out not to be very useful, so Sharp replaced it in the SM83 (8080-inspired CPU core used in Game Boy's LR35902 SOC and some 8-bit MCUs) with a "swap nibbles" instruction.
Quote:
What does "adjusted" mean? Unlike shifting right, there is no loss of precision from rounding. A left shift bringing in a zero is a multiply by 2 in two's complement.

Oh sorry, I confused it with shifting right... oh well. I'm getting old, guys.
I wasn't aware of the different rounding for signed divide vs signed shift right. But I think that I do now understand why disassembled C code is so crappy!

In ASM code, almost everything is almost always unsigned (eg. memory addresses, loop counters, tile numbers, and so on). In most cases it really doesn't make sense to use signed numbers (much less to shift or divide them).

For C programmers, the main difference between "int" and "uint" is probably that "int" is shorter (and easier to pronounce). And so they might end up with signed "int", without actually being aware of what they are doing (and what the compiler will do if they use a supposedly harmless expression like "i/32" instead of right shifting).

For the example, the 3DS bootrom has some interrupt handling code like this:
Code:
;in:  r4 = irq.no (range 0..7Fh)
;out: r3 = address of 32bit word: (17E01200h+(irq.no/20h*4))
;out: r1 = bit number within 32bit word: (irq.no AND 1Fh)
;---
0001247C 17E1     asrs    r1,r4,1Fh    ;sign-bit of irq.no
0001247E 4B0B     ldr     r3,=17E01200h
00012480 0EC9     lsrs    r1,r1,1Bh    ;sign*1Fh (=00h or 1Fh)
00012482 1909     adds    r1,r1,r4     ;irq.no + sign*1Fh
00012484 114A     asrs    r2,r1,5h     ;irq.no/20h
00012486 0949     lsrs    r1,r1,5h     ;irq.no/20h
00012488 0092     lsls    r2,r2,2h     ;irq.no/20h*4
0001248A 0149     lsls    r1,r1,5h     ;irq.no/20h*20h
0001248C 1A61     subs    r1,r4,r1     ;irq.no - (irq.no/20h*20h)  ;aka AND 1Fh
0001248E 18D3     adds    r3,r2,r3     ;17E01200h + (irq.no/20h*4)
The compiler did apparently try to optimize "div 20h" as "shift 5", but then it went amok on rounding the (un-)signed result towards zero.
The code would probably be half the size if the programmer had declared r4 as an unsigned value (or if the source code had used a shift instead of a divide).
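What the compiler emitted can be mirrored in Python to see the bias at work (a sketch: 32-bit registers assumed, function name invented, and the arithmetic vs logical shifts agree for the nonnegative inputs exercised here). For the actual 0..7Fh range the whole dance collapses to `irq_no >> 5` and `irq_no & 0x1F`:

```python
def irq_word_and_bit(irq_no):
    """Python mirror of the compiled Thumb sequence above."""
    sign_mask = 0xFFFFFFFF if irq_no & 0x80000000 else 0  # asrs r1,r4,1Fh
    bias = sign_mask >> 27                                # lsrs: 0 or 1Fh
    biased = (irq_no + bias) & 0xFFFFFFFF                 # adds r1,r1,r4
    word = biased >> 5                                    # asrs r2,r1,5h
    bit = (irq_no - ((biased >> 5) << 5)) & 0xFFFFFFFF    # subs r1,r4,r1
    return word, bit
```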

Assuming that it's a pretty common problem, and that it's impossible to teach C programmers not to use signed numbers... it would almost make sense to implement a "shift-and-round-towards-zero" opcode in newer processors. (The newer ARM CPUs do actually have a fairly useless "uxt" opcode which helps with similar compiler-world issues, eg. when compilers think that they must ensure that "mov r0,15h" won't exceed FFFFh; that usually requires two useless opcodes, but can now be replaced with only one useless opcode.)
nocash wrote:
Assuming that it's a pretty common problem, and that it's impossible to teach C programmers not to use signed numbers... it would almost make sense to implement a "shift-and-round-towards-zero" opcode in newer processors. (The newer ARM CPUs do actually have a fairly useless "uxt" opcode which helps with similar compiler-world issues, eg. when compilers think that they must ensure that "mov r0,15h" won't exceed FFFFh; that usually requires two useless opcodes, but can now be replaced with only one useless opcode.)

What's really needed is a usable replacement for C which includes integer types with better semantics. Python, IMHO, did things properly, using separate operators for floating-point division (which converts any operand type to a floating-point number) and integer division (which rounds toward negative infinity), but it's not really suitable for low-level tasks. In several decades of programming, I can only think of one time when truncating division would have been useful for a negative divisor, and on that occasion (targeting a Z80) the compiler handled it so slowly that I had to use a manually-rounded shift anyway.
I've been trying to learn C# lately. After writing some functional code, I thought to myself "maybe I should just replace all the int's with uint, it should run faster"

Then I got about a dozen error messages. It seems that most of the system's functions expect signed int... and simply can't handle a uint, for some reason.
nocash wrote:
Assuming that it's a pretty common problem, and that it's impossible to teach C programmers not to use signed numbers... it would almost make sense to implement a "shift-and-round-towards-zero" opcode in newer processors. (The newer ARM CPUs do actually have a fairly useless "uxt" opcode which helps with similar compiler-world issues, eg. when compilers think that they must ensure that "mov r0,15h" won't exceed FFFFh; that usually requires two useless opcodes, but can now be replaced with only one useless opcode.)

Why do you think C programmers are uncomfortable with unsigned numbers? For high performance code, it's probably normal to go unsigned by default? That was one of the first things I saw in internal coding guides when I started working professionally.

Though honestly, this entire problem only happens with division. ...and it doesn't even happen on x86, because its IDIV instruction already rounds to zero.

Actually, technically the C spec allows signed division rounding to be implementation defined, so even the ARM compiler you were using could have rounded down if it wanted. Though, TBH, it's probably a good idea that it doesn't, since I think the prevailing common expectation is round to zero. C++11 finally defined it as round to zero, BTW, but it had more or less been the de-facto standard for a long time.

(Technically even signed by default is implementation defined. cc65 defines default char as unsigned, for example. It's rare to see this in practice though.)

Also, aside from typing "unsigned" being a really simple available solution... if you don't like typing it, there's a really simple solution for that too:
Code:
typedef unsigned int uint;
rainwarrior wrote:
Though honestly, this entire problem only happens with division.

That and the general trend of newer versions of C and C++ compilers becoming more pedantic about treating signed overflow and other undefined behaviors as an excuse to make unexpectedly aggressive optimizations. Raymond Chen has referred to this aggression as "time travel". Unsigned addition, subtraction, and multiplication, on the other hand, are carefully defined to wrap around modulo [type]_MAX + 1.
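That unsigned guarantee is just arithmetic modulo 2**N; in Python you have to write the mask out by hand (umul16 is an invented name):

```python
def umul16(x, y):
    """Unsigned 16-bit multiply with the wraparound C guarantees
    (results reduced modulo UINT16_MAX + 1)."""
    return (x * y) & 0xFFFF

# 65535 * 32768 wraps to 32768, matching the mul_mod_65536 example later
# in the thread.
```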
rainwarrior wrote:
even the ARM compiler you were using could have rounded down if it wanted.
Hell, no, I wasn't using a compiler, not me. The disassembly was from Nintendo's boot rom. I am not writing that kind of code, what do you think?
I didn't know that there are coding guides recommending unsigned numbers in professional C code; that's a good thing. Apparently not everyone at Nintendo read those guides.
The problem isn't division as such. The problem is replacing signed division by shifting (particularly when you have a number that is never negative, and the compiler is nevertheless producing nonsense code for handling the sign bit).
tepples wrote:
rainwarrior wrote:
Though honestly, this entire problem only happens with division.

That and the general trend of newer versions of C and C++ compilers becoming more pedantic about treating signed overflow and other undefined behaviors as an excuse to make unexpectedly aggressive optimizations. Raymond Chen has referred to this aggression as "time travel". Unsigned addition, subtraction, and multiplication, on the other hand, are carefully defined to wrap around modulo [type]_MAX + 1.

Time travel would be fine if it were limited to things which have no causal relationship. For example, given
Code:
#include <math.h>
volatile int x,y;

void test(long long temp)
{
    while (temp & 1234)
        temp = 123456789123456789*sin(temp);
    if (x) y=temp;
}

I would not think it unreasonable for a compiler to perform the read of "x" before performing the computations involving "temp" and skip those computations if x yields zero, even if the loop would never have exited.

Unfortunately, "modern" compilers use Undefined Behavior not only as an excuse to engage in time travel, but also to negate the laws of causality. For example, given:
Code:
unsigned mul_mod_65536(unsigned short x, unsigned short y)
{ return x*y & 0xFFFF; }

volatile unsigned q;
unsigned test(unsigned short x)
{
    unsigned sum=0;
    x|=0x8000;
    for (int i=0x8000; i<=x; i++)
    {
        q=mul_mod_65536(65535,i);
        sum++;
    }
    return sum;
}

When targeting platforms where "int" is 32 bits, gcc will generate code for "test" which always writes 32768 to "q" and returns 1, ignoring the value of "x".