This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

STA indirect indexed double-increments PPU address?

by on (#101266)
I was playing around with cc65, and I noticed that something like the following:
Code:
((unsigned char*)0)[0x2007] = a

Compiles into something equivalent to:
Code:
LDX #$20
STX $11
LDX #$00
STX $10
LDY #$07
STA ($10), Y


This example is simplified; it isn't exactly what you get from cc65, but cc65's [] operator in this case results in an indirect indexed store similar to this one.

What I discovered, though, is that across all the emulators I tried, STA to $2007 via an indirect indexed address like this appears to increment the PPU write address by two, rather than just one. On the first increment, PPU memory is not written. On the second increment, my value in A is stored to the PPU. So... it skips the byte I was aiming for and writes the next one instead! What is it about indirect indexed addressing that causes this behaviour?

Anyhow, this is also a warning, I guess, that if you're going to use cc65, don't try to write memory mapped registers this way. (edit: see posts below for syntax that does not have this problem.)
Re: STA indirect indexed double-increments PPU address?
by on (#101267)
I don't really understand: why write ((unsigned char*)0)[0x2007] = a rather than *((unsigned char*)0x2007) = a?
Re: STA indirect indexed double-increments PPU address?
by on (#101268)
Ah! That works much better, thanks!
Code:
*((unsigned char*)0x2007) = a

Generates:
Code:
STA $2007


And by the way, the reason I had done it the other way first was because of suggestion #12 in this cc65 doc: http://www.cc65.org/doc/coding.html
I misapplied it... I suppose the advice is only for when wanting to add to a pointer as an index, not for using static addresses like this.
Re: STA indirect indexed double-increments PPU address?
by on (#101273)
Which then leads to header files looking like this:
Code:
#define PPUCTRL   (*(volatile unsigned char*)0x2000)
#define PPUMASK   (*(volatile unsigned char*)0x2001)
#define PPUSTATUS (*(volatile unsigned char*)0x2002)
#define OAMADDR   (*(volatile unsigned char*)0x2003)
#define OAM_DMA   (*(volatile unsigned char*)0x4014)
#define PPUSCROLL (*(volatile unsigned char*)0x2005)
#define PPUADDR   (*(volatile unsigned char*)0x2006)
#define PPUDATA   (*(volatile unsigned char*)0x2007)
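For illustration only, here is a minimal sketch of how macros like these might be used to write one byte to VRAM. The `REG` helper, the `fake_bus` array, and `ppu_poke` are hypothetical names that are not from the posts above; the host-side branch exists only so the sketch can be compiled and run on a PC. On real hardware this kind of write would also need to happen while rendering is off or during vblank.

```c
#ifdef __CC65__
/* On the NES target, registers live at fixed CPU addresses. */
#define REG(addr) (*(volatile unsigned char *)(addr))
#else
/* Hypothetical host-side stand-in for the CPU address space. */
static unsigned char fake_bus[0x10000];
#define REG(addr) (fake_bus[(addr)])
#endif

#define PPUADDR REG(0x2006)
#define PPUDATA REG(0x2007)

/* Set the VRAM address (high byte first, then low byte) and store one byte. */
static void ppu_poke(unsigned int vram_addr, unsigned char value)
{
    PPUADDR = (unsigned char)(vram_addr >> 8);
    PPUADDR = (unsigned char)(vram_addr & 0xFF);
    PPUDATA = value;
}
```

Because each macro expands to a dereference of a constant address, a compiler can in principle emit a plain absolute store for every assignment, which is the STA $2007 form shown earlier in the thread.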
Re: STA indirect indexed double-increments PPU address?
by on (#101277)
tepples wrote:
Which then leads to header files looking like this:
Code:
#define PPUCTRL   (*(volatile unsigned char*)0x2000)
#define PPUMASK   (*(volatile unsigned char*)0x2001)
#define PPUSTATUS (*(volatile unsigned char*)0x2002)
#define OAMADDR   (*(volatile unsigned char*)0x2003)
#define OAM_DMA   (*(volatile unsigned char*)0x4014)
#define PPUSCROLL (*(volatile unsigned char*)0x2005)
#define PPUADDR   (*(volatile unsigned char*)0x2006)
#define PPUDATA   (*(volatile unsigned char*)0x2007)

I like to do this:
Code:
typedef unsigned char byte;   /* assumed definition; not shown in the post */

struct _PPU {
    byte ctrl;
    byte mask;
    byte const status;
    byte oam_addr;
    byte oam_data;
    byte scroll;
    byte addr;
    byte data;
};
#define    PPU         ( *( struct _PPU volatile * )0x2000 )

(BTW, volatile has no effect in CC65 currently.)
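A rough way to see why the struct approach can reduce to absolute addresses: every member offset is a compile-time constant, so base plus offset is known before any code runs. The sketch below assumes `byte` is `unsigned char` and uses hypothetical names (`ppu_regs`, `PPU_BASE`, `PPU_REG_ADDR`) that do not appear in the post.

```c
#include <stddef.h>

typedef unsigned char byte;   /* assumption: the post does not show this typedef */

struct ppu_regs {
    byte ctrl;      /* $2000 */
    byte mask;      /* $2001 */
    byte status;    /* $2002 */
    byte oam_addr;  /* $2003 */
    byte oam_data;  /* $2004 */
    byte scroll;    /* $2005 */
    byte addr;      /* $2006 */
    byte data;      /* $2007 */
};

#define PPU_BASE 0x2000u

/* Address a member would have if the struct were placed at PPU_BASE. */
#define PPU_REG_ADDR(member) \
    (PPU_BASE + (unsigned)offsetof(struct ppu_regs, member))
```

With this layout, `PPU_REG_ADDR(data)` works out to 0x2007 without any runtime arithmetic, so nothing forces a pointer to be built in zero page.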
Re: STA indirect indexed double-increments PPU address?
by on (#101279)
It's doing exactly what you're telling it to. It's compiling array/pointer dereferencing and indexing arithmetic, so an indirect indexed addressing opcode is the most perfect output code.
Re: STA indirect indexed double-increments PPU address?
by on (#101281)
exdeath wrote:
It's doing exactly what you're telling it to. It's compiling array/pointer dereferencing and indexing arithmetic, so an indirect indexed addressing opcode is the most perfect output code.

Most perfect? No way, it should/could know that the address is constant and optimize accordingly.
Re: STA indirect indexed double-increments PPU address?
by on (#101282)
thefox wrote:
exdeath wrote:
It's doing exactly what you're telling it to. It's compiling array/pointer dereferencing and indexing arithmetic, so an indirect indexed addressing opcode is the most perfect output code.

Most perfect? No way, it should/could know that the address is constant and optimize accordingly.


:mrgreen:
Re: STA indirect indexed double-increments PPU address?
by on (#101283)
Heh, mine is now just:
Code:
#define RAW_BUS(x) (*(unsigned char*)(x))

Am I the only one who likes to use the registers by number instead of naming them?

With cc65's weak optimizer, volatile isn't really capable of doing anything, but the sentiment is right, semantically.

How does the compiler like that struct, TheFox? Does it manage to reduce 0x2000 + offset at compile time? (Edit: apparently it does! Turns into the STA $2007 it deserves.)

Also, nobody has any insight as to what is special about STA (zp), Y? That was the question I was most interested in. Why does it generate an extra increment?
Re: STA indirect indexed double-increments PPU address?
by on (#101285)
rainwarrior wrote:
Also, nobody has any insight as to what is special about STA (zp), Y? That was the question I was most interested in. Why does it generate an extra increment?


Yeah... not sure. What's the 6502 bus doing on that opcode? Maybe the ZP accesses for the base address cause dummy bus accesses that confuse the PPU and toggle a false write and increment?
Re: STA indirect indexed double-increments PPU address?
by on (#101286)
STA (xx),Y adds a dummy read.
Code:
        1      PC       R  fetch opcode, increment PC
        2      PC       R  fetch pointer address, increment PC
        3    pointer    R  fetch effective address low
        4   pointer+1   R  fetch effective address high,
                           add Y to low byte of effective address
        5   address+Y*  R  read from effective address,
                           fix high byte of effective address
        6   address+Y   W  write to effective address

They did it this way in case they needed to fix up the high byte before performing a write, because they figured that reads wouldn't have side effects like writes would.
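To make the cycle table concrete, here is a toy host-side model, not an emulator. It assumes only what the thread describes: any access to $2007, read or write, advances the VRAM address by one. All names (`toy_ppu`, `bus_read`, `bus_write`, `sta_absolute`, `sta_indirect_y`) are hypothetical, and real PPU details such as the read buffer and the configurable increment of 32 are left out.

```c
#include <stdint.h>

/* Toy model of the PPU data port. */
struct toy_ppu {
    uint16_t vram_addr;
    uint8_t  vram[0x4000];
};

/* A read of $2007 returns a byte and advances the VRAM address. */
static uint8_t bus_read(struct toy_ppu *ppu, uint16_t addr)
{
    if (addr == 0x2007) {
        uint8_t value = ppu->vram[ppu->vram_addr & 0x3FFF];
        ppu->vram_addr++;
        return value;
    }
    return 0;
}

/* A write to $2007 stores a byte and advances the VRAM address. */
static void bus_write(struct toy_ppu *ppu, uint16_t addr, uint8_t value)
{
    if (addr == 0x2007) {
        ppu->vram[ppu->vram_addr & 0x3FFF] = value;
        ppu->vram_addr++;
    }
}

/* STA abs: the only access to the target is the write. */
static void sta_absolute(struct toy_ppu *ppu, uint16_t addr, uint8_t a)
{
    bus_write(ppu, addr, a);
}

/* STA (zp),Y: cycle 5 reads from the effective address before its high
   byte is fixed up, then cycle 6 writes to the corrected address. */
static void sta_indirect_y(struct toy_ppu *ppu, uint16_t base,
                           uint8_t y, uint8_t a)
{
    uint16_t unfixed = (uint16_t)((base & 0xFF00) | ((base + y) & 0x00FF));
    uint16_t fixed   = (uint16_t)(base + y);
    (void)bus_read(ppu, unfixed);  /* cycle 5: dummy read */
    bus_write(ppu, fixed, a);      /* cycle 6: real write */
}
```

In this model, `sta_absolute` leaves the byte at the current VRAM address, while `sta_indirect_y` with base $2000 and Y = 7 skips one byte and stores into the next, matching what the first post reports from emulators.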
Re: STA indirect indexed double-increments PPU address?
by on (#101287)
exdeath wrote:
It's doing exactly what you're telling it to. It's compiling array/pointer dereferencing and indexing arithmetic, so an indirect indexed addressing opcode is the most perfect output code.

One of the primary advantages of C over assembly is that the compiler is able to pick from equivalent implementations of a statement, so that it can do "what's best" for the situation. There really isn't such a thing as "exactly what you tell it". That's an assembly programming concept, not a C concept.

Frankly I'm a little disturbed that cc65 isn't able to tell the difference between a static pointer and a variable in this case. It's really weird too, because if I create a named static array (via assembly/linker), it manages to reduce just fine into an absolute address. It's this strange case where ((unsigned char*)x) isn't treated as a static pointer with the [] operator. Not intuitive at all.

For instance if I do something like this:
Code:
.segment "PPU_REGISTERS"
_ppu_register: .res 8
.export _ppu_register

I can get well behaved results from something like:
Code:
extern unsigned char ppu_register[8];
ppu_register[7] = a;


It's only when using a number literal cast to an address that it has problems with []. As TheFox pointed out, you can cast it to a struct and it has no problem at all!
Re: STA indirect indexed double-increments PPU address?
by on (#101289)
Thanks Dwedit. Is that copied from a reference somewhere? (I'd like to read it, if it exists.)
Re: STA indirect indexed double-increments PPU address?
by on (#101290)
It's from this file:
http://nesdev.com/6502_cpu.txt

Yeah, there's a bunch of files linked from the main page of the site. But this one looks like the best for knowing what the CPU is actually doing.
Re: STA indirect indexed double-increments PPU address?
by on (#101292)
exdeath wrote:
thefox wrote:
exdeath wrote:
It's doing exactly what you're telling it to. It's compiling array/pointer dereferencing and indexing arithmetic, so an indirect indexed addressing opcode is the most perfect output code.

Most perfect? No way, it should/could know that the address is constant and optimize accordingly.


:mrgreen:

Wat.
Re: STA indirect indexed double-increments PPU address?
by on (#101295)
Dwedit wrote:
STA (xx),Y adds a dummy read.
Code:
        1      PC       R  fetch opcode, increment PC
        2      PC       R  fetch pointer address, increment PC
        3    pointer    R  fetch effective address low
        4   pointer+1   R  fetch effective address high,
                           add Y to low byte of effective address
        5   address+Y*  R  read from effective address,
                           fix high byte of effective address
        6   address+Y   W  write to effective address

They did it this way in case they needed to fix up the high byte before performing a write, because they figured that reads wouldn't have side effects like writes would.


Trying to think of a need that can abuse this where you'd want interleaved writes to CIRAM :twisted:
Re: STA indirect indexed double-increments PPU address?
by on (#101296)
DMC will still screw with the read.
Re: STA indirect indexed double-increments PPU address?
by on (#101302)
exdeath wrote:
Trying to think of a need that can abuse [extra read] where you'd want interleaved writes to CIRAM :twisted:

As I remember, the behavior depended on the CPU-PPU clock alignment at power-on, so it wasn't reliably the same each time.
Re: STA indirect indexed double-increments PPU address?
by on (#101314)
Dwedit wrote:
STA (xx),Y adds a dummy read.
Code:
        1      PC       R  fetch opcode, increment PC
        2      PC       R  fetch pointer address, increment PC
        3    pointer    R  fetch effective address low
        4   pointer+1   R  fetch effective address high,
                           add Y to low byte of effective address
        5   address+Y*  R  read from effective address,
                           fix high byte of effective address
        6   address+Y   W  write to effective address

They did it this way in case they needed to fix up the high byte before performing a write, because they figured that reads wouldn't have side effects like writes would.

That's... pretty stupid. Couldn't they have made it so that the bus was left unused in that 5th cycle? Pretty sure that reads with side-effects were already common when the 6502 was first designed =/
Re: STA indirect indexed double-increments PPU address?
by on (#101316)
Sik wrote:
couldn't they have made it so that the bus was left unused in that 5th cycle?

I'm sure they did everything they could to keep costs down when designing the 6502, so you can be certain that this decision was made to reduce the number of transistors.

Quote:
Pretty sure that reads with side-effects were already common when the 6502 was first designed =/

Yes, but they probably assumed that the more exotic addressing modes wouldn't be commonly used to access memory-mapped registers.
Re: STA indirect indexed double-increments PPU address?
by on (#101319)
tokumaru wrote:
Sik wrote:
couldn't they have made it so that the bus was left unused in that 5th cycle?

I'm sure they did everything they could to keep costs down when designing the 6502, so you can be certain that this decision was made to reduce the number of transistors.

Quote:
Pretty sure that reads with side-effects were already common when the 6502 was first designed =/

Yes, but they probably assumed that the more exotic addressing modes wouldn't be commonly used to access memory-mapped registers.


Or auto incrementing single address FIFOs. This is both the fault of the 6502 and the PPU combined, not just the 6502.
This exact combination of CPU, addressing mode, and PPU port is like winning the lottery or something.
Re: STA indirect indexed double-increments PPU address?
by on (#101321)
exdeath wrote:
This exact combination of CPU, addressing mode, and PPU port is like winning the lottery or something.

Don't forget I also did it by discovering a compiler bug!
Re: STA indirect indexed double-increments PPU address?
by on (#101324)
It wasn't a bug, it's what you wrote :D cc65 did exactly what you told it to.
Re: STA indirect indexed double-increments PPU address?
by on (#101328)
Subscripting in C is commutative. If the code that a compiler generates for a[b], b[a], and *(a+b) differs with optimization turned on, then either A. the compiler is Doing It Wrong with respect to efficiency of commutative operations in general or B. you are coding in C++ and have overloaded some operator.
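A small host-side sketch of the commutativity claim: in standard C, `p[i]`, `i[p]`, and `*(p + i)` all name the same object. The identifiers `table`, `subscript_forms_agree`, and `read_reversed` are made up for this example; it says nothing about what code cc65 generates for each form.

```c
#include <stddef.h>

static unsigned char table[8] = {10, 11, 12, 13, 14, 15, 16, 17};

/* Returns 1 when all three spellings refer to the same element. */
static int subscript_forms_agree(size_t i)
{
    unsigned char *p = table;
    return &p[i] == &i[p] && &p[i] == p + i;
}

/* Reads an element using the "reversed" subscript spelling. */
static unsigned char read_reversed(size_t i)
{
    return i[table];
}
```

Since the language defines `a[b]` as `*((a) + (b))`, any difference in generated code between the spellings comes from the compiler's pattern matching, not from the meaning of the expressions.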
Re: STA indirect indexed double-increments PPU address?
by on (#101329)
Or C, the optimizer is simplistic and takes advantage of the fact that most people write array[n] instead of n[array]. That is, it generates better code for most array expressions, and that's good enough. Handling it fully generally would be more work just for obscure cases.
Re: STA indirect indexed double-increments PPU address?
by on (#101330)
tepples wrote:
Subscripting in C is commutative. If the code that a compiler generates for a[b], b[a], and *(a+b) differs with optimization turned on, then either A. the compiler is Doing It Wrong with respect to efficiency of commutative operations in general or B. you are coding in C++ and have overloaded some operator.


The problem here is there's a difference between array[7] and pointer[7]. One of them has a fixed address at compile/link time, and one of them needs to be resolved by extra code. cc65 normally correctly identifies these two types, and does generate different code for the array (absolute) and pointer variable (indirect indexed). This is neither incorrect, nor undesired, and does not require optimization to be turned on.

In the ((unsigned char*)0)[7] example, it is compiled as if this literal is cast to a pointer variable, rather than an array, and does all the things associated with such a thing. Yes, all arrays can be generically considered pointer variables, but that is a generalization which would generate a lot more (bigger/slower) code than necessary. This isn't really part of the optimization process; this is a problem further up the pipe. If the type is misidentified, you can't optimize away the indirection.

Anyhow, I'll report this to the cc65 mailing list, since someone on the project might be interested in fixing this problem. If not, we've covered a few ways to avoid it already. This was just a case of very poor code generation, which I do consider a bug, but I've no wish to argue the semantics of what should or should not be classified a bug. Yes the code is correct (when the extra read has no side-effect), but it's also slow as hell compared to what it could be.
Re: STA indirect indexed double-increments PPU address?
by on (#101360)
I just remembered this quirk of the 6502. More specifically:
Code:
        When an NMI occurs, the processor jumps to Kernal code, which jumps to
        ($0318), which points to the following routine:

        DD09    LSR $40         ; clear N flag
                BPL $DD0A       ; Note: $DD0A contains RTI.

        Operational diagram of BPL $DD0A:

          #  data  address  R/W  description
         --- ----  -------  ---  ---------------------------------
          1   10    $DD0B    R   fetch opcode
          2   11    $DD0C    R   fetch argument
          3   xx    $DD0D    R   fetch opcode, add argument to PCL
          4   40    $DD0A    R   fetch opcode, (fix PCH)

Not to mention all the spurious accesses that can happen everywhere when crossing pages (lower byte is updated in a different cycle than the higher byte but the bus is still taken by the processor). Yeah, it's a mess. Looks like they literally assumed reads would never have side effects.

tepples wrote:
Subscripting in C is commutative. If the code that a compiler generates for a[b], b[a], and *(a+b) differs with optimization turned on, then either A. the compiler is Doing It Wrong with respect to efficiency of commutative operations in general or B. you are coding in C++ and have overloaded some operator.

The standard doesn't require compilers to generate the most optimal code, only to ensure the final results are correct =P Though one could argue that the 6502 quirk here prevents it from being correct... (though we're starting to enter the realm of platform-specific hacks).