This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

True behavior of $2003/$2004 or FCE Ultra eccentricity?

by on (#32668)
What exactly is the behavior of $2003/$2004 on real NES hardware?

Any documentation that I've come across claims that you write the SPR-RAM address to $2003, and the byte you want to store to $2004. Okay -- that seems REALLY straightforward. But guess what? Clearly not everyone agrees on it.

Okay. Here's the situation:

I want to perform a DMA to SPR-RAM and then hide certain sprites by using $2003/$2004 and overwriting their Y-coordinates. I'm not using my entire "local" SPR-RAM page for sprites, and need to hide the sprites which overlap my other data.

Example:

Code:
lda #0
sta $2003
sta $4014  ; SPR-RAM <- $0000-$00FF

ldx #$FF  ; used to hide sprites (any value in $EF-$FF will do)

lda #35 * 4  ; sprite #35 @ Y-coordinate
sta $2003
stx $2004
.. (etc., for each sprite I want to hide)


I'm using FCE Ultra because it's the only emulator I've found that has reasonable debugging features. However, the way it responds to writes to $2003 is bizarre. I had to get into the source code to understand it, and here's the relevant portion:

Code:
static DECLFW(B2003)
{
                //printf("$%04x:$%02x, %d, %d\n",A,V,timestamp,scanline);
                PPUGenLatch=V;
                PPU[3]=V;
                PPUSPL=V&0x7;
}

static DECLFW(B2004)
{
                //printf("$%04x:$%02x, %d, %d\n",A,V,timestamp,scanline);
                //printf("$%04x:$%02x, %d\n",X.PC,V,scanline);
                PPUGenLatch=V;
                if(PPUSPL>=8)
                {
                 if(PPU[3]>=8)
                  SPRAM[PPU[3]]=V;
                }
                else
                {
                 //printf("$%02x:$%02x\n",PPUSPL,V);
                 SPRAM[PPUSPL]=V;
                }
                PPU[3]++;
                PPUSPL++;
}


If I understand the above right, you can only position ($2003) and write to ($2004) the first eight locations of SPR-RAM, which corresponds to the first two sprites (sprites #0 & #1). If I want to write to any other sprite, I must first write to address 7 (sprite #1's X-coordinate), then "make my way" to the "true" address (unless the sprite is even numbered and I want to access the Y-coordinate, in which case I'm there).

Example (doesn't work in FCE Ultra):

Code:
lda #$8C   ; (= 35 * 4,address of sprite #35's Y-coordinate)
sta $2003  ; PPU[3] = $8C, PPUSPL = $04

stx $2004  ; SPRAM[$04] <- $FF (and not SPRAM[$8C] <- $FF)


So, the quickest way to get to the intended address would be to:

Code:
lda #$87
sta $2003  ; PPU[3] = $87, PPUSPL = $07

stx $2004  ; SPRAM[$07] <- $FF (trash sprite #1's X-coordinate)

stx $2004  ; SPRAM[$88] <- $FF (trash sprite #34's Y-coordinate)
stx $2004  ; SPRAM[$89] <- $FF (trash sprite #34's Tile Index)
stx $2004  ; SPRAM[$8A] <- $FF (trash sprite #34's Attribute)
stx $2004  ; SPRAM[$8B] <- $FF (trash sprite #34's X-coordinate)

stx $2004  ; finally, store out sprite #35's Y-coordinate


Obviously unacceptable. So my question is, what happens on real hardware? I looked at the source for two other emulators (Nintendulator & Nestopia) and neither of them handles $2003 the way FCE Ultra does (they just accept the sprite address as is, as one would expect). I'd try it myself, but I don't have any development hardware and it looks like the PowerPak from RetroZone is temporarily unavailable.

Unfortunately, my project cannot move forward until I can resolve this. Can someone please respond?

Thanks in advance, Carl

by on (#32670)
It's covered in the wiki.

by on (#32671)
Well, I don't know much about $2003 and $2004, as those are definitely the most obscure registers on the NES hardware even today. I just know FCEUltra isn't that accurate, so it's very likely a bug in the source. Nintendulator and Nestopia are the two most accurate emulators around, so it's more likely they are right.

Also, Micro Machines relies on $2004 reads during rendering, which is really obscure. I don't know, but maybe FCEUltra did something hackish to get this (unlicensed) game to work even if it breaks the real logic.

by on (#32674)
IIRC, "Super Cars" also depends on $2004 reads, in order to blank scanlines at the top of the screen. It doesn't look right in FCEUXD, but works in Nintendulator and Nestopia.

EDIT: The wiki doesn't explain much...

by on (#32675)
tokumaru wrote:
The wiki doesn't explain much...

There's not much to explain. What more is there to cover?
NesDevWiki wrote:
OAMADDR ($2003)
OAM address (write): Write the address of OAM you want to access here.

OAMDATA ($2004)
OAM data port (r/w): Write OAM data here. Writes will increment OAMADDR; reads won't.

by on (#32676)
FCEU is really showing its age. While it does have a very nice debugger, I wouldn't rely on the accuracy of its behavior.

Everything you need to know:


- $2003 sets the OAM address

- $2004 copies a byte to that address, then increments the address (on write)

- $4014 simply writes to $2004 256 times

- $2003 is changed constantly by the PPU during rendering.

- Turning the PPU off mid-rendering will leave $2003 in a semi-random state (its state can actually be determined -- but for your purposes it'd be a garbage value)

- After rendering, if the PPU is still on, $2003 finds its way back to a value of $00

- Sprite 0 and sprite 1 have a quirk where they don't necessarily use the first 8 bytes of OAM; instead they use the bytes of OAM "pointed to" by $2003. For example, if $2003=$18, sprite 0 will use OAM[$18-$1B] (same data as sprite 6), sprite 1 will use OAM[$1C-$1F] (same data as sprite 7), and OAM[$00-$07] goes completely unused. Note that to exploit this behavior, you'd need to set $2003 appropriately each frame because it gets reset by the PPU (or does it? I'm still unsure about that). At any rate, I wouldn't do this anyway, and I'd just set $2003 to zero every frame.

by on (#93170)
Another necrobump; only posting because I caught this issue in my own current project (more PowerGlovey goodness) and it tipped me off to what might be wrong, although I don't see evidence that the original poster accepted the given answer.

So.

On Nestopia, $2003 and $2004 behave precisely as naive-documented. Write the address in sprite-memory that you want to get at to $2003, then write your data sequentially to $2004. $2003 takes any old address, even unaligned addresses, and it's perfectly fine to jump straight to the desired sprite's Y coordinate and write that.

On hardware... oof. I haven't done thorough-thorough testing, but even when I reset $2003 every frame, I was getting not-wholly-deterministic behavior on sprites 0 and 1 when I jumped straight to the Y coordinate of sprite 1 and wrote 4 bytes in succession. I've also seen "sprite trails" from not resetting $2003 every frame and sprites 0/1 taking some function of the most recent data I wrote.

I seem to have the most success when I only $2003-jump to addresses on a $08 boundary (the Y coordinate of even-numbered sprites). $04 boundaries may be safe outside of sprites 0 and 1. Heck, arbitrary addresses may be safe and I'm just being paranoid. If you're developing on a system that you can't just quickly build and test against your target platform, you're going to have bigger problems than sprite indexing.

Also, goes without saying, keep your vblank timing in mind. DMA is fast, but only as fast as the CPU's load/store instructions sans the retrieve-next-opcode subcycles. If you're DMAing and then manually editing more than a few sprites, you may run out of vblank. Empirically, ~256 bytes is about what I've found I can change on a given frame without issues.

by on (#93180)
FCEU is definitely not accurate when it comes to sprites. In all versions I tried, not only did $2003/$2004 have problems, but the whole sprite evaluation process appeared to be broken. It seems that sprite emulation is pretty high-level in FCEU, and very different from the actual hardware.

by on (#93185)
LoneKiltedNinja wrote:
keep your vblank timing in mind. DMA is fast, but only as fast as the CPU's load/store instructions sans the retrieve-next-opcode subcycles.

Yeah, and if DMA supported $2007 as well as $2004, we would have had the equivalent of Blast Processing. This means we wouldn't have needed to use retrieve-next-opcode subcycles when copying patterns to CHR RAM or rows and columns of tiles to the nametables, and there might not have been as much of a need for the sort of fine-grained CHR bankswitching seen in MMC3-class mappers. Nintendo realized this and made the Super NES's DMA more generic.

by on (#93186)
If they had supported $2007 with DMA though they would have needed to add even more registers and logic to define size of transfer and source, etc. And they were determined to keep costs down.

by on (#93188)
MottZilla wrote:
If they had supported $2007 with DMA though they would have needed to add even more registers and logic to define size of transfer and source, etc. And they were determined to keep costs down.


Yeah, but then when you had to add so many mappers to swap out graphics better, it probably really hurt in the end.

by on (#93193)
tepples wrote:
Yeah, and if DMA supported $2007 as well as $2004, we would have had the equivalent of Blast Processing.

Would it be possible to implement such a DMA in a mapper? I guess that the mapper can't halt the CPU, so is there a way to work around that?

by on (#93194)
It is possible to DMA from PRG-ROM or RAM to CHR-RAM or VRAM (must be external VRAM) through a mapper. Depending on how it's constructed you don't even need to halt the CPU. It's not a cheap feature to add though, since either a ton of I/O is required to multiplex two memories, or an FPGA with sufficient block RAM is required.

by on (#93195)
That's basically what happens with the MMC5's EXRAM: there is no DMA as such, but the RAM is shared between both buses.

by on (#93220)
kyuusaku wrote:
It is possible to DMA from PRG-ROM or RAM to CHR-RAM or VRAM (must be external VRAM) through a mapper. Depending on how it's constructed you don't even need to halt the CPU. It's not a cheap feature to add though, since either a ton of I/O is required to multiplex two memories, or an FPGA with sufficient block RAM is required.


At that point maybe you should just put a fast CPU in the cartridge (a 6502 core, if not something more modern) with some shared memory to do all your heavy processing, and use the stock CPU only for what it must do: reading input, uploading PPU updates, and setting up the PPU registers each frame.

But at that point maybe you should be developing for a different system.