This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

Sprite OAM issue? (updated: bug in own code)

by on (#108164)
*edit* Sorry for the double post on this, folks. Everything below constitutes me misidentifying a very simple bug in my own code. Perhaps it is good to leave posts like this in the forum for others to learn from :)

*edit* To avoid spamming the forum with an additional thread I'll just update this one. I've since ruled out OAM as the source of the bug I've described. I think it may actually be a bus conflict bug. I use the wiki's recommended best practice for switching banks with UNROM:

Code:
 .segment "CODE"
 bankswitch:
     lda banktable, y        ;read a byte from the banktable
     sta banktable, y        ;and write it back, switching banks
     sty current_bank        ;store the current bank in RAM
     rts


And, in one part of my code, I'm doing this inside a loop with rendering off, but also at the end of NMI to update the sound engine. I'm switching to the same bank in all cases, yet I've determined that sometimes I end up switched to the wrong bank, but only in the main thread. Thus, it seems like allowing the bankswitch code to get interrupted by NMI can screw up the mapper. I'm certain that I saw a comment by tepples about this many months ago, but I was not able to find it.

Original post (with bug misidentified):

One bug that shook out of investigating scroll glitch hiding has me a bit puzzled. When I load a location on a map, at a high level, I am currently doing this:

-turn off graphics (both bg and sprites) (assume palette is all black, safely, by now)
-upload all nametable data we want to see when we fade the palette back in. Each chunk of nametable data (row or column) is also accompanied by a sprite OAM command (this stuck around from being adapted from a vblank routine)
-turn on graphics safely and then fade in the palette

So, 99% of the time, this works. But there's a strange bug where nametable data gets corrupted. I've narrowed it down (I think) to calling sprite OAM with graphics and sprites disabled. When I protected the sprite OAM with a flag like the rest of my data, and only used it when actually ready to show something, the bug I mentioned went away.

I know I read some comments in a few threads about sprite OAM itself getting corrupted with bg and sprites disabled, but in this case other data in the PPU seems to get corrupted.

The weird thing is, this bug was not present before adding all the extra cycle padding for hiding scrolling glitches. This makes me wonder if it is a pretty rare edge case and is simply working by coincidence (without the new code), and I *should not* be using sprite OAM outside of vblank (with gfx and sprites off), under any circumstances?
Re: Sprite OAM gotcha with rendering disabled?
by on (#108166)
There's some sort of OAM refresh bug. They talked about it in your scrolling forum post IIRC. You need to disable at the end of the scanline I believe. Don't know the specifics. Did you scour the wiki?
Re: Sprite OAM gotcha with rendering disabled?
by on (#108167)
I haven't found much information that seems to match the precise situation I ran into, which involves updating OAM multiple times in a row with graphics turned off (causing nametable artifacts). It seems to be a very rare edge case. I had to run FCEUX at 400% speed and go in and out of a house in my game at least 20 times to see it happen. I saw it happen on a real system a few times, as well. When I avoid updating OAM outside of vblank (again, with graphics OFF; I never thought this was a problem before), the problem vanishes. I guess someone must know about this bug because it appears to be emulated.
Re: Sprite OAM gotcha with rendering disabled?
by on (#108168)
Edited. And why would you update OAM lots of times out of VB? Sounds like a complete waste.

ETA: Nevermind. I read it wrong. Any video of it? Does it happen on FCEUX ever?
Re: Sprite OAM gotcha with rendering disabled?
by on (#108169)
3gengames wrote:
Edited. And why would you update OAM lots of times out of VB? Sounds like a complete waste.


I recently adapted my column/row upload routine. I wanted to be able to indirectly call this routine from vblank during gameplay, or call it while loading a full screen of graphics with the palette turned off. I left the call to upload sprite OAM so that I didn't need to either protect OAM with a flag or make a new version of the ppu upload routine, thinking it was harmless. The transition looked very smooth/fast. But---apparently it was not harmless! It is fascinating what sort of bugs shake out during development.

As for the scrolling registers, that was my first suspect for this bug, because I've seen issues like that before. In fact, I suppose it is possible this is still working by coincidence and it is still something to do with when I am updating the scrolling registers. I'm not 100% sure! I do know I cannot reproduce the bug anymore, though. Is it okay to update scrolling registers with rendering turned off? I had been under the impression for a long time you could do anything you wanted with graphics off, except update the palette. *edit* 3gengames, why did you remove your comment, it was relevant, I think.

As for video of it, yes it happens in FCEUX. You can actually see it happen once in the video I uploaded in the scroll glitch hiding thread. I may have to annotate the video to point it out. When I get time tomorrow I will do this or maybe you will spot it easily.

*edit*

Bug annotated here: The bug

I did look for a while at where I was updating scroll registers. In padding cpu cycles for glitch hiding, I thought I might have been putting updates to $2006 and $2005 too close to the end of vblank. But I modified this so the padding comes after modifying the scroll regs, so I don't think that can be the bug (it still behaved roughly the same). I still can't reproduce this bug after avoiding all unnecessary OAM updates. I'm happy obviously but if anyone knows more about this it'd sure be interesting to learn.
Re: Sprite OAM gotcha with rendering disabled?
by on (#108196)
If FCEUX does it too, then doesn't that make it more likely a bug in your own code? Because if this was a hardware bug I doubt FCEUX would be emulating it (given how it would seem to be a real edge case).
Re: Sprite OAM gotcha with rendering disabled?
by on (#108199)
You may be right. I'm still curious about what is going on here. While I can't reproduce it anymore, I don't think I'm done with it. *edit* You're right. It is a bug in my code. I can reproduce it by padding cpu cycles where I had been doing the unnecessary OAM updates. Thanks for encouraging me to reconsider what the problem was! *edit* I think I may have determined that this is a bus conflict issue, I've updated the OP with more information.
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108346)
So, what was the problem, exactly?

Were you getting an occasional NMI directly between these two lines?
Code:
sta banktable, y
sty current_bank 


i.e. it's already switched to a new bank, but the code that now runs in NMI thinks it's still at the old bank and doesn't switch back to it like it should?
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108349)
This is a duplicate of "bg and sprites off, nmi on, main thread writes to PPU". It was the same bug. On the other hand, thinking about possible mapper issues helped me improve the robustness of my UNROM macros. For example, I previously had this:

Code:
.macro switch_bank_ldy bank
  ldy bank
  lda bank_table,y
  sta bank_table,y
  sta current_bank
.endmacro


But since I bankswitch in both the main thread and the vblank routines, I thought I could run into a problem: the vblank routine might save current_bank on the stack, switch to the bank vblank needs, then pull current_bank and switch back to it, all before the main thread does its sta current_bank (potentially returning to the main thread with the wrong bank selected). So I moved to this:

Code:
.macro switch_bank_ldy bank
  ldy bank
  sty current_bank
  lda bank_table,y
  sta bank_table,y
.endmacro



Which should eliminate that potential problem. So, while this was not the bug, thinking about possibly having a mapper issue helped me improve my code anyway.
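
To trace the race described above instruction by instruction, here is an annotated sketch. It assumes an NMI handler that saves and restores the registers, switches to another bank, and switches back to whatever current_bank holds before returning. It also assumes that bank_table is an identity table (entry n holds n) and that this code runs from the fixed bank. The bank numbers 2 and 5 are made up for illustration.

Code:
; Old order: main thread moving from bank 2 to bank 5
  ldy bank            ; Y = 5
  lda bank_table,y    ; A = 5
  sta bank_table,y    ; mapper now selects bank 5
                      ; <-- NMI here: handler restores current_bank (still 2),
                      ;     so bank 2 is mapped in again when it returns
  sta current_bank    ; current_bank = 5, but the mapper is on bank 2

; New order: the same switch
  ldy bank            ; Y = 5
  sty current_bank    ; current_bank = 5 before the mapper is touched
                      ; <-- NMI here: handler restores bank 5 a little early
  lda bank_table,y    ; A = 5
  sta bank_table,y    ; mapper selects bank 5, matching current_bank
                      ; <-- NMI here: handler restores bank 5 again


Under those assumptions, wherever the NMI lands in the second ordering, the mapper and current_bank agree by the time the macro finishes.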
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108351)
Furthermore, saving Y keeps the code correct when you rearrange the banks for watermarking (5040 possibilities for banks 0-6 in UNROM or 1.3 trillion for UOROM).
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108352)
tepples wrote:
Furthermore, saving Y keeps the code correct when you rearrange the banks for watermarking (5040 possibilities for banks 0-6 in UNROM or 1.3 trillion for UOROM).


Nice, didn't think of that. I'm hoping to look into watermarking sometime soon in order to recruit some beta-testers. One idea I had floating around was to find sequences of code whose order does not matter, re-order them uniquely for each beta tester, and then keep track of which beta tester is associated with which harmless permutation of the code. Even if someone tampers with the ROM, they are unlikely to undo watermarking like this since it is totally inconspicuous. I know there had been a discussion of watermarking in depth here some months back; not sure if that was one of the proposed ideas or not.
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108353)
I already wrote a very customized tool set for Windows to scramble all my stuff (although it'll most likely get rewritten for Python/bash/Linux when I write my NES assembler). It scans the main code for HTML-like tags such as <is> (insert subroutines) and then randomizes which subroutines to put there. This happens in multiple places in the code, so it's all spaced out. Then there are other codes that shove ASCII cart number data into it, and then the final dump routine, which just puts in all the subroutines not yet placed in the source. Then I compile a ROM for every program generated. Collect the ROMs, split them all, and put them into 2 numbered folders. I do this to make sure Game Genies don't work, and so I can easily identify which ROM it came from and blacklist them from purchasing any more content in the future. It's not something I'd give out, though. But I may make a program on Linux later, accompanied by my assembler (if/when I decide to create one), to do it. But that's how I do it. With some C programming, some batch, and a lot of passing numbers, it honestly isn't too bad to program. It took me like 2 hours to get it doing all that stuff on the first rewrite.
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108354)
Do you need current_bank at all? Why not just always switch to the requested bank?
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108356)
rainwarrior wrote:
Do you need current_bank at all? Why not just always switch to the requested bank?


I use current_bank to allow my sound update routine to run at the end of vblank and not have to know anything about what the main thread is doing. That's the main reason I have it. And, I mentioned pushing current_bank on the stack in vblank---I actually removed that push/pull as well, since all I need to do is temporarily switch to the music bank, then restore current_bank for the benefit of the main thread.

*edit* I forgot, I also use current_bank for various trampoline routines that perform map collision for entities. I don't want the trampoline routine to be callable only from the current entity bank, for example. I actually have several entity banks, each of which uses this routine, so it makes a lot of sense to save/restore the current bank in these sorts of situations, as well as in the sound-update-within-vblank scenario I already mentioned.
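
A minimal sketch of the NMI use case described above, in ca65 syntax, using the banktable and current_bank names from the first post. MUSIC_BANK and update_sound are hypothetical names, and the handler and banktable are assumed to live in the fixed bank ($C000-$FFFF on UNROM). This is one way it could look, not necessarily the code actually used.

Code:
MUSIC_BANK = 6            ; hypothetical: bank holding the sound engine

nmi:
    pha                   ; save A, X, Y for the main thread
    txa
    pha
    tya
    pha

    ; ... PPU updates go here ...

    ldy #MUSIC_BANK
    lda banktable,y       ; read the bank number from ROM
    sta banktable,y       ; write the same value back: switches banks
    jsr update_sound      ; hypothetical sound engine entry point

    ldy current_bank      ; whatever the main thread last asked for
    lda banktable,y
    sta banktable,y       ; put the main thread's bank back

    pla                   ; restore Y, X, A
    tay
    pla
    tax
    pla
    rti


The handler only reads current_bank and never writes it, so it does not need to push or pull anything bank-related. It does rely on the main thread storing current_bank before it writes to the mapper.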
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108357)
GradualGames wrote:
One idea I had floating around was to find sequences of code whose order does not matter, re-order them uniquely for each beta tester, and then keep track of which beta tester is associated with which harmless permutation of the code. Even if someone tampers with the ROM, they are unlikely to undo watermarking like this since it is totally inconspicuous. I know there had been a discussion of watermarking in depth here some months back; not sure if that was one of the proposed ideas or not.

It was. I released a tool to do just that, and Concentration Room 0.02 uses it.

GradualGames wrote:
all I need to do is temporarily switch to the music bank, then restore current_bank for the benefit of the main thread.

I just added this NMI handler use case to the Programming UNROM page on the wiki.
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108362)
GradualGames wrote:
rainwarrior wrote:
Do you need current_bank at all? Why not just always switch to the requested bank?

...

Another way to handle this is to save the current bank number in a known memory location in each ROM bank. Not really beneficial over using RAM though.

But yeah, I think it's very important to be aware of all sorts of concurrency issues that can arise even in the seemingly simple scenario of two threads (with a special case where only one of them (vblank) can interrupt the other) on a uniprocessor system.
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108365)
You could save the bank in an assembler variable. I made a macro called setCodeBank, which allows code to check what bank it is in, plus the assembler can decide if a trampoline call is required, or if it can just jsr to a subroutine. (Each subroutine label is also tagged with a bank number.) This idea could be extended to data to some extent, but you would need to use a variable or mark the bank somehow.
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108367)
Movax12 wrote:
You could save the bank in an assembler variable. I made a macro called setCodeBank, which allows code to check what bank it is in, plus the assembler can decide if a trampoline call is required, or if it can just jsr to a subroutine. (Each subroutine label is also tagged with a bank number.) This idea could be extended to data to some extent, but you would need to use a variable or mark the bank somehow.

Doesn't work when vblank needs to be able to restore the bank (e.g. when banking in a music handler), because the active bank (of the main thread) is only known at runtime.
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108459)
I had something similar with bankswitching and interrupt happening mid-switch and causing a bankswitch to the wrong bank. I ended up turning off interrupts when doing the bank switch, and ensuring the switches happened during certain scanlines.
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108485)
sdwave wrote:
I had something similar with bankswitching and interrupt happening mid-switch and causing a bankswitch to the wrong bank. I ended up turning off interrupts when doing the bank switch, and ensuring the switches happened during certain scanlines.


That sounds a bit dangerous, but then I suppose you're using a scanline counter, so it's probably OK. But if you're using a scanline counter, I'm not sure why you'd need to turn off NMI? Never mind, I don't know enough details about what you're doing. Still, I don't think one should ever need to turn off NMI. Graphics, yes; NMI, no. You can just make your NMI a no-op if you need to (swap out an indirectly called routine?).

...I forgot one more point about making music bankswitching interrupt-safe: if you ever have moments in your game that cause slowdown, it protects your game from crashing when the sound update continues to run at normal speed every frame. Not that it absolves one from having slowdowns; it's just good to try to make code bulletproof if possible.
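
For the "make your NMI a no-op" idea earlier in this post, here is a rough sketch in ca65 syntax. Note that it swaps in a simpler technique than the indirectly called routine: a one-byte flag, which the main thread can change with a single store (a two-byte pointer would need two stores, and the NMI could land between them). nmi_ready and nmi_body are hypothetical names.

Code:
; nmi_ready: hypothetical RAM byte; bit 7 set = run the full handler
nmi:
    bit nmi_ready         ; copies bit 7 of nmi_ready into N; A, X, Y untouched
    bpl @skip             ; bit 7 clear: do nothing this frame
    jsr nmi_body          ; hypothetical routine: saves registers, does the real work
@skip:
    rti                   ; rti restores the flags that bit changed

; main thread:
    lda #$00
    sta nmi_ready         ; NMI becomes a no-op
    ; ... long PPU upload with rendering off ...
    lda #$80
    sta nmi_ready         ; NMI does its normal work again


The NMI itself stays enabled in $2000 the whole time; only the work done inside the handler changes.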
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108487)
Turning off NMI is helpful in games using mappers with 32K bankswitching where not all banks have an NMI handler. Some banks might be full of data that gets copied or decompressed to RAM, such as banks holding CHR data in multicarts.
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108488)
tepples wrote:
Turning off NMI is helpful in games using mappers with 32K bankswitching where not all banks have an NMI handler. Some banks might be full of data that gets copied or decompressed to RAM, such as banks holding CHR data in multicarts.


Cool, didn't think of that. With games set up like this, I assume there must be times when everything is stopped, including music?
Re: Sprite OAM issue? (updated: bug in own code)
by on (#108489)
GradualGames wrote:
tepples wrote:
mappers with 32K bankswitching where not all banks have an NMI handler

Cool, didn't think of that. With games set up like this, I assume there must be times when everything is stopped, including music?

Yes. After the player chooses a game from the multicart's menu, it blanks the screen and silences the APU before decompressing 8192 bytes of CHR data.
Re: Sprite OAM issue? (updated: bug in own code)
by on (#111509)
thefox wrote:
Movax12 wrote:
You could save the bank in an assembler variable. I made a macro called setCodeBank, which allows code to check what bank it is in, plus the assembler can decide if a trampoline call is required, or if it can just jsr to a subroutine. (Each subroutine label is also tagged with a bank number.) This idea could be extended to data to some extent, but you would need to use a variable or mark the bank somehow.

Doesn't work when vblank needs to be able to restore the bank (e.g. when banking in a music handler), because the active bank (of the main thread) is only known at runtime.


Thinking..
Obviously code can know what bank it is in and what bank it is calling at build time, as long as NMI doesn't change the bank out from under it. But bankable data cannot be tracked the same way at build time. Perhaps the Lua scripting in the recent NintendulatorDX would be able to at least verify that the correct bank was being accessed and alert you if banking code was needed, though unless every code path was taken it wouldn't be able to prove that banking wasn't needed at a given point in code.
Re: Sprite OAM issue? (updated: bug in own code)
by on (#111513)
GradualGames wrote:
tepples wrote:
Furthermore, saving Y keeps the code correct when you rearrange the banks for watermarking (5040 possibilities for banks 0-6 in UNROM or 1.3 trillion for UOROM).

Nice, didn't think of that. I'm hoping to look into watermarking sometime soon in order to recruit some beta-testers.

Note to beta testers wanting to leak ROM: reorder banks.

Quote:
One idea I had floating around was to find sequences of code whose order does not matter, re-order them uniquely for each beta tester, and then keep track of which beta tester is associated with which harmless permutation of the code.

So, analyze ROM and randomly reorder routines as well. Could be automated :)

Seems it'd be simpler to just put some junk code that's executed somewhere in the ROM that's different for each copy. Less risk of breaking something by reordering code.
Re: Sprite OAM issue? (updated: bug in own code)
by on (#111514)
blargg wrote:
Seems it'd be simpler to just put some junk code that's executed somewhere in the ROM that's different for each copy. Less risk of breaking something by reordering code.

Then 2 beta testers can just compare their ROMs to locate the junk code and modify/clear it before leaking.
Re: Sprite OAM issue? (updated: bug in own code)
by on (#111518)
Or you can do like me and rotate the sources for each copy so unless they want to disassemble and rearrange all the subroutines and even meaningless/seemingly meaningless data, they won't be able to do that. :)