This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

OAM cycling on hypothetical 15-sprite PPU with X as priority

by on (#216036)
Kenny Rogers won a Grammy for the song 'The Gambler', in which Don Schlitz wrote:
Said "If you're gonna play the Game Boy, you gotta learn to play it right."

NES games build a display list in RAM, also called shadow OAM, and then DMA it to the PPU during the vertical blanking period. OAM DRAM refresh bugs in the PPU make this necessary even when few sprites move from one frame to the next.

But apart from sprite 0, I've gathered that it's also considered best practice not to hardcode a particular game object's starting position in the display list. For example, don't always draw the main character using sprites 1-8, enemy 1 using sprites 9-14, enemy 2 using sprites 15-20, etc. Instead, NES games are supposed to reassign slots every frame, especially if there's a possibility that more than eight will be displayed on the same scanline. This also allows the game to make enemies intentionally Z-fight if they overlap.
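
One simple way to reassign slots every frame is to start filling shadow OAM from a different actor each frame. The following is only a rough host-side model in Python, not code from any game; `build_shadow_oam` and the tuple layout are made up for illustration, and real games use many other cycling schemes.

```python
# Host-side model of per-frame OAM cycling (not 6502 code).
# Assumption: each actor is a list of (y, tile, attr, x) hardware sprites.

OAM_SIZE = 64          # the NES PPU has 64 sprite slots
HIDDEN_Y = 0xFF        # a Y this large keeps the sprite off every scanline

def build_shadow_oam(actors, frame, first_slot=1):
    """Fill slots first_slot..63, starting from a different actor each frame."""
    oam = [(HIDDEN_Y, 0, 0, 0)] * OAM_SIZE
    slot = first_slot                      # slot 0 stays reserved for sprite 0
    start = frame % len(actors) if actors else 0
    for i in range(len(actors)):
        for sprite in actors[(start + i) % len(actors)]:
            if slot >= OAM_SIZE:
                return oam                 # out of slots: drop the rest
            oam[slot] = sprite
            slot += 1
    return oam
```

With three one-sprite actors, frame 0 puts them in slots 1, 2, 3 in order, and frame 1 puts the second actor first, so no actor is always the one that drops out on a crowded scanline.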

But in the community for another 8-bit platform, it's common to hardcode OAM indices for actors, and I'm trying to understand why they do that. The best way I know to do so is to generalize from what I know, and I know the NES. So for comparison, I'll describe the other platform's PPU as if it were a famiclone:

  • Like the AVS, this famiclone's PPU has enough secondary OAM to draw 15 sprites per line instead of 8, using the alternate fetch pattern I described in Enhancement#Overdraw. Thus sprites can cover nearly half of the screen's width rather than one-fourth.
  • The PPU determines which sprites to draw by finding the 15 lowest-numbered sprites in OAM whose Y range overlaps each scanline, just as the authentic NES PPU does. But then it sorts these frontmost 15 sprites by their X coordinate before displaying them. Sprites to the left are drawn in front, with position in OAM only breaking ties.
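
The two bullets above can be modeled roughly as follows. This is a Python illustration of the selection rule as described in this post, not an emulator: the function names and the dict layout are invented here, and `y` is treated as the sprite's top scanline, ignoring the real PPU's one-line offset.

```python
# Model of the per-scanline sprite selection described above (an illustration,
# not an emulator). Each OAM entry is a dict with "y", "x" and "height" keys.

def sprites_on_line(oam, line, limit):
    """Return OAM indices drawn on `line`, frontmost first, NES style."""
    hits = [i for i, s in enumerate(oam)
            if s["y"] <= line < s["y"] + s["height"]]
    return hits[:limit]                    # lowest OAM index wins

def sprites_on_line_x_priority(oam, line, limit=15):
    """Same selection, then re-sorted so the leftmost sprite is in front."""
    chosen = sprites_on_line(oam, line, limit)
    return sorted(chosen, key=lambda i: (oam[i]["x"], i))  # index breaks ties
```

Note that in this model the OAM index still decides which sprites drop out when a line is over the limit; X only reorders the survivors.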

On a hypothetical famiclone like this, with more overdraw and less control of sprite-to-sprite priority, would it be less of a bad practice to statically allocate OAM space?
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216038)
It would be less of a bad practice, but still not really great.
The second problem is that you have to use meta-sprites, and fixing how many metasprites an object can have is limiting.

Other than that I don't think there's any problem, assuming objects still have logical X/Y coordinates stored separately and copied to OAM every frame. How do most Game Boy and Game Boy Color games handle this?
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216039)
Yes, less bad, because the bad thing you want to avoid is

1) cancellation of the rendering of information that is critically important to the user on a frame-for-frame basis.
2) cancellation that offends the users' subjective, aesthetic sensibilities.

imo in that order.

Both are less likely to occur with a more tolerant coverage per scanline.


But honestly, there's no universal law that says what is best/worst practice. If the game relies on z-depth, you can't have universal cycling. Examples: any game that is isometric, any game that relies on depth as a mechanic, any game that is using sprites for background decoration (functional or otherwise). Ideally, you don't want an owl to z-compete with the moon. If you think presentation issues like this are important to address, you need to divide cycling into tiers, or create special clauses. Assuming the moon is high up and most entities are ground bound, it'd be perfectly safe to keep said moon at the lowest z-priority, kept apart from the rest.
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216040)
How can there be a community for a hypothetical 8-bit system that doesn't exist?

I know the people on smwcentral do this, and the explanations I've seen are all like "because rewriting non-animated non-moving sprites will cause slowdown." Even though they end up causing more slowdown by searching through inactive sprites for large objects like Banzai Bills and Big Boos.
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216041)
This is about the Gameboy, right? How the hell do they do sprite cycling if sprite priority is dictated by the X coordinate? Won't sprites to the right ALWAYS get shafted?
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216042)
I don't think dropout on the Game Boy has to do with sprite priority. Although I could swear I remember Game Boy games that use sprite priorities. Are there no Final Fight style beat 'em ups on the Game Boy?
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216044)
tepples wrote:
But in the community for another 8-bit platform, it's common to hardcode OAM indices for actors, and I'm trying to understand why they do that. The best way I know to do so is to generalize from what I know, and I know the NES. So for comparison, I'll describe the other platform's PPU...

Once again tepples is being intentionally obtuse about something's identity for mysterious reasons. :P

I would say it's quite common for new NES developers to hardcode OAM indices too. It's just that they probably pretty quickly get beat back by the problems that creates. On a system with wider tolerance you wouldn't necessarily create a problem at all.

Overlap is often a big problem in a platform game with a flat plane of play, since gravity brings everything to the same level. But... 4 sprites on one platform is usually a pretty busy situation in NES games already... 8 sprites? How crowded is this?
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216047)
Bregalad wrote:
It would be less of a bad practice, but still not really great.
The second problem is that you have to use meta-sprites, and fixing how many metasprites an object can have is limiting.

Super Mario Bros. limits each enemy to six 8x8 pixel sprites. Bowser and Fake Bowsers are the only enemies I can think of that use two enemy slots to circumvent this.

Haunted: Halloween '85 reserves 16 tiles of CHR RAM for each enemy's current frame (and 16 for its next frame; it's double buffered). This practically limits each enemy to eight 8x16 pixel sprites per cel. The second, fifth, and sixth bosses use two slots, just as SMB1 does for Bowser.

Another option that's still kinda-sorta hardcoding is to store an actor's starting OAM index in one of the actor's fields, just as you'd store its logical X and Y coordinates, velocity, health, etc.

Bregalad wrote:
Other than that I don't think there's any problem, assuming objects still have logical X/Y coordinates stored separately and copied to OAM every frame. How do most Game Boy and Game Boy Color games handle this?

Game Boy Color I think reverts to the NES behavior of relying entirely on the OAM index to determine priority. The "X determines priority" is a monochrome thing. But even if an actor's logical coordinates are separate, the OAM slots into which those coordinates are copied every frame can still be statically allocated.

psycopathicteen wrote:
How can there be a community for a hypothetical 8-bit system that doesn't exist?

The "other 8-bit platform" exists. I mentioned the Game Boy by name in the song lyric I quoted at the top of my post. I was describing properties of the system using a famiclone analogy, as if an AVS in extra sprites mode were to apply the same sprite priority algorithm described in the Game Boy Pan Docs.

rainwarrior wrote:
intentionally obtuse about something's identity for mysterious reasons

My intent in making the famiclone analogy was twofold. I wanted to explain the problem to people who are familiar with how sprites work on the NES, in order to know how a particular change (more coverage and different priority) would change programming practices. I also wanted to help keep to the topic of OAM index hardcoding in the context of more coverage and X-as-priority, not derail it into a flame war about other unrelated differences between the Game Boy and the NES or between the norms of gbdev.gg8.se and those of NESdev.com. If you would prefer, I could reword this and repost it in GBDev.
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216048)
tepples wrote:
My intent in making the famiclone analogy was twofold. I wanted to explain the problem to people who are familiar with how sprites work on the NES, in order to know how a particular change (more coverage and different priority) would change programming practices. I also wanted to help keep to the topic of OAM index hardcoding in the context of more coverage and X-as-priority, not derail it into a flame war about other unrelated differences between the Game Boy and the NES or between the norms of gbdev.gg8.se and those of NESdev.com. If you would prefer, I could reword this and repost it in GBDev.

I think there would have been less irrelevant chatter if you'd just said "game boy" instead of making a big show of not saying it, inviting people to comment on the way you danced around it.

I don't give a hoot which forum this appears in, but it's NES dev enough to be where it is. I don't see the point in moving it.
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216060)
If you can do fixed allocation then you do fixed allocation. It saves a lot of clocks and makes code a lot easier. If you can't get away with fixed allocation then you do dynamic allocation.
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216089)
What speed benefit does this have?
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216094)
psycopathicteen wrote:
What speed benefit does this have?

Fixed addresses do not require indexing. Use of an index register has a cycle penalty for some instructions, as well as the general penalty of having one less register to work with.
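
To put a rough number on the store side of this: on the 6502, STA absolute takes 4 cycles and STA absolute,X (or ,Y) always takes 5. The Python arithmetic below counts only the stores that write all 64 OAM entries; it deliberately ignores loads and the cost of juggling registers, and the frame length is the approximate NTSC figure.

```python
# Cycle counts from the 6502 instruction set: STA absolute takes 4 cycles,
# STA absolute,X or absolute,Y always takes 5.
STA_ABS = 4
STA_ABS_INDEXED = 5
BYTES_PER_SPRITE = 4            # Y, tile, attributes, X
SPRITES = 64
NTSC_CYCLES_PER_FRAME = 29780   # approximate CPU cycles per NTSC frame

def store_cost(cycles_per_store, sprites=SPRITES):
    """Cycles spent only on the stores that write `sprites` OAM entries."""
    return cycles_per_store * BYTES_PER_SPRITE * sprites

penalty = store_cost(STA_ABS_INDEXED) - store_cost(STA_ABS)   # 256 cycles
share = penalty / NTSC_CYCLES_PER_FRAME                       # under 1%
```

By this count alone, indexing every OAM store costs 256 extra cycles per frame, so the register pressure is likely the bigger part of the penalty.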
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216096)
Beginner tutorials often start with controller input directly manipulating OAM slots' properties, and end up never going into advanced sprite topics like meta-sprites or priority cycling, so it's no surprise that a lot of "first games" end up using hardcoded OAM positions for their game objects.
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216100)
You would only be allowed one copy of an object at once unless you duplicate code.
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216103)
I know it sucks, but beginners do it. Copying and pasting is often easier to grasp than indexing and indirection for a beginner.
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216104)
psycopathicteen wrote:
You would only be allowed one copy of an object at once unless you duplicate code.

Not necessarily. You could, for example, statically allocate up to 6 OAM indices to each of eight actor slots (16-63) and 1 to each of fifteen bullet slots (1-15), leaving sprite 0 open for split use. I'm pretty sure Balloon Fight does something like this.

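Code:
  ; Calculate this actor's starting byte offset into shadow OAM.
  ; Actor slots 0-7 each own 6 sprites (24 bytes), starting at sprite 16.
  lda cur_actor
  asl a          ; carry is clear afterward because cur_actor < 128
  adc cur_actor  ; A = cur_actor*3
  asl a
  asl a
  asl a          ; A = cur_actor*24 (at most 168, so carry is still clear)
  adc #64        ; skip sprites 0-15 (16 sprites * 4 bytes)
  tay

  ; Alternate method: look the offset up in a table
  ldx cur_actor
  ldy actor_to_oam_index,x

actor_to_oam_index:
  .byte 64, 88, 112, 136, 160, 184, 208, 232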


The advantage of a 1:1 mapping between actor slots and OAM indices is you can save cycles by working more directly with shadow OAM in response to things that rarely change:

  • Only having to change the attribute in shadow OAM when facing direction changes
  • Only having to change the tile number when the object changes to the next cel
  • Only having to change the position when the object moves, especially in non-scrolling games. It'd even be possible to handle camera movement for stationary objects by adding the displacement since last frame to the coordinates of all sprites 1-63.
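
The camera idea in the last bullet could look roughly like this. It is a Python model under stated assumptions, not game code: `scroll_sprites` is a made-up name, shadow OAM is a 256-byte array in hardware order, and a sprite that scrolls past either edge is simply hidden (the actor would still keep its logical position elsewhere).

```python
# Model of scrolling statically allocated sprites by the camera's movement.
# Assumption: shadow OAM is a 256-byte array laid out like the NES's OAM
# (Y, tile, attributes, X per sprite) and sprite 0 must stay put.

HIDDEN_Y = 0xEF   # Y values of $EF and up put a sprite below the picture

def scroll_sprites(shadow_oam, camera_dx):
    """Shift sprites 1-63 left by camera_dx pixels (negative moves right)."""
    for slot in range(1, 64):
        base = slot * 4
        if shadow_oam[base] >= HIDDEN_Y:
            continue                          # unused slot, nothing to move
        new_x = shadow_oam[base + 3] - camera_dx
        if 0 <= new_x <= 255:
            shadow_oam[base + 3] = new_x
        else:
            shadow_oam[base] = 0xFF           # scrolled off: hide the sprite
```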

I was just curious about at what point this simplification to save CPU time overrode the benefits of cycling, and whether that point varied with a PPU's coverage capability and priority policy.
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216107)
TBH I think the CPU cycles question is more of a red herring.

People don't implement OAM cycling as an intentional tradeoff between performance and the ability to flicker. The cycling is considered a necessity for its visual functionality. Performance is just collateral damage.

People who don't implement OAM cycling aren't doing so to save cycles. They do it because it's simpler to implement. Games you find that do it aren't generally high performance games.

Burger Time is an example of a commercial NES game that doesn't do it:
https://www.youtube.com/watch?v=TcPXTwXKkSE

Why doesn't it do it? Well, there are only 6 enemies allowed at once, and they're all 1 tile wide. This leaves 2 tiles for the chef. The falling buns are sprites but given low priority, allowed to drop out, and rightly so, because they are the least important for gameplay (always below player, quickly return to being a nametable detail after falling). None of this has anything to do with performance; it was written this way because the game's needs were simple, and implementing static OAM is also simple (and has advantageous priority control).
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216138)
I think there are a few techniques getting lumped under one banner, which is causing multiple parties to get confused and talk past each other ;)

Having Entity type X = RAM address Y to me is Static Allocation. It means you only ever have one active at one time and allocation/update is super fast. So you might have clones in the Entity type table to handle having more than one of the same object. Mario Kart Battle for example: 4 players and 3 balloons, fixed alloc it, why bother doing anything else ;)

Having Spawn Entity X @ Sprite Y to me is Fixed Allocation. This is when you walk through your data structure in the level editor and work out where each entity will have its sprites. For example, if I was porting Super Mario Bros to the C64, I would use this method to spawn enemies, as the screen only scrolls one way (to the right) and hence the trigger order and number of entities on screen is mostly fixed and known at all points. That allows me to handle things that will walk off vs something like a hammer bros that is fixed to an area but needs 2 extra for hammers. So I can just put the sprite number in the level data. Zero sorting, zero hunting for allocation, and my level editor tool can check and flag instances where I "sprite out". Eats some RAM though, as the level data is now larger.
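
A level editor check of this kind might be sketched as below. This is a guess at the idea in Python, not the poster's actual tool: the record fields (`slot`, `count`, `from_x`, `to_x`) are assumptions, with the last two standing for the range of camera positions over which an entity can be alive.

```python
# Sketch of the level editor check described above. Assumption: each spawn
# record lists the first OAM slot, the number of slots, and the range of
# camera positions over which the entity can be alive.

def find_sprite_outs(spawns, total_slots=64):
    """Return pairs of spawn names whose slots and lifetimes both overlap,
    plus (name, None) for any spawn that runs past the end of OAM."""
    problems = []
    for i, a in enumerate(spawns):
        if a["slot"] + a["count"] > total_slots:
            problems.append((a["name"], None))
        for b in spawns[i + 1:]:
            slots_clash = (a["slot"] < b["slot"] + b["count"]
                           and b["slot"] < a["slot"] + a["count"])
            alive_together = (a["from_x"] < b["to_x"]
                              and b["from_x"] < a["to_x"])
            if slots_clash and alive_together:
                problems.append((a["name"], b["name"]))
    return problems
```

Two spawns may share the same slots as long as they can never be alive at the same time, which is what makes putting the sprite number in the level data workable.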

Having an Entity set to a "slot" at allocation by the code, this is Bucket Allocation. The idea is you divide your sprites into groups, 6 sprites for example. Then when you spawn an entity you request the next free group, or next free run of groups if you need n. This speeds up allocation and hunting, and helps combat fragmentation, as when you remove one you can copy the last one into the empty bucket. But it potentially wastes space, as if you only need 4 sprites you still alloc 6. So you can get the situation where there are enough free sprites but no buckets. You might have different buckets or pools, so bullets will have their allocation, small entities, large entities as needed etc.
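
A minimal Python model of bucket allocation, under assumed sizes (slots 4-63 in groups of 6), might look like this. The class and method names are invented for illustration, and the copy-last-into-the-hole compaction step is left out.

```python
# Model of bucket allocation: OAM is carved into fixed-size groups, and an
# entity asks for a run of consecutive free groups. Sizes are assumptions.

class BucketAllocator:
    def __init__(self, first_slot=4, last_slot=63, bucket_size=6):
        self.first_slot = first_slot
        self.bucket_size = bucket_size
        count = (last_slot - first_slot + 1) // bucket_size
        self.free = [True] * count          # one flag per bucket

    def alloc(self, sprites_needed):
        """Return the first OAM slot of a free run, or None if none fits."""
        need = -(-sprites_needed // self.bucket_size)   # round up
        run = 0
        for i, is_free in enumerate(self.free):
            run = run + 1 if is_free else 0
            if run == need:
                start = i - need + 1
                for j in range(start, i + 1):
                    self.free[j] = False
                return self.first_slot + start * self.bucket_size
        return None

    def release(self, slot, sprites_needed):
        """Give back the buckets handed out by alloc(sprites_needed)."""
        start = (slot - self.first_slot) // self.bucket_size
        need = -(-sprites_needed // self.bucket_size)
        for j in range(start, start + need):
            self.free[j] = True
```

The wasted-space case shows up directly: with only two 6-sprite buckets, two 4-sprite entities use both, and a third request fails even though 4 hardware sprites are still unused.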

There are other special cases like Ring Buffer allocation. For example, Squid Jump uses Ring Buffer allocation, as the appearance and disappearance are in a fixed order, so I either add at the head or the tail and then remove at the head or the tail (if you collect a pickup, I just hide the sprite but keep it in the buffer until it is culled as it goes off screen).
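
As a rough Python model, a ring buffer over a range of OAM slots could be as small as this. It only covers the add-at-tail, remove-at-head half of the scheme; the slot range and names are assumptions, and this is not Squid Jump's code.

```python
# Model of ring buffer allocation for entities that appear and disappear
# in a fixed order, such as objects scrolling on at one edge and off at
# the other. The slot range is an assumption.

class SpriteRing:
    def __init__(self, first_slot=16, size=16):
        self.first_slot = first_slot
        self.size = size
        self.head = 0        # index of the oldest live entry
        self.count = 0       # number of live entries

    def push_tail(self):
        """Claim the slot after the newest entry; None if the ring is full."""
        if self.count == self.size:
            return None
        index = (self.head + self.count) % self.size
        self.count += 1
        return self.first_slot + index

    def pop_head(self):
        """Retire the oldest entry and return its slot; None if empty."""
        if self.count == 0:
            return None
        slot = self.first_slot + self.head
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return slot
```

Allocation and release are both constant time, with no searching for a free slot.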

As always there are hybrid methods ;) Bullets being Ring Buffer while main entities are bucket, while special case enemies are Static alloc etc

As rainwarrior says, sprite cycling is a separate problem, done because you need it, or not if you don't. You can alloc your OAM one way as per above but not copy it that way to OAM RAM. So you can add an extra step to shuffle the OAM before DMA. That eats more clocks, but then having each entity not have to look up where and what order its sprites are in every frame might save you more in the long run. Depends on what your game needs. For example, you could store the OAM striped, so 64 X, 64 Y, 64 Tile, 64 Attribute; this way an ent can modify each of them with a single ,x and no + 4 maths to get to the next. Then your code uses an LFSR to step through all 64 in a random order as you copy the striped data to OAM format. Or you use an LFSR to change the "bucket" order when you copy, etc.
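
Here is a hedged Python model of the striped-to-interleaved copy. The function names are made up; the 6-bit Galois LFSR (polynomial x^6 + x^5 + 1) never produces 0, so this sketch assumes entry 0 is sprite 0 and copies it straight to slot 0, shuffling only entries 1-63. The visiting order is fixed, so changing the seed each frame rotates the order rather than truly randomizing it.

```python
# Model of copying "striped" sprite data (four parallel 64-entry arrays)
# into interleaved OAM order, visiting entries 1-63 in LFSR order.
# Assumption: entry 0 is sprite 0 and always goes to OAM slot 0.

def lfsr6_next(state):
    """6-bit Galois LFSR (x^6 + x^5 + 1); cycles through 1..63."""
    lsb = state & 1
    state >>= 1
    return state ^ 0x30 if lsb else state

def interleave(ys, tiles, attrs, xs, seed):
    """Build 256 OAM bytes; `seed` (1..63) picks where the order starts."""
    oam = bytearray(256)
    oam[0:4] = bytes([ys[0], tiles[0], attrs[0], xs[0]])
    src = seed
    for slot in range(1, 64):
        oam[slot * 4:slot * 4 + 4] = bytes(
            [ys[src], tiles[src], attrs[src], xs[src]])
        src = lfsr6_next(src)
    return oam
```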
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216146)
The way I've been doing it is having a routine setting up what order objects are drawn into OAM, and then drawing the sprites in that order. I have 8 priority levels, and within each priority level objects alternate between forward and reverse drawing order.
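
A possible reading of that scheme, sketched in Python: `draw_order` is a hypothetical name, and the assumption that the forward/reverse alternation flips once per frame is mine, since the post doesn't spell it out.

```python
# Sketch of a draw-order pass with 8 priority levels. Assumption: the
# forward/reverse alternation happens once per frame, so overlapping
# objects of equal priority trade places instead of one always losing.

NUM_PRIORITIES = 8

def draw_order(objects, frame):
    """Return object names in the order they should be written to OAM."""
    order = []
    for level in range(NUM_PRIORITIES):         # level 0 is drawn first
        group = [name for name, prio in objects if prio == level]
        if frame & 1:
            group.reverse()                     # odd frames: reverse order
        order.extend(group)
    return order
```

Objects never cross priority levels, so something in level 0 always lands in lower OAM slots than something in level 7; only ties within a level flicker.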

If I want to add a pseudo-3D level, I would need to think of a more sophisticated priority system though.
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216654)
Here's a question. Is there any correlation between slowdown in games and oam allocation?
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216657)
If by allocation you mean selecting which slots to use for which sprites, then I don't think it contributes significantly to slowdowns. Processing the metasprite entries themselves might take a good amount of time, as the NES isn't particularly good at bulk data processing. Also, having many sprites on screen usually means that there are many active objects, which will definitely contribute to slowdowns.
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216661)
I never knew OAM allocation was even a thing outside of smw hacking. VRAM allocation makes more sense, because, unlike the OAM, you can't update the whole VRAM in one vblank frame. If the SNES had VRAM access during the entire frame, I think I would've just DMAed almost everything onscreen every frame, like what I do with the OAM, except for maybe bullets and explosion frames.
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216686)
You probably allocate VRAM about 4 times in a game. If that. It's pretty static: Screens here, Chars there, sprites there. If you change modes then you need to reallocate it but that is about it. Palettes on the other hand probably get moved around a bit. The Sprite tiles you either keep fixed or you have "slots" that you can copy data into, and then frames are copied over the top of the previous frames.
OAM is a constantly changing highly volatile resource that needs constant management to ensure you can alloc resources as you need for a given frame.

Although you are thinking SNES, this is the NES portion, where their VRAM is in ROM and hence very statically allocated ;)
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216692)
Oziphantom wrote:
The Sprite tiles you either keep fixed or you have "slots" that you can copy data into, and then frames are copied over the top of the previous frames.

Which can get pretty complicated.

Quote:
OAM is constantly changing...

Which is why I rebuild OAM every frame.
Re: OAM cycling on hypothetical 15-sprite PPU with X as prio
by on (#216693)
Oziphantom wrote:
Although you are thinking SNES, this is the NES portion, where their VRAM is in ROM and hence very statically allocated ;)

Try Haunted: Halloween '85 once. Run its demo with the PPU viewer open and marvel at how it double-buffers enemy cels.