This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019 Download


by on (#217437)

My objects keep their position in 16-bit, world space. When it comes time to draw them, I simply subtract the camera position and draw the metasprite.

how do you guys handle culling off-screen objects? There seems to be 2 levels :

  • First cull the object using a simple bounding box, to see if the whole thing is off camera.
  • If it is at least partially on screen, draw it, and cull the individual 8x8 tiles to avoid stuff wrapping around the screen.

Do you guys have a clever way of doing this while minimizing the number of 16-bit operations (ideally after the camera transform, everything would be back to 8-bit...).

Re: Culling
by on (#217445)
Assume the position is the top left corner. Assume the widest or tallest an object can be is 256 pixels. Assume that sprite offsets are always positive. (If you need negative offsets, add the maximum negative offset before drawing, and subtract it after drawing. Then offset your offset data accordingly so it can be exclusively positive.)

Subtract the camera position from the object's position and store the low byte. (The high byte is in A. I guess store this too if you don't want a lot of duplicated similar code.)

If the high byte is zero, add the width of the metasprite to the low byte. If the carry is clear, individual clipping is not needed for the sprites in that dimension. It is fully on screen for that dimension. Otherwise it is partially off screen on the right.

If the high byte is 255, add the width of the metasprite to the low byte. If the carry is set, the object is partially on screen on the left, otherwise it is fully offscreen. Return and do nothing if fully offscreen.

If the high byte was anything but 0 or 255, the object is fully offscreen. Return and do nothing.

The high byte check could look like this:
lda OBJxlow,x
sbc scrollxlow
sta templow

lda OBJxhigh,x
sbc scrollxhigh
beq partiallyonscreenleft
cmp #255
bne fullyoffscreen
lda templow
adc OBJwidth,x
bcc fullyoffscreen
;If here, object is partially on screen on the right

lda templow
adc OBJwidth,x
bcc fullyonscreenx
;If here, object is partially on screen on the left

;If here, the object is fully on screen in the X dimension


If you know the object is fully onscreen, you can just add and store your offsets.
If you know the object is partially on screen with a high byte of zero, you add the low byte to the offset, and branch on carry set to not drawing the sprite.
If you know the object is partially on screen with a high byte of 255, you add the low byte to the offset, and branch on carry clear to not drawing the sprite.

So: You use the high byte of the initial 16bit subtract to find out if the object is fully on screen, fully offscreen, partially off screen on the right, or partially off screen on the left.
If fully offscreen, you don't even need to check the other dimension. No drawing should happen.
If fully on screen, you only need to add the low byte to the offset and store it. No high byte check at all.
If partially off screen on the left, you need to add the low byte to the offset but only store if the carry is set.
If partially on screen on the right, you need to add the low byte to the offset but only store if the carry is clear.

Obviously this requires multiple drawing algorithms for the fastest possible result. I had two. One for fully on screen (the high byte in both dimensions was zero, and there was no carry set after adding the width for either) and one for partially on screen. (Which meant I had to do the 16 bit add even if one of the dimensions was fully on screen.) I figured that was fine.

Edit: If you don't have duplicate routines for each type of partial offscreen, you still have to do a 16bit add for every sprite, but you just do a non zero branch.
   lda [reserved4],y;X Offset
   adc <reserved0;X Position
   sta OAM+3,x
   lda <reserved1
   adc #$00
   bne dms.partial.skipsprite.twoiny

Edit2: Knowing the direction of partially offscreen saves you the load of the high byte and the add. (You just need to branch depending on carry from the low byte.)
Fully on screen is just this:
;35 bytes
   lda [reserved4],y;Y position
   adc <reserved2
   sta OAM,x

   lda [reserved4],y;X Position
   adc <reserved0
   sta OAM+3,x

   lda [reserved4],y
   adc <reserved7;This should guarantee a clear carry
   sta OAM+1,x
   lda [reserved4],y
   sta OAM+2,x
   ;clc;guaranteed clear above
   adc #4
   tax;Carry not guaranteed anything after that add since oampos wraps

Which I also unrolled.
Re: Culling
by on (#217447)
I always assume the object is on screen, and do the following:

  • Subtract the camera's coordinates from the object's to find the "center" of the sprite, while also adjusting for flipping and the excess-128 format that will be used later;
  • Set up masks to invert the sprite offsets in case of flipping (EOR masks that are either $00 or $FF);
  • Iterate over the individual sprites of the metasprite, EOR'ing the excess-128 offsets against the respective masks and adding the result to the reference coordinates;
  • A final coordinate with a high byte other then $00 means the sprite is off screen, so just move on to the next sprite;
  • If the sprite is on screen, copy the remaining bytes (attributes - EOR'ed against another flipping mask - and the tile index) to the current OAM position and advance to the next position;

In the end I compare the OAM index to what it was in the beginning, to detect whether any sprites were output, and use this information when deciding whether to deactivate objects (I have two conditions for this to happen: the object has to be off screen and its spawn point outside of the active area).

My biggest tip for reducing the overhead of 16-bit math is using the excess-128 format in the metasprite definitions, so you don't have to worry about selecting/forming a high byte for them (it will always be 0 in excess-128 format, even when there's flipping), but you still need the 16-bit result to tell whether the resulting coordinates are on screen (you can obviously throw the high byte away after detecting that, there's no need to save it anywhere).

Of course things will have to be a little different if your metasprites are stored in grids rather than free sprites.

EDIT: Kasumi's method has some similarities to mine, but there are a lot of things that differ depending on the format in which you want to store the data and the limitations you impose on how the sprites can be formed. I personally don't like to use the top left corner as the reference point because I find that too restrictive. For example, you can't easily make a character that grows upward (such as James Pond) because you'd have to awkwardly manipulate the object's position in addition to its size while it's actually standing still. I don't think this is always good for physics either, specially if you're doing any sort of rotation - I prefer to have the hotspot at the center of the character's feet, so I can easily rotate around that when running up steep slopes and such. The best thing about metasprites made of free sprites IMO is that the hotspot can be anywhere really, so you can use whatever makes the most sense for each character, taking into account how they behave and interact with the level map.
Re: Culling
by on (#217451)
For Lizard, I handled this by dividing the screen into halves and quarters, and using that as sort of a parity for whether draw a tile or not. I'll talk about this in one dimension, but you could do the same process independently in both dimensions.

The 16-bit position that goes in is generally the centre of the object, partly because that makes flipping simple, but the main requirement here is that the sprite should not extend more than 64 pixels -/+ this centre position. (Internally I call this centre position a "pivot point", being used to that term from 3D animation.)

In this example I will use ^ to represent an exclusive-or operation (EOR).

Step 1: Figure out its relationship to the screen. Subtract pivot from camera to find the screen relative position, with 16 bit result X. This leaves a few categories of where it might appear:

1. X < -64 or X >= 256+64 --- completely offscreen, return early (far outside screen)
2. 64 =< X < 196 --- completely onscreen, draw with no culling (middle two quadrants)
3. X < 64 or X >= 196 --- onscreen but could be partially culled (outer two quadrants)
4. X < 0 or X >= 256 --- offscreen but could be partially onscreen, i.e. inverse culling (two quadrants adjacent to visible screen)

At this point, reduce X to an 8-bit value x. 16-bit math is no longer required.

The high bit of x (i.e. representing x >= 128) will become the parity we will use for culling. We want to create a parity value p so that if the high bit of x ^ p = 0, the position is considered offscreen (culled), and if it = 1, the position is onscreen.

So, if X was onscreen (case 3), p = x ^ $80. If it was offscreen (case 4), p = x. Store p on the zero page for easy use.

Step 2: Cull the metatiles. (Case 3 and 4 only.)

For each tile, add its signed 8-bit position to x. I'll call this result u. The result is still 8-bit; we never need to go back to 16-bits, the only additional information we needed was p.

So, calculate u ^ p and inspect the high bit of the result (BMI/BPL). If the high bit is 1, add the tile to your OAM buffer. If it's 0, skip it (culled).

That's basically it. The whole point is doing 16-bit operations only once (on the pivot point), and then from there taking advantage of the constraint that sprites won't ever be wide enough to end up on more than half of the quadrants of the screen to encode the leftover information needed to decide onscreen/offscreen, represented with that single parity.

Even differentiating the 4 cases doesn't require more 16-bit compares beyond the intial pivot - camera subtraction (i.e. you can obtain quadrant from the high 2 bits of the low 8-bit result, and a little bit of logic on the high result). I didn't outline all of that here to keep it simple, but really there's only one 16-bit subtract (per axis) for the whole metasprite.

Case 1 resulted in skipping the whole metasprite. Case 2 requires parity to be ignored and just to draw all tiles; this can either be a separate routine, or maybe an extra ORA with a second flag to force the result to always say "onscreen". Case 3 and 4 can be rendered with the same code, the input p controls which half of the 8-bit "screen" (reduced by wrapping) is considered "onscreen" or "offscreen".
Re: Culling
by on (#217520)
Thank a lot everybody.

I think ill be able to figure out an algorithm using some of the ideas you suggested. rainwarrior's suggestion seems more in-line with how my data is stored right now and is simple to implement.

As a somewhat related question, something obvious i *wasn't* doing (since all i had was one guy in a room) is clear the OAM every frame.

What do you guys do? Do you take the lazy way and clear it at the beginning of the frame? Of clear the unused entries at the end? Whats the fastest OAM-clearning loop you guys can think of ? ;)

Re: Culling
by on (#217522)
I have an index to the OAM buffer as it gets filled up. It starts at 0. Every tile added increases it by 4, etc. This tells me exactly where the leftover OAM that needs to be cleared is. So, when I'm done adding sprites, I call a "finish" routine that just puts Y=$FF in all the remaining tiles.
Re: Culling
by on (#217523)
Everything I've made since Zap Ruder has used the following finish routine to clear remaining OAM entries (source: ppuclear.s).

; Moves all sprites starting at address X (e.g, $04, $08, ..., $FC)
; below the visible area.
; X is 0 at the end.
.proc ppu_clear_oam

  ; First round the address down to a multiple of 4 so that it won't
  ; freeze should the address get corrupted.
  and #%11111100
  lda #$FF  ; Any Y value from $EF through $FF will work
  sta OAM,x
  bne loop

On the Game Boy I use something similar:
; Moves sprites in the display list from SOAM+a through
; SOAM+$9C offscreen by setting their Y coordinate to 0, which is
; completely above the screen top (16).
  ; Destination address in shadow OAM
  ld h,high(SOAM)
  and $FC
  ld l,a

  ; iteration count
  add 256 - 40
  ld c,a

  xor a
  ld [hl+],a
  inc l
  inc l
  inc l
  inc c
  jr nz, .rowloop
Re: Culling
by on (#217533)
rainwarrior wrote:
I have an index to the OAM buffer as it gets filled up. It starts at 0. Every tile added increases it by 4, etc. This tells me exactly where the leftover OAM that needs to be cleared is. So, when I'm done adding sprites, I call a "finish" routine that just puts Y=$FF in all the remaining tiles.

Yeah that's what i just did. If i were really motivated I would keep track how how far I need to clear based on the previous frame, but that's not gonna happen tonight. You culling algorithm works great. It's all implemented and working. Real cool stuff.