This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

Optimizing important code parts

Optimizing important code parts
by on (#158572)
Before I go on with my general game logic, I wanted to make sure that important basic functions are optimized well enough.

I started with the sprite rendering function. (Next thing will be the PPU update, then reading the current level data.)

So, could you please have a look at the following sprite rendering function and tell me if there's something that I could improve to make it faster?

Some information:

All meta sprites are stored in the same array. The function takes the array index and then puts the starting address to a pointer.

The meta sprites are declared like this:
Width, height, Y offset.
Tile, palette, tile, palette, tile, palette...
All meta sprites are drawn as rectangles, so I don't have to read the X and Y offset for each tile, which makes the function much faster.
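
A minimal C sketch of this layout, assuming the tile/palette pairs are stored column by column and bottom to top (the order in which the posted routine reads them). The names `MS_HEADER_BYTES`, `metasprite_size` and `metasprite_pair_offset` are hypothetical helpers for illustration, not part of the original code:

```c
#include <stddef.h>
#include <stdint.h>

/* Header as described in the post: width, height, Y offset (one byte each). */
enum { MS_HEADER_BYTES = 3 };

/* Total number of bytes used by one meta sprite record:
   the header plus one (tile, palette) pair per tile. */
static size_t metasprite_size(const uint8_t *ms)
{
    return MS_HEADER_BYTES + 2u * ms[0] * ms[1];
}

/* Offset of the (tile, palette) pair for a given column and row.
   Assumption: columns are stored left to right, and row 0 is the
   bottom tile of a column. */
static size_t metasprite_pair_offset(const uint8_t *ms, unsigned col, unsigned row)
{
    unsigned height = ms[1];
    return MS_HEADER_BYTES + 2u * (col * height + row);
}
```

Because the position of every tile follows from its column and row, no per-tile X or Y offset has to be stored or read.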

As global variables, we have:
Absolute X: The value in the center of the meta sprite (not the leftmost position).
Absolute Y: The value at the bottom of the meta sprite (not the top position).
The meta sprites array index where the data are read from.
The next PPU sprites index where the data is written to.
The mirror attribute to check whether the meta sprite should be flipped.

Each X and Y position is two bytes since characters who leave the screen on one side shall not enter on the other side. (X and Y offset values that are relative to the actual sprite's position are of course just one byte.)
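
To illustrate why the second byte matters, here is a small C sketch of the visibility test. It mirrors the high-byte check used in the routine in this post; `fits_on_screen` is a hypothetical name chosen for the example:

```c
#include <stdbool.h>
#include <stdint.h>

/* A hardware sprite position is a single byte, so a 16-bit coordinate
   can only be shown if its high byte is zero. A negative or too large
   value has a non-zero high byte and must be skipped instead of being
   truncated to 8 bits. */
static bool fits_on_screen(int16_t x, int16_t y)
{
    return ((uint16_t)x >> 8) == 0 && ((uint16_t)y >> 8) == 0;
}
```

With one-byte coordinates, an X position of -4 would be stored as 252, and the character would reappear at the right edge of the screen.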

Three remarks:

1. Characters in the game can be assigned to more than one palette. (For example, my main character has: Palette 1 = skin color, hair color, t-shirt color. Palette 2 = skin color, pants color, shoe color.) So, I cannot save the palette for the whole meta sprite. I have to use one value per tile.

2. Yes, I know: I write all sprite values first and only then check whether the sprite should actually be rendered, instead of skipping the code as soon as one coordinate is outside the screen.
I did this because of the following reason:
If the current game situation is one where fewer than the maximum number of characters are on screen, then these characters will have an IsActive variable set to false in the game logic code. I.e. if there are only two characters on the screen while the game can handle five at once, UpdateSprites will be called only two times anyway.
This means, said optimization will only work in the rare cases where a character is partly on screen and partly offscreen.
But if a character is rendered, the game engine has to be able to handle him anyway, so there's no need to add some more comparisons just because we could save some cycles in the one second where he's partly outside the screen.
If a character is on-screen, he will be fully visible for 99% of the time, so additional BEQs for a 1 % case where only parts of him are visible would actually make the code slower since most of the time, the stuff cannot be skipped anyway.

3. Checking for mirroring whenever a new X value is set is actually faster than saving a mirror bit mask and a subtraction value in the beginning and then using that for calculation. At least when the characters are only two tiles wide, which is the case with almost all of my characters.
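
For reference, the per-column mirroring step can be written as a one-line C function. This is only a sketch of the arithmetic; `mirror_relative_x` is a hypothetical name:

```c
#include <stdint.h>

/* Mirrors a column's X offset around the meta sprite's center.
   In the assembly this is EOR #$FF followed by SBC #$07 with the carry
   set: the inversion yields -rel_x - 1, and subtracting 7 more gives
   -rel_x - 8, i.e. the left edge of the tile on the opposite side. */
static int16_t mirror_relative_x(int16_t rel_x)
{
    return (int16_t)(-rel_x - 8);
}
```

For a character that is two tiles wide, the columns at offsets -8 and 0 simply swap places: -8 becomes 0 and 0 becomes -8.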


Alright, that's my code:
Code:
.segment "ZEROPAGE"

   _UpdateSpritesSpritesIndex: .res 1
   .export _UpdateSpritesSpritesIndex
   _UpdateSpritesMetaSpritesIndex: .res 2
   .export _UpdateSpritesMetaSpritesIndex
   _UpdateSpritesX: .res 2
   .export _UpdateSpritesX
   _UpdateSpritesY: .res 2
   .export _UpdateSpritesY
   _UpdateSpritesMirrorAttributes: .res 1
   .export _UpdateSpritesMirrorAttributes

   XCounter: .res 1
   YCounter: .res 1

   HalfWidth: .res 1
   HeightInTiles: .res 1

   AbsoluteX: .res 2
   AbsoluteY: .res 2

   RelativeX: .res 2
   PossiblyMirroredRelativeX: .res 2

.segment "CODE"

_UpdateSprites_:
.export _UpdateSprites_

   ; The start position from the meta sprites array
   ; for the current sprites is set to the const pointer.
   CLC
   LDA #<(_MetaSprites)
   ADC _UpdateSpritesMetaSpritesIndex
   STA _ConstPointer
   LDA #>(_MetaSprites)
   ADC _UpdateSpritesMetaSpritesIndex + 1
   STA _ConstPointer + 1

   ; The index offset of the meta sprites array,
   ; starting at the position of the const pointer.
   LDY #$00

   ; The size of the current meta sprites,
   ; counted in tiles, not in pixels.
   ; That's the counter value for the X loop,
   ; i.e. the outer loop.
   LDA (_ConstPointer), Y
   INY
   STA XCounter

   ; XCounter * 4 = Half of the width of the meta sprite.
   ASL
   ASL
   STA HalfWidth

   ; The absolute X position is in the center of the meta sprite.
   ; The relative X position gets moved from the center to the left,
   ; so that this value points to the leftmost position of the meta sprite.
   SEC
   LDA #$00
   SBC HalfWidth
   STA RelativeX
   LDA #$00
   SBC #$00
   STA RelativeX + 1

   ; Height, counted in tiles.
   LDA (_ConstPointer), Y
   INY
   STA HeightInTiles

   ; The absolute Y position is at the bottom of the meta sprite.
   ; So, it is moved eight pixels to the top,
   ; so that the tiles' bottoms are actually at the desired position.
   SEC
   LDA _UpdateSpritesY
   SBC #$08
   STA _UpdateSpritesY
   LDA _UpdateSpritesY + 1
   SBC #$00
   STA _UpdateSpritesY + 1

   ; Some characters cannot be drawn with their feet in the bottom position.
   ; For these meta sprites, the offset value is added to the Y position,
   ; so that they're still in the correct position.
   CLC
   LDA _UpdateSpritesY
   ADC (_ConstPointer), Y
   INY
   STA _UpdateSpritesY
   LDA _UpdateSpritesY + 1
   ADC #$00
   STA _UpdateSpritesY + 1

   ; The index of the PPU sprites that are written next.
   LDX _UpdateSpritesSpritesIndex

   ; The outer loop: All columns are drawn from left to right.
@loopX:

   ; The height in tiles becomes the loop counter.
   LDA HeightInTiles
   STA YCounter

   ; The absolute Y value is set to its starting position.
   LDA _UpdateSpritesY
   STA AbsoluteY
   LDA _UpdateSpritesY + 1
   STA AbsoluteY + 1

   ; If the meta sprite shall be mirrored,
   ; we have to manipulate the X position.
   LDA _UpdateSpritesMirrorAttributes
   BEQ @noMirroring

   ; The relative X position gets inverted and subtracted with 7.
   ; This way, it has the correct value to render the tile
   ; at the opposite of the meta sprite's center.
   ; The new value is stored in a separate variable.
   SEC
   LDA RelativeX
   EOR #%11111111
   SBC #$07
   STA PossiblyMirroredRelativeX
   LDA RelativeX + 1
   EOR #%11111111
   SBC #$00
   STA PossiblyMirroredRelativeX + 1

   JMP @endMirroring

@noMirroring:

   ; If no mirroring is done,
   ; the value is simply copied into the new variable.
   LDA RelativeX
   STA PossiblyMirroredRelativeX
   LDA RelativeX + 1
   STA PossiblyMirroredRelativeX + 1

@endMirroring:

   ; We take the original absolute centered X position
   ; and add the relative X position to it.
   ; This way we get the actual value
   ; that needs to be used for the rendering.
   CLC
   LDA _UpdateSpritesX
   ADC PossiblyMirroredRelativeX
   STA AbsoluteX
   LDA _UpdateSpritesX + 1
   ADC PossiblyMirroredRelativeX + 1
   STA AbsoluteX + 1

   ; The inner loop: Every tile in this column is rendered from bottom to top.
@loopY:

   ; The low byte of the Y position is written to the sprites array.
   LDA AbsoluteY
   STA _Sprites + 0, X

   ; The tile is read from the meta sprites array
   ; and set to the sprites array.
   LDA (_ConstPointer), Y
   INY
   STA _Sprites + 1, X

   ; The attributes are read from the meta sprites array.
   ; They are OR-connected with the mirror attributes
   ; and then written to the sprites array.
   LDA (_ConstPointer), Y
   INY
   ORA _UpdateSpritesMirrorAttributes
   STA _Sprites + 2, X

   ; The low byte of the X position is written to the sprites array.
   LDA AbsoluteX
   STA _Sprites + 3, X

   ; If the high byte of X or Y is not 0,
   ; this means this specific sprite is outside the screen.
   ; In this case, the rendering is skipped.
   ; It doesn't matter that the values in the sprites array are already written.
   ; As long as _UpdateSpritesSpritesIndex isn't incremented,
   ; the _ClearSprites function will make sure
   ; that all unused sprites are put outside the screen in the end.
   LDA AbsoluteX + 1
   BNE @endRendering
   LDA AbsoluteY + 1
   BNE @endRendering

   ; If everything is alright, then _UpdateSpritesSpritesIndex and the X register
   ; get incremented with the value 4.
   ; This value corresponds to the four bytes that we have written to the sprites array.
   ; The PPU will render the current sprite on the screen.
   INX
   INX
   INX
   INX
   STX _UpdateSpritesSpritesIndex

@endRendering:

   ; If the Y counter is 0,
   ; the inner loop isn't repeated anymore
   ; and all of the loop preparation is skipped.
   DEC YCounter
   BEQ @noLoopY

   ; For the next loop,
   ; the Y position is decremented with 8,
   ; i.e. one tile height.
   SEC
   LDA AbsoluteY
   SBC #$08
   STA AbsoluteY
   LDA AbsoluteY + 1
   SBC #$00
   STA AbsoluteY + 1

   ; The inner loop is repeated.
   JMP @loopY

@noLoopY:

   ; If the X counter is 0, the function ends.
   ; Otherwise, the outer loop is repeated.
   DEC XCounter
   BEQ @noLoopX

   ; For the next loop,
   ; the X position is incremented with 8,
   ; i.e. one tile width.
   CLC
   LDA RelativeX
   ADC #$08
   STA RelativeX
   LDA RelativeX + 1
   ADC #$00
   STA RelativeX + 1

   ; The outer loop is repeated.
   JMP @loopX

@noLoopX:

   RTS
Re: Optimizing important code parts
by on (#158583)
This won't help make anything faster, but if you're using ca65, I highly recommend using ".macpack generic" and the macros within. The ADD and SUB macros are less error-prone than writing CLC (or SEC) with ADC and SBC explicitly, and I much prefer writing BLT and BGE to writing BCC and BCS after comparisons.
Re: Optimizing important code parts
by on (#158586)
My goal was actually not to use any library functions, but to write everything myself.

Except in a few special cases:

I use the randomizer provided by CC65, although I took the source code and changed some minor things, like no longer setting the seed to 1 in the beginning. Firstly, that initialization would require a DATA segment, which I don't have because I don't need it. And secondly, my game doesn't call the rand function without having used srand anyway.

And I use Shiru's FamiTone library completely unaltered because even after working through the Nerdy Nights NES music tutorial, I would still be unable to create a decent sound driver.

But all those little code snippets with small functions or macros: I don't use them because these things I want to write myself.
It might be good to write my own little macros ADDITION16BIT and SUBTRACTION16BIT though.
Re: Optimizing important code parts
by on (#158590)
Well then you can write your own "ADD" and "SUB" macros. Although I don't see the point in doing that instead of using the ones already written for you.
Re: Optimizing important code parts
by on (#158595)
As I said: Because I want to do everything myself.

I have no problem in using external stuff when it comes to more complicated things like a randomizer, a sound library or a compiler that transforms C code into Assembly. I.e. longer stuff where I don't understand the inner workings even if I read the code.

And of course, I have no problem in asking how certain things are done and when somebody explains it to me, I implement it into my code.

But I don't want to clutter my game with these little mundane library calls.

When I have to write 99.9 % of the game myself anyway (unlike a Windows application where the standard library is of actual help and where there are hundreds of external function calls in your own code), why should I use an external library for 0.01 % of the game? These little code details can be self-made as well then. No need to add another external dependency for something as simple as an addition or a subtraction.
Re: Optimizing important code parts
by on (#158598)
Macros aren't library calls. Personally I don't like ADD and SUB macros because I don't want to get in the habit of using them and missing opportunities to optimize where the carry flag is already in the correct state. But macros can be super helpful.
Re: Optimizing important code parts
by on (#158601)
Movax12 wrote:
Macros aren't library calls.

Still, it doesn't invalidate what I wrote about it.
Re: Optimizing important code parts
by on (#158605)
Movax12 wrote:
Personally I don't like ADD and SUB macros because I don't want to get in the habit of using them and missing opportunities to optimize where the carry flag is already in the correct state. But macros can be super helpful.

I agree. I hardly ever need typical additions or subtractions, I usually find myself doing multiple things at once and using the carry to my advantage.
Re: Optimizing important code parts
by on (#158610)
Movax12 wrote:
Personally I don't like ADD and SUB macros because I don't want to get in the habit of using them and missing opportunities to optimize where the carry flag is already in the correct state.


Y'know what they say: computer cycles are cheap, people cycles are expensive. Just one bug caused by forgetting to CLC or SEC would negate whatever advantage you get out of that.

If you're doing it in a tight loop where it can actually make a performance difference, you won't forget that ADC/SBC is faster than ADD/SUB.
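
A small C model of the carry chain shows both sides of this argument. It is only a sketch of binary-mode ADC (decimal mode is ignored), and `adc` and `add16` are hypothetical names:

```c
#include <stdint.h>

/* Models the 6502 ADC instruction in binary mode: A + M + carry.
   The carry is both an input and an output. */
static uint8_t adc(uint8_t a, uint8_t m, unsigned *carry)
{
    unsigned sum = (unsigned)a + m + (*carry & 1u);
    *carry = sum >> 8;
    return (uint8_t)sum;
}

/* 16-bit addition built from two 8-bit ADCs. */
static uint16_t add16(uint16_t x, uint16_t y)
{
    unsigned carry = 0; /* CLC: needed once, before the low byte */
    uint8_t lo = adc((uint8_t)(x & 0xFF), (uint8_t)(y & 0xFF), &carry);
    /* No CLC here: the carry out of the low byte is exactly what
       the high byte needs. */
    uint8_t hi = adc((uint8_t)(x >> 8), (uint8_t)(y >> 8), &carry);
    return (uint16_t)((hi << 8) | lo);
}
```

If the carry is left over from an earlier operation and the CLC is forgotten, `adc(1, 1, &carry)` with `carry == 1` returns 3 instead of 2, which is the kind of bug an ADD macro rules out. If the carry is already known to be clear, the CLC is a wasted two cycles, which is the kind of saving the macro hides.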
Re: Optimizing important code parts
by on (#158629)
I have a 6502 emulator in my head, I'm always minding the carry flag. :twisted:

I still write the CLC and SEC instructions in my programs when they're not necessary, but I comment them out. This makes it easier to see that it's an optimization, and not a mistake.
Re: Optimizing important code parts
by on (#162132)
The only thing I can think of is that, at least for large sprites, it could save some time to pre-clip the sprite (roughly 20% faster in the cases I tested, large metasprites with 18 or more sprites). I.e. clip the sprite's rectangle and figure out how to iterate over the metasprite, calculating some increments to traverse the sub-rectangle that is actually visible. That way you won't be doing redundant clipping tests in the inner loop, which is where the lion's share of time goes. This is the improvement I was referring to over in "Efficiency of development process using C versus 6502". As you mention you're only doing small sprites, it might not be useful in your game. *edit* Once I clean it up I'll share my routine here.

*edit* This is my current WIP. I actually realized I need to re-work this to perform mirroring the way I've been doing it, which has been to bake it into the metasprite data itself. But, it at least demonstrates the pre-clipping idea I mentioned. Though, taking thefox's comment into consideration (later in this thread), this might be overkill. Still, I was pleased with how little has to be in the innermost loop of the routine, which was an improvement over my previous efforts.
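
The horizontal part of the pre-clipping idea can be sketched in C as follows. The formulas are taken from the C-style comments in the routine in this post; the `Span` type and the name `visible_columns` are hypothetical, and the sketch assumes that metasprites lying completely outside the screen have already been rejected:

```c
/* First and last visible column of a metasprite, counted in tiles. */
typedef struct {
    int first;
    int last;
} Span;

/* left: X position of the leftmost column in pixels (may be negative).
   width_in_tiles: number of 8-pixel columns in the metasprite.
   A column is kept only if its X position fits into 0..255. */
static Span visible_columns(int left, int width_in_tiles)
{
    Span s;
    int right = left + width_in_tiles * 8 - 1;

    /* Clip on the left: skip the columns that start at a negative X. */
    s.first = (left < 0) ? ((-(left + 1)) >> 3) + 1 : 0;

    /* Clip on the right: stop at the last column that starts at X <= 255. */
    s.last = (right > 255) ? (255 - left) >> 3 : width_in_tiles - 1;

    return s;
}
```

Once the span is known, the drawing loop only has to walk the columns from `first` to `last` and needs no per-sprite test. The vertical direction works the same way with rows and the screen height.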

Code:
;****************************************************************
;This routine pre-clips and draws a metasprite using 16 bit
;coordinates.
;C prototype:
;void __fastcall__ sprite_draw_metasprite(int x, int y, unsigned char chr_handle, const unsigned char *metasprite);
;****************************************************************
.proc _sprite_draw_metasprite
    left = w0
    top = w1
    right = w2
    bottom = w3
    metasprite = w4
    start_row = b0
    end_row = b1
    start_column = b2
    end_column = b3
    row = b4
    column = b5
    chr_handle = b6
    width = b7
    height = b8
    width_in_columns = b9
    height_in_rows = b10
    bytes_to_skip_per_row = b11
    metasprite_offset = b12
    metasprite_offset_term1 = b13

    ;const metasprite_entry *metasprite_entries = (const metasprite_entry*) (metasprite + 2);
    sta metasprite
    stx metasprite+1

    jsr popa
    sta chr_handle

    ;int top = y - 1;
    jsr popax
    sta top
    stx top+1

    dec16 top

    ;int left = x;
    jsr popax
    sta left
    stx left+1

    ;int right = left + (metasprite[0] * 8) - 1;

    ;(metasprite[0] * 8)
    ldy #0
    lda (metasprite),y
    asl
    asl
    asl
    sta width

    ;left + (metasprite[0] * 8)
    clc
    lda left
    adc width
    sta right
    lda left+1
    adc #0
    sta right+1

    ; - 1;
    dec16 right

    ;int bottom = top + (metasprite[1] * 8) - 1;

    ;(metasprite[1] * 8)
    iny
    lda (metasprite),y
    asl
    asl
    asl
    sta height

    ;top + (metasprite[1] * 8)
    clc
    lda top
    adc height
    sta bottom
    lda top+1
    adc #0
    sta bottom+1

    ; - 1;
    dec16 bottom

    ;if (left > 255) return;
    cmp16 #255, left
    blt :+
    jmp :++
:   rts
:

    ;if (top > 238) return;
    cmp16 #238, top
    blt :+
    jmp :++
:   rts
:

    ;if (right < 7) return;
    cmp16 right, #7
    blt :+
    jmp :++
:   rts
:

    ;if (bottom < 7) return;
    cmp16 bottom, #7
    blt :+
    jmp :++
:   rts
:

    ;if (left < 0) {
    .scope
    cmp16 left, #0
    blt clip_left
no_clip_left:
    ;no clip on left side
    ;start_column = 0;
    lda #0
    sta start_column
    jmp done
clip_left:
    ;clip on left side
    ;start_column = (-(left + 1) >> 3) + 1;
    lda left
    sta start_column
    inc start_column
    clc
    lda start_column
    eor #$ff
    adc #$01
    lsr
    lsr
    lsr
    sta start_column
    inc start_column
done:
    .endscope

    ;if (right > 255) {
    .scope
    cmp16 #255, right
    blt clip_right
no_clip_right:
    ;no clip on right side
    ;end_column = metasprite[0] - 1;
    ldy #0
    lda (metasprite),y
    sta end_column
    dec end_column
    jmp done
clip_right:
    ;end_column = (255 - left) >> 3;
    sec
    lda #255
    sbc left
    lsr
    lsr
    lsr
    sta end_column
done:
    .endscope

    ;if (top < 0) {
    .scope
    cmp16 top, #0
    blt clip_top
no_clip_top:
    ;no clip on top
    ;start_row = 0;
    lda #0
    sta start_row
    jmp done
clip_top:
    ;clip on top
    ;start_row = (-(top + 1) >> 3) + 1;
    lda top
    sta start_row
    inc start_row
    clc
    lda start_row
    eor #$ff
    adc #$01
    lsr
    lsr
    lsr
    sta start_row
    inc start_row
done:
    .endscope

    ;if (bottom > 239) {
    .scope
    cmp16 #239, bottom
    blt clip_bottom
no_clip_bottom:
    ;no clip on bottom
    ;end_row = metasprite[1] - 1;
    ldy #1
    lda (metasprite),y
    sta end_row
    dec end_row
    jmp done
clip_bottom:
    ;clip on bottom
    ;end_row = (239 - top) >> 3;
    sec
    lda #239
    sbc top
    lsr
    lsr
    lsr
    sta end_row
done:
    .endscope

    lda start_row
    sta row
    lda start_column
    sta column

    ;metasprite_offset = (start_row * metasprite[0] + start_column) * 5;

    ;(start_row * metasprite[0]
    ldy #0
    lda (metasprite),y
    tax
    lda #0
:   clc
    adc start_row
    dex
    bne :-

    ; + start_column)
    clc
    adc start_column
    sta metasprite_offset_term1

    ; * 5;
    asl
    asl
    clc
    adc metasprite_offset_term1
    sta metasprite_offset

    inc metasprite_offset

    ;number_of_bytes_to_skip_per_row = ((metasprite[0] - (end_column - start_column + 1)) * 5);

    ;(end_column - start_column + 1)
    sec
    lda end_column
    sbc start_column
    sta bytes_to_skip_per_row
    inc bytes_to_skip_per_row

    ;((metasprite[0] - (end_column - start_column + 1))
    sec
    ldy #0
    lda (metasprite),y
    sbc bytes_to_skip_per_row
    sta bytes_to_skip_per_row

    ;((metasprite[0] - (end_column - start_column + 1)) * 5)
    lda bytes_to_skip_per_row
    asl
    asl
    clc
    adc bytes_to_skip_per_row
    sta bytes_to_skip_per_row

    sec
    lda end_row
    sbc start_row
    sta height_in_rows
    inc height_in_rows

    sec
    lda end_column
    sbc start_column
    sta width_in_columns
    inc width_in_columns

    lda height_in_rows
    sta row
next_row:

    lda width_in_columns
    sta column

    ldy metasprite_offset
next_column:

    ldx _next_sprite_address

    ;get y
    iny
    clc
    lda (metasprite),y
    adc top
    sta _sprite_ram+sprite_struct::ycoord,x
    ;get tile
    iny
    lda (metasprite),y
    sta _sprite_ram+sprite_struct::tile,x
    ;get attribute
    iny
    lda (metasprite),y
    sta _sprite_ram+sprite_struct::attribute,x
    ;get x
    iny
    clc
    lda (metasprite),y
    adc left
    sta _sprite_ram+sprite_struct::xcoord,x
    ;skip flipped x
    iny

    clc
    lda _next_sprite_address
    adc #4
    sta _next_sprite_address

    dec column
    bne next_column

    clc
    tya
    adc bytes_to_skip_per_row
    sta metasprite_offset

    dec row
    bne next_row

    rts

.endproc

Re: Optimizing important code parts
by on (#162133)
GradualGames wrote:
The only thing I can think of is that, at least for large sprites, it could save some time to pre-clip the sprite (roughly 20% faster in the cases I tested, large metasprites with 18 or more sprites). I.e. clip the sprite's rectangle and figure out how to iterate over the metasprite, calculating some increments to traverse the sub-rectangle that is actually visible. That way you won't be doing redundant clipping tests in the inner loop, which is where the lion's share of time goes. This is the improvement I was referring to over in "Efficiency of development process using C versus 6502". As you mention you're only doing small sprites, it might not be useful in your game.

That's a good point. In fact, it might be a decent enough optimization to simply have two rendering routines: one that clips per sprite, and another one that doesn't clip at all. The routine would be selected depending on whether the screen edges intersect the bounding box of the metasprite (the whole call could be skipped if the bounding box is outside the screen boundaries). You would think that most of the time the metasprites would be entirely visible.

Might also be worth having some extra logic for small and large metasprites (for really small ones the overhead of the extra checks might not be worth it).
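
A rough C sketch of that selection step, assuming a 256x240 pixel screen and a bounding box given in pixels. `DrawMode` and `choose_draw_mode` are hypothetical names, not taken from any code in this thread:

```c
typedef enum {
    DRAW_NONE,      /* bounding box entirely off-screen: skip the call */
    DRAW_UNCLIPPED, /* bounding box entirely on-screen: fast routine */
    DRAW_CLIPPED    /* bounding box crosses a screen edge: clip per sprite */
} DrawMode;

/* Decides which rendering routine a metasprite needs, based only on
   its bounding box. left and top may be negative. */
static DrawMode choose_draw_mode(int left, int top, int width_px, int height_px)
{
    int right = left + width_px - 1;
    int bottom = top + height_px - 1;

    if (right < 0 || left > 255 || bottom < 0 || top > 239)
        return DRAW_NONE;
    if (left >= 0 && right <= 255 && top >= 0 && bottom <= 239)
        return DRAW_UNCLIPPED;
    return DRAW_CLIPPED;
}
```

The test runs once per metasprite, so its cost is spread over all of the metasprite's tiles. For very small metasprites it may cost more than it saves.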
Re: Optimizing important code parts
by on (#162134)
GradualGames wrote:
The only thing I can think of is that, at least for large sprites it could save some time to pre-clip the sprite

The code above is not the most recent one. I actually did some more improvements, using the rule that all meta sprites have to have a rectangular shape which saves you reading the X and Y coordinate for each sprite.

I can post my most recent function in a few hours.
Re: Optimizing important code parts
by on (#162135)
DRW wrote:
GradualGames wrote:
The only thing I can think of is that, at least for large sprites it could save some time to pre-clip the sprite

The code above is not the most recent one. I actually did some more improvements, using the rule that all meta sprites have to have a rectangular shape which saves you reading the X and Y coordinate for each sprite.

I can post my most recent function in a few hours.


That's a neat idea.
Re: Optimizing important code parts
by on (#162148)
O.k., this is my current sprite rendering function. If you have any questions, just ask.

Code:
   .importzp _ConstPointer

   .import _CharactersSprites
   .import _Sprites

.segment "ZEROPAGE"

   _UpdateSpritesSpritesIndex: .res 1
   .export _UpdateSpritesSpritesIndex
   _UpdateSpritesCharactersSpritesIndex: .res 2
   .export _UpdateSpritesCharactersSpritesIndex
   _UpdateSpritesX: .res 2
   .export _UpdateSpritesX
   _UpdateSpritesY: .res 2
   .export _UpdateSpritesY
   _UpdateSpritesMirrorAttributes: .res 1
   .export _UpdateSpritesMirrorAttributes

   XCounter: .res 1
   YCounter: .res 1

   HalfWidth: .res 1
   HeightInTiles: .res 1

   AbsoluteX: .res 2
   AbsoluteY: .res 2

   RelativeX: .res 2
   PossiblyMirroredRelativeX: .res 2
   
   Palette: .res 1

.segment "CODE"

_UpdateSprites_:
.export _UpdateSprites_

   ; The start position from the meta sprites array
   ; for the current sprites is set to the const pointer.
   CLC
   LDA #<(_CharactersSprites)
   ADC _UpdateSpritesCharactersSpritesIndex
   STA _ConstPointer
   LDA #>(_CharactersSprites)
   ADC _UpdateSpritesCharactersSpritesIndex + 1
   STA _ConstPointer + 1

   ; The index offset of the meta sprites array,
   ; starting at the position of the const pointer.
   LDY #$00

   ; The size of the current meta sprites,
   ; counted in tiles, not in pixels.
   ; That's the counter value for the X loop,
   ; i.e. the outer loop.
   LDA (_ConstPointer), Y
   INY
   STA XCounter

   ; XCounter * 4 = Half of the width of the meta sprite.
   ASL
   ASL
   STA HalfWidth

   ; The absolute X position is in the center of the meta sprite.
   ; The relative X position gets moved from the center to the left,
   ; so that this value points to the leftmost position of the meta sprite.
   SEC
   LDA #$00
   SBC HalfWidth
   STA RelativeX
   LDA #$00
   SBC #$00
   STA RelativeX + 1

   ; Height, counted in tiles.
   LDA (_ConstPointer), Y
   INY
   STA HeightInTiles

   ; The absolute Y position is at the bottom of the meta sprite.
   ; So, it is moved eight pixels to the top,
   ; so that the tiles' bottoms are actually at the desired position.
   SEC
   LDA _UpdateSpritesY
   SBC #$08
   STA _UpdateSpritesY
   LDA _UpdateSpritesY + 1
   SBC #$00
   STA _UpdateSpritesY + 1

   ; Some characters cannot be drawn with their feet in the bottom position.
   ; For these meta sprites, the offset value is added to the Y position,
   ; so that they're still in the correct position.
   CLC
   LDA _UpdateSpritesY
   ADC (_ConstPointer), Y
   INY
   STA _UpdateSpritesY
   LDA _UpdateSpritesY + 1
   ADC (_ConstPointer), Y
   INY
   STA _UpdateSpritesY + 1
   
   ; The palette that is shared by all tiles of this meta sprite.
   LDA (_ConstPointer), Y
   INY
   STA Palette
   
   ; The index of the PPU sprites that are written next.
   LDX _UpdateSpritesSpritesIndex

   ; The outer loop: All columns are drawn from left to right.
@loopX:

   ; The height in tiles becomes the loop counter.
   LDA HeightInTiles
   STA YCounter

   ; The absolute Y value is set to its starting position.
   LDA _UpdateSpritesY
   STA AbsoluteY
   LDA _UpdateSpritesY + 1
   STA AbsoluteY + 1

   ; If the meta sprite shall be mirrored,
   ; we have to manipulate the X position.
   LDA _UpdateSpritesMirrorAttributes
   BEQ @noMirroring

   ; The relative X position gets inverted and subtracted with 7.
   ; This way, it has the correct value to render the tile
   ; at the opposite of the meta sprite's center.
   ; The new value is stored in a separate variable.
   SEC
   LDA RelativeX
   EOR #%11111111
   SBC #$07
   STA PossiblyMirroredRelativeX
   LDA RelativeX + 1
   EOR #%11111111
   SBC #$00
   STA PossiblyMirroredRelativeX + 1

   JMP @endMirroring

@noMirroring:

   ; If no mirroring is done,
   ; the value is simply copied into the new variable.
   LDA RelativeX
   STA PossiblyMirroredRelativeX
   LDA RelativeX + 1
   STA PossiblyMirroredRelativeX + 1

@endMirroring:

   ; We take the original absolute centered X position
   ; and add the relative X position to it.
   ; This way we get the actual value
   ; that needs to be used for the rendering.
   CLC
   LDA _UpdateSpritesX
   ADC PossiblyMirroredRelativeX
   STA AbsoluteX
   LDA _UpdateSpritesX + 1
   ADC PossiblyMirroredRelativeX + 1
   STA AbsoluteX + 1

   ; The inner loop: Every tile in this column is rendered from bottom to top.
@loopY:

   ; The low byte of the Y position is written to the sprites array.
   LDA AbsoluteY
   STA _Sprites + 0, X

   ; The tile is read from the meta sprites array
   ; and set to the sprites array.
   LDA (_ConstPointer), Y
   INY
   STA _Sprites + 1, X

   ; The palette that was read from the meta sprite header
   ; is OR-connected with the mirror attributes
   ; and then written to the sprites array.
   LDA Palette
   ORA _UpdateSpritesMirrorAttributes
   STA _Sprites + 2, X

   ; The low byte of the X position is written to the sprites array.
   LDA AbsoluteX
   STA _Sprites + 3, X

   ; If the high byte of X or Y is not 0,
   ; this means this specific sprite is outside the screen.
   ; In this case, the rendering is skipped.
   ; It doesn't matter that the values in the sprites array are already written.
   ; As long as _UpdateSpritesSpritesIndex isn't incremented,
   ; the _ClearSprites function will make sure
   ; that all unused sprites are put outside the screen in the end.
   LDA AbsoluteX + 1
   BNE @endRendering
   LDA AbsoluteY + 1
   BNE @endRendering

   ; If everything is alright, then _UpdateSpritesSpritesIndex and the X register
   ; get incremented with the value 4.
   ; This value corresponds to the four bytes that we have written to the sprites array.
   ; The PPU will render the current sprite on the screen.
   INX
   INX
   INX
   INX
   STX _UpdateSpritesSpritesIndex

@endRendering:

   ; If the Y counter is 0,
   ; the inner loop isn't repeated anymore
   ; and all of the loop preparation is skipped.
   DEC YCounter
   BEQ @noLoopY

   ; For the next loop,
   ; the Y position is decremented with 8,
   ; i.e. one tile height.
   SEC
   LDA AbsoluteY
   SBC #$08
   STA AbsoluteY
   LDA AbsoluteY + 1
   SBC #$00
   STA AbsoluteY + 1

   ; The inner loop is repeated.
   JMP @loopY

@noLoopY:

   ; If the X counter is 0, the function ends.
   ; Otherwise, the outer loop is repeated.
   DEC XCounter
   BEQ @noLoopX

   ; For the next loop,
   ; the X position is incremented with 8,
   ; i.e. one tile width.
   CLC
   LDA RelativeX
   ADC #$08
   STA RelativeX
   LDA RelativeX + 1
   ADC #$00
   STA RelativeX + 1

   ; The outer loop is repeated.
   JMP @loopX

@noLoopX:

   RTS


This is the function call definition within C:
Code:
#define UpdateSprites(charactersSpritesIndex, x, y, directionAsMirrorAttributes)\
do\
{\
   UpdateSpritesCharactersSpritesIndex = (charactersSpritesIndex);\
   UpdateSpritesX = (x);\
   UpdateSpritesY = (y);\
   UpdateSpritesMirrorAttributes = (directionAsMirrorAttributes);\
   UpdateSprites_();\
}\
while (0)


The meta sprites are all part of one huge array:
Code:
#define SPRITES_INIT(width, height, offsetY, palette)\
   width, height, LowByte(offsetY), HighByte(offsetY), palette
   
const byte CharactersSprites[] =
{
   /* Goon
      ---- */
   
   /* Walking0 */
   SPRITES_INIT(GoonWidth, GoonHeight, GoonOffsetY, GoonPalette),
   0x90, 0x80, 0x70, 0x60, 0x50,
   0x91, 0x81, 0x71, 0x61, 0x51,

   /* Walking1 */
   SPRITES_INIT(GoonWidth, GoonHeight, GoonOffsetY, GoonPalette),
   0x92, 0x82, 0x72, 0x62, 0x52,
   0x93, 0x83, 0x73, 0x63, 0x53,

   /* etc. */
};
Re: Optimizing important code parts
by on (#162344)
How long does it typically take to write a sprite on the NES? I know for the SNES, most games took an entire scanline to write a sprite to the OAM, but my game does it in half a scanline. The NES's CPU is weaker but it doesn't have to deal with hi-oam, or realigning different sized sprites when flipping a metasprite. Also, the smaller the sprites, the more sprites you need to make a metasprite of the same size.
Re: Optimizing important code parts
by on (#162363)
If you're sticking with a rectangle design for metasprites, then you can easily store them in column format and use that to speed up clipping by removing whole columns (instead of individual sprites).
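
As a sketch of what that could look like in C, assuming the tiles are stored column by column and bottom to top. `OamEntry` and `draw_columns` are hypothetical names, and the Y test simply mirrors the one-byte range check used elsewhere in this thread:

```c
#include <stddef.h>
#include <stdint.h>

/* One hardware sprite in the order the NES expects: Y, tile, attributes, X. */
typedef struct {
    uint8_t y, tile, attr, x;
} OamEntry;

/* Draws a rectangular metasprite stored in column format.
   Returns the number of OAM entries written. */
static size_t draw_columns(OamEntry *oam, const uint8_t *tiles,
                           int width, int height,
                           int left, int bottom_y, uint8_t attr)
{
    size_t n = 0;
    for (int col = 0; col < width; ++col) {
        int x = left + col * 8;
        if (x < 0 || x > 255)
            continue; /* one test removes the whole column */
        for (int row = 0; row < height; ++row) {
            int y = bottom_y - row * 8;
            if (y < 0 || y > 255)
                continue;
            oam[n].y = (uint8_t)y;
            oam[n].tile = tiles[col * height + row];
            oam[n].attr = attr;
            oam[n].x = (uint8_t)x;
            ++n;
        }
    }
    return n;
}
```

Since every tile of a column shares the same X position, the X test moves out of the inner loop and is paid once per column instead of once per sprite.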


psycopathicteen wrote:
How long does it typically take to write a sprite on the NES? I know for the SNES, most games took an entire scanline to write a sprite to the OAM, but my game does it in half a scanline. The NES's CPU is weaker but it doesn't have to deal with hi-oam, or realigning different sized sprites when flipping a metasprite. Also, the smaller the sprites, the more sprites you need to make a metasprite of the same size.

I haven't timed it, but most NES games use 8x8 sprite cell mode. Just look at the Megaman character sprite; the amount of metasprite translations is quite a bit.