I'm sure this has been covered before, but I wanted to share anyway.

I was analyzing my code to see which routines were using the most cycles overall. I was surprised to see that one of my most cycle-hungry routines was just a simple loop I was using to clear out my shadow OAM before reloading sprite data into it:

**Code:**

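        LDX #0                  ; start at offset 0
        LDA #$FE                ; fill value ($FE as a Y position is off-screen)
    .Loop:
        STA spriteTable, X      ; clear one byte
        INX
        BNE .Loop               ; repeat until X wraps back to 0 (256 bytes)
        RTS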

I wanted to see if I could reduce the number of cycles used here, and tried the following instead:

**Code:**

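        LDX #31                     ; 32 passes, X = 31 down to 0
        LDA #$FE                    ; fill value ($FE as a Y position is off-screen)
    .Loop:
        STA spriteTable, X          ; each pass clears one byte
        STA spriteTable + 32, X     ; in each of the eight 32-byte slices
        STA spriteTable + 64, X
        STA spriteTable + 96, X
        STA spriteTable + 128, X
        STA spriteTable + 160, X
        STA spriteTable + 192, X
        STA spriteTable + 224, X
        DEX
        BPL .Loop                   ; repeat until X goes below 0
        RTS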

I was really surprised to see that this uses roughly half as many cycles, for just a handful of extra bytes of code. I understood in theory that a loop which runs its increment and branch 256 times would take more cycles than a loop that only runs them 32 times, but I had never actually tried it and looked at the difference in speed.
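
For anyone who wants to check the arithmetic, here is a rough cycle count. It assumes standard 6502 timings (STA absolute,X is 5 cycles, INX and DEX are 2 each, a taken branch is 3 and a not-taken branch is 2) and assumes neither loop's branch crosses a page boundary. The setup instructions and the RTS cost the same in both versions, so they are left out. The "minus 1" is the final branch of each loop, which falls through instead of being taken.

```latex
\begin{align*}
\text{simple loop:}   \quad & 256 \times (5 + 2 + 3) - 1 = 2559 \text{ cycles} \\
\text{unrolled loop:} \quad & 32 \times (8 \times 5 + 2 + 3) - 1 = 1439 \text{ cycles} \\
\text{ratio:}         \quad & 1439 / 2559 \approx 0.56
\end{align*}
```

Both versions spend the same 256 × 5 = 1280 cycles on the stores themselves. The saving comes entirely from paying the 5-cycle counter-and-branch overhead 32 times instead of 256.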

I know this is probably old hat to many of you, but I'm excited about it. It's enough of a difference to visibly reduce lag in some areas, and I'm very pleased with it.