This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

CPU budget questions
by on (#114133)
In my experiments last night I tried to update the attribute table for a single screen with 64 random values.

Boy, was I in for a surprise: writing 64 random values within the vblank is out of the question for the NES. Glitch city. I was able to make a for loop in C that wrote 54 immediate values without glitching the screen. However, by my calculations I should have been able to do something in the area of 80 values. If there are 113.667 CPU cycles per scanline and 20 scanlines in the vblank, that is about 2273 cycles, and an iteration of my loop was around 27 cycles (I counted the cycles of the CA65 output), so it works out to roughly 84 iterations (working very roughly here; my math habits are often fuzzy). That's well more than 54 iterations. What's going on here, assuming I counted the instruction timings correctly?
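The estimate above can be checked with a few lines of C. This is only a sketch of the arithmetic: the function name and the `overhead` parameter (cycles spent on anything else before or around the loop) are assumptions added for illustration, while the scanline figures and the 27-cycle loop cost are the ones quoted in the post.

```c
/* NTSC figures quoted in the post. */
#define CYCLES_PER_SCANLINE 113.667
#define VBLANK_SCANLINES    20

/* Whole loop iterations that fit in vblank once `overhead` cycles
   (setup code, other NMI work; a hypothetical parameter) are subtracted. */
static int iterations_in_vblank(double cycles_per_iteration, double overhead)
{
    double budget = CYCLES_PER_SCANLINE * VBLANK_SCANLINES - overhead;

    if (budget <= 0.0 || cycles_per_iteration <= 0.0)
        return 0;
    return (int)(budget / cycles_per_iteration);
}
```

With no overhead, `iterations_in_vblank(27.0, 0.0)` gives 84, so any shortfall from that number has to come from cycles spent elsewhere during vblank.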

Here's my setup: I do everything in the NMI, and the very first thing it does is call the main loop routine, so anything done there first should fall within vblank. And here is the asm of the main routine:

Code:
.segment   "CODE"

.proc   _main: near

   .dbg   func, "main", "00", extern, "_main"

.segment   "CODE"

;
; poke( PPU_CTRL, 0x90 );
;
   .dbg   line, "game.c", 57
   lda     #$90
   sta     $2000
;
; vram_adr( 0x23c0 );
;
   .dbg   line, "game.c", 58
   ldx     #$23
   lda     #$C0
   jsr     _vram_adr
;
; for ( i=0; i != 50; i++ ) {
;
   .dbg   line, "game.c", 59
   lda     #$00
L003E:   sta     _i
   cmp     #$32
   beq     L002F
;
; poke( PPU_DATA, 1 );
;
   .dbg   line, "game.c", 60
   lda     #$01
   sta     $2007
;
; for ( i=0; i != 50; i++ ) {
;
   .dbg   line, "game.c", 59
   lda     _i
   clc
   adc     #$01
   jmp     L003E
;
; j++;
;
   .dbg   line, "game.c", 62
L002F:   lda     _j
   clc
   adc     #$01
   sta     _j
;
; }
;
   .dbg   line, "game.c", 67
   rts
   .dbg   line


Just curious here. If the 6502's listed timings are different on the NES for some reason, that would be good to know.
Re: CPU budget questions
by on (#114136)
There is only a really short period when VRAM access is possible, ~2700t (CPU cycles), and 513+ of those are needed to do sprite DMA. So, if you need to put many values into VRAM, you have to write this part in assembly. Even further, you should prepare the values before getting into NMI, and use unrolled loops. If you have enough RAM, you can even generate a pusher subroutine, a sequence of 'lda #nn sta PPU_DATA' pairs; this way you'll get 6t/byte for sequential writes and will be able to push a sequence of ~350 bytes. Much less for random writes, like 100 bytes.
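A generated pusher subroutine of the kind described above could be built roughly like this. It is a minimal sketch, not code from this thread: the function names and the buffer interface are hypothetical, and only the 6502 opcodes (`lda #nn` = $A9, `sta $nnnn` = $8D, `rts` = $60) and the $2007 PPU_DATA address are fixed facts. Each data byte costs 5 bytes of generated code, which is why the approach depends on having RAM to spare.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    OP_LDA_IMM = 0xA9,  /* lda #nn   : 2 bytes, 2 cycles */
    OP_STA_ABS = 0x8D,  /* sta $nnnn : 3 bytes, 4 cycles */
    OP_RTS     = 0x60   /* rts       : 1 byte            */
};

/* Code bytes needed for n data bytes: 5 per lda/sta pair plus the rts. */
static size_t pusher_size(size_t n) { return n * 5 + 1; }

/* Emit "lda #data[i] / sta $2007" for each byte, then rts.
   Returns the number of code bytes written, or 0 if buf is too small. */
static size_t build_pusher(uint8_t *buf, size_t cap,
                           const uint8_t *data, size_t n)
{
    size_t i, p = 0;

    if (cap < pusher_size(n))
        return 0;
    for (i = 0; i < n; i++) {
        buf[p++] = OP_LDA_IMM;
        buf[p++] = data[i];
        buf[p++] = OP_STA_ABS;
        buf[p++] = 0x07;        /* low byte of $2007 (PPU_DATA) */
        buf[p++] = 0x20;        /* high byte of $2007 */
    }
    buf[p++] = OP_RTS;
    return p;
}

/* Bytes that fit at 6 cycles each after reserving cycles for sprite DMA. */
static unsigned sequential_bytes(unsigned window, unsigned dma)
{
    return window > dma ? (window - dma) / 6 : 0;
}
```

Using the figures from the post, `sequential_bytes(2700, 513)` returns 364, in line with the "~350 bytes" estimate. On the NES the main thread would fill the buffer during the frame and the NMI would simply `jsr` to it after setting the VRAM address.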
Re: CPU budget questions
by on (#114137)
You can use NintendulatorDX to figure out how many cycles a piece of code is taking. Simply write to $4020 when you want to start the timing and to $4030 when you want it to end. The README file has more info.
Re: CPU budget questions
by on (#114142)
Shiru wrote:
There is only a really short period when VRAM access is possible, ~2700t (CPU cycles), and 513+ of those are needed to do sprite DMA. So, if you need to put many values into VRAM, you have to write this part in assembly. Even further, you should prepare the values before getting into NMI, and use unrolled loops. If you have enough RAM, you can even generate a pusher subroutine, a sequence of 'lda #nn sta PPU_DATA' pairs; this way you'll get 6t/byte for sequential writes and will be able to push a sequence of ~350 bytes. Much less for random writes, like 100 bytes.


Ohhhh, THAT was what was causing the glitches! The other stuff in the NMI routine. D'oh. No wonder I was also seeing flashing colors and shearing. On a side note, the glitches were actually quite beautiful to me. I have a mind to see how I can abuse the PPU to produce these kinds of effects intentionally.

Yeah, I agree, asm is definitely a better way to go here, as is preparing "batches" beforehand, which I see is already set up in your code. I was just using these tests as a way to get a handle on C and performance. Those are some good ideas for optimization / squeezing more out of the hardware.

Thanks for the tip, thefox. I use NDX, but I will have to look into that.
Re: CPU budget questions
by on (#114145)
For this on-the-fly code, you can use X and Y for commonly used values. Do the updates to each 256-byte page of VRAM together and then X can hold $20 for the first few, then $21, etc., and Y whatever the most common value is. So you get code like
Code:
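ldx #$20      ; X = high byte shared by every address in the $20xx page
ldy #$00      ; Y = the most common data value
stx $2006     ; address high byte
lda #$08
sta $2006     ; address low byte -> $2008
sty $2007     ; write the common value straight from Y
stx $2006
lda #$25
sta $2006     ; address -> $2025
lda #$10
sta $2007     ; a less common value goes through A
stx $2006     ; high byte of the next update, and so on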
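To see what keeping values in X and Y buys, the per-update cost can be tallied from the standard 6502 timings (2 cycles for an immediate load, 4 for an absolute store). The helper below is a hypothetical sketch for that arithmetic only; its name and flags are not from the thread.

```c
/* Cycle costs of the instructions involved. */
enum { LD_IMM = 2, ST_ABS = 4 };

/* Cycles for one random-address VRAM write: two $2006 stores plus one
   $2007 store. A byte already held in X or Y skips its immediate load. */
static unsigned update_cycles(int high_in_x, int value_in_y)
{
    unsigned c = 0;

    c += high_in_x ? ST_ABS : LD_IMM + ST_ABS;    /* $2006, high byte */
    c += LD_IMM + ST_ABS;                         /* $2006, low byte  */
    c += value_in_y ? ST_ABS : LD_IMM + ST_ABS;   /* $2007, data      */
    return c;
}
```

A fully naive update costs 18 cycles, 16 with the page byte in X, and 14 when the data value is also sitting in Y.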
Re: CPU budget questions
by on (#114409)
I thought I should mention that I've settled on a hybrid NMI approach: VRAM updates and music in the NMI, along with a "callback" there to do whatever custom stuff needs doing, and controller polling and game logic in the main thread. Trying to poll controllers in the NMI was a disaster, not sure why. Anyway, I think splitting the threads is a good idea in general for framerate and CPU control.
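One way such a split could be arranged is sketched below in C. Everything here is an assumption for illustration: the queue layout, the `frame_ready` handshake, and all function names are hypothetical, and `fake_ppu_write` is a host-side stand-in so the logic can be tried off hardware. On a real NES that writer would store the address to $2006 and the value to $2007, and music would also be driven from the NMI.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_UPDATES 32

struct vram_update { uint16_t addr; uint8_t value; };

static struct vram_update queue[MAX_UPDATES];
static volatile uint8_t queue_len;    /* entries filled by the main thread */
static volatile uint8_t frame_ready;  /* set by main thread, cleared by NMI */

/* Host-side stand-in for the PPU so the sketch runs off hardware. */
static uint8_t fake_vram[0x4000];
static void fake_ppu_write(uint16_t addr, uint8_t value)
{
    fake_vram[addr & 0x3FFF] = value;
}

/* Main thread: queue one write. Returns 0 if the queue is full or has
   already been handed over to the NMI for this frame. */
static int queue_vram_write(uint16_t addr, uint8_t value)
{
    if (frame_ready || queue_len >= MAX_UPDATES)
        return 0;
    queue[queue_len].addr = addr;
    queue[queue_len].value = value;
    queue_len++;
    return 1;
}

/* Main thread: call once the game logic for this frame is finished. */
static void end_frame(void) { frame_ready = 1; }

/* NMI: flush the queue only when the main thread has finished its frame,
   then run the optional per-frame callback. */
static void nmi_handler(void (*ppu_write)(uint16_t, uint8_t),
                        void (*callback)(void))
{
    uint8_t i;

    if (frame_ready) {
        for (i = 0; i < queue_len; i++)
            ppu_write(queue[i].addr, queue[i].value);
        queue_len = 0;
        frame_ready = 0;
    }
    if (callback != NULL)
        callback();
}
```

The handshake means a slow game-logic frame never leaves the NMI flushing a half-built queue; the NMI just skips the VRAM work until `end_frame()` has been called.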