This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

How fast is dynamic sprite loading?

by on (#151394)
In this post, tepples wrote:
"Is this Battletoads?"

Some people are fans of CHR ROM because it allows rapid switching of tiles for smooth animation of the player character. But in Kirby's Adventure, it ends up causing a lot of duplication because all frames of all enemies on screen at once need to fit in the same 2K bank of enemy tiles. So instead, I'm a fan of the Battletoads technique of loading sprite tiles into video memory as they're needed. I've already described how this works on Game Boy Advance, but the NES has far less video memory bandwidth and thus needs a bit more clever technique.

The engine I'm developing for this project has four object slots in video memory: one for the hero and three for enemies. These occupy CHR RAM $1800-$19FF, $1A00-$1BFF, $1C00-$1DFF, and $1E00-$1FFF. Each slot is divided into a pair of 16-tile buffers, plus several variables in main RAM:
  • Current cel: The cel ID currently being displayed in this slot.
  • Next cel: The cel ID whose tile data needs to be loaded into the back buffer of this slot.
  • Current buffer: Whether the slot's first or second buffer is its front buffer.
  • Information about what data has been loaded into each buffer of each slot.
In addition, a set of request flags controls which sprites should be switched to the next cel as soon as they are completely loaded.

On each frame that doesn't have any updates to tiles or map caused by scrolling, the sprite cel loader finds pieces of a cel to load. It prioritizes slots whose request bit is set, switching buffers and clearing the request bit if the cel is ready and loading a piece into the VRAM transfer buffer if not. Up to 8 tiles can be copied in each frame (NTSC without extended blanking). If a particular cel uses all 16 tiles, its update is split across two frames.

If there is still no scheduled VRAM transfer after the loader has processed all request bits, it loads pieces of the next cel speculatively. Speculative loading sets the next cel to the frame most likely to follow a slot's current cel, such as the next cel of a walk cycle. I count about five mispredicts per second on average, usually when an enemy spawns or when the player takes an unpredicted action, such as jumping, stopping a walk, beginning a punch combo, allowing a punch combo to expire, or taking a hit. A mispredict may delay loading a cel for a frame or two. But otherwise, speculative loading puts a cel into VRAM just when it is needed, allowing the player and enemies to be animated at an acceptable frame rate.
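
As a rough illustration of the loader described above, here is a minimal Python model of one loader pass per frame. It is only a sketch: the `Slot` fields, `load_piece`, and `run_loader` are hypothetical names, the model counts tiles instead of filling a real VRAM transfer buffer, and the prediction of which cel comes next is left out entirely.

```python
from dataclasses import dataclass

TILES_PER_FRAME = 8  # per the post: NTSC without extended blanking


@dataclass
class Slot:
    current_cel: int
    next_cel: int
    next_cel_tiles: int      # tiles the next cel needs, 1..16 (hypothetical field)
    front_buffer: int = 0    # 0 or 1: which 16-tile buffer is being displayed
    tiles_loaded: int = 0    # tiles of next_cel already in the back buffer
    requested: bool = False  # request flag: switch as soon as loading is done

    def ready(self):
        return self.tiles_loaded >= self.next_cel_tiles


def load_piece(slot):
    """Schedule up to TILES_PER_FRAME more tiles of slot.next_cel."""
    count = min(TILES_PER_FRAME, slot.next_cel_tiles - slot.tiles_loaded)
    slot.tiles_loaded += count
    return count


def run_loader(slots):
    """Run one frame of the loader; return (slot_index, tiles) or None."""
    scheduled = None
    # First pass: slots whose request bit is set.
    for index, slot in enumerate(slots):
        if not slot.requested:
            continue
        if slot.ready():
            slot.front_buffer ^= 1          # back buffer becomes front buffer
            slot.current_cel = slot.next_cel
            slot.requested = False
        elif scheduled is None:
            scheduled = (index, load_piece(slot))
    # Second pass: nothing scheduled yet, so load speculatively.
    if scheduled is None:
        for index, slot in enumerate(slots):
            if not slot.ready():
                scheduled = (index, load_piece(slot))
                break
    return scheduled
```

In this model a requested 16-tile cel takes two frames of 8 tiles each, and the buffer switch happens on the following loader pass.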

The metasprite drawing code uses values $00-$7F normally for constant tiles. It uses $80-$8F for these switchable slots, ORing in the start tile of the current buffer of the slot being drawn.
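
The tile-number mapping can be sketched in Python as follows. This assumes, which the post does not state, that sprites use the pattern table at $1000, so that CHR RAM $1800 corresponds to sprite tile $80. The function names are hypothetical.

```python
TILE_BYTES = 16
SPRITE_TABLE_BASE = 0x1000  # assumption: sprites use the second pattern table
SLOT_BASES = [0x1800, 0x1A00, 0x1C00, 0x1E00]  # from the post
TILES_PER_BUFFER = 16


def buffer_start_tile(slot, buffer_index):
    """Tile number of the first tile in one buffer of a slot."""
    first_tile = (SLOT_BASES[slot] - SPRITE_TABLE_BASE) // TILE_BYTES
    return first_tile + buffer_index * TILES_PER_BUFFER


def resolve_tile(tile, slot, front_buffer):
    """Map a metasprite tile value to the tile number written to OAM."""
    if tile < 0x80:
        return tile  # constant tile, used as-is
    # Switchable tile $80-$8F: OR in the front buffer's start tile.
    return tile | buffer_start_tile(slot, front_buffer)
```

Under that assumption, the hero's two buffers start at tiles $80 and $90, and tile $83 drawn from slot 1 with its second buffer in front resolves to $B3. The OR works because every buffer start tile already has bit 7 set and a zero low nibble.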


Is the NES really that bad with sprites? I wrote down a quick loading routine and counted the cycles and ended up with:

Code:
-;
lda ({tile_address}),y   //5
sta {vram_port}      //4 9
iny         //2 11
cpy #$10      //2 13
bne -         //3 16


It would take only 2048 cycles to upload 8 tiles, and vblank is more than 4096 cycles long.
Re: Looking for NES Coder ( Paid )
by on (#151395)
psycopathicteen wrote:
Is the NES really that bad with sprites? I wrote down a quick loading routine and counted the cycles and ended up with:

Code:
-;
lda ({tile_address}),y   //5
sta {vram_port}      //4 9
iny         //2 11
cpy #$10      //2 13
bne -         //3 16


It would take only 2048 cycles to upload 8 tiles, and vblank is more than 4096 cycles long.

Vblank on NTSC NES is closer to 2270 cycles long because the NES PPU always runs in 240-line mode. This also needs to include about 600 cycles of other tasks, such as OAM DMA and setting the scroll position. So the pattern loading routine is unrolled by a factor of 16 and always copies from a buffer in an otherwise unused part of the stack page ($0100-$017F).
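
A quick Python check of that budget, as a sketch: it takes NTSC vblank as 20 scanlines of 341 PPU dots at 3 dots per CPU cycle, uses the 16 cycles per byte counted in the quoted loop, and uses the rough 600-cycle figure above for other tasks. The variable names are made up for this example.

```python
from fractions import Fraction

PPU_DOTS_PER_SCANLINE = 341
VBLANK_SCANLINES = 20          # NTSC
PPU_DOTS_PER_CPU_CYCLE = 3     # NTSC

vblank_cycles = Fraction(PPU_DOTS_PER_SCANLINE * VBLANK_SCANLINES,
                         PPU_DOTS_PER_CPU_CYCLE)   # 6820/3, about 2273.3

OTHER_TASKS = 600              # OAM DMA, scroll setup, etc. (rough figure)
budget = vblank_cycles - OTHER_TASKS

NAIVE_CYCLES_PER_BYTE = 16     # lda (zp),y / sta / iny / cpy / bne
BYTES_PER_TILE = 16
naive_cycles_per_tile = NAIVE_CYCLES_PER_BYTE * BYTES_PER_TILE  # 256

naive_tiles_per_vblank = budget // naive_cycles_per_tile
print(float(vblank_cycles), float(budget), naive_tiles_per_vblank)
```

With these numbers, the 2048 cycles needed for 8 tiles with the byte-at-a-time loop exceed what is left of vblank after the other tasks.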
Re: Looking for NES Coder ( Paid )
by on (#151400)
I thought most games used forced blank.
Re: Looking for NES Coder ( Paid )
by on (#151402)
psycopathicteen wrote:
Code:
-;
lda ({tile_address}),y   //5
sta {vram_port}      //4 9
iny         //2 11
cpy #$10      //2 13
bne -         //3 16

You really shouldn't compare and branch every byte, when you know that each tile is 16 bytes. Unrolling this loop to copy 16 bytes at a time already represents a big speed boost. Still, having to increment Y for every byte and using indirect indexed addressing is too slow for my taste. I'd rather interleave the bytes and use indexed addressing with increasing base addresses in an unrolled loop, or even buffer the tiles in RAM beforehand and copy them to VRAM with an unrolled loop.

Quote:
It would take only 2048 cycles to upload 8 tiles, and vblank is more than 4096 cycles long.

As it's been pointed out, your math is a little off. With only 2273 cycles of VBlank, you have to do better than this if you expect to animate objects and update other things, such as backgrounds, palettes and OAM.

psycopathicteen wrote:
I thought most games used forced blank.

Most games don't! The ones that do are usually unlicensed.
Re: Looking for NES Coder ( Paid )
by on (#151406)
This is what happens in a typical unrolled tile copy, at 140 cycles per 16-byte tile:
Code:
vram_copybuf = $0100
PPUDATA = $2007

; prep code omitted
; carry is clear at this point
copyloop:
  .repeat 16, I
    lda vram_copybuf+I,x
    sta PPUDATA
  .endrepeat
  txa
  adc #16
  tax
  cpx vram_copylen
  bcc copyloop
; fixup code omitted


The .repeat block in ca65 expands into this:
Code:
  lda $0100,x
  sta $2007
  lda $0101,x
  sta $2007
  lda $0102,x
  sta $2007
  ; ...
  lda $010F,x
  sta $2007
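
The 140-cycle figure can be reproduced from standard 6502 instruction timings. The short Python tally below is a sketch that assumes `vram_copylen` lives in zero page and that the indexed loads never cross a page boundary, which holds for a buffer confined to $0100-$017F.

```python
# 6502 cycle counts for the instructions in the unrolled loop.
LDA_ABS_X = 4   # absolute,X with no page crossing
STA_ABS = 4     # store to PPUDATA ($2007)
TXA = 2
ADC_IMM = 2
TAX = 2
CPX_ZP = 3      # assumes vram_copylen is a zero page variable
BCC_TAKEN = 3   # branch taken, same page

BYTES_PER_TILE = 16

copy_cycles = BYTES_PER_TILE * (LDA_ABS_X + STA_ABS)      # 128
loop_overhead = TXA + ADC_IMM + TAX + CPX_ZP + BCC_TAKEN  # 12
cycles_per_tile = copy_cycles + loop_overhead             # 140
cycles_for_8_tiles = 8 * cycles_per_tile                  # 1120

print(cycles_per_tile, cycles_for_8_tiles)
```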
Re: Looking for NES Coder ( Paid )
by on (#151408)
If you store your data to upload on the stack and unroll your code, you can easily get 8 cycles per byte:
Code:
.repeat 16
    pla ; 4 cycles
    sta $2007 ; 4 cycles
.endrepeat

If you want to write a generator to unroll and store your tiles as code in ROM, you can get down to 6 cycles or less per byte:
Code:
    lda #$05 ; 2 cycles
    sta $2007 ; 4 cycles
    ldx #$39 ; 2 cycles
    stx $2007 ; 4 cycles
    ldy #$73 ; 2 cycles
    sty $2007 ; 4 cycles
    ...

You can also order the choice of register to make loads redundant (e.g. if you lda #$00 you can sta $2007 many bytes of zeroes), saving 2 more cycles each time. (You probably wouldn't do this in combination with a forced vblank, though, since you'd normally need a consistent cycle count for that.)

You can also dynamically build this code in RAM if you want to save ROM space, at the expense of extra setup time outside of vblank.
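
A generator along these lines might look like the Python sketch below. It emits ca65-style text for one run of bytes and skips the load whenever A, X, or Y already holds the value. The function name `emit_tile_code` and the round-robin choice of which register to overwrite are assumptions made for illustration, not anything specified above.

```python
def emit_tile_code(tile_bytes, port="$2007"):
    """Return (lines, cycles) for unrolled code that writes tile_bytes to port."""
    registers = {"a": None, "x": None, "y": None}  # known register contents
    order = ["a", "x", "y"]
    victim = 0      # next register to overwrite (round robin)
    lines = []
    cycles = 0
    for value in tile_bytes:
        # Reuse a register that already holds this value, if any.
        reg = next((r for r in order if registers[r] == value), None)
        if reg is None:
            reg = order[victim]
            victim = (victim + 1) % len(order)
            registers[reg] = value
            lines.append(f"    ld{reg} #${value:02X}")  # immediate load: 2 cycles
            cycles += 2
        lines.append(f"    st{reg} {port}")              # absolute store: 4 cycles
        cycles += 4
    return lines, cycles


if __name__ == "__main__":
    lines, cycles = emit_tile_code([0x05, 0x39, 0x73, 0x05])
    print("\n".join(lines))
    print(f"; {cycles} cycles")
```

With this scheme a tile of 16 zero bytes costs one load and 16 stores. A smarter generator could look ahead to decide which register to overwrite.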


As for games that use forced vblank, there are very few. If you have bankable CHR-ROM, there's generally no need for it; it's mostly just for animating tiles with CHR-RAM. Not a lot of games actually did that.
Re: Looking for NES Coder ( Paid )
by on (#151410)
Tepples, I hope your carry is clear before that adc #16.
Re: Looking for NES Coder ( Paid )
by on (#151411)
rainwarrior wrote:
Tepples, I hope your carry is clear before that adc #16.

The prep code clears it.

And PLA is as slow as LDA a,X.
Re: Looking for NES Coder ( Paid )
by on (#151417)
rainwarrior wrote:
If you store your data to upload on the stack and unroll your code, you can easily get 8 cycles per byte:

You can also get 8 cycles per byte straight off the ROM if you're OK with copying groups of tiles instead of single tiles, and interleaving the bytes of all the groups (creating structures of arrays, as the 6502 likes it). For example, if using groups of 64 bytes (4 tiles) you could address 16KB of CHR data with an 8-bit index:

Code:
.repeat 64, I
   lda $8000 + I*256, x   ; byte I of the group selected by X
   sta $2007
.endrepeat

This would work well for UNROM for example.
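
The interleaved layout has to be produced by a build step. Here is a minimal Python sketch of one, assuming 64-byte groups and at most 256 groups as in the example above. The function name `interleave` is hypothetical.

```python
GROUP_SIZE = 64    # bytes per group (4 tiles)
MAX_GROUPS = 256   # groups addressable with an 8-bit index


def interleave(chr_data):
    """Rearrange CHR data so byte k of group g lands at offset k*256 + g."""
    if len(chr_data) % GROUP_SIZE != 0:
        raise ValueError("CHR data must be a whole number of 64-byte groups")
    groups = len(chr_data) // GROUP_SIZE
    if groups > MAX_GROUPS:
        raise ValueError("at most 256 groups fit an 8-bit index")
    out = bytearray(GROUP_SIZE * MAX_GROUPS)  # 16 KiB; unused entries stay 0
    for g in range(groups):
        for k in range(GROUP_SIZE):
            out[k * 256 + g] = chr_data[g * GROUP_SIZE + k]
    return bytes(out)
```

With the group number in X, the k-th load of the unrolled copy reads offset k*256 + X from the start of the bank, which is byte k of that group.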

Quote:
If you want to write a generator to unroll and store your tiles as code in ROM, you can get down to 6 cycles or less per byte:

That's something I considered doing for a handful of animated objects, as well as the main character. Definitely not for all the graphics in a game.

Quote:
You can also dynamically build this code in RAM if you want to save ROM space, at the expense of extra setup time outside of vblank.

I have to say I'm not a fan of spending so much time just preparing data like that.

BTW, I just noticed we've had this conversation before.

Anyway, you know what would've been sweet? If there was an option to select $2004 or $2007 as the target for DMA writes. It wouldn't do much for name table updates (besides allowing a full background update in a single frame), but it would've been a great help for managing CHR-RAM. I know it's silly to think of what could have been... the console is what it is and we must accept its limitations, but wouldn't it be nice if a mapper could add this feature?
Re: Looking for NES Coder ( Paid )
by on (#151418)
tokumaru wrote:
Anyway, you know what would've been sweet? If there was an option to select $2004 or $2007 as the target for DMA writes. It wouldn't do much for name table updates (besides allowing a full background update in a single frame), but it would've been a great help for managing CHR-RAM. I know it's silly to think of what could have been... the console is what it is and we must accept its limitations, but wouldn't it be nice if a mapper could add this feature?

Wasn't that basically what the dual WRAM/CHR-RAM mapper idea was for?
Re: Looking for NES Coder ( Paid )
by on (#151421)
rainwarrior wrote:
Wasn't that basically what the dual WRAM/CHR-RAM mapper idea was for?

That was nice, but way too complicated to implement, IMO. A DMA feature built from the ground up would be complicated too, I know. Being able to reuse the existing DMA functionality but routing writes to $2007 instead would be the really cool thing I think, but that's probably not possible.
Re: How fast is dynamic sprite loading?
by on (#151424)
So it could do 16 tiles per frame even without forced blank. So that means that if DKC got ported to the NES, the sprites would be half their size, half the amount, and half the framerate.
Re: How fast is dynamic sprite loading?
by on (#151425)
psycopathicteen wrote:
if DKC got ported to the NES

The NES has bankable CHR-ROM solutions, though. Why not just use that? They probably would have used that on SNES if it was capable.
Re: How fast is dynamic sprite loading?
by on (#151426)
rainwarrior wrote:
psycopathicteen wrote:
if DKC got ported to the NES

The NES has bankable CHR-ROM solutions, though. Why not just use that?

The four windows of MMC3 work for the player and three enemies at once. If there are more independently animated enemies, you have to group enemies into enemy sets and duplicate each enemy's sprite tiles in the tile bank associated with each enemy set in which it appears, as Kirby's Adventure does. This is part of why Teenage Mutant Ninja Turtles II stops the scroll so often, so that the two players never encounter more than two distinct enemy types at once.
Re: How fast is dynamic sprite loading?
by on (#151427)
You could make a mapper that divides it as fine as you need? 16 slots would allow 16 characters with up to 16 tiles each (even though you could only display half of it in any given frame). If your characters aren't overlapping vertically, you could also use the MMC3's scanline counter to multiplex its existing 4 banks.

Also, we're forgetting that Hummer Team already ported Donkey Kong Country to the NES:
https://www.youtube.com/watch?v=fBeD-kEHy3E

(As you might have guessed, it uses 1k CHR-ROM banking.)
Re: How fast is dynamic sprite loading?
by on (#151428)
rainwarrior wrote:
You could make a mapper that divides it as fine as you need? 16 slots would allow 16 characters with up to 16 tiles each

MMC5 goes halfway there, separating out sprite and background banks.

Quote:
Also, we're forgetting that Hummer Team already ported Donkey Kong Country to the NES:
https://www.youtube.com/watch?v=fBeD-kEHy3E

Impressive, and possibly representative of what NES games would have looked like had they remained in production at that time. (Compare Donkey Kong Land and the GBC port of DKC, though the Game Boy has CHR RAM and the GBC even has CHR HDMA.) Notice how it hides the inactive monkey and simplifies some of the maps to keep under the 4 actor limit, lest it slow down and flicker (see 5:42). Nor does Hummer appear to understand the 29/8 time signature (see boss fight at 4:00), but that's beside the point.
Re: How fast is dynamic sprite loading?
by on (#151430)
rainwarrior wrote:
You could make a mapper that divides it as fine as you need?

Creating mappers is a lot of trouble, not everyone can do it. In addition to the hardware you also have to add support for it in the emulators you use (if they're open source).

Quote:
Also, we're forgetting that Hummer Team already ported Donkey Kong Country to the NES:
https://www.youtube.com/watch?v=fBeD-kEHy3E

Which looks really good, especially considering the limitations of the platform! If the animation wasn't so erratic, the controls and physics weren't so off, and the music wasn't so crappy, this could actually have done the DKC series justice. I think it looks better than the GBC version, even though the NES hardware is inferior.
Re: How fast is dynamic sprite loading?
by on (#151432)
https://www.youtube.com/watch?v=-K_ZRV-_jlE

I think this one looks and sounds better.
Re: How fast is dynamic sprite loading?
by on (#151433)
You want to see a really good DKC port? https://www.youtube.com/watch?v=gUz-Qc0c-9Q :lol:
Re: How fast is dynamic sprite loading?
by on (#151443)
Well, I was bored. This, unsurprisingly, uses 8x8 tile attributes. I don't know how many different BG tiles you can use, so I didn't really think about it when making this.

Attachment:
DKC NES.png
Re: How fast is dynamic sprite loading?
by on (#151444)
Honestly it looks awful. DKC's graphics are already scaled down from a 3D representation, so scaling them down a second time makes no sense. Instead, you'd have to scale down the original 3D models to 4 colours directly, but I really think there's no way to make them look good on NES- or GBC-class graphics.
Re: How fast is dynamic sprite loading?
by on (#151446)
This is how Hummer Team's version looks, for comparison. (Obviously within tile count, and standard 16 x 16 attributes.)
Attachment:
File comment: Hummer Team: Donkey Kong Country 4
dkc4_hummer.png


You could certainly do a lot better than what you just posted, Espozo, and throwing away half your limitations didn't appear to help you any.
Re: How fast is dynamic sprite loading?
by on (#151449)
Bregalad wrote:
Honestly it looks awful.
rainwarrior wrote:
You could certainly do a lot better than what you just posted, Espozo

Whoa there! I made this in about 5 minutes. :lol: Does it really look that bad though?

Bregalad wrote:
DKC's graphics are already scaled down from a 3D representation, so scaling them down a second time makes no sense. Instead, you'd have to scale down the original 3D models to 4 colours directly

Why?

rainwarrior wrote:
throwing away half your limitations didn't appear to help you any.

Well, I'd need to know what they are first. I guess is it 256 tiles you can use for BGs?
Re: How fast is dynamic sprite loading?
by on (#151451)
Espozo wrote:
I guess is it 256 tiles you can use for BGs?

Normally, yes, but bankswitching can get you more. Also, since you're using 8x8 attributes, we can probably assume the use of the MMC5, which allows access to 16384 tiles at a time, enough to fill 17 screens without repeating a single tile. Personally, I'd rather stick to 256 tiles and 16x16 attributes, it's way more NES-like.

The problem with your version is that it's full of flat-colored areas, a direct side effect of cheap color quantization. Hummer Team remade the graphics to include most of the overall details on the trees and ground, even if that meant changing the hues and the contrast a bit.
Re: How fast is dynamic sprite loading?
by on (#151461)
Espozo wrote:
Whoa there! I made this in about 5 minutes. :lol: Does it really look that bad though?

Do you expect us to praise it in light of how little effort you spent on it? O_o

Espozo wrote:
Well, I'd need to know what they are first. I guess is it 256 tiles you can use for BGs?

Shiru made a really good tool for creating NES backgrounds. If you can do it with this, you can easily put it on the NES:
https://shiru.untergrund.net/files/nesst.zip
Re: How fast is dynamic sprite loading?
by on (#151462)
Espozo wrote:
You want to see a really good DKC port? https://www.youtube.com/watch?v=gUz-Qc0c-9Q :lol:


If they want to make a Sega Genesis game, they should at least have 50 layers of cardboard walls that for some reason merge into a single layer whenever scrolling up or down.
Re: How fast is dynamic sprite loading?
by on (#151464)
What?
Re: How fast is dynamic sprite loading?
by on (#151466)
tokumaru wrote:
What?


I was talking about the fake parallax in a lot of Sega Genesis games.
Re: How fast is dynamic sprite loading?
by on (#151468)
It may help if you used a visual aid of some sort to explain. (I have no idea what you're talking about either.)
Re: How fast is dynamic sprite loading?
by on (#151472)
Line scrolling.
Re: How fast is dynamic sprite loading?
by on (#151473)
The Genesis VDP always operates in the equivalent of Super NES mode 2: two background layers with an optional offset per 16-pixel vertical strip. Typically, gameplay happens on the front background, and the rear background is divided into a whole bunch of horizontal strips that are scrolled at different rates for raster parallax. (Super NES games can do the same with HDMA to BG2's horizontal scroll port.) However, raster parallax strips cannot always scroll up and down at different rates. So instead, games using raster parallax will do one of two things.
  1. One is to hold the rear background at a constant vertical position, which appears as a background that's essentially infinitely far. Look at how the grass moves horizontally and doesn't move vertically in Sonic the Hedgehog 2 in Emerald Hill Zone act 1 (video).
  2. The other is to scroll the rear background vertically at the same rate as the front one. This produces an effect that the camera is being pitched (tilted) up and down. I can't dig up a Genesis example right now, but SMB3 World 1-5 of Super Mario All-Stars (video) and several scenes of the Nintendo 64 game Kirby 64: The Crystal Shards (video) have this pitching effect.

Second split coming soon.
Re: How fast is dynamic sprite loading?
by on (#151475)
psycopathicteen wrote:
Line scrolling.
What is this supposed to explain?

tepples wrote:
irrelevant stuff
I know what parallax scrolling usually looks like. I'm sure tokumaru does too. I don't know why you're suddenly trying to explain that.

What I really would like to see is a picture of this:
psycopathicteen wrote:
50 layers of cardboard walls that for some reason merge into a single layer whenever scrolling up or down
^ That actually sounds like an interesting thing to see?


Also, please don't split the thread. It's really irritating to follow.
Re: How fast is dynamic sprite loading?
by on (#151482)
I'll try to find a visual aid. You can see this type of scrolling in Aquatic Ruin Zone in Sonic 2. The grass looks like it's drawn on multiple flat walls. Notice how the details in one layer of grass end abruptly where the top of the next layer begins.
Re: How fast is dynamic sprite loading?
by on (#151483)
Yeah okay, that's an example I can understand.
Aquatic Ruin Zone: https://www.youtube.com/watch?v=2RGsqR70kes

I'm reminded of Phalanx on the SNES.
Phalanx: https://www.youtube.com/watch?v=hEpTUXH-Vn0#t=2m12s

I guess to do the effect right you have to alternate layers frequently, so that you can hide or show the overlapping zones in varying amounts.

Also I suppose games like Hang On and Rad Racer are doing a similar thing.
Rad Racer: http://forums.nesdev.com/viewtopic.php?t=8588


Your description was actually reminding me of some arcade games like Outrun or Lucky and Wild where walls and tunnels were done with piles of huge overlapping sprites.
Lucky and Wild: https://www.youtube.com/watch?v=_RqqaqUHsuU#t=7m05s



By the way, sorry if I'm speaking very abrasively today, I think I'm in a weird mood and should probably try to tone it down a bit.
Re: How fast is dynamic sprite loading?
by on (#151489)
Tl;dr. I wanted to use this in a Minecraft game for NES, but quit because of the many limitations and complications. What if there were a mapper or a microcontroller on the cartridge, and the pattern table kept changing as the NES CPU told the microcontroller which tile at which address of the VROM to map? That way all tiles would be loaded dynamically, so you could have many different sprites on screen. Imagine SMB3 no longer having that glitch where you put enemies from 3 different environments in the same level.

Now, that microcontroller would need to be ultra fast, plus be able to constantly listen to what the PPU is doing with its pins. An 8-bit XMEGA at 32MHz probably couldn't do that, so I think it would need homemade chips, which is impossible, so I would need to use an FPGA. I don't know why many people don't like FPGAs. It's fun when you know how to build a PC from logic circuits and bistables up to serious CPUs. I just hope that the FPGA chip can be resoldered from its development board onto the cartridge board in the correct position without getting thermal damage from the unsoldering. Or maybe if there's an FPGA in a DIP package, that would be cool. Just plug it into the dev board and program it, and then put it onto the cartridge to test how it works.
Re: How fast is dynamic sprite loading?
by on (#151500)
Honestly, to get rid of the ugly line scrolling effect, I think the best way to get about it is to have some sprites that stick out at the point the line scrolling occurs so it isn't like an obvious line. If there's something like hills and then clouds a good bit above, I think it looks fine. One thing I think they should do to make it look less obvious is to have it to where in a level with limited vertical movement, you have it to where say if there were clouds and a mountain in the background and you were really high up, they'd be very close to the split point, but if you got down lower, they'd be further apart, if that makes sense. It would be horizontal and vertical movement, versus just horizontal, which isn't very convincing. Does that make sense?

(About earlier though, I think that using the MMC5 to emulate an SNES game is justifiable, but that's just my opinion.)
Re: How fast is dynamic sprite loading?
by on (#151507)
Espozo wrote:
Honestly, to get rid of the ugly line scrolling effect

Who has declared it objectively ugly? I don't love it, but I'm also not bothered by it.

Quote:
I think the best way to get about it is to have some sprites that stick out at the point the line scrolling occurs so it isn't like an obvious line.

Yes, that is a nice effect. But come on, this is the background we're talking about, people don't pay that much attention to backgrounds.

Quote:
If there's something like hills and then clouds a good bit above, I think it looks fine. One thing I think they should do to make it look less obvious is to have it to where in a level with limited vertical movement, you have it to where...

You really like this expression, don't you? Sorry for pointing that out, but it's been a while since I noticed it's nearly impossible to find a post from you that doesn't have it.

Quote:
if there were clouds and a mountain in the background and you were really high up, they'd be very close to the split point, but if you got down lower, they'd be further apart, if that makes sense. It would be horizontal and vertical movement, versus just horizontal, which isn't very convincing. Does that make sense?

Yes, it makes sense, but implementing this effect in 2 dimensions is certainly not trivial with the hardware available in 16-bit consoles, so developers often went with what was cheap to do.

Hydrocity Zone in Sonic 3 had a really cool vertical perspective effect on the water surface. Notice how the alignment between the backmost and frontmost parts of the water surface changes depending on the vertical position of the camera, and how the whole surface is stretched to make it look like it really does extend the whole distance.
Re: How fast is dynamic sprite loading?
by on (#151509)
tokumaru wrote:
Who has declared it objectively ugly? I don't love it, but I'm also not bothered by it.

Well, I mean, less obvious?

tokumaru wrote:
Yes, that is a nice effect. But come on, this is the background we're talking about, people don't pay that much attention to backgrounds.

Well... :wink: The obvious problem about this is that this affects the total sprite limit and the sprite pixel per scanline limit, but if you're safe on those, I think it would be a nice touch.

tokumaru wrote:
You really like this expression, don't you? Sorry for pointing that out, but it's been a while since I noticed it's nearly impossible to find a post from you that doesn't have it.

I never even noticed I did that. :oops: I'll try to break that habit. You can probably tell I'm not very good at writing...

tokumaru wrote:
Hydrocity Zone in Sonic 3 had a really cool vertical perspective effect on the water surface. Notice how the alignment between the backmost and frontmost parts of the water surface changes depending on the vertical position of the camera, and how the whole surface is stretched to make it look like it really does extend the whole distance.

That's the basic picture. DKC 2 and 3 also use that effect extensively. (Look at the ship hold levels in DKC2.)