This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

Why not mapper 11 or 66 ?
by on (#214831)
Usually, homebrewers use NROM first, and then jump to MMC1 or MMC3, or perhaps UxROM, when they need more ROM space.

But I was thinking about the SMB/Duck Hunt cart, and it uses a very simple switch all $8000 PRG and/or switch all $2000 CHR register.

But very few (10?) games used the mapper 66 GxROM. Apparently Color Dreams / Wisdom Tree used a very similar mapper (11).

And both mappers (11, 66) are offered by Infinite NES Lives.

So, my dumb question is, why not use it? It's simple as can be.
Re: Why not mapper 11 or 66 ?
by on (#214832)
I'm inclined to agree with you.

I think I could argue that na_th_an also agrees, because he made a GNROM-with-MMC3 IRQ board.
Re: Why not mapper 11 or 66 ?
by on (#214833)
There's a whole bunch of pretty sensible discrete logic mappers. If you're asking why more people aren't using them, I'd actually say that the reason might just be that there's more discrete mappers than homebrew that needs them. :P

Another reason is that a lot of these mappers don't use all the bits in their registers, and often have pretty low ROM size limitations. Stuff like AxROM / BNROM / UxROM where there's a single register and all the bits go to one place are easier to extend to larger ROM sizes. When the register contents are more mixed, it gets less straightforward, and with all of these you tend to run into emulators that only implement as many bits as actually were used on extant games.

Also, having to bank the whole CHR-ROM at once is a bit limiting for its purpose: you need either to have things very well separated, or to accept a lot of redundancy wasting ROM space. Having a few blank frames to unpack CHR-RAM is sort of a wash compared to that, especially if SRAM isn't particularly expensive compared to EPROM.
Re: Why not mapper 11 or 66 ?
by on (#214834)
I think that when homebrewing first became a thing and people needed to test their programs on hardware, they had to resort to cannibalizing existing cartridges. You can't cannibalize SMB+DH easily, since most copies use glop tops, so other common cartridges with simple mappers were better choices for going beyond NROM.

Similarly, there are many advanced mappers with features superior to those of the MMC3 (more versatile IRQs, finer CHR switching, more PRG windows, etc.), such as the Bandai mapper that's exclusive to Japan, but people still often go for the MMC3 when they need raster effects and fine CHR switching, probably because of availability.
Re: Why not mapper 11 or 66 ?
by on (#214835)
tokumaru wrote:
I think that when homebrewing first became a thing and people needed to test their programs on hardware, they had to resort to cannibalizing existing cartridges. You can't cannibalize SMB+DH easily, since most copies use glop tops, so other common cartridges with simple mappers were better choices for going beyond NROM.
It turns out that earlier CNROM boards (NES-CN-ROM-256-01 through -05) can be trivially modified to become MHROM. (Like the Famicom version of the cart, CPU D4 still goes to the '161, and the latched copy goes to the hole marked for D2.)

But that doesn't mean people knew it was an option.
Re: Why not mapper 11 or 66 ?
by on (#214837)
lidnariq wrote:
But that doesn't mean people knew it was an option.

There's also the fact that most people tend not to change the purpose of things. If they're looking for examples of bankswitching for a game, they're gonna look at what's been used in games, they're not gonna look at what multicarts do.
Re: Why not mapper 11 or 66 ?
by on (#214841)
It's a good question. The fact that lots of other people moved from NROM to UxROM in the past has a lot to do with it: people simply follow the same path as those before them.

While cost is very comparable if not equal, CHR-RAM is generally more versatile than CHR-ROM. Additionally there seems to be a preference towards UxROM over BNROM because of UxROM's fixed 16KB PRG-ROM bank. There hasn't been much for discrete mappers with 16KB PRG-ROM banks, and CHR-ROM. Is there even a discrete mapper out there with 16KB PRG-ROM banking and CHR-ROM?

Based on some arguments here I've come to believe 32KB PRG-ROM banking is more versatile than 16KB, but seems like most don't agree or fully understand what to do in order to take advantage of 32KB banking.

People may also not realize how easily mapper 11 can be extended to 512KB PRG-ROM, even though homebrews have been utilizing 512KB UxROM for years. My discrete mapper boards support 512KB PRG-ROM & 128KB CHR-ROM mapper 11, but I don't have them listed for sale and rarely get requests for them.

I am aware of a few recent projects that are utilizing mapper 11, it's especially handy when creating multicarts of simple NROM games.

EDIT: All told I think my personal discrete mapper of choice would be 128-512KB BNROM, but 32KB of banked CHR-RAM. It's a single chip mapper and still has 2 register bits available for other potential features including single screen mirroring (would be better defined as AxROM). Toss a tiny cheap little 74LVC1G97 multifunction gate on the board and you could have software selectable H/V mirroring or 4-screen mirroring. But once you get into non-standard mapper design it can cause more obstacles to development than it's worth.
Re: Why not mapper 11 or 66 ?
by on (#214843)
infiniteneslives wrote:
Is there even a discrete mapper out there with 16KB PRG-ROM banking and CHR-ROM?
Seven.
Re: Why not mapper 11 or 66 ?
by on (#214845)
That's a nifty table! Didn't realize it existed.
Re: Why not mapper 11 or 66 ?
by on (#214847)
tokumaru wrote:
I think that when homebrewing first became a thing and people needed to test their programs on hardware, they had to resort to cannibalizing existing cartridges.

As someone around during that time period: this is correct. NROM, MMC1, specific models of UxROM, CNROM, and (to a lesser degree) MMC3 were the most prevalent cartridges you could find en masse (read: cheap and common). Switching to smaller font for some early nesdev and emulation history:

And in the early days, MMC3 wasn't even a viable choice -- nobody had figured it out reliably yet, especially the IRQ functionality (the nuances of which, as documented in the Wiki, should act as proof). I know this factually because MMC3 + battery-backed SRAM was the dev cart type I wanted from near the very beginning (and nobody had written a sane/clear document on how to rewire MMC3 to use EPROMs. I will never forget how many people told me "it's not hard, just wire the pinouts accordingly" (this was horseshit then and still is -- it's more complicated than that)); I actually bought several Crystalis NES cartridges for this very purpose. There were always the "special snowflakes" who wanted MMC2 details (for emulating Punch-Out!! reliably) and then later, MMC5 (which is still a major sore spot to this day).

Most of the early homebrew efforts did not involve extensive hardware folks (e.g. kevtris), they involved dumping carts and disassembling ROMs and making some educated guesses, solely for the purpose of achieving emulation. Alex Krasivsky (Landy) and Marat Fayzullin (RST38h) were two of the original pioneers. Kevin's efforts (particularly his mapper document) revolutionised things because for the first time we had an actual (and excellent quality) HW person, who also knew 6502, doing excellent analysis. There were other people doing these types of efforts at the time (especially for Famicom-centric mappers like Konami's VRC2 and VRC4) -- guys like Goroh, and then later, Firebug -- but it wasn't until much later that this information arrived in a digestible form.

There was little if any communication between all the folks doing the RE work, so everyone seemed to be working independently -- this was one of several reasons why my nestech.txt never contained mapper details (I intentionally didn't include such, because information was coming from several conflicting sources).

In later days, some random tidbits of mapper details started trickling out of Japan (US/EU folks were operating independently of JP; neither had any idea the other were doing the same stuff at the same time for unrelated reasons, but language barrier was a big one).

Not to brag or toot my own horn, but this was why nestech.txt was "revolutionary" -- I became a sort of de-facto guy who tried to organise all the (non-mapper) information into a centralised document that was easy for both emulator authors and homebrewers to use. The only other thing at the time (early and mid-days) was Marat Fayzullin's NES.DOC. All the super precise details of the system didn't come until way later, long after the early emulation heyday.


Furthermore, re: mapper 11, nobody in the early days had gone to great lengths to reverse-engineer Color Dreams' stuff. The more "obscure" games' mappers came much later, as the cartridges weren't as prevalent (read: more rare, thus worth more).

NES emulation was what drove about ~80-85% of this effort, not homebrew. Everyone had different hobbies/interests -- I happened to fall into both categories -- but the NES homebrew scene did not really come to fruition until maybe the middle of NESticle's lifetime. Small font again:

In contrast, the SNES homebrew scene in the early 90s was significantly different because emulation wasn't on anyone's radar -- nobody was even considering it back then (systems were too slow, and nobody really cared, to be honest). All people wanted then was CPU documentation (tons of it available thanks to the Apple IIGS), MMIO register documentation (early days were a single document by some Amiga dudes named Dax & Corsair, later stuff was my SNES documentation, and then within a couple years people started getting their hands on official SNES documentation from leaked sources, but several revisions all mixed together, so it was even more chaotic to follow than what's out there now), and SPC documentation (official docs had this but it's sub-par, and I didn't do sound stuff (I still don't), so SPC homebrew stuff was done by mainly the elite), and console copiers (for homebrew and playing games).
Re: Why not mapper 11 or 66 ?
by on (#214848)
Quote:
but seems like most don't agree or fully understand what to do in order to take advantage of 32KB banking


1. Copy some bank switch code into RAM.
2. Jump there to switch banks, then jsr to an entry point in the new bank, which reads the game state and uses a jump table to get to the appropriate subroutine.
3. Have reset code in every PRG bank which can return you to the main bank if the user presses reset.

The advantage of 32k PRG banks would be to have a contiguous code block or data block that doesn't need to be split awkwardly.
Music code and data might benefit from having its own 32k block.

The disadvantage would be, DMC samples might have to be copied in more than 1 bank, wasting space.
And, CHR banks are less efficient, if you have the same graphics partially copied in multiple 8k CHR banks.

I suppose, you lose mirroring changes, without the MMC. And no scanline counter.

EDIT - every bank needs an NMI handler too, which does not bode well for "all in the NMI" style programs
Re: Why not mapper 11 or 66 ?
by on (#214849)
infiniteneslives wrote:
There hasn't been much for discrete mappers with 16KB PRG-ROM banks, and CHR-ROM. Is there even a discrete mapper out there with 16KB PRG-ROM banking and CHR-ROM?

Remember that test ROM you commissioned from me, called Holy Mapperel? Remember what it used to be called?

Holy Diver and Uchuusen Cosmo Carrier, which differ in whether the mirroring control bit controls H/V or 0/1.

infiniteneslives wrote:
People may also not realize how easily mapper 11 can be extended to 512KB PRG-ROM.

Agreed. It's just that by the time you're past 128 KiB PRG ROM, you need something with slightly finer-grained CHR banking than 8 KiB.

infiniteneslives wrote:
EDIT: All told I think my personal discrete mapper of choice would be 128-512KB BNROM, but 32KB of banked CHR-RAM.

Which is within your definition of oversize mapper 11. But 8K banking has its limits unless you're just doing one bank for playfield and one for status bar.
Re: Why not mapper 11 or 66 ?
by on (#214852)
dougeff wrote:
Quote:
but seems like most don't agree or fully understand what to do in order to take advantage of 32KB banking


1. Copy some bank switch code into RAM.
2. Jump there to switch banks, then jsr to an entry point in the new bank, which reads the game state and uses a jump table to get to the appropriate subroutine.
3. Have reset code in every PRG bank which can return you to the main bank if the user presses reset.
Yes, simple as it sounds, I think it's what some people see as an avoidable hurdle, so they stick with the UxROM they're already familiar with. But I've also heard these folk complain about how precious their limited 16KB fixed bank is...

Quote:
The disadvantage would be, DMC samples might have to be copied in more than 1 bank, wasting space.
This is arguably a bigger problem on traditional UxROM as samples must take up precious fixed bank space $C000-FFFF.

Quote:
I suppose, you lose mirroring changes, without the MMC. And no scanline counter.

There's always software-selectable mirroring with mapper 78 (which could be extended to most discrete mappers), and DMC IRQ abUSE.
Re: Why not mapper 11 or 66 ?
by on (#214853)
Well, I used mapper 11 for Attribute Zone. (Note that, depending on the program, whether the CHR bankswitching is the high bits or low bits it might be better one way or other, although you can easily work with either one in either way anyways, by adding a few extra instructions.)
Re: Why not mapper 11 or 66 ?
by on (#214854)
Speaking entirely for myself, the choice of going with MMC3 is pretty basic.
It's a very versatile mapper with the incredibly useful added functionality of a scanline counter, and it is one of the most common mappers out there. And with that I'm not talking about cannibalizing other cartridges, but simply that I think it's extremely representative of what the NES had to offer during my favourite years of its life time. There are tons of uncommon mappers from unlicensed games etc. that do other interesting things such as using all four nametables, etc. but personally I feel like that's doing something too different from what I am actually interested in.
Re: Why not mapper 11 or 66 ?
by on (#214855)
If I remember correctly, mapper #11 is basically a nybble-swapped mapper #66 with lockout defeat, so it's not much more interesting to use than #66.
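To make the nybble-swap relationship concrete, here is a minimal Python sketch of how the two registers are usually described for the stock boards: mapper 66 as `xxPP xxCC` and mapper 11 as `CCCC LLPP` (the `L` bits being the lockout defeat on Color Dreams boards). The function names are hypothetical, and oversize variants that use more register bits are not modelled.

```python
# Sketch only: stock register layouts, no oversize extensions (assumption).

def decode_gxrom(value):
    """Mapper 66: bits 5-4 pick the 32 KiB PRG bank, bits 1-0 the 8 KiB CHR bank."""
    return {"prg": (value >> 4) & 0x03, "chr": value & 0x03}


def decode_color_dreams(value):
    """Mapper 11: bits 1-0 pick the 32 KiB PRG bank, bits 7-4 the 8 KiB CHR bank.

    Bits 3-2 (lockout defeat on the original boards) are ignored here.
    """
    return {"prg": value & 0x03, "chr": (value >> 4) & 0x0F}


def swap_nybbles(value):
    """Exchange the high and low 4 bits of a byte."""
    return ((value << 4) | (value >> 4)) & 0xFF


# The same PRG/CHR selection expressed for each mapper.
gxrom_value = 0x21                      # PRG bank 2, CHR bank 1 on mapper 66
cd_value = swap_nybbles(gxrom_value)    # equivalent write for mapper 11
print(decode_gxrom(gxrom_value))
print(hex(cd_value), decode_color_dreams(cd_value))
```

So a value prepared for one mapper can be turned into the other with a single nybble swap, as long as only two PRG bits and two CHR bits are in use.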

As for #66, I believe there are two fundamental reasons it's less often used:

1) Simply put, it's much deeper in iNES mapper numbering than other common mappers. For that reason, it is more likely to be ignored and/or not considered as an option, even though it's an official Nintendo mapper.
2) 8 KB CHR-ROM banking is often impractical, as it means that whole sprite sheets have to be swapped with BG sheets. For smaller and simpler games which fit in 32 KB PRG (CNROM) it's not so much an issue, as it's possible to have simple gameplay and either 3 or 4 level graphics layouts. But for larger games which use more PRG, it becomes a handicap to have only 4 pages of CHR-ROM which have to be wholly switched, as opposed to finer-grained CHR-ROM switching.
Re: Why not mapper 11 or 66 ?
by on (#214974)
dougeff wrote:
Quote:
but seems like most don't agree or fully understand what to do in order to take advantage of 32KB banking


1. Copy some bank switch code into RAM.
2. Jump there to switch banks, then jsr to an entry point in the new bank, which reads the game state and uses a jump table to get to the appropriate subroutine.
3. Have reset code in every PRG bank which can return you to the main bank if the user presses reset.

The advantage of 32k PRG banks would be to have a contiguous code block or data block that doesn't need to be split awkwardly.
Music code and data might benefit from having its own 32k block.

The disadvantage would be, DMC samples might have to be copied in more than 1 bank, wasting space.
And, CHR banks are less efficient, if you have the same graphics partially copied in multiple 8k CHR banks.

I suppose, you lose mirroring changes, without the MMC. And no scanline counter.

EDIT - every bank needs an NMI handler too, which does not bode well for "all in the NMI" style programs


You don't need to be that fiddly - I mean having to copy the routine to RAM. I usually reserve some bytes just beside the vectors in each ROM, and copy the same exact bankswitching code. When you change banks, the new banks will have the exact same contents, so no problems.

I usually work with a set of NROMs I paste together using a custom tool which writes the correct iNES header as well.
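For readers who want to try the "paste NROMs together" workflow, here is a rough Python sketch of what such a tool might look like. This is not the custom tool mentioned above; `build_mapper11_rom` and its parameters are made up for illustration, and it assumes every game contributes exactly 32 KiB of PRG and 8 KiB of CHR. The header layout follows the standard iNES format (PRG count in 16 KiB units, CHR count in 8 KiB units, mapper number split across the high nybbles of bytes 6 and 7).

```python
PRG_BANK = 32 * 1024   # one mapper 11 PRG page
CHR_BANK = 8 * 1024    # one mapper 11 CHR page


def build_mapper11_rom(games, vertical_mirroring=True):
    """Concatenate (prg, chr) pairs into one iNES image tagged as mapper 11."""
    for index, (prg, chr_data) in enumerate(games):
        if len(prg) != PRG_BANK or len(chr_data) != CHR_BANK:
            raise ValueError(f"game {index}: expected 32 KiB PRG and 8 KiB CHR")

    mapper = 11
    flags6 = ((mapper & 0x0F) << 4) | (0x01 if vertical_mirroring else 0x00)
    flags7 = mapper & 0xF0
    header = bytes([0x4E, 0x45, 0x53, 0x1A,      # "NES" + MS-DOS EOF
                    len(games) * 2,              # PRG size in 16 KiB units
                    len(games),                  # CHR size in 8 KiB units
                    flags6, flags7]) + bytes(8)  # remaining header bytes zero

    prg_all = b"".join(prg for prg, _ in games)
    chr_all = b"".join(chr_data for _, chr_data in games)
    return header + prg_all + chr_all            # iNES order: all PRG, then all CHR


# Two blank placeholder "games", just to show the resulting layout.
demo = build_mapper11_rom([(bytes(PRG_BANK), bytes(CHR_BANK))] * 2)
print(len(demo), demo[:8].hex())
```

In a real build each pair would come from stripping the 16-byte header off an NROM image and splitting the rest at the 32 KiB mark.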

INL's extended mapper 11 gives you up to 16 32K PRG-ROM pages and 16 8K CHR-ROM pages with bus conflicts, but you don't usually need every combination of PRG and CHR accessible from every bank, so I use an indexed table of the actual values. Such a table can lie anywhere in ROM. So my setup is something like this:

cc65 cfg:

Code:
MEMORY {
    [...]
    RJM: start = $ffc0, size = $3a, file = %O, fill = yes;
    [...]
}

SEGMENTS {
    [...]
    ROMCHGR:  load = RJM,          type = rw;
    [...]
}


crt0.s
Code:
.segment "RODATA"
[...]
   ; This can be big so place here
   .include "bus_conflict_tbl.s"

.segment "ROMCHGR"
   _change_rom:
   lda #0
   sta PPU_MASK
   sta PPU_CTRL

   ldx $0300
   lda bus_conflict_tbl, x
   sta bus_conflict_tbl, x

   jmp start

_change_reg:
   ldx $0300
   lda bus_conflict_tbl, x
   sta bus_conflict_tbl, x
   rts


"bus_conflict_tbl.s" has the table with the PRG/CHR combinations I need in every PRG bank. _change_rom does the bank switch, then jumps to "start", which contains the initialization code. _change_reg just bankswitches, which is mostly used when you are only changing the paged CHR-ROM.

This is an example of "bus_conflict_tbl.s"
Code:
   ; This ROM pages in PRG0:CHR0, PRGD:CHRD or PRGB:CHRD

   bus_conflict_tbl:
      .byte $00, $DD, $DB
      


My initialization clears all RAM but a small section I use so the different ROMs can communicate. I perform a simple CRC-like check on these values to invalidate them. If an invalid combination is found, all banks page in PRG0:CHR0, which works great as the mapper doesn't guarantee an initial state.
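Why writing through the table is safe can be shown with a tiny model. The Python sketch below assumes the usual simplification that a bus conflict latches the bitwise AND of what the CPU writes and what the ROM drives at that address; real boards may behave less cleanly, so treat this as an approximation. `bus_conflict_write` is a hypothetical helper name.

```python
def bus_conflict_write(rom_byte, cpu_value):
    """Assumed model: the mapper latches the AND of both bus drivers."""
    return rom_byte & cpu_value


# Same values as the example bus_conflict_tbl above.
table = [0x00, 0xDD, 0xDB]

# "lda tbl,x / sta tbl,x": the ROM byte at the write address equals the
# value being written, so the AND changes nothing.
for value in table:
    latched = bus_conflict_write(rom_byte=value, cpu_value=value)
    print(f"write ${value:02X} over ${value:02X} -> latched ${latched:02X}")

# Writing to an arbitrary ROM address can corrupt the bank number.
print(f"write $DD over $0F -> latched ${bus_conflict_write(0x0F, 0xDD):02X}")
```

Because every bank carries the same table at the same address, the write lands on a matching byte no matter which bank is currently paged in.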
Re: Why not mapper 11 or 66 ?
by on (#237721)
rainwarrior wrote:
There's a whole bunch of pretty sensible discrete logic mappers. If you're asking why more people aren't using them, I'd actually say that the reason might just be that there's more discrete mappers than homebrew that needs them. :P


Were there any discrete-logic based mappers that provided an IRQ or other means to assist vertical screen splits? It would seem like one could get a mapper that combined CHROM banking with IRQ generation using only two common 74-series chips. Use a 74HC688 (8-bit identity comparator) to detect a PPU address of the form 01 111x xxx1 x00x and use that to gate an enable-controlled flip flop like the 74HC377 (a smaller one might also suffice) that captures bits 0 and 5-8 of the address. Feed the LSB of that flip flop to /IRQ, and the remaining bits to CHROM address lines.
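As a sanity check on the comparator idea, here is a small Python sketch that reads the pattern `01 111x xxx1 x00x` literally as a 14-bit PPU address with `x` meaning "don't care". It only does the bit matching; which tiles and sprite rows actually produce such addresses is left to the discussion. `compile_pattern` and `matches` are hypothetical names.

```python
def compile_pattern(pattern):
    """Turn a string of 0/1/x into (mask, value, width); spaces are ignored."""
    bits = pattern.replace(" ", "")
    mask = value = 0
    for ch in bits:
        mask <<= 1
        value <<= 1
        if ch in "01":
            mask |= 1            # this bit is compared
            value |= int(ch)     # against this fixed level
        elif ch != "x":
            raise ValueError(f"unexpected character {ch!r}")
    return mask, value, len(bits)


MASK, VALUE, WIDTH = compile_pattern("01 111x xxx1 x00x")


def matches(address):
    """True when every fixed bit of the pattern agrees with the address."""
    return (address & MASK) == VALUE


fixed_bits = bin(MASK).count("1")
free_bits = [bit for bit in range(WIDTH) if not (MASK >> bit) & 1]
print(f"mask=${MASK:04X} value=${VALUE:04X} fixed={fixed_bits} free={free_bits}")
print(matches(0x1E10), matches(0x1FF1), matches(0x0E10))
```

The pattern fixes exactly eight address bits, which is what one 8-bit comparator can check, and the free bits include bits 0 and 5-8 that the flip flop is meant to capture.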

If tiles use the lower address range, or if code avoids even-numbered background tiles $E0-$FE, background tile fetches would never enable the flip flop. The flip flop would be enabled, however, when accessing the first two lines of an 8x16 sprite using tile $F0-$FF. The first such fetch would turn on /IRQ and the second would automatically turn it off. The bottom four bits of the tile number would be loaded into four bits of the CHROM address, thus making it possible to cleanly split the screen between zones using different tile sets merely by placing a sprite at each zone boundary. If one of the sixteen tile banks was blank, that could easily be used to clean up the top and bottom edges of vertically-scrolling games that use vertical mirroring, even without the main CPU having to handle the IRQ. If a game used eight-way scrolling along with a "score" zone, it could start the frame with a blank CHROM bank selected, use a sprite to trigger a split just above where the score is supposed to appear, use a second sprite to trigger a split at the bottom of the score which would load scrolling registers, a third if needed at the point where the name table would need to wrap to avoid the score zone, and a fourth to switch back to the empty CHROM bank.

Alternatively, if a game doesn't use scrolling and avoids even-numbered background tiles $E0-$FE, one could use those tile numbers as the cue to switch banks, again with the CPU either using or ignoring the IRQ as appropriate for the application.

While this approach would require that one forego using part of the tile set, the restriction would seem less severe than that imposed by some other mappers that require that all sprites be placed in the opposite side of the address space from background tiles.
Re: Why not mapper 11 or 66 ?
by on (#237723)
supercat wrote:
Were there any discrete-logic based mappers that provided an IRQ or other means to assist vertical screen splits?
Plenty of pirate mapper hacks used fixed, some-power-of-two CPU cycle IRQs with discrete logic, usually a CD4020. Extremely few were clocked off the PPU (example), and especially no historical mapper ever used a "catch exactly one address" to trigger an IRQ. I think it was just too expensive in parts (for a discrete logic implementation) or in package pins (for an ASIC) to be justifiable in comparison to other implementations.
Quote:
Use a 74HC688 (8-bit identity comparator)
Those aren't particularly affordable... much like the 74'670 that showed up in a bunch of historical unlicensed discrete logic mappers, programmable logic is just so much cheaper now that many discrete logic parts don't make sense.
Quote:
a PPU address of the form 01 111x xxx1 x00x
Note that any sprite slot not used on any given scanline out of the limit of 8 instead fetches from tile $FF. I don't know which row of that tile.

Quote:
less severe than that imposed by some other mappers that require that all sprites be placed in the opposite side of the address space from background tiles.
That's really not as big of a benefit as you might think; the PPU makes it less useful to use the same side for both because you don't get to pick and choose from tile to tile. Plus the most conspicuous example of this constraint - MMC3 - already has more sophisticated banking available underneath, largely obviating this utility.
Re: Why not mapper 11 or 66 ?
by on (#237724)
Mapper 163 can be made to auto-switch between two 4 KiB CHR banks by looking at PA9 during nametable reads. Mapper 518 can be made to auto-switch between two 4 KiB CHR banks by looking at PA10 or PA11 during nametable reads. But both are globtop ASICs, not discrete mappers.
Re: Why not mapper 11 or 66 ?
by on (#237725)
lidnariq wrote:
supercat wrote:
Were there any discrete-logic based mappers that provided an IRQ or other means to assist vertical screen splits?
Plenty of pirate mapper hacks used a fixed some-power-of-two CPU cycle IRQs with discrete logic – usually a CD4020. Extremely few were clocked off the PPU (example), and especially no historical mapper ever used a "catch exactly one address" to trigger an IRQ.


How did such mappers control when the counters ran so as to allow them to generate interrupts at the proper places?

Quote:
Use a 74HC688 (8-bit identity comparator)
Those aren't particularly affordable... much like the 74'670 that showed up in a bunch of historical unlicensed discrete logic mappers, programmable logic is just so much cheaper now that many discrete logic parts don't make sense.

Digi-key shows $0.75 or so in onesies, or $0.30 in quantity 2000. One could replace the 74HC688 with a 13-input NAND and a few inverters, if desired, and I agree programmable logic is also often a good way of doing things. I was more interested in whether anyone had used any sort of address-triggered approach, than in the exact chips they might have used.

Quote:
a PPU address of the form 01 111x xxx1 x00x
Note that any sprite slot not used on any given scanline out of the limit of 8 instead fetches from tile $FF. I don't know which row of that tile.

I hadn't thought of that, but that might be a problem with using that particular tile range. Using a different tile range would alleviate the issue in any case. What do you think of the concept from an ease-of-programming standpoint? If nobody's implemented such a thing yet, would you see any particular problem? Incidentally, I'm new to the NES, but I've done some rather interesting 6502 mappers for the Atari 2600. One of my favorites simplifies the act of plotting a pixel at coordinate (x,y) of a 96x192 high-res screen to--quite literally:

lda $7F00,x ; Fetch bitmask and set bank of $7E00 as appropriate for this vertical stripe
ora $7E00,y
sta $7E00,y

That mapping scheme required a Xilinx XC9536XL, but ended up making many things more convenient on the Atari's 13-bit address space than they would have been on a straight linear-mapped 16-bit address space. One of my ambitions is to port (and finish) Ruby Runner, a game shown below for the 2600 which used the above mapper (not for high-res graphics, though, but to expedite the loop that had to fetch every tile in the level and decide what to do with it). I'd like to do smooth vertical and horizontal scrolling, but that would pose some interesting challenges given the need to redraw all of the tiles every "gametick" (four animation frames). While I might use a sprite for the player, everything else would be background tiles.

Image

Quote:
less severe than that imposed by some other mappers that require that all sprites be placed in the opposite side of the address space from background tiles.
That's really not as big of a benefit as you might think; the PPU makes it improbable you'd want to use the same side for both because you don't get to pick and choose from tile to tile. Plus the most conspicuous example of this constraint - MMC3 - already has more sophisticated banking available underneath, largely obviating this utility.

When using 8x16 sprite mode, 128 tiles will be in one bank and 128 in the other. So games needing ready access to more than 128 sprites might need to take them from both halves of the address space.

In any case, my question was whether a simple banking approach could minimize the amount of circuitry required for raster effects.
Re: Why not mapper 11 or 66 ?
by on (#237726)
My ghostbusters romhack became mapper 66 when the rom was expanded by NewRisingSun for the new dpcm.

I'm going to say that I don't know how to choose a mapper. I wish there was some program where we could check or choose mapper features and then it would tell you which mappers matched those features. It's pretty tough to read up on 500 mappers to find out which ones are similar to what you're using or to even know what's possible with current mappers.
Re: Why not mapper 11 or 66 ?
by on (#237727)
supercat wrote:
How did such mappers control when the counters ran so as to allow them to generate interrupts at the proper places?
They're not free-running. They usually are something like "latch output connected to 4020 asynchronous clear; some high bit of counter connected via inverter to /IRQ".

Since 1024 cycles is almost exactly 9 scanlines (plus 3 pixels) it's not too bad of a constraint.

Licensed PAL (2A07) is a lot less convenient, but I don't know how many unlicensed games showed up in those regions for play on the 2A07 instead of a Famiclone.
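The "9 scanlines plus 3 pixels" figure is easy to check. The Python sketch below assumes the usual timing constants: 341 PPU dots per scanline, 3 dots per CPU cycle on NTSC (2A03) and 3.2 dots per CPU cycle on PAL (2A07). `cycles_to_scanlines` is a hypothetical helper.

```python
from fractions import Fraction

DOTS_PER_SCANLINE = 341


def cycles_to_scanlines(cpu_cycles, dots_per_cpu_cycle):
    """Return (whole scanlines, leftover dots) covered by a CPU cycle count."""
    dots = cpu_cycles * dots_per_cpu_cycle
    return divmod(dots, DOTS_PER_SCANLINE)


ntsc_lines, ntsc_dots = cycles_to_scanlines(1024, Fraction(3))
pal_lines, pal_dots = cycles_to_scanlines(1024, Fraction(16, 5))

print(f"NTSC: {ntsc_lines} scanlines + {float(ntsc_dots)} dots")
print(f"PAL:  {pal_lines} scanlines + {float(pal_dots)} dots")
```

On NTSC the leftover is only 3 dots, so a 1024-cycle counter lands almost exactly on a scanline boundary. On PAL the same counter ends more than half a scanline past the ninth line, which is the inconvenience mentioned above.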

Quote:
I was more interested in whether anyone had used any sort of address-triggered approach, than in the exact chips they might have used.
Not beyond the MMC2/MMC4 tile-based bankswitching.

The logic you suggest fits nicely inside a GreenPAK ... could just check for the top two scanlines of one tile. Say $FE, to be like MMC2.

That said, I don't usually look as high as qty 2k when I'm eyeballing BOM prices for NES things. I don't think very many games sell more than a couple hundred.

Quote:
What do you think of the concept from an ease-of-programming standpoint? If nobody's implemented such a thing yet, would you see any particular problem?
It's "easier" than most IRQs in that you just have to make sure to put a sprite in the right place. But...

Each IRQ per retrace uses up one of the not-very-many 64 entries
You have to make sure there's no chance of it getting bumped out of the 8-per-scanline limit, and it consumes some of the very limited overdraw if you do.

I personally think those compromises make it kinda tough to swallow. I think I'd prefer MMC3, even with the flaw you've identified.

Quote:
Incidentally, I'm new to the NES, but I've done some rather interesting 6502 mappers for the Atari 2600.
I see your name in the credits for the Harmony Cart. I'm confident you have interesting thoughts to share :)

Quote:
I'd like to do smooth vertical and horizontal scrolling, but that would pose some interesting challenges given the need to redraw all of the tiles every "gametick" (four animation frames).
The NES answer is "just use CHR bankswitching for that".

Quote:
When using 8x16 sprite mode, 128 tiles will be in one bank and 128 in the other. So games needing ready access to more than 128 sprites might need to take them from both halves of the address space.
But you can only display 128 total tiles of sprites in 8x16 at a time anyway. And your overdraw is very limited - a lot of games used 8x8 sprites even though that consumed more OAM entries, because it both reduced the scanlines of overdraw constraint and because it means that there aren't tons of empty spots in the CHR table.

There aren't a ton of circumstances where MMC3 CHR bankswitching is available and you need many different possible sprites and you don't want to set CHR banks to overlap where they refer to in CHR. The only one that comes to mind is deliberately only using three 1 KiB banks for backgrounds, and using the last 1 KiB plus the other two 2KiB ... but that's kinda contrived. It'd be better just to use a mapper that actually gave you eight 1 KiB banks instead, like VRC4 or RAMBO-1.

Any mapper that supports CHR bankswitching comes extremely close to making two bits in $2000 irrelevant.

Quote:
In any case, my question was whether a simple banking approach could minimize the amount of circuitry required for raster effects.
Maybe? I mean, it lets you avoid having the counter on the cart, because it's hidden in the OAM evaluation. But it's not usually the counter that's the hard part.

nesrocks wrote:
I'm going to say that I don't know how to choose a mapper. I wish there was some program where we could check or choose mapper features and then it would tell you which mappers matched those features.
I have a table, but it might be too full of jargon.

Tepples had written a selector, but it was written before various modern homebrew designs hit mass production, so I don't know if he'd revise that now.
Re: Why not mapper 11 or 66 ?
by on (#237729)
I wrote Mapper wizard back when discretes (0, 2, 3, 7, 34, 180) and MMC1 were the only mappers you could get as all new parts, and before retroUSB discontinued the ReproPak.
Re: Why not mapper 11 or 66 ?
by on (#237738)
I can agree that UxROM is very attractive to jump to after messing around with NROM. If you look at it, some of the early "great games" of the system used it, like Castlevania or Mega Man. It's much easier to get into at first, since you don't have to worry about any kind of volatile trampoline routine to avoid "pulling the rug" from under your program. You can always stash another "I don't wanna do math" lookup table in that 16k, or samples, or whatever else you might need at any time.

But then again, I'm saying that while my current project uses BxROM (1-bit register for 2x32k PRG, 8k CHR-ROM that is not switched).
Re: Why not mapper 11 or 66 ?
by on (#237747)
lidnariq wrote:
Quote:
I'd like to do smooth vertical and horizontal scrolling, but that would pose some interesting challenges given the need to redraw all of the tiles every "gametick" (four animation frames).
The NES answer is "just use CHR bankswitching for that".


CHR bank switching will suffice for three out of four animation frames. If the fourth animation frame of one gametick had three tiles stacked in the pattern (top down) "rock leaving", "rock entering", and "empty", then the top tile will need to switch from the last frame of "rock leaving" to "empty", the middle tile will need to switch from the last frame of "rock entering" to the first frame of "rock leaving", and the bottom tile will need to switch from "empty" to the first frame of "rock entering". Even if only half of the name-table entries would need to change, I would think that updating 256 entries per frame using something like:

lda $C0
sta $2007
eor #$01
sta $2007
lda $C1
sta $2007
eor #$01
sta $2007
...
lda $C0
eor #$02
sta $2007
eor #$01
sta $2007
lda $C1
eor #$02
sta $2007
eor #$01
sta $2007
....

would take less time spread over the course of three frames to update a 32x24 bunch of tiles (in an off-screen nametable) than would be needed to selectively update half the tiles in such a table, especially since the latter approach would require that every tile actually get updated twice (if table 0 is showing and 1 is offscreen, it would need to be updated in table 1 which could then be brought to the front, but would then need to be written in table 0).

A mapper which could include some dual-port storage (interleaving PPU and CPU cycles) could make things much more convenient, but might be seen as cheating even though some FPGAs which include 7K of RAM cost less than $2. Unfortunately, those parts all have evil packages, and have inputs that are 3.3V tolerant but not 5V tolerant.

Is there any way to use burst DMA to update anything other than OAM entries? If I were using a mapper with RAM at $6000, burst DMA would seem like it could ease display updates, though I think putting 64 meta-tiles in zero page each frame and then using code to copy them to 256 nametable entries would probably be adequate.
Re: Why not mapper 11 or 66 ?
by on (#237750)
supercat wrote:
Digi-key shows $0.75 or so in onesies, or $0.30 in quantity 2000. One could replace the 74HC688 with a 13-input NAND and a few inverters, if desired, and I agree programmable logic is also often a good way of doing things. I was more interested in whether anyone had used any sort of address-triggered approach, than in the exact chips they might have used.


There was the MMC2 and MMC4, in 2011 I also prototyped a mapper called 8T-ROM which used the 13-input NAND plus an XC9536XL. I was trying to set it up so certain tile numbers can trigger bankswitching and IRQ at the same time. I don't clearly remember there being any technical reason that stopped development, mostly lack of time for the hobby around the time (had a baby). The mapper is in a kind of development hell where I still think it's neat, but I think I have better ways of doing stuff, between GreenPAKs and FPGAs.

I think it watched tiles $FC through $FF and I was wanting it to do multiple CHR switches per line. With the special tiles being the borders, you could make a window.

For IRQ use, one could make sprite #0 use tile $FE, it beats polling for sprite #0 hit, but it's not an ideal IRQ source. I think it will trigger during hblank, so you may have to wait until the next hblank. Easy to use, but I don't think it's much to get excited about, since most of the interest in IRQs is for multiple splits per screen. You could put multiple sprites with that tile, but it will be triggered 8 times per sprite.. if that could be pre-scaled by 8, so it could be triggered only once per sprite, that would be an improvement.
Re: Why not mapper 11 or 66 ?
by on (#237756)
lidnariq wrote:
That said, I don't usually look as high as qty 2k when I'm eyeballing BOM prices for NES things. I don't think very many games sell more than a couple hundred.


The main screen showed 1pc and 2kpc prices. Intermediate quantities sell for intermediate prices. It looks like 100pc price is $0.42; not the cheapest part in the universe, but not outrageous. My thought of using a 13-input NAND and some inverters seems bizarrely worse, since the cheapest 13-input NAND is over $2 at quantity 1000(!?). In any case, I'll look at MMC2 and MMC4 to see what they do.

Quote:
Quote:
What do you think of the concept from an ease-of-programming standpoint? If nobody's implemented such a thing yet, would you see any particular problem?
It's "easier" than most IRQs in that you just have to make sure to put a sprite in the right place. But...

Each IRQ per retrace uses up one of the not-very-many 64 entries
You have to make sure there's no chance of it getting bumped out of the 8-per-scanline limit, and it consumes some of the very limited overdraw if you do.


If one puts the necessary sprites at the start of the list I think that would protect them against being bumped. In most situations where I would think one might want to switch many times per frame, background tiles could be used for that (e.g. games that are laid out in 16x16 metatiles could switch back and forth between two tile sets at the start of each name table row, thus allowing code to store the same data into even and odd rows of the nametable, while almost doubling the number of usable tiles).

Quote:
I personally think those compromises make it kinda tough to swallow. I think I'd prefer MMC3, even with the flaw you've identified.


No mapper can be expected to be optimal for every game. I was pondering what kind of discrete-logic mapper would best suit the needs of Ruby Runner. If I extended things out to use a CPLD and an on-cart nametable RAM, I could make things work even better by having one of the latchable bits gate address bit 5 of the nametable RAM, thus eliminating the need to write half the rows.

BTW, one thing I've almost never seen 6502 mappers other than mine do on any platform, even though it makes things very convenient and efficient from a programming standpoint, is have 256-byte regions that can be mapped on 256-byte boundaries. Having a piece of code treat a data structure as occupying 128 banks of 256 bytes each is simpler than trying to have it treat data as two banks of 64 pages of 256 bytes each. My 2600 banking scheme only had one fine-control 256-byte region, but IIRC worked around that by making any CPU cycle in the address range $00FC to $00FF configure the fine-control region to access the page of RAM indicated by the byte that was read, and $00F8-$00FB do likewise selecting flash. Thus, one way to copy 256 bytes from one arbitrary page of RAM to another would be:

Code:
    ldy #127
loop:
    bit $FC   ; Assumes source page is stored in $FC
    lda $7E00,y
    ldx $7E80,y
    bit $FD   ; Assume destination page is stored in $FD
    sta $7E00,y
    txa
    sta $7E80,y
    dey
    bpl loop

A total cost of 3+4+4+3+5+2+5+2+3 = 11+10+10 = 31 cycles per 2 bytes. The use of absolute indexed addressing saves a cycle compared with indirect indexed, partially recouping the cost of bank switching. If I had enough logic to hold two page selections in my mapper, things could have been a bit better:

Code:
    ldy #63
loop:
    lda $7E00,y
    sta $7F00,y
    lda $7E40,y
    sta $7F40,y
    lda $7E80,y
    sta $7F80,y
    lda $7EC0,y
    sta $7FC0,y
    dey
    bpl loop
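
As a rough check on the two loops above, the sketch below tallies per-byte cost from standard NMOS 6502 cycle counts. It assumes no page crossings (which holds for the index ranges used) and ignores the final not-taken branch; the loop names are only labels for this illustration.

```python
# Standard NMOS 6502 cycle counts, assuming no page crossing on indexed loads.
CYCLES = {
    "bit zp": 3,
    "lda abs,y": 4,
    "ldx abs,y": 4,
    "sta abs,y": 5,   # indexed stores always take 5 cycles
    "txa": 2,
    "dey": 2,
    "bpl taken": 3,
}

# First loop: one fine-control page, re-pointed twice per pass (2 bytes per pass).
single_page_loop = ["bit zp", "lda abs,y", "ldx abs,y", "bit zp",
                    "sta abs,y", "txa", "sta abs,y", "dey", "bpl taken"]

# Second loop: two independently mapped pages (4 bytes per pass).
dual_page_loop = ["lda abs,y", "sta abs,y"] * 4 + ["dey", "bpl taken"]


def cycles_per_byte(loop, bytes_per_pass):
    """Total cycles for one pass of the loop divided by bytes copied."""
    return sum(CYCLES[op] for op in loop) / bytes_per_pass


single = cycles_per_byte(single_page_loop, 2)   # 31 cycles per 2 bytes
dual = cycles_per_byte(dual_page_loop, 4)       # 41 cycles per 4 bytes
print(single, dual)
```

Under those assumptions the two-page version costs about two thirds as much per byte as the single-page version.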

Achieving the same performance in a linear-address system would require using self-modifying code, which would add the overhead of setting up the function in RAM. When using fine-grained bank selection, such issues go away and the above code could execute out of ROM with no difficulty. Have you seen any Nintendo mappers based on page-level banking?
Re: Why not mapper 11 or 66 ?
by on (#237759)
supercat wrote:
[a fully unrolled LDA #immed / STA $2007 copy] would take less time spread over the course of three frames to update a 32x24 bunch of tiles (in an off-screen nametable) than would be needed to selectively update half the tiles in such a table, especially since the latter approach would require that every tile actually get updated twice (if table 0 is showing and 1 is offscreen, it would need to be updated in table 1 which could then be brought to the front, but would then need to be written in table 0).
I mean, sure, if you insist on designing your game to rely on being able to brute-force update the entire visible game state on every 4th frame, you can design a cart that has enough RAM to support it. But there was a licensed port of Boulder Dash published on the NES and it doesn't use anything resembling such heroics.

Most of the time, a design that uses the two nametables as a means of double-buffering is trying to make the NES act like some other console rather than work within the limited bandwidth.

And to be fair, when I optimized Driar from its original SGROM release down to NROM, I did something similar, using 1K of the CPU's RAM to hold fully-unrolled copying code to do updates to nametables, to work around no longer having meaningful CHR bankswitching.

Quote:
A mapper which could include some dual-port storage (interleaving PPU and CPU cycles) could make things much more convenient, but might be seen as cheating even though some FPGAs which include 7K of RAM cost less than $2. Unfortunately, those parts all have evil packages, and have inputs that are 3.3V tolerant but not 5V tolerant.
Unfortunately those cheaper ones might not have enough I/O pins. If you're willing to limit it to just licensed NESes (no famiclones) and are willing to guess where the ALE cycles are to demultiplex the PPU's address bus and are willing to make the CPU side interface a PITA, you need at least 10(PPU A9, A8, PPU AD7..0)+2(PPU /RD, PPU /WR)+8(CPU D7..0)+2(CPU A0,1)+2(M2,R/W)=24 IO pins. While there are iCE40UL parts in that range, one'd probably prefer to have all the CPU/PPU address/data pins to make the programmer's life less miserable. Which gets us back to the iCE40xx1K parts. At least some come in a TQFP...

Also, personally, I kinda think using an FPGA as only a dual ported RAM is a waste of an FPGA.

Quote:
Is there any way to use burst DMA to update anything other than OAM entries?
Nope.

You can map your own cart device to additionally listen to writes to $2004, but that's it.

supercat wrote:
It looks like 100pc price is $0.42; not the cheapest part in the universe, but not outrageous. My thought of using a 13-input NAND and some inverters seems bizarrely worse, since the cheapest 13-input NAND is over $2 at quantity 1000(!?).
Waaacky. Looks like everyone's trying to clear out their inventory; the only outfit with a low price is Rochester Electronics, for the LS version of the part, and only as a "you buy our remaining inventory".

Quote:
Having a piece of code treat a data structure as occupying 128 banks of 256 bytes each is simpler than trying to have it treat data as two banks of 64 pages of 256 bytes each.
I must be missing something... how does being able to bankswitch on A8 and up help with this particular transformation?

Quote:
Have you seen any Nintendo mappers based on page-level banking?
No licensed mappers used anything finer than 8 KiB. And to the best of my knowledge, the finest banking seen in any pirate mapper hack is 1KiB.
Re: Why not mapper 11 or 66 ?
by on (#237762)
lidnariq wrote:
I mean, sure, if you insist on designing your game to rely on being able to brute-force update the entire visible game state on every 4th frame, you can design a cart that has enough RAM to support it. But there was a licensed port of Boulder Dash published on the NES and it doesn't use anything resembling such heroics.


I've not played the NES Boulderdash nor seen any frame-accurate recordings of it. Does it ensure that all displayed tiles get updated on the same frame? Because the tile cycling in Boulderdash doesn't involve motion between tiles, it probably wouldn't matter visually if some tiles were updated on one frame and some were updated on the next, so I'd guess that's probably what happens. Achieving the smoother motion shown in the Ruby Runner .gif I posted would require, however, that all tile updates occur synchronously with the switch from the last tile set to the first. I think that would probably be achievable even using a basic CNROM cart, but if other mappers could make it easier that would be nice to know.

Quote:
Most of the time, a design that uses the two nametables as a means of double-buffering is trying to make the NES act like some other console rather than work within the limited bandwidth.


I'd say that would depend on whether the desired play mechanic could be achieved better some other way on the NES. I think the NES hardware would seem like an excellent fit for Ruby Runner save for the difficulties updating name-table RAM, and even with those difficulties I would think it would be workable.

Quote:
And to be fair, when I optimized Driar from its original SGROM release down to NROM, I did something similar, using 1K of the CPU's RAM to hold fully-unrolled copying code to do updates to nametables, to work around no longer having meaningful CHR bankswitching.

Not familiar with that game.

Quote:
Quote:
A mapper which could include some dual-port storage (interleaving PPU and CPU cycles) could make things much more convenient, but might be seen as cheating even though some FPGAs which include 7K of RAM cost less than $2. Unfortunately, those parts all have evil packages, and have inputs that are 3.3V tolerant but not 5V tolerant.
Unfortunately those cheaper ones might not have enough I/O pins. If you're willing to limit it to just licensed NESes (no famiclones) and are willing to guess where the ALE cycles are to demultiplex the PPU's address bus and are willing to make the CPU side interface a PITA, you need at least 10(PPU A9, A8, PPU AD7..0)+2(PPU /RD, PPU /WR)+8(CPU D7..0)+2(CPU A0,1)+2(M2,R/W)=24 IO pins. While there are iCE40UL parts in that range, one'd probably prefer to have all the CPU/PPU address/data pins to make the programmer's life less miserable. Which gets us back to the iCE40xx1K parts. At least some come in a TQFP...


The I/O requirements for main CPU interfacing could be reduced by 5 if one adds a 74HC299 universal shift register (reads and writes will be separated by at least 3 main-CPU clocks that don't read or write the register, giving the FPGA enough time to get data to/from the shift register).

As I think about it, though, I wonder if the best way to make a cheap but versatile Nintendo cart might be to adapt the same approach used by the Atari 2600 melody cart, using one 70MHz ARM7TDMI or similar device on each bus, and maybe running an SPI port between them.

Quote:
Quote:
Is there any way to use burst DMA to update anything other than OAM entries?
Nope.

You can map your own cart device to additionally listen to writes to $2004, but that's it.


That seems like a missed opportunity in the NES design. If the same 6502 address had been used for OAM and PPU data, with the set-address write selecting which kind of data would be written, that would have freed up a 6502 address while also enhancing the usefulness of DMA. Oh well.

Quote:
Quote:
Having a piece of code treat a data structure as occupying 128 banks of 256 bytes each is simpler than trying to have it treat data as two banks of 64 pages of 256 bytes each.
I must be missing something... how does being able to bankswitch on A8 and up help with this particular transformation?


Because the upper byte of the 6502 address will be constant.

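If one has a 64KiB data structure on a cart starting at address $010000 which is using an 8K banked region from $8000-$9FFF, and wants to fetch a byte given at offset X:Y, the required code would be something like:

Code:
    txa             ; X holds bits 8-15 of the offset
    lsr
    lsr
    lsr
    lsr
    lsr             ; top three bits of X select the 8K bank
    ora #8          ; $010000 / $2000 = bank 8
    sta $8000       ; write the bank-select register
    txa
    and #$1F        ; low five bits of X select the page within the bank
    ora #$80        ; banked window starts at $8000
    sta temp+1
    lda #0
    sta temp
    lda (temp),y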


as compared with something like:

Code:
    stx $FC ; Set bits 8-15 of address for $7C00-$7CFF region
    lda $7C00,y ; uses LSB of address, plus last value accessed at $FC, plus $010000.


One could replace the shifts in the first example with a table lookup, but I think the second would still seem a lot easier. The "normal" banking approach would require that the offset be split into an 8-bit part, a 5-bit part, and a 3-bit part, rather than simply being kept as two eight-bit parts. The page-level granularity could be especially useful if one had multiple adjacent banking regions. If one wanted to load x, y, and a with three consecutive bytes at an offset specified by x:y, the code could be something like:

Code:
    stx $FC ; Set bits 8-15 of address for $7C00-$7CFF region
    inx
    stx $FD ; Set bits 8-15 of address for $7D00-$7DFF region
    ldx $7C00,y
    lda $7C01,y
    sta temp
    lda $7C02,y
    ldy temp

Note that this code will work even if the object crosses a page boundary. Compare that to what would be needed to fetch three consecutive bytes using normal banking if one had to allow for the possibility of crossing a block boundary.
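
The two addressing schemes compared above can be modelled in a few lines. This is only a sketch: it assumes the 8 KiB window sits at $8000-$9FFF, that bank numbers count 8 KiB units from the start of cart ROM, and that the page-mapped window is at $7C00 as in the examples; the function names are made up for illustration.

```python
BASE = 0x010000  # start of the 64 KiB structure in cart ROM (from the post)


def coarse_8k(x, y):
    """8 KiB banking: X splits into a 3-bit bank part and a 5-bit page part."""
    bank = (x >> 5) | 8               # bank 8 is $010000 / $2000
    ptr_hi = (x & 0x1F) | 0x80        # window assumed at $8000-$9FFF
    cpu_addr = (ptr_hi << 8) + y      # what an indirect indexed load would hit
    rom_addr = bank * 0x2000 + (cpu_addr - 0x8000)
    return cpu_addr, rom_addr


def page_level(x, y):
    """256-byte banking: X is used whole as the page number."""
    cpu_addr = 0x7C00 + y             # high byte of the CPU address never changes
    rom_addr = BASE + (x << 8) + y
    return cpu_addr, rom_addr


# Both schemes reach the same ROM byte for every 16-bit offset X:Y.
same = all(coarse_8k(x, y)[1] == page_level(x, y)[1]
           for x in range(256) for y in range(256))
print(same)
```

The page-level version needs no shifting or masking because the CPU-side address is fixed and only the mapper's page register changes.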

Quote:
Quote:
Have you seen any Nintendo mappers based on page-level banking?
No licensed mappers used anything finer than 8 KiB. And to the best of my knowledge, the finest banking seen in any pirate mapper hack is 1KiB.


Bummer. Page-mapped regions are really nice to work with.
Re: Why not mapper 11 or 66 ?
by on (#237767)
supercat wrote:
Does it ensure that all displayed tiles get updated on the same frame?
I can't entirely tell, but it's definitely using the standard NES thing of keeping track of which tiles need to be updated and only updating those tiles. It's on MMC1, and CHR bankswitching is used extensively – possibly to mask tearing – but it's subtle enough I can't tell. No extra RAM.

Here's a longplay: https://www.youtube.com/watch?v=mLQzL8vsNVM

Quote:
Not familiar with [Driar].
Single-screen collection platformer.
Original
my NROM optimization.

Quote:
As I think about it, though, I wonder if the best way to make a cheap but versatile Nintendo cart might be to adapt the same approach used by the Atari 2600 melody cart, using one 70MHz ARM7TDMI or similar device on each bus, and maybe running an SPI port between them.
I've honestly been wondering why I haven't seen anyone do anything like the Harmony Cart on the NES. Is that extra 600kHz too much? Memory limits? Incompatible with existing library? Two independent buses?

Hard part is not just ending up with something ridiculous like this.

Quote:
[Not being able to move the DMA target] seems like a missed opportunity in the NES design.
Yeah...

Previous times I thought someone had said that the 2C02 can't keep up with OAM DMA's pace, and needed no faster than one byte every 3 CPU cycles. But right now, testing in Visual2C02 seems to imply it works?

Quote:
as compared with something like:
Code:
    stx $FC ; Set bits 8-15 of address for $7C00-$7CFF region
    lda $7C00,y ; uses LSB of address, plus last value accessed at $FC, plus $010000.
Ah, yes. I misunderstood. All clear. Somehow I'd misunderstood you to be talking about blocks of 128 bytes.
Re: Why not mapper 11 or 66 ?
by on (#237786)
lidnariq wrote:
supercat wrote:
Does it ensure that all displayed tiles get updated on the same frame?
I can't entirely tell, but it's definitely using the standard NES thing of keeping track of which tiles need to be updated and only updating those tiles. It's on MMC1, and CHR bankswitching is used extensively – possibly to mask tearing – but it's subtle enough I can't tell. No extra RAM.

Here's a longplay: https://www.youtube.com/watch?v=mLQzL8vsNVM


Some interesting graphics changes compared to the C64 and Atari versions; I notice they lost the bonus levels that were 20 meta-tiles wide and designed to fit on a single screen, and also run at a much faster gametick rate than normal levels.

Looking at the video, it appears that, as with the original, the CHR bank switching is used to cycle among tile sets to animate everything in a fashion asynchronous to gameplay, to do things like make the diamonds sparkle or (in this version) make the rocks rock back and forth. The entire nametable is getting redrawn every gametick over the course of several frames. Single-step through the video while watching the left and right edges of the screen during horizontal scrolling, and this effect will be visible.

Since Boulderdash tile-set animation is done asynchronously with regard to gameplay, it doesn't really matter if all of the tiles update at once. Ruby Runner, though, uses tile-set animation to smooth out motion and create "in-between" frames that must be synchronized with nametable updates. Since even a stock NES has two frames worth of tiles in the nametable, I don't think page-flipping should pose any difficulty; I'm curious why you think that's symptomatic of using a "wrong" approach.

My objective is to make Ruby Runner have a play mechanic similar to Boulderdash, but not exactly copying it (you'll notice, for example, that the monsters in the animated .GIF move straight ahead if they can, while the Boulderdash monsters either follow the left wall or right wall), but with all of the objects animated to move smoothly, and also hopefully without any display glitches like the sides of the Boulderdash screen. Being able to draw all of the nametable tiles in a single frame would be convenient because it would allow the game logic to compute everything that will happen in a game tick if the player were to remain stationary, then draw all of the nametable entries, wait for the frame cycling animations, wait for the frame before the first animation frame of the next game tick, and read the controller for what should happen on that game tick. The game logic would thus need to synchronize with video only once per game tick, and the game could run smoothly provided only that each gametick's worth of game logic was complete before it was time to show the first frame of the next game tick.

Being limited to updating a quarter of the name table per frame would require either having the game logic for each gametick finish four frames early, or else having a means of starting the name table updates before the game logic is done. The first may or may not adversely affect gameplay; the latter would add complexity.

My guess would be that if I limit boards to about the same size as Boulderdash I could probably get away with the first approach, but if I use an 8K RAM memory expansion so as to allow larger boards, adding four extra frames per game tick could be annoying.

Quote:
Quote:
[Not being able to move the DMA target] seems like a missed opportunity in the NES design.
Yeah...

Previous times I thought someone had said that the 2C02 can't keep up with OAM DMA's pace, and needed no faster than one byte every 3 CPU cycles. But right now, testing in Visual2C02 seems to imply it works?


I wonder why the speed would be so limited, given that the PPU bus normally runs so much faster than that? Perhaps the NES was originally planned to have the CPU run much faster?

There's a lot of really good stuff in the NES design, but a few missteps with how things fit together. The biggest omission, IMHO, is probably the lack of any on-chip way of requesting an interrupt at a certain line, or at least finding out where the beam is. Ruby Runner on the 2600 didn't have any raster interrupts available to it, but it was able to find out how much time remained before the end of overscan or vblank, run the game processing loop until those times were close to used up, and then go into a polling loop to find the exact ends of those intervals. Having an address which, when read, would report half the number of the current scan line would have been enormously useful, and being able to make the NMI trip at a configurable line would have been even moreso. Starting blanking early and extending the end of it would have made it possible for games that need to perform more updates during vblank to actually do so.

Quote:
Quote:
as compared with something like:
Code:
    stx $FC ; Set bits 8-15 of address for $7C00-$7CFF region
    lda $7C00,y ; uses LSB of address, plus last value accessed at $FC, plus $010000.
Ah, yes. I misunderstood. All clear. Somehow I'd misunderstood you to be talking about blocks of 128 bytes.


So how do you like that idea now that you understand it? Having larger contiguous regions banked in is useful for running code, or for objects that are going to be accessed via absolute indexed addressing modes, but objects that would need to be accessed via indirect indexed addressing modes when using large-bank switching can be accessed more conveniently using absolute indexed addressing mode and page-level switching.
Re: Why not mapper 11 or 66 ?
by on (#237792)
Quote:
Some interesting graphics changes compared to the C64 and Atari versions; I notice they lost the bonus levels that were 20 meta-tiles wide and designed to fit on a single screen
NES only has 32 tiles / 16 attributes on a screen, unlike the C64's 40. I suppose they could have retained them, if scrolling were acceptable... probably not.

supercat wrote:
The entire nametable is getting redrawn every gametick over the course of several frames. Single-step through the video while watching the left and right edges of the screen during horizontal scrolling, and this effect will be visible.
No, that's not the same thing ... that's how all games on the NES have to do scrolling if the entire level doesn't fit in the available nametables. See the page on our wiki: nesdevwiki:File:NTS scrolling seam.gif.

Or look at the game in Mesen with the "PPU viewer" enabled. (Maybe also the "Event viewer").

Quote:
Since even a stock NES has two frames worth of tiles in the nametable, I don't think page-flipping should pose any difficulty; I'm curious why you think that's symptomatic of using a "wrong" approach.
Because when you repurpose the PPU's scrolling registers to act as double-buffering, it means you can't use the NES's scrolling hardware, which was the thing that made the NES meaningfully different from its predecessors. (There'd been consoles with tilemaps before. The C64 had sub-tile scroll. But the Famicom was the first widespread commercial device to allow both sub-tile and tile-level scrolling and enough tilemap memory for that to be useful)

Now, that said, a few mappers instead let the mapper IC control which of the two nametables are being used at any given moment. It turns out that this commercial release of Boulder Dash actually runs in this 1-screen mode, which is why you saw scrolling seams on all edges.

And in the case of the commercial release, which needs 1KiB just to hold the level state (look in memory from $3E0 to $74F), they couldn't justify the cost of an extra RAM just for an unrolled copy.

Quote:
I wonder why the speed would be so limited, given that the PPU bus normally runs so much faster than that? Perhaps the NES was originally planned to have the CPU run much faster?
It's an entirely independent FSM. If you try to read or write to $2007 during rendering the result on the outputs will be some combination of the two FSMs, smearing data and address across itself as ALE and /WR or /RD are active at the same time.

Quote:
There's a lot of really good stuff in the NES design, but a few missteps with how things fit together.
I'd say a lot more than a few. The original design would have at least had a programmable interval timer, but it was defective and removed instead of fixed in later silicon versions.

Quote:
Ruby Runner on the 2600 didn't have any raster interrupts available to it, but it was able to find out how much time remained before the end of overscan or vblank, run the game processing loop until those times were close to used up, and then go into a polling loop to find the exact ends of those intervals.
Sure? But the 2600 has any timers at all. The NES just lets you misuse the DAC FIFO empty IRQ...

Quote:
So how do you like that idea now that you understand it?
It is a delightful gem to work around the slowness of the indirect modes.
Re: Why not mapper 11 or 66 ?
by on (#237793)
lidnariq wrote:
supercat wrote:
The entire nametable is getting redrawn every gametick over the course of several frames. Single-step through the video while watching the left and right edges of the screen during horizontal scrolling, and this effect will be visible.
No, that's not the same thing ... that's how all games on the NES have to do scrolling if the entire level doesn't fit in the available nametables. See the page on our wiki: nesdevwiki:File:NTS scrolling seam.gif.

Or look at the game in Mesen with the "PPU viewer" enabled. (Maybe also the "Event viewer").


Hmm... if you single-step through the video of world 3-1, the system generally takes multiple frames to process all the name table updates associated with each game tick even in cases where most entries stay the same. It also takes multiple frames to draw the column of tiles which needs to be updated during a side scroll. I thought the multi-frame updates were indicative of blindly copying everything, but it seems the game is using a slow partial-update routine that takes about as long as blindly copying everything would.

Quote:
Quote:
Since even a stock NES has two frames worth of tiles in the nametable, I don't think page-flipping should pose any difficulty; I'm curious why you think that's symptomatic of using a "wrong" approach.
Because when you repurpose the PPU's scrolling registers to act as double-buffering, it means you can't use the NES's scrolling hardware, which was the thing that made the NES meaningfully different from its predecessors. (There'd been consoles with tilemaps before. The C64 had sub-tile scroll. But the Famicom was the first widespread commercial device to allow both sub-tile and tile-level scrolling and enough tilemap memory for that to be useful)


The C64 and Atari 400/800 could easily update their "name tables" fast enough to allow continuous smooth scrolling. On the NES, that would be harder and require quite a bit more code, and would also require using some sprites to mask the right edge of the screen, so for cases where "NES-style" scrolling would be adequate, it would likely be preferable.

Quote:
And in the case of the commercial release, which needs 1KiB just to hold the level state (look in memory from $3E0 to $74F), they couldn't justify the cost of an extra RAM just for an unrolled copy.


If one updates four rows of meta-tiles per frame using an unrolled loop, one would need a 64-byte buffer to accommodate that. That hardly seems excessive.

Quote:
Quote:
Ruby Runner on the 2600 didn't have any raster interrupts available to it, but it was able to find out how much time remained before the end of overscan or vblank, run the game processing loop until those times were close to used up, and then go into a polling loop to find the exact ends of those intervals.
Sure? But the 2600 has any timers at all. The NES just lets you misuse the DAC FIFO empty IRQ...


The 2600 has a Ram/I/O/Timer chip with a timer that can measure duration up to about half a frame with units of 64 cycles.

Quote:
Quote:
So how do you like that idea now that you understand it?
It is a delightful gem to work around the slowness of the indirect modes.


I wonder why I've not seen that approach used on any banking designs other than my own?
Re: Why not mapper 11 or 66 ?
by on (#237799)
supercat wrote:
Hmm... if you single-step through the video of world 3-1, the system generally takes multiple frames to process all the name table updates associated with each game tick even in cases where most entries stay the same. It also takes multiple frames to draw the column of tiles which needs to be updated during a side scroll. I thought the multi-frame updates were indicative of blindly copying everything, but it seems the game is using a slow partial-update routine that takes about as long as blindly copying everything would.
Yeah, I think they could have done better even without resorting to blind copies.

Maybe the engine still runs on fours, but they deliberately smeared the updates across multiple refreshes to make it feel less quantized? But I bet they just didn't see the need to make it better.

Quote:
If one updates four rows of meta-tiles per frame using an unrolled loop, one would need a 64-byte buffer to accommodate that. That hardly seems excessive.
But you won't have the CPU time to translate metatiles at the same time you're uploading things to the PPU...?

I mean, you can put the unrolled loop in ROM and have it copy bytes from RAM. Slows you down to only 217 in-order bytes in vblank ((20·341÷3 - 514(OAMDMA))÷8 - 8(set scroll)).

Quote:
I wonder why I've not seen that approach used on any banking designs other than my own?
I have to assume that people just didn't think of it.

Maybe it's that it's entirely orthogonal to what banking normally does ... Normally banking is a work-around for being unable to address more total address space, but your technique is instead a work-around for a different structural deficiency of the 6502.
Re: Why not mapper 11 or 66 ?
by on (#237802)
That and it takes more mapper registers to hold more bank bits and more I/Os to control more address lines.
Re: Why not mapper 11 or 66 ?
by on (#237803)
lidnariq wrote:
Quote:
If one updates four rows of meta-tiles per frame using an unrolled loop, one would need a 64-byte buffer to accommodate that. That hardly seems excessive.
But you won't have the CPU time to translate metatiles at the same time you're uploading things to the PPU...?


The average speed of
Code:
; Top half of first line
  lda $C0
  sta $2007
  eor #1
  sta $2007
  lda $C1
  sta $2007
  ...
  lda $CF
  sta $2007
  eor #1
  sta $2007
; Bottom half of first line
  lda $C0
  eor #2
  sta $2007
  eor #1
  sta $2007
  lda $C1
  eor #2
  sta $2007

ends up the same as if all of the tiles were stored individually in zero page. Within each group of four tiles, the upper-left corner takes 7 cycles, the upper-right corner takes 6, the lower-left 9, and the lower-right 6. 7+6+9+6 is 28, the same time as would be needed to fetch each byte individually from zero page.
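
For anyone who wants to check that arithmetic, here is a small Python tally (a sketch, not part of the original code). It assumes the standard 6502 timings of 3 cycles for LDA zero page, 2 for EOR immediate, and 4 for STA absolute; the dictionary keys and variable names are made up for illustration.

```python
# Cycle costs for the 6502 addressing modes used in the unrolled loop.
CYCLES = {
    "lda_zp": 3,   # LDA from zero page
    "eor_imm": 2,  # EOR #immediate
    "sta_abs": 4,  # STA to an absolute address such as $2007
}

def cost(ops):
    """Total cycles for a sequence of instructions."""
    return sum(CYCLES[op] for op in ops)

# One 2x2 metatile, generated from a single zero-page byte.
upper_left = cost(["lda_zp", "sta_abs"])
upper_right = cost(["eor_imm", "sta_abs"])
lower_left = cost(["lda_zp", "eor_imm", "sta_abs"])
lower_right = cost(["eor_imm", "sta_abs"])
metatile_total = upper_left + upper_right + lower_left + lower_right

# The same four tiles, each stored and fetched individually from zero page.
individual_total = 4 * cost(["lda_zp", "sta_abs"])

print(upper_left, upper_right, lower_left, lower_right)
print(metatile_total, individual_total)
```

Both totals come out to 28 cycles per group of four tiles, so the metatile form saves RAM without costing any vblank time under these timings.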

Quote:
I mean, you can put the unrolled loop in ROM and have it copy bytes from RAM. Slows you down to only 217 in-order bytes in vblank ((20·341÷3 - 514(OAMDMA))÷8 - 8(set scroll)).


I'd figured 256 should work. Since the only sprites would be the player sprite, the score, and the side masks, and only the player sprite would need frequent updates, I was figuring on something like:
Code:
   ldx #0
   stx OAMADDR
   ldy playerY
   sty OAMDATA
   lda #3
   sta OAMADDR
   lda playerX1
   sta OAMDATA
   sty OAMDATA
   lda #7
   sta OAMADDR
   lda playerX2
   sta OAMDATA
   stx OAMADDR

That should cost a lot less than 514 cycles.

Quote:
Quote:
I wonder why I've not seen that approach used on any banking designs other than my own?
I have to assume that people just didn't think of it.

Maybe it's that it's entirely orthogonal to what banking normally does ... Normally banking is a work-around for being unable to address more total address space, but your technique is instead a work-around for a different structural deficiency of the 6502.


Probably, but such a design could make a lot of things more efficient on the NES. Games that use bitmap displays could probably benefit from a little assistance there. On my 2600 cart, it's possible to set up a 96x200 bitmap display using stripes that run down 12 pages of RAM, and then plot a pixel at x,y with simply:
Code:
    lda $7F00,x ; Load mask and switch bank to proper stripe [$7F00-$7FFF triggers banking strobes]
    ora $7E00,y ; Mix with data at address Y of stripe
    sta $7E00,y ; Store it back

I've never seen any 6502-based pixel-plotting code faster than that.
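
To make the idea concrete, here is a rough Python model of that three-instruction plot (a sketch only). It assumes that the table at $7F00,x holds one bit mask per x coordinate with the leftmost pixel in bit 7, and that reading entry x selects stripe x // 8; neither detail is spelled out above, and the class and method names are hypothetical.

```python
STRIPES = 12          # 96 pixels across / 8 pixels per byte
STRIPE_HEIGHT = 256   # one page of RAM per stripe; rows 0..199 are shown

# Assumed contents of the $7F00,x table: one bit mask per x coordinate.
MASK_TABLE = [0x80 >> (x & 7) for x in range(96)]

class StripedBitmap:
    def __init__(self):
        self.stripes = [bytearray(STRIPE_HEIGHT) for _ in range(STRIPES)]
        self.bank = 0  # stripe currently mapped at $7E00

    def read_mask_and_switch(self, x):
        # Models `lda $7F00,x`: the access returns the mask and, by
        # assumption, its address also selects stripe x // 8.
        self.bank = x >> 3
        return MASK_TABLE[x]

    def plot(self, x, y):
        a = self.read_mask_and_switch(x)   # lda $7F00,x
        a |= self.stripes[self.bank][y]    # ora $7E00,y
        self.stripes[self.bank][y] = a     # sta $7E00,y

bmp = StripedBitmap()
bmp.plot(0, 0)
bmp.plot(1, 0)
bmp.plot(95, 199)
```

The point of the model is that the x coordinate never needs to be shifted or divided in software: the table lookup supplies the mask and the address decoding supplies the column.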
Re: Why not mapper 11 or 66 ?
by on (#237804)
supercat wrote:
Since the only sprites would be the player sprite, the score, and the side masks, and only the player sprite would need frequent updates, I was figuring on something like:
Sadly OAMADDR is buggy.

You can update the first 7 bytes safely, only in order, only by relying on OAMADDR being zero when rendering turns off naturally for vblanking. Otherwise you basically have to use OAMDMA.

Or you could make a PAL-only release, where they fixed the bug :P

Quote:
Probably, but such a design could make a lot of things more efficient on the NES. Games that use bitmap displays could probably benefit from a little assistance there.
Approximately no games do. I think the CPU-to-PPU bandwidth was limited enough that there was every reason to avoid it, and the games that strictly need to not be in a tilemap were either modified heavily or didn't see a port.
Re: Why not mapper 11 or 66 ?
by on (#237805)
"Approximately no games" use bitmap-style backgrounds or software composited sprites. But I can think of a few rounding errors that you might enjoy:

  • Licensed in the US market: Qix, Videomation, Faxanadu, Hatris, Color a Dinosaur, Solstice, Shanghai II
  • Europe exclusive, benefiting from longer vblank: Elite
  • Canceled, prototype discovered later: Block Out
  • Japan only: Oeka Kids, Cocoron, Final Fantasy II
  • East Asia: 3D Block
  • Homebrew: All Action 53 volumes, Nova the Squirrel
Re: Why not mapper 11 or 66 ?
by on (#237808)
tepples wrote:
"Approximately no games" use bitmap-style backgrounds or software composited sprites. But I can think of a few rounding errors that you might enjoy:

  • Licensed in the US market: Qix, Videomation, Faxanadu, Hatris, Color a Dinosaur, Solstice, Shanghai II
  • Europe exclusive, benefiting from longer vblank: Elite
  • Canceled, prototype discovered later: Block Out
  • Japan only: Oeka Kids, Cocoron, Final Fantasy II
  • East Asia: 3D Block
  • Homebrew: All Action 53 volumes, Nova the Squirrel


I was thinking most notably of Elite. I'm not sure something like that would be possible on a system with an NTSC vblank unless it had two blocks of memory which could be switched between the CPU and PPU buses, which would require a fair number of multiplexer chips, but then again I'm not sure how Elite manages to obtain any kind of reasonable performance even *with* a PAL vblank. Any idea what it's doing?
Re: Why not mapper 11 or 66 ?
by on (#237809)
The same author's "Tank demo" dynamically allocates tiles in RAM, draws to them, copies them and the associated tilemap to VRAM during vblank, and double buffers CHR using the palette. Bit plane 0 is drawn using [black, white, black, white], and bit plane 1 is drawn using [black, black, white, white].

By "dynamically allocates tiles" I mean this: It keeps a tilemap in RAM storing which tile number corresponds to each (x, y) tile position. When drawing a pixel into a tile, it first checks whether a tile is allocated for that (x, y) position, and if not, allocates the next unused tile. Because of the sparse nature of this vector-style geometry, it's unlikely for all tiles in the viewport to get allocated as nonblank.
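
The allocation scheme described above can be sketched in Python as follows. This is an illustration of the general idea, not the demo's actual code: it handles a single bit plane, reserves tile 0 as the shared blank tile, and simply drops pixels when tiles run out, all of which are assumptions. The names `TileAllocator`, `plot`, and `max_tiles` are hypothetical.

```python
TILE_SIZE = 8
BLANK = 0  # tile number 0 is treated as the shared blank tile (assumption)

class TileAllocator:
    def __init__(self, cols, rows, max_tiles):
        # tilemap[ty][tx] holds the tile number shown at that position.
        self.tilemap = [[BLANK] * cols for _ in range(rows)]
        self.tiles = {}        # tile number -> 8 bytes, one per pixel row
        self.next_free = 1     # next unused tile number
        self.max_tiles = max_tiles

    def plot(self, x, y):
        tx, ty = x // TILE_SIZE, y // TILE_SIZE
        tile = self.tilemap[ty][tx]
        if tile == BLANK:
            # No tile allocated here yet: take the next unused one.
            if self.next_free >= self.max_tiles:
                return False   # out of tiles; the pixel is dropped
            tile = self.next_free
            self.next_free += 1
            self.tilemap[ty][tx] = tile
            self.tiles[tile] = bytearray(TILE_SIZE)
        self.tiles[tile][y % TILE_SIZE] |= 0x80 >> (x % TILE_SIZE)
        return True

alloc = TileAllocator(cols=32, rows=30, max_tiles=256)
alloc.plot(0, 0)   # allocates tile 1 for position (0, 0)
alloc.plot(7, 7)   # same tile position, so no new allocation
alloc.plot(8, 0)   # neighbouring position, allocates tile 2
```

Only the tiles in `alloc.tiles` and the changed tilemap entries would need copying to VRAM, which is what makes sparse line drawings affordable.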

Other ways to improve video memory bandwidth are to disable rendering early and enable rendering late. This can become very tricky, as doing so requires working around quirks of the OAM DRAM controller.
Re: Why not mapper 11 or 66 ?
by on (#237812)
lidnariq wrote:
supercat wrote:
Since the only sprites would be the player sprite, the score, and the side masks, and only the player sprite would need frequent updates, I was figuring on something like:
Sadly OAMADDR is buggy.

You can update the first 7 bytes safely, only in order, only by relying on OAMADDR being zero when rendering turns off naturally for vblanking. Otherwise you basically have to use OAMDMA.

Or you could make a PAL-only release, where they fixed the bug :P


Bummer. It would have been nice to avoid having to blow 256 bytes of storage on the OAM. If I need to update two sprites on a frame, would that mean that I'd have to drop back to updating three rows of tiles per frame instead of four? That might not be the worst thing in the world, since the game could probably be pretty zippy even if it wastes five frames per gametick essentially waiting for vblank. Still, it does seem a bit icky.
Re: Why not mapper 11 or 66 ?
by on (#237813)
Hm. Thinking closely about the bug, maybe there's a goofy workaround...

So, there's two halves to the bug:
1- if you write to OAMADDR, on several CPU-PPU alignments, it'll smear data from one row of OAM DRAM with another.
2- if you leave OAMADDR at a value of 8 or higher, it'll copy the eight bytes from that row of DRAM over the first eight bytes.

So...
if you write eight padding values...
then the eight values you want...
you'll have the two sprites you want in slots 2 and 3, and whatever had been in slots 4 and 5 is copied on top of slots 0 and 1.
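
Here is a toy Python model of that workaround, covering only half 2 of the bug as described above. It is a sketch under those stated assumptions, not a hardware-accurate simulation, and the function names are hypothetical.

```python
def write_oam_sequential(oam, data):
    """Write bytes through OAMDATA starting from OAMADDR = 0.

    Returns the value OAMADDR is left at afterwards.
    """
    addr = 0
    for value in data:
        oam[addr] = value
        addr = (addr + 1) & 0xFF
    return addr

def apply_row_copy_bug(oam, oamaddr):
    """Model of half 2 only: if OAMADDR is 8 or higher, the 8-byte row
    containing it is copied over bytes 0-7."""
    if oamaddr >= 8:
        row = oamaddr & 0xF8
        oam[0:8] = oam[row:row + 8]

# Fill each of the 64 sprite slots with its own slot number.
oam = bytearray(n for n in range(64) for _ in range(4))

padding = [0xFF] * 8   # sacrificial bytes for slots 0 and 1
wanted = [0xA0, 0xA1, 0xA2, 0xA3, 0xB0, 0xB1, 0xB2, 0xB3]
final_addr = write_oam_sequential(oam, padding + wanted)
apply_row_copy_bug(oam, final_addr)
```

In this model OAMADDR ends at 16, so slots 4 and 5 land on top of slots 0 and 1 while the two wanted sprites survive in slots 2 and 3. Half 1 of the bug (the smearing on some CPU-PPU alignments) is not modeled at all.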
Re: Why not mapper 11 or 66 ?
by on (#237825)
lidnariq wrote:
Hm. Thinking closely about the bug, maybe there's a goofy workaround...

So, there's two halves to the bug:
1- if you write to OAMADDR, on several CPU-PPU alignments, it'll smear data from one row of OAM DRAM with another.
2- if you leave OAMADDR at a value of 8 or higher, it'll copy the eight bytes from that row of DRAM over the first eight bytes.

So...
if you write eight padding values...
then the eight values you want...
you'll have the two sprites you want in slots 2 and 3, and whatever had been in slots 4 and 5 is copied on top of slots 0 and 1.


That would seem likely to work, but if there's a race condition between the DRAM machinery and the CPU cycles, things may appear to work under most conditions, but fail on some machines under certain temperature conditions, phases of the moon, etc. I wouldn't trust any workaround that couldn't be justified based upon "analog" transistor-level simulation of the components involved.

One weird quirk about DRAM is that the process of reading a row into a buffer corrupts the data on that row within the array. Normally the corruption isn't a problem because the buffer will get written back to the row, but if things are disrupted so that a read occurs without the writeback, the row would likely be corrupted. Depending upon the design of the DRAM, such corruption might only be capable of turning ones into zeroes, only turning zeroes into ones, or doing an arbitrary mixture of both.

Think of DRAM as being a system of reservoirs connected via gates to canals. If the drain is opened on a canal and a gate is opened to the reservoir, the reservoir will be emptied. If a canal is connected to a lake with a water level of 3m and a gate is opened on the reservoir, the reservoir will fill to 3m. Writing is thus pretty simple. Reading, though, is harder. If one were to empty the canal but close the drain, and then opened the gate to a reservoir, the water level in the canal would go up if there was water in the reservoir, but it wouldn't go up to 3m. If the surface area of the canal were equal to that of the reservoir, the level in the canal would go up to 1.5m while the level in the reservoir would go down to 1.5m.

In most DRAM chips, however, the "area" of the array is orders of magnitude larger than that of any individual reservoir. If the canal were drained to zero before reading, the canal would end up with a depth of 0.00m if the reservoir had been empty, but only 0.01m if it had been full. It's hard to tell the difference between something not going up at all, versus it going up 0.01m. It turns out to be much easier to instead start with the canal filled to half depth, and then check whether the level goes up or down. Ideally, the circuit would be precisely balanced so that going up by even a micron would read as "1", and going down by even a micron would read as "0", but in practice circuits aren't going to be perfectly balanced so something which doesn't move meaningfully could read arbitrarily as 1 or 0.
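
The numbers in the analogy follow from conservation of volume, which a few lines of Python can check. This is a sketch: the 299:1 area ratio is an assumption picked only so that the result matches the 0.01m figure above, and `shared_level` is a hypothetical helper name.

```python
def shared_level(reservoir_level, reservoir_area, canal_level, canal_area):
    """Common water level after the gate opens (total volume is conserved)."""
    volume = reservoir_level * reservoir_area + canal_level * canal_area
    return volume / (reservoir_area + canal_area)

# Equal areas, full reservoir (3m), empty canal.
equal_areas = shared_level(3.0, 1.0, 0.0, 1.0)

# Canal 299 times larger than the reservoir (assumed ratio), drained first.
drained_full = shared_level(3.0, 1.0, 0.0, 299.0)
drained_empty = shared_level(0.0, 1.0, 0.0, 299.0)

# Same canal, but pre-filled to half depth (1.5m) before reading.
half_full = shared_level(3.0, 1.0, 1.5, 299.0)
half_empty = shared_level(0.0, 1.0, 1.5, 299.0)

print(equal_areas, drained_full, drained_empty, half_full, half_empty)
```

With half-level biasing the canal moves by the same small amount in opposite directions for a full or an empty reservoir, so the sense circuit only has to decide "up or down" rather than "moved or didn't move".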

I would guess that the DRAM array in the OAM is small enough that it probably doesn't use half-level biasing and, as a consequence, any splatted reads would only be capable of turning 0s into 1s, or only turning 1s into 0s. If only the former can occur, a Y coordinate written as FF should remain FF. If the latter, a tile number written as 00 should remain 00. I don't know enough about the actual design, however, to know whether half-level biasing could result in other corruption patterns.
Re: Why not mapper 11 or 66 ?
by on (#237827)
lidnariq wrote:
I've honestly been wondering why I haven't seen anyone do anything like the Harmony Cart on the NES. Is that extra 600kHz too much? Memory limits? Incompatible with existing library? Two independent buses?

Hard part is not just ending up with something ridiculous like this.


I don't think a Harmony-style approach could work very well with one microcontroller monitoring both buses. The chip it uses has 32KB of flash and 8KB of RAM, which would be a bit small for "main CPU RAM", but would be enough for many kinds of mapper, especially if one added an external serial flash chip. If one didn't need to show anything too high up on the frame, it would probably be possible to load a substantial amount (256 bytes or more) from the external flash chip every frame.

A major difference between a Harmony-style cart and a typical mapper, though, would be that most mapper designs have the main CPU control the banking on the CPU side, but on a Harmony-style mapper it would be awkward to have the cart interact with the main CPU bus in any fashion. For homebrews this would be fine, but I know of no existing mappers that use that approach. Even MMC2 and MMC4, which support bank-switch tiles, use CPU-bus writes to control most mapping functions. I would guess the most practical way of doing things would probably be to have the main CPU write to PPU address range $3000-$3EFE to control things.

I agree with you that a big design challenge would be designing a mapper which is versatile, but retains the flavor of NES programming. Perhaps that could be encouraged by "standardizing" a VM language for emulators with an instruction set that's focused on the kinds of things that would typically be done in hardware or a CPLD (e.g. take bits a..b of register c and merge them using mode d [chosen from and, or, xor, etc.] with some bits starting at e from register f). While it might be possible in theory to express in such code anything that could be done on the ARM, it would be faster to emulate than the ARM code, and anyone wanting to go crazy on the ARM would also have to go equally crazy in the emulator bytecode.
Re: Why not mapper 11 or 66 ?
by on (#237830)
supercat wrote:
That would seem likely to work, but if there's a race condition between the DRAM machinery and the CPU cycles,
The DRAM circuitry is completely idle during forced and vertical blanking. ... with the caveat that the (PAL) 2C07 enables DRAM refresh for 50 scanlines before video rendering starts, and the PAL famiclone (UA6538) enables DRAM refresh for 50 scanlines after video rendering.

Quote:
In most DRAM chips, however, the "area" of the array is orders of magnitude larger than that of any individual reservoir.
The DRAM inside the 2C02 is weird; I haven't seen anything like it. It's NMOS, it holds both the bit and the inversion of the bit, it still takes four transistors. The only way it's smaller than the SRAM that's also used on the die is that the NMOS pull-up is shared along an entire column, instead of next to each bit.

Both "bit" and "notbit" go in/out to the DRAM interface logic... look in the vicinity of node 426 in Visual2C02.
Re: Why not mapper 11 or 66 ?
by on (#237835)
lidnariq wrote:
supercat wrote:
That would seem likely to work, but if there's a race condition between the DRAM machinery and the CPU cycles,
The DRAM circuitry is completely idle during forced and vertical blanking. ... with the caveat that the (PAL) 2C07 enables DRAM refresh for 50 scanlines before video rendering starts, and the PAL famiclone (UA6538) enables DRAM refresh for 50 scanlines after video rendering.


All operations involving DRAM reads--and that includes partial-row writes--need to be carefully sequenced. The mechanisms that force a regular sequence of actions may be idle, but if one wants to e.g. write to addresses 9 and 10 without affecting other bytes on the row, one of two sequences of events must occur:

1. Bytes 8-15 are read into a buffer, bytes 1 and 2 of that buffer are written with new data, and the buffer is written back to the row.

2. Bytes 8-15 are read into a buffer, byte 1 of that buffer is written with new data, and the buffer is written back to the row. Then bytes 8-15 are read into a buffer again, byte 2 of that buffer is written, and the buffer is written back to the row.

In the second sequence, one could insert an arbitrary number of "read row X" and "write back row X" operations [with X being the same row as the other operations or a different row] between the first write-back and the second read, but the first half and second half of that sequence must, individually, be processed without other intervening operations.

Note that the number of discrete steps involving the DRAM array exceeds the number of CPU writes involved in performing them, so some kind of sequenced machinery is required even for accesses involving OAMADDR and OAMDATA.
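
As an illustration of those two sequences, here is a minimal Python model of a row-buffered array. It is a sketch of the generic read/modify/write-back pattern, not of the actual 2C02 circuitry; it does not model destructive reads, and the class and method names are hypothetical.

```python
ROW_SIZE = 8

class RowBufferedDram:
    def __init__(self, data):
        self.cells = bytearray(data)
        self.buffer = bytearray(ROW_SIZE)

    def read_row(self, row):
        """Copy one whole row from the array into the buffer."""
        base = row * ROW_SIZE
        self.buffer[:] = self.cells[base:base + ROW_SIZE]

    def patch_buffer(self, offset, value):
        """Overwrite a single byte of the buffer with new data."""
        self.buffer[offset] = value

    def write_back(self, row):
        """Copy the whole buffer back into one row of the array."""
        base = row * ROW_SIZE
        self.cells[base:base + ROW_SIZE] = self.buffer

# Sequence 1: one read, two patches, one write-back.
seq1 = RowBufferedDram(range(32))
seq1.read_row(1)
seq1.patch_buffer(1, 0xAA)   # address 9
seq1.patch_buffer(2, 0xBB)   # address 10
seq1.write_back(1)

# Sequence 2: a full read/patch/write-back cycle per byte, with an
# unrelated refresh of row 3 inserted between the two halves.
seq2 = RowBufferedDram(range(32))
seq2.read_row(1)
seq2.patch_buffer(1, 0xAA)
seq2.write_back(1)
seq2.read_row(3)
seq2.write_back(3)
seq2.read_row(1)
seq2.patch_buffer(2, 0xBB)
seq2.write_back(1)
```

Both sequences leave the array in the same state, with only addresses 9 and 10 changed. Splitting either half (for example, changing the row between a read and its write-back) is where a model like this would start copying data between rows.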

Quote:
Quote:
In most DRAM chips, however, the "area" of the array is orders of magnitude larger than that of any individual reservoir.
The DRAM inside the 2C02 is weird; I haven't seen anything like it. It's NMOS, it holds both the bit and the inversion of the bit, it still takes four transistors. The only way it's smaller than the SRAM that's also used on the die is that the NMOS pull-up is shared along an entire column, instead of next to each bit.

Both "bit" and "notbit" go in/out to the DRAM interface logic... look in the vicinity of node 426 in Visual2C02.


I'd noticed the weird cell shape and wondered what was going on. Bulk DRAM uses a one-transistor cell, and until I saw the chip layout I would have guessed they'd use a three-transistor design with a storage transistor (source grounded), write transistor (connects storage gate to write bus), and read transistor (connects storage drain to read bus). That would avoid the destructive read issue, but I don't think that's what they're doing.

BTW, the area savings from eliminating the pull-up is significant, since the pull-ups would need to have routing to VDD.
Re: Why not mapper 11 or 66 ?
by on (#237842)
supercat wrote:
All operations involving DRAM reads--and that includes partial-row writes--need to be carefully sequenced. The mechanisms that force a regular sequence of actions may be idle,
I understand what you're saying (I know how DRAM works), but I don't see how your explanation is relevant to the specific bug we've observed. There is at least one published contemporary game that relies on this copying behavior in combination with OAMDMA - it relies on the first 4 bytes in the page that is DMA'd being sprite 0 and generating a sprite 0 hit, while using OAMADDR to cycle sprite priority on the remaining 60 entries.
Re: Why not mapper 11 or 66 ?
by on (#237846)
lidnariq wrote:
supercat wrote:
All operations involving DRAM reads--and that includes partial-row writes--need to be carefully sequenced. The mechanisms that force a regular sequence of actions may be idle,
I understand what you're saying (I know how DRAM works), but I don't see how your explanation is relevant to the specific bug we've observed. There is at least one published contemporary game that relies on this copying behavior in combination with OAMDMA - it relies on the first 4 bytes in the page that is DMA'd being sprite 0 and generating a sprite 0 hit, while using OAMADDR to cycle sprite priority on the remaining 60 entries.


The copying behavior would occur because the change to OAMADDR occurs between the time that a row specified by OAMADDR gets read into the buffer and the time that the contents of the buffer get written back to the row specified by OAMADDR. Ideally, a write to OAMADDR would cause the contents of the buffer to get stored to the row identified by the *old* value of OAMADDR, and then cause the row identified by the new value to be read into the buffer, but if the chip did that there wouldn't be a bug. What's important is the sequence of events that does occur, and what race conditions if any may be entailed.
Re: Why not mapper 11 or 66 ?
by on (#237847)
supercat wrote:
What's important is the sequence of events that does occur, and what race conditions if any may be entailed.
Of the CPU-vs-PPU phases, we know of at least the three following behaviors on a write to OAMADDR:

1- Works as desired
2- Copies the page at openbus (usually $20) to the page containing the new pointer
3- Other things that sound more analog-like.
cite: Quietust (1), (2)

The CPU-vs-PPU phase is randomly chosen when the CPU and/or PPU are released from reset. The bug is known to be present in silicon revision G and absent from the (rarer) older revisions (A-E); revision H's properties are unknown. Most famiclones are copies of the revision G PPU and have this bug. (Note that revisions A-E have other quirks pertaining to sprites, and we collectively haven't sat down to figure out what's happening.)
Re: Why not mapper 11 or 66 ?
by on (#237848)
lidnariq wrote:
The CPU-vs-PPU phase is randomly chosen when the CPU and/or PPU are released from reset. The bug is known to be present in silicon revision G and absent from the (rarer) older revisions (A-E); revision H's properties are unknown. Most famiclones are copies of the revision G PPU and have this bug. (Note that revisions A-E have other quirks pertaining to sprites, and we collectively haven't sat down to figure out what's happening.)


Could a cart power on, determine CPU phase, and force a reset (e.g. via the lockout chip) if the phase is unfavorable?
Re: Why not mapper 11 or 66 ?
by on (#237849)
Nope. CIC lock doesn't renegotiate after negotiation fails, just endlessly asserts and de-asserts RESET to the CPU and PPU.
Re: Why not mapper 11 or 66 ?
by on (#237988)
lidnariq wrote:
Nope. CIC lock doesn't renegotiate after negotiation fails, just endlessly asserts and de-asserts RESET to the CPU and PPU.


So CIC-Reset on the cartridge slot is output-only? Bummer.

Otherwise, I was wondering about whether there would be any problem setting OAMADDR to a multiple of eight near the end and then writing from there through $FF and wrapping to 0? The act of setting OAMADDR would trash the row whose value is being set, but if that row will be rewritten anyway that wouldn't matter. If OAM can avoid corrupting the first or last row when the last action before display start is a write to address $FF, then I would think that approach would allow games that just need to update the last two sprites in a frame to save a lot of time compared with having to use OAM-DMA.
Re: Why not mapper 11 or 66 ?
by on (#237991)
supercat wrote:
Otherwise, I was wondering about whether there would be any problem setting OAMADDR to a multiple of eight near the end and then writing from there through $FF and wrapping to 0? The act of setting OAMADDR would trash the row whose value is being set, but if that row will be rewritten anyway that wouldn't matter. If OAM can avoid corrupting the first or last row when the last action before display start is a write to address $FF, then I would think that approach would allow games that just need to update the last two sprites in a frame to save a lot of time compared with having to use OAM-DMA.
Might work? I think the problem is that the other 60 sprites might get corrupted and moved on-screen.

You can DMA from ROM, if the problem were just RAM.

This is definitely "you have to extensively test on hardware" territory. We haven't quantified what happens well enough for any emulator to implement sufficiently accurate and programmer-hostile bugs.