This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

Reasonable implementation of PCM
by on (#166612)
For a few months now, I've been wanting to find some way of playing non-delta sampled sound while also being able to run a game. I have run into a couple of possible obstacles, some of them easier to overcome than others:
You absolutely need:
- A cycle-based IRQ counter
- Lots of space
- An NMI handler that doesn't need to do much

Of course the last of these severely limits the kinds of genres worth considering.
The best option for my test was the FME-7 because it met both mapper-related criteria. Interestingly, this is far from eating all the available CPU time, yet it still produces tolerable results. Worst case, the game would have to deliberately run at 30 FPS. The cycle costs for the different cases are as follows (with approx. 183 IRQs per frame, giving a rate of 11 kHz):
- Regular IRQ: 96 cycles
- On every 256th IRQ: 103 cycles
- On every 8 192nd IRQ (when the next 8k bank is needed): 114 cycles
- Jumping back to the loop point: 92 cycles
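To put those numbers in context, here is a rough per-frame budget in Python. It assumes the NTSC CPU clock and an average frame of 29,780.5 CPU cycles, and it assumes the 8,192nd IRQ (bank switch) replaces one of the every-256th (page crossing) IRQs instead of adding to it; the rare loop-point case is ignored. It is a back-of-the-envelope sketch, not a measurement.

```python
# Rough per-frame budget for an IRQ-driven $4011 streamer on NTSC.
CPU_HZ = 1_789_773           # NTSC 2A03 clock (21.477272 MHz / 12), rounded
CYCLES_PER_FRAME = 29_780.5  # average NTSC frame length in CPU cycles
IRQS_PER_FRAME = 183

def sample_rate(irqs_per_frame: int = IRQS_PER_FRAME) -> float:
    """Playback rate in Hz when one sample is output per IRQ."""
    return irqs_per_frame * CPU_HZ / CYCLES_PER_FRAME

def average_irq_cost(regular: int = 96, page: int = 103, bank: int = 114,
                     samples_per_page: int = 256, samples_per_bank: int = 8192) -> float:
    """Mean cycles per IRQ over one 8 KB bank, ignoring the loop-point case.

    Assumption: the bank-switch IRQ replaces one of the page-crossing IRQs.
    """
    page_crossings = samples_per_bank // samples_per_page   # 32 per bank
    regular_irqs = samples_per_bank - page_crossings         # 8160 per bank
    total = regular * regular_irqs + page * (page_crossings - 1) + bank
    return total / samples_per_bank

def cpu_share(cost_per_irq: float, irqs_per_frame: int = IRQS_PER_FRAME) -> float:
    """Fraction of each frame's CPU cycles spent inside the IRQ handler."""
    return cost_per_irq * irqs_per_frame / CYCLES_PER_FRAME

if __name__ == "__main__":
    avg = average_irq_cost()
    print(f"sample rate    : {sample_rate():.0f} Hz")
    print(f"cycles per IRQ : {CYCLES_PER_FRAME / IRQS_PER_FRAME:.1f}")
    print(f"avg IRQ cost   : {avg:.2f} cycles")
    print(f"CPU share      : {cpu_share(avg):.1%}")
```

This works out to roughly 11 kHz with about 163 cycles between IRQs, so the handler uses a bit under 60% of the CPU and leaves around 12,000 cycles per frame for the game.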

In a game, extra cycles would be needed to switch back to the original bank, unless it is OK to give up one 8 KB ROM window (which, if I'm not mistaken, the FME-7 can also map at $6000-$7FFF instead of PRG-RAM). Other mappers might give lower cycle counts, though, because the FME-7 needs 4 instructions for any register operation (select the internal register, then write the data to it), and its IRQ counter has to be reloaded manually.
No matter what, though, none of the available mappers combine a cycle-based IRQ with enough PRG ROM capacity. The VRC3 would be one of the best options if its bank register had all 8 bits to select a 16 KB bank at $8000-$BFFF. So a custom extended version of the VRC3 could definitely work, as could an UNROM setup with discrete logic plus an extra IC for the IRQ functionality. The point is, there should be at least 2 MB of space available.
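A quick way to see why the bank-register width matters: each extra bit doubles the reachable PRG. The sketch below assumes one byte per sample (uncompressed 8-bit storage) and the ~11 kHz rate from 183 IRQs per NTSC frame; the stock VRC3's PRG register has 3 bits, which is why it stops at 128 KB.

```python
BANK_SIZE = 16 * 1024  # one 16 KB PRG bank at $8000-$BFFF

def addressable_prg(bank_bits: int) -> int:
    """Bytes reachable through a bank register with the given number of bits."""
    return (1 << bank_bits) * BANK_SIZE

def minutes_of_audio(rom_bytes: int, rate_hz: float, bytes_per_sample: float = 1.0) -> float:
    """Playback time for samples filling rom_bytes (assumes no compression)."""
    return rom_bytes / (rate_hz * bytes_per_sample * 60)

if __name__ == "__main__":
    rate = 183 * 60.0988  # ~11 kHz from 183 IRQs per NTSC frame
    print(addressable_prg(3) // 1024, "KB with a 3-bit register (stock VRC3)")
    print(addressable_prg(8) // (1024 * 1024), "MB with a full 8-bit register")
    print(f"{minutes_of_audio(2 * 1024 * 1024, rate):.1f} min of audio in 2 MB")
```

An 8-bit register reaches 4 MB, and 2 MB holds a little over three minutes at this rate, before any code or graphics are subtracted.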

Another issue, which I thought would ruin the sound regardless, was OAM DMA. However, if you listen to the ROM with OAM DMA occurring every frame, the distortion is hard to notice. The DMA doesn't delay the stream for long, only for about the time it takes to output 3 samples. Still, IRQs have to be re-enabled inside the NMI handler (the CPU sets the interrupt-disable flag on entry), and a lot of VBlank time is lost.
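The "3 samples" figure and the VBlank cost can be checked with a little arithmetic. This sketch assumes NTSC timing, 20 vblank scanlines, and that every IRQ costs the regular 96 cycles; OAM DMA takes 513 cycles, plus one more when it starts on an odd CPU cycle.

```python
CYCLES_PER_FRAME = 29_780.5   # NTSC, average
IRQS_PER_FRAME = 183
IRQ_COST = 96                 # regular IRQ cost from the list above
OAM_DMA_CYCLES = 514          # 513 cycles, +1 when started on an odd CPU cycle
VBLANK_CYCLES = 20 * 341 / 3  # 20 vblank scanlines x 341 dots, 3 dots per CPU cycle

irq_interval = CYCLES_PER_FRAME / IRQS_PER_FRAME   # ~162.7 cycles between samples
dma_delay_samples = OAM_DMA_CYCLES / irq_interval   # sample periods the output is held
irqs_in_vblank = VBLANK_CYCLES / irq_interval
vblank_lost = irqs_in_vblank * IRQ_COST             # vblank cycles spent in the handler

print(f"OAM DMA holds the output for ~{dma_delay_samples:.1f} sample periods")
print(f"~{irqs_in_vblank:.0f} IRQs land in vblank, "
      f"costing ~{vblank_lost:.0f} of {VBLANK_CYCLES:.0f} cycles")
```

So one DMA stalls the stream for just over three sample periods, while the IRQs themselves eat well over half of the roughly 2,270-cycle NTSC vblank before the 514-cycle DMA is even counted.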

The space issue could be mitigated by using a different data format. The cycles saved by simpler mapper control on the "VRC3" could be spent on things such as 4-bit DPCM, or a 4-bit "ADPCM" where each 4-bit value selects an entry in a 16-entry table holding the amount actually added to the current $4011 level.
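Here is a minimal Python sketch of that table-driven 4-bit scheme, as an offline encoder plus a reference decoder. The step values in `STEP_TABLE` are a hypothetical choice (the post doesn't specify any), the starting level of 64 is an assumption, and the encoder is a simple greedy one that tracks the decoder's level so errors don't accumulate. Levels are clamped to $4011's 7-bit range.

```python
# Hypothetical 16-entry step table; index 8 is the "no change" step.
STEP_TABLE = [-64, -40, -24, -14, -8, -4, -2, -1,
              0, 1, 2, 4, 8, 14, 24, 40]
ZERO_STEP = STEP_TABLE.index(0)

def clamp7(level: int) -> int:
    """$4011 takes a 7-bit level (0-127)."""
    return max(0, min(127, level))

def encode(samples: list[int], start: int = 64) -> list[int]:
    """Greedy encoder: pick the step that lands closest to each 7-bit target."""
    level = start
    indices = []
    for target in samples:
        best = min(range(16), key=lambda i: abs(clamp7(level + STEP_TABLE[i]) - target))
        level = clamp7(level + STEP_TABLE[best])   # mirror the decoder's state
        indices.append(best)
    return indices

def pack(indices: list[int]) -> bytes:
    """Two indices per byte, first sample in the high nibble; pad with a zero step."""
    if len(indices) % 2:
        indices = indices + [ZERO_STEP]
    return bytes((indices[i] << 4) | indices[i + 1] for i in range(0, len(indices), 2))

def decode(packed: bytes, start: int = 64) -> list[int]:
    """What the IRQ handler would do: add the table entry, clamp, write to $4011."""
    level = start
    out = []
    for byte in packed:
        for index in (byte >> 4, byte & 0x0F):
            level = clamp7(level + STEP_TABLE[index])
            out.append(level)
    return out
```

Each byte now covers two samples, halving ROM use compared with raw 8-bit storage; the price is a table lookup and a nibble split in the IRQ handler, which is where the saved mapper cycles would go.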

Maybe you have come up with better ways of doing this already, but I wanted to give it a go anyway. I also realize that this is something crazy to build a game around, but it's not like games have never made huge sacrifices before to really push a certain aspect.
The song in the ROMs is the main theme of Crash Bandicoot: Warped for PlayStation.
Re: Reasonable implementation of PCM
by on (#166614)
Cool, I recently made a sample player on an HSync timer and decided to add in a few more sample loops during VBlank to balance out the distortion.

I'd recommend running a simple high-frequency square wave in your demo, and recording it to see how often and where the sample gets distorted -- you'll really notice it with a square wave.
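One way to act on that suggestion, sketched in Python: generate square-wave sample data to stream, then scan the captured output for half-cycles that come out longer or shorter than expected. It assumes the capture has already been reduced to one level per nominal sample period; the levels 0x20/0x60 and the function names are hypothetical.

```python
def square_wave(num_samples: int, half_period: int,
                low: int = 0x20, high: int = 0x60) -> list[int]:
    """7-bit $4011 levels for a square wave with half_period samples per half cycle."""
    return [high if (i // half_period) % 2 == 0 else low for i in range(num_samples)]

def irregular_runs(levels: list[int], half_period: int) -> list[tuple[int, int]]:
    """(start_index, length) of every run of equal levels whose length differs
    from half_period. The final run is skipped since the capture may cut it off."""
    runs = []
    start = 0
    for i in range(1, len(levels) + 1):
        at_end = i == len(levels)
        if at_end or levels[i] != levels[start]:
            length = i - start
            if not at_end and length != half_period:
                runs.append((start, length))
            start = i
    return runs
```

With a half period of 4 samples (about 1.4 kHz at 11 kHz), a stall that holds one level for three extra sample periods shows up as a single 7-sample run, which makes it easy to see how often and where it happens.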
Re: Reasonable implementation of PCM
by on (#167048)
Have you considered using MMC5's raw PCM channel instead?

  • Full 8-bit audio; far better quality than the built-in DPCM's 7 bits
  • Write mode: Write the 8-bit sample directly to $5011, just like writing a 7-bit sample to $4011
  • Read mode: simply read any address between $8000-$BFFF, and that byte is automatically loaded into $5011.

In other words, in read mode, you can skip the STA/STX/STY instruction and get out of your IRQ handler 4 cycles sooner.

Instead of needing a cycle counter from your mapper, just use MMC5's scanline counter. You'll still have to do timed writes within your NMI handler, since the scanline counter only runs while the PPU is rendering. If you fire it every scanline, you'll get 262 scanlines * 60.0988 Hz = 15.746 kHz. Even for music, 16 kHz 8-bit mono audio is surprisingly listenable. Drop below that in bit depth or sample rate, and music starts sounding pretty terrible.

The other requirement you mention is ROM size. 15.746 kHz, 8-bit, mono audio consumes a little under 923 kilobytes/minute. FME-7 tops out at 512 KB of PRG, but MMC5 supports twice as much - a full megabyte!

In other words, MMC5 would give you a full minute of reasonably-nice audio with some room to spare for your code.
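The scanline-rate figures can be reproduced like this. The sketch assumes one byte per sample and that every one of the 262 NTSC scanlines gets a sample (the vblank ones via timed writes in NMI).

```python
FRAME_HZ = 60.0988   # NTSC frame rate
SCANLINES = 262      # one sample per scanline

rate = SCANLINES * FRAME_HZ        # ~15,746 Hz
bytes_per_minute = rate * 60       # 8-bit mono: 1 byte per sample
kb_per_minute = bytes_per_minute / 1024

for name, prg_kb in (("FME-7", 512), ("MMC5", 1024)):
    seconds = prg_kb * 1024 / rate
    print(f"{name}: {seconds:.0f} s of audio if all {prg_kb} KB held samples")

print(f"{rate:.0f} Hz -> {kb_per_minute:.1f} KB/min, "
      f"leaving {1024 - kb_per_minute:.0f} KB of a 1 MB MMC5 ROM for code")
```

That leaves roughly 100 KB of a 1 MB MMC5 ROM after one minute of audio, while a full 512 KB FME-7 ROM would hold only about half a minute.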

Very few mappers allow more than a MB of PRG, and only MMC5 offers an 8-bit PCM channel AFAIK.
Re: Reasonable implementation of PCM
by on (#167068)
LightStruk wrote:
Have you considered using MMC5's raw PCM channel instead?

  • Full 8-bit audio; far better quality than the built-in DPCM's 7 bits
  • Write mode: Write the 8-bit sample directly to $5011, just like writing a 7-bit sample to $4011
  • Read mode: simply read any address between $8000-$BFFF, and that byte is automatically loaded into $5011.

Disadvantages:
  • MMC5 is not cloned in a CPLD yet
  • Mapper audio does not play back on unmodified 72-pin consoles

The whole advantage of an IRQ-driven system is that it works on an NES. Otherwise, you could just solder a microcontroller onto a Famicom cartridge and have it mix a 1-bit DAC at 1.79 MHz into the audio loop.

Quote:
Even for music, 16 kHz 8-bit mono audio is surprisingly listenable. Below that in bits or sample rate, and music starts sounding pretty terrible.

Case in point: The music in Luminesweeper for Game Boy Advance is 18.157 kHz 8-bit mono audio, compressed with the GSM Full Rate codec at 30 kbps.
Re: Reasonable implementation of PCM
by on (#167127)
Yes, the point is to create high-quality sound without any audio expansion mapper, since that would exclude every non-modded NES. But as far as custom mappers go, at that point a microcontroller with its own ROM would be even better. The bottleneck is the stock 2A03/2A07. The controller would need an 8-bit buffer where it places the next sample from ROM before pulling /IRQ low. Simply reading the buffer should also acknowledge the IRQ to save more time, not to mention an IRQ latch that automatically reloads whatever counter it uses to keep time. The IRQ handler could be extremely short and quick as a result:
Code:
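IRQ:
sta zpvar        ;3  save A in zero page instead of pha, avoiding the 4-cycle pla on exit
lda SampleBuffer ;7  fetch the next sample (on the proposed hardware, this read also acknowledges the IRQ)
sta $4011        ;11 output it to the DMC DAC
lda zpvar        ;14 restore A
rti              ;20 running totals exclude the 7-cycle IRQ entry sequence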


Something like this would only take a little over 7 000 cycles every frame to play audio at 22 kHz (about 366 IRQs × 20 cycles; counting the 6502's 7-cycle interrupt entry, the total is closer to 9 900). To minimize the distortion from OAM DMA, the microcontroller could also move on to the next sample after a fixed time, regardless of whether the 2A03 has read the buffer, so samples would only be dropped, never delayed.
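The per-frame cost of that handler can be tallied as follows. The instruction timings are standard 6502 values (zero-page STA/LDA 3 cycles, absolute LDA/STA 4, RTI 6), `SampleBuffer` is assumed to be an absolute address, and the 22 kHz rate comes from the post.

```python
FRAME_HZ = 60.0988                 # NTSC
CYCLES_PER_FRAME = 29_780.5
HANDLER_BODY = 3 + 4 + 4 + 3 + 6   # sta zp, lda abs, sta abs, lda zp, rti
IRQ_ENTRY = 7                      # 6502 interrupt sequence before the first instruction

def frame_cost(rate_hz: float, per_irq: int) -> float:
    """CPU cycles per frame spent streaming at rate_hz with one IRQ per sample."""
    return rate_hz / FRAME_HZ * per_irq

body_only = frame_cost(22_000, HANDLER_BODY)
with_entry = frame_cost(22_000, HANDLER_BODY + IRQ_ENTRY)
print(f"handler body only: {body_only:.0f} cycles/frame")
print(f"with IRQ entry   : {with_entry:.0f} cycles/frame "
      f"({with_entry / CYCLES_PER_FRAME:.0%} of the CPU)")
```

Even with the interrupt entry included, the streamer at 22 kHz would use about a third of the CPU, far less than the FME-7 version at half the sample rate.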
Re: Reasonable implementation of PCM
by on (#167143)
The real problem with PCM is that you really can't use it in games. You'd get huge jitter if you need to update sprites via sprite DMA.
Re: Reasonable implementation of PCM
by on (#167144)
OP considered that:
Quote:
However, if you listen to the ROM with OAM DMA occurring every frame, it's difficult to notice the distortion. The DMA doesn't delay the streaming for too long, only for the time it would take to output 3 samples.
Re: Reasonable implementation of PCM
by on (#167147)
Dwedit wrote:
The real problem with PCM is that you really can't use it in games. You'd get huge jitter if you need to update sprites via sprite DMA.

That didn't stop Sega from trying it with the Mega Drive (and they took it to a much worse extreme with SMPS, sadly).
Re: Reasonable implementation of PCM
by on (#167152)
Is it that a switched-mode power supply would introduce even more noise than Sample Music Playback System already introduces?

[/badpun]
Re: Reasonable implementation of PCM
by on (#167199)
tepples wrote:
Disadvantages:
  • MMC5 is not cloned in a CPLD yet
And the FME-7 is cloned? That's the mapper za909 is talking about using.
tepples wrote:
Disadvantages:
  • Mapper audio does not play back on unmodified 72-pin consoles
On this forum, given how many folks are PowerPak users and FDS aficionados, I figured most everybody with a NES had either done the really simple audio mod or had installed an ENIO board.
tepples wrote:
Case in point: The music in Luminesweeper for Game Boy Advance is 18.157 kHz 8-bit mono audio, compressed with the GSM Full Rate codec at 30 kbps.
Does this video have the music in it you're talking about? It sounds pretty darn good. How strenuous was GSM decoding for the GBA's ARM CPU?
Re: Reasonable implementation of PCM
by on (#167201)
LightStruk wrote:
tepples wrote:
Disadvantages:
  • MMC5 is not cloned in a CPLD yet
And the FME-7 is cloned? That's the mapper za909 is talking about using.

JxROM (FME-7) compatible boards are available from infiniteneslives.com. In fact, the reason I called the multi-mapper test ROM "Holy Diver Batman" is because Paul wanted to start offering boards compatible with IF-12 (Holy Diver) and JxROM (Batman: Return of the Joker).

Quote:
tepples wrote:
Disadvantages:
  • Mapper audio does not play back on unmodified 72-pin consoles
On this forum, given how many folks are PowerPak users and FDS aficionados, I figured most everybody with a NES had either done the really simple audio mod or had installed an ENIO board.

I own a PowerPak, but my NES is not audio modded. I guess that leaves me outside "most". But I imagine that "most" people who buy newly made NES games aren't members of this forum. In case someone wants to distribute a game on cartridge, how much extra will it cost to bundle an ENIO with every copy of a game?

Quote:
tepples wrote:
Case in point: The music in Luminesweeper for Game Boy Advance is 18.157 kHz 8-bit mono audio, compressed with the GSM Full Rate codec at 30 kbps.
Does this video have the music in it you're talking about? It sounds pretty darn good. How strenuous was GSM decoding for the GBA's ARM CPU?

It takes 60% of CPU time.
Re: Reasonable implementation of PCM
by on (#167216)
LightStruk wrote:
On this forum, given how many folks are PowerPak users and FDS aficionados, I figured most everybody with a NES had either done the really simple audio mod or had installed an ENIO board.

I think the vast majority of PowerPak owners just want to play NES games with it, and aren't comfortable with a soldering iron. Relatively few do the audio mod.

On this board, there's a strong bias toward people who would do the mod, but we're a DIY crowd, and really not representative of the NES user at large.

Almost nobody has an ENIO. Very few of these were made.
Re: Reasonable implementation of PCM
by on (#167236)
I only decided to bust out the FME-7 because it had everything I needed for a quick test, but for a complete sampled soundtrack a custom variant of the FME-7, the VRC3 or something similar would be best. The biggest issues are having enough space and giving up a lot of VBlank time, so maybe PAL has an advantage here with its much longer vertical blank.
Re: Reasonable implementation of PCM
by on (#167264)
The nice(?) thing about something with constant, unchanging IRQ requirements is that you could use one or two ICs as your external IRQ generator, making it easy to add this functionality to almost any hardware.

A lot of pirate games used a (74')4040 to generate an IRQ every 2ⁿ cycles. Since your interval here is 160 cycles (128 + 32), you only need to add an AND gate (to set that period; = NOT(NAND)) and a NAND gate to drive the /IRQ pin on the card edge.
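A behavioral model of that idea in Python: a binary counter clocked once per CPU cycle is cleared the moment bits 7 and 5 are both high (the AND gate), since 160 is the first count with both set. This only models the period; it ignores ripple-counter propagation delay and how /IRQ would be acknowledged.

```python
CPU_HZ = 1_789_773                 # NTSC 2A03 clock, rounded
PERIOD_BITS = (1 << 7) | (1 << 5)  # 160 = 0b1010_0000
IRQ_RATE_HZ = CPU_HZ / 160

def irq_cycles(total_cycles: int) -> list[int]:
    """CPU cycles at which the counter reaches 160, raises an IRQ and is cleared."""
    count = 0
    hits = []
    for cycle in range(1, total_cycles + 1):
        count += 1
        if (count & PERIOD_BITS) == PERIOD_BITS:
            hits.append(cycle)
            count = 0
    return hits

print(irq_cycles(1000))
print(f"{IRQ_RATE_HZ:.0f} Hz")
```

The IRQs land exactly 160 cycles apart, which works out to about 11.2 kHz on NTSC.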
Re: Reasonable implementation of PCM
by on (#167355)
Not quite sure how they did it (it seems to be a custom mapper from bunnyboy and/or shiru), but the creator of 'A winner is you' mentioned they are using PCM audio: http://nintendoage.com/forum/messagevie ... did=160050

It looks like it mostly shows the song menu while running, so I'm also not sure whether there would be enough headroom to actually run a game on top of it.
Re: Reasonable implementation of PCM
by on (#167358)
The PCB says "DPCM", although who knows whether that actually means anything.
https://www.flickr.com/photos/81889059@N08/25409422524/ shows some Cypress part (has to be CHR RAM), an 8-pin microcontroller for the CIC, a voltage regulator, and a QFP that's probably a CPLD for bankswitching and maybe some voltage translation.

There's hidden space on the board behind the label, which has to be hiding the PRG ROM. With only 10 made, we're probably never going to get to find out, unless Bunnyboy tells us.
Re: Reasonable implementation of PCM
by on (#167379)
"The board is custom made by bunnyboy, has 64MB ROM with a custom UNROM-like mapper (4096 16K banks)." -Shiru at http://nintendoage.com/forum/messagevie ... did=152873
Re: Reasonable implementation of PCM
by on (#169584)
I'm trying to figure out what would be a good container format that strikes a balance between quality and easy, relatively fast decompression. From what I can tell, it's a recurring thing in the sample data to have the same value several times in succession, so an RLE compression format could help reduce size tremendously. To stick to 1 byte/IRQ, I'd go for a custom format like this:
Code:
RRSS SSSS
RR: Number of times to repeat the sample
SSSSSS: 6 bits of sample data

Hopefully this could be "decompressed" without losing too much time, and at 6-bit resolution we could stick to regular PCM or some delta-based format.
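A possible offline encoder and reference decoder for the `RRSS SSSS` layout, in Python. The post doesn't say whether RR = 0 means "play once", so this sketch assumes RR counts extra repeats (one byte covers 1 to 4 samples), and it assumes the 6-bit value is shifted left once to fill $4011's 7-bit range. Both are hypothetical choices.

```python
def encode_rle6(samples: list[int]) -> bytes:
    """Pack 6-bit samples (0-63) as RRSS SSSS, where RR = extra repeats (0-3)."""
    out = bytearray()
    i = 0
    while i < len(samples):
        s = samples[i]
        if not 0 <= s <= 63:
            raise ValueError(f"sample {s} does not fit in 6 bits")
        run = 1
        while run < 4 and i + run < len(samples) and samples[i + run] == s:
            run += 1
        out.append(((run - 1) << 6) | s)
        i += run
    return bytes(out)

def decode_rle6(data: bytes) -> list[int]:
    """Expand each byte into (RR + 1) copies of its 6-bit sample."""
    out = []
    for b in data:
        out.extend([b & 0x3F] * ((b >> 6) + 1))
    return out

def to_4011(sample6: int) -> int:
    """Assumed mapping of a 6-bit sample onto the 7-bit $4011 range."""
    return sample6 << 1
```

On the NES side, the IRQ handler would only fetch a new byte when its repeat counter runs out, so repeated values cost a counter decrement instead of a ROM read, and a byte never covers more than 4 samples.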
Re: Reasonable implementation of PCM
by on (#169587)
Quote:
Lots of space


Depending on what results you want, could one cut the song arrangement up into reusable bars and repeat them? That would slash memory requirements considerably and still retain a lot of extravagance. From an artist's point of view, seamless-loop constraints aren't much of a bother.

If it could work in sync with the onboard channels, you could probably get even greater fidelity out of the bit depth and sample rate by reserving all (or the most dominant) percussive sounds for the synth channels, and using samples for arranged parts with less dynamic range, such as strings and whatnot. Drums, vocals and percussion tend to cause the most audible artifacts in low-bit samples due to their wide dynamic range and quickly changing envelopes.
Re: Reasonable implementation of PCM
by on (#169602)
WheelInventor wrote:
Depending on what results you want, could one cut up the song arrangement in reusable bars and repeat? That'd slice memory requirements considerably and still retain a lot of extravagance.

You mean like the introduction to Space Racer?
Re: Reasonable implementation of PCM
by on (#169636)
tepples wrote:
WheelInventor wrote:
Depending on what results you want, could one cut up the song arrangement in reusable bars and repeat? That'd slice memory requirements considerably and still retain a lot of extravagance.

You mean like the introduction to Space Racer?


Almost! Space Racer sounds like it uses a collection of (almost) single sounds or 1-beat segments played in a row, whereas I propose taking more of a complete score (which has to be written, arranged and mixed to spec) and slicing it up into repeatable 2-, 4- or 8-beat bars (or 3, 6 or 12), depending on what we can afford. The total number of bars used per song should be kept relatively low.

You can get pretty creative with this by allowing the engine to override which bar is being played from memory.
For the sake of the examples, let's assume we can afford 8 beats per bar/sample file:

Case A: Bar 1 plays for 8 beats 2 times.
Case B: Bar 1 plays for 8 beats, then bar 2 plays for 8 beats
Case C: Bar 1 plays two times, but on the 5th beat of the first pass it is overwritten and restarts early, creating variation out of just one bar.
Case D: Bar 1 plays 2 times, but on the 7th beat of the first pass you call bar 2, which then plays for 2 beats before bar 1 overwrites it again. Can be used for fills, suspension, and more advanced variation.
Case E: Bar 1 plays 1 time and is overwritten 5th beat by a bar containing just a short echo tail and then silence.

And lots and lots of other usable cases for getting the most out of a couple of bars.
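The cases above boil down to "play bar X from beat S for N beats, then let the next event take over". A small Python sketch of that scheduling step, turning events into byte ranges for the streamer; the sample rate, tempo and the `Event`/`schedule` names are hypothetical, and in Case D bar 2 is assumed to start from its own first beat.

```python
from dataclasses import dataclass

SAMPLE_RATE = 11_000                        # hypothetical playback rate (1 byte per sample)
BPM = 120                                   # hypothetical tempo
BEATS_PER_BAR = 8                           # "let's assume we can afford 8 beats per bar"
SAMPLES_PER_BEAT = SAMPLE_RATE * 60 // BPM  # 5500 bytes per beat

@dataclass
class Event:
    bar: int          # which bar (sample file) to play
    start_beat: int   # beat inside the bar to start from (0-based)
    beats: int        # beats played before the next event takes over

def schedule(events: list[Event]) -> list[tuple[int, int, int]]:
    """Turn bar events into (bar, byte_offset, byte_count) chunks for the streamer."""
    chunks = []
    for ev in events:
        if ev.start_beat < 0 or ev.beats <= 0 or ev.start_beat + ev.beats > BEATS_PER_BAR:
            raise ValueError(f"event runs outside the bar: {ev}")
        chunks.append((ev.bar, ev.start_beat * SAMPLES_PER_BEAT, ev.beats * SAMPLES_PER_BEAT))
    return chunks

# Case C: bar 1 is cut off at the start of its 5th beat and restarts, then plays in full.
case_c = schedule([Event(1, 0, 4), Event(1, 0, 8)])
# Case D: bar 1 plays 6 beats, bar 2 takes over for 2 beats, then bar 1 plays in full.
case_d = schedule([Event(1, 0, 6), Event(2, 0, 2), Event(1, 0, 8)])
```

Because every cut falls on a beat boundary, the streamer only ever needs a bar number, a start offset and a length, which keeps the per-IRQ work unchanged.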

As for arranging songs in the engine, I'd have one bit decide whether to repeat the last used bar or to step one position up in the score bank, so the arrangement data doesn't have to be explicit about everything. Sometimes you specify a bar address (a nibble could represent this if we're happy with max 16 bars per song); sometimes you don't need to specify anything. And perhaps, if space permits, a nibble to decide the start point within that bar - that'd give all the creative freedom needed to do something amazing.
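One way those ideas could fit in one byte per command, sketched as a Python decoder. The layout is entirely hypothetical: `1SSS BBBB` jumps to bar BBBB starting at beat SSS, while `0R-- -SSS` either repeats the last bar (R = 1) or steps to the next one (R = 0). To fit a byte, the start point is narrowed from a nibble to 3 bits, enough for 8-beat bars.

```python
def decode_arrangement(data: bytes) -> list[tuple[int, int]]:
    """Return (bar, start_beat) pairs for the hypothetical one-byte command format."""
    last_bar = -1   # nothing played yet, so an initial "next bar" selects bar 0
    out = []
    for b in data:
        if b & 0x80:                     # 1SSS BBBB: explicit bar address
            bar = b & 0x0F
            start = (b >> 4) & 0x07
        else:                            # 0R-- -SSS: implicit bar
            if b & 0x40:
                if last_bar < 0:
                    raise ValueError("repeat before any bar was played")
                bar = last_bar
            else:
                bar = last_bar + 1
            start = b & 0x07
        if bar > 15:
            raise ValueError("only 16 bars per song")
        out.append((bar, start))
        last_bar = bar
    return out
```

Most commands in a typical song would be the implicit kind, so the arrangement stays compact while an explicit jump is always available for fills and variations.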