This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

Improved 400+ color palette demo

by on (#62735)
I just improved my 400+ color palette demo to not shake (well, it shakes by two pixels, but at 30 Hz, so it's much less noticeable). Cleaner, easier-to-follow code is included, rather than the super-optimized code like in the previous version. NTSC-only at the moment:

full_palette.zip

[Two screenshots]
Re: Improved 400+ color palette demo
by on (#62753)
blargg wrote:
I just improved my 400+ color palette demo to not shake (well, it shakes by two pixels, but at 30 Hz, so it's much less noticeable). Cleaner, easier-to-follow code is included, rather than the super-optimized code like in the previous version. NTSC-only at the moment:

full_palette.zip

Man, that is WHACK!

by on (#62759)
I find it funny that the first image is full of harmony and looks very clean, while the second one looks like a TV with really bad interference, even though they are composed of the same rows, just arranged differently (the JPEG artifacts don't help, but both images have them)...

by on (#62760)
I'm not hip with all the groovy jive you kids use these days, but isn't "whack" a negative term?

by on (#62761)
So is "Bad", but that didn't stop anyone.

by on (#62762)
Dwedit wrote:
So is "Bad", but that didn't stop anyone.

Man, I love the power glove.

by on (#62763)
Drag wrote:
Man, I love the power glove.

Why, Lucas? Is that what lets you PK freeze Jimmy Woods's hands?

by on (#62765)
Yeah, the left image looks more vibrant and clean, but doesn't really show how smooth a gradient you get. You are left wondering whether some colors are repeated. The right one makes it very clear that this isn't just showing 52 colors or something puny.

by on (#62775)
Pressing A to switch between the two: is that possible?

by on (#62796)
Sure, but unless I can make the inner loops similar enough to be switched based on a variable, it would involve having both loops in the code, and thus more complexity. It's already complex enough to follow as it is. I'll take a look though, since it would be nice to have one ROM.

I came across your vertical stretch demo where you blend lines via flickering. I realized that this would benefit the smooth gradient version of this palette demo. Since it already shakes horizontally at 30 Hz, effectively blending the vertical edges, I could have it shake vertically as well, blending it all together more.

by on (#62798)
blargg wrote:
Sure, but unless I can make the inner loops similar enough to be switched based on a variable, it would involve having both loops in the code, and thus more complexity. It's already complex enough to follow as it is.

For each mode, make a 32-entry table of tints and a 32-entry table of starting colors. Order them in ROM as normal_tints, smooth_tints, normal_startcolors, smooth_startcolors. Then display entries 0-31 or 32-63 of each table.
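A minimal Python model of that layout, to show why the inner loop does not need to change between modes: only a base offset of 0 or 32 differs. The table contents here are hypothetical placeholders (distinct integers so the indexing is easy to check), and `row_params` is an illustrative name, not code from the demo.

```python
TABLE_SIZE = 32
NORMAL, SMOOTH = 0, 1

# Hypothetical placeholder contents; a real ROM would hold tint and color values.
normal_tints = list(range(0, 32))
smooth_tints = list(range(32, 64))
normal_startcolors = list(range(64, 96))
smooth_startcolors = list(range(96, 128))

# ROM order as suggested: normal_tints, smooth_tints, normal_startcolors, smooth_startcolors
rom = normal_tints + smooth_tints + normal_startcolors + smooth_startcolors
TINTS = 0                 # offset of the 64-entry tint table
STARTCOLORS = 2 * TABLE_SIZE  # offset of the 64-entry start-color table


def row_params(mode, row):
    """Return (tint, start color) for a row. The mode only changes the base
    index (entries 0-31 for NORMAL, 32-63 for SMOOTH); the lookup is the same."""
    base = mode * TABLE_SIZE
    return rom[TINTS + base + row], rom[STARTCOLORS + base + row]


print(row_params(NORMAL, 5), row_params(SMOOTH, 5))
```

On the 6502 side this would correspond to adding the mode's offset to the row index once, then using the same indexed loads in either mode.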

by on (#62944)
I put up a nice page on the Wiki describing how this demo works, and updated the source code to be cleaner and easier to follow. Unfortunately, I didn't add controller support for alternating between the two palettes, as it would have complicated the code more than it was worth.

by on (#62959)
Wasn't there an animated demo posted with the noise channel making 'wave' sounds? Was that posted elsewhere or was it taken down?

I could've sworn it was in this thread!

by on (#62967)
Yeah, but it's in for a big upgrade. Stay tuned... :)

by on (#62986)
Well Blargg, congratulations on this demo, which was implemented in an incredibly clever way.
If I understand correctly, you managed to get rid of all jittering (or get down to only 1 pixel of jitter; I haven't understood exactly).
How did you manage this feat? I always got ~9 pixels (3 CPU clocks) of jitter at best (NTSC).

by on (#62988)
There's a description on the Wiki page. You still have the three-pixel jagged vertical edges, but this latest version reduces the horizontal shaking to one pixel, which isn't very visible, basically just making the edges slightly softer.

I'll attempt a more concise [er, longer it seems] version, covering some details I know you know but for the benefit of others.

The PPU has long and short frames. A long frame is 341*262 = 89342 PPU clocks, or 29780 2/3 CPU clocks. If rendering is disabled, you get all long frames. Looking at where CPU cycles fall on the first scanline, they cycle through three positions:
Code:
0--1--2-- frame 1
-0--1--2- frame 2
--0--1--2 frame 3
0--1--2-- frame 4
-0--1--2- frame 5
--0--1--2 frame 6
...

So with rendering disabled, there's no way to avoid the image shaking quite noticeably, with the jagged edges slowly moving along.
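The three-position cycle follows from 89342 not being a multiple of 3. A small Python model reproduces the diagram from the frame length alone. It assumes, as the diagram does, that the CPU runs continuously and that frame 1 starts exactly on a CPU cycle boundary; the function names are illustrative.

```python
PPU_PER_CPU = 3           # NTSC: 3 PPU clocks per CPU clock
LONG_FRAME = 341 * 262    # 89342 PPU clocks (rendering disabled)


def cpu_phases(frame_lengths):
    """PPU-clock offset (0-2) of the first CPU cycle in each frame,
    assuming frame 1 starts exactly on a CPU cycle boundary."""
    phases = []
    start = 0  # absolute PPU clock at which the current frame begins
    for length in frame_lengths:
        phases.append(-start % PPU_PER_CPU)
        start += length
    return phases


def diagram_row(phase, width=9):
    """Draw one row of the diagram: CPU cycles 0, 1, 2 at their pixels."""
    row = ["-"] * width
    for n, pos in enumerate(range(phase, width, PPU_PER_CPU)):
        row[pos] = str(n)
    return "".join(row)


for i, p in enumerate(cpu_phases([LONG_FRAME] * 6), start=1):
    print(diagram_row(p), "frame", i)
```

Since 89342 mod 3 = 2, each frame pushes the CPU cycle grid one pixel to the right (mod 3), giving the period-3 pattern.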

If rendering is enabled, you get an alternation between long and short frames (a short frame is one PPU clock shorter than a long one). This causes the position of the CPU cycles on the first scanline to toggle between only two positions:
Code:
0--1--2-- frame 1
-0--1--2- frame 2
0--1--2-- frame 3
-0--1--2- frame 4
0--1--2-- frame 5
-0--1--2- frame 6
...

So it seems you could just enable rendering, letting the PPU skip a clock every other frame, and have the image not shake. The problem is that your code also has to delay one extra CPU cycle every other frame, since the average length of a short and a long frame is a fractional number of CPU cycles (29780.5).
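The same kind of model, now with alternating long and short frames, shows both effects: only two CPU cycle positions occur, and the average frame is 29780.5 CPU cycles, so a fully cycle-timed loop has to alternate between 29780 and 29781 cycles. It assumes frame 1 is a long frame; which frame is long in practice depends on the PPU's state.

```python
from fractions import Fraction

PPU_PER_CPU = 3
LONG_FRAME = 341 * 262         # 89342 PPU clocks
SHORT_FRAME = LONG_FRAME - 1   # 89341: one PPU clock skipped (rendering on)


def phases(frame_lengths):
    """PPU-clock offset of the first CPU cycle in each frame."""
    start, out = 0, []
    for length in frame_lengths:
        out.append(-start % PPU_PER_CPU)
        start += length
    return out


alternating = [LONG_FRAME, SHORT_FRAME] * 3
print(phases(alternating))   # only two positions occur

# Average frame length in CPU cycles is not a whole number...
avg_cpu = Fraction(LONG_FRAME + SHORT_FRAME, 2 * PPU_PER_CPU)
print(avg_cpu)               # 59561/2, i.e. 29780.5

# ...so a cycle-timed main loop has to alternate between two lengths.
loop_lengths = [29780, 29781]
assert Fraction(sum(loop_lengths), 2) == avg_cpu
```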

So your code is delaying one extra CPU cycle every other frame, and the PPU is skipping a pixel every other frame. When you delay an extra clock, you effectively move your image three pixels to the right. On long PPU frames, the PPU effectively moves the image to the left by one pixel, and on short, to the left by two pixels. You have a choice as to which PPU frames you delay an extra CPU clock on, either short or long. You want to delay the extra clock and move your image right three pixels on frames that the PPU moves it two to the left, so that they result in only one pixel shift right.

If you do it wrong, you'll move the image three pixels right when the PPU moves it only one to the left, resulting in it moving a total of two to the right. Then on the next frame, the PPU will move it two to the left, and you'll have much more noticeable shaking.
Code:
0--1--2-- frame 1
--0--1--2 frame 2
0--1--2-- frame 3
--0--1--2 frame 4
0--1--2-- frame 5
--0--1--2 frame 6
...
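The right/wrong choice can be checked with plain arithmetic. This sketch uses the per-frame shifts exactly as stated in the post (PPU moves the image 1 pixel left on long frames, 2 pixels left on short frames; one extra CPU cycle moves it 3 pixels right) and assumes frames alternate starting with a long one. `positions` is an illustrative helper, not part of the demo.

```python
# Per-frame image displacement in pixels (negative = left), using the
# figures stated in the post.
PPU_SHIFT = {"long": -1, "short": -2}
EXTRA_CYCLE_SHIFT = 3


def positions(delay_on, frames=6):
    """Image position after each frame, with frames alternating long/short
    (frame 1 long) and the extra CPU cycle added only on frames of type
    `delay_on`."""
    pos, out = 0, []
    for i in range(frames):
        kind = "long" if i % 2 == 0 else "short"
        pos += PPU_SHIFT[kind]
        if kind == delay_on:
            pos += EXTRA_CYCLE_SHIFT
        out.append(pos)
    return out


right = positions(delay_on="short")  # delay where the PPU shifts 2 left
wrong = positions(delay_on="long")   # delay where the PPU shifts 1 left
print("right:", right, "spread", max(right) - min(right))
print("wrong:", wrong, "spread", max(wrong) - min(wrong))
```

With the right choice the image alternates between two positions one pixel apart; with the wrong one it alternates two pixels apart, matching the diagram.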

So in the code, I run long frames and synchronize to the PPU clock, then enable rendering and figure out whether the next frame is a short or long one, and delay one extra frame if necessary, so that everything is synchronized.

by on (#62989)
If I understand correctly, you stay constantly in sync with the PPU and never use interrupts... so this isn't viable if you want to do anything other than a raster effect, right?

by on (#62990)
Haha, right. This wouldn't be useful for code that isn't entirely cycle-timed. I'd be very surprised if you could even reduce the jitter to three pixels with such code, as it would require landing on the same CPU clock each frame. Have you actually managed to do this? The best I can think of is an NMI routine with the CPU sitting in a series of two-cycle instructions (like a long string of NOPs), which would synchronize you to an accuracy of two CPU clocks (6 pixels).

Hmmm... with this approach, if you could somehow determine whether your NMI code took an even or odd number of clocks, you could add an extra cycle to synchronize down to the CPU clock, and then use the scheme I described. Determining even/odd might be possible via a sprite #0 hit or something near the end of the frame. Argh, now I'm going to have to experiment with this at some point. It'd be pretty cool to be synchronized exactly to the PPU every frame...
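A simplified model of the NMI entry jitter described above: the NMI may arrive at any cycle of the instruction being executed and is taken at the next instruction boundary, so the latency spread equals the instruction length (2 cycles for a NOP string, 3 for a `HERE: JMP HERE` loop). Real 6502 interrupt polling has extra subtleties this ignores. The parity fix is the hypothetical one from the post: if the handler knew whether the latency was even or odd, one extra cycle on the even path would equalize both.

```python
def entry_latencies(instr_cycles):
    """Possible delays (CPU cycles) from NMI assertion to handler start when
    the CPU is looping over instructions of `instr_cycles` cycles each.
    Simplified: the NMI is taken at the next instruction boundary."""
    latencies = set()
    for arrival in range(instr_cycles):  # cycles already spent in the instruction
        latencies.add((instr_cycles - arrival) % instr_cycles)
    return sorted(latencies)


def corrected(latency):
    """Hypothetical parity fix for a 2-cycle (NOP) sled: burn one extra
    cycle when the latency was even, so both paths end on the same cycle."""
    return latency + (1 if latency % 2 == 0 else 0)


nop_sled = entry_latencies(2)  # string of NOPs: 2 possible entry cycles
jmp_loop = entry_latencies(3)  # HERE: JMP HERE: 3 possible entry cycles
print(nop_sled, jmp_loop)
print(sorted({corrected(lat) for lat in nop_sled}))
```

Two possible entry cycles correspond to the 6-pixel accuracy mentioned earlier; after the parity fix only one remains.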

by on (#62991)
In fact, with a mapper that triggers an IRQ at a very predictable PPU clock (such as MMC3 or MMC5, but NOT any VRC series, FDS, N106, etc.), you could use the double-interrupt method:

Interrupt 1 sets up interrupt 2 for the next scanline, then goes into a long string of NOPs, which interrupt 2 interrupts. Inside interrupt 2 you get a 2-cycle error (6 pixels on NTSC and 6.4 (7?) pixels on PAL, both fitting within one tile if you're lucky).
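The NTSC and PAL pixel figures follow from the master-clock dividers: NTSC CPU = master/12 and PPU = master/4 (3 dots per CPU cycle), PAL CPU = master/16 and PPU = master/5 (3.2 dots per CPU cycle). A quick check, where `jitter_pixels` is just an illustrative helper:

```python
from fractions import Fraction

# PPU dots per CPU cycle = CPU divider / PPU divider.
DOTS_PER_CPU_CYCLE = {"NTSC": Fraction(12, 4), "PAL": Fraction(16, 5)}
TILE_WIDTH = 8  # pixels


def jitter_pixels(cycles, region):
    """Width in pixels covered by a timing error of `cycles` CPU cycles."""
    return cycles * DOTS_PER_CPU_CYCLE[region]


for region in ("NTSC", "PAL"):
    px = jitter_pixels(2, region)
    print(region, float(px), "pixels; narrower than a tile:", px <= TILE_WIDTH)
```

Whether the error actually stays inside one tile still depends on where the window falls relative to tile boundaries, hence "if you're lucky".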

I know they somehow managed to kill that remaining 1-cycle jitter on the C64 (which has a built-in scanline counter that can both trigger IRQs and be read at any time), but I don't know if that's possible on the NES. Maybe rely on obscure $2007 read behavior?

by on (#62992)
I guess the "jitter" explains how/why Atari 2600 games looked the way they did (AFAIK they did basically what blargg's demo does, since the console had no actual PPU as I understand it).

by on (#62993)
I think I'm going to get this working to where your wait-for-nmi loop can be made of a normal JMP loop. This would be really useful in the mega-palette-demo I'm working on, which is presently entirely cycle-timed (which is crazy, considering all the branches and calculations it's doing).

I'd have thought that the Atari's frame length was a multiple of CPU clocks, so that you could have pixels appear at the same position each frame. I don't remember them shaking, but it was about 6 years ago that I last had a 2600.

by on (#62994)
blargg wrote:
I'd have thought that the Atari's frame length was a multiple of CPU clocks, so that you could have pixels appear at the same position each frame. I don't remember them shaking, but it was about 6 years ago that I last had a 2600.


Each line on the Atari VCS is 228 cycles long at the 3.58 MHz video clock. Divide by 3 and you get the CPU clock and the number of CPU cycles per line, which is 76. Synchronizing the 6507 to the video is very easy: just write to a strobe register (WSYNC), and the graphics chip halts the CPU until horizontal blank is reached. From that point on, you bang the video registers constantly to draw your graphics. The better your skill as a coder and "out of the box thinker", the better your graphics look.
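The contrast with the NES comes down to divisibility: 228 color clocks divide evenly by 3, so every VCS line is exactly 76 CPU cycles and cycle-counted writes land on the same pixel every line, while 341 NES dots per line leave a remainder. The `wsync` function below is a deliberately simplified model (execution resumes at the start of the next line), not an exact TIA timing description.

```python
VCS_CLOCKS_PER_LINE = 228   # color clocks per scanline (3.58 MHz)
NES_DOTS_PER_LINE = 341     # NTSC NES PPU dots per scanline
CLOCKS_PER_CPU_CYCLE = 3    # both: CPU clock = video clock / 3

vcs_cycles_per_line, vcs_rem = divmod(VCS_CLOCKS_PER_LINE, CLOCKS_PER_CPU_CYCLE)
nes_cycles_per_line, nes_rem = divmod(NES_DOTS_PER_LINE, CLOCKS_PER_CPU_CYCLE)
print("VCS:", vcs_cycles_per_line, "cycles/line, remainder", vcs_rem)
print("NES:", nes_cycles_per_line, "cycles/line, remainder", nes_rem)


def wsync(cycle):
    """Simplified WSYNC: a write on CPU cycle `cycle` halts the CPU until
    the start of the next line, so execution resumes at a fixed position
    regardless of where in the line the write happened."""
    return (cycle // vcs_cycles_per_line + 1) * vcs_cycles_per_line
```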

Such a mechanism would have been quite practical for lots of other architectures. Unfortunately, Atari (and Commodore) patented the hell out of their innovative designs, which might be why the NES PPU has this weird Sprite 0 method to detect a certain scanline. Atari later sued Sega alone for patent infringement over various methods of sprite and character generation, which were common in Japanese 16-bit video game architectures but had been invented by Atari in the 1970s. With the money they got from Sega, they could sustain themselves and the Jaguar for another 2-3 years, until they finally went belly-up.

by on (#62995)
Success! It was failing before because I was disabling rendering too soon, dumb bug on my part. Now to clean it up and add a page to the Wiki. A PAL version should be possible, and it will be simpler as PAL doesn't have short frames.

It's kind of sad, because I had put so much into this completely-cycle-timed palette demo I'm making. It's going to be so much simpler to work on now, but not as impressive a feat technically. :(

by on (#63136)
OK, it took a few days to be sure I had all the timing worked out. Now I've got a little simulator program and understand it well, along with working test programs. But I want to be sure it's worth documenting and putting together demos for. I'm not that familiar with all the mid-frame PPU things you'd want to do, and which ones have tricky timing, so I'm seeking input on these. I'll describe what the requirements are and what it provides.

Basically, this new technique provides synchronization with the PPU as precise as is technically possible. It is as if you had cycle-timed everything. With rendering enabled each frame, a particular CPU cycle will fall on one of two adjacent pixels, jiggling between them every frame; this is a hardware limitation. So if you're doing raster graphics, the image will hardly shake, and if you're doing timed writes during the frame, they won't jump around by several pixels. Because you synchronize with the PPU beforehand, your write will always fall on one of two particular adjacent pixels in the frame, and those two pixels are the SAME regardless of the PPU-CPU synchronization after power/reset.

The requirements:
* NTSC NES only at the moment. If this proves a useful technique, I'll look into putting it on PAL.

* Before enabling rendering, you call a synchronization routine. This takes about 18 frames on average, and blanks the screen while it's synchronizing. You can call this again to re-synchronize later, if necessary.

* After synchronizing, you must enable NMI and enable rendering by the next frame and leave it enabled from that point.

* Every frame from that point, you must increment a frame counter. If you don't need synchronization for a particular frame, you don't need to do anything more.

* During frames where you need synchronization, you must do sprite DMA during vblank (you're probably already doing this), and run about ten instructions of synchronization code at the end of vblank, just when the vblank flag is being cleared. All code in vblank before that point must be cycle-timed.

* The loop that waits for NMI must be a HERE: JMP HERE style, and it must be executing when each NMI is triggered. No special NOP sequences are required, as I originally figured.

* DMC samples can't be played, since they introduce too much jitter. MMC3 or similar scanline interrupt seems like the only way to get around this.

The main question is how useful this might be. Thoughts?

by on (#63137)
Sounds useful to me! Can't wait to see exactly what it is.
Although I really think it's important that it's done on both NTSC and PAL. If it's NTSC-only, that will be a bad move IMO.
Quote:
I'm not that familiar with all the mid-frame PPU things you'd want to do, and which ones have tricky timing, so I'm seeking input on these. I'll describe what the requirements are and what it provides.

Basically, scanline-based effects don't have very tricky timing, because you have a wide time window for your register writes.

However, mid-scanline effects, such as changing the pattern table for a text window (or the effect in your demo, obviously), have very tricky timing. I guess if you could get the jitter low enough, it could be possible to simulate multiple scrolling backgrounds via $2006 writes on each scanline, in addition to switching nametables and pattern tables multiple times per scanline. This could open up cool effects.

by on (#63138)
OK, you've convinced me to carry this all the way through. PAL shouldn't be nearly as difficult because of the way VBL is synchronized with CPU cycles more firmly. Basically the only thing there to deal with is the variable NMI latency. The current technique will carry over easily for that.

EDIT: working perfectly on PAL now too (kept hitting reset to be sure it works with all possible PPU-CPU synchronizations). As predicted, it was much simpler to do there. Now to write a clear explanation and clean demo code.
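The NTSC/PAL difference can be seen by expressing frame lengths in CPU cycles via the standard master-clock dividers (NTSC PPU/CPU = master/4 and master/12; PAL PPU/CPU = master/5 and master/16), with 262 lines on NTSC, 312 on PAL, and the one-dot skip only on NTSC short frames. The divider and line-count figures are the commonly documented ones, assumed here; `frame_cpu_cycles` is an illustrative helper.

```python
from fractions import Fraction

REGIONS = {
    # name: (dots per line, lines per frame, PPU divider, CPU divider)
    "NTSC": (341, 262, 4, 12),
    "PAL": (341, 312, 5, 16),
}


def frame_cpu_cycles(region, skip_dot=False):
    """Frame length in CPU cycles (exact fraction). skip_dot models the
    NTSC short frame, which is one PPU dot shorter."""
    dots_per_line, lines, ppu_div, cpu_div = REGIONS[region]
    dots = dots_per_line * lines - (1 if skip_dot else 0)
    return Fraction(dots * ppu_div, cpu_div)


print(frame_cpu_cycles("NTSC"))                 # long frame
print(frame_cpu_cycles("NTSC", skip_dot=True))  # short frame (rendering on)
print(frame_cpu_cycles("PAL"))                  # PAL has no short frames
```

Every PAL frame is the same 33247.5 CPU cycles, so there is no long/short distinction to detect; the CPU/PPU phase just alternates between two values each frame.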

by on (#63156)
Awesome work! If you need any more thoughts on usefulness, I remember struggling with these shaky issues myself many years ago when coding raster effects, but never took up the task to actually reverse engineer the behavior fully like you've done here - I was satisfied enough with having coded the probably first NES program to compensate for DMC cycles in the raster loops :P

So I'm sure a lot of people will find your code very useful! I know I would have liked to know those secrets when having raster code that would "aaalmost" work, as long as you got a "good" reset...

by on (#63160)
Bananmos wrote:
Awesome work! If you need any more thoughts on usefulness, I remember struggling with these shaky issues myself many years ago when coding raster effects, but never took up the task to actually reverse engineer the behavior fully like you've done here - I was satisfied enough with having coded the probably first NES program to compensate for DMC cycles in the raster loops :P

So I'm sure a lot of people will find your code very useful! I know I would have liked to know those secrets when having raster code that would "aaalmost" work, as long as you got a "good" reset...

+1. Very cool to finally see a complete solution to this problem, figuring all this out has been on my todo-list for a long time... I'll be sure to play around with the demo code once you release it.

by on (#63165)
So this is basically the NES equivalent of the C64's stable raster IRQ, except with no IRQ? Brilliant. Now if I could just figure out something similar for the Megadrive, I could do some fun things by changing values in VDP registers (ones that don't require writing to VDP RAM).