This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019 Download

Compressing 1-bit tiles

Compressing 1-bit tiles
by on (#56855)
I looked at tokumaru's rewrite of the Codemasters Markov CHR codec. I noticed that it essentially transforms runs of identical pixels into a unary code: 0 represents one pixel, 10 represents two, 110 represents three, etc.

Dwedit has noticed that this scheme doesn't perform so well on tiles with 1 bit per pixel. These tiles are seen in text fonts and in decorative tiles representing things that are in the far background. Encoding run lengths in unary doesn't save any bits over not compressing the 1-bit tile at all: each pixel has one bit for whether it's the same color as the pixel to its left or the other color. In fact, there are two additional bits of overhead per line: one to state that the row isn't repeated, and one because the first pixel is always literal (not compressed), even in a tile with two colors mode 1 and the others mode 0.

I'm playing with the following techniques and seeing 15 to 20 percent off a font:
  1. Encoding 32x8 pixel areas instead of 8x8, to be rearranged by the unrolled code that copies the result to VRAM. This allows for capturing longer run lengths, such as the 1-pixel-tall gap at the bottom of glyphs with no descender or the 2-pixel-tall gap at the top of lowercase glyphs with no ascender.
  2. Gamma coding the run lengths. It slightly outperforms Golomb(M=2), which significantly outperforms unary.
  3. Swapping the code for the most common run length with the code for run length 1 pixel. This saves on (for example) fonts with lots of vertical strokes 2 pixels thick.

I'll release a demo once I have an encoder and decoder finalized.
Re: Compressing 1-bit tiles
by on (#56856)
tepples wrote:
Gamma coding

Hum, I had never seen that. The number of bits (minus one, because no number uses 0 bits) necessary to represent the value are encoded in unary, and then comes the value itself. Interesting. How can you represent 0, without it being interpreted as one extra bit in the next code? EDIT: I just read it can't encode the number 0, so you'd have encode the value + 1 and subtract 1 after decoding.

Anyway, I wonder if an auto-triggering RLE scheme (a run length is only encoded when a pixel repeats, even if this length is 0, in case the pixel doesn't repeat again) would work well in this case. I guess this is the only way to make RLE efficient when working with codes as small as 1 or 2 bits in small images.

If a very good compression for 1-bit tiles is devised, it would be cool if it performed well on normal NES tiles too, since they are basically composed of 2 1-bit tiles. I don't think RLE alone will do much though, we must do something about larger repeating patterns, not only single pixels.

by on (#56863)
I'd never seen gamma coding either, that's ingenious. I'm very interested in the results here, since there's recently been a resurgence in popularity for simpler graphics that could use this compression.


by on (#56905)
I always did "repeat last row" encoding for 1bit graphics. Though that's almost always for fonts and not other detail. Compression results vary, obviously, but I was able to compress a whole 96 character font set to 55-60% of the original.

There's also other techniques, like the changing the format of the graphics themselves. Changing horizontal bitmap to vertical rows instead or even re-ordering to zig-zag format. Before compressing.

by on (#57144)
Today, I managed to round-trip SMB1's CHR through the Python encoder and 6502 asm decoder. So I feel ready to release exact ratios that I've managed to achieve, which approach within a couple percent of CMM on pretty much everything but SMB1:

Concentration Room
titlegfx.chr (plenty of 1-bit tiles)
compressed 22528 bits to 11909 saving 47%
compressed 32768 bits to 20712 saving 36%

compressed 32768 bits to 18475 saving 43%

Super Mario Bros.
smb1bg.chr (minus tiles containing title screen data)
compressed 30720 bits to 18660 saving 39%

Perhaps after I release the encoder, hopefully tomorrow night, someone could help figure out why CMM kicks the behind of the (admittedly less sophisticated) format that I'm using for 2-bit tiles on compression ratio for SMB1.

by on (#57235)
As you can see from the file sizes in the zipfile, it even beats PNG.

It encodes 2-bit tiles similarly to 1-bit tiles, except instead of using a lookup table for each color as in CMM, it uses only one table that gets EOR'd against the current color.