This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019 Download

Adapting Sik's UFTC graphics compression

Adapting Sik's UFTC graphics compression
by on (#167722)
In this post, Sik wrote:
Also pixelart doesn't really compress that well in the first place since there isn't much redundancy at all within a sprite (UFTC exploits the fact that among different sprites from the same animation there may be some repeated portions as well as a good chunk of blank pixels, but even then it barely makes it to about half the size in the best cases).

UFTC, as I understand Sik's description at sonicretro #25105 and the compressor source code, breaks each tile into four 4x4-pixel quadrants, stores all quadrants in a dictionary, and stores each as a list of dictionary indices of the four quadrants that make it up. It decompresses well on a system with 4-bit packed pixels and 16-bit memory, such as the Sega Genesis or Game Boy Advance.

But decompressing UFTC on a planar system, such as ColecoVision/SG-1000, NES, SMS, TG16, Game Boy, or Super NES, would require a fast shift by 4 to pack and unpack the nibbles that represent 4x1-pixel slivers. A lot of these 8-bit-era CPUs take time to shift proportional to the shift distance. Exceptions include the SWAP instruction of the Game Boy's Z80-like CPU and the S-CPU's multiplier (write $4202=$10, write $4203=data byte, read $4216 and $4217).

I'm curious as to what sort of ratio is possible for 2bpp graphics (NES, Game Boy, Super NES BG3), where each 4x4 pixel quadrant fits into 4 bytes and you need 2 bytes to address a quadrant in the dictionary. This would end up inflating each unique quadrant by 50 percent. So I'm further curious as to how many quadrants other than the all-blank tile are actually repeated often. Because if few are, it'd be cheaper to just encode which quadrants are blank in a header byte, which I'll call "null UFTC":
Code:
76543210
|||||||+- 0: top left is all 0's; 1: contains nonzero pixels
||||||+-- 0: top right is all 0's; 1: contains nonzero pixels
|||||+--- 0: bottom left is all 0's; 1: contains nonzero pixels
||||+---- 0: bottom right is all 0's; 1: contains nonzero pixels
++++----- Compression type for this tile, where one value indicates null UFTC

Or equivalently:
Code:
76543210
||||||++- 0: top half transparent; 1: left halves; 2: right halves; 3: literal bytes
||||++--- 0: bottom half transparent; 1: left halves; 2: right halves; 3: literal bytes
++++----- Compression type for this tile, where one value indicates null UFTC


And even if non-blank tiles are reused often, I'm also curious about how many are. This would inform another option to keep only the 256 most-used quadrants in the dictionary and have a similar control byte for each tile that determines whether each quadrants is a dictionary entry or a literal.

Another option is to encode left-half and right-half quadrants as two separate dictionaries that overlap in memory, so that the quadrants can be combined by ANDing without shifting. This might work as well for 4bpp planar tiles on Super NES and TG16 as UFTC works on the Genesis and GBA. How much does it cost in compression ratio to never reuse quadrants from one side to the other?

I could answer all these questions myself by testing these hypotheses against a tile corpus. What corpus is commonly used to evaluate 2bpp and 4bpp tile codecs, if any?
Re: Adapting Sik's UFTC graphics compression
by on (#167723)
This kinda reminds me when I tried quadtree compression of the individual bitplanes. The results weren't particularly impressive, but it didn't try to reuse common patterns.
Re: Adapting Sik's UFTC graphics compression
by on (#167728)
First of all, you have to wonder if 4×4 is really the best option here. If it was me I'd try 8×2 first, since then you can just copy bytes as-is (which is the critical part here). Note that I did consider making a variant of UFTC using 8×2 blocks instead (called XFTC because it's even faster for the 68000), but I ditched it because the compression was slightly worse. But I guess still it's a better option than not having anything in the first place.

So maybe consider looking at 8×2 blocks besides 4×4 ones. For all we know it's a better option once trade-offs are considered, and it's a lot more feasible for planar graphics.
Re: Adapting Sik's UFTC graphics compression
by on (#167731)
So I have several dimensions to test in order to quantify the space-time tradeoff of UFTC-like algorithms:
  • Quadrant size: 4x4, 4x4 with separate left and right dictionaries, or 8x2
  • Compression method: Solid color 0, all solid colors, top 128, top 256, straight 2-byte dictionary
  • Platform: 2bpp graphics and 6502 or 4bpp graphics and 65816+mul8x8
  • Metrics: Ratio and decompression speed

In addition, the ratio and decompression speed for 2bpp results would be compared to those for PB8 RLE and Oracle common byte, which make up a "control group" of sorts.

I could start with CartCollector's NES graphics corpus and extend it with a second corpus from the first two volumes of Action 53. Then later I could ask in SNESdev for a 4bpp corpus.
Re: Adapting Sik's UFTC graphics compression
by on (#236688)
My first attempt at this got interrupted by The Curse of Possum Hollow. I've since learned how to better interleave hobby projects and paid projects. So I'll leave some notes to self or to anyone who wants to benchmark UFTC family.

The quadrant dictionary can be in one of three formats:

XFTC (8x2)
Each dictionary entry is 4 bytes in Game Boy format (top row plane 0, top row plane 1, bottom row plane 0, bottom row plane 1).

UFTC (4x4)
Each dictionary entry is 4 bytes, in the format used by Missile Command's MADSEL mechanism and by Atari 7800 MARIA mode 320B tiles: plane 1 in bits 7-4 and plane 0 in bits 3-0.

Dual dictionary UFTC (4x4)
Each dictionary entry is 8 bytes in Game Boy format. Two dictionaries, for the left half and right half of a tile, share the same bytes. Left half entries use bits 7-4; right half entries use bits 3-0.

Compressed tiles can be in one of four formats:

Solid color
A header byte specifies whether each quadrant is a 4x4 literal or solid color, as well as which solid color is used for the solid portions of this tile.
Tile size is 1 + 2 * bpp * literal_count
No dictionary is emitted

Top 256
A header byte specifies whether each quadrant is a 4x4 literal, a reference to the first 256 entries of the dictionary, or a reference to the rest of the dictionary.
Tile size is 1 + 2 * bpp * literal_count + ref8_count + 2 * ref16_count

Top 128
If the first byte is less than 256 - (dictionary size / 256), it's a 1-byte reference to the start of the dictionary. Otherwise, it's a 2-byte reference to the rest of the dictionary.
Tile size is 4 + ref16_count

Straight 2-byte dictionary
All dictionary entries, no literals. Suitable for random access decompression, ratio is worse than top 128 especially for 2bpp.
Tile size is 8 (fixed)
Re: Adapting Sik's UFTC graphics compression
by on (#236704)
UFTC is in effect a layer of metatiles below the 8x8 pixel tile.

I implemented a size estimator and ran it on NESdev Corpus. The Top 128 (really top 240+) and Top 256 methods worked best, with Top 256 edging out Top 128 on tile sets with many hapaxes (quadrants appearing only once) compared to the tile set's total size.
Code:
num_tiles is the total number of 8x8 pixel tiles
len(freqs) is the total number of distinct 4x4 pixel quadrants
len(hapaxes) is the number of 4x4 pixel quadrants appearing only once
topn_uses is the number of uses of the 240-256 most common 4x4 pixel quadrants

top256: 4 * len(freqs) + num_tiles + 2 * dupe_uses - topn_uses
      = 4 * len(freqs) + num_tiles + 2 * (4 * num_tiles - len(hapaxes)) - topn_uses
      = 4 * len(freqs) + 9 * num_tiles - 2 * len(hapaxes) - topn_uses
top128: twobyte_bytes - topn_uses
      = 4 * len(freqs) + 8 * num_tiles - topn_uses
top256 - top128 = num_tiles - 2 * len(hapaxes)


I compared UFTC and XFTC (but not yet 2-dictionary UFTC) to Oracle common byte, ZET (a simplified variant of Oracle common byte always using $00 as the common value), PB53 (an RLE codec with a few NES-specific special cases for single-color and repeated tiles), interleaved PB53 (which allows repeating a tile 4096 bytes back), jroatch's Donut, and a simplified variant of Donut that lacks the "transpose" mode.

Results:
Code:
battlekid (47488)
    uftc    twobyte:50096  top128:45864  top256:38030  solid:40448
    xftc    twobyte:51360  top128:47364  top256:38766  solid:42200
    oracle:37143  zet:39041  pb53:35089  ipb53:35100  sdonut:32786
    donut:31551
bombsweeper (8192)
    uftc    twobyte: 6520  top128: 4825  top256: 4527  solid: 4232
    xftc    twobyte: 6612  top128: 4940  top256: 4587  solid: 4372
    oracle: 3775  zet: 3519  pb53: 2760  ipb53: 2762  sdonut: 2257
    donut: 2168
cmc80s (16384)
    uftc    twobyte:14196  top128:11543  top256:10437  solid:12172
    xftc    twobyte:14592  top128:12022  top256:10680  solid:13152
    oracle:11345  zet:11920  pb53:10292  ipb53:10298  sdonut: 8825
    donut: 8482
dpadhero (20496)
    uftc    twobyte:18500  top128:15382  top256:13393  solid:14273
    xftc    twobyte:19032  top128:16074  top256:13873  solid:16029
    oracle:12919  zet:15242  pb53:10438  ipb53:10448  sdonut: 9102
    donut: 8837
geminim (4096)
    uftc    twobyte: 3728  top128: 2870  top256: 2668  solid: 2928
    xftc    twobyte: 3896  top128: 3080  top256: 2776  solid: 3184
    oracle: 2528  zet: 2791  pb53: 1896  ipb53: 1896  sdonut: 1674
    donut: 1621
gemventure (4096)
    uftc    twobyte: 4376  top128: 3681  top256: 3146  solid: 3880
    xftc    twobyte: 4408  top128: 3721  top256: 3165  solid: 3988
    oracle: 2981  zet: 3370  pb53: 2292  ipb53: 2292  sdonut: 1984
    donut: 1953
lj65 (4096)
    uftc    twobyte: 3852  top128: 3025  top256: 2769  solid: 3540
    xftc    twobyte: 3992  top128: 3200  top256: 2842  solid: 3892
    oracle: 3224  zet: 3724  pb53: 2282  ipb53: 2282  sdonut: 2157
    donut: 2086
manhole (8192)
    uftc    twobyte: 8072  top128: 6766  top256: 5694  solid: 6052
    xftc    twobyte: 8560  top128: 7377  top256: 6070  solid: 6992
    oracle: 6087  zet: 6938  pb53: 4876  ipb53: 4878  sdonut: 4416
    donut: 4268
neotoxin (24576)
    uftc    twobyte:22944  top128:19438  top256:16552  solid:16796
    xftc    twobyte:23052  top128:19602  top256:16720  solid:18032
    oracle:14597  zet:14308  pb53:12838  ipb53:12848  sdonut:11407
    donut:10911
nesnake2 (8192)
    uftc    twobyte: 6912  top128: 5568  top256: 5678  solid: 7164
    xftc    twobyte: 6976  top128: 5649  top256: 5729  solid: 7052
    oracle: 6406  zet: 7485  pb53: 6162  ipb53: 3877  sdonut: 5267
    donut: 5132
quantdisco (32768)
    uftc    twobyte:28928  top128:23983  top256:20868  solid:20568
    xftc    twobyte:29632  top128:24910  top256:21501  solid:22572
    oracle:19463  zet:21659  pb53:15524  ipb53:15538  sdonut:13887
    donut:13395
sackofflour (32768)
    uftc    twobyte:25916  top128:20155  top256:18488  solid:16932
    xftc    twobyte:26832  top128:21374  top256:19345  solid:21920
    oracle:19154  zet:22540  pb53:12927  ipb53:12117  sdonut:11355
    donut:10998
solarwars (32768)
    uftc    twobyte:32800  top128:28689  top256:23405  solid:22308
    xftc    twobyte:32604  top128:28434  top256:23234  solid:22952
    oracle:20810  zet:21152  pb53:19060  ipb53:19074  sdonut:16844
    donut:16389
sting (6144)
    uftc    twobyte: 4796  top128: 3437  top256: 3366  solid: 3872
    xftc    twobyte: 4800  top128: 3442  top256: 3368  solid: 4084
    oracle: 3173  zet: 3519  pb53: 2456  ipb53: 2458  sdonut: 2135
    donut: 1971


Verdict: For NES tiles, UFTC does not beat PB53, which is still probably the best choice for streaming decompression on NES.
Re: Adapting Sik's UFTC graphics compression
by on (#236966)
Why did you choose UFTC ??,because if you want a fast streaming decompression, you should try LZ4 instead .
https://www.brutaldeluxe.fr/products/cr ... index.html

UFTC is fast to decompress, but it seems limited in some types of data,and not well suited for bitplane graphics .
Re: Adapting Sik's UFTC graphics compression
by on (#236967)
One problem I've found with using anything that has "LZ" in the name on the NES is that such codecs operate based on back-references to previous decompressed data. The "match offset" in LZ4 can be up to 65535 bytes or the whole file, whichever is shorter. This is fine for decompressing on a computer with a large amount of CPU RAM, such as the Apple IIGS that the linked article assumes. But it's inconvenient for a largely ROM-based system, most of whose RAM can be accessed for only 1.2 ms out of every 16.4 ms. On an NES, for example, you have 2K of internal CPU RAM and perhaps 32K of CHR RAM on the cartridge, which can be read or written only during forced or vertical blanking. One ends up treating CHR RAM as write-only in most cases because reading it requires so much accommodation from the rest of the engine. This is why I've preferred codecs that embrace small RAM, such as RLE or those use a dictionary in ROM. RLE can be seen as the special (degenerate?) case of LZ with a small, fixed match distance of 1 or 2 or however far apart the rows from the same bit plane are. UFTC originally attracted my attention because of its use of a dictionary in ROM. But in this case, a lightly customized RLE codec (PB53) beat UFTC for most of the corpus, as did a heavily modified RLE codec (Donut).

I imagine that LZ family codecs are more practical on Master System, TurboGrafx-16, and Game Boy, which both have 8K of RAM and make VRAM less inconvenient to read. But PB16 (RLE with a match distance of 2) does pretty well for me on Game Boy.
Re: Adapting Sik's UFTC graphics compression
by on (#236971)
tokumaru wrote:
This kinda reminds me when I tried quadtree compression of the individual bitplanes. The results weren't particularly impressive, but it didn't try to reuse common patterns.

I coded bitplane-based compression for a client last year, and the result of my "PornoMAG" compressor was rather impressive!

We were able to compress an 8KB bank containing a large cell-shaded image, down to just 800 bytes! (It likely wouldn't work that well, for sprites & tilesets, though.)

The client paid me a substantial fee on top of my payment, not to release the tools to the public. Which I happily accepted.

Their game will be released for the Famicom, at some point in the near future.
Re: Adapting Sik's UFTC graphics compression
by on (#236973)
tepples wrote:
a heavily modified RLE codec (Donut).

That's like saying PNG is a heavily modified DEFLATE codec.

So I ran your scripts through some sets that I had.

Code:
ByteOff19 (5996544)
    uftc    twobyte:3286364  top128:1787254  top256:2237960  solid:1255772
    xftc    twobyte:3308448  top128:1809359  top256:2266533  solid:1415212
    oracle:1790876  zet:1592931  pb53:1017799  ipb53:1016790  sdonut:726594
    donut:694238
a53-1to4 (557056)
    uftc    twobyte:433980  top128:369319  top256:337972  solid:383664
    xftc    twobyte:438800  top128:377848  top256:344074  solid:411340
    oracle:356532  zet:379923  pb53:299114  ipb53:293424  sdonut:266346
    donut:254577
donut-corpus (1638400)
    uftc    twobyte:1228480  top128:819024  top256:1048242  solid:1368156
    xftc    twobyte:1259324  top128:849898  top256:1084877  solid:1488292
    oracle:1210732  zet:1336070  pb53:1032525  ipb53:1019284  sdonut:930000
    donut:891330
Re: Adapting Sik's UFTC graphics compression
by on (#236974)
JRoatch wrote:
tepples wrote:
a heavily modified RLE codec (Donut).

That's like saying PNG is a heavily modified DEFLATE codec.

Good analogy, given the Paeth predictor and other scanline differencing modes of PNG. GIF is pixels->optionally interlace->LZW, whereas PNG is pixels->expand to 2^2^n colors->optionally Adam7 interlace->differencing->DEFLATE.

But thanks for the numbers:

JRoatch wrote:
Code:
ByteOff19 (5996544)
    uftc    twobyte:3286364  top128:1787254  top256:2237960  solid:1255772
    xftc    twobyte:3308448  top128:1809359  top256:2266533  solid:1415212
    oracle:1790876  zet:1592931  pb53:1017799  ipb53:1016790  sdonut:726594
    donut:694238

I'm curious as to what ByteOff19 looks like that'd make UFTC solid do so well. Are there a lot of solid tiles or tiles with two or three solid quadrants?

JRoatch wrote:
Code:
donut-corpus (1638400)
    uftc    twobyte:1228480  top128:819024  top256:1048242  solid:1368156
    xftc    twobyte:1259324  top128:849898  top256:1084877  solid:1488292
    oracle:1210732  zet:1336070  pb53:1032525  ipb53:1019284  sdonut:930000
    donut:891330

Both UFTC and XFTC top 128 do well on this one. Perhaps the reason I saw so few matches in top 128 on the parts I used is because I had compressed each part of NESdev Corpus separately. Top 256 does better with a large fraction of hapaxes, and the bigger dictionary by combining them all helps top 128 do better. Do the banks in this 1.6 MB corpus share a lot of data?
Re: Adapting Sik's UFTC graphics compression
by on (#236976)
tepples wrote:
I'm curious as to what ByteOff19 looks like that'd make UFTC solid do so well. Are there a lot of solid tiles or tiles with two or three solid quadrants?

It's literally 80% solid black tiles. Sorry to disappoint.

tepples wrote:
Both UFTC and XFTC top 128 do well on this one. Perhaps the reason I saw so few matches in top 128 on the parts I used is because I had compressed each part of NESdev Corpus separately. Top 256 does better with a large fraction of hapaxes, and the bigger dictionary by combining them all helps top 128 do better. Do the banks in this 1.6 MB corpus share a lot of data?

Other then text glyphs, not much is obviously shared.
It's a concatenated file of a collection of 8KiB files from CHR-ROM games including unlicensed and homebrew.
When I did run the 8KiB files separately uftc-top256 did the best out of all it's variants, laying between oracle and zet in terms of bytes used.