This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019 Download

Need Help Compressing Data on the NES.

Need Help Compressing Data on the NES.
by on (#54772)
Hey everyone. This is going to be my first post on the NesDev website! Anyways, on to my question, which is in the title. What method do you guys usually use to compress data in a NES ROM? I am asking because while I am doing some good progress on making my first interesting Custom NES game for an emulator (Collision system is in place! Will discuss question about that some other time.), I realized that at the rate I am using memory (32 bytes for each unique row of tiles embedded in ROM, as opposed to storing each tile as a byte in ROM.), I am going to run out of space if I want to make really unique levels without dramatic repetition. I also have some other questions about programming for the NES, but I will leave them for other topics. :) Oh yeah, and for any method provided for compressing data, can you post either 1 - if generous, example code or 2 - pseudo code, which will basically be a word-explanation as to how to do it. The reason for asking for explainations / code is because although I've studied methods of compression, I am yet to grasp it well enough to do it even at high levels. But perhaps it will not be as hard to pull of on the NES given there are some neat tricks I am not familiar with yet! Also, explanations by human beings as opposed to textbooks can make things much easier sometimes.

Thank you for any input for solving my problem!

EDIT: Changed title so it specifies this as a "help me" topic.

by on (#54773)
Hey, and welcome.

"Metatiles" are the main thing to use for background. The typical size is 16x16 pixels, which makes good sense because that's the size of the attribute table to select the colors. That will bring the memory used per screen to 240 bytes, down from 1024.

You would have 64 possible metatiles (using 6 bits). The remaining 2 bits can be used for attribute, but unless you plan on palette-swapping a lot of backgrounds, it's may be smarter select the attribute from a table based on the metatile number. Then you would have those 2 bits free for collision detection, or whatever else you want to define for any particular block.

The simplest compression code is run-length-encoding. All it does is replace a "run" of more than 3 identical bytes with the following - special marker byte, byte to repeat, number of times to repeat. You can combine this with metatiles, and save a whole lot of memory.

I hope that helps.

Another important point is that you need decode all this stuff into a buffer, and dump the buffer into VRAM during vblank. Because decoding may take some time, and vblank time is short.

by on (#54776)
I read your post, and while it actually was a bit of a review of the options, I decided to put some well pondered thought into how to get as much data out of 8 bits as possible, or 1 byte, and thats when I realized that that 8 bits wasn't the answer that looked most promising. 16 bits was.

What is it I am thinking of that could make use of 16 bits that seems so good? Well, it is an untested idea that I need to try out, and I will work on testing as soon as I post this, and has an obvious con as I will present later soon, but I will look for a solution later on.

First of all, the format using 16 bits:


Time to explain what each set does, and it is actually quite simple.

AA represents a pointer to a set of 3 tiles, where if AA equals 00, then it will refer to no tile, and be ignored.

BB functions the same way, but points to another tile set, and is ignored if BB equals 00. You will see shortly why AA and BB are separate.

CCC tells us how many times to repeat the tiles. If CCC equals 0000, a special case occurs because if either AA or BB is not zero, this will mean "draw one tile", meaning if CCC equals 1111, this will actually draw 16 tiles. The reason for this is for the minimum theoretical byte usage for one line of tiles will be 8 bytes, otherwise, it would have to be 10 bytes ( I think this would be the case). Also, an important note: The only time that CCC will force the system to draw an empty tile, would be if both set AA and set BB are both set to zero. It doesn't nessesarily FORCE drawing nothing, but it points to drawing a default, preferably "blank" tile.

D currently has no function. I want it to have a use, though. I will explain why it is omitted when I talk out bit set EEEEEEEE next.

Now, on to the heart of my idea. What would we waste a set of 8 bits on?! Actually, hold on cowbow, because this part is what makes the idea so effecient! That is because each bit actually has an idividual meaning!

The magic: each bit in the "E Byte" ( I will not quote it after this ) tells us the same thing, but it does either 1 thing or the other for a maximum of 8 times (8 bit set, remember?).

For example:

if bit 7 is 0, it draws one tile that is described by set AA.

if bit 7 is 1, it draws one tile that is described by set BB.

Sound simple? It is, and you may have a few questions that I will happily answer. If you are still wondering why I am omitting bit D in the first byte, it is because with the system I am using, you can only draw up to 8 tiles with the E Byte. Thus, bit set CCC only needs to go up to 111, and that number (including zero) can exactly fit the maximum repetition number that is supported by the E Byte.

Now, for the drawbacks:

Currently, since I have not tested this idea, it has only a few drawbacks I can see. With this system, you can typically only point to 6 unique tiles, and to make it worse, you can only draw two different ones at a time. It may be possible to increase the limit, and one thought I have for that is to implement bit D to serve as a "switch" to point to a different tileset that changes both set AA and set BB's tile pointers.

The other drawback I have noticed with this approach is its applications. It is only useful if you have a game where you need to often alternate two different tiles in quick succession.

In summary, I do think I am onto something that could prove to be a possibly useful compression method, and I still need to find a definitive good use for that bit D. I shall post some more, possibly in a new thread if I produce any more new ideas.

Oh, and be sure to give me feedback on my idea guys!

by on (#54786)
You're right in that the method you outlined wouldn't be terribly useful. But you are thinking along the right lines! Use smaller parts of a byte to define what you might normally take a whole byte or more to do. The best ways to save space include both the graphical consolidation known as metatiles, and a more literal compression that you might think of in terms of zip files.

As Memblers said, "metatiles" are a great technique that is used pretty much universally on the NES and many other game platforms. Considering a single 8x8 tile being the smallest piece a NES can think about, a metatile combines several of those so you can refer to them as one logical piece. Here are three examples of ways to represent your tiles:


The first is probably what you are doing now, literally storing the ID number of every single tile, 32 across by 30 down. Very wasteful.

The second combines each block of four tiles into a metatile. The graphics are all the same, but the way you store the data changes. With this method you have a table that records what tiles make up each tile - you're saying "tile 1 = 1, 2, 5, 6" etc. You have to write a routine to decode this and store the literal tiles in the NES memory as before, obviously.

The third example is something you might see in Super Mario Bros. You can compress your data as much as you want, and if you have a lot of large chunks of blocks that always go together you might just want to record that a single ID number results in one huge background object. Or another way to look at it is, having multiple levels of metatiles. In the picture, we've already said that "tile 1 = 1, 2, 5, 6" and so forth; then we go to the next level and say that "big tile 1 = 1, 2, 3, 2, 4." It can get quite complicated to decode, depending on how far you take it.

Me, I stopped at the second level this time. Here is the table I have for my current test ROM:

.db $00,$00,$00,$00   ;00 - empty
.db $E3,$E6,$E6,$E3   ;01 - solid center
.db $F2,$F3,$F4,$F5   ;02 - solid variant (big bubble)
.db $F6,$F7,$F8,$F9   ;03 - solid variant (medium bubbles)
.db $E8,$E6,$E2,$E3   ;04 - right wall
.db $E3,$EB,$E6,$E7   ;05 - left wall
.db $E4,$E1,$E6,$E3   ;06 - floor
.db $E3,$E6,$EC,$EA   ;07 - ceiling
.db $E0,$E1,$E2,$E3   ;08 - top left corner
.db $E4,$E5,$E6,$E7   ;09 - top right corner
.db $E8,$E6,$E9,$EA   ;10 - lower left corner
.db $E3,$EB,$EC,$ED   ;11 - lower right corner
.db $E3,$E6,$E6,$EE   ;12 - top left bend
.db $E3,$E6,$EF,$E3   ;13 - top right bend
.db $E3,$F0,$E6,$E3   ;14 - lower left bend
.db $F1,$E6,$E6,$E3   ;15 - lower right bend

And having recorded that, here is an example of a map screen taking advantage of it:


The next type of compression I apply is called RLE compression, or Run Length Encoding. It's very simple. See all the zeroes in a row up there? Rather than storing "0, 0, 0, 0, 0, 0" etc., you can just store "16, 0" - that is, just saying "there are 16 zeroes in a row here."

Now obviously this does not make sense if you have a lot of individual tiles. You are wasting space if you have to say "1, 4, 1, 2, 1, 5" (there is 1 four, 1 two, 1 five...) That is why you program in special cases that let you say "there are 16 zeroes in a row" when you need to, but otherwise you can just store the IDs literally.

My own rule that I have coded is this: if the current byte is less than 128, then that is simply the next tile in the sequence. if the current byte is greater than 128, subtract 128 from it and repeat the next byte that many times.

In other words, I am using the highest bit as a flag to tell the program how to treat this byte. If a byte is in the format %0xxxxxxx, that means the ID number is just whatever the byte is, which in compression terms is called a "literal." Since I'm using less than a hundred metatiles this isn't a problem. If however it is in the format %1xxxxxxx, then ignore that 1 and use the rest of the byte as a loop counter for the following byte. As an example, %10000101 means repeat the next byte 5 times.

So knowing that, here is what the above map looks like compressed:

;simple RLE compressed background using 2x2 metatiles
.db 131,0,4,2,5,0,0,4,3,1,1,2,1,3,2
.db 131,0,10,7,11,0,0,4,1,2,3,1,2,1,1
.db 136,0,10,131,7,13,1,1,2
.db 140,0,10,131,7
.db 144,0
.db 6,9,142,0
.db 1,5,0,0,8,9,138,0
.db 2,5,0,0,4,5,138,0
.db 3,14,6,6,15,5,138,0
.db 1,2,2,1,1,14,6,6,9,132,0,8,6,6
.db 12,131,7,13,3,1,2,14,9,131,0,4,3,1
.db 5,131,0,10,132,7,11,131,0,4,2,1
.db 5,140,0,4,1,2
.db 5,140,0,4,3,3
.db 14,140,6,15,2,1

You should be able to visibly compare the two and see how it works.

So with these two systems working together, we went from 32x30 tiles, which is 960 bytes to store one screen, down to 134 bytes for the screen and 64 bytes for the table that helps define it. That doesn't include the size of the subroutines that interpret the data but it definitely saves us a lot of space.

Just for good measure, here is the subroutine I use to unpack the above RLE-compressed map into RAM:

unpack_screen:      ;unpack an RLE compressed screen into RAM (2x2 metatiles)

;tmpada = address of compressed screen data
;tmp8y = index of compressed screen data
;SCREEN = constant address in RAM where data is unpacked
;tmp8x = index of RAM location
;x = loop counter for RLE runs
;a = tile data being loaded and stored

   ;*** make section to detect current screen and get address from table
   ;*** for now, load a single temporary bg

   lda #<bg
   sta tmpada   ;store address of screen data in tmpada
   lda #>bg
   sta tmpada+1
   ldy #0
   sty tmp8x   ;index of how many metatiles have been unpacked
--   ldx #1      ;used to handle RLE bits
   lda (tmpada),y
   bpl +      ;if last bit not set, it's a literal
   and #%01111111   ;clear last bit
   tax      ;x becomes the loop counter
   lda (tmpada),y   ;load tile to be repeated
+   sty tmp8y   ;back up data index
-   ldy tmp8x
   sta (SCREEN),y   ;store tile in screen RAM
   inc tmp8x   ;increment RAM index
   bne -      ;loop if we're in an RLE run
   ldy tmp8y   ;restore data index
   ldx tmp8x
   bne --      ;we are done copying data when RAM index wraps

If this can be written more efficiently, I am all for hearing it, by the way. :) I can get stuff to work but I am not very confident in my finesse.

by on (#54791)
Well depending on how your level are, it might be good to use 2 layers of metatiles - I do that in the game I'm making. The screen is made of 32x32 big metatiles which are made of 4 16x16 smaller metatiles.
And like you said, it's good to put all bits to good use. For example I have 32 big metatiles available, and I use the upper 3 bits to say if they how many times they're repeated (1-8). It ended up being quite efficient.

by on (#54831)
Well thanks for the input! You actually clarified what the meta tiles are very much for me!

Now, I am currently using a compression method, which I am squeezing a minimum of 4 bytes per "row", and it is working perfectly. However, I am having a problem that probably everyone has had at some point programming anything for the NES. I am getting artifacts after writing up to address $2303 in the PPU. I am suspecting that the problem is that I am running out of time to load the background into memory. I am loading it only once, which is right out of start-up. I have read that GBA-guys tutorial is very bad, and it is from his tutorial that I made my VBlank subroutine. Now, the artifacts will appear, and more specifically, I see no tiles drawn after $2303. I checked the debugger I am using, and it will show the PPU address resetting to $2000. This leads me to my question...Am I running out of VBlank time in this case? Because if the PPU is resetting, I think this is why, because I read somewhere that the PPU registers reset ( I think ) after VBlank.

I also need someone to help me here. If time is the problem, how can I extend the time I can take to draw? Does that mean drawing more than one VBlank, and if so, how can I do this successfully?

by on (#54832)
If your game doesn't scroll the screen at all, the easiest solution is by far to just disable rendering and have unrestricted PPU access, and once the screen is done you turn rendering back on. But if you plan on scrolling, read on.

VBlank time is very short. You shouldn't have much logic running during that time, you should use it just to dump data to VRAM. This means, among other things, that you shouldn't be decompressing data during VBlank. Compressed things that have to be sent to VRAM must be decompressed beforehand and buffered, and during Vblank you transfer that buffer to VRAM. If you aren't doing it like this already, this is the first thing to do.

Note however that even the fastest possible code can't update the tiles and attributes of a full screen in a single VBlank. Most games draw the full initial screen either while rendering is off (which means PPU access is unrestricted) or across several VBlanks while a black palette or the alternate name table (the one that is not being drawn) is displayed. Once gameplay starts and the screen has to scroll, only the tiles that just scroll into the picture need to be drawn, in the form of columns (horizontal scroll) or rows (vertical scroll).

The VBlank time is enough to update a row and/or a column of tiles, as well as the palette and the sprites, which is enough for most games. If your program works under those requirements, you should do things as I described above without needing any special tricks.

by on (#54838)
tokumaru wrote:
Most games draw the full initial screen either while rendering is off (which means PPU access is unrestricted) or across several VBlanks while a black palette or the alternate name table (the one that is not being drawn) is displayed.

Sorry if this is kind of off-topic, but exactly what advantage would you have of updating across multiple Vblanks with a black palette over just turning the screen off and doing it all at once? The only thing I could think of would be if you used the same updating code that you use in real-time to draw the whole screen. Actually, now that I think of it, in my project I turn the screen off and use the same updating code drawing the first screen the player sees one column at a time, but I do it all in one frame (and disable NMIs)...

by on (#54846)
Celius wrote:
Sorry if this is kind of off-topic, but exactly what advantage would you have of updating across multiple Vblanks with a black palette over just turning the screen off and doing it all at once? The only thing I could think of would be if you used the same updating code that you use in real-time to draw the whole screen.

Yeah, I guess this is the most obvious advantage. I don't really know if any games aver used the "black palette" approach, but displaying the alternate name table with useful information while a level is loading sure is an interesting option.

Actually, now that I think of it, in my project I turn the screen off and use the same updating code drawing the first screen the player sees one column at a time, but I do it all in one frame (and disable NMIs)...

I used to do it like this too, but there were complications because in my game objects can write to the background as well (background objects check which row/column is being rendered and overwrite the buffer with their own tiles if it they overlap), so I had to simulate the gameplay environment so that objects would work like normal in case they needed to draw to the background.

Recently I had a different idea, and instead of fooling the objects I decided to jump straight into the gameplay loop and draw the initial screen inside the regular game loop. A few special objects animate the title card sequence while drawing the initial screen 1 column at a time.

With this I don't need a separate loop before the main loop just to draw the first screen, I just need some special case handling in the main loop, so there isn't any repeated code. I guess this could be enough reason to do it a column per VBlank instead of the whole thing with rendering turned off.

by on (#54885)
Alrighty, partly from reading the responses here, and reading another topic about how to create a proper NES loop, I have now solved my problem with drawing the whole background.

It turned out that I was making a huge mistake - Drawing the background outside of VBlank. After noticing that I started drawing in VBlank - sending 128 byte blocks of tile data in at a time, which is about as much as I could squeeze into a single VBlank. So, I am drawing my whole background in roughly 7 1/2 VBlank cycles, and I am quite happy with the result, with no glitches other than unused sprites on the screen. I think I'm going to do more searches to remember how to get rid of the phantom sprites.

by on (#54887)
Assuming you're using $4014 with a page in RAM, the simple solution: Set all of the sprites' Y coordinates to $FF.