This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019 Download

Bitwise tricks for CHRAM Pattern Table?

Bitwise tricks for CHRAM Pattern Table?
by on (#159933)
Hello,

I have some programmatically created 2-bit pixel data that is laid out "in order", four pixels to a byte. For the four colors supported (0-3, or %00 - %11) i have them organized one after another. For example, if I had 8 pixels of color 0, 1, 2, 3 repeating in order I would have a 2 bytes as follows: % 00 01 10 11, % 00 01 10 11

Now I want to push this into CHR RAM in the PPU pattern table format. The Pattern table, however, is not laid out in order. It is laid out in "planes" of 8 bytes, such that the high bit of each pixel is supposed to be set in a byte 8 bytes later as described here: http://wiki.nesdev.com/w/index.php/PPU_pattern_tables

Perhaps this is a clearer example. Given two input bytes as follows:

% 01010101, % 01010101

I need the output to be

%11111111, %00000000 (the order isn't that important as I can easily switch them, but if I understand the pattern table correctly, the low pixel bits go in the first byte and the high pixel bits go 8 bytes later, so in this case the 0's go 8 bytes later.)

I have been working through various schemes to cut up the bits and put them in the right place. The idea is to take the odd bits of the two input bytes and put them into one output byte and put the even bits into the other output byte. I have (barely) managed to get this to work by looping through 16 times for the two input bytes, bit shifting and setting bits on alternating output bytes. I then write these output bytes to a buffer with 8 bytes of spacing as required.

But my question is if there is any more efficient way to transform the input bytes to the output bytes. Is there some direct way, without looping (whether unrolled or not) through every bit? What I want is some way to take all the even bits of two input bytes and make one output byte, then take all the odd bits of the two input bytes and make a second output byte.

I've been going through all sorts of ideas but can't seem to find a shortcut. Any ideas?

Thanks,

Splitpane
Re: Bitwise tricks for CHRAM Pattern Table?
by on (#159934)
I can't think of math shortcuts, but you can always speed these kind of thing up using look-up tables. But even with look-up tables, you won't get anywhere close to updating a significant number of tiles in a single VBlank, so if this is for real-time animations, I'd strongly advise you to try and generate the data in the final format in the first place. Even if that's slower and less convenient than generating in the format you're using now, the end result might still be faster than converting while uploading to VRAM, and it will certainly allow you to update more tiles each frame.

Here's the fastest I can think of, using 4 256-byte look-up tables:

Code:
   .repeat 8
   lda (PointerA), y
   tax
   lda Plane0Left, x
   ora Plane0Right, x
   sta $2007
   iny
   .endrepeat

   .repeat 8
   lda (PointerB), y
   tax
   lda Plane1Left, x
   ora Plane1Right, x
   sta $2007
   iny
   .endrepeat

Even though this is the fastest way I can think of, it still takes 336 cycles to update a single tile, as opposed to the 176 cycles that a direct copy would need. You might be able to improve these times a bit if you get rid of the indirect addressing, by shuffling the data around instead of storing it linearly.
Re: Bitwise tricks for CHRAM Pattern Table?
by on (#159938)
splitpane wrote:
programmatically created

If you've got the program that creates it, why not just revise the program to output in the format needed instead of generating something you have to immediately convert?
Re: Bitwise tricks for CHRAM Pattern Table?
by on (#159939)
Thanks for the suggestion, guys! I will explore both the lookup table as well as of course just formatting the data differently in the first place. The reason I have not done that already is that I am doing calculations on the pixel data to create random maps -- I mean "maps" in the cartography sense. These calculations, for example smoothing or averaging neighboring pixels, scaling, de-resing, are considerable simpler if the two bits in each pixel are next to each other. Also, to be honest, I solved the map making algos without yet fully understanding what the Pattern table format was. Yeah, that wasn't too fun... ;)

As for the performance, I do not need to update during VBLANK. Each of these maps is a thumbnail of 8 x 8 tiles that will not change frame to frame, but just once per "level". Changes during VBLANK are only going to be to nametable and attributes, not to the tile data itself. So maybe I can get away with the performance of my current code, it's just that it seems overly complex fragile and I don't like to have any code that I can barely understand myself.
Re: Bitwise tricks for CHRAM Pattern Table?
by on (#159940)
If it's happening at a transition where an extra frame or two of delay wouldn't be missed (e.g. entering a pause screen), I might leave it as-is if it already works? There's nothing particularly wrong about doing it with 16 bit-concatenation operations, it's just slower than it could be.
Re: Bitwise tricks for CHRAM Pattern Table?
by on (#159941)
splitpane wrote:
As for the performance, I do not need to update during VBLANK. Each of these maps is a thumbnail of 8 x 8 tiles that will not change frame to frame, but just once per "level".

Then I wouldn't worry about performance at all. Look-up tables are definitely overkill, a waste of 1KB of ROM for an improvement that players will hardly notice. Generating the data in the final format would be ideal, but if that means a complete rewrite of the map generation code, converting the data afterwards is probably the best option.

Quote:
So maybe I can get away with the performance of my current code, it's just that it seems overly complex fragile and I don't like to have any code that I can barely understand myself.

You can do something like this, which I don't think is hard to follow, and should be moderately fast, since it's unrolled:
Code:
   .repeat 8, Index
      lda (Pointer), y
      .repeat 4
         asl
         rol Plane1+Index
         asl
         rol Plane0
      .endrepeat
      iny
      lda (Pointer), y
      .repeat 4
         asl
         rol Plane1+Index
         asl
         rol Plane0
      .endrepeat
      lda Plane0
      sta $2007
   .endrepeat
   .repeat 8, Index
   lda Plane1+Index
   sta $2007
   .endrepeat
   iny

This uses ca65's .repeat command, which can automatically creates a symbol containing the repeat count (starting from 0). If the assembler you're using doesn't have that you just have to initialize the symbol (Index) yourself, and increment it each step.

Anyway, after each tile it would be a good time to check for Y overflows so you can increment the high byte of the pointer when necessary. X is free for you to count tiles and loop over this code.
Re: Bitwise tricks for CHRAM Pattern Table?
by on (#159942)
Great feedback and perspective. Thanks a lot to both of you. I will test out to see how my code performs while changing to new levels and will not over-optimize, but will look into the code snippet with the .repeat command, which I was not familiar with to hopefully simplify my code. I am still using nesasm, but will figure it out. Actually I was planning on migrating assemblers at some point anyway...
Re: Bitwise tricks for CHRAM Pattern Table?
by on (#159943)
splitpane wrote:
I am still using nesasm

Looking at the help file, it doesn't look like NESASM has a repeat command! Weird... I thought every assembler had this.
Re: Bitwise tricks for CHRAM Pattern Table?
by on (#159945)
Edit: Ohh,.. I don't think NESASM supports nested macros.
Re: Bitwise tricks for CHRAM Pattern Table?
by on (#159946)
Yeah, well that's a whole other topic. I've been planning to migrate assemblers, but haven't. The "repeat" command is perhaps the motivation I needed. Thanks for the advice.
Re: Bitwise tricks for CHRAM Pattern Table?
by on (#159950)
The straw that broke the camel's back? A lacking rep commend is the least of the worries when it comes to NESASM.