This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

6502 code density.

6502 code density.
by on (#233850)
So this is a bit of a weird rant maybe, but what exactly makes 6502 code density so bad?

So there is a lz4 -> vram implementation that came with some version of neslib that I used to use. It compiled to 380 bytes, which seemed pretty good considering the amount of ROM it saved. When I started some GBdev, my first project was to make a lz4 decompression routine. My first (very rough) attempt was 250 bytes! Granted, there is a bunch of complexity in the NES version to deal with PPU memory, but not *that* much. I then discovered gblz4, which touted a ~70 byte decompression routine, although it used a modified lz4 format. I spent some time crunching my routine down, and eventually got a vanilla lz4 decoder in 67 bytes by relaxing the "no memory use" constraint of gblz4.
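For anyone curious what these routines actually have to do, here is a rough Python sketch of decoding a raw LZ4 block (no frame header). This is my own illustration, not the code from either routine above; the 6502 and GB versions implement essentially this loop, just with their platforms' addressing quirks:

```python
def lz4_decompress(src: bytes) -> bytes:
    """Decode one raw LZ4 block (sequence format only, no frame header)."""
    out = bytearray()
    i = 0
    while i < len(src):
        token = src[i]; i += 1
        # High nibble of the token: literal run length (15 = read extension bytes)
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                b = src[i]; i += 1
                lit_len += b
                if b != 255:
                    break
        out += src[i:i + lit_len]
        i += lit_len
        if i >= len(src):
            break  # last sequence is literals-only; no match follows
        # 2-byte little-endian match offset (distance back into the output)
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        # Low nibble of the token: match length minus the 4-byte minimum
        match_len = (token & 0x0F) + 4
        if (token & 0x0F) == 15:
            while True:
                b = src[i]; i += 1
                match_len += b
                if b != 255:
                    break
        # Copy byte by byte so overlapping matches (offset < length) work,
        # which is also how the tiny asm decoders do it
        for _ in range(match_len):
            out.append(out[-offset])
    return bytes(out)
```

A hand-built block like `0x44 "abcd" 0x04 0x00` (four literals, then an 8-byte match at offset 4) decodes to `abcdabcdabcd`.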

380 bytes seemed like a lot compared to 67 bytes... Of course, I had to try and reduce the size of the NES version now. :p After discussing compression in a different thread, I was inspired to whittle a few more bytes off tonight. It's now down to ... 250 bytes. (sad trombone) Not exactly fair to compare them 1:1, since the NES version has a pair of trampoline functions to allow it to work with either regular RAM or VRAM. (The Game Boy has no need for this; RAM is RAM.) Still... it's almost 4 times larger.

My NES lz4 code is here for reference: ... xler/lz4.s ... _to_vram.s

So what is it that makes 6502 code so low-density? I find GBZ80 much more frustrating to program: no indexing, a maddening lack of instruction orthogonality, etc. On the other hand, it has 16-bit registers, load-and-increment instructions, 16-bit increment instructions, conditional return, etc.
Re: 6502 code density.
by on (#233854)
6502 has more register-memory operations that take 2 bytes (one for the opcode and one for a zero page address), whereas 8080 has more register-register operations that can be encoded in 1 byte. Instead of needing to reference local pointer variables ptr1 and ptr2, you get de and hl. Even if your local variables overflow BCDEHL, it's easier to spill registers to the stack so long as the variables' lifetimes are fairly nested, as you can push or pop a register pair at a time.
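As a rough illustration of that encoding difference, here are the byte costs of one "load through a pointer, store through a pointer, advance both pointers" step on each CPU. The instruction sizes are from the standard 6502 and 8080 encodings; the loop control around the step is omitted, and the 6502 page-crossing code is lumped into one entry:

```python
# Bytes per instruction for one copy step through two pointers.
# 6502: pointers live in zero page, so each access encodes an address byte,
# and the 16-bit pointer advance needs explicit carry handling.
m6502 = {
    "lda (src),y": 2,   # opcode + zero-page address of the pointer
    "sta (dst),y": 2,
    "iny": 1,           # advances only the low byte
    "bne skip / inc src+1 / inc dst+1": 6,  # carry into the high bytes
}
# 8080: the pointers ARE register pairs, so everything fits in one byte.
i8080 = {
    "mov a,m": 1,       # ld a,(hl) in Z80 mnemonics
    "stax d": 1,        # ld (de),a
    "inx h": 1,         # true 16-bit increments
    "inx d": 1,
}
print(sum(m6502.values()), "bytes on 6502 vs", sum(i8080.values()), "on 8080")
```

That's 11 bytes against 4 for the same work, which is roughly the ratio the decompressors above show.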

LZ77-family data decompression is a largely sequential workload, and 8080 with its 16-bit increments excels at sequential things. What makes 6502 lose in your test is that you aren't doing anything that would need random access to struct fields. Dictionary compression (such as metametatiles) might not be a huge 8080 win either.

To eliminate separate decompressors targeting RAM and VRAM, I usually design decompressors on NES to feed whatever generic VRAM transfer buffer mechanism I've set up.
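The transfer-buffer pattern can be sketched like this; the class and method names here are made up for illustration, not from any particular library:

```python
# Hypothetical sketch of a generic VRAM transfer buffer: the decompressor
# writes into ordinary RAM via queue(), and one shared routine drains the
# queue to VRAM during vblank, so the decompressor never touches the PPU.
class TransferBuffer:
    def __init__(self):
        self.entries = []            # queued (vram_address, payload) pairs

    def queue(self, vram_addr, data):
        self.entries.append((vram_addr, bytes(data)))

    def flush(self, vram):
        # On hardware this would run in vblank, writing $2006/$2007;
        # here `vram` is just a dict standing in for PPU memory.
        for addr, data in self.entries:
            for offset, b in enumerate(data):
                vram[addr + offset] = b
        self.entries.clear()
```

The payoff is that one decompressor body serves both targets: decompress into RAM directly, or decompress into the buffer and flush.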
Re: 6502 code density.
by on (#233875)
Because the Z80 has microcode and was designed in an era when RAM was stupidly expensive, making the CPU more complex and more expensive in exchange for tighter code density made a lot of sense. Hence instructions like LDIR versus the 6502's lda XXXX,x / sta XXXX,x / dex / bne loop. The 6502 was designed to sell for $5, and to get there it dropped the complexity and silicon cost of microcode in favor of a model that hits RAM each time. When it was designed, RAM was getting cheaper and cheaper, so you could trade extra RAM cost for a cheaper CPU. See the CISC vs. RISC wars.
Re: 6502 code density.
by on (#234048)
How many opcodes have you used?
Re: 6502 code density.
by on (#234081)
The Game Boy CPU is essentially an 8080 with Z80's enhanced bit shifting operations, and the 8080 is an 8008 with a stack pointer and 16-bit increment, decrement, and add hl instructions.

Intel 8008: Commissioned 1970, released 1972
Intel 8080, a cleaned up 8008: Released 1974
Motorola 6800: Released 1974
MOS Technology 6502: Released 1975 by Motorola veterans
Zilog Z80: Released 1976 by Intel veterans
Sharp SM83: Unknown; incorporated into the LR35902 SoC in 1989