This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

cc65: Unnecessary code when accessing pointers

by on (#220158)
Inspired by Banshaku's recent threads, I did some analyzing and I found out that the cc65 compiler is not very efficient when it comes to pointer access, even though this has nothing to do with the architecture and could easily be avoided, if I'm not mistaken.

When I have this simple code snippet:
Code:
extern unsigned char *pNumber;
#pragma zpsym("pNumber")

void __fastcall__ Test(void)
{
    *pNumber = 5;
}

then this is what the compiler turns it into:

Code:
   lda     _pNumber+1
   sta     ptr1+1
   lda     _pNumber
   sta     ptr1
   lda     #$05
   ldy     #$00
   sta     (ptr1),y
   rts

My own pointer is clearly declared as being located in the zeropage:
Code:
#pragma zpsym("pNumber")
--> .importzp   _pNumber

And yet, the compiler feels the need to always copy the pointer values to its own pointer instead of simply doing this:
Code:
   lda     #$05
   ldy     #$00
   sta     (_pNumber),y
   rts

Why is this the case at all? Is there any technical reason for it or is it simply an oversight by the programmer who created the parser?

Is there any way to get the compiler to change this behavior without adding inline Assembly manually?

I compiled with
cc65 -O Test.c
and the situation is the same in the old cc65 from cc65.org as well as the newer version from github.


By the way, if you do more than one variable access, like this:
Code:
    *pNumber = 5;
    *pNumber = 6;

Guess what:
Code:
   lda     _pNumber+1
   sta     ptr1+1
   lda     _pNumber
   sta     ptr1
   lda     #$05
   ldy     #$00
   sta     (ptr1),y
   lda     _pNumber+1
   sta     ptr1+1
   lda     _pNumber
   sta     ptr1
   lda     #$06
   sta     (ptr1),y
Re: cc65: Unnecessary code when accessing pointers
by on (#220159)
I just avoid using pointers like this.

I only used pointers to access individual enemies, and I passed the pointer as a parameter to a function that I wrote in assembly to draw that enemy's sprite.

I didn't write it until I needed to save cycles.

So, anything that takes too many cycles, I rewrote in assembly, as a fastcall function.

So basically, it translates to...

function(&enemy1);

lda #<(_enemy1)   ; low byte of the address goes in A
ldx #>(_enemy1)   ; high byte of the address goes in X
jsr _function
Re: cc65: Unnecessary code when accessing pointers
by on (#220160)
dougeff wrote:
I just avoid using pointers like this.

Well, sometimes you can't avoid using pointers.

(Of course, my current example of accessing a single number through a pointer would be nonsense in a real situation, but it was just a simple minimalistic example to demonstrate the concept.)

For example, my new game will have a whole bunch of enemies, so you cannot program each enemy behavior individually.
Instead, I created a script-based function. It reads the first item from an array and depending on the contents, it reads the next values in a certain way.

For example:
If the current array value is "Move forward", then read the next value as "direction" and the value after that as "number of tiles".
If, instead, the current value is "Wait", read the next value as the number of frames to wait.

Etc.
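The script idea above can be sketched roughly like this. The opcode values, the `ScriptTotals` struct and the `RunScript` name are made up for illustration and are not taken from the actual game; a real handler would move the enemy instead of just adding up totals.

```c
/* Hypothetical opcodes; the real values in the game may differ. */
enum { CMD_END = 0, CMD_MOVE = 1, CMD_WAIT = 2 };

/* Totals collected while walking a script, just to make the effect visible. */
struct ScriptTotals {
    unsigned char tilesMoved;
    unsigned char framesWaited;
};

/* Walks the script through a pointer until the end marker is reached.
   CMD_MOVE is followed by two bytes (direction, number of tiles),
   CMD_WAIT by one byte (number of frames). */
void RunScript(const unsigned char *script, struct ScriptTotals *totals)
{
    totals->tilesMoved = 0;
    totals->framesWaited = 0;

    while (*script != CMD_END) {
        switch (*script++) {
        case CMD_MOVE:
            ++script;                       /* skip the direction byte */
            totals->tilesMoved += *script++;
            break;
        case CMD_WAIT:
            totals->framesWaited += *script++;
            break;
        default:
            return;                         /* unknown command: stop */
        }
    }
}
```

Because every command has a different length, the read position has to advance by a variable amount, which is why a pointer (or at least a running index) is hard to avoid here.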

Same with the level buildup function: Each screen is stored in an array of arbitrary size because each screen can have an arbitrary number of background objects, NPCs, enemies etc. So, I need a pointer to iterate through it until the pointer reads the screen end byte.


How would you do these things without using pointers?


dougeff wrote:
So, anything that takes too many cycles, I rewrote in assembly, as a fastcall function.

Well, yeah, writing directly in Assembly is always the best solution, but not wanting to do that is pretty much the reason why people use C to begin with.

And in the current situation, we're not even discussing anything that a C compiler cannot optimize because of the architecture.
At the moment, the question is simply: Why does the compiler always copy the pointer to its own pointer? Is there any reason for it? And can it be avoided (either by command line options or by a certain code style that we simply remember to always apply to C programs for the NES)?
Re: cc65: Unnecessary code when accessing pointers
by on (#220161)
Well, I used to write inline assembly just like na_th_an's example*, but I find it "ugly" to see C code with lots of assembly.

You could write a macro that inserts inline assembly to make it "pretty" and more C like.

edit
*example
https://github.com/mojontwins/MK1_NES/b ... enengine.h
Re: cc65: Unnecessary code when accessing pointers
by on (#220163)
DRW wrote:
Why is this the case at all? Is there any technical reason for it or is it simply an oversight by the programmer who created the parser?

I wouldn't expect any compiler to generate optimal code in all scenarios. If I was writing a code generator I, too, would definitely start by handling the general case (in this case, a pointer from anywhere in the memory space), and only then start thinking about case-specific optimizations like this.

(By the way, no compiler would be doing optimizations like this in the parsing phase. Parsing simply checks the input against the grammar of the language.)
Re: cc65: Unnecessary code when accessing pointers
by on (#220168)
The compiler lacks optimizations for this case. Nothing you can do, except write a patch.
Re: cc65: Unnecessary code when accessing pointers
by on (#220172)
I avoid using pointers in cc65 as well, as I know they tend to behave worse than arrays. Sometimes you have to, as pointed out. But it's funny how you'd better use array access when possible when targeting the 6502 via cc65, but you'd better use pointer-based access when possible when targeting the Z80 via z88dk or SDCC. Sometimes porting is a nightmare because of this :-D
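To make the two styles concrete, here is a small sketch of the same loop written both ways. The `pattern` table and the function names are hypothetical. Both versions compute the same result; how much the generated code differs depends on the compiler, its version and its options, so compare the listings yourself rather than taking this as a benchmark.

```c
/* Hypothetical movement pattern table, for illustration only. */
static const unsigned char pattern[4] = { 2, 4, 6, 8 };

/* Array-indexed access with an 8-bit index. */
unsigned char SumIndexed(void)
{
    unsigned char i;
    unsigned char sum = 0;
    for (i = 0; i < 4; ++i) {
        sum += pattern[i];
    }
    return sum;
}

/* The same loop written with a moving pointer. */
unsigned char SumPointer(void)
{
    const unsigned char *p = pattern;
    unsigned char sum = 0;
    while (p != pattern + 4) {
        sum += *p++;
    }
    return sum;
}
```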
Re: cc65: Unnecessary code when accessing pointers
by on (#220189)
@DRW

I checked the code for the array of structures. Saving the reference was not so bad, BUT accessing the data referenced by the pointer (2 arrays) makes the compiler copy the pointer into ptr1 again, even though it had just loaded it in the previous statement.

I guess even though it looked "nicer" code wise at first, I will avoid that pattern after all. I do not really need the array of structures, it just looked better to me.
Re: cc65: Unnecessary code when accessing pointers
by on (#220208)
Yeah, looks like every pointer access of any kind does that.

Unfortunately, I still need pointers if a character has a certain movement pattern that is stored in an array.

I wrote some macros for this kind of stuff now, like this:
Code:
#define AsmSetVariableFromZpArrayPointer(variable, zpArrayPointer, index)\
{\
   __asm__("LDY %v", index);\
   __asm__("LDA (%v), Y", zpArrayPointer);\
   __asm__("STA %v", variable);\
}
Re: cc65: Unnecessary code when accessing pointers
by on (#220260)
Just to chime in because I had this same issue with my project: yes, cc65 generates terrible code for pointers. Anything using pointers in a loop will probably need to be written in assembly.

In Robo Ninja Climb, I had a simple loop with some pointers that literally used 80% of a frame with cc65's version. Rewriting in assembly with a tiny bit of optimization dropped it to less than 5% of my frame.
Re: cc65: Unnecessary code when accessing pointers
by on (#220292)
Here's another strange cc65 behavior:

This:
Code:
dest = (src + 3) >> 2;

gets turned into this:
Code:
   ldx     #$00
   lda     _src
   jsr     incax3
   jsr     shrax2
   sta     _dest

Why doesn't the compiler simply use LSR?
It creates perfectly fine code when you turn the shift operator around:
Code:
   lda     _src
   clc
   adc     #$03
   asl     a
   asl     a
   sta     _dest

And if you use the right shift operator, but remove the + 3, then it's fine as well:
Code:
   lda     _src
   lsr     a
   lsr     a
   sta     _dest
Re: cc65: Unnecessary code when accessing pointers
by on (#220294)
That's actually correct. The temporary result src + 3 is implicitly a 16-bit int. The high bits of the result can matter when you shift them down, but they won't matter when you shift them up. Think of (255+3)>>2.

How does it deal with:
Code:
dest = (unsigned char)(src + 3) >> 2;
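The difference between the two expressions can be checked in plain C on any compiler. This is a minimal sketch with hypothetical function names; the outer casts only make the narrowing on return explicit.

```c
/* Without a cast, src + 3 is computed as an int, so the carry into
   bit 8 survives and is shifted back down into the result. */
unsigned char ShiftPromoted(unsigned char src)
{
    return (unsigned char)((src + 3) >> 2);
}

/* With the cast, the sum is truncated to 8 bits before the shift. */
unsigned char ShiftTruncated(unsigned char src)
{
    return (unsigned char)((unsigned char)(src + 3) >> 2);
}
```

For src = 255 the first version yields 64 (258 >> 2) and the second yields 0 (2 >> 2), so the two are only interchangeable when src + 3 cannot exceed 255.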
Re: cc65: Unnecessary code when accessing pointers
by on (#220296)
Is this also incorrect?
Code:
clc
lda src
adc #3  ; C:A ranges from 3 to 258
ror a
lsr a
sta dest
Re: cc65: Unnecessary code when accessing pointers
by on (#220297)
tepples wrote:
Is this also incorrect?

No, that's fine, but that's a whole new class of optimization that you've ordered here. (Something about keeping track of not just 8 and 16 bit results, but 9 bit as well...)
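The CLC / ADC / ROR / LSR sequence can be modelled in C to confirm that it matches the promoted expression for every input. This is only a model of the 6502 flags written for illustration; the function names are hypothetical.

```c
/* Models CLC / ADC #3 / ROR A / LSR A on an 8-bit accumulator. */
unsigned char ShiftViaCarry(unsigned char src)
{
    unsigned int sum = (unsigned int)src + 3u;          /* CLC, ADC #3 */
    unsigned char carry = (unsigned char)(sum >> 8);    /* carry flag  */
    unsigned char a = (unsigned char)(sum & 0xFFu);     /* accumulator */

    a = (unsigned char)((carry << 7) | (a >> 1));       /* ROR A */
    a = (unsigned char)(a >> 1);                        /* LSR A */
    return a;
}

/* Reference: what the C expression (src + 3) >> 2 evaluates to. */
unsigned char ShiftReference(unsigned char src)
{
    return (unsigned char)((src + 3) >> 2);
}
```

The carry flag acts as the ninth bit of the sum, and ROR pulls it back into bit 7, which is why the result agrees with the 16-bit calculation.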
Re: cc65: Unnecessary code when accessing pointers
by on (#220298)
Is there any way I can force the compiler to treat this as a byte?
Re: cc65: Unnecessary code when accessing pointers
by on (#220299)
DRW wrote:
Is there any way I can force the compiler to treat this as a byte?

Only the result of (src+3) matters, you don't have to cast everything else. Cast has higher precedence than >> or most operators so you don't need extra parentheses either (unless you prefer them to make the order clear.)

This works for me:
Code:
;
; i = (unsigned char)(j+3) >> 2;
;
   lda     _j
   clc
   adc     #$03
   lsr     a
   lsr     a
   sta     _i
;

(i and j as zpsym)
Re: cc65: Unnecessary code when accessing pointers
by on (#220300)
Thanks. Yeah, this seems to work.
Re: cc65: Unnecessary code when accessing pointers
by on (#220308)
Since we are talking about unnecessary code for pointers, I can share one of the results of my tests. One of my functions processes a lot of data and has to be written in asm. Its parameters describe an entity that is put into a buffered list, so that it can be written to OAM later, once parsing is over. Instead of passing the parameters to the function, or setting them one by one through some ZP variables, I share a block of memory from the asm side and map it on the C side with a struct. This way all parameters, pointer included, are accessed with an indexer and cc65 doesn't use its ptr1 thingy.

The C code would look like this when imported:

Code:
/* someheader.h */

typedef struct {
    char x;
    char y;
    char foo;
    const char* data;
} myParameters_t;

extern myParameters_t addToBufferParams;
#pragma zpsym("addToBufferParams")

void __fastcall__ addToBuffer(void);


when used:
Code:
   /* mycode.c */

   // ---- begin addToBuffer -----
   addToBufferParams.x = actor.x;
   addToBufferParams.y = actor.y;
   addToBufferParams.foo = actor.foo;
   addToBufferParams.data = actor.currentFrame;
   addToBuffer();
   // ----- end addToBuffer ---------


Finally, the assembler would look something like this:
Code:

.export _addToBuffer := subAddToBuffer

.exportzp _addToBufferParams := zpParams


.segment "ZEROPAGE"
zpParams:   .res 10   ; this is a shared buffer for parameters

.segment "BSS"
bufferedList: .res 60 ; some list of entities

.segment "CODE"

;->BEGIN----------------------------------------------------------------------
; add to entity buffer
;
; Note: uses shared parameters (etc etc)
;
.proc subAddToBuffer
;---------------- Parameters definitions ----------------
.scope local
     posX = zpParams
     posY = zpParams+1
     foo  = zpParams+2
     data = zpParams+3
.endscope
;---------------------------------------------------------

     ; ... some code before
     lda local::posY
     ; do some processing
     sta bufferedList,x
     ; ... more code here

     rts
.endproc
;-<END------------------------------------------------------------------------


The code generated on the C side accesses the shared buffer with indexers (_addToBufferParams,x) for all parameters, so it should be fast enough. However, different structs may generate different access code, so more testing would be required.

One thing that may not be relevant to this conversation, but that I found interesting, is that cc65 doesn't use signed char by default: plain char is unsigned unless you pass a parameter to the compiler. This means that, unless you want to write portable code (which may not be that feasible on the NES), you don't have to write unsigned char, since char is already unsigned. My variable definitions were becoming long with all those const, so I may decide to remove the unnecessary unsigned for now. I just write it out of habit.

edit:
I confused it with some other tests, it was not with an indexer but like this:

Code:
; addTobufferedListParams.data = hero.frame.current;
;
        .dbg    line, "src/example1.c", 201
        lda     _hero+7+1
        sta     _addTobufferedListParams+3+1
        lda     _hero+7
        sta     _addTobufferedListParams+3


edit2:
Updated the asm code since it did something else. This code was written by hand, as an example only.
Re: cc65: Unnecessary code when accessing pointers
by on (#220311)
Yeah, I immediately wondered about your statement that simple struct members are accessed with an indexer. Since the location of the member is known at compile time, it's of course simply a +.


I didn't know that char was always unsigned in cc65. I would say this goes against the C standard.

In my case, it doesn't really make a difference. One of the first things that I included into my C code was:
Code:
typedef unsigned char byte;
typedef signed char sbyte;
typedef byte bool;
#define false 0
#define true 1


By the way, you should be really careful about sharing struct data with Assembly code. Because you always need to make sure that the data actually matches between the two.

I've had a similar issue where I wanted to use my struct for the current character data in Assembly, but it's not possible to actually export the member names as constants, so that they can be included in Assembly.

That's why I'm using inline assembly whenever I need the performance of Assembly with the data of a struct:
Code:
/* C version: */
En = Player.Energy;

/* Inline Assembly in C: */
__asm__("LDA %v + %b", Player, offsetof(struct GameCharacter, Energy));
__asm__("STA %v", En);
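One way to reduce the risk of the C struct and the Assembly offsets drifting apart is a compile-time layout check. The following is only a sketch: `SharedParams` is a hypothetical mirror of the parameter block from the earlier post, `LAYOUT_CHECK` is a made-up helper, and the offset of 3 for the pointer assumes that cc65 inserts no padding (which matches the `+3` in the listing above, but is not guaranteed on other compilers).

```c
#include <stddef.h>

/* Hypothetical mirror of the shared parameter block from the earlier post. */
typedef struct {
    unsigned char x;
    unsigned char y;
    unsigned char foo;
    const unsigned char *data;
} SharedParams;

/* C89-style compile-time check: the array size becomes -1, and the
   build fails, if the condition is false. */
#define LAYOUT_CHECK(name, cond) typedef char name[(cond) ? 1 : -1]

LAYOUT_CHECK(CheckX,   offsetof(SharedParams, x)   == 0);
LAYOUT_CHECK(CheckY,   offsetof(SharedParams, y)   == 1);
LAYOUT_CHECK(CheckFoo, offsetof(SharedParams, foo) == 2);

#ifdef __CC65__
/* Assumption: cc65 inserts no padding, so the pointer sits at offset 3,
   matching data = zpParams+3 on the assembly side. */
LAYOUT_CHECK(CheckData, offsetof(SharedParams, data) == 3);
#endif
```

If somebody reorders the struct members, the build breaks instead of the Assembly routine silently reading the wrong bytes. The Assembly side still has to be kept in sync by hand.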
Re: cc65: Unnecessary code when accessing pointers
by on (#220322)
DRW wrote:
I didn't know that char was always unsigned in cc65. I would say this goes against the C standard.

The signedness of the char type is implementation-dependent; that means it can be either signed or unsigned, and the implementer of the compiler is free to choose one. Oddly enough, whatever the signedness, it is not equivalent to the corresponding signed/unsigned char. That is, char, unsigned char and signed char are always distinct types.

If you want portable integers of known sizes, you may want to use the types from the stdint.h header, which provides handy typedefs such as int8_t, uint8_t, uint16_t etc.
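As a small sketch of both points: the signedness of plain char can be queried through `CHAR_MIN` from limits.h, and the stdint.h typedefs make the intended signedness explicit either way. The function names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if plain char is unsigned on this compiler, 0 if signed.
   CHAR_MIN is 0 exactly when plain char cannot hold negative values. */
int CharIsUnsigned(void)
{
    return CHAR_MIN == 0;
}

/* Fixed-width types say what is meant regardless of that choice. */
uint8_t WrapAdd(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);    /* wraps modulo 256 */
}

int8_t Negate(int8_t v)
{
    return (int8_t)(-v);
}
```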
Re: cc65: Unnecessary code when accessing pointers
by on (#220323)
https://en.cppreference.com/w/c/language/arithmetic_types wrote:
char - type for character representation. Equivalent to either signed char or unsigned char (which one is implementation-defined and may be controlled by a compiler commandline switch), but char is a distinct type, different from both signed char and unsigned char

So which one is it? Equivalent to either one of the two or always distinct from both?
Re: cc65: Unnecessary code when accessing pointers
by on (#220326)
Regarding cc65, I think it is mentioned here:

http://cc65.github.io/doc/cc65.html#toc7.15
Quote:
7.15 #pragma signed-chars ([push,] on|off)

Changes the signedness of the default character type. If the argument is "on", default characters are signed, otherwise characters are unsigned. The compiler default is to make characters unsigned since this creates a lot better code. This default may be overridden by the --signed-chars command line option.

The #pragma understands the push and pop parameters as explained above.

As for standards, it seems to be C89 with a few things from C99:
Quote:
--standard std

This option allows to set the language standard supported. The argument is one of

c89

This disables anything that is illegal in C89/C90. Among those things are // comments and the non-standard keywords without underscores. Please note that cc65 is not a fully C89 compliant compiler despite this option. A few more things (like floats) are missing.
c99

This enables a few features from the C99 standard. With this option, // comments are allowed. It will also cause warnings and even errors in a few situations that are allowed with --standard c89. For example, a call to a function without a prototype is an error in this mode.
cc65

This is the default mode. It is like c99 mode, but additional features are enabled. Among these are "void data", non-standard keywords without the underlines, unnamed function parameters and the requirement for main() to return an int.

Please note that the compiler does not support the C99 standard and never will. c99 mode is actually c89 mode with a few selected C99 extensions.


edit:

I forgot about the main point being discussed: using mapped values with a struct. Yes, there are some risks, but it simplifies access a lot, so I guess it should be usable if the person is organized. Of course, when there is a bug with overlapping values, then the fun begins .. ^^;;; I guess it's a compromise between speed, usability and risk of weird bugs.
Re: cc65: Unnecessary code when accessing pointers
by on (#220912)
I'm a bit late to the party, but register variables can be a good solution to the original problem.

For instance, this function avoids copying the pointer into a scratch location. You can mark function arguments as register too, and it generates slightly different code.
Code:
void bar(void){
   register u8 *foo;
   *foo = 5;
}


Accessing the pointer is cheaper now, but the downside is there is kind of a lot of code (like 20 bytes) associated with saving and restoring register variables. So use them carefully.

Code:
;
; register u8 *foo;
;
   lda     regbank+14
   ldx     regbank+15
   jsr     pushax
;
; *foo = 5;
;
   lda     #$05
   ldy     #$00
   sta     (regbank+14),y
;
; }
;
   ldy     #$00
   lda     (sp),y
   sta     regbank+14
   iny
   lda     (sp),y
   sta     regbank+15
   jmp     incsp2
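Since the save/restore cost is paid once per function call, a register pointer is more likely to pay off when it is dereferenced many times, for example in a loop. A minimal sketch with a hypothetical `Fill` function, using a register function argument as mentioned above:

```c
/* Fills count bytes starting at dst. The register keyword is only a
   hint; on compilers other than cc65 it is typically ignored. */
void Fill(register unsigned char *dst, unsigned char value, unsigned char count)
{
    while (count != 0) {
        *dst++ = value;
        --count;
    }
}
```

As far as I know, cc65 only honors the register keyword when register variables are enabled (the --register-vars option), so check the generated listing to confirm that the pointer really ended up in regbank.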