This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

Some questions about C in CC65

Some questions about C in CC65
by on (#164636)
I've got some little questions about writing C in CC65:

Somewhere I read: "Use the array operator [] even for pointers"
Why should this be done? And in what situation does it produce a difference?

Are local static variables slower or less effective than global ones? Or is there no difference at all?
Re: Some questions about C in CC65
by on (#164637)
In the current version of cc65, I don't see an obvious difference between the code generated via foo[bar] or *(foo+bar).

Is there some other meaning that I'm forgetting about?
Re: Some questions about C in CC65
by on (#164638)
There is no underlying difference between static local and static global. The only difference is that only the enclosing function has access to the local ones. Other than enforcing that restriction, to the compiler they're both global RAM reservations.

As far as the array operator goes, I don't think that's really a correct "rule", but using [] to access array elements by way of a pointer is the normal way to do that anyway.

Were you really planning to write:
*(pointer + 5)
Instead of:
pointer[5]
?

The compiler shouldn't be generating different code for the two, but maybe it does. My point is that few people would even try the former. (Which might be why the compiler could have trouble with it? That's if it really does have a problem with it at all.)

The other part of this is that index 0 is perfectly fine just to use the * operator:
*pointer
Is fine, and normal C style.
pointer[0]
Likely also fine, but kind of unusual?

At any rate, don't make assumptions. If you want to know, compile it and check the resulting assembly. If you don't want to know, it probably doesn't matter much. Performance issues are usually going to come up when you start making for loops trying to operate on a bunch of stuff repeatedly. At that point you'll need more help than just "use this syntax for arrays".
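
A minimal sketch for running that check, assuming two hypothetical function names that are not from the thread. Both functions perform the same read, so compiling the file with cc65 (for example with --add-source, so the C lines appear as comments in the assembly) and comparing the two bodies shows whether the spelling matters for a given compiler version.

```c
/* Two spellings of the same read; the C standard defines p[i] as *(p + i). */
unsigned char read_with_index(const unsigned char *p, unsigned char i)
{
    return p[i];
}

unsigned char read_with_deref(const unsigned char *p, unsigned char i)
{
    return *(p + i);
}
```

If the two function bodies come out identical in the listing, the "rule" makes no difference for that case.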
Re: Some questions about C in CC65
by on (#164639)
I think...
With the -r compiler option...
cc65 replaces (non-static) local variables with register variables if it can. But...

Quote:
There is some overhead involved with register variables, since the old contents of the registers must be saved and restored.


Apparently, STATIC (not non-static) local variables are faster.

And (I may be wrong), I think [] arrays can index like this... Address, x. Whereas pointers always compile into an indirect (Address), y which is slower.

Edit: I can't seem to find any examples/documentation to back that up.
Re: Some questions about C in CC65
by on (#164640)
dougeff wrote:
Apparently, non-static local variables are faster.

I don't see how that's possible.
Re: Some questions about C in CC65
by on (#164641)
I meant 'static'...I'm typing too fast.
Re: Some questions about C in CC65
by on (#164647)
I ran a few tests...

Code using static local variables was slightly faster than code using non-static ones. The reason:
- Non-static variables: lda value, sta RAM, then lda RAM later when the variable is used.
- Static variables: the value is stored as a constant, so there is only an lda from the constant's address when the variable is used.


array [index] vs *(array+index)...

All tests I did were essentially equivalent....

Code:
   ldy     _index
   lda     #<(_array)
   ldx     #>(_array)
   sta     ptr1
   stx     ptr1+1
   lda     (ptr1),y
Re: Some questions about C in CC65
by on (#164657)
rainwarrior wrote:
Were you really planning to write:
*(pointer + 5)
Instead of:
pointer[5]
?

No, but when I do some kind of memcpy or memfill and don't want to fill the array from the start, but from an offset, I would pass the value as array + offset instead of &array[offset].
The + operator looks more natural here than the [] operator since the latter one is usually used when you access one item of the array. But in the current case, the expression is still supposed to represent a whole array and not a single value inside the array.

dougeff wrote:
Static Local Variables code was slightly faster than non-static.

That there's a difference between local variables and static local variables, that's clear to me. Both work on a different principle. (That's also why I would never use the compiler switch that transforms all local variables into static ones because that completely alters the meaning of the code.)
My question was purely directed towards global variables vs. local static variables. I wanted to know whether I can use the convenient way of declaring local static variables or if I have to declare them outside the function to gain better performance. But the answers here match what I supposed from reading the assembly output.
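
A minimal sketch of the three kinds of variable discussed here, with hypothetical names. The file-scope variable and the static local both keep their value between calls and differ only in visibility; the ordinary (automatic) local is re-created on every call, which is why a switch that silently makes locals static changes what the code means.

```c
/* File-scope variable: every function in this file can name it. */
static unsigned char global_counter;

unsigned char next_from_global(void)
{
    ++global_counter;
    return global_counter;
}

unsigned char next_from_static_local(void)
{
    /* Same lifetime as global_counter, but only this function can name it. */
    static unsigned char local_counter;

    ++local_counter;
    return local_counter;
}

unsigned char next_from_auto_local(void)
{
    /* Automatic local: initialized again on every call. */
    unsigned char counter = 0;

    ++counter;
    return counter;
}
```

Calling each function twice gives 1 then 2 for the first two, and 1 both times for the last one.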
Re: Some questions about C in CC65
by on (#164660)
DRW wrote:
I would pass the value as array + offset instead of &array[offset].

Do you understand how accessing data in an array is different than simply computing a pointer value?

Also, if the array in question is static, it might be better to just pass an offset index instead of a pointer. If something is static, try not to hide that fact from the compiler if you can.
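
A rough sketch of that suggestion, assuming a hypothetical 16-byte static array and made-up function signatures. The first function receives only an address, so the compiler has to go through a pointer; the second names the array directly and takes an index, which gives the compiler the chance to use the array's fixed address. Whether cc65 actually generates better code for the second form is worth checking in the assembly output.

```c
static unsigned char buffer[16];   /* hypothetical static array */

/* General version: the callee only receives an address. */
void FillArray(unsigned char *destination, unsigned char value, unsigned char count)
{
    while (count != 0) {
        *destination = value;
        ++destination;
        --count;
    }
}

/* Static version: the array is named directly and only an index is passed. */
void FillBufferFrom(unsigned char offset, unsigned char value, unsigned char count)
{
    while (count != 0) {
        buffer[offset] = value;
        ++offset;
        --count;
    }
}
```

In both versions the caller is responsible for keeping offset + count within the array.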
Re: Some questions about C in CC65
by on (#164674)
rainwarrior wrote:
Do you understand how accessing data in an array is different than simply computing a pointer value?

I'm not sure what exactly you are referring to, but as far as I know, both are interchangeable in C.

So, firstly, you can declare int x[5]; or you can declare int *x; and use the same operators on each of them.

And secondly, the following comparisons are always true:
&x[3] == x + 3
x[3] == *(x + 3)

This is the case, no matter if x is a pointer or an array.

The only two differences are:
1. The declaration. You cannot declare int *x = { 1, 2, 3 };, it has to be int x[] = { 1, 2, 3 };.
2. The sizeof operator returns the actual data type size * array length if the variable was declared as an array. And it only returns the size of the pointer itself if it was declared as a pointer.
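
A small sketch of these points, using hypothetical names. It checks the two identities for both an array and a pointer to it, and exposes the one place where they visibly differ, the result of sizeof.

```c
#include <stddef.h>

static int values[] = { 1, 2, 3, 4, 5 };  /* an array: storage for five ints */
static int *alias = values;               /* a pointer: storage for one address */

size_t array_size(void)
{
    return sizeof values;   /* five times sizeof(int) */
}

size_t pointer_size(void)
{
    return sizeof alias;    /* sizeof(int *), whatever the array length is */
}

/* Returns 1 when the identities hold for both the array and the pointer. */
int identities_hold(void)
{
    return &values[3] == values + 3
        && values[3] == *(values + 3)
        && &alias[3] == alias + 3
        && alias[3] == *(alias + 3);
}
```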

So, I know what the difference between [] and * is supposed to be from a logical point of view. But from a technical point of view, they are pretty much identical in C and C++.
You can even declare
int a = 5;
int *b = &a;

and then call b[3] and the compiler will not complain, even though, from a logical standpoint, this pointer isn't involved in anything array-related at all.

So, since value[3] is identical to *(value + 3) and since &value[3] is identical to value + 3 and since you can do all four operations on pointers and arrays, I was asking whether it's true that one of them internally creates a more efficient assembly code.

rainwarrior wrote:
Also, if the array in question is static, it might be better to just pass an offset index instead of a pointer. If something is static, try not to hide that fact from the compiler if you can.

My CopyArray and FillArray are general purpose functions.

Yes, I know, accessing a global array is easier since the assembly code just needs to do something like LDA MyArray, Y (or, if even the offset is known, LDA MyArray + 3) and doesn't need to do indirect addressing like LDA (MyPointer), Y.
But I do have various arrays that need copying and filling, so unless I want to put the copy and fill functions multiple times into the ROM, one for each array, I have to use pointers.

(And since I have to use pointers for copying arrays and since sometimes the copying starts somewhere in the middle of the array, I was asking whether pointer = array + offset; is really worse than pointer = &array[offset].)

I do use a fixed array for shifting, though (i.e. shifting all array items one position to the left), since only one array in the whole game needs that. So, the assembly function accesses this array directly instead of using a pointer.
Re: Some questions about C in CC65
by on (#164681)
DRW wrote:
You can even declare
int a = 5;
int *b = &a;

and then call b[3] and the compiler will not complain

That's undefined behavior. An executable compiled from a program including this can do literally anything and the compiler will still conform to the standard. Though the standard does not require a diagnostic for all unspecified or undefined behaviors, a compiler ought to at least warn for an undefined behavior as easy to spot statically as this.
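
To make the boundary concrete, here is a minimal sketch with a hypothetical function name. For pointer arithmetic, C treats a single object like an array of length one: b[0] is fine, and forming b + 1 (one past the object) is allowed as long as it is never dereferenced. Anything further, such as b[3], is undefined, so the sketch only mentions it in a comment.

```c
int single_object_demo(void)
{
    int a = 5;
    int *b = &a;
    int *end = b + 1;   /* allowed: one past the object, never dereferenced */

    /* b[0] is the same as *b and is well defined. */
    /* b[3] would read outside the object: undefined behavior, so it is not executed. */
    return b[0] + (int)(end - b);   /* 5 + 1 */
}
```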
Re: Some questions about C in CC65
by on (#164702)
I know that it's undefined behavior. I'm not saying that I would ever do this in any source code.

I was just pointing out the fact that, syntactically, arrays and pointers are virtually identical in C and C++. Because I don't understand rainwarrior's question:
rainwarrior wrote:
Do you understand how accessing data in an array is different than simply computing a pointer value?

So, I described the equality of pointer and array to find out what exactly he is referring to.
Re: Some questions about C in CC65
by on (#164707)
I was referring to:

array + offset
computes a pointer

*(array + offset)
accesses data from the array

The former is no more a performance concern than any 16-bit addition.
Re: Some questions about C in CC65
by on (#164710)
Well, those two do something completely different.

It should be clear that my question wasn't directed towards array + offset vs. *(array + offset).
If I replaced one with the other in my code, the performance would be the least of my problems since the program would either not compile at all or be buggy.
Re: Some questions about C in CC65
by on (#164711)
rainwarrior wrote:
*(pointer + 5)
Instead of:
pointer[5]


DRW wrote:
I would pass the value as array + offset instead of &array[offset].


rainwarrior wrote:
Do you understand how accessing data in an array is different than simply computing a pointer value?


I asked this question because you came into this thread asking about the performance differences of two types of array access. When you responded with something that is not array access, I was trying to make sure that you understood that this is a completely different situation than the one you were initially asking about.
Re: Some questions about C in CC65
by on (#164712)
The standard requires that *(p + i) and p[i] have the same effect when defined. But performance is an aspect of quality of implementation, and the standard does not address quality of implementation beyond the minimum needed to conform. A compiler can generate efficient code for one syntax and inefficient code for another, so long as they have the same effect. And historically, there hasn't been enough money to pay developers to improve the quality of implementation of cc65.
Re: Some questions about C in CC65
by on (#164713)
rainwarrior wrote:
I asked this question because you came into this thread asking about the performance differences of two types of array access. When you responded with something that is not array access, I was trying to make sure that you understood that this is a completely different situation than the one you were initially asking about.

Weeell, actually, it is some kind of an array access when you have a look at the overall context:

Code:
memcpy(destinationArray, sourceArray + 3, 5);
/* Copies five items from sourceArray into destinationArray,
   starting with the item at index 3. */


Yes, from a technical point of view, this is mere pointer calculation. But the logical intention is that memcpy takes addresses that are connected to arrays.
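
A minimal sketch of that call written both ways, with hypothetical array contents and a made-up wrapper function. Both second arguments are the address of sourceArray[3], so the two destinations end up identical; which spelling cc65 compiles better, if either, is a separate question for the assembly listing.

```c
#include <string.h>

static const unsigned char sourceArray[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
static unsigned char viaPlus[5];
static unsigned char viaIndex[5];

void copy_both_ways(void)
{
    /* Both calls copy five items, starting with the item at index 3. */
    memcpy(viaPlus, sourceArray + 3, 5);
    memcpy(viaIndex, &sourceArray[3], 5);
}
```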
Re: Some questions about C in CC65
by on (#164714)
DRW wrote:
Weeell, actually, it is some kind of an array access...

I've clarified already exactly what I meant when I said "array access". I don't see the purpose of making this semantic argument. You already know this alternative interpretation of the words has nothing to do with what I was saying.

Yes of course pointer arithmetic is related to arrays. Why do you bring it up? I thought you started this thread to ask a question about performance.
Re: Some questions about C in CC65
by on (#164715)
And on an in-order, shallow-pipeline architecture such as the 6502, performance for a particular function is easy to see by just trying it both ways, reading the resulting assembly language, and reporting any noticeable discrepancy on the project's GitHub issues page.
Re: Some questions about C in CC65
by on (#164746)
rainwarrior wrote:
Yes of course pointer arithmetic is related to arrays. Why do you bring it up?

Because your statement still tried to make it seem like I asked for something completely different than what I actually wanted:
rainwarrior wrote:
When you responded with something that is not array access


As said by tepples: Pointer arithmetic with pointers that reference simple non-array variables is undefined behavior.
So, even if my second post only referenced pointer address calculation, it is clear that it's still related to arrays simply because there just is no other way where pointer arithmetic is a valid, defined, non-hacky action. (Do you know any?)
Pointer arithmetic in a non-undefined behavior can only occur with arrays, so my second post was a logical follow up to the first one.

The statement
rainwarrior wrote:
you responded with something that is not array access
is simply not true. array + offset is array access. Where else do you find this construction?

That's why I still brought this up: Because you still make it seem like my second post looked like it's a completely different topic than what I asked in my first post.


tepples wrote:
And on an in-order, shallow-pipeline architecture such as the 6502, performance for a particular function is easy to see by just trying it both ways, reading the resulting assembly language, and reporting any noticeable discrepancy on the project's GitHub issues page.

O.k., I'll have a look. I asked because I wanted to find out if there's some definite explanation on why one of them is always and definitely worse than the other one. But it really seems to be an internal implementation detail.
Re: Some questions about C in CC65
by on (#164750)
DRW wrote:
why one of them is always and definitely worse than the other one
I really don't see that. (using -Oirs and --add-source)
declarations in global scope:
char * x;
char y[10];
char z;

generated code:
Code:
;
; foo(&x[z]) , foo(&y[z]) , foo(x+z);
;
        lda     _x
        ldx     _x+1
        clc
        adc     _z
        bcc     L0010
        inx
L0010:  jsr     _foo
;
; foo(y+z);
;
        lda     _z
        clc
        adc     #<(_y)
        tay
        lda     #$00
        adc     #>(_y)
        tax
        tya
        jsr     _foo
where, for some reason, the slower form occurs only when adding an offset to an array.

When using fixed offsets, it's solely a function of whether the compiler can know the location of the memory at compile time, or whether it has to dereference the pointer at runtime:
Code:
;
; foo(&x[3]), foo(x+3);
;
        lda     _x
        ldx     _x+1
        clc
        adc     #$03
        bcc     L0007
        inx
L0007:  jsr     _foo
;
; foo(&y[3]), foo(y+3);
;
        lda     #<(_y+3)
        ldx     #>(_y+3)
        jsr     _foo
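
The listings above do not include the C source. A minimal reconstruction might look like the following, assuming foo is an ordinary function taking a char *. Its body and the last_arg variable are hypothetical, added only so the calls can be checked. Compiling it with -Oirs and --add-source should give a comparable listing, though the exact output may differ between cc65 versions.

```c
char *x;        /* pointer: its value is only known at run time */
char y[10];     /* array: its address is a constant known to the linker */
char z;         /* run-time offset */

char *last_arg; /* hypothetical: records what foo() received */

void foo(char *p)
{
    last_arg = p;
}

void variable_offsets(void)
{
    foo(&x[z]);
    foo(&y[z]);
    foo(x + z);
    foo(y + z);
}

void fixed_offsets(void)
{
    foo(&x[3]);
    foo(x + 3);
    foo(&y[3]);
    foo(y + 3);
}
```

x has to point at a valid array before either function is called.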

Re: Some questions about C in CC65
by on (#164754)
lidnariq wrote:
where, for some reason, the slower form occurs only when adding an offset to an array.

If it used the immediate y in the former construction it could be faster, but curiously in this case both are roughly the same number of cycles? (Off by one depending on branch.)
Code:
;
; foo(&x[z]) , foo(&y[z]) , foo(x+z);
;
        lda     _x         ; 4
        ldx     _x+1       ; 8 (+4)
        clc                ; 10 (+2)
        adc     _z         ; 14 (+4)
        bcc     L0010      ; 16/17 (+2/3)
        inx                ; 18 (+2), 13 bytes
L0010:  jsr     _foo
;
; foo(y+z);
;
        lda     _z          ; 4
        clc                 ; 6 (+2)
        adc     #<(_y)      ; 8 (+2)
        tay                 ; 10 (+2)
        lda     #$00        ; 12 (+2)
        adc     #>(_y)      ; 14 (+2)
        tax                 ; 16 (+2)
        tya                 ; 18 (+2), 13 bytes
        jsr     _foo

So really, you've highlighted a missed optimization opportunity here for the compiler, but at the same time you've also demonstrated that all versions tried are pretty much the same speed.

Either way doesn't seem to make a significant performance difference on this compiler, i.e. not worth worrying about (at least in the kind of "best practices" question that DRW was asking).


DRW wrote:
array + offset is array access

I already defined what I meant by "array access" in the context I said it. You're clearly not confused about what I meant at this point, so I don't know what you're trying to tell me here. Of course the words "array access" could mean other things. All words can mean other things, but I didn't mean that when I said it, and you know this, don't you?
Re: Some questions about C in CC65
by on (#164755)
Quote:
memcpy(destinationArray, sourceArray + 3, 5);


Quicker =

Code:
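   ldx #4                    ; copy indices 4..0, five bytes in total
:
   lda sourceArray + 3, x    ; absolute,X (there is no "(addr),x" mode on the 6502)
   sta destinationArray, x
   dex
   bpl :-                    ; loop until X wraps below zero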


Could be done with inline assembly, or as a call to an assembly function.

On a side note, a large project I'm working on...in C...is just loaded with inline assembly. I feel like it's not really in the spirit of "writing an NES game in C"

Another side question...
Do any of you guys use the MACPACK macros? Branch Long, for example.
Re: Some questions about C in CC65
by on (#164761)
dougeff wrote:
Quicker =

This was just a very simplified example to demonstrate my question. I don't actually have an array from which I want to copy five items, starting at index 3.

Also, since my memcpy function (I actually have my own one and don't use the one from the C standard library since I never copy more than 255 items, so the size_t of the official memcpy is unnecessary for me) is used several times in the game, I have to use pointers and cannot just put the absolute address there. Unless I implement the same function many times, one time per source and destination array.
(And yes, I know I can use macros even in assembly to type the function only once, but to declare it multiple times with multiple variables. But it would still occupy duplicate ROM space.)
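
For illustration, a C sketch of what such a general-purpose copy with an 8-bit count could look like. The name CopyArray comes from this thread, but the signature and body are assumptions, and the real routine is written in assembly.

```c
/* Hypothetical general-purpose copy with an 8-bit count (at most 255 items). */
void CopyArray(unsigned char *destination, const unsigned char *source, unsigned char count)
{
    while (count != 0) {
        *destination = *source;
        ++destination;
        ++source;
        --count;
    }
}
```

A call such as CopyArray(destination, source + 2, 3) then starts copying in the middle of the source array, and a count of 0 copies nothing.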


dougeff wrote:
On a side note, a large project I'm working on...in C...is just loaded with inline assembly. I feel like it's not really in the spirit of "writing an NES game in C"

What do you use the inline assembly for?

I was surprised to find out that in my game, I didn't need any assembly for the game logic. The only parts that are written in assembly are the really basic, general things:
NES initialization
NMI
Background update in NMI from a buffer array
Sprite update into the DMA location
General functions like CopyArray, FillArray, ShiftArray
Music (which is an external library anyway)

Everything else is pure C. I never got into the situation where I had to rewrite parts of the general game logic (like MovePlayer or CheckCollision or CalculateNexLevelColumn) in assembly.

Even filling the background array with the correct tile values etc. is done in C (well, with the help of CopyArray and FillArray of course). Only the actual low level PPU update function that gets called during NMI and that reads this background array is written in assembly.

I even had the disadvantage that I have a status bar and parallax scrolling, so I had to split my game logic into three different parts which means of course that I lose some of the rendering time because I don't always manage to fill each of the three sections to the very end with stuff.

And my game has six moving meta sprites at once on the screen:
2 x 2 tiles
2 x 3 tiles
2 x 1 tiles
2 x 5 tiles
2 x 5 tiles
2 x 2 tiles
And it never lags.

So, I really don't understand why people have problems with creating games in C and think it's slow and need to optimize their code by filling it with assembly here and there.

Tell me what you use the inline assembly for. Maybe I have an idea how to improve it.
Re: Some questions about C in CC65
by on (#164763)
DRW wrote:
(And yes, I know I can use macros even in assembly to type the function only once, but to declare it multiple times with multiple variables. But it would still occupy duplicate ROM space.)

dougeff's "inline memcpy" would use less ROM space than a function call to a memcpy. The code he's posted is 10 bytes long. For a function call, the jsr is already 3 bytes, and I'm pretty sure you're not going to be able to fill 2 pointers and 1 size parameter in less than 7 bytes.

I probably still wouldn't do the inline approach just because I tend to leave optimizations like that until later when I've identified a performance problem, but the idea that it would take up more ROM space is incorrect in this case. If the arrays are static the inline version is faster and smaller.
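
If both arrays are static, the same copy can also be written in plain C without pointers. This is only a sketch with hypothetical array contents; whether cc65 turns it into indexed loads and stores as tight as the hand-written loop is something to verify in the generated assembly.

```c
static unsigned char sourceArray[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
static unsigned char destinationArray[5];

void copy_five_from_index_3(void)
{
    static unsigned char i;   /* static local, so the index is not kept on the C stack */

    for (i = 0; i < 5; ++i) {
        destinationArray[i] = sourceArray[i + 3];
    }
}
```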
Re: Some questions about C in CC65
by on (#164766)
Quote:
What do you use the inline assembly for?


It's usually stuff like this... indexing arrays.

(edit, removed bad example)

And anything that's time sensitive... Basically anything that happens during V-blank...writes to the PPU.

(second edit)...
Upon review, of "why do I inline assembly?"...I suppose most of my inlined assembly was unnecessary. The only place it makes sense is my UPDATE_PPU loop. The rest could have been written nearly as fast in C...and been easier to read, and shorter to type.

I may be prematurely optimizing...
http://c2.com/cgi/wiki?PrematureOptimization
Re: Some questions about C in CC65
by on (#164804)
dougeff wrote:
It's usually stuff like this... indexing arrays.

What kinds of arrays do you index?

The only loops that I have are the following:
One for moving all the characters by iterating from 0 to CharactersNumber - 1.
One for rendering all the characters.
(In both cases, I need the counting variable throughout multiple functions, so putting it into the X register would be a hassle anyway.)
One for recalculating the score, which is stored in decimal numbers.
One for calculating how many pixels the character may fall in the next frame (either the full height or just as many pixels as there are between him and the platform below him).

Other than that, there are no counting loops in the action parts of my game. (Level creation isn't time critical, so it doesn't matter that the code isn't the fastest.)

dougeff wrote:
And anything that's time sensitive... Basically anything that happens during V-blank...writes to the PPU.

For this kind of stuff, I wouldn't suggest inline assembly. I would write this in pure assembly.

For example, the background update is nothing more than this:
Code:
_UpdatePpu:
.export _UpdatePpu
   LDX _PpuUpdate
   BEQ @skipUpdatePpu
   LDA _PpuUpdate + 1
   STA PpuCtrl
   LDA PpuStatus
   LDA _PpuUpdate + 2
   STA PpuAddr
   LDA _PpuUpdate + 3
   STA PpuAddr
   LDY #$04
@writeSingleItemLoop:
   LDA _PpuUpdate, Y
   INY
   STA PpuData
   DEX
   BNE @writeSingleItemLoop
   STX _PpuUpdate
@skipUpdatePpu:
   RTS
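
Reading the routine above, the buffer layout appears to be: byte 0 is the number of data bytes (0 means nothing to do), byte 1 is the value for PpuCtrl, bytes 2 and 3 are the PPU address (high byte first), and the data starts at byte 4. A sketch of the C side that fills such a buffer could look like this. The function name, the buffer capacity and the return convention are assumptions, not taken from the actual game.

```c
#define PPU_UPDATE_HEADER   4
#define PPU_UPDATE_MAX_DATA 32   /* assumed capacity; not from the thread */

unsigned char PpuUpdate[PPU_UPDATE_HEADER + PPU_UPDATE_MAX_DATA];

/* Queue one run of bytes. Returns 1 on success, 0 if the buffer is
   still pending or the run does not fit. */
unsigned char QueuePpuUpdate(unsigned char ctrl, unsigned int address,
                             const unsigned char *data, unsigned char count)
{
    unsigned char i;

    if (PpuUpdate[0] != 0 || count == 0 || count > PPU_UPDATE_MAX_DATA) {
        return 0;
    }
    PpuUpdate[1] = ctrl;                              /* written to PpuCtrl */
    PpuUpdate[2] = (unsigned char)(address >> 8);     /* high byte goes to PpuAddr first */
    PpuUpdate[3] = (unsigned char)(address & 0xFF);   /* then the low byte */
    for (i = 0; i < count; ++i) {
        PpuUpdate[PPU_UPDATE_HEADER + i] = data[i];
    }
    PpuUpdate[0] = count;   /* written last, so a nonzero count describes a complete buffer */
    return 1;
}
```

The assembly routine clears byte 0 after writing the data, which is what makes the buffer available for the next call.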


And the NMI itself is just that:
Code:
Nmi:
   PHA
   TXA
   PHA
   TYA
   PHA
   LDA WaitForNmi
   BEQ @nmiEnd
@nmiStart:
   LDA #false
   STA WaitForNmi
   LDA _PpuMaskValue
   STA PpuMask
   BEQ @nmiEnd
   LDA #<_Sprites
   STA $2003
   LDA #>_Sprites
   STA $4014
   JSR _UpdatePpu
   LDA #$00
   STA PpuScroll
   STA PpuScroll
   LDA #PpuCtrlDefault
   STA PpuCtrl
@nmiEnd:
   JSR FamiToneUpdate
   PLA
   TAY
   PLA
   TAX
   PLA
   RTI


So, no need to clutter your C code with inline assembly. Just use a bunch of dedicated assembly files that include whole functions.

rainwarrior wrote:
dougeff's "inline memcpy" would use less ROM space than a function call to a memcpy. The code he's posted is 10 bytes long. For a function call, the jsr is already 3 bytes, and I'm pretty sure you're not going to be able to fill 2 pointers and 1 size parameter in less than 7 bytes.

O.k., I'll have a further look at it. Maybe it's indeed better.