This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019 Download

cc65: Array access with int index even if it should be abyte

cc65: Array access with int index even if it should be abyte
by on (#239906)
I'm tying to access an array with a calculated index.

Variables:
Code:
unsigned char x, y, index, value;
unsigned char array[100];

This code of course transforms the index into an integer and therefore accesses the array in a complicated way, by copying it to ptr1, using ADC to calculate the index and all that stuff:
Code:
void Test1(void)
{
    array[y * 16 + x] = value;
}

This one makes the array access simple since the index is just a byte:
Code:
void Test2(void)
{
    index = y * 16 + x;
    array[index] = value;
}
-->
Code:
   ldy     _index
   lda     _value
   sta     _array,y

However, these two make it complicated again, even though the compiler should know that the index is a single byte:
Code:
void Test3(void)
{
    array[(unsigned char)(y * 16 + x)] = value;
}

void Test4(void)
{
    array[index = y * 16 + x] = value;
}

Is there a way to get the compiler to access the array in the simple way with a byte index, without having to put the index into a variable first?
Re: cc65: Array access with int index even if it should be a
by on (#239907)
Hmm, I did something like that in NESert golfing. The sprite_add function casted index expressions like that to access the oam buffer... I never checked its generated code, but am disappointed to notice now that it promoted them all to 16-bit, even the +0!

It seems that if the expression is anything more complicated than a single term it fails to reduce it from the 16-bit pointer access.

I dunno, the ability of this compiler to optimize stuff like this is pretty poor. I suspect the cases where it does recognize it can use an absolute index are due to an optimizer pass that looks for very specific patterns, and the addition of extra terms breaks up that pattern. Maybe more patterns could be recognized if you wanted to research that and add code to the compiler to address them, but in general it's not a very versatile system.


Trying to do this with C syntax mangling might be squeezing blood from a stone... usually I would just ignore the problem until it comes up as a real performance issue (i.e. why I didn't notice sprite_add wasn't more efficient: it was still efficient enough for that game), and then when it does I rewrite the affected thing in assembly.

Storing the computed index into a char variable that has #pragma zpsym is maybe about as reasonable a workaround as you'll get if your solution is restricted to C-side only.
Re: cc65: Array access with int index even if it should be a
by on (#239908)
Looking at that example I linked, I found:

Code:
// original: generates 16-bit indirect pointer for each store
void sprite_add(uint8 tile, uint8 x, uint8 y, uint8 attrib)
{
   oam[(uint8)(oam_pos+2)] = attrib;
   oam[(uint8)(oam_pos+0)] = (uint8)(y-1);
   oam[(uint8)(oam_pos+3)] = x;
   oam[(uint8)(oam_pos+1)] = tile;
   oam_pos += 4;
}

// better: gets an absolute indexed store for 3 of them, at least
void sprite_add(uint8 tile, uint8 x, uint8 y, uint8 attrib)
{
   i = oam_pos; // i is uint8 with #pragma zpsym
   oam[i] = (uint8)(y-1); ++i; // this one still fails to optimize because the y-1 expression confuses the pattern
   oam[i] = tile; ++i;
   oam[i] = attrib; ++i;
   oam[i] = x;
   oam_pos = i+1;
}


Ultimately though, I think the second solution is kinda worthless. It would have taken much less time to write this function in assembly than trying to deduce how to mangle my C to fit the compiler's optimization patterns, and it would perform better too.

As a shortcut I could even just copy and paste the generated assembly for the bad function, so I wouldn't have to look up how to fetch the parameters off the stack, etc. and just fix the actual stores.
Re: cc65: Array access with int index even if it should be a
by on (#239920)
For the OPs question, I've never found a particularly better way to do it either. I have a global zero page variable 'idx' that I use exclusively as a temporary value for indexes. You can use it with the comma operator to make it a little less obtuse, if a little more obscure. I like doing it this way because I can't forget that it's not really a local variable if I call another function that uses it.

Code:
some_array[idx = expression, idx] = whatever;


You still get a wasted zero page load/store, but I think that's the best you are going to get.

@rainwarrior You can optimize your function in C the same way you do it in assembly with an absolute address + an offset. I use the following pattern all the time. It produces exactly the assembly you would expect.

Code:
// better: gets an absolute indexed store for 3 of them, at least
void sprite_add(uint8 tile, uint8 x, uint8 y, uint8 attrib)
{
   i = oam_pos; // i is uint8 with #pragma zpsym
   (oam + 0)[i] = y-1;
   (oam + 1)[i] = tile;
   (oam + 2)[i] = attrib;
   (oam + 3)[i] = x;
   oam_pos = i+4;
}
Re: cc65: Array access with int index even if it should be a
by on (#239924)
Ah! Hadn't thought of that one. That's almost how it would look in assemble, heh.
Re: cc65: Array access with int index even if it should be a
by on (#239926)
slembcke wrote:
Code:
some_array[idx = expression, idx] = whatever;

That's actually a pretty creative use. Thanks for the suggestion.
Re: cc65: Array access with int index even if it should be a
by on (#239927)
If optimization is important (and it always is) I would write all this in assembly.

Perhaps an inline assembly macro.

I've seen na_th_an do something like this.
Re: cc65: Array access with int index even if it should be a
by on (#239939)
Related tip:
you can use the pseudo variables __A__ and __AX__ to make shuffling data in and out of inline assembly much easier. Quite handy when you pair it with (abuse?) the comma operator.

Dumb example:
Code:
#define DUMB_MACRO(value) (__A__ = value, asm("xor #$0F"), __A__)
my_variable = DUMB_MACRO(45);


Using those pseudo variables make the compiler select the correct way to load the value you wanted into the register instead of needing to do it with those % format chars. Then at the end of the sequence, you tell the compiler the result is already in registers and handle the store too.
Re: cc65: Array access with int index even if it should be a
by on (#239940)
Is xor a valid alias to eor?
Re: cc65: Array access with int index even if it should be a
by on (#239942)
no.

But ca65 has a .XOR operator (for calculations at assemble time of constants)
Re: cc65: Array access with int index even if it should be a
by on (#239952)
I didn't know about __A__. This might come in handy.


By the way, I noticed another thing:
Not only does a calculated array index produce bad code. Even setting a calculated value to the array isn't really effective, even if the array is a byte type.


Stuff like this is fine (we assume all variables are global or static):
array[index] = value;

But this isn't:
array[index] = value + 1;

And this time, the comma operator won't help you:
array[index] = (temp = value + 1, temp);

You have to do this for nice short code:
temp = value + 1;
array[index] = temp;


However, the code is perfectly fine when you use a constant array index since it acts like a regular non-array variable. And for some reason, regular non-array variables don't make such a fuss:
array[5] = value + 1;

Using __A__ doesn't help either:
__A__ = value + 1;
array[index] = __A__;


And type conversion are unhelpfull as well:
array[index] = (unsigned char)(value + 1);
Re: cc65: Array access with int index even if it should be a
by on (#239960)
Ugh. I think the important lesson here seems to be: cc65 isn't very efficient. If performance matters in the section of code you're working on, assembly is the answer.
Re: cc65: Array access with int index even if it should be a
by on (#239962)
calima wrote:
Is xor a valid alias to eor?


Uh... probably not. That's just the sort of shoddy off the top of my head examples I'm known for. LOL
Re: cc65: Array access with int index even if it should be a
by on (#239963)
DRW wrote:
But this isn't:
array[index] = value + 1;

So, something I was saying above, cc65's optimizations don't occur at a high level. It only has one way to generate the base assembly code, and then only after it's already in assembly form it matches patterns in the assembly code to reduce them.

Basically what you're looking at is that the code generated by assigning a value to an array element generates a known pattern. As soon as you put some extra operations on the value to be assigned, more code ends up inside that doesn't match the pattern. The problem is the order in which things are done, I guess. It creates the pointer to the array, constructs the expression, then assigns it. When the code in the middle of this gets more complex, it can't match the pattern anymore.

You can use --debug-opt-output to observe the process if you're curious.

So... if you expect cc65 to ever be able to handle this optimization step, it probably either requires a whole new high-level optimization system (big overhaul) or a bunch of pattern matches for every kind of intermediate expression you can think of (very cumbersome). I feel like the latter isn't worth doing, and the former requires someone with a lot of interest.
Re: cc65: Array access with int index even if it should be a
by on (#239964)
gauauu wrote:
Ugh. I think the important lesson here seems to be: cc65 isn't very efficient. If performance matters in the section of code you're working on, assembly is the answer.

Well, performance matters everywhere in the actual action gameplay. But I still wouldn't want to write all this in pure Assembly. That's why I use inline Assembly when necessary and C for the rest. (And a bunch of low level pure Assembly code files, mostly for general purpose functions or NES-hardware-related things.)

However, arrays and pointers are really the only stuff that I noticed that are turned into unnecessarily complicated code.
Pointers are always copied to ptr1, even if your pointer is in the zeropage itself.
And arrays, as we see now, produce bad code whenever either the index or the assigned value isn't an exact byte value, but something that gets promoted to int due to calculations and stuff.

But other than arrays and pointers, I have yet to see a situation where the generated Assembly code from C is so bad that I'd have to use Assembly to cut down the required ROM size and CPU time by a huge amount.

rainwarrior wrote:
Basically what you're looking at is that the code generated by assigning a value to an array element generates a known pattern. As soon as you put some extra operations on the value to be assigned, more code ends up inside that doesn't match the pattern.

I'm just a bit confused why even the assigned value makes a difference in calculating the array itself.

The array index, o.k., this one makes sense: Having a byte index or an int index makes a huge difference in the processing. And calculations always get promoted to int, so yeah, I understand why array[byteIndex + 1] has an issue over array[byteIndex].

But I just don't understand why the assigned value makes a difference on how the array's offset is calculated.

Well, looks like I have to create a macro:
Code:
/* Only use this if the calculated value is actually calculated at runtime.
   Example: array[index] = variable + 1;
   If you assign an unaltered variable or a constant value known at compile time:
   array[index] = variable;
   array[index] = 1;
   you can access the array directly.
   This macro is only there because the generated code gets large otherwise. */
#define SetArrayValueFromCalculatedValue(array, index, calculatedValue, tempVariable)\
{\
    (tempVariable) = (calculatedValue);\
    (array)[index] = (tempVariable);\
}
Re: cc65: Array access with int index even if it should be a
by on (#239966)
DRW wrote:
I'm just a bit confused why even the assigned value makes a difference in calculating the array itself.

Well, honestly it's not like the operation of cc65's optimizer is at all obvious. It generates the same initial assembly code whether or not -O is used, and then after doing this -O does a series of pattern match steps to refactor the already generated assembly code. It's a bit of a backward approach to the problem, and really weird. It would be much better to start optimizing at a higher level, but I'm not going to fantasize too much about that. This is what the compiler we have does.

If you want to understand what it does, like I said just above, you can use --debug-opt-output to observe the process if you're curious. It will show you exactly what the initial generated assembly code is, and every step it takes to optimize.

If you want an example:
Code:
// starting C line

(oam+3)[oam_pos] = x+5;

; 1. initial generated assembly:

   lda     #<(_oam+3)
   ldx     #>(_oam+3)
   clc
   adc     _oam_pos
   bcc     L02DF
   inx
L02DF:   jsr     pushax
   ldy     #$04
   ldx     #$00
   lda     (sp),y
   jsr     incax5 ; this is the +5 operation, which critically breaks up the optimization pattern (see below)
   ldx     #$00
   ldy     #$00
   jsr     staspidx

;  2. OptAdd5

   lda     #<(_oam+3)
   ldx     #>(_oam+3)
   clc
   adc     _oam_pos
   bcc     L02DF
   inx
L02DF:   jsr     pushax
   ldy     #$04
   ldx     #$00
   lda     (sp),y
   clc
   adc     #$05 ; jsr incax5 replaced with an inline add
   ldx     #$00
   ldy     #$00
   jsr     staspidx

; 3. OptUnusedLoads

   lda     #<(_oam+3)
   ldx     #>(_oam+3)
   clc
   adc     _oam_pos
   bcc     L02DF
   inx
L02DF:   jsr     pushax
   ldy     #$04 ; unused ldx #00 eliminated
   lda     (sp),y
   clc
   adc     #$05
   ldy     #$00 ; unused ldx #00 eliminated
   jsr     staspidx

; 4. OptStackOps

   lda     #<(_oam+3)
   ldx     #>(_oam+3)
   clc
   adc     _oam_pos
   bcc     L02DF
   inx
L02DF:   sta     ptr1 ; jsr pushax / staspidx (temporary pointer on stack) replaced by keeping it in ptr1
   stx     ptr1+1
   ldy     #$02
   lda     (sp),y
   clc
   adc     #$05
   ldy     #$00
   sta     (ptr1),y


At that point, it's out of patterns to match and stops optimizing.

For comparison, without the intervening +5:
Code:
// starting C line

(oam+3)[oam_pos] = x;

; 1. initial generated assembly

   lda     #<(_oam+3)
   ldx     #>(_oam+3)
   clc
   adc     _oam_pos
   bcc     L02DF
   inx
L02DF:   jsr     pushax
   ldy     #$04
   ldx     #$00
   lda     (sp),y
   ldy     #$00
   jsr     staspidx

;  2. OptPtrStore2

   lda     #<(_oam+3)
   ldx     #>(_oam+3)
   ldy     #$02
   ldx     #$00
   lda     (sp),y
   ldy     _oam_pos
   sta     _oam+3,y ; the temporary pointer on stack is eliminated, and the index add is replaced with Y index

; 3. OptUnusedLoads

   ldy     #$02 ; unused ldx #00 eliminated
   lda     (sp),y
   ldy     _oam_pos
   sta     _oam+3,y


So you can see the pattern that failed to apply is called OptPtrStore2.

The optimization patterns are each a function in the cc65 source. They tend to have a good explanation of the operation. Looking up OptPtrStore2:

Code:
unsigned OptPtrStore2 (CodeSeg* S)
/* Search for the sequence:
**
**      clc
**      adc     xxx
**      bcc     L
**      inx
** L:   jsr     pushax
**      ldy     yyy
**      ldx     #$00
**      lda     (sp),y
**      ldy     #$00
**      jsr     staspidx
**
** and replace it by:
**
**      sta     ptr1
**      stx     ptr1+1
**      ldy     yyy-2
**      ldx     #$00
**      lda     (sp),y
**      ldy     xxx
**      sta     (ptr1),y
**
** or by
**
**      ldy     yyy-2
**      ldx     #$00
**      lda     (sp),y
**      ldy     xxx
**      sta     (zp),y
**
** or by
**
**      ldy     yyy-2
**      ldx     #$00
**      lda     (sp),y
**      ldy     xxx
**      sta     label,y
**
** or by
**
**      ldy     yyy-2
**      ldx     #$00
**      lda     (sp),y
**      ldy     xxx
**      sta     $xxxx,y
**
** depending on the code preceeding the sequence above.
*/


As you can see, this pattern was written to match specifically a simple fetch from the stack and store. There's two other variations of this type of pattern, you can read about them all in the header.

The problem is simply that each of them can only match a very simple pattern. It's looking for a specific beginning, middle, and end. You can't just start adding arbitrary extra code in the middle (i.e. the expression to be resolved), that new stuff in the middle has to fit a pattern that's known to be safe to optimize. Probably possible to write such a thing, but it's complicated and difficult... so, instead only this bunch of simpler cases were written.

When you put an expression in a temporary variable first, that code gets generated before the array access portion, so doesn't interfere with the simple "generate address, fetch, store" pattern that these are capable of matching.


Anyhow, that's just one example of how to analyze cc65's optimizer. It's really all spelled out by --debug-opt-output, so if you want to know about any specific cases you're dealing with, that's the tool to use.

Still probably easier to just rewrite offending code in assembly as needed* than it is to try and understand whatever this byzantine optimizer is doing, though.

* ...and don't worry about it when it's not needed.
Re: cc65: Array access with int index even if it should be a
by on (#239967)
Thanks for the explanation. I'll have a more detailed look at it to understand how the optimization in compilers works.

rainwarrior wrote:
Still probably easier to just rewrite offending code in assembly as needed* than it is to try and understand whatever this byzantine optimizer is doing, though.

* ...and don't worry about it when it's not needed.

Yeah, that's what I'm doing anyway.
Although, in this case, I don't even need Assembly. I simply need to put a temporary variable in between and it's good enough.

Changing it into Assembly would mean that I also have to write the calculation itself in Assembly, so it would be different code for different situations. And sometimes, it is stuff like this:
array[index] = Tile16XToAbsoluteX(heroTile16X) + (width16Block - 1) * 16 + 1;
Which I don't feel at all like turning into inline Assembly.

But with my C macro, I can still write:
Code:
SetArrayValueFromCalculatedValue(
    array, index,
    Tile16XToAbsoluteX(heroTile16X) + (width16Block - 1) * 16 + 1,
    temp);

The only downside is one single additional STA temp in the generated output code.
Re: cc65: Array access with int index even if it should be a
by on (#239968)
So, like, to address this specific case, you might be able to augment each of the OptPtrStore functions to look for 1 or 2 extra instructions between the fetch and the store. INC (+1), DEC (-1), CLC/ADC (+), SEC/SBC (-), and if found leave them in there. Of course you might look for EOR, ORA, etc. as well, there's a lot of possibilities here.

That might take care of most expressions that are just 2 terms. 3 term expressions will still be unmatchable, but maybe that optimization would be worth writing code for since + a constant is such a common case.

Similarly for the [a+1] case you could try to pattern match the "generate address + add a constant to an index" thing, and other similar cases...

Of course after writing these you must make sure they still passes all the unit tests, and try it out on whatever you can to make sure it doesn't have some unexpected interaction with one of the other 100 optimization patterns that breaks stuff. Testing itself is a lot of work, and if you do happen to find a conflict, every attempt to resolve that multiplies your testing too. :P


Maybe you can see how this creates a combinatorial explosion of work for a fairly marginal gain. In a lot of ways, even if you could write these pattern matchers correctly and they worked, it's just not worth adding that much code to the optimizer to solve the problem backwards like this. The better solution is high level optimization before the assembly begins, but that's a whole project of its own.
Re: cc65: Array access with int index even if it should be a
by on (#239969)
DRW wrote:
Changing it into Assembly would mean that I also have to write the calculation itself in Assembly, so it would be different code for different situations. And sometimes, it is stuff like this:
array[index] = Tile16XToAbsoluteX(heroTile16X) + (width16Block - 1) * 16 + 1;
Which I don't feel at all like turning into inline Assembly.

The quick way is: write it in C, compile, copy paste the generated assembly, then just rewrite the bits you're interested in improving.
Re: cc65: Array access with int index even if it should be a
by on (#239982)
rainwarrior wrote:
Maybe you can see how this creates a combinatorial explosion of work for a fairly marginal gain.

Sure. Besides, I never intended that anybody should actually write an optimization in the compiler. I merely asked about this to find out whether there are tricks to make it smaller by doing some stuff in the C code that I haven't thought of yet.

rainwarrior wrote:
The quick way is: write it in C, compile, copy paste the generated assembly, then just rewrite the bits you're interested in improving.

Yeah, but I also want to keep my code readable. If I intended to clutter everything with inline Assembly, just to save one byte or two, I would write the whole thing in Assembly in the first place.

I mean, if this code:
Code:
Pointer[Index] = Value;
gets turned into this:
Code:
   lda     _Pointer
   ldx     _Pointer+1
   clc
   adc     _Index
   bcc     L0005
   inx
L0005:   sta     ptr1
   stx     ptr1+1
   lda     _Value
   ldy     #$00
   sta     (ptr1),y
then it really pays off if I have a general purpose inline Assembly macro that simply does this:
Code:
LDA Value
LDY Index
STA (Pointer), Y

Especially if I can reuse it many times and the code is still readable if I put it into a macro called
Code:
AsmSetVarFromPtrAtIdxVar(variable, pointer, indexVariable)

But I don't think it's a good idea to analyze every little code line to see if I can turn it into slightly better Assembly than the C compiler does.
After all, there's a reason why I write in C. Because this:
Code:
Value = heroTile16X * 16 + (width16Block - 1) * 16 + 1;
is much more readable than its Assembly version.

Also, I wouldn't even know how to start optimizing the output.

For example, if I had to write the above code in Assembly, I would probably do it like this:
Code:
LDA heroTile16X
ASL
ASL
ASL
ASL
STA tmp1
LDX width16Block
DEX
TXA
ASL
ASL
ASL
ASL
CLC
ADC tmp1
TAX
INX
STX Value

On the other hand, this is what the compiler creates:
Code:
   ldx     #$00
   lda     _heroTile16X
   jsr     shlax4
   sta     ptr1
   stx     ptr1+1
   ldx     #$00
   lda     _width16Block
   jsr     decax1
   jsr     shlax4
   clc
   adc     ptr1
   pha
   txa
   adc     ptr1+1
   pla
   clc
   adc     #$01
   sta     _Value

This doesn't look even slightly like my code, so I have no idea how I should take this one as the base and then convert it into something better. And if I had the patience to write my own code above for everything, I wouldn't have used C in the first place.

So, yeah, not really the place to manually fiddle with one-time code expressions that never get repeated again.
I have a bunch of inline Assembly macros for common performance issues, like pointer and array access. And for most of the rest, I simply swallow the CPU and ROM penalty that is included in C. I mean, in the moment, I can have about 12 characters on screen without any lag, so it looks like the C code isn't that bad.
Re: cc65: Array access with int index even if it should be a
by on (#239985)
Quote:
This doesn't look even slightly like my code


Implied promotion to INT on many C library functions.

Quote:
in the moment, I can have about 12 characters on screen without any lag, so it looks like the C code isn't that bad.


Depends on what kind of game you want. When I made a card game, C greatly sped up dev time. If you want a shoot em up with dozens of objects on screen, I would do it in ASM.
Re: cc65: Array access with int index even if it should be a
by on (#239987)
dougeff wrote:
Quote:
This doesn't look even slightly like my code

Implied promotion to INT on many C library functions.

Yeah, compounded by the fact that it does not attempt to identify that it only needs an 8-bit result at the high level. That reduction is sometimes made after the fact by assembly pattern recognition, but like we saw above it can't really identify 8-bit result patterns from expressions that aren't rather simple.

In many cases breaking an expression into intermediate results might actually help:
Code:
// C code, value/ht/wb are char
value = ht * 16;
value += (wb - 1) * 16;
value += 1;

;
; value = ht * 16;
;
   lda     _ht
   asl     a
   asl     a
   asl     a
   asl     a
   sta     _value
;
; value += (wb - 1) * 16;
;
   lda     _wb
   sec
   sbc     #$01
   asl     a
   asl     a
   asl     a
   asl     a
   clc
   adc     _value
   sta     _value
;
; value += 1;
;
   inc     _value
;
; }
;
   rts


There's not much rhyme or reason to this, though. What works or doesn't isn't very easy to predict.
Re: cc65: Array access with int index even if it should be a
by on (#239988)
dougeff wrote:
When I made a card game, C greatly sped up dev time. If you want a shoot em up with dozens of objects on screen, I would do it in ASM.

This comparison comes up so often: "If you want to use C, you might be able to program a "Pac-Man" clone. For anything else, you should use Assembly."

Well, I programmed a lagless platformer. And I'm in the middle of programming a "Zelda"-like action adventure.
Yes, I had to use some Assembly functions, usually when it comes to array-based stuff.
However, the general gameplay algorithm is still mostly in C.

And I'm pretty sure a shoot'em up would be doable in C, especially since enemies in these games mostly move in fixed patterns and don't even care about the environment. (Unlike in many other games, where each character always has to check for walls.)

So, yeah, whoever is fluent in Assembly should write the game directly in this language. However, it is not true that C only enables you to write simple 1985-style games. I'm pretty positive that a game like "Ninja Gaiden" would largely be possible in C.
Re: cc65: Array access with int index even if it should be a
by on (#239991)
rainwarrior wrote:
Yeah, compounded by the fact that it does not attempt to identify that it only needs an 8-bit result at the high level. That reduction is sometimes made after the fact by assembly pattern recognition, but like we saw above it can't really identify 8-bit result patterns from expressions that aren't rather simple.


The authors of the Standard noted in the Rationale that on commonplace implementations, if the result of an additive, multiplicative, or bitwise is coerced to a type which is not larger, it won't matter if the operation is performed as signed or unsigned(*). For the same reasons, it wouldn't matter if the operand were performed on a type larger than the result type. A good compiler for 8-bit machines really should use a production rules for assignment operators that evaluates the right side as an 8-bit quantity if the left side is likewise, and for a production rule for 8-bit expressions which processes additive, multiplicative, and bitwise operators by evaluating their operands as 8-bit quantities.

(*) The authors of the Standard expected that commonplace implementations would do this even for values between INT_MAX+1u and UINT_MAX, but did not want to require such behavior on platforms where it would be impractical. They didn't think should it necessary to say that commonplace implementations should handle such values in the expected fashion, but gcc exploits the Standard's laxity by generating code that doesn't.
Re: cc65: Array access with int index even if it should be a
by on (#239992)
supercat wrote:
The authors of the Standard noted... A good compiler for 8-bit machines really should...

What I said has nothing to do with the standard. I was talking about cc65. cc65 just doesn't have the capability to recognize it, and the many cases that it manages to generate succinctly are kind of miraculous given what it's working with as an optimizer. What it "should" do is something completely different than what it does, but I doubt this will change any time soon, unfortunately. (Fortunately, it's still pretty useful in this compromised state anyway.)
Re: cc65: Array access with int index even if it should be a
by on (#239993)
DRW wrote:
Yes, I had to use some Assembly functions, usually when it comes to array-based stuff.
However, the general gameplay algorithm is still mostly in C.


If 25% of the program's non-vblank time is spent waiting for vblank, 70% is spent in 10% of the code, and the remaining 5% of the time is spent running the other 90% of the code, the performance of that 90% of the code would simply not matter unless something caused it to run more than five times as slowly. Some games have their execution time spread out among large parts of the program, but others have it concentrated in a few tiny places. Micro-optimizing parts of the code that dominate execution time can be much easier and more effective than worrying about the performance of anything else.
Re: cc65: Array access with int index even if it should be a
by on (#240011)
DRW wrote:
So, yeah, whoever is fluent in Assembly should write the game directly in this language. However, it is not true that C only enables you to write simple 1985-style games. I'm pretty positive that a game like "Ninja Gaiden" would largely be possible in C.


As somebody who likes using C, I partially agree. If you're willing to optimize a few routines in assembly, write your C carefully and evaluate bottlenecks, and use a library of assembly routines for interacting with the hardware, you could do Ninja Gaiden in C.

That said, you can definitely run into trouble using C if you aren't careful. Both my Robo-Ninja Climb and Super Homebrew War were written mostly in C (For the sake of development time), but my first pass was too slow in both of them, requiring some optimization (both re-writing some C parts to better take advantage of how cc65 does things, and also rewriting some critical bits into assembly)
Re: cc65: Array access with int index even if it should be a
by on (#240012)
I've skimmed some of the cc65 optimizer code, and there's really not that much to it. It simply doesn't recognize many cases where it can optimize, and those are the only places where it really seems to relax the spec.

I'm not going to tell you "just write it all in assembly", because I don't. I just do what you do: Spend some minimal effort to trick the compiler into doing what you want. Once you know what you really want, rewrite parts using inline or regular assembly. Rinse and repeat.