This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

Push arguments to stack
by on (#169728)
Do you think this is a good idea? I was trying to think of a way to push a bunch of numbers (passed values) to the stack (before a jsr), but I wasn't sure how to return from the subroutine without a lot of stack manipulation. But then I thought of this strange idea...

Code:
   jsr Argumentpush
   jmp Overarguments
Argumentpush:
   lda #
   pha
   lda #
   pha
   lda #
   pha
   lda #
   pha
   jmp ToActualSubroutine ; rather than jsr
Overarguments:


So this would push the return address to stack first, then push all the arguments on top of that. Presumably, the actual subroutine would pull all the arguments, and use them, and then RTS would take you to the 'Jmp Overarguments' which would be the next bit of code.
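
As a concrete illustration of that flow, here is a minimal sketch. The four argument values ($11 to $44), the Caller label, and the zero-page temporaries temp0..temp3 are made up for the example. Note that PLA hands the arguments back in reverse push order.

Code:
temp0 = $10               ; hypothetical zero-page scratch bytes
temp1 = $11
temp2 = $12
temp3 = $13

Caller:
   jsr Argumentpush       ; the return address goes on the stack first
   jmp Overarguments      ; RTS from the subroutine lands here
Argumentpush:
   lda #$11               ; example values, not from the original post
   pha
   lda #$22
   pha
   lda #$33
   pha
   lda #$44
   pha
   jmp ToActualSubroutine ; jmp, so no second return address is pushed
Overarguments:
   rts

ToActualSubroutine:
   pla                    ; $44, the last value pushed
   sta temp3
   pla                    ; $33
   sta temp2
   pla                    ; $22
   sta temp1
   pla                    ; $11, the first value pushed
   sta temp0
   rts                    ; pulls the address left by jsr Argumentpush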

I'm not sure if this is a good idea, but I have been thinking about this problem for a long time.
Re: Push arguments to stack
by on (#169730)
You don't need "Overarguments" if you lift "Argumentpush" out of the subroutine and put it somewhere else.

The stack manipulation technique doesn't seem too bad (see below), but I like your method better. Maybe there's a better way though. What does CC65 do for arguments?
Code:
; Push the arguments
lda #1
pha
lda #2
pha
lda #3
pha
jsr subroutine
; Pop the arguments
txa
axs #5
txs
rts

subroutine:
tsx
; Access the arguments without popping
lda $0100-3,x
lda $0100-4,x
lda $0100-5,x
rts
Re: Push arguments to stack
by on (#169731)
I've read this 3 times and I don't quite understand what the "problem" is that you're trying to solve (and the code doesn't really indicate to me anything either).

I keep wanting to say one of these things, but I don't know what's applicable:

a) it sounds like what you're wanting is a jump table (e.g. jmp ($addr))
b) it sounds like what you're wanting is to push the return address (minus 1 -- look up how JSR/RTS works in actual detail) of the next routine to go to so that you can go there using an RTS (i.e. avoids the use of JMP)
c) it sounds like what you're wanting is to not use the stack for storing this data at all, but use RAM or zero page instead.

(b) and (c) seem most likely. The negative to (b) might be that depending on how you do this, you may have to deal with handling 16-bit numbers (e.g. a SEC/SBC #1 on the low byte alone might not be enough; consider addresses that are right at a page boundary, e.g. $9000-1 = $8FFF). The negative to (c) is that a bunch of STAs take more time (cycles) than a bunch of PHAs (speaking generally here), and STAs are also larger (PHA is 1 byte, STA zp is 2 bytes, STA abs is 3 bytes).

Possibly an amalgamation of one or more of the above proposals will suffice.

P.S. -- Be aware pubby's above explanation uses unofficial opcodes (axs #5).

What you're dealing with (again, I'm not sure what all the issue here is so I'm making some assumptions) is a lot easier on 65816. There are addressing modes that can help with this (stack-indexed, and things like jsr ($addr,x)), and opcodes that can help with this (PEA/PEI). Even the 65c02 has some things that can make one of those easier (e.g. jmp ($addr,x)).
Re: Push arguments to stack
by on (#169732)
pubby wrote:
You don't need "Overarguments" if you lift "Argumentpush" out of the subroutine and put it somewhere else.

That could hurt readability though, since the values would be written somewhere other than the place where they're used. In ca65 you could solve this by creating a segment for these "trampolines", and temporarily switching to it just to set up the parameters.
Re: Push arguments to stack
by on (#169733)
pubby wrote:
What does CC65 do for arguments?
CC65 uses two bytes of zero page as a software stack pointer, which is used for all the normal C stack-shaped things, while subroutine calls still use JSR and RTS.
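
To illustrate the general idea, here is a simplified sketch of a software data stack. It is not cc65's actual runtime source; the pointer name argsp, its zero-page address, the $0700 starting address, and the routine names are all assumptions. The data stack grows downward in ordinary RAM, and JSR/RTS keep using the hardware stack.

Code:
argsp = $80               ; hypothetical two-byte zero-page pointer

InitArgStack:
   lda #<$0700            ; assumed top of the data stack area
   sta argsp
   lda #>$0700
   sta argsp+1
   rts

; Push A: decrement the 16-bit pointer, then store A at (argsp). Clobbers Y.
PushA:
   ldy argsp
   bne :+
   dec argsp+1            ; low byte is 0, so the decrement borrows
:  dec argsp
   ldy #0
   sta (argsp),y
   rts

; Pop into A: load from (argsp), then increment the 16-bit pointer. Clobbers Y.
PopA:
   ldy #0
   lda (argsp),y
   inc argsp
   bne :+
   inc argsp+1            ; low byte wrapped to 0, so carry into the high byte
:  rts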
Re: Push arguments to stack
by on (#169734)
koitsu wrote:
I've read this 3 times and I don't quite understand what the "problem" is that you're trying to solve

He's talking about passing arguments to subroutines through the stack, but when you push the arguments first and then JSR, you can't simply PLA the parameters within the subroutine because the return address is in the way. What he did was find a way to push the return address first, and then the arguments. It's pretty clever, and some may prefer this over manipulating the stack pointer, which is a more advanced and dangerous technique.
Re: Push arguments to stack
by on (#169735)
tokumaru wrote:
koitsu wrote:
I've read this 3 times and I don't quite understand what the "problem" is that you're trying to solve

He's talking about passing arguments to subroutines through the stack, but when you push the arguments first and then JSR, you can't simply PLA the parameters within the subroutine because the return address is in the way. What he did was find a way to push the return address first, and then the arguments. It's pretty clever, and some may prefer this over manipulating the stack pointer, which is a more advanced and dangerous technique.

Thank you -- this makes sense now! In this case, since we're sharing approaches/solutions: I tend to just use some ZP or RAM space for this situation (i.e. option (c) in my examples).
Re: Push arguments to stack
by on (#169736)
I don't see anything particularly wrong with the style.

PLA is 4 cycles, LDA $100+N, X is also 4, but also has the overhead of TSX and consuming the X register. PLA is also 2 bytes smaller. I think these are valid but minor reasons to consider this technique, iff you need your parameters on the stack and you can consume them in a specific non-overlapping order. If you've got to store any to RAM, you might as well have just used that RAM to pass the argument.

I find the stack most useful for recursive functions, or inefficiently resolving RAM-overlap between functions. I would not normally be passing parameters on the stack, and usually it wouldn't be in performance sensitive code. The fastest way to pass parameters not in a register is the zero page, as koitsu suggested. This is also what I mean about RAM-overlap; if due to lack of foresight two functions need to use the same RAM variables for parameters and one needs to call the other, I might use the stack to temporarily save them around the call.

I don't think the "RTS trick" is applicable here; it's maybe a similar idea, but it doesn't address the same problem.

Anyhow, I don't see anything wrong with it, but I can't think of any good application for it either. A C compiler might have a use for more efficient stack usage, if you were writing one. I don't really see accessing variables on the stack as dangerous or error prone though.
Re: Push arguments to stack
by on (#169737)
Something I've considered, that's sort of a similar kind of optimization, is finding a way to minimize the size of the argument preamble for repetitive function calls with immediate arguments. This is a size-over-speed optimization, though, and you really need a large number of arguments and highly repetitive usage to make this worth doing. As such, I've yet to actually find it useful; every time I've considered it, it was not a desirable trade.

Code:
.macro QUICK_BLOO a,b,c,d,e,f
jmp :+
.byte a,b,c,d,e,f
:
jsr bloo
.endmacro

; ...

bloo:
    ; fetch return address from stack, find the arguments 6 bytes back (+2 for the jsr instruction minus increment)
    tsx
    lda $100+1, X   ; low byte of the return address
    sec
    sbc #<(6+2)
    sta ptr+0
    lda $100+2, X   ; high byte of the return address
    sbc #>(6+2)
    sta ptr+1
    ; now we can access the inline arguments like:
    ldy #3
    lda (ptr), Y ; fetch argument 3
    ; ...
    rts
Re: Push arguments to stack
by on (#169738)
I agree with koitsu and rainwarrior, for most NES applications, using ZP for arguments is the simplest and fastest solution, since NES games hardly ever need recursion. Recursion isn't even just about arguments, local variables also have to be unique to each function instance, which is another good reason to avoid it.

RAM collisions between routines that call one another are a real issue though, one that you could probably solve by increasing the amount of RAM for arguments and local variables and remapping some things, or by using the stack through techniques such as this.
Re: Push arguments to stack
by on (#169769)
Quote:
I don't quite understand what the "problem" is


I guess the problem is...would push/pull from the stack be a good way to pass arguments. How could that be done? And, I feel like I finally thought of a way to do it.

(On a side note... when I first read about a 'stack', back in 1990 or so [how to program Apple IIGS book], I thought that programmers would use it to store variables and pass arguments, but I don't find that to be true)

The alternative, for me, would be to load #1 in A, #2 in X, and #3 in Y, and further arguments into zero-page variables.

But, I also like the idea of having a set of reused/generic zero-page variables, just for passing variables back and forth from subroutines.

The only time this seems to come up is collision checks.
Re: Push arguments to stack
by on (#169771)
pubby wrote:
Code:
; Access the arguments without popping
lda $0100-3,x
lda $0100-4,x
lda $0100-5,x

I'm pretty sure that's wrong - on the 6502, the stack grows downward.

In other words, $0100,X points to where the next byte will be pushed, $0101,X and $0102,X are the return address, and $0103,X / $0104,X / $0105,X are the parameters you pushed.

rainwarrior wrote:
LDA $100-N, X is going to be 5 (unfortunately always a page crossing)

Thus, the above mentioned page crossing is never a concern.
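
Putting that layout together, the quoted example could be written along these lines. This is only a sketch: it sticks to official opcodes, so the caller discards its three arguments with PLA instead of adjusting the stack pointer, and the caller label is made up.

Code:
caller:
   lda #1
   pha                    ; ends up at $0105,x inside the subroutine
   lda #2
   pha                    ; $0104,x
   lda #3
   pha                    ; $0103,x
   jsr subroutine
   pla                    ; discard the three arguments
   pla
   pla
   rts

subroutine:
   tsx
   ; $0101,x and $0102,x hold the return address (low byte, then high byte)
   lda $0103,x            ; 3, the last value pushed
   lda $0104,x            ; 2
   lda $0105,x            ; 1, the first value pushed
   rts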
Re: Push arguments to stack
by on (#169775)
Quietust wrote:
pubby wrote:
Code:
; Access the arguments without popping
lda $0100-3,x
lda $0100-4,x
lda $0100-5,x

I'm pretty sure that's wrong - on the 6502, the stack grows downward.

In other words, $0100,X points to where the next byte will be pushed, $0101,X and $0102,X are the return address, and $0103,X / $0104,X / $0105,X are the parameters you pushed.

The axs instruction should be axs #-4 then... I think. (I obviously didn't test that code!)
Re: Push arguments to stack
by on (#169781)
As long as you don't get close to running out of stack space, it's a reasonable way to do it. What I am describing below only applies if the function you are jumping to has multiple possible return addresses. Usually it will, or it wouldn't be a function.

Push return address - 1 onto stack
Push variables onto stack
Push function address - 1 onto stack
Use RTS to pull the function address off the stack

At function
Pull variables off stack
do function
Use RTS to pull the return address off the stack and return
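
The steps above can be sketched roughly as follows. The labels, the two argument values, and the temp0/temp1 scratch bytes are hypothetical. RTS pulls the low byte first and adds 1 to the address, so each address is pushed high byte first, minus 1.

Code:
temp0 = $10               ; hypothetical zero-page scratch bytes
temp1 = $11

caller:
   lda #>(back-1)         ; return address - 1, high byte first
   pha
   lda #<(back-1)
   pha
   lda #$11               ; example argument 1
   pha
   lda #$22               ; example argument 2
   pha
   lda #>(func-1)         ; function address - 1, high byte first
   pha
   lda #<(func-1)
   pha
   rts                    ; pulls func-1, adds 1, and continues at func
back:
   rts

func:
   pla                    ; $22, the last argument pushed
   sta temp1
   pla                    ; $11
   sta temp0
   ; ...do the work...
   rts                    ; pulls back-1, adds 1, and continues at back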

However, like others have said, it's usually easier to just have a set of temporary "scratch pad variables" in zero page to store the arguments.

The game I've been working on disassembling almost always uses the scratchpad technique.
Re: Push arguments to stack
by on (#169785)
Quietust wrote:
on the 6502, the stack grows downward.

Wow, sorry, I must have been in a fever dream yesterday. I'll fix my examples...
Re: Push arguments to stack
by on (#169796)
Actually, the more I think about it, the more I think dougeff's idea is clever and deserves a second thought. When there are 3 bytes or fewer of arguments, there's no need to discuss it: they should obviously be passed in registers. But when there are more? Passing through zero page is logical, but "lda value / sta zeropage" is long and tedious. "lda value / pha" takes fewer bytes for the same thing.

We also have to consider what happens inside the function. In many cases, an argument is used at only one specific place, and there it can be fetched with a simple "pla" instead of "lda value", saving one byte.

There is, however, an overhead: we need a "jsr" and a "jmp" to call a function instead of just a "jsr". This creates a 3-byte overhead. So even when no C, re-entrancy, or whatever is considered, using the stack for argument passing can be a way to save bytes, even if it doesn't save time.

As such, it will save 2 bytes per argument (one in the caller and one in the callee) but cost a fixed overhead of 3 bytes. So it's only worth it if 2 or more argument bytes are passed on the stack. This assumes each argument is used once in the function. If they're used in many places, then there are no fixed rules, but they're definitely better off in a ZP variable anyway. The methods are not mutually exclusive: a routine could use stack passing as well as ZP passing.
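
To make the byte count concrete, here is a sketch for two one-byte arguments. It assumes the push stub sits out of line (as suggested earlier in the thread), so the call site is a single jsr; all labels, values, and the arg0/arg1 addresses are made up.

Code:
arg0 = $10                ; hypothetical zero-page argument slots
arg1 = $11

; Zero-page passing: 11 bytes to call + 4 bytes to fetch = 15 bytes
CallZp:
   lda #$11               ; 2 bytes
   sta arg0               ; 2
   lda #$22               ; 2
   sta arg1               ; 2
   jsr SubZp              ; 3
   rts
SubZp:
   lda arg0               ; 2
   ; ...use it...
   lda arg1               ; 2
   ; ...use it...
   rts

; Stack passing: 3 bytes to call + 9-byte stub + 2 bytes to fetch = 14 bytes
CallStack:
   jsr PushStub           ; 3
   rts
PushStub:
   lda #$11               ; 2
   pha                    ; 1
   lda #$22               ; 2
   pha                    ; 1
   jmp SubStack           ; 3
SubStack:
   pla                    ; 1, yields $22
   ; ...use it...
   pla                    ; 1, yields $11
   ; ...use it...
   rts

With two arguments the stack version comes out one byte ahead (2 x 2 saved, minus the 3-byte jmp); with one argument it would be one byte behind.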

Quote:
The alternative, for me, would be to load #1 in A, #2 in X, and #3 in Y, and further arguments into zero-page variables.

This is, indeed, not an "alternative" but the most standard way to do things. Also don't underestimate the carry flag, which is extremely useful as both an input and output parameter :) A good half of my assembly functions use the carry flag for something as either input, output or both.
Re: Push arguments to stack
by on (#193958)
I know this topic is a year old, but the material is always valuable.

If you need more than just A, X, and Y, passing parameters via the page-1 hardware stack or a virtual stack in ZP or elsewhere in memory can work very well and have major benefits over using static variables. The 6502 stacks treatise particularly addresses these things in detail in the following sections:



See also:
Re: Push arguments to stack
by on (#193977)
I came up with a better method for pushing arguments to the hardware stack.

Basically, you have an indirect address in the zero page, whose high byte is always 1.

Let's say you push 5 variables (LDA, PHA), then get the stack pointer (TSX) and store that as the low byte of the indirect address (STX). Now, when you JSR, you can access an argument any time: just load Y with its relative position, and LDA (zp),Y.

Just before the subroutine, you can define these as constants, and reference them by name inside the subroutine...

xPosition = 0
yPosition = 1

....later...

LDY #xPosition
LDA (stackPointer), y

After you RTS, you should pop all values off the stack, just in case it's burying a return address.

(I haven't tested this, I might be off by 1 or something).
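
Working the offsets through: after the pushes, the stack pointer addresses the next free slot, so the most recently pushed value sits at offset 1 rather than 0. A rough sketch with two arguments, where the pointer address, the example values, and the caller label are assumptions:

Code:
stackPointer = $20        ; hypothetical zero-page pair

yPosition = 1             ; pushed last, so nearest the stack pointer
xPosition = 2             ; pushed first

caller:
   lda #$40               ; example x value
   pha
   lda #$50               ; example y value
   pha
   tsx
   stx stackPointer       ; low byte = S after the pushes
   lda #$01
   sta stackPointer+1     ; high byte is always page 1
   jsr subR
   pla                    ; drop the two arguments
   pla
   rts

subR:
   ldy #xPosition
   lda (stackPointer),y   ; reads $40
   ldy #yPosition
   lda (stackPointer),y   ; reads $50
   rts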

This would also work, for managing your own stack in the RAM (similar to how cc65 does it).
Re: Push arguments to stack
by on (#193981)
That works; but note that if you do that, nesting routines requires that you push the ZP indirect address onto the stack and restore it at the end (in addition to pushing and pulling the index register). LDA abs,X does not require any variable space, and it takes one less cycle than LDA (ZP),Y. In the "LDA abs,X" (or other instruction with that addressing mode), the "abs" will typically be $101, $102, $103, etc.

Be sure to look at the pages mentioned about using a separate data stack in ZP too. It avoids certain complications that you'll run up against when using the hardware stack for passing and manipulating temporary data. It may sound like a sacrilege to use precious ZP for a separate stack for data; but it's surprising how little ZP space you actually need for it (I've measured it in a stack-intensive application), and keep in mind that it reduces the need for other ZP variables too, thus paying for itself. ZP offers more addressing modes, a valuable one being (ZP,X) when you have an address on the data stack.

Then of course there's also the technique of having the subroutine use the return address to find data that immediately follows the JSR, and adjust that return address so the processor skips over the data and doesn't try to execute it as if it were instructions. The data space following the JSR is usually a constant (like a string for example), but if it's in RAM, it can be variable data as well. It's all discussed in the pages linked above.
Re: Push arguments to stack
by on (#193991)
LDA abs,X. Interesting. Let me see if I can pseudo code that...

LDA #var 3
PHA
LDA #var 2
PHA
LDA #var 1
PHA
TSX
STX stackP ;stack pointer
JSR subR
PLA
PLA
PLA ;restore original stack position


;blah

var1 = $0100
var2 = $0101
var3 = $0102

subR:
LDX stackP
LDA var1, X ;to access var1
...
LDA var2, X ;to access var 2
...
LDA var3, X ;to access var 3
RTS


I'm not sure this is preferred to any other method. Personally, I've been using zp variables (called temp1, temp2, etc) to pass to subroutines.
Re: Push arguments to stack
by on (#193998)
dougeff wrote:
I'm not sure this is preferred to any other method.

I thought abs, X was the standard way to access stack variables on 6502.

Also didn't pubby already suggest it in the second post in this thread? (Maybe it's a bit distant now, because of the 1 year bump...)
Re: Push arguments to stack
by on (#194005)
There's not a lot of discussion on passing variables on the stack.

I see this topic, from 2009

viewtopic.php?f=2&t=5558

(Bregalad says "lda $100 + wathever,X")

Which references this...

viewtopic.php?t=5491

(Disch says "tsx / lda $101,X")

And this...(also 2009)

viewtopic.php?f=2&t=5038

It seems abs, x...is popular.

Another here...(2013)

https://forums.nesdev.com/viewtopic.php?f=2&t=10521
Re: Push arguments to stack
by on (#194008)
dougeff wrote:
LDA abs,X. Interesting. Let me see if I can pseudo code that...

LDA #var 3
PHA
LDA #var 2
PHA
LDA #var 1
PHA
TSX
STX stackP ;stack pointer
JSR subR

Leave the TSX for the subroutine, so it is only done there and not in every place that calls it. The $1XX numbers will be a little different because of stepping over the return address, but that doesn't add any overhead. If you need to save and restore X, that can be done in the subroutine too. You'd normally push it onto the stack before doing the TSX, which will again change the $1XX numbers (but it's still no problem).
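
A sketch of what that might look like inside the subroutine, assuming the caller pushed var3, var2, var1 (in that order) and then did the JSR. Saving X costs one extra stack byte, which shifts every offset up by one.

Code:
subR:
   txa
   pha                    ; save the caller's X (this clobbers A)
   tsx
   ; $0101,x = saved X, $0102,x / $0103,x = return address
   lda $0104,x            ; var1, pushed last
   lda $0105,x            ; var2
   lda $0106,x            ; var3, pushed first
   pla
   tax                    ; restore the caller's X
   rts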

Quote:
var1 = $0100
var2 = $0101
var3 = $0102

subR:
LDX stackP
LDA var1, X ;to access var1
...
LDA var2, X ;to access var 2
...
LDA var3, X ;to access var 3

Yes; but var1, var2, and var3 in the subroutine will be local variables, separate (and for clarity, probably deserving different names) from var1, var2, and var3 above in the calling routine. You can do all the instructions on them that have the abs,X addressing mode, like ADC, ROR, ORA, STA, INC, LDY, etc.

Quote:
I'm not sure this is preferred to any other method. Personally, I've been using zp variables (called temp1, temp2, etc) to pass to subroutines.

Then you have to be awfully careful where they get used so that one routine doesn't use one of the temporary variables and overwrite a value that another pending routine still needs. Been there, done that. The better use you can make of the stack(s), the more you can stay out of that kind of trouble, and the fewer global variables (ZP or otherwise) you'll need.

In the book "Thinking Forth" by Leo Brodie (which is really more about programming philosophy, and its material is not just for applying Forth), there's a section called, "The Problem With Variables." On page 211 there's a cute cartoon I'm tempted to scan and post here (although I'm sure it would be a copyright violation) showing a young man in a body cast in a hospital bed, and a young, slim, curvy woman explaining to him, "Shot from a cannon on a fast-moving train, hurtling between the blades of a windmill, and expecting to grab a trapeze dangling from a hot-air balloon...I told you Ace, there were too many variables!"

Another good reason to keep it on the stack and avoid excessive variable usage is re-entrancy. If the subroutine gets interrupted by an IRQ or NMI, and the ISR needs that subroutine, keeping it re-entrant by having the stack-based local variables will keep you out of trouble. Recursion is possible too (where the subroutine calls itself, over and over until the exit condition is met so it can unwind its way out), but most users here are pretty unlikely to even use recursion.
Re: Push arguments to stack
by on (#194011)
dougeff wrote:
Let me see if I can pseudo code that...

LDA #var 3
PHA
LDA #var 2
PHA
LDA #var 1
PHA

For conciseness (since I'm a macro junkie), you could make macros to synthesize the 65816's PEA, PEI, and PER instructions (which stand for "Push Effective Absolute," "Indirect," and "Relative"). This is also covered in the 6502 stacks treatise, chapter 13, "65816's instructions and capabilities relevant to stacks, and 65c02 code which partially synthesizes some of them." For the above, PEA is the closest. The 65816's PEA instruction always pushes a 16-bit quantity (and does not affect the processor registers); but you could write a macro that does nearly the 8-bit 6502 equivalent, and call it perhaps PEA8, so you would have:
Code:
        PEA8  var3
        PEA8  var2
        PEA8  var1

and each PEA8 would assemble LDA#, PHA. (If you want it to save and restore A and status, that's extra. The '816 does the whole 16-bit PEA job in five cycles, without affecting processor registers. The closest the '02 could come to doing the same thing with 16 bits is 22 cycles, including saving and restoring A and P.)

Depending on your assembler, you might be able to write a macro to take in an unpredetermined number of parameters and push them all, like this:
Code:
        PEA8   var3, var2, var1    ; Push these ZP variable addresses onto the stack for the subroutine to use.

Now we're down from six lines to one, making the source code more manageable.
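
As one possible way to write such a macro, here is a sketch in ca65 syntax (other assemblers will differ). It accepts up to six parameters by recursing on the remaining ones until it hits a blank. The parameter names v1..v6 are arbitrary, and like the plain LDA#/PHA pair it changes A and the flags.

Code:
.macro PEA8 v1, v2, v3, v4, v5, v6
        .ifblank v1
                .exitmacro              ; no parameters left
        .else
                lda #v1                 ; push one 8-bit value
                pha
        .endif
        PEA8 v2, v3, v4, v5, v6         ; handle the remaining parameters
.endmacro

        PEA8 var3, var2, var1           ; expands to three LDA#/PHA pairs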