This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

convoluted rts trick macro...need better approach

by on (#55794)
I had a few indirect jumps followed by hard-coded return locations in parts of my game engine. I decided to revise this and use the well-known "rts trick." The wiki pointed out that you must have a subroutine that does the trick in order for the jsr to push the correct return address on the stack. However, I didn't like that I'd have to jump far away from the code just to jump far away again, so I developed this macro (ca65 syntax):

Code:
.macro indirectJsr address

  lda #>(*+12)
  pha
  lda #<(*+9)
  pha
 
  lda address+1
  pha
  lda address
  pha
 
  rts

.endmacro


Where * is the current program counter address (as calculated during assembly of your code). I wondered if anyone else has used a similar approach for their own usage of the rts trick?
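For reference, a sketch of where the +12 and +9 come from. It assumes that address is a zero-page symbol ca65 already knows about (so each lda address assembles to 2 bytes) and that * evaluates to the address of the instruction it appears in. S is just a name used in the comments for the address where the expansion starts:
Code:
; Same macro, annotated with byte offsets from its start (S).
; Assumption: `address` is in zero page and holds the target address minus 1.
.macro indirectJsr address
  lda #>(*+12)   ; S+0,  2 bytes: high byte of S+12
  pha            ; S+2,  1 byte
  lda #<(*+9)    ; S+3,  2 bytes: low byte of (S+3)+9 = S+12
  pha            ; S+5,  1 byte
  lda address+1  ; S+6,  2 bytes (zero page)
  pha            ; S+8,  1 byte
  lda address    ; S+9,  2 bytes (zero page)
  pha            ; S+11, 1 byte
  rts            ; S+12, 1 byte; the code after the macro starts at S+13
.endmacro

Both immediates evaluate to S+12, the address of the final rts, and the target routine's own rts adds 1 to that to resume at S+13. If address were assembled as an absolute (3-byte) operand instead, the expansion would be 15 bytes and the offsets would have to be +14 and +11.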

by on (#55795)
Interesting on the first part. But for the last part, you'd need to read from a table that has something like:

Code:
TableOfPlaces:
 .dw DesiredAddressA - 1, DesiredAddressB - 1

LDY navigator
LDA TableOfPlaces + 1, Y
PHA
LDA TableOfPlaces, Y
PHA
RTS


That's how I did it, anyway. I think the indirect jump method is one fewer cycle though.
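A slightly fuller sketch of the same table dispatch, written for ca65 (using .word for the table). The variable stateIndex and the routines HandlerA and HandlerB are placeholder names, not part of anyone's actual code:
Code:
.zeropage
stateIndex: .res 1              ; hypothetical: 0 = HandlerA, 1 = HandlerB

.code
TableOfPlaces:
  .word HandlerA-1, HandlerB-1  ; rts adds 1 to the address it pulls

Dispatch:
  lda stateIndex
  asl a                         ; two bytes per table entry
  tay
  lda TableOfPlaces+1,y         ; high byte is pushed first
  pha
  lda TableOfPlaces,y           ; low byte ends up on top of the stack
  pha
  rts                           ; "returns" into the selected handler

HandlerA:                       ; hypothetical handlers
  rts
HandlerB:
  rts

If this is entered with jsr Dispatch, the handler's own rts returns to whoever called Dispatch.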

by on (#55796)
"Low overhead"?

This seems like high overhead. I never liked the "push your address then RTS it" crap. It always seemed absurd to me.

Code:
; bytes,cycles
;  (my cycle count might be off.  I'm doing this from memory
;  and I'm rusty)

.macro indirectJsr address

  lda #>(*+12)  ; 2,2
  pha             ; 1,3
  lda #<(*+9) ; 2,2
  pha             ; 1,3
 
  lda address+1  ; 3,4
  pha         ; 1,3
  lda address  ; 3,4
  pha         ; 1,3
 
  rts    ; 1, 6

.endmacro

; total:   15 bytes
;           30 cycles
; AND your 'address' has to be -1 the actual address you want to jump to
;  (ugh)


The straightforward approach seems simpler:

Code:
; this is pseudo code
; my ca65 macro syntax (or whatever) is rusty

.macro IndirectJSR address
  jmp phoneylabel_jsr  ; 3,3

phoneylabel_jmp:
  jmp (address)  ; 3,5

phoneylabel_jsr:
  jsr phoneylabel_jmp  ; 3,6

.endmacro

; total:  9 bytes
;          14 cycles


iirc you can have ca65 generate phoney labels that only appear in the macro, so it won't interfere with other labels in your program. I forget exactly how that works though.
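For what it's worth, ca65's .local directive is the feature in question: it declares labels that are unique to each expansion of a macro. A sketch of the macro above with its labels made local (the label names are just placeholders):
Code:
.macro IndirectJSR address
  .local do_jmp, do_jsr   ; fresh labels for every expansion
  jmp do_jsr              ; 3,3  skip over the indirect jump
do_jmp:
  jmp (address)           ; 3,5  the target's rts returns past the jsr
do_jsr:
  jsr do_jmp              ; 3,6
.endmacro
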

But having a common indirect JMP somewhere in the hardwired bank and then JSRing to it still seems like the best solution.

Re: low overhead indirect jsr (rts trick)?
by on (#55797)
I have to honestly say that this method is not very good. First, even if you plant the return address like you're doing, I see no reason for you to use...

Code:
  lda address+1
  pha
  lda address
  pha
 
  rts

...instead of...

Code:
  jmp (address)

...which is much faster. It's like you want to use the RTS just for the heck of it, not because you need it. That trick is often used with jump tables, because you'd have to fetch the destination address from the table even if you were to use JMP (), but in your case the address is already at a known location in RAM, there is absolutely no need to copy it to the stack.

You also waste a lot of time planting the return address manually when you could do it with a JSR much quicker. If I were you I wouldn't worry about "having to jump far away from the code just to jump far away again", because although it sounds like a bad thing to do it's still faster and more compact than your current solution.

Here's how I do it: I have a few temp locations in ZP that I use as scratchpad memory. Somewhere in ROM I have a few (as many as necessary, but usually no more than 3 or 4) indirect jumps to some of the temp locations acting as subroutines.

Code:
   Address0 .dsb 2
   Address1 .dsb 2
   Address2 .dsb 2
   Address3 .dsb 2

   (...)

CallAddress0:
   jmp (Address0)

CallAddress1:
   jmp (Address1)

CallAddress2:
   jmp (Address2)

CallAddress3:
   jmp (Address3)

Those locations act as virtual address registers, which I can use not only for indirect JSR'ing but also as pointers and such. Those few indirect JMPs take much less space than what your macro expands to.
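A sketch of a call through one of these, in ca65 syntax (.res instead of .dsb), with DrawSprites standing in as a hypothetical target routine:
Code:
.zeropage
Address0: .res 2        ; one of the "virtual address registers"

.code
CallAddress0:
  jmp (Address0)

Example:
  lda #<DrawSprites     ; the real address; no -1 adjustment with jmp ()
  sta Address0
  lda #>DrawSprites
  sta Address0+1
  jsr CallAddress0      ; DrawSprites' rts returns to the next line
  rts

DrawSprites:            ; hypothetical target routine
  rts
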

by on (#55798)
I'm glad I posted. Thanks for the ideas/correction! I wrote a new macro based on Disch's idea, and holy crap, that's a lot simpler =).
Code:
.macro indirectJsr address

  jmp *+6
  jmp (address)
  jsr *-3

.endmacro


*edit*
I guess I could use phony labels like Disch mentioned and do:
Code:
 
jmp :++
: jmp (address)
: jsr :--


But I think I'll stick with the program counter approach, just to account for the extremely unlikely situation that I'm still using anonymous labels anywhere in my code. I tried to get rid of all of them a while back; they make one's code impossible to read.

by on (#55799)
Disch wrote:
I never liked the "push your address then RTS it" crap. It always seemed absurd to me.

I don't think it's necessarily crap, but it's also not the big find that we sometimes make it out to be. It is 1 cycle slower than the indirect JMP way if the JMP uses ZP to hold the address, but it is 1 cycle faster if the JMP doesn't use ZP. Also, there are cases when we don't want to create a new variable just for a certain purpose, and we'd rather use the stack instead. But I admit that there are few advantages, if any, in using the RTS trick instead of an indirect jump.
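To make the 1-cycle difference concrete, here is a sketch of the three variants for a table indexed by Y, in ca65 syntax. The names are placeholders, and the counts assume the indexed loads don't cross a page:
Code:
.zeropage
zpPtr:  .res 2              ; hypothetical zero-page pointer
.bss
absPtr: .res 2              ; hypothetical pointer outside zero page

.code
Table:   .word Target       ; real address, for jmp ()
TableM1: .word Target-1     ; address minus 1, for rts

ViaRts:                     ; 4+3+4+3+6 = 20 cycles
  lda TableM1+1,y           ; 4
  pha                       ; 3
  lda TableM1,y             ; 4
  pha                       ; 3
  rts                       ; 6

ViaZpJmp:                   ; 4+3+4+3+5 = 19 cycles
  lda Table,y               ; 4
  sta zpPtr                 ; 3
  lda Table+1,y             ; 4
  sta zpPtr+1               ; 3
  jmp (zpPtr)               ; 5

ViaAbsJmp:                  ; 4+4+4+4+5 = 21 cycles
  lda Table,y               ; 4
  sta absPtr                ; 4
  lda Table+1,y             ; 4
  sta absPtr+1              ; 4
  jmp (absPtr)              ; 5

Target:                     ; hypothetical destination
  rts
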

by on (#55801)
If you're looking to have a substitute for the non-existent JSR ($XXXX), and you only want to use 2 bytes of RAM for the instruction, do this in your code:

Code:
....
jsr IndirectJSR
....

IndirectJSR:
jmp ($XXXX)



That takes 3 or 5 extra cycles, and saves you a lot of hassle. And wherever $XXXX points to, you can have an RTS and it will take you right back to after the "jsr IndirectJSR". I haven't tested this method; I just came up with it and I think it would work great.

EDIT: Oh, I guess Disch already kind of posted the same solution! Except I'm not sure why there's a JMP to the JSR, which is after the JMP ($XXXX). Why not just have one universal "IndirectJSR" routine that you use and never have to define again? I suppose if you're using different values for $XXXX, then yes, you'd want more than one routine, but it wastes time to needlessly stick in a JMP *+6 to skip the indirect jump that you JSR to... It seems the macro makes things easier to program, but performance goes down a little, and it takes up more space, it seems.

by on (#55804)
Celius wrote:
Except I'm not sure why there's a JMP to the JSR, which is after the JMP ($XXXX).

Disch's solution is a macro which is to be used "in place", so if you want to return to the correct location later you have to skip the indirect jump.

Quote:
Why not just have one universal "IndirectJSR" routine that you use and never have to define again? I suppose if you're using different values for $XXXX, then yes, you'd want more than one routine

Which is the solution I presented. With 3 or 4 fake "address registers" I have never had to worry about this again.

Quote:
It seems the macro makes things easier to program, but performance goes down a little, and it takes up more space, it seems

In this case, yes. Macros usually need more space, but they are supposed to be faster because there is no calling and returning. In this particular case the macro is actually a bit slower, so I really don't see a reason to use it.

For every address you call this macro with, two JMP instructions will be generated, when you could very well manually write just the indirect one somewhere else... So you are really just wasting space and time (it may not be much, but there is no advantage here that justifies the waste) IMO.

Re: low overhead indirect jsr (rts trick)?
by on (#55818)
tokumaru wrote:
I have a few temp locations in ZP that I use as scratchpad memory [for] indirect jumps to some of the temp locations

So you're writing the function pointer to a global variable in RAM. You almost hit on the one advantage of using the stack for jump table calls vs. a temporary location in allocated memory (zero page or BSS): you can use jump tables in both your main thread and your interrupt handler without them stepping on each other.

by on (#55821)
Thanks for the additional pointers. I'm starting to learn that Disch likes to frame his answer assuming the OP had some reason for the original way they structured their code: he once found a way to solve a convoluted problem I was working on where I didn't really need the solution (wraparound test, remember?), and he has done it again for me =) In both cases, it seems I should just change my approach and do it the simpler way. I was using my macro in several different places in my code where I use a specific ZP variable to hold the address to jump to. I guess what I'll change it to now is just a jsr to a location that does a jmp (through what I was passing into the macro), and I'll save an extra jmp. Thanks everyone!

For some reason, I had been really locked into that "rts trick" thing, thinking it was the only way to simulate an indirect jsr. It didn't occur to me to search for some other way of doing it, hence my original, rather convoluted macro. Is there really any value to using the rts trick?

by on (#55823)
The only place I use the rts trick, I used it instead of jmp() because it saved me 2 bytes (2 times pha instead of 2 times sta zeropage), and because it removes the need of 2 temp variables.

There is absolutely no other advantage of this over a regular jmp().

by on (#55832)
I want to make sure I have something straight here, regarding the jmp indirect bug:

jmp ($xxxx) is safe to use in this case because you are jumping to a variable, which is obviously not going to straddle a page boundary? (Unless you set it up that way...)

by on (#55836)
UncleSporky wrote:
jmp ($xxxx) is safe to use in this case because you are jumping to a variable, which is obviously not going to straddle a page boundary?

Correct. You can .align 2 before declaring the variable to be absolutely sure.
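For reference, the bug being discussed: when the operand of jmp ($xxxx) has a low byte of $FF, the 6502 fetches the high byte of the target from $xx00 in the same page instead of from the next page. A ca65 sketch of the .align approach follows. jumpTarget and CallTarget are hypothetical names, and depending on your setup the segment may also need a matching alignment in the linker configuration:
Code:
.bss
.align 2                  ; even address, so both bytes sit in the same page
jumpTarget: .res 2        ; hypothetical pointer variable

.code
CallTarget:
  jmp (jumpTarget)        ; never reads across a page boundary
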