This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

NORTH
by on (#195204)
I've been meaning to write a FORTH for the NES for quite some time, and inspired by na_th_an's thread I finally got around to it. So here it is: NORTH; NES FORTH.

It's a compiled language. The compiler takes source code and spits out CA65-compatible assembly. The compiler does very little work though. Most of the implementation is just CA65 macros and subroutines.

Because the output is CA65 assembly, it is very easy to mix with hand-written assembly code. You can reference assembly labels in NORTH code, and also reference NORTH labels in assembly code. It's easy to write games that mix and match the two languages.

Anyway, here's a quick top-level overview:
  • All values are 16 bit.
  • Values are stored on a stack kept in RAM, which can be up to 512 bytes or smaller.
  • The X register is used to index into this stack (see the sketch after this list).
  • The hardware stack at $100 is used to store return addresses.
  • 2 bytes of zeropage are needed for storing temporary variables.
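
To make the layout concrete, here is a rough sketch of what a primitive such as dup could look like on this kind of X-indexed stack. This is illustrative only: the STACK label and the downward growth direction are assumptions, not NORTH's actual code.
Code:
; sketch of a 16-bit "dup" on an X-indexed RAM stack (stack grows down)
dup:
    dex                 ; reserve two bytes for the new value
    dex
    lda STACK+2, x      ; copy the old top-of-stack low byte...
    sta STACK+0, x      ; ...into the freshly reserved slot
    lda STACK+3, x      ; ...and the high byte
    sta STACK+1, x
    rts                 ; the return address sits on the hardware stack at $100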

And here's some example NORTH code that implements basic movement. I'll attach a ROM of this to this post.
Code:
+=: [ over load + store ]
-=: [ over load minus store ]

umax: [ over over u< 'swap when drop ]
umin: [ over over u> 'swap when drop ]

smax: [ over over s< 'swap when drop ]
smin: [ over over s> 'swap when drop ]

negative?: [0 s<]
abs: [dup negative? [0 minus] when]

friction: [dup abs 2 u<= [drop 0] [dup negative? [2 +] [2 -] if] if]

movePlayer: [
    'player_x player_xspeed.load
    buttons_held.loadLo 'BUTTON_LEFT  & [4 -] when
    buttons_held.loadLo 'BUTTON_RIGHT & [4 +] when
    buttons_held.loadLo 'friction unless
    512 smin -512 smax
    player_xspeed.copy +=

    'player_y player_yspeed.load
    buttons_held.loadLo 'BUTTON_UP   & [4 -] when
    buttons_held.loadLo 'BUTTON_DOWN & [4 +] when
    buttons_held.loadLo 'friction unless
    512 smin -512 smax
    player_yspeed.copy +=
]

drawPlayerSprite: [
    player_y.load (240<<8) umin (OAM+0).storeHi
    0                           (OAM+1).storeLo
    0                           (OAM+2).storeLo
    player_x.load               (OAM+3).storeHi
]

init: [
    0 player_xspeed.store
    0 player_yspeed.store
    (128<<8) player_x.store
    (128<<8) player_y.store
]

mainLoop: [ movePlayer drawPlayerSprite ]


I'll put the code here: https://github.com/pubby/north
You'll need a Haskell compiler.

Maybe I'll write documentation if people are interested.
Re: NORTH
by on (#195282)
This is really interesting! I had never really looked at forth before, but I see that it's very well suited for the 6502.

How would you handle the fact that almost all NES games have to have two threads (the main thread and the NMI thread)? Should each thread have its own stack? How would the two threads communicate?
Re: NORTH
by on (#195300)
Except for "all in NMI" games like Super Mario Bros., the NMI thread is usually limited to a couple things: pushing queued video memory updates and running a music engine. These would typically be in assembly language and thus not need their own Forth stack.
Re: NORTH
by on (#195329)
Always glad to see a widening of the NES development toolset.
Re: NORTH
by on (#195336)
In north.inc, "rot" and "rotR" seem to have a TAY where they should have a TYA. Figured I'd point that out.

About the actual language, I like that it's actually compiled instead of having the overhead that an interpreter would bring.
Re: NORTH
by on (#195337)
tepples wrote:
Except for "all in NMI" games like Super Mario Bros., the NMI thread is usually limited to a couple things: pushing queued video memory updates and running a music engine. These would typically be in assembly language and thus not need their own Forth stack.


Yes. Of course you could write the two threads in different languages so that they won't use each other's resources. But that takes away some of the fun, doesn't it? :)

I saw this neat approach to interrupts in forth. Since it's a "real" forth environment, where words are interpreted at runtime, it can just acknowledge the interrupt and insert a word to handle the interrupt after the word currently being processed. So the two can use the same stack without interfering with each other.

Since NORTH is compiled, that approach won't be feasible, though, so I'm just curious if pubby has any tricks up his sleeve.
Re: NORTH
by on (#195338)
The Forth words @ and ! are supposed to peek and poke an integer; C@ and C! peek and poke individual bytes.

Source: Variables, Constants, and Arrays
Re: NORTH
by on (#195340)
Anders_A wrote:
I saw this neat approach to interrupts in forth. Since it's a "real" forth environment, where words are interpreted at runtime, it can just acknowledge the interrupt and insert a word to handle the interrupt after the word currently being processed. So the two can use the same stack without interfering with each other.

Thanks for the mention. I'm the author, and that was one of my first published articles (in Forth Dimensions magazine), about 23 years ago, when I was quite green at writing. I've been using that interrupt method for at least 26 years now. I revised the article for 6502.org (where you linked to) in 2003, but making updates to the things I had there was too difficult, so I started my own 6502 site in 2012, and the article is at http://wilsonminesco.com/0-overhead_Forth_interrupts/ . It will remain on 6502.org as well, but my site will always be the one with the latest updates.

The threading method there is indirect-threaded code (ITC), which is probably the slowest-running of all the major methods but has a couple of advantages otherwise. Forth is always compiled (unless you're interpreting from an input text stream), but you might be thinking of subroutine-threaded code (STC). Interestingly, Bruce Clark explains how the faster-running STC Forth avoids the expected memory penalties. He gives 9 reasons, starting in the middle of his long post in the middle of the page. STC of course eliminates the need for NEXT, nest, and unnest, thus improving speed.
Re: NORTH
by on (#195363)
NovaSquirrel wrote:
In north.inc, "rot" and "rotR" seem to have a TAY where they should have a TYA. Figured I'd point that out.

Thanks.

tepples wrote:
Except for "all in NMI" games like Super Mario Bros., the NMI thread is usually limited to a couple things: pushing queued video memory updates and running a music engine. These would typically be in assembly language and thus not need their own Forth stack.

I agree.

Anders_A wrote:
Since NORTH is compiled, that approach won't be feasible, though, so I'm just curious if pubby has any tricks up his sleeve.

Sharing the same stack is possible if the stack register (X or Y) is preserved throughout the program. The current implementation doesn't do this though. You'd probably need to change register X to Y, then implement indirect loads as self-modifying code. If you wanted to store the stack register temporarily you'd use a semaphore system.
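
A self-modifying absolute load along those lines might look roughly like this. It's only a sketch: the labels are placeholders and Y is assumed to be the stack register.
Code:
; sketch: with Y as the stack register, (zp),y is no longer free to use,
; so an indirect load can patch the operand of an absolute lda instead
    lda STACK+0, y      ; address low byte from the top of the data stack
    sta load_op+1       ; patch the operand of the lda below
    lda STACK+1, y      ; address high byte
    sta load_op+2
load_op:
    lda $0000           ; operand rewritten above; fetches the pointed-to byte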
Re: NORTH
by on (#195375)
My knowledge of FORTH is a bit rudimentary but I'm curious about terms like (OAM+1) and (128<<8). Why and how are they using an infix notation? Does () pass the enclosed text directly to the assembler without being processed by FORTH?

Is (xxx) equivalent to .word xxx in the generated bytecode or something?

If you're willing, it might be useful to see the assembly output (example.inc?) to understand an example of how things get translated, for people that don't have make and haskell (or other dependencies?) ready to go.
Re: NORTH
by on (#195382)
rainwarrior wrote:
My knowledge of FORTH is a bit rudimentary but I'm curious about terms like (OAM+1) and (128<<8).

I should probably clarify that this is not a true FORTH. Rather, it is a stack language that takes certain ideas from FORTH, certain ideas from Factor, and mixes in some convenient syntax to mesh with CA65 better. Actually, it's probably closer to Factor than to FORTH. Maybe I should have called it NACTOR.

rainwarrior wrote:
Does () pass the enclosed text directly to the assembler without being processed by FORTH?

Yes, that's it. Stuff inside parentheses is copied verbatim into the output.

rainwarrior wrote:
If you're willing, it might be useful to see the assembly output (example.inc?) to understand an example of how things get translated, for people that don't have make and haskell (or other dependencies?) ready to go.

https://pastebin.com/raw/f1WcCYfr

For whatever reason, FORTH people like to call subroutines "words" and so that's the terminology I'm using. Anyway, the compiler understands six types of expressions:

Integer Literals

Integer literals (e.g. 100, $FF, -5, %0101) are translated to:
Code:
__push value

(where value is the integer literal)
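
The macro body isn't reproduced here, but on an X-indexed 16-bit stack that grows downward, a push conceptually expands to something like the following. This is a sketch under those assumptions, not the actual __push definition.
Code:
; hypothetical stand-in for __push (STACK and the growth direction are assumed)
.macro push_sketch value
    dex
    dex
    lda #<(value)       ; low byte of the 16-bit literal
    sta STACK+0, x
    lda #>(value)       ; high byte
    sta STACK+1, x
.endmacro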

Word Literals

Word literals, which are just identifiers prefixed by an apostrophe (e.g. 'foo, 'bar, 'qux, '+, '-) are translated to:
Code:
__push wordname

(where wordname is the CA65 label of the word after name mangling)

CA65 Literals

CA65 literals, which are CA65 expressions inside parentheses (e.g. (OAM+0), (128<<8), (.lobyte(FOO))) are translated to:
Code:
__push expression

(where expression is the CA65 expression copied verbatim) Note that (foo) is equivalent to 'foo except it doesn't perform name mangling.

Words

Words (e.g. foo, bar, +, -) are translated to:
Code:
__call sub,wordname

(where wordname is the CA65 label of the word after name mangling) This gets translated into "jsr wordname", though the macro can inline certain calls.

Tail calls have the form:
Code:
__call tail,wordname

These get translated into "jmp wordname" instead.

Quotations (Lambdas)

Quotations, which are code expressions enclosed in [ ] brackets (e.g. [foo bar qux], [2 +]) are translated to:
Code:
__push __quotN

(where __quotN is a CA65 label corresponding to the quotation subroutine, which will be defined later on in the assembly output)

Address Operations

Address operations, which are labels or CA65 expressions followed by a period and an operation name (e.g. foo.store, bar.load, (OAM+0).store) get translated to:
Code:
__addrOp op,addr

(where addr is the expression to the left of the period and op is the expression to the right.)

Address operations are used to implement loads and stores inline, which would otherwise require slow indirect addressing. The operations are all defined in the macro body.
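
As a rough illustration of the cost difference (not the real __addrOp expansion): an inline load of a word at a known address can use plain absolute addressing, while a fully generic load has to pull the address off the data stack and dereference it through the zero-page temporaries. The stack layout, the growth direction, and the variable foo are assumptions here, and TMP1/TMP2 are assumed to be consecutive zero-page bytes.
Code:
; inline load of the 16-bit variable "foo" (address known at assembly time)
    dex
    dex
    lda foo             ; low byte, absolute addressing
    sta STACK+0, x
    lda foo+1           ; high byte
    sta STACK+1, x

; generic load: replace the address on top of the stack with the word it
; points to, going through a zero-page pointer
    lda STACK+0, x
    sta TMP1
    lda STACK+1, x
    sta TMP2
    ldy #0
    lda (TMP1), y       ; low byte of the pointed-to word
    pha
    iny
    lda (TMP1), y       ; high byte
    sta STACK+1, x
    pla
    sta STACK+0, x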
Re: NORTH
by on (#195384)
pubby wrote:
rainwarrior wrote:
If you're willing, it might be useful to see the assembly output (example.inc?) to understand an example of how things get translated, for people that don't have make and haskell (or other dependencies?) ready to go.

https://pastebin.com/raw/f1WcCYfr

Thanks for this. Really helps understand what's going on.



Ah, I'm realizing that these are all macros, and there is no bytecode, like I had initially assumed.

I notice that the two most common macros __push and __call are ~11 bytes each. This seems to be adding up quickly. This example.nes already appears to be using ~2k of compiled code? (Am I estimating this correctly?)

I mean, it's a speed vs size tradeoff of course, but I've usually found that size is the more precious resource on the NES, and speed can be addressed by selectively optimizing problem areas in assembly.

Anyhow, I know it's still very early, I'm not trying to criticize the design, just I was quite surprised to see that it translates directly into unrolled native code.
Re: NORTH
by on (#195390)
rainwarrior wrote:
I notice that the two most common macros __push and __call are ~11 bytes each. This seems to be adding up quickly. This example.nes already appears to be using ~2k of compiled code? (Am I estimating this correctly?)

Without inlining, __call is only 3 bytes (it's just a jsr or jmp). Maybe you were looking at .proc call (library code) by mistake?

Looking at the map file, the example NORTH code compiles to about ~1200 bytes. Disabling inlining and using zeropage stack drops it to ~600 bytes. Using a space efficient __push theoretically drops it to ~450. The library code adds a few hundred bytes on top of this.

Like you said, it's hard to know how these numbers would scale when writing a real program. FORTH can reuse code better than 6502 assembly, for example.

Oh, and there's also sonder's NES forth: Eight. It uses bytecode and was optimized for space, so maybe that would be good to compare to.
Re: NORTH
by on (#195393)
pubby wrote:
Without inlining, _call is only 3 bytes (it's just a jsr or jmp). Maybe you were looking at .proc call (library code) by mistake?

Yes, that was the mistake I made, sorry.

My ~2k estimate was that your NMI vector is at $C8BC (2236 bytes), and it seemed from the CFG/etc that everything above that address should be from "north.inc" and "example.inc". Was there something else in that block that I missed?

pubby wrote:
FORTH can reuse code better than 6502 assembly, for example.

What code reuse feature is present in FORTH that is missing in assembly?

I mean, the text of assembly code is hella verbose, but the ability to reuse the binary code is extreme... you can jsr/jmp right into the middle of a subroutine or loop, for example. Self modifying code makes a ton of interesting reuse patterns possible too.

As far as "code reuse" language features, I'd think of things like lambdas, lisp macros, polymorphism, generics/templates etc. but does FORTH have stuff like that? What power of reuse are you invoking? (This is not a rhetorical question, my knowledge of FORTH is limited and I'd be happy to be educated.)

Like if we're talking about the text size of code, there's no argument here that you can do tons more with less text, but when you get down to the size of the binary code... I really don't believe anything has the power that assembly does? I'm having trouble imagining what you meant by that.
Re: NORTH
by on (#195394)
pubby wrote:
For whatever reason, FORTH people like to call subroutines "words"

Everything in Forth is a word except the data itself. Even . , @ ! " ( etc. are words, just with very short names since they're used constantly, and there is no real punctuation, nor syntax. For example, a variable may not seem like a "word" or routine, but it is one that puts the address of the variable's data space on the data stack. I wouldn't be surprised if, on some modern, very complex processors, that takes only a single assembly-language instruction.

Anyway, all words have definitions, and the whole collection of them is called the "dictionary." A word can have different meanings in different contexts, just as the word "ball" has one meaning in the context of dance, another in the context of baseball, and another in the context of football. AND has a different meaning in the assembler context (if you've included an assembler in your Forth—and BTW, macro capability is automatic in even a very simple Forth assembler), and another meaning in the Forth context. Not surprisingly, CONTEXT, VOCABULARY, and DEFINITIONS are standard Forth words.

Everything you write extends the dictionary, and your words become every bit as much a part of the language itself as the original kernel's words. ("Language"..."dictionary"..."vocabulary"... see the theme?) You can write new functions, program-flow structures, or anything else, even new operators, and they become part of the dictionary. The compiler looks up the words in the dictionary. When it finds each word, if it finds that the word is not immediate, it lays down the addresses of the code to run it. But if it is immediate, it executes it right then, and that word can optionally take temporary control of the compiler. It makes for a system with virtually no limits.
Re: NORTH
by on (#195398)
I guess interpreted FORTH should have a form of "self modifying code", because you can redefine old dictionary entries, right? This doesn't seem to apply to this compiled FORTH implementation, though, which appears to have a static dictionary at run-time. This is an interesting language feature to me, though, at least, even if not relevant here.

I can also see how having a separate parameter stack and execution stack makes a lot of argument wrangling very simple compared to most other languages. I guess part of that is only having one "type", as well, though that seems to be as much a help as hindrance...

Writing the smallest possible code would likely involve a bytecode interpreter, again not part of this implementation, but there are other examples of this (e.g. SWEET16).

My main thought here is just that nothing is out of bounds in assembly. You can use everybody's tricks at once. You can use a separate parameter stack as a convention if you like. You can make bytecodes if it'll make things smaller. Maybe you would argue that once you have a bytecode interpreter you're not actually writing in "assembly" anymore, but I don't really want to have an argument about the definition, I'm just wondering if there's really something special in FORTH that you were making reference to that I'm not understanding.

I mean, if it was really a statement about code text size, that's fine, I'd agree, I just misunderstood the context because I thought we were talking about binary size. If it was a statement about the "FORTH mindset" producing better code, then that's some subjective thing that I probably don't care to argue about.


At any rate, I'm looking forward to seeing what you make with it. I'd really like to know what a larger well written game in FORTH looks like.
Re: NORTH
by on (#195402)
rainwarrior, I take it you are addressing multiple people here. My own last post was just giving some background to answer the implied question about why the term "words" is used in Forth. Parts of the Forth approach can be taken even in assembly language though, and you can also cut into and out of assembly at any point in an otherwise Forth program.

One of the Forth threading methods is called "token threading," where all the most commonly used words are given a one-byte token in the compilation instead of the two-byte address of the code to run for each word. Then there must be an interpreter that looks up the execution address from a table at run time. This slows it down but reduces the binary size. Assuming you allow for more than 256 words, at least one byte value must be reserved to tell this interpreter that next you're going to give an address or at least a second token byte, instead of it being a one-byte token. If you implement this, you can still add words at any time.
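
A minimal token dispatcher along those lines could look something like this on the 6502. It's only a sketch: IP, vec, and the token_lo/token_hi tables are made-up names, and it handles only the single-byte-token case.
Code:
; token-threaded inner interpreter (sketch)
; IP is a 2-byte zero-page pointer into the compiled token list
next:
    ldy #0
    lda (IP), y         ; fetch the next token byte
    inc IP              ; advance the instruction pointer
    bne no_carry
    inc IP+1
no_carry:
    tay
    lda token_lo, y     ; look up the primitive's address
    sta vec
    lda token_hi, y
    sta vec+1
    jmp (vec)           ; run it; each primitive ends with "jmp next"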

You can indeed do self-modifying code in Forth, but the more common way to redefine words is to just write your new version. Anytime there's compilation or there's interpretation of the input stream, FIND searches for the word in the current context, beginning with the most recently defined word in the context, and working backwards. That way the most recently defined version will be found first, ending the search. That one could even be defined in terms of the last previous defined version. If you decide you don't like the new one and you want to go back to the old one, you can just FORGET the new one, and the dictionary pointer is put back to point where it did right before you defined that newest one.

Parenthetical note: FIND normally runs only during compilation or when you're interpreting the text input stream, not when compiled code is running; so if you redefine a word that other already-defined word is using, that word will also have to be redefined to use the new one, or you'll need to modify it. Several approaches are possible, and none of them require re-compiling very much, which is one reason Forth's interactive development goes so fast.

If desired, you can have more stacks than just the return stack (which is in page 1 on the 6502) and the data/parameter stack (which is normally in page 0 on the 6502). The most common third stack is a floating-point stack (although it's also common to handle floating-point operations on the normal data stack if there's no separate floating-point stack). You could add a complex-number stack, even a string stack, or whatever you like.

Quote:
My main thought here is just that nothing is out of bounds in assembly. You can use everybody's tricks at once.

Absolutely. Go for it.
Re: NORTH
by on (#195403)
It was mostly a continuation of my previous post, speculating about Pubby's comment about code reuse. I knew the FORTH definition of "word" but maybe your post got me thinking about it a little more.


Why does "token threading" take the name "threading"? Does it have anything to do with threads as concurrently executed code, or does it have a different meaning in this context? What you described sounds like a normal bytecode size optimization (which I've seen in several bytecode formats)-- maybe the extreme case could be a huffman bit encoding based on frequency of use rather than working at the byte level? ;)


I am seeing a lot of things around claiming that FORTH is good for "small" systems with limited resources, though most of them seem to talk about how small the interpreter and bytecode implementation are. I can see how it has relatively good value for size of interpreter vs. code utility, at least, as a language with a very small definition. Though, since this is a static compilation project, and not using a bytecode interpreter, I'm not sure it compares the same way anymore. Being simple to implement is also a big advantage on its own, though, and it's not like we have a good optimizing 6502 compiler for any language at the moment. :P

I can also see how the language's constraints push you toward many short "word" definitions, reusing them hierarchically with the return stack, and favours using the stacks to solve most of your problems. Similar but maybe less rigidly than something like Haskell? Of course, a deep stack has a significant memory footprint here as well when you've got less than 2k to work with. I'd imagine the rigid structure is well suited to static analysis for compiler optimization too, but again there's nothing to compare against anyway.
Re: NORTH
by on (#195415)
rainwarrior wrote:
Why does "token threading" take the name "threading"?

I suppose it's like using a needle to pull a thread through a group of beads, where each "bead" is an instruction. Dr. Brad Rodriguez, who's a big name in the field of Forth, has an article on five different threading methods at http://www.bradrodriguez.com/papers/moving1.htm . The three most common are indirect-threaded code (or ITC, which I use regularly), direct-threaded code (DTC), and subroutine-threaded code (STC). STC is mostly a list of JSR's, but since it's essentially machine language, you can mix in non-JSR code as well, and select your balance of JSRs versus straightlining for speed. ITC and DTC are mostly lists of addresses, meaning they only take two bytes (instead of three) per instruction. Token threading takes only one byte for all the most commonly used instructions.
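
To give a rough picture of the difference, here is how a definition like : 2DUP OVER OVER ; might be laid out under STC versus ITC. This is an illustrative sketch, not code from any particular Forth; do_colon, over_xt, and exit_xt are placeholder names.
Code:
; subroutine-threaded (STC): the body is real machine code
two_dup:
    jsr over
    jsr over
    rts

; indirect-threaded (ITC): the body is a list of 2-byte addresses,
; walked at run time by the inner interpreter (NEXT)
two_dup_itc:
    .addr do_colon      ; code field: enter a high-level definition
    .addr over_xt
    .addr over_xt
    .addr exit_xt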

Quote:
I am seeing a lot of things around claiming that FORTH is good for "small" systems with limited resources

It is, but it has also been used for a lot of big systems, including (but not limited to) database, spacecraft, space shuttle experiments, airport facilities handling, and hospital & banking management. You can get a fairly powerful system in a small memory footprint too, although the kernel takes a certain amount of memory (probably not less than a few K for a rudimentary one) before the application goes in. A tiny application might be smaller in assembly language since it doesn't need the Forth kernel; but past a certain point, Forth's program memory savings start to pay off nicely.

Quote:
Though, since this is a static compilation project, and not using a bytecode interpreter, I'm not sure it compares the same way anymore.

You can still get a huge advantage in development time once you get your system set up. I hope I don't sound like I'm trying to make anyone do things any certain way though. I've used a few algebraic languages, and most recently as I've tried to learn C, I have been realizing that some people's brains seem to be destined to think one way more than the other (algebraic versus RPN), like it's inborn, not just a matter of background. I definitely do better in RPN.

Quote:
Being simple to implement is also a big advantage on its own

True, and probably any mid-level 6502 programmer who wants to apply himself can understand the innards of Forth, including the compiler (which is only a couple hundred bytes or less!).

Quote:
and it's not like we have a good optimizing 6502 compiler for any language at the moment. :P

So true (and I'm especially thinking of cc65).

Quote:
I can also see how the language's constraints push you toward many short "word" definitions, reusing them hierarchically with the return stack, and favours using the stacks to solve most of your problems. Similar but maybe less rigidly than something like Haskell? Of course, a deep stack has a significant memory footprint here as well when you've got less than 2k to work with. I'd imagine the rigid structure is well suited to static analysis for compiler optimization too, but again there's nothing to compare against anyway.

I've done tests to see how much stack space I was using with the most intensive Forth application I could think of, with IRQs serviced in high-level Forth, plus NMIs going too and serviced in assembly, and it was not even 20% of ZP and page 1. The only truly multitasking projects I've done were cooperative and without a multitasking OS, so they weren't hard on stacks at all. It is easy, though, to do a round-robin cooperative multitasking OS even on the 6502, by dividing the stack space into sections and assigning a section to each task. Then the number of tasks is limited by the stack space. Three would be no sweat at all. Six might be done with care. Beyond that, you'll have to do a lot of analysis, or be copying out sections of dormant tasks to higher memory to make room for active tasks.
Re: NORTH
by on (#195454)
rainwarrior wrote:
speculating about Pubby's comment about code reuse.

Admittedly I wasn't trying to say very much. It was just a personal observation that when I write assembly, my subroutines rarely nest more than 2 or 3 layers deep. With FORTH, my subroutines nest much deeper. So maybe that saves bytes, maybe it doesn't. Probably it falls into the "subjective" category.

Quote:
Similar but maybe less rigidly than something like Haskell?

I think the most enlightening statement I've heard is that stack languages model function composition by default. So "[foo bar qux baz]" can be read as the composition of four functions. You can do this in Haskell too, but it requires a decent amount of type plumbing, operator overloads, and it's not always flexible.
Re: NORTH
by on (#195455)
pubby wrote:
Admittedly I wasn't trying to say very much. It was just a personal observation that when I write assembly, my subroutines rarely nest more than 2 or 3 layers deep. With FORTH, my subroutines nest much deeper. So maybe that saves bytes, maybe it doesn't. Probably it falls into the "subjective" category.

Might it be that you're using assembly language for different and less-complex jobs? My last major assembly-language project was on a PIC16 microcontroller which only has an 8-level return stack, and toward the end I was constantly overrunning it and had to find relevant places I could straight-line things that wouldn't run me out of program memory (whose limit I was also pushing). If the PIC16 would let you store data on the return stack, I could have used it even quite a lot deeper. (The PIC16 is really mickey-mouse compared to the 6502, BTW.) I don't think my assembly nests any less deep than my Forth; but my Forth work has definitely influenced my assembly.
Re: NORTH
by on (#195459)
Garth wrote:
Might it be that you're using assembly language for different and less-complex jobs?

That's true, too.
Re: NORTH
by on (#196531)
Regarding letting interrupts use the same stack. I don't think you need to do anything fancy. The only rule is that the stack pointer needs to be accurate at all times -- you should never read from an unreserved part of the stack.

Code:
lda STACK-1, x ; not allowed (assuming stack grows down)


But it would be totally fine to read from any part of the reserved stack

Code:
lda STACK+1, x ; ok


The interrupt handler would need to have a type of ( -- ), that is, it should leave the stack undisturbed once it returns, and not read the current data on the stack. It would also need to save any temporaries the code uses and restore them once it's done.

The interrupt handler could also leave a small red zone.
Basically, it could leave a small (say 2 byte) buffer below the stack data of the normal code. That would allow normal code to use STACK-1, x and STACK-2, x, but no further. This could potentially cut down on the amount of stack pointer manipulation.

Here's the interrupt handler I'm thinking of. Assume for simplicity x is the stack pointer and STACK only holds 8 bit values.

Code:
irq:
    ; save registers
    pha
    tya
    pha

    ; reserve 2 bytes for temporaries, 2 bytes of red zone
    dex
    dex
    dex
    dex

    ; save temporaries
    lda TMP1
    sta STACK, x
    lda TMP2
    sta STACK+1, x

    ; do stuff... should preserve x by the end

    ; restore temporaries
    lda STACK+1, x
    sta TMP2
    lda STACK, x
    sta TMP1
   
    ; free stack space
    inx
    inx
    inx
    inx

    ; restore registers
    pla
    tay
    pla

    rti


Does this make sense?
Re: NORTH
by on (#196536)
That's true about the stack pointer (extra overhead notwithstanding), but there's at least one more consideration (and maybe more will come to me after I post, LOL). Forth typically has a scratchpad area called N which primitives can use however they like, as long as they are completely done with it when they finish. They cannot use it to pass data to another primitive, or to hold data there for the next time the same primitive runs. On the 6502, 8 bytes seems to be enough for N. It is put in ZP for efficiency, although its being in ZP also means you can use the extra ZP addressing modes in N if necessary. If you allow interrupt service to start without letting the primitive finish, then you'll also need to push all 8 bytes of N. You could have a separate N for interrupts, but then you would be consuming more precious ZP space, and interrupts would not be nestable, meaning you can't have a more-urgent interrupt that would be quick to service cut in on a lower-priority interrupt that may take longer to service.
Re: NORTH
by on (#196546)
That makes sense. Saving and restoring 8 bytes on the stack for every interrupt wouldn't be ideal. That's a lot of overhead. My code above assumes only 2 bytes that need to be saved.

Having a separate N also forces you to have different versions of the primitives that use N depending on whether it's an interrupt or not. That would preclude code sharing between interrupt code and normal code.
Re: NORTH
by on (#196704)
There are some trade-offs, but I wonder if the primitives could get by with just 2 bytes of fixed zero page, and a larger red zone? Fixed bytes have a 1 cycle speed advantage, but you can still use the useful (indirect, x) addressing mode in the red zone. If (indirect), y addressing is needed, then they can use the fixed bytes.

The hardest part would be to deal with cases where a primitive needs more than 1 pointer that uses indirect, y addressing, but you could probably work around it.
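
To illustrate the (indirect,x) point, a primitive could park a pointer in the red zone just below the stack top and dereference it there. This is only a sketch, and it assumes the data stack itself lives in zero page (required for (zp,x) addressing); ptr_lo and ptr_hi are placeholders.
Code:
; assumes STACK is a zero-page label and a 2-byte red zone below the top
    lda ptr_lo          ; some pointer the primitive wants to follow
    sta STACK-2, x      ; stash it in the red zone...
    lda ptr_hi
    sta STACK-1, x
    lda (STACK-2, x)    ; ...and dereference it with (indirect,x)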

The code for an interrupt handler that leaves an arbitrary red zone is mostly the same as above. It just uses this code instead of a string of inx or dex:
Code:
    ; freeing the reserved space (replaces the string of inx):
    txa
    clc
    adc #redzone
    tax
    ; to reserve it (replacing the string of dex), use sec / sbc #redzone instead