This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

Ophis 2 Assembler, "To HLL and Back" published

by on (#95944)
I've just published a modernized version of my old "Ophis" assembler: http://michaelcmartin.github.com/Ophis/

It does seem like the community has largely standardized on CA65, which is a very good assembler indeed, but there are three reasons this may be of general interest:
  • Its complexity level occupies a midpoint between CA65 and NESASM; it's as flexible in its output as CA65 but tries to bind source and binary together a little more closely, hopefully making it a little easier to get off the ground.
  • People who (like me) grew up on x86 and MIPS assembler may find the format more congenial.
  • I've gathered all my writings on exploiting high-level language features in 6502 assembler and bundled them with the documentation. These are likely to be useful to programmers of medium skill in any assembler. This includes working protocols for object-orientation-like effects as well as full recursion.
Ophis's primary distinguishing features are:
  • Full memory segmentation, with a complete divorce between the "program counter" and the file format. You may maintain as many segments as you want, and can enforce that some segments never hold initialized data. Within a segment you can arbitrarily reassign the program counter, which lets you write relocatable code without necessarily also making it position-independent.
  • A strict relationship in output order between source and binary. This is either a missing feature compared to CA65 or one less thing to worry about compared to it, depending on taste.
  • Optional support for emitting undocumented opcodes.
  • Automatic instruction collapse (zero page selection) and automatic expansion of illegal branch instructions into branch-jump pairs.
  • Full support for temporary labels within arbitrarily nested scopes as well as fully anonymous labels to make writing tight loops more convenient.
  • A unique macro system based on function-call semantics instead of the more traditional textual replacement.
  • In-source output control, which means that someone else assembling your code will usually only have to type "ophis srcname.oph" to get the right results, while still letting the user override it at the command line.
Now to actually do something fun with it. :P
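
The automatic expansion of out-of-range branches in the feature list can be illustrated with a small Python model. This is only a sketch of the general technique, not Ophis's actual code: the function name `assemble_branch` and the two-entry opcode table are made up for illustration. A real assembler would also have to repeat the check, because every expansion moves the labels that follow it.

```python
# Opcode bytes for two 6502 conditional branches and for JMP absolute.
BRANCH_OPCODES = {"bne": 0xD0, "beq": 0xF0}
INVERSE = {"bne": "beq", "beq": "bne"}
JMP_ABSOLUTE = 0x4C


def assemble_branch(mnemonic, pc, target):
    """Return the bytes for a branch located at address pc (hypothetical helper).

    A 6502 relative branch stores a signed 8-bit offset measured from the
    address of the following instruction (pc + 2).
    """
    offset = target - (pc + 2)
    if -128 <= offset <= 127:
        # In range: opcode plus two's-complement offset.
        return bytes([BRANCH_OPCODES[mnemonic], offset & 0xFF])
    # Out of range: branch on the opposite condition over a 3-byte JMP.
    inverse = BRANCH_OPCODES[INVERSE[mnemonic]]
    return bytes([inverse, 0x03, JMP_ABSOLUTE, target & 0xFF, (target >> 8) & 0xFF])


if __name__ == "__main__":
    print(assemble_branch("bne", 0x8000, 0x8010).hex())
    print(assemble_branch("bne", 0x8000, 0x8100).hex())
```

The second call falls outside the one-byte offset range, so the model emits the inverted branch followed by a jump to the original target.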

by on (#95946)
Sounds a little like ASM6's style of memory management, except ASM6 does the same kind of thing without naming any segments. In ASM6, you just reassign $: the file output position doesn't change, but the addresses do. ".org" is used as a space filler, and sets the origin address only if it hasn't already been set. By using $ = $8000, some code, and .org $C000, then another $ = $8000, you can define banks that way. No segments, just reassigning $ and padding to a finishing address.

ASM6 also has a mode where it doesn't emit any code but still advances the address; it's called enum mode.
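
To make the address-versus-file-position distinction concrete, here is a toy Python model of the behaviour described above. It is not ASM6's implementation: the `Emitter` class, its method names, and the fill byte are all assumptions made for illustration, and enum mode is reduced to a simple flag.

```python
class Emitter:
    """Toy model: the address ("$") and the output file are tracked separately."""

    def __init__(self, fill=0x00):
        self.output = bytearray()  # what would be written to the file
        self.address = None        # current "$"; None until first assigned
        self.fill = fill           # padding byte (arbitrary choice here)
        self.enum = False          # True: advance addresses, emit nothing

    def set_address(self, value):
        """Model of `$ = value`: changes addresses, never touches the file."""
        self.address = value

    def org(self, value):
        """Model of `.org value`: set the origin once, otherwise pad up to it."""
        if self.address is None:
            self.address = value
            return
        if value < self.address:
            raise ValueError("cannot pad backwards")
        if not self.enum:
            self.output.extend([self.fill] * (value - self.address))
        self.address = value

    def emit(self, data):
        """Emit bytes at the current address (or just count them in enum mode)."""
        if self.address is None:
            raise ValueError("address not set")
        if not self.enum:
            self.output.extend(data)
        self.address += len(data)


def build_two_banks():
    """Two 16 KiB banks that both assemble as if located at $8000."""
    e = Emitter()
    for _ in range(2):
        e.set_address(0x8000)
        e.emit(b"\xEA")   # one NOP standing in for "some code"
        e.org(0xC000)     # pad the rest of the bank
    return e


if __name__ == "__main__":
    banks = build_two_banks()
    print(len(banks.output), hex(banks.address))
```

In this model, reassigning the address between banks never rewinds the file, which is the property the post describes.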

But ASM6 just does things in a very simple way, so I use that assembler.

by on (#95957)
How about 65816, HuC6280, and 65c02 processors?

by on (#95976)
Hamtaro126 wrote:
How about 65816, HuC6280, and 65c02 processors?


65816 and Z80 support have never really been on the table for Ophis, since I don't have a personally compelling application for them and supporting them would take a lot of extra infrastructure that the other systems wouldn't use. It's a lot closer to "write a different program and stick it in the same executable", while these other CPUs are much closer to "tweak a few configuration options".

65c02 is fully supported but requires a command-line switch to activate (so you don't get spurious 'you've named your variables after opcodes' warnings, and because they conflict with the undocumented opcodes).

HuC6280 isn't directly supported but it looks like it wouldn't be too tough to write macros to support the bankswitching extensions. (The 14 new instructions are all Implied mode in source, so you could make macros for .byte statements.) Adding direct source support would be a bit trickier, since you'd have to special-case them as another invisible addressing mode, a la Relative - those instructions aren't *really* "Implied" mode even though they look like it.

by on (#95977)
mcmartin wrote:
65c02 is fully supported but requires a command-line switch to activate (so you don't get spurious 'you've named your variables after opcodes' warnings, and because they conflict with the undocumented opcodes).

Another assembler handles those with the .SETCPU instruction.

Quote:
HuC6280 isn't directly supported but it looks like it wouldn't be too tough to write macros

"Super Mario Bros. 3? Our console is so advanced it has SMB3 as one instruction."

Quote:
Adding direct source support would be a bit trickier, since you'd have to special-case them as another invisible addressing mode, a la Relative - those instructions aren't *really* "Implied" mode even though they look like it.

If your macro system is strong enough, you can write each CPU as a set of macros, even standard 6502.

by on (#95979)
Quote:
If your macro system is strong enough, you can write each CPU as a set of macros, even standard 6502.


Or use Tables for Opcodes.

by on (#95981)
Lots of good comments - thanks, folks. I'm going to open tickets on my project for .setcpu (actually, more generally, "anything you can set at the command line should be settable by an assembler pragma") and for the bankswitching extensions on the HuC6280 (the only bit missing from supporting that).

tepples wrote:
Another assembler handles those with the .SETCPU instruction.


That's pretty clean.

Quote:
mcmartin wrote:
HuC6280 isn't directly supported but it looks like it wouldn't be too tough to write macros

"Super Mario Bros. 3? Our console is so advanced it has SMB3 as one instruction."


Ha :lol:

SMB3 is part of the general 65c02 support, though, so Ophis does that fine - I was talking about the bankswitch instructions which seem to be the main addition to the ISA there.

Quote:
If your macro system is strong enough, you can write each CPU as a set of macros, even standard 6502.


Very true. There are enough assemblers out there at this point that you really need an ulterior motive to write one now (mine were "I need to learn Python and it's 2001 and I don't like NESASM or DASM" and, for this iteration, "My God, this code was written back before Python had True and False as things, I should probably modernize it") and some piece of it that is The Fun Part. I could see one where The Fun Part is that the assembly process is just repeated/progressive macro replacement, but Ophis's Fun Part was the instruction selector passes (automatic zero page compression and illegal branch extension). That's also a big piece of why I'm not planning 65816 support - stuff like a reassignable direct page and modal register width mean I'd have to revisit all the assumptions. (This is less of an issue for CA65 both because it has a much stronger emphasis on relocatability, and because it was designed for 816 code from the start, so "any time you name this symbol, the direct page is XXXX" is part of the basic protocol.) It would be effectively starting from scratch and then linking it into the rest.

Hamtaro126 wrote:
Or use Tables for Opcodes.


Ophis does use tables for opcodes, but it also needs to distinguish addressing modes - idiomatic 6502 code will need to assemble "lda $05", "lda $105", and "bne $105" into three different addressing modes despite them all being lines that are just "opcode, expr". Where it's something it already does - like SMB3 or BBS3 - that's really easy because all you have to do is extend the table. TMA i, though, needs to emit 2**i as its second byte, so this means you either need to extend the concept of "implied" to be "opcodes can be multiple bytes long" and make each TMAi a single opcode or you need a new kind of Immediate that does the exponentiation. It's not hard, especially since you don't need any context beyond what the opcode is and the value of the number next to it to make your choice, but it is a new logic path to add.
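
The two points above (one "opcode, expr" line shape mapping to three addressing modes, and the two ways of handling TMA i) can be sketched in Python. This is an illustrative model rather than Ophis's real tables or pass structure: the names `select_mode`, `encode`, and `encode_tma` are invented here, the table holds only the three instructions from the example, and the TMA opcode value is my understanding of the HuC6280 opcode map and should be checked against a reference before relying on it.

```python
# (mnemonic, addressing mode) -> opcode byte, for the three example lines only.
OPCODES = {
    ("lda", "zeropage"): 0xA5,
    ("lda", "absolute"): 0xAD,
    ("bne", "relative"): 0xD0,
}


def select_mode(mnemonic, value):
    """Pick an addressing mode for a line of the form 'opcode expr'."""
    if (mnemonic, "relative") in OPCODES:
        return "relative"            # branches only ever take this mode
    if value < 0x100 and (mnemonic, "zeropage") in OPCODES:
        return "zeropage"            # collapse to the shorter encoding
    return "absolute"


def encode(mnemonic, value, pc):
    """Encode one instruction located at address pc."""
    mode = select_mode(mnemonic, value)
    opcode = OPCODES[(mnemonic, mode)]
    if mode == "relative":
        offset = value - (pc + 2)    # measured from the next instruction
        if not -128 <= offset <= 127:
            raise ValueError("branch out of range")
        return bytes([opcode, offset & 0xFF])
    if mode == "zeropage":
        return bytes([opcode, value])
    return bytes([opcode, value & 0xFF, (value >> 8) & 0xFF])


# Assumed TMA opcode byte; verify against a HuC6280 opcode table.
TMA_OPCODE = 0x43

# Option 1: treat each TMAi as its own two-byte "implied" opcode.
TMA_TABLE = {f"tma{i}": bytes([TMA_OPCODE, 1 << i]) for i in range(8)}


def encode_tma(i):
    """Option 2: a special immediate that emits 2**i as the second byte."""
    if not 0 <= i <= 7:
        raise ValueError("TMA index must be 0..7")
    return bytes([TMA_OPCODE, 1 << i])


if __name__ == "__main__":
    for mnemonic, value in (("lda", 0x05), ("lda", 0x105), ("bne", 0x105)):
        print(mnemonic, hex(value), encode(mnemonic, value, 0x0100).hex())
    print(TMA_TABLE["tma3"] == encode_tma(3))
```

Both TMA options yield the same bytes; the difference is only whether the extra logic lives in the table or in a new operand kind.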

The Z80 (and x86, really) are a lot less amenable to this without pretty extensive preprocessing, too, since they like to pretend in the assembler code that there's this LD instruction even though it's still more like LDA/LDX/LDY under the hood. Still quite doable, but I'd be picking a language optimized for that kind of multiple lookup to target it.