This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

Writing my own assembler
by on (#227233)
As I recently mentioned in another thread, after years trying to adapt to many of the existing 6502 assemblers and feeling constantly frustrated due to quirks and lack of specific features, as well as to the time I have spent trying to customize them to suit my needs, I've decided to write my own assembler. It's not supposed to be the ultimate assembler to dethrone them all (far from it!), but it'll pack everything I need out of the box so I don't have to overcomplicate things with intricate macros and jerry-rigs. The goal is to write something simple (so it doesn't take forever to get done), easy to use (no need for complex configurations) and generic enough to produce binaries for any 6502 machine (no need for NES header directives, for example, that can be done with macros). If I can make it flexible enough so it's easy to add support for other CPUs, even better!

I'm modeling this mostly after ASM6, which is the assembler that most closely resembles the ideal tool for me, but I'm picking up ideas from other assemblers as well. You might be asking: "If this is so close to ASM6, then why not just create a fork of it?" Well, besides not knowing HOW exactly to do that (I'm not particularly skilled in C and I can hardly understand the design of ASM6 from just looking at the source code), I'd need to modify a few core features of the program, and that would probably be too hard or even impossible for me to do, so I might as well write the whole thing from scratch. Plus I'm a control freak and don't want to be bound by other people's design choices, if I decide to include even more stuff later on. For now I'll be using a higher-level language than C, most likely JavaScript (one of the languages I'm most proficient in) running on Node.js, at least for prototyping, in order to get something working ASAP. If everything goes well, I may or may not port it to something more efficient at a later time.

The reason I'm making this thread is not to advertise my assembler, create hype (as if!), take requests, or anything like that. I'm actually a little insecure about my design ideas and ways to implement them, seeing as I've never written a program like this before, so I'd like to discuss some of these ideas with you guys first, to make sure I'm not missing anything important and making bad decisions left and right. If the end result is something people will be interested in using, I'll be more than happy to share the program, but like I said, my ultimate goal is to be able to write 6502 programs in a way that *I* am comfortable with.

First, I'd like to talk about the things I plan to carry over from ASM6, which are the following:

- LINEAR ASSEMBLY: I never cared much for outputting to different segments and linking a bunch of separate modules to put a ROM together; to me it makes more sense to simply fill the ROM linearly. Even when using ca65 I never felt the need to use segments or link individually assembled modules; I just used what was necessary to write my code as linearly as possible.

- MULTIPLE PASSES: I value this a lot because it allows for complex symbol resolving, which in turn allows for more dynamic memory arrangements, such as overlays, right-aligned sections and relocatable code, without specific features to handle those cases. All in all, the ability to freely use symbols/labels in expressions and directives greatly improves the versatility of the assembler.

- SIMPLE MACRO SYSTEM: Macros are meant for consolidating repetitive assembly code, not for extending the functionality of the assembler. Labels inside macros are all local. Recursion is not allowed.

Now here are a few things I'm also basing off of ASM6 but I plan on changing, taking other assemblers and my own use cases into account:

- OVERLAYS: ENUM is still the primary way to declare variables, but in order to facilitate the creation of overlays, ENUMs can optionally be named. This allows for an ENUM to pick up from where another ENUM left off. If two or more ENUMs pick up from the end of the same ENUM, they're effectively overlays. ASM6's ENUM can already kinda be used like that, you just have to define a label at the very end of the ENUM, and use that label as the starting address of another ENUM to keep going from where the first one stopped. The problem with that is that all these extra labels will get exported to label files along with the ones that are actually relevant for debugging. Maybe the solution is not to change ENUM, but to take a cue from ca65 and implement two forms of assignment for symbols, one that marks the symbol as a label (:=) and one that doesn't (=).

- ANONYMOUS LABELS: Anonymous labels are unidirectional in ASM6, and I have had to awkwardly put a + label and a - label on consecutive lines because I needed to access that point both from before and after it. For this reason, I feel like ca65's way is superior - the label is just a colon, and the direction is specified in the reference instead. Matching the number of colons to the number of + and - symbols is a bit error-prone though, especially if you need to add or remove an anonymous label between others that were already there, requiring you to double check and adjust all the nearby references, but I can live with that, especially considering that you're not supposed to abuse anonymous labels in the first place.

- LOCAL LABELS: Local labels start with "@", but having their scope delimited by non-local labels is too restrictive IMO. I constantly write subroutines with multiple entry points, which are obviously defined via global labels, and having those global labels break the scope of the local labels that should be visible in the entire subroutine is a major annoyance. To fix this, scopes now must be explicitly started with the SCOPE directive. Unlike in ca65, scopes are not blocks, you can only end a scope by starting a new one, meaning it's not possible to have nested scopes. Scopes can be named, which allows their local labels to be accessed from the outside (e.g. SomeScope.@LocalVariable).

- REPEATED LABELS: Several NES mappers require reset stubs and common subroutines to be repeated across multiple banks, and this is a problem when assembling a program because labels can't be repeated. One way to work around that in ASM6 is to define labels by assigning the PC to a symbol (e.g. MyLabel = $), since symbols can be reassigned without errors. However, I created a problem when I introduced the SCOPE directive, because now I have to get around repeated scope names too. The only solution I could think of was to ignore repeated scope names, and create a nameless scope whenever a repeated name is supplied. This will cause any duplicates to be essentially invisible to the rest of the program.
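
The SCOPE rules described in the LOCAL LABELS and REPEATED LABELS items can be sketched in JavaScript, the language planned for the prototype. This is only a minimal illustration under stated assumptions, not the actual implementation: the names ScopeTable, startScope, define and lookup are hypothetical, and the sketch ignores multiple passes.

```javascript
// Hypothetical sketch of flat (non-nested) scopes for "@" local labels.
class ScopeTable {
  constructor() {
    this.globals = new Map(); // global label -> address
    this.named = new Map();   // scope name -> Map of that scope's local labels
    this.current = new Map(); // local labels of the scope that is open now
  }

  // SCOPE [name]: the only way to end a scope is to start a new one.
  // A repeated name silently produces a nameless scope instead.
  startScope(name) {
    this.current = new Map();
    if (name && !this.named.has(name)) {
      this.named.set(name, this.current);
    }
  }

  // Labels starting with "@" go into the current scope, all others are
  // global, so a global entry point never ends the scope it sits in.
  define(label, address) {
    const table = label.startsWith("@") ? this.current : this.globals;
    table.set(label, address);
  }

  // Accepts "Global", "@Local" or "ScopeName.@Local".
  lookup(ref) {
    const dot = ref.indexOf(".@");
    if (dot > 0) {
      const scope = this.named.get(ref.slice(0, dot));
      return scope ? scope.get(ref.slice(dot + 1)) : undefined;
    }
    return (ref.startsWith("@") ? this.current : this.globals).get(ref);
  }
}

const scopes = new ScopeTable();
scopes.startScope("UpdatePlayer");
scopes.define("UpdatePlayer", 0x8000);
scopes.define("@Loop", 0x8004);
scopes.define("UpdatePlayerAlt", 0x8010); // second entry point, same scope
scopes.define("@Done", 0x8020);
const loopSeenInside = scopes.lookup("@Loop");

scopes.startScope("UpdatePlayer");        // duplicate name -> nameless scope
scopes.define("@Loop", 0xc004);
```

In this sketch the copy in the second bank can still use its own @Loop, while UpdatePlayer.@Loop keeps pointing at the first definition, which matches the "duplicates are invisible to the rest of the program" behaviour described above.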

And finally, a few things that ASM6 doesn't have at all:

- ZP ADDRESSING OVERRIDING: ASM6 uses ZP addressing whenever possible, but when you're writing timed code, you may want to access ZP locations using absolute addressing. Maybe an address size modifier like in ca65 (a:address) is the answer.

- MEMORY BANKS: Keeping track of what's where when dealing with bank switching is a big annoyance, and doing that manually is just too error-prone. My solution is to simply create a BANK directive that you can use to set the current bank number, causing any subsequent labels to be assigned that bank number. This information can then be extracted from the labels whenever necessary. The same numbers can be used over and over, since you may need to index PRG-ROM banks, CHR-ROM banks, RAM banks, and so on.

- FUNCTIONS: As far as I can tell, ASM6 doesn't have any built-in functions, and doesn't offer any means for users to create their own. User defined functions are probably outside of the scope of my simple assembler, but a few built-in functions could be really useful. There will certainly be a function to extract bank numbers from labels, and maybe a function to test whether macro parameters are blank or not.

- TEXT OUTPUT: ASM6 can only output error messages, but other kinds of messages are also very important. The OUT directive can output text without aborting the assembly process. Since this is a multi-pass assembler, text output must be buffered during each pass, and only text generated during the last pass must be displayed.

- CHARACTER MAPPING: Mapping characters to specific indices is important because we usually have few tiles to dedicate to text so we can't afford to be slaves of the ASCII encoding. The idea here is to use a directive to define an index, and then the character to put at that index. The reason to supply the parameters in this order is that after the index, you can supply multiple characters and strings, and the index will auto-increment to accommodate as many characters as necessary (e.g. CHARMAP $00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", $0D).
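
The auto-incrementing CHARMAP idea can be sketched in JavaScript as follows. This is a rough illustration only: the function names charmap and encode are hypothetical, and treating a bare number such as $0D as a single character code is an assumption about how the directive would read its arguments.

```javascript
// Hypothetical sketch of a CHARMAP directive with an auto-incrementing index.
// Arguments after the start index may be strings or numeric character codes.
function charmap(table, startIndex, ...items) {
  let index = startIndex;
  for (const item of items) {
    // Assumption: a number stands for one character code (e.g. $0D).
    const chars = typeof item === "number" ? [String.fromCharCode(item)] : [...item];
    for (const ch of chars) {
      table.set(ch, index++);
    }
  }
  return index; // next free index
}

// Translate a string through the table; unmapped characters are an error.
function encode(table, text) {
  return [...text].map((ch) => {
    if (!table.has(ch)) throw new Error(`Unmapped character: ${JSON.stringify(ch)}`);
    return table.get(ch);
  });
}

const charTable = new Map();
// CHARMAP $00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", $0D
const nextFree = charmap(charTable, 0x00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", 0x0d);
const encoded = encode(charTable, "HI!");
```

Returning the next free index makes it easy for a later CHARMAP line to continue where the previous one stopped.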

These are all my ideas so far. If anyone was brave enough to read through all of this, please share your opinions on what I have so far. Am I missing something important? Am I doing something in a dumb way? Please comment.
Re: Writing my own assembler
by on (#227238)
I don't know much about JS, but isn't Node about servers and remote communication, etc.? I would think you would want to keep this neat JS such that you can just run it in a command line or a browser page locally and not have to install a bunch of other stuff to get it to work. Maybe Node adds something really useful for processing data on a local machine, but it would be best to avoid needing node/docker or some other overly convoluted web thing.

Having bidirectional anonymous labels seems overkill, and more work than it is worth. They are for small loops and simple things. If you have a case where you want to go forward and backwards to the same location, use a local label; that is what they are for. It will be much more readable, and far less error-prone than trying to balance the forwards and backwards. It's just pointlessly cryptic.

For what you want, N passes will be needed. In Tass I generally hit about 7 passes. You will need an internal intermediate form so that you can work out how big something needs to be by assembling it, working out what is a 16-bit address, what is an 8-bit address, etc., and then doing the maths to pull it back from the end of the bank.

ZP ADDRESSING OVERRIDING typically one uses the ~ character.
LDA $00 -> ZP
LDA ~$00 -> Abs

CHARACTER MAPPING: having multichar literals is also really handy, for example {copyright} or {heart}.

It seems what people want is a static analyzer, this way you can add directives for 'doesn't trash' allowing you to make sure that the code you jsr doesn't trash some shared variable. Building a list of things code modifies is pretty easy for an assembler as you have to resolve the labels anyway.
Re: Writing my own assembler
by on (#227240)
Oziphantom wrote:
I don't know much about JS, but isn't Node about servers and remote communication, etc.? I would think you would want to keep this neat JS such that you can just run it in a command line or a browser page locally and not have to install a bunch of other stuff to get it to work. Maybe Node adds something really useful for processing data on a local machine, but it would be best to avoid needing node/docker or some other overly convoluted web thing.

I'm only using Node to run the .js file locally as a command line application (e.g. node.exe assembler.js source.asm game.nes), it's not much different from using Python, PHP, or any other scripting language. Also, Node has a file system library that really helps with reading/writing files, one of JavaScript's weaknesses.

Quote:
Having bidirectional anonymous labels seems overkill, and more work than it is worth.

I think that the anonymous labels in ca65 are in fact easier to implement than those in ASM6, not that there's a huge difference though. Since the complexity is equivalent, I'd rather go with the method I find more useful.
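
To make the implementation cost concrete, here is one possible way to resolve ca65-style references such as :-, :-- and :+ in JavaScript. It is a sketch under assumptions, not how ca65 or the planned assembler actually does it: resolveAnonymous and its parameters are hypothetical names, and the list of label addresses is assumed to come from a previous pass so that forward references can be resolved.

```javascript
// Hypothetical sketch: resolving ca65-style anonymous label references.
// `anonAddresses` holds the address of every ":" label in source order.
// `definedSoFar` is how many ":" labels appear before the referencing line.
function resolveAnonymous(anonAddresses, definedSoFar, ref) {
  const match = /^:(\++|-+)$/.exec(ref);
  if (!match) throw new Error(`Not an anonymous reference: ${ref}`);
  const count = match[1].length;
  // ":-" is the nearest label behind, ":+" the nearest one ahead;
  // every extra sign skips one more label in that direction.
  const index = match[1][0] === "+"
    ? definedSoFar + count - 1
    : definedSoFar - count;
  if (index < 0 || index >= anonAddresses.length) {
    throw new Error(`No anonymous label for ${ref}`);
  }
  return anonAddresses[index];
}

// Three ":" labels at $8000, $8010 and $8020. The reference sits after the
// second label, so two labels have been defined so far.
const anonAddresses = [0x8000, 0x8010, 0x8020];
const back1 = resolveAnonymous(anonAddresses, 2, ":-");
const back2 = resolveAnonymous(anonAddresses, 2, ":--");
const ahead = resolveAnonymous(anonAddresses, 2, ":+");
```

Because the label itself carries no direction, a single counter and one array are enough, which is roughly why this style is no harder to implement than separate + and - labels.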

Quote:
They are for small loops and simple things. If you have a case where you want to go forward and backwards to the same location, use a local label; that is what they are for.

It may not be very common, but even in tiny tight loops there are cases when I need to jump to the middle from both the top and the bottom, and I don't feel like naming a label in such a little piece of logic.

Quote:
For what you want, N passes will be needed. In Tass I generally hit about 7 passes. You will need an internal intermediate form so that you can work out how big something needs to be by assembling it, working out what is a 16-bit address, what is an 8-bit address, etc., and then doing the maths to pull it back from the end of the bank.

Exactly. Changing addresses sizes is the most complicated part IMO.
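
A toy JavaScript model shows why address sizes force extra passes. It is a deliberately simplified sketch, not a real assembler: assembleSizes and the statement format are hypothetical, every instruction is assumed to take one address operand, and an operand is assumed to shrink to zero page size whenever its symbol is known to be below $0100.

```javascript
// Hypothetical sketch of a multi-pass loop. Each statement is either a label
// definition ({ label }) or an instruction referring to a symbol ({ operand }).
function assembleSizes(statements, origin, maxPasses = 10) {
  let symbols = new Map(); // symbol values from the previous pass
  for (let pass = 1; pass <= maxPasses; pass++) {
    const next = new Map();
    let pc = origin;
    for (const st of statements) {
      if (st.label) {
        next.set(st.label, pc);
      } else {
        // Prefer a value defined earlier in this pass, else last pass's value.
        const value = next.has(st.operand) ? next.get(st.operand) : symbols.get(st.operand);
        // Unknown symbols are assumed to need a 16-bit address for now.
        const zeroPage = value !== undefined && value < 0x100;
        pc += zeroPage ? 2 : 3; // opcode + 1 or 2 operand bytes
      }
    }
    // Stop once a pass reproduces exactly the same symbol values.
    const stable = next.size === symbols.size &&
      [...next].every(([name, addr]) => symbols.get(name) === addr);
    symbols = next;
    if (stable) return { symbols, passes: pass };
  }
  throw new Error("Symbol values did not settle");
}

// Contrived case: three forward references to a label that ends up in zero
// page, so shrinking the instructions moves the label itself.
const sizing = assembleSizes(
  [{ operand: "Target" }, { operand: "Target" }, { operand: "Target" }, { label: "Target" }],
  0x00f4
);
```

Here the first pass guesses absolute addressing and places Target at $00FD, the second pass shrinks the instructions and moves it to $00FA, and a third pass is needed just to confirm that nothing changed.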

Quote:
ZP ADDRESSING OVERRIDING typically one uses the ~ character.
LDA $00 -> ZP
LDA ~$00 -> Abs

Really? I've never seen that...

Quote:
CHARACTER MAPPING: having multichar literals is also really handy, for example {copyright} or {heart}.

Hum... that's interesting.

Quote:
It seems what people want is a static analyzer, this way you can add directives for 'doesn't trash' allowing you to make sure that the code you jsr doesn't trash some shared variable. Building a list of things code modifies is pretty easy for an assembler as you have to resolve the labels anyway.

I like this kind of automation. Have you seen this implemented anywhere?

Thanks for the tips.
Re: Writing my own assembler
by on (#227241)
@Oziphantom

Node.js is often the backend for Electron-based apps, so using it doesn't mean that you run a server per se.

@Tokumaru

I took the time to read but didn't answer right away (I'm half asleep at the wheel today ^^;;) but for now the only comment is just a personal one regarding anonymous labels: I never use them because they are prone to errors and mask the intention of the code. Code that looks fine to you may make no sense to another person, and in 6 months you won't remember it either. But this is your assembler and you add what you like, so that part is up to you.

As for Bank, I guess it's just a concept with no actual size? With cc65 segments you know how big they are, and the map file tells you how much is used, which is useful when looking at what is used inside that segment/bank, etc.

As for local labels, I don't think you should have to write the @ when writing the name of the scope too. In ca65 you just write scope::localLabelName and it works fine. The @ seems superfluous and should only be used when defining the label, so the parser knows that it's local.

I may have more comments later, when I'm less sleepy :D
Re: Writing my own assembler
by on (#227242)
When it comes to local labels, I'm personally partial to the style used in NASM and FASM for x86. Local labels look like .foo, but can also be referred to as bar.foo globally, where bar is the first global label before .foo. So for instance, a subroutine with two entry points could look something like this:
Code:
entry1:
    ...
    ...
    bne entry2.loop
    ...
entry2:
    ...
.loop:
    ...
    ...
    beq .loop
    rts

Of course, whether it's too annoying to have to write entry2.loop or not is up to you.
Re: Writing my own assembler
by on (#227243)
tokumaru wrote:
Oziphantom wrote:
I don't know much about JS, but isn't Node about servers and remote communication, etc.? I would think you would want to keep this neat JS such that you can just run it in a command line or a browser page locally and not have to install a bunch of other stuff to get it to work. Maybe Node adds something really useful for processing data on a local machine, but it would be best to avoid needing node/docker or some other overly convoluted web thing.

I'm only using Node to run the .js file locally as a command line application (e.g. node.exe assembler.js source.asm game.nes), it's not much different from using Python, PHP, or any other scripting language. Also, Node has a file system library that really helps with reading/writing files, one of JavaScript's weaknesses.

cscript.exe will run JS as well, without most people having to install anything, is all.

tokumaru wrote:
Quote:
Having bidirectional anonymous labels seems overkill, and more work than it is worth.

I think that the anonymous labels in ca65 are in fact easier to implement than those in ASM6, not that there's a huge difference though. Since the complexity is equivalent, I'd rather go with the method I find more useful.

Quote:
They are for small loops and simple things. If you have a case where you want to go forward and backwards to the same location, use a local label; that is what they are for.

It may not be very common, but even in tiny tight loops there are cases when I need to jump to the middle from both the top and the bottom, and I don't feel like naming a label in such a little piece of logic.

yeah but _l or in your case @l is just as much to type basically ;)

tokumaru wrote:
Quote:
For what you want, N passes will be needed. In Tass I generally hit about 7 passes. You will need an internal intermediate form so that you can work out how big something needs to be by assembling it, working out what is a 16-bit address, what is an 8-bit address, etc., and then doing the maths to pull it back from the end of the bank.

Exactly. Changing addresses sizes is the most complicated part IMO.

Quote:
ZP ADDRESSING OVERRIDING typically one uses the ~ character.
LDA $00 -> ZP
LDA ~$00 -> Abs

Really? I've never seen that...

NES assemblers seem to have been written in vacuums by people who haven't used a traditional assembler. .w or @w are the other common methods, .w being a "They programmed for the 68K" identifier ;)

Quote:
Quote:
CHARACTER MAPPING: having multichar literals is also really handy, for example {copyright} or {heart}.

Hum... that's interesting.

Quote:
It seems what people want is a static analyzer, this way you can add directives for 'doesn't trash' allowing you to make sure that the code you jsr doesn't trash some shared variable. Building a list of things code modifies is pretty easy for an assembler as you have to resolve the labels anyway.

I like this kind of automation. Have you seen this implemented anywhere?

Thanks for the tips.

I've implemented it as a post-process static analyser. The hardest part was me reversing all of the TASS64 output to get back to the original code, building a call tree and once armed with this info, it was trivial to check. Then I started to get fancy with my coding and it just broke my parser. I've just spent a good week or so getting my new debugging format working so I have source code that shows the whole function as I step and it shows me the 'local' variables. I'm going to add this "analyzer" code to it again, it was kind of handy.
Re: Writing my own assembler
by on (#227244)
Banshaku wrote:
for now the only comment is just a personal one regarding anonymous labels: I never use them because they are prone to errors and mask the intention of the code. Code that looks fine to you may make no sense to another person, and in 6 months you won't remember it either. But this is your assembler and you add what you like, so that part is up to you.

I use anonymous labels very responsibly... I don't want to litter my code with labels for obvious things like zero checks, loop points and the like, so that's what I use anonymous labels for. I have little blocks like this:

Code:
  ;clear page RAM $03
  ldx #$00
  txa
: sta $0300
  inx
  bne :-

This is such a small piece of logic that the intent is beyond obvious, so there's no point in littering the place with dumb labels like "@Loop", "@Skip" and the like. I'll hardly jump more than 5 lines to an anonymous label, it's all very compact and with a clear comment at the top explaining what the whole block of code below is for.

Quote:
As for Bank, I guess it's just a concept with no actual size?

Yeah, it's just a number that gets attached to labels so I can easily know what bank to map in to access something. For example, I can do this for CHR banks:

Code:
  .bank $20
  .org $0000
PlayerRunning:
  .incbin "player-running.chr"

  .bank $21
  .org $0000
PlayerJumping:
  .incbin "player-jumping.chr"

And then I can just use .bank(PlayerRunning) whenever I need to reference the bank that contains the player's running graphics (and PlayerRunning >> 4 to get the offset to the first tile), and I'm free to rearrange the tiles around and move them to different banks without the fear of breaking any references.
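
On the assembler side, this bank bookkeeping could be as small as the following JavaScript sketch. It is an illustration under assumptions only: the class BankedSymbols and its methods are hypothetical, the $200-byte size given for player-running.chr is made up for the example, and PlayerIdle is an extra hypothetical label added to show the tile offset calculation.

```javascript
// Hypothetical sketch: labels remember the bank that was active when they
// were defined, and a BANK()-style function reads it back.
class BankedSymbols {
  constructor() {
    this.bank = 0;
    this.pc = 0;
    this.labels = new Map(); // name -> { address, bank }
  }
  setBank(number) { this.bank = number; }     // .bank
  setOrigin(address) { this.pc = address; }   // .org
  emit(byteCount) { this.pc += byteCount; }   // e.g. an .incbin of that size
  label(name) {
    this.labels.set(name, { address: this.pc, bank: this.bank });
  }
  entry(name) {
    const found = this.labels.get(name);
    if (!found) throw new Error(`Unknown label: ${name}`);
    return found;
  }
  bankOf(name) { return this.entry(name).bank; }       // .bank(name)
  addressOf(name) { return this.entry(name).address; }
}

const chr = new BankedSymbols();
chr.setBank(0x20);
chr.setOrigin(0x0000);
chr.label("PlayerRunning");
chr.emit(0x200);            // assumed size of player-running.chr (32 tiles)
chr.label("PlayerIdle");    // hypothetical second label in the same bank
chr.setBank(0x21);
chr.setOrigin(0x0000);
chr.label("PlayerJumping");

// Tiles are 16 bytes each, so address >> 4 is the index of the first tile.
const idleFirstTile = chr.addressOf("PlayerIdle") >> 4;
```

Since the bank is just a number stored next to the address, nothing here needs to know how large a bank is, which is the flexibility argued for in this post.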

Quote:
With cc65 segments you know how big they are, and the map file tells you how much is used, which is useful when looking at what is used inside that segment/bank, etc.

I did consider making the BANK directive more complex, where you could define the size of the bank in addition to its number, but in the end I figured that would kill some of the flexibility that I like so much. You don't need to set every size and every address in advance, if you just do your .ORGs and .BASEs right, everything will work just fine. With a multi-pass assembler you can use symbol math for almost anything, even calculating the amount of free space in each bank.

Quote:
As for local labels, I don't think you should have to write the @ when writing the name of the scope too. In ca65 you just write scope::localLabelName and it works fine. The @ seems superfluous and should only be used when defining the label, so the parser knows that it's local.

I don't know, I kinda consider the "@" as part of the name. It's true that in ca65 you access local labels like scope::localLabel, but in ca65, ALL labels are local inside a scope, not only those beginning with "@". I don't know if you can even access a cheap local label from the outside in ca65, but if you can, I bet the "@" is needed. But anyway, scopes in my assembler will be much simpler than those in ca65.

Quote:
I may have more comments later, when I'm less sleepy :D

Great!
Re: Writing my own assembler
by on (#227245)
Before I say anything else, I'm going to quickly plug flat assembler. I'm pretty sure there's more than one assembler by that name, so please check the link! It's a self-hosted x86 assembler, but the second version, fasmg, is a generic macro assembler - you implement your own instruction sets with macros. There's already at least one 6502 implementation on the forums.

I'm mentioning it because it's extremely well-designed. The code is a wall of commentless x86 assembly code, but the author has posted a description of its internal workings, which is well-worth the read if you're writing your own assembler. I'd link it directly but I'm kind of pressed for time right now.

A few highlights relevant to tokumaru's post:

MULTIPLE PASSES

Fasm performs multiple passes of everything but macro expansion. Conditionals, for loops, while loops, error messages and user-specified debug output all "just work" as though the values were right to begin with. This is useful for much more than symbol resolving.

MACROS

tokumaru wrote:
Macros are meant for consolidating repetitive assembly code, not for extending the functionality of the assembler.

I strongly disagree with this. Macros can automate a lot of tedium, not just repetitive code. Implementing a good macro system can save you having to implement a whole slew of other features at all.

Fasm's macros are fairly simple: they take a comma-separated list of arguments. The last one may optionally be variable-length. The directives "common", "forward" and "reverse" cause the section afterwards to be emitted once, once for each variable argument, and once for each variable argument in reverse, respectively. So you can do this:
Code:
macro setall value, [dest]
{
common
lda value
forward
sta dest
}

setall #0, a, b, c, d

There is one more directive, "local", which makes all instances of the listed names local to each expansion of the macro. Fasm lets you redefine constants (at the cost of disabling forward references to that constant), and macros are often used to accumulate values. The results can then be assigned to a single-use label, allowing forward references. This can do the same thing as an "enum", without requiring any extra code in the assembler.

OVERLAYS

Fasm generalizes "enum" functionality into "virtual" blocks (which is a much better name IMHO). These take a starting address and turn off the output for the duration, without affecting any labels or constants generated. The "named enums" you describe can be implemented with ordinary labels inside virtual blocks. Multiple passes take care of the rest.

LOCAL LABELS

Fasm starts local labels with a dot. Local labels are automatically visible outside their scope, as a suffix of the last global label. Clearly not what you want, but I find it convenient, so I mention the idea here for completion.

REPEATED LABELS

Fasm allows you to load data from any address at assembly time with the "load" operator. You can use this with a loop to copy code into as many places as you like, wrap it in a macro to automate it, put the first copy inside a virtual section so it's never emitted, XOR encode the copies etcetera, all without any explicit support from the assembler itself.

ZP ADDRESSING OVERRIDING

Fasm lets you override the type of a variable, the length of a jump and so on by simply prefixing it with the new type - it would look something like lda zeropage $03 or lda absolute $03 for 6502 code. When the size is not specified fasm defaults to the smallest instruction length.

FUNCTIONS

Macros and redefinable labels/variables can do anything functions can do, except recursion.

TEXT OUTPUT

Fasm has a "display" directive to output arbitrary text. It runs at assembly time and thus prints (or doesn't print) according to the result of the final pass.
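
For comparison, the explicit buffering that tokumaru describes for an OUT directive could look like this in JavaScript. This is a sketch only, not fasm's mechanism and not the planned implementation: MessageBuffer and its methods are hypothetical names, and the message text is invented for the example.

```javascript
// Hypothetical sketch: OUT messages are collected per pass, and only the
// messages produced by the final pass are printed.
class MessageBuffer {
  constructor() { this.lines = []; }
  beginPass() { this.lines = []; }     // discard text from earlier passes
  out(text) { this.lines.push(text); } // OUT directive
  flush(write = console.log) {         // call once, after the last pass
    for (const line of this.lines) write(line);
  }
}

const messages = new MessageBuffer();

// Pass 1: a forward-referenced symbol still has a placeholder value.
messages.beginPass();
messages.out("Free bytes in bank 0: ???");

// Pass 2: the value is now known, and only this text survives.
messages.beginPass();
messages.out("Free bytes in bank 0: 512");

const printed = [];
messages.flush((line) => printed.push(line));
```

Passing the writer as a parameter keeps the sketch easy to check without capturing the console.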


I have more to say (and I've been ninja'd) but I'm about to lose power again. Sorry in advance if I messed up.
Re: Writing my own assembler
by on (#227246)
tokumaru wrote:
Banshaku wrote:
for now the only comment is just a personal one regarding anonymous labels: I never use them because they are prone to errors and mask the intention of the code. Code that looks fine to you may make no sense to another person, and in 6 months you won't remember it either. But this is your assembler and you add what you like, so that part is up to you.

I use anonymous labels very responsibly... I don't want to litter my code with labels for obvious things like zero checks, loop points and the like, so that's what I use anonymous labels for. I have little blocks like this:

Code:
  ;clear page RAM $03
  ldx #$00
  txa
: sta $0300
  inx
  bne :-

This is such a small piece of logic that the intent is beyond obvious, so there's no point in littering the place with dumb labels like "@Loop", "@Skip" and the like. I'll hardly jump more than 5 lines to an anonymous label, it's all very compact and with a clear comment at the top explaining what the whole block of code below is for.

We are not talking about that case; yes, in that case you use - and there are no issues there. We are talking about the
Code:
  ldx #$00
- txa
  and #40
  bne +
  sta $0300,x
+
-
  inx
  sta $0300,x
  dex
  bpl -
  bmi --
case you were referring to. To which my point was do this
Code:
  ldx #$00
- txa
  and #40
  bne @l
  sta $0300,x
@l
  inx
  sta $0300,x
  dex
  bpl @l
  bmi -
Re: Writing my own assembler
by on (#227247)
Nicole wrote:
When it comes to local labels, I'm personally partial to the style used in NASM and FASM for x86. Local labels look like .foo, but can also be referred to as bar.foo globally, where bar is the first global label before .foo.

That's actually where I got the idea from, but since I absolutely need to be able to define global labels from a local scope, I went with the explicit creation of scopes via a dedicated directive, rather than implicitly creating scopes with each global label.

I don't see anything wrong with the "." notation per se, but since I'm used to having leading dots in assembler directives, that might make the code look confusing and harder to parse... I'm really used to dots meaning directives and @s meaning local labels.

Quote:
So for instance, a subroutine with two entry points could look something like this:
Code:
entry1:
    ...
    ...
    bne entry2.loop
    ...
entry2:
    ...
.loop:
    ...
    ...
    beq .loop
    rts

I guess it's doable, but it's weird. It's not a matter of being annoying to type, it's just that this is supposed to be a single block. Semantically, it doesn't make much sense.
Re: Writing my own assembler
by on (#227249)
Oziphantom wrote:
cscript.exe will run JS as well, without most people having to install anything, is all.

It's also about 128 times slower, it seems. I've used cscript.exe in the past, but was less than impressed with its file system library (I remember having to use hacks in order to work with binary files!) and its performance. I also think it's badly outdated. Plus it's Windows only. Node.js on the other hand has the newest JavaScript features, tons of libraries, and I can even run it on my phone and develop anywhere. On Windows it's just a 15Mb download, and you don't even have to install anything, just decompress the .zip and use it. In today's world, where everything has to be installed and configured after GBs of downloads, and every piece of software thinks it owns your entire PC, I consider that a win!

Quote:
To which my point was do this
Code:
  ldx #$00
- txa
  and #40
  bne @l
  sta $0300,x
@l
  inx
  sta $0300,x
  dex
  bpl @l
  bmi -

Yeah, but that break is ugly, it makes a single task look like 2 tasks, it affects readability for me. I'd much rather do this:

Code:
  ;do something
  ldx #$00
: txa
  and #40
  bne :+
  sta $0300,x
: inx
  sta $0300,x
  dex
  bpl :-
  bmi :--

That's just a matter of personal preference.
Re: Writing my own assembler
by on (#227250)
Rahsennor wrote:
Before I say anything else, I'm going to quickly plug flat assembler.

Thanks for the link, I'll check it out. Seeing what other assemblers do really helps.

Quote:
tokumaru wrote:
Macros are meant for consolidating repetitive assembly code, not for extending the functionality of the assembler.

I strongly disagree with this. Macros can automate a lot of tedium, not just repetitive code. Implementing a good macro system can save you having to implement a whole slew of other features at all.

I agree with you in general, what I said was just describing how macros will be in my assembler, very bare-bones. While I agree that macros can be incredibly useful for a number of purposes, I don't have the time to make a complex assembler. I experimented a lot with ca65 macros, and have done a lot with them, but nearly all of that work was to implement the features I'm describing here, and since they'll be built-in this time around, I don't need a complex macro system right now.

Quote:
Fasm's macros are fairly simple: they take a comma-separated list of arguments. The last one may optionally be variable-length. The directives "common", "forward" and "reverse" cause the section afterwards to be emitted once, once for each variable argument, and once for each variable argument in reverse, respectively.

Sounds interesting, but not very intuitive to read!

Quote:
OVERLAYS

Fasm generalizes "enum" functionality into "virtual" blocks (which is a much better name IMHO). These take a starting address and turn off the output for the duration, without affecting any labels or constants generated. The "named enums" you describe can be implemented with ordinary labels inside virtual blocks. Multiple passes take care of the rest.

Sounds just like ENUM! I was not a big fan of the name "enum", but I guess it makes sense, since it's effectively just incrementing a counter after each symbol.

Quote:
LOCAL LABELS

Fasm starts local labels with a dot. Local labels are automatically visible outside their scope, as a suffix of the last global label. Clearly not what you want, but I find it convenient, so I mention the idea here for completion.

I kinda like this system, but for multiple entry points, I prefer having explicit control over when scopes start and end, rather than let the global labels define that.

Quote:
REPEATED LABELS

Fasm allows you to load data from any address at assembly time with the "load" operator. You can use this with a loop to copy code into as many places as you like, wrap it in a macro to automate it, put the first copy inside a virtual section so it's never emitted, XOR encode the copies etcetera, all without any explicit support from the assembler itself.

This is really cool. Being able to replicate the binary data is a pretty interesting idea.

Quote:
ZP ADDRESSING OVERRIDING

Fasm lets you override the type of a variable, the length of a jump and so on by simply prefixing it with the new type - it would look something like lda zeropage $03 or lda absolute $03 for 6502 code. When the size is not specified fasm defaults to the smallest instruction length.

That's a good approach, similar to how ca65 does it.

Quote:
FUNCTIONS

Macros and redefinable labels/variables can do anything functions can do, except recursion.

But certain things are way more convenient when done inline, like lda #BANK(SomeLabel).

Quote:
Sorry in advance if I messed up.

That's fine. Thanks for bringing in new ideas.
Re: Writing my own assembler
by on (#227251)
Got a few more minutes.

tokumaru wrote:
I don't have the time to make a complex assembler.

That's exactly my motivation for wanting good (!= complex) macros in my assembler. Along with the multipass stage, they can cover many of the features I'd otherwise have to hardcode.

tokumaru wrote:
Sounds interesting, but not very intuitive to read!

You get used to it, and it looks better with proper formatting. But yes, even Tomasz has expressed regret at the syntax and changed it in fasmg. I don't remember how the new version works though, and I didn't have time to write a proper example or copypaste tabs.

tokumaru wrote:
But certain things are way more convenient when done inline, like lda #BANK(SomeLabel).

Oh, I see what you mean now. I'd just build banks into the assembler and not worry about general-purpose functions. Fasm has no functions, only operators, and a few platform-specific features, like allowing label addresses to be relative to a register (very useful for the stack). Banks would fall into that category, I would think.
Re: Writing my own assembler
by on (#227252)
tokumaru wrote:
Quote:
CHARACTER MAPPING having multichar literals is also really handy {copyright} for example {heart}

Hum... that's interesting.

That's possible in rgbasm from RGBDS, an assembler targeting the Game Boy CPU.

Oziphantom wrote:
tokumaru wrote:
I'm only using Node to run the .js file locally as a command line application

cscript.exe will run JS as well without most people having to install something is all.

Last I checked, cscript.exe was exclusive to Microsoft Windows. I don't run Windows on my primary dev machine; nor does calima. Are there tips for writing a script to make it work on both cscript.exe (for users of Windows) and Node.js (for users of GNU/Linux and macOS)?

Oziphantom wrote:
tokumaru wrote:
Quote:
ZP ADDRESSING OVERRIDING typically one uses the ~ character.
LDA $00 -> ZP
LDA ~$00 -> Abs

Really? I've never seen that...

NES assemblers seem to have been written in vacuums by people who haven't used a traditional assembler. .w or @w are the other common methods, .w being a "They programmed for the 68K" identifier ;)

ca65 has lda $00 for zero page, lda a:$00 for absolute, and lda f:$00 for 65816-exclusive absolute long. In my opinion, 68000's .w is completely different because it specifies data size, whereas $00 vs. a:$00 vs. f:$00 is about address size. And I'd recommend against ~$00 notation because ~ is already in use to mean one's complement.
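
A rough JavaScript sketch of how an assembler might pick the operand width from that kind of notation. The function name `operandWidth` is an assumption for illustration, only `$hex` literals are handled, and only the `a:` and `f:` prefixes mentioned above are recognized. This is not ca65's actual implementation.

```javascript
// Sketch: pick an operand width in bytes from a ca65-style operand.
// Only "$hex" literals are handled, to keep the example short.
function operandWidth(operand) {
  const match = /^(?:([af]):)?\$([0-9a-fA-F]+)$/.exec(operand);
  if (!match) throw new Error(`Unsupported operand: ${operand}`);
  const [, prefix, hex] = match;
  const value = parseInt(hex, 16);
  const forced = { a: 2, f: 3 }[prefix];            // explicit address size, if any
  const natural = value <= 0xff ? 1 : value <= 0xffff ? 2 : 3;
  if (forced !== undefined) {
    if (natural > forced) throw new Error(`${operand} does not fit in ${forced} bytes`);
    return forced;
  }
  return natural;                                   // default: smallest encoding
}
```

With this sketch, `operandWidth("$00")` gives 1 (zero page), `operandWidth("a:$00")` gives 2 (absolute), and `operandWidth("f:$00")` gives 3 (absolute long).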
Re: Writing my own assembler
by on (#227254)
You're never going to finish a NES game if you spend all your time making tools :P

With that said, I'd like it if all labels used the anonymous label +/- syntax. For example, if you have multiple labels with the same name, you can use +/- to distinguish them:

Code:
jmp foo:+
foo:
jmp foo:++
foo:
foo:
jmp foo:---


In other words, have labels and anonymous labels behave the same. Don't special case either.
Re: Writing my own assembler
by on (#227255)
pubby wrote:
You're never going to finish a NES game if you spend all your time making tools :P

With that said, I'd like it if all labels used the anonymous label +/- syntax. For example, if you have multiple labels with the same name, you can use +/- to distinguish them:

Code:
jmp foo:+
foo:
jmp foo:++
foo:
foo:
jmp foo:---


In other words, have labels and anonymous labels behave the same. Don't special case either.


That seems like it makes the code very hard to follow. I don't even fully understand which jmp goes to which foo in your example :)
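
For what it's worth, one plausible reading of the example can be sketched in JavaScript. The semantics are an assumption on my part: `foo:+` means the next `foo:` after the reference, `foo:++` the one after that, and `-` counts backward the same way. `resolveRelative` is a hypothetical helper that works on line indices rather than addresses.

```javascript
// Assumed semantics: "name:+" is the next definition of name after the
// reference, "name:++" the one after that; "-" counts backward the same way.
function resolveRelative(lines, refIndex, name, dir) {
  const step = dir[0] === "+" ? 1 : -1;
  let remaining = dir.length;
  for (let i = refIndex + step; i >= 0 && i < lines.length; i += step) {
    if (lines[i].trim() === `${name}:` && --remaining === 0) return i;
  }
  throw new Error(`No matching ${name}:${dir} from line ${refIndex}`);
}

const source = ["jmp foo:+", "foo:", "jmp foo:++", "foo:", "foo:", "jmp foo:---"];
// Pair each jmp's line index with the line index of the label it reaches.
const targets = source.flatMap((line, i) => {
  const m = /^jmp (\w+):([+-]+)$/.exec(line);
  return m ? [[i, resolveRelative(source, i, m[1], m[2])]] : [];
});
```

Under that reading, the first jmp reaches the first foo, the second jmp reaches the third foo, and the last jmp goes back to the first foo.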
Re: Writing my own assembler
by on (#227256)
Quote:
The problem with that is that all these extra labels will get exported to label files along with the ones that are actually relevant for debugging. Maybe the solution is not to change ENUM, but to take a cue from ca65 and implement two forms of assignment for symbols, one that marks the symbol as a label (:=) and one that doesn't (=).


I had a similar problem and ended up forking asm6 and adding RAM/ENDRAM (as well as WRAM/ENDWRAM/SRAM/ENDSRAM, since Mesen seems to differentiate), which behaves just like ENUM/ENDE does. That way, anything defined with EQU, =, or in an ENUM is not exported to the label file.

Years ago I attempted my own assembler, and I took hints on the macro system from (I think) nesasm, where you could do something like this:

Code:
MACRO addvtop
   lda pos@0, X
   clc
   adc vel@0, X
   sta @1
ENDM

...and then in code...

   addvtop _x, t0
   addvtop _y, t1

...which would resolve to...

   lda pos_x, X
   clc
   adc vel_x, X
   sta t0

   lda pos_y, X
   clc
   adc vel_y, X
   sta t1


It did limit your arguments to 10 (@0 ... @9), but as far as I've tried, asm6 cannot handle something like this.
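
This kind of positional substitution is plain text replacement, so it is cheap to sketch in JavaScript. `expandMacro` is a hypothetical helper, not how nesasm or asm6 work internally. Note that in an assembler that also uses `@` for local labels, only `@` followed by a digit is treated as an argument here.

```javascript
// Sketch: expand a macro body by replacing @0..@9 with the call's arguments.
function expandMacro(bodyLines, args) {
  return bodyLines.map((line) =>
    line.replace(/@(\d)/g, (whole, digit) => {
      const value = args[Number(digit)];
      if (value === undefined) throw new Error(`Missing macro argument ${whole}`);
      return value;
    })
  );
}

const addvtop = ["lda pos@0, X", "clc", "adc vel@0, X", "sta @1"];
const expanded = expandMacro(addvtop, ["_x", "t0"]);
// expanded: ["lda pos_x, X", "clc", "adc vel_x, X", "sta t0"]
```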

edit:

The other thing you might consider supporting is reading in a rom file first, overwriting it with the code your assembler produces, and then writing that to the outfile. This is useful for rom hacking, and the only reason I still use my assembler from time to time.
Re: Writing my own assembler
by on (#227257)
tokumaru wrote:
- CHARACTER MAPPING: Mapping characters to specific indices is important because we usually have few tiles to dedicate to text so we can't afford to be slaves of the ASCII encoding. The idea here is to use a directive to define an index, and then the character to put at that index. The reason to supply the parameters in this order is that after the index, you can supply multiple characters and strings, and the index will auto-increment to accommodate as many characters as necessary (e.g. CHARMAP $00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", $0D).


I _love_ this. I always waste time on a project building a python script to handle converting text to whatever character mapping my game is using. Having that built-in to the assembler sounds great.
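
A minimal JavaScript sketch of how such a directive could behave, assuming that numeric arguments after the index are character codes (so `$0D` maps the carriage return to the next free index). `charmap` and `encode` are hypothetical names.

```javascript
// Sketch of the proposed CHARMAP directive: the first argument is the starting
// index; each later argument is a string or a numeric character code, and the
// index auto-increments per character.
function charmap(table, startIndex, ...items) {
  let index = startIndex;
  for (const item of items) {
    const chars = typeof item === "number" ? [String.fromCharCode(item)] : [...item];
    for (const ch of chars) table.set(ch, index++);
  }
  return table;
}

// Convert text to tile indices, failing loudly on characters with no mapping.
function encode(table, text) {
  return [...text].map((ch) => {
    if (!table.has(ch)) throw new Error(`Unmapped character: ${JSON.stringify(ch)}`);
    return table.get(ch);
  });
}

const table = charmap(new Map(), 0x00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", 0x0d);
const bytes = encode(table, "HI!");   // H=7, I=8, "!"=28
```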
Re: Writing my own assembler
by on (#227259)
gauauu wrote:
I always waste time on a project building a python script to handle converting text to whatever character mapping my game is using.

For simpler projects, I understand how preprocessing text with a script in a scripting language might feel wasteful. But for bigger projects, it's anything but.

In theory, an assembler could use this sort of mapping from multiple characters to one character to support UTF-8 input, where the multiple code units (that is, bytes) that represent a character get translated to a single code unit. It could also apply a dictionary, where commonly encountered groups of letters get translated into shorter groups. But it can't very easily calculate an appropriate dictionary given only a (suitably long) text. I have a preprocessor written in Python to do that; my NES and GB ports of 240p Test Suite and the next versions of Thwaite and the Action 53 menu all use a byte pair encoding (BPE)/digram tree encoding (DTE) engine that I originally wrote for my port of robotfindskitten.
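
To give an idea of what calculating such a dictionary involves, here is a small JavaScript sketch of byte pair encoding. It is not the engine mentioned above, just a generic illustration with hypothetical function names: the most frequent adjacent pair of codes is repeatedly replaced by a fresh code starting at `firstFreeCode`. Overlapping pairs are counted naively, which is good enough for a sketch.

```javascript
// Find the adjacent pair of codes that occurs most often.
function mostFrequentPair(codes) {
  const counts = new Map();
  let best = null;
  for (let i = 0; i + 1 < codes.length; i++) {
    const key = `${codes[i]},${codes[i + 1]}`;
    const n = (counts.get(key) || 0) + 1;
    counts.set(key, n);
    if (!best || n > best.count) best = { pair: [codes[i], codes[i + 1]], count: n };
  }
  return best;
}

// Replace the most frequent pair with a new code, up to maxEntries times.
function bpeCompress(codes, firstFreeCode, maxEntries) {
  const dictionary = [];
  let data = codes.slice();
  for (let entry = 0; entry < maxEntries; entry++) {
    const best = mostFrequentPair(data);
    if (!best || best.count < 2) break;            // nothing worth replacing
    const [a, b] = best.pair;
    const merged = [];
    for (let i = 0; i < data.length; i++) {
      if (data[i] === a && data[i + 1] === b) { merged.push(firstFreeCode + entry); i++; }
      else merged.push(data[i]);
    }
    dictionary.push(best.pair);
    data = merged;
  }
  return { data, dictionary };
}

// Undo the compression by expanding dictionary codes recursively.
function bpeExpand(data, dictionary, firstFreeCode) {
  const expandCode = (c) =>
    c >= firstFreeCode ? dictionary[c - firstFreeCode].flatMap((x) => expandCode(x)) : [c];
  return data.flatMap((c) => expandCode(c));
}
```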

A preprocessor also allows non-programmers, such as the translator you hired to prepare versions in other languages, to edit the text without breaking invariants that your program expects. And you'd need a pretty rich macro system to handle line breaking, pagination, stage directions for NPCs, hyperlinks for your dialogue tree, and other things that tend to get interleaved into your text. One "meant for consolidating repetitive assembly code, not for extending the functionality of the assembler" can't handle it alone.
Re: Writing my own assembler
by on (#227261)
FWIW - The people in my office that work in JS have switched our projects to using TypeScript. From what I understand it allows you to be more explicit about your intent when writing code, which enables much better compile time type checking. I have never done any real projects in JS, but every time I use it I think that it must be a nightmare to keep things clean on a large project, and I think TypeScript helps with that.

If you're taking feature requests.... :D
Something I've been struggling with as I learn assembly is that I haven't found a "nice" way to do if/else; it seems like it just turns into a mess of label soup. Maybe this could be smoothed over with some syntactic sugar?
Re: Writing my own assembler
by on (#227264)
pubby wrote:
You're never going to finish a NES game if you spend all your time making tools :P

True.

Quote:
In other words, have labels and anonymous labels behave the same. Don't special case either.

I kinda like this idea! For anyone who thinks this is confusing, just don't use the feature.

never-obsolete wrote:
I had a similar problem and ended up forking asm6 and adding RAM/ENDRAM (as well as WRAM/ENDWRAM/SRAM/ENDSRAM

That feels way too platform-specific to me, I'm trying to keep things as generic as possible.

Quote:
Years ago I attempted my own assembler, and I took hints on the macro system from (I think) nesasm

You know, NESASM gets a bad rap around these parts, but it actually has some very interesting features that are often overlooked. Too bad it has some quirks that make it unusable to me.

Quote:
The other thing you might consider supporting is reading in a rom file first, overwriting it with the code your assembler produces, and then writing that to the outfile. This is useful for rom hacking, and the only reason I still use my assembler from time to time.

This is an interesting feature, and not hard to implement at all. Maybe an INCBIN that doesn't update the PC is all it takes.
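
A small JavaScript sketch of the idea, assuming the assembler keeps its output as a list of chunks with file offsets. `applyPatches` and the `patches` shape are hypothetical, not an existing API.

```javascript
// Sketch: start from the original ROM image and overwrite only the regions
// the assembler produced. `patches` is a hypothetical list of assembled chunks.
function applyPatches(original, patches) {
  const output = Uint8Array.from(original);        // copy; the input is left untouched
  for (const { offset, bytes } of patches) {
    if (offset < 0 || offset + bytes.length > output.length) {
      throw new RangeError(`Patch at ${offset} does not fit in the ROM image`);
    }
    output.set(bytes, offset);
  }
  return output;
}

const rom = new Uint8Array(16).fill(0xff);
const patched = applyPatches(rom, [{ offset: 4, bytes: [0xa9, 0x00, 0x60] }]); // lda #$00 / rts
```

In Node.js, `fs.readFileSync` returns a Buffer, which is a Uint8Array, so the original ROM could be passed in directly and the result written out with `fs.writeFileSync`.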

samophlange wrote:
If you're taking feature requests.... :D

Unless they're really simple to implement or seem really useful to me, no, I'm not! :lol:

Quote:
Something I've been struggling with as I learn assembly is that I haven't found a "nice" way to do if/else; it seems like it just turns into a mess of label soup. Maybe this could be smoothed over with some syntactic sugar?

One of the reasons I like assembly so much is that it isn't bound to the constructs of high-level languages. You can jump anywhere you want, take shortcuts, bypass instructions by making them look like operands, all sorts of convenient little things to get that extra power from these limited pieces of hardware we write code for. So for me particularly, simulating high-level constructs isn't appealing at all. I'm pretty sure this can be done with macros in most assemblers, though.
Re: Writing my own assembler
by on (#227270)
tokumaru wrote:
That feels way too platform-specific to me, I'm trying to keep things as generic as possible.


I would advise writing a quick and dumb assembler first, even a platform-specific one, for your own purposes, without trying to be platform independent. A simple multi-pass assembler without complex macros (like recursion) can be written relatively easily without a complex parser/tokenizer. Because you can do it fast, you can finish it, test it, and see how it works for you. A simple single-pass assembler can be done in a week or less, including research.

Only after that, start from scratch with platform abstractions and more advanced features. Otherwise you may end up perfecting the tool forever and never finishing it. A prototype lets you try your ideas quick and dirty, instead of spending a week on a feature you will later find useless or flawed.
Re: Writing my own assembler
by on (#227293)
tepples wrote:
Oziphantom wrote:
tokumaru wrote:
I'm only using Node to run the .js file locally as a command line application

cscript.exe will run JS as well without most people having to install something is all.

Last I checked, cscript.exe was exclusive to Microsoft Windows. I don't run Windows on my primary dev machine; nor does calima. Are there tips for writing a script to make it work on both cscript.exe (for users of Windows) and Node.js (for users of GNU/Linux and macOS)?
Seeing as people use JS for so much, I had assumed it had evolved enough to be practical on its own, but I see it's still mostly inadequate, and you need CS extensions, Node, or any of a bunch of others to get stuff done. I was thinking one could just write plain JS, run it with cscript on Windows, and use Node or some other JS engine on Linux/Mac if need be. Seems you can't.


never-obsolete wrote:
The other thing you might consider supporting is reading in a rom file first, overwriting it with the code your assembler produces, and then writing that to the outfile. This is useful for rom hacking, and the only reason I still use my assembler from time to time.

Basically any assembler can do this, TASS64 for example does it.
Code:
*=$0801
.binary "original.bin"

*=$1000
<patch code goes here>

*=$1800
.byte $2c ; skip command here

samophlange wrote:
Something I've been struggling with as I learn assembly is that I haven't found a "nice" way to do if/else; it seems like it just turns into a mess of label soup. Maybe this could be smoothed over with some syntactic sugar?
You might want to try the https://github.com/Museum-of-Art-and-Di ... nt/macross assembler; it's the assembler for people who don't like assembly. You can't really do IF/ELSE in an assembler, as that is a structured construct: at some point it has to know where to jump to in order to skip the "else", which is not something assemblers really do; it needs a label. However, I would think that if one tried, you could get it to work with macros in TASS64, but maybe not with nesting....
Re: Writing my own assembler
by on (#227297)
gauauu wrote:
tokumaru wrote:
- CHARACTER MAPPING: Mapping characters to specific indices is important because we usually have few tiles to dedicate to text so we can't afford to be slaves of the ASCII encoding. The idea here is to use a directive to define an index, and then the character to put at that index. The reason to supply the parameters in this order is that after the index, you can supply multiple characters and strings, and the index will auto-increment to accommodate as many characters as necessary (e.g. CHARMAP $00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", $0D).


I _love_ this. I always waste time on a project building a python script to handle converting text to whatever character mapping my game is using. Having that built-in to the assembler sounds great.

This is hardly anything new, I use this feature in WLA-DX and other assemblers probably already have it.
Re: Writing my own assembler
by on (#227298)
ca65 and cc65 support remapping also, although not in the condensed format above. See the .charmap pseudo-op and #pragma charmap
Re: Writing my own assembler
by on (#227299)
Bregalad wrote:
gauauu wrote:
tokumaru wrote:
- CHARACTER MAPPING: Mapping characters to specific indices is important because we usually have few tiles to dedicate to text so we can't afford to be slaves of the ASCII encoding. The idea here is to use a directive to define an index, and then the character to put at that index. The reason to supply the parameters in this order is that after the index, you can supply multiple characters and strings, and the index will auto-increment to accommodate as many characters as necessary (e.g. CHARMAP $00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", $0D).


I _love_ this. I always waste time on a project building a python script to handle converting text to whatever character mapping my game is using. Having that built-in to the assembler sounds great.

This is hardly anything new, I use this feature in WLA-DX and other assemblers probably already have it.

The problem with WLA-DX is that it only lets you have one definition... subpar.
Re: Writing my own assembler
by on (#227303)
pubby wrote:
You're never going to finish a NES game if you spend all your time making tools :P

I'll drink to that. :beer:
Now back to my tools...
Re: Writing my own assembler
by on (#227304)
The Wikipedia article on lexical analysis seemed helpful when I started writing an assembler.

Also, it would be nice if the assembler supported the bit shift operators in expressions. (Ophis seems to lack them, perhaps because the characters "<" and ">" have other uses.)

Edit: removed a feature request
Re: Writing my own assembler
by on (#227308)
qalle wrote:
The Wikipedia article on lexical analysis seemed helpful when I started writing an assembler.

Back in the days when I rolled my own assembler for NESICIDE I used Lex/Yacc. Looking back on it, it was so much fun I might try to replicate the experience with ANTLR. I see ANTLR already has a contributed 6502 grammar file.
Re: Writing my own assembler
by on (#227311)
cpow wrote:
Back in the days when I rolled my own assembler for NESICIDE I used Lex/Yacc. Looking back on it, it was so much fun I might try to replicate the experience with ANTLR. I see ANTLR already has a contributed 6502 grammar file.


Writing a new assembler will require a new grammar, but this looks like a good starting template.

I still think it's worth hacking around and writing an assembler without a parser, just because it lets you prototype fast, see results, and scrap ideas that don't work. Like this dumb one-pass assembler I wrote in 2 hours yesterday. It is limited, with no forward labels, but it works and can compile hello world (http://wiki.nesdev.com/w/index.php/Prog ... 22_program).

But it is also ~200 lines of code and easy to manage and change. Adding/removing a feature from the project is a more tedious task.
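
For comparison, here is a tiny JavaScript sketch of what a second pass buys you. It is not the assembler mentioned above: the three-instruction `OPCODES` table and the helper names are assumptions made to keep the example short. Pass 1 only assigns addresses to labels; pass 2 emits bytes, so a forward reference like `JMP Skip` resolves.

```javascript
// Tiny instruction subset: immediate LDA, absolute JMP, and RTS.
const OPCODES = {
  "LDA #": { opcode: 0xa9, size: 2 },
  "JMP":   { opcode: 0x4c, size: 3 },
  "RTS":   { opcode: 0x60, size: 1 },
};

function parseLine(line) {
  const text = line.trim();
  if (text.endsWith(":")) return { label: text.slice(0, -1) };
  const [mnemonic, operand = ""] = text.split(/\s+/);
  const key = operand.startsWith("#") ? `${mnemonic} #` : mnemonic;
  if (!(key in OPCODES)) throw new Error(`Unsupported instruction: ${text}`);
  return { ...OPCODES[key], operand: operand.replace(/^#/, "") };
}

function assemble(lines, origin) {
  const parsed = lines.filter((l) => l.trim()).map(parseLine);
  const labels = new Map();
  let pc = origin;
  for (const item of parsed) {                 // pass 1: label addresses
    if (item.label !== undefined) labels.set(item.label, pc);
    else pc += item.size;
  }
  const out = [];
  for (const item of parsed) {                 // pass 2: emit bytes
    if (item.label !== undefined) continue;
    out.push(item.opcode);
    if (item.size === 1) continue;
    const value = item.operand.startsWith("$")
      ? parseInt(item.operand.slice(1), 16)
      : labels.get(item.operand);
    if (value === undefined) throw new Error(`Undefined label: ${item.operand}`);
    out.push(value & 0xff);                    // low byte first (little endian)
    if (item.size === 3) out.push((value >> 8) & 0xff);
  }
  return out;
}

const program = assemble(["JMP Skip", "LDA #$01", "Skip:", "RTS"], 0x8000);
// program: [0x4c, 0x05, 0x80, 0xa9, 0x01, 0x60]
```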
Re: Writing my own assembler
by on (#227314)
yaros wrote:
I still think it's worth hacking around and writing an assembler without a parser.

I'll never understand the logic of "I can write a better parser from scratch than any of the parser generators available to me." Like any tool, a parser generator allows you to focus on the meat and potatoes, not the plate.
Re: Writing my own assembler
by on (#227315)
cpow wrote:
I'll never understand the logic of "I can write a better parser from scratch than any of the parser generators available to me." Like any tool, a parser generator allows you to focus on the meat and potatoes, not the plate.


Please, I never said one should write a parser from scratch. I said it is possible to prototype without a parser, because assembly is a simple language. And without a concrete plan, it's worth having something working within a day to test the approach.

A proper compiler should be properly parsed with a defined grammar, as I said in the previous post.

yaros wrote:
I would advise writing a quick and dumb assembler first, even a platform-specific one, for your own purposes.
...
Only after that, start from scratch with platform abstractions and more advanced features. Otherwise you may end up perfecting the tool forever and never finishing it.
Re: Writing my own assembler
by on (#227319)
I don't plan on using any libraries right now besides the ones I'm required to in order to do basic tasks like interacting with the file system. I am taking a lot of shortcuts in this first moment though... For example, I'm not parsing expressions manually, I'm using JavaScript's eval() for this (don't judge me, this is for my personal use!), after using regular expressions to convert expressions like $8004 + <MyLabel into 0x8004 + executeFunction("<", getSymbol("MyLabel")). The functions "executeFunction" and "getSymbol" will be responsible for returning the correct values or throwing errors when appropriate.
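
A sketch of that translation step in JavaScript. The regular expression, the symbol table contents, and the meaning of `<` and `>` (low and high byte) are assumptions for illustration; only the names `executeFunction` and `getSymbol` come from the description above.

```javascript
// Hypothetical symbol table for the example.
const symbols = { MyLabel: 0xc123 };

function getSymbol(name) {
  if (!(name in symbols)) throw new Error(`Undefined symbol: ${name}`);
  return symbols[name];
}

function executeFunction(op, value) {
  if (op === "<") return value & 0xff;          // low byte
  if (op === ">") return (value >> 8) & 0xff;   // high byte
  throw new Error(`Unknown operator: ${op}`);
}

// Rewrite assembler syntax into a JavaScript expression in a single pass:
// "$hex" becomes "0xhex", and names become getSymbol() calls, optionally
// wrapped in executeFunction() when prefixed by "<" or ">".
function translate(expression) {
  return expression.replace(
    /\$([0-9a-fA-F]+)|([<>])?([A-Za-z_]\w*)/g,
    (whole, hex, op, name) => {
      if (hex !== undefined) return `0x${hex}`;
      const symbol = `getSymbol(${JSON.stringify(name)})`;
      return op ? `executeFunction(${JSON.stringify(op)}, ${symbol})` : symbol;
    }
  );
}

function evaluate(expression) {
  return eval(translate(expression));           // fine for a personal tool only
}

const js = translate("$8004 + <MyLabel");
const value = evaluate("$8004 + <MyLabel");     // 0x8004 + 0x23
```

A limitation of this sketch: `<` and `>` directly in front of a name are always treated as byte operators, so comparisons or shifts written without spaces would be misread.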
Re: Writing my own assembler
by on (#227320)
cpow wrote:
I'll never understand the logic of "I can write a better parser from scratch than any of the parser generators available to me." Like any tool, a parser generator allows you to focus on the meat and potatoes, not the plate.

Generators are cool and all, but I don't think many languages use them nowadays. The big players tend towards hand-written recursive descent, as it's so much more flexible and provides better error handling. Plus they're easy to write.
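
To show how little code that takes for assembler expressions, here is a JavaScript sketch of a recursive descent evaluator. The grammar is an assumption chosen for the example (no byte operators, no shifts), and characters the tokenizer does not recognize are simply skipped, which a real tool would want to reject.

```javascript
// Grammar assumed for this example:
//   expr   := term (("+" | "-") term)*
//   term   := factor (("*" | "/") factor)*
//   factor := NUMBER | "$" HEX | NAME | "(" expr ")" | "-" factor
function evaluate(text, symbols = {}) {
  const tokens = text.match(/\$[0-9a-fA-F]+|\d+|[A-Za-z_]\w*|[-+*/()]/g) || [];
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  function factor() {
    const tok = next();
    if (tok === undefined) throw new SyntaxError("Unexpected end of expression");
    if (tok === "-") return -factor();
    if (tok === "(") {
      const value = expr();
      if (next() !== ")") throw new SyntaxError("Expected ')'");
      return value;
    }
    if (tok[0] === "$") return parseInt(tok.slice(1), 16);
    if (/^\d/.test(tok)) return parseInt(tok, 10);
    if (/^[A-Za-z_]/.test(tok)) {
      if (!(tok in symbols)) throw new ReferenceError(`Undefined symbol: ${tok}`);
      return symbols[tok];
    }
    throw new SyntaxError(`Unexpected token: ${tok}`);
  }

  function term() {
    let value = factor();
    while (peek() === "*" || peek() === "/") {
      // Division truncates toward zero, since addresses are integers.
      value = next() === "*" ? value * factor() : Math.trunc(value / factor());
    }
    return value;
  }

  function expr() {
    let value = term();
    while (peek() === "+" || peek() === "-") {
      value = next() === "+" ? value + term() : value - term();
    }
    return value;
  }

  const result = expr();
  if (pos < tokens.length) throw new SyntaxError(`Unexpected token: ${peek()}`);
  return result;
}
```

Each grammar rule is one function, and every error is raised at the exact token where parsing went wrong, which is where the better error handling comes from.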
Re: Writing my own assembler
by on (#227327)
When I wrote my CompressTool utility which is basically an assembler without opcodes (only .db statements are supported) (*), I programmed everything from scratch. It would have been more complex to learn to use a parser correctly than to write my own, for the simple things I needed to do.

(*) That assembler also has the utility that you can compress data on the fly which is quite useful, especially for pointers/references within data. It also supports changing the character mapping, which is a rather easy feature to support, although supporting UTF-8 -> 8-bit character mapping is a bit more tricky.
Re: Writing my own assembler
by on (#227332)
Bregalad wrote:
It would have been more complex to learn to use a parser correctly than to write my own, for the simple things I needed to do.

I'll stop dropping my opinions on random threads. :beer:
Re: Writing my own assembler
by on (#227335)
Opinions are valuable, even if everyone disagrees with them. :wink:

I agree that the amount of parsing in an assembler is fairly minor, and for those of us who don't already know how to integrate an existing parser, it may actually be faster to code our own. It doesn't have to be "better" than what's available, it just has to do the work you need done.
Re: Writing my own assembler
by on (#227392)
Darn, this thread is just making me itch to take a stab at making my assembler as well. I just might give it a go. :D

I've had some ideas floating around for a long time, so I figure I might share them here if you are interested. I'm not sure if all of these ideas are realistic, I haven't tried to implement them myself anywhere yet.

1. By far my most common label is a @Return: label in front of a nearby RTS statement. Maybe it's my style of coding, but I find that subroutines always have some branching conditions that exit early. So I realized it would be pretty nifty if I could just write a branch jump like this:

Code:
LDA MyVar
BEQ RTS
STA MyVar2


And the assembler would just treat a nearby RTS statement as an on-the-fly label destination (or throw an error if there are none within range), so you wouldn't have to put a @Return: label there. It's not really any new kind of functionality, just a sort of "auto-label" thing to make the code less verbose.

2. Sometimes as a programmer you can do things that the assembler can actually see is stupid. Like putting code in a bank, that uses a label from another bank (which is on the same page). Or if you have an absolute instruction with an Int instead of a label, which I'm guessing in 99.99% of cases is just the programmer forgetting a # symbol before the Int. It would be neat if the assembler could tell you about such mistakes.

3. Labels do a LOT of different jobs in asm code. They act as entry-points for subroutines. They act as starting offsets for data tables. And they act as holders of constant gameplay values. Sometimes I wish there was a way to mark a label as to what kind of job it does, and have the assembler throw an error at me if I'm trying to use it in a different way. So you can't JSR GHOST_ID since the label holds a constant value and not an address to a subroutine, and you can't LDA GHOST_INIT since the label holds an address to a subroutine. (Obviously sometimes you need to do tricky things like an RTS trampoline so there needs to be a way to tell the assembler to not go bananas over it on specific lines).
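
Idea 3 could be sketched in JavaScript roughly like this. Everything here is hypothetical: the kind names (`code`, `data`, `constant`), the rule table, and the `unchecked` escape hatch for tricks like the RTS trampoline.

```javascript
// Sketch: tag each symbol with a kind and check it against the instruction
// that uses it. The kind names and the rule table are hypothetical.
const allowedKinds = {
  JSR: ["code"],
  JMP: ["code"],
  LDA: ["data", "constant"],
  STA: ["data"],
};

function checkUse(mnemonic, symbol, { unchecked = false } = {}) {
  const allowed = allowedKinds[mnemonic];
  // Instructions without a rule, or uses marked unchecked, are always accepted.
  if (unchecked || !allowed || allowed.includes(symbol.kind)) return symbol.value;
  throw new TypeError(
    `${mnemonic} ${symbol.name}: ${symbol.kind} symbol used where ${allowed.join(" or ")} is expected`
  );
}

const GHOST_ID = { name: "GHOST_ID", kind: "constant", value: 3 };
const GHOST_INIT = { name: "GHOST_INIT", kind: "code", value: 0x9000 };
```

With these definitions, `checkUse("JSR", GHOST_INIT)` passes, while `checkUse("JSR", GHOST_ID)` and `checkUse("LDA", GHOST_INIT)` throw unless `{ unchecked: true }` is given.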
Re: Writing my own assembler
by on (#227393)
Drakim wrote:
Code:
LDA MyVar
BEQ RTS
STA MyVar2


Wouldn't one label anywhere in your code suffice?
Code:
AlwaysJustReturnFromThisLabel: RTS
...
LDA MyVar
BEQ AlwaysJustReturnFromThisLabel
STA MyVar2

What am I missing? [Must be something obvious.] Of course, your label could be _RTS: or something. But that depends on [dare I go back to it] whether your parser is able to ignore keywords as part of symbols.
Re: Writing my own assembler
by on (#227396)
cpow wrote:
Wouldn't one label anywhere in your code suffice?


Relative branches can only jump a certain distance in your code. -126 to +129 bytes worth of opcodes I believe? You definitely won't be able to have one "global" RTS that you reuse all the time. That's why different branches all need to find their own nearby RTS instruction, which is something the assembler could do for you.
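
Those numbers follow from how the operand is encoded: it is a signed byte relative to the address of the instruction after the 2-byte branch. A JavaScript sketch of the calculation (`branchOperand` is a hypothetical name):

```javascript
// Sketch: encode the operand byte of a 6502 relative branch.
function branchOperand(branchAddress, targetAddress) {
  const offset = targetAddress - (branchAddress + 2);   // relative to the next instruction
  if (offset < -128 || offset > 127) {
    throw new RangeError(`Branch out of range: offset ${offset}`);
  }
  return offset & 0xff;                                 // two's complement byte
}
```

Measured from the branch instruction itself, a signed offset of -128 to +127 gives a reach of -126 to +129 bytes.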

You can go anywhere with a vanilla JMP though, so it could use a global RTS.
Re: Writing my own assembler
by on (#227398)
Yeah, branches have extremely limited reach.

Drakim wrote:
You can go anywhere with a vanilla JMP though, so it could use a global RTS.

Why would you JMP to an RTS and waste 3 bytes and 3 cycles if you can RTS on the spot with 1 byte?
Re: Writing my own assembler
by on (#227401)
tokumaru wrote:
Why would you JMP to an RTS and waste 3 bytes and 3 cycles if you can RTS on the spot with 1 byte?


Haha, good point! I was only thinking in terms of cpow's idea for a global RTS, which is possible for a JMP, but as you say, utterly pointless.
Re: Writing my own assembler
by on (#227402)
Drakim wrote:
1. By far my most common label is a @Return: label in front of a nearby RTS statement. Maybe it's my style of coding, but I find that subroutines always have some branching conditions that exit early. So I realized it would be pretty nifty if I could just write a branch jump like this:

Code:
LDA MyVar
BEQ RTS
STA MyVar2


And the assembler would just treat a nearby RTS statement as an on-the-fly label destination (or throw an error if there are none within range), so you wouldn't have to put a @Return: label there. It's not really any new kind of functionality, just a sort of "auto-label" thing to make the code less verbose.


What would be less ideal about an opposite branch over RTS right there (maybe hide it in a macro)?
Re: Writing my own assembler
by on (#227403)
I have tons of "return" labels myself, but I don't think I'd create this kind of exception just to save a little bit of typing.

I'm not opposed to verbosity in general, I'm opposed to error-prone verbosity and redundant verbosity.
Re: Writing my own assembler
by on (#227404)
Hangin10 wrote:
What would be less ideal about an opposite branch over RTS right there (maybe hide it in a macro)?


I'll write an example to demonstrate how things become less verbose:

Code:
DoTheMario:
  LDA DancerId
  CMP #MarioId
  BNE @Return
  ; Lots of dancing code here
  @Return:
  RTS


Turns into...

Code:
DoTheMario:
  LDA DancerId
  CMP #MarioId
  BNE RTS
  ; Lots of dancing code here
  RTS


Or if the dancing code is so big that we can't get there in one jump, we might have to have the RTS above our label, which is super annoying depending on your assembler. Some assemblers can make it easier, but this is the best you can get:

Code:
  -Return:
  RTS
DoTheMario:
  LDA DancerId
  CMP #MarioId
  BNE -Return
  ; Too much dancing code here for a branch jump
  RTS


Turns into...

Code:
  RTS
DoTheMario:
  LDA DancerId
  CMP #MarioId
  BNE RTS
  ; Too much dancing code here for a branch jump
  RTS


It's no big revolution, but it eliminates some "braindead" labels. I'm not sure if you could implement the same with a macro?
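
On the assembler side, the lookup could be sketched like this in JavaScript, assuming the assembler already has a list of instructions with their addresses (so it would have to run after sizes are known). `nearestRts` and the record shape are hypothetical. Searching instruction records rather than raw $60 bytes avoids landing on an operand that merely looks like an RTS.

```javascript
// Sketch: resolve "BNE RTS" style branches by picking the closest RTS
// instruction that a relative branch at `branchAddress` can reach.
function nearestRts(instructions, branchAddress) {
  let best = null;
  for (const { address, mnemonic } of instructions) {
    if (mnemonic !== "RTS") continue;
    const offset = address - (branchAddress + 2);
    if (offset < -128 || offset > 127) continue;       // outside branch range
    if (best === null || Math.abs(offset) < Math.abs(best.offset)) {
      best = { address, offset };
    }
  }
  if (best === null) throw new RangeError("No RTS within branch range");
  return best;
}

// Layout loosely modeled on the last example: an RTS just before the routine,
// and a second RTS too far below the branch to be reachable.
const listing = [
  { address: 0x7fff, mnemonic: "RTS" },
  { address: 0x8000, mnemonic: "LDA" },
  { address: 0x8002, mnemonic: "CMP" },
  { address: 0x8004, mnemonic: "BNE" },
  { address: 0x8200, mnemonic: "RTS" },
];
const hit = nearestRts(listing, 0x8004);   // picks the RTS at $7FFF, offset -7
```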
Re: Writing my own assembler
by on (#227405)
Nevermind me. I get it now and could see it going either way.
Re: Writing my own assembler
by on (#227407)
Drakim wrote:
I'm not sure if you could implement the same with a macro?

Fasm can use its load operator to hunt for a suitable byte within range.
Re: Writing my own assembler
by on (#227410)
Hangin10 wrote:
What would be less ideal about an opposite branch over RTS right there (maybe hide it in a macro)?

Loss of 1 byte, plus loss of 1 cycle if the no-return case is less likely than the return case. An assembler-level "find the nearest RTS instruction" feature would make it almost as convenient and efficient as conditional return on an 8080, LR35902, Z80, or ARM.
Re: Writing my own assembler
by on (#227412)
yeah, I was confusing his assembler feature with the talk of the global jmp/label ret.
Re: Writing my own assembler
by on (#227414)
I use Node.js for command-line programs too (it doesn't need to be used for servers or GUI; as mentioned before, it is just another programming language and you could also use Python, Perl, or PHP). I think that Windows Script Host does not implement many ES6 features though? If you are writing an assembler in JavaScript you will likely want byte arrays.

I like the relative labels in Knuth's MIXAL and MMIXAL. If a label name is a digit and then H then you can access the nearest such label backward or forward by the number and then B or F respectively. (MMIXAL is strange and uses only a single pass though; forward references are resolved at load time instead. However, the same relative label format could be used in multi-pass assemblers too.)

I also like the nonstandard syntax used in NESASM/MagicKit (indirect addressing uses square brackets, and zero-page addressing is explicit), but maybe you prefer the standard syntax.

I also tend to use macros to define jump tables and so on, rather than doing them manually, meaning a simple macro system might not do (although if the assembler is written in JavaScript, it would be possible to support extensions also written in JavaScript without too much difficulty).
Re: Writing my own assembler
by on (#227417)
zzo38 wrote:
it is just another programming language and you could also use Python, Perl, or PHP).

Exactly. You download the interpreter, and run your script through it, it's the same thing.

Quote:
I think that Windows Script Host does not implement many ES6 features though?

That's what I meant by "outdated" a few posts ago. I think it's an old version of JavaScript, with little support for binary data and file manipulation.

Quote:
If you are writing an assembler in JavaScript you will likely want byte arrays.

Not only that, but being a popular tool that's actively maintained, there are several modules for all kinds of things you might need.

Quote:
I also like the nonstandard syntax used in NESASM/MagicKit (indirect addressing uses square brackets, and zero-page addressing is explicit), but maybe you prefer the standard syntax.

While square brackets for indirection make a lot of sense in assembly (more than parentheses, I agree), there's just too much 6502 code out there using a standard that's probably as old as the CPU itself, and a change like that causes unnecessary confusion IMO.

Quote:
I also tend to use macros to define jump tables and so on, rather than doing them manually, meaning a simple macro system might not do

I often use macros to help with this kind of thing too.

Quote:
(although if the assembler is written in JavaScript, it would be possible to support extensions also written in JavaScript without too much difficulty)

Yeah, I'm considering allowing user-defined JavaScript functions. Nothing is more versatile than a full programming language at your disposal.
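
Just to make the idea concrete, here's one way user-defined functions might plug in. This is only a sketch under assumptions: `defineFunction`, `callFunction` and `splitTable` are hypothetical names, not part of any assembler, and the example function builds a jump table split into low bytes and high bytes.

```javascript
// Hypothetical registry of user-defined functions (not an existing API).
const userFunctions = new Map();

function defineFunction(name, fn) {
  if (userFunctions.has(name)) throw new Error(`function ${name} already defined`);
  userFunctions.set(name, fn);
}

function callFunction(name, ...args) {
  const fn = userFunctions.get(name);
  if (!fn) throw new Error(`unknown function ${name}`);
  return fn(...args);
}

// Example user function: build a split jump table,
// all low bytes first, then all high bytes.
defineFunction("splitTable", (addresses) => [
  ...addresses.map((a) => a & 0xff),
  ...addresses.map((a) => (a >> 8) & 0xff),
]);

const table = callFunction("splitTable", [0x8123, 0x9abc]);
```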
Re: Writing my own assembler
by on (#227423)
tokumaru wrote:
Yeah, I'm considering allowing user-defined JavaScript functions. Nothing is more versatile than a full programming language at your disposal.


I like that idea a lot. So many times I'm torn between trying to wrestle macros into doing something that would be easier with a full programming language, and saying "forget it" and just running my own custom pre-processor (written in perl or python or something) over my code. This could be the best of both worlds.
Re: Writing my own assembler
by on (#227424)
Drakim wrote:
Darn, this thread is just making me itch to take a stab at making my assembler as well. I just might give it a go. :D

I've had some ideas floating around for a long time, so I figure I might share them here if you are interested. I'm not sure if all of these ideas are realistic, I haven't tried to implement them myself anywhere yet.

1. By far my most common label is a @Return: label in front of a nearby RTS statement. Maybe it's my style of coding, but I find that subroutines always have some branching conditions that exit early. So I realized it would be pretty nifty if I could just write a branch like this:

Code:
LDA MyVar
BEQ RTS
STA MyVar2


And the assembler would just treat a nearby RTS statement as an on-the-fly label destination (or throw an error if there are none within range), so you wouldn't have to put a @Return: label there. It's not really any new kind of functionality, just a sort of "auto-label" thing to make the code less verbose.

2. Sometimes as a programmer you can do things that the assembler can actually see is stupid. Like putting code in a bank, that uses a label from another bank (which is on the same page). Or if you have an absolute instruction with an Int instead of a label, which I'm guessing in 99.99% of cases is just the programmer forgetting a # symbol before the Int. It would be neat if the assembler could tell you about such mistakes.

3. Labels do a LOT of different jobs in asm code. They act as entry-points for subroutines. They act as starting offsets for data tables. And they act as holders of constant gameplay values. Sometimes I wish there was a way to mark a label as to what kind of job it does, and have the assembler throw an error at me if I'm trying to use it in a different way. So you can't JSR GHOST_ID since the label holds a constant value and not an address to a subroutine, and you can't LDA GHOST_INIT since the label holds an address to a subroutine. (Obviously sometimes you need to do tricky things like an RTS trampoline so there needs to be a way to tell the assembler to not go bananas over it on specific lines).
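
Idea 1 above could be sketched roughly as follows, assuming the assembler already knows the address of every RTS from a previous pass. The function name `nearestRts` is hypothetical. The only hard fact used is that 6502 branch offsets are signed 8-bit values relative to the address of the instruction following the branch.

```javascript
// Hypothetical helper: pick the closest RTS that a branch at `branchAddr` can reach.
// `rtsAddrs` is assumed to list the address of every RTS found in an earlier pass.
function nearestRts(branchAddr, rtsAddrs) {
  const next = branchAddr + 2; // offsets are relative to the instruction after the branch
  let best = null;

  for (const addr of rtsAddrs) {
    const offset = addr - next;
    if (offset < -128 || offset > 127) continue; // out of branch range
    if (best === null || Math.abs(offset) < Math.abs(best.offset)) {
      best = { addr, offset };
    }
  }

  if (best === null) {
    throw new Error(`no RTS within branch range of $${branchAddr.toString(16)}`);
  }
  return best;
}

// A BEQ at $8002 with RTS instructions at $8000, $8010 and $9000.
const target = nearestRts(0x8002, [0x8000, 0x8010, 0x9000]);
const operandByte = target.offset & 0xff; // two's-complement operand of the branch
```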

I've been asking Soci for 1 for a year and a half, but we didn't really come up with a nice way to do it.. that looks perfect :D

Tass64 does 2 and 3 already, see -wImmediate and -wShadow warning options ;)
Re: Writing my own assembler
by on (#227426)
tokumaru wrote:
zzo38 wrote:
I also like the nonstandard syntax used in NESASM/MagicKit (indirect addressing uses square brackets, and zero-page addressing is explicit), but maybe you prefer the standard syntax.

While square brackets for indirection make a lot of sense in assembly (more than parentheses, I agree), there's just too much 6502 code out there using a standard that's probably as old as the CPU itself, and a change like that causes unnecessary confusion IMO.

Yeah 65816 uses [] as well so you have
lda (zp),y
lda [zp],y
and those mean different things, so not wise to change the brackets, as it may cause issues and get people confused with other 65(X)XX lines.
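
To show why the two bracket styles can't be treated as interchangeable, here's a small sketch of how a parser might tell them apart. The function name `classifyIndirectY` and the mode strings are made up for this example; the opcode values are the 65816 encodings of LDA for the two modes.

```javascript
// On the 65816 these are two different instructions:
//   lda (dp),y  -> 16-bit pointer in direct page, opcode $B1
//   lda [dp],y  -> 24-bit (long) pointer in direct page, opcode $B7
const LDA_OPCODES = { "(dp),y": 0xb1, "[dp],y": 0xb7 };

// Hypothetical classifier for indirect indexed operands.
function classifyIndirectY(operand) {
  const text = operand.replace(/\s+/g, "").toLowerCase();
  if (/^\(.+\),y$/.test(text)) return "(dp),y";
  if (/^\[.+\],y$/.test(text)) return "[dp],y";
  return null; // some other addressing mode
}

const shortMode = classifyIndirectY("(zp),y");
const longMode = classifyIndirectY("[zp], Y");
```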

tokumaru wrote:
zzo38 wrote:
(although if the assembler is written in JavaScript, it would be possible to support extensions also written in JavaScript without too much difficulty)

Yeah, I'm considering allowing user-defined JavaScript functions. Nothing is more versatile than a full programming language at your disposal.
See KickAss Assembler. It's kind of a scripting language/assembler hybrid; nobody really knows what it is. It kind of became a mess, but there are people who swear by it.
Re: Writing my own assembler
by on (#228039)
Are you planning on making this open source? :)
Re: Writing my own assembler
by on (#228042)
I don't know, I first have to finish writing the thing, and I still have a long way to go. But being JavaScript, it'd probably be simpler to share the source than to package the native code generated by the V8 engine. I heard that there are tools to do that, but the performance is worse than simply using Node.js. Anyway, if there's any demand for it, I'll definitely consider it.
Re: Writing my own assembler
by on (#228254)
One thing that's been keeping me from moving forward with this is that I can't think of a good syntax to right-align code. In ASM6 you can do it with .ORG and label math, which is a bit cumbersome and pollutes the label table with stuff you don't need, so I really wanted to come up with a dedicated solution. One of the things that came to mind was changing the way .ORG works, so that it sets the PC not only for what comes after it, but also for what comes before it, if the PC is undefined at that point. If that were the case, you could write the following at the beginning of your source file:

Code:
Label:
   jmp Label
   .org $10000

And you'd get this:

Code:
$fffd: jmp $fffd

Then, to be able to right-align code anywhere, all you'd need is a directive to "forget" the PC, so it can be defined by a future .ORG statement:

Code:
   .org $8000
   ;code starting at $8000 goes here
   .forgetpc
   ;code to right-align to $10000 goes here
   .org $10000

To me that's as clean as it gets, but I don't know what would happen if .BASE, another directive that changes the PC, is used while the PC is undefined. I guess .BASE can also set the PC for the preceding code, but without padding. But what would .PAD do when the PC is undefined? Maybe I should get rid of .PAD and only use .ORG for padding if I really need to.
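
The back-filling behaviour of .ORG described above could be modelled roughly like this. It's a sketch only: the `layout` function and the item format (`chunk`, `org`, `forgetpc`) are assumptions for illustration, and it ignores .BASE, .PAD and overlap checks entirely.

```javascript
// Hypothetical layout pass: assigns an address to each chunk of code.
// Items: { op: "chunk", name, size } | { op: "org", address } | { op: "forgetpc" }
function layout(items) {
  let pc = null;     // null means the PC is currently undefined
  let pending = [];  // chunks emitted while the PC was undefined
  const placed = [];

  for (const item of items) {
    if (item.op === "org") {
      // Back-fill: chunks waiting for a PC must end exactly at the new origin.
      const total = pending.reduce((sum, chunk) => sum + chunk.size, 0);
      let addr = item.address - total;
      for (const chunk of pending) {
        placed.push({ name: chunk.name, address: addr });
        addr += chunk.size;
      }
      pending = [];
      pc = item.address;
    } else if (item.op === "forgetpc") {
      pc = null;
    } else if (pc === null) {
      pending.push(item);
    } else {
      placed.push({ name: item.name, address: pc });
      pc += item.size;
    }
  }

  if (pending.length > 0) throw new Error("code left without a PC at end of input");
  return placed;
}

// First example from the post: a 3-byte JMP followed by .org $10000.
const first = layout([
  { op: "chunk", name: "Label", size: 3 },
  { op: "org", address: 0x10000 },
]);

// Second example: left-aligned code at $8000, then a right-aligned 6-byte stub.
const second = layout([
  { op: "org", address: 0x8000 },
  { op: "chunk", name: "main", size: 0x100 },
  { op: "forgetpc" },
  { op: "chunk", name: "stub", size: 6 },
  { op: "org", address: 0x10000 },
]);
```

A real implementation would also have to verify that the right-aligned code doesn't run backwards into the code placed before .forgetpc.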

Anyway, can anyone think of better syntax for right-aligning code?
Re: Writing my own assembler
by on (#228259)
I did think of simple solutions for other problems though:

Repeated labels: I will simply allow labels to repeat if they're defined with two colons rather than one (i.e. SomeLabel:: instead of SomeLabel:). This seemed like a good solution because regular labels will keep working the same way, and users can choose which labels can be reassigned. This is also really easy to implement. You just have to be careful when using these labels, because the assembler will not check if the multiple addresses are the same.
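
In symbol-table terms, the double-colon rule could boil down to something like the sketch below. The helper names `parseLabel` and `defineLabel` are hypothetical, and it assumes the most recent definition wins, which is just one possible choice.

```javascript
// Hypothetical label parser: "Name:" is a regular label, "Name::" is repeatable.
function parseLabel(line) {
  const m = /^([^\s:]+)(::?)$/.exec(line.trim());
  if (!m) return null;
  return { name: m[1], repeatable: m[2] === "::" };
}

// Hypothetical symbol table update. Redefinition is only allowed when both the
// existing and the new definition use the double colon. As in the description
// above, the addresses are NOT compared; the latest one simply wins here.
function defineLabel(symbols, name, address, repeatable) {
  const existing = symbols.get(name);
  if (existing && !(existing.repeatable && repeatable)) {
    throw new Error(`label ${name} already defined`);
  }
  symbols.set(name, { address, repeatable });
}

const symbols = new Map();
const shared = parseLabel("SomeLabel::");
defineLabel(symbols, shared.name, 0xff00, shared.repeatable);
defineLabel(symbols, shared.name, 0xff00, shared.repeatable); // fine, repeatable

const normal = parseLabel("Reset:");
defineLabel(symbols, normal.name, 0xfff0, normal.repeatable);
```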

Local label scope: The only real problem I have with local scopes being delimited by global labels is that sometimes you need part of a subroutine to be above the global label that defines the entry point. To solve this in a non-intrusive way, I decided to create a directive that explicitly starts a new scope, but the name of that scope is defined by the next global label that's found. It works like this:

Code:
  .scope ;starts a new scope, but we don't know what the parent label is yet

.return:
  rts

Ignore45: ;oh, so this is the parent label in this scope

  cmp #45
  beq .return
  ;rest of subroutine

If you don't use the new directive, global labels will continue to start new scopes, as usual.
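
Internally, this deferred naming could work roughly like the following sketch. The function `qualifyLabels` and the item format are assumptions for illustration only; it just shows local labels being held back until the next global label gives the scope its name.

```javascript
// Hypothetical pass that turns local labels into fully qualified names.
// Items: { kind: "scope" } | { kind: "global", name } | { kind: "local", name }
function qualifyLabels(items) {
  let parent = null;   // name of the current global (parent) label
  let waiting = false; // true between .scope and the next global label
  let deferred = [];   // local labels seen while the scope is still unnamed
  const names = [];

  for (const item of items) {
    if (item.kind === "scope") {
      if (deferred.length > 0) throw new Error("scope ended before it was named");
      waiting = true;
      parent = null;
    } else if (item.kind === "global") {
      parent = item.name;
      // Locals that appeared above the entry point now get their prefix.
      for (const local of deferred) names.push(parent + local);
      deferred = [];
      waiting = false;
      names.push(parent);
    } else if (waiting) {
      deferred.push(item.name);
    } else if (parent === null) {
      throw new Error(`local label ${item.name} has no scope`);
    } else {
      names.push(parent + item.name);
    }
  }

  if (deferred.length > 0) throw new Error("scope ended before it was named");
  return names;
}

const qualified = qualifyLabels([
  { kind: "scope" },
  { kind: "local", name: ".return" },
  { kind: "global", name: "Ignore45" },
  { kind: "local", name: ".skip" },
  { kind: "global", name: "Other" },
  { kind: "local", name: ".loop" },
]);
```

Here `.return` ends up as `Ignore45.return` even though it was defined above the entry point, while `Other` starts a new scope the usual way.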
Re: Writing my own assembler
by on (#228260)
I don't know what "right-aligning code" means in this context. To me that just looks like padding used for some form of alignment.

Have you looked at x816's documentation? The implementation/model there should alleviate some of your concerns/issues here, and relieve you of your blocker regarding what to do if someone specifies code before the very first .org directive:

Code:
.ORG
Define origin address.
Sets the starting address of the source file.  X816 will
not assemble any code until this directive is found.

This is really the best choice. Honest. The proposal you have (to allow code specified before the first .org, but based on what that .org line says) makes no sense and will confuse everyone who uses this tool. Likewise, .forgetpc makes absolutely no sense -- there's no need for it, just let .org dictate things, and don't allow people to write actual code before the first .org statement. Problem solved.

A copy of x816's manual is here, along with several other manuals from assemblers. Just remember that x816 was intended for 65816 (which supports 24-bit addressing and native banks), but it should give you some good ideas on how to do things (like .base and how to handle some scope-related bits): https://www.dropbox.com/sh/15z9w4v0s6h7 ... Knina?dl=0
Re: Writing my own assembler
by on (#228261)
Right aligning means aligning code to an upper address, useful when you use a mapper that swaps the entire 32KB and you need to simulate a fixed bank near the CPU vectors, containing a reset stub, trampoline routines and other things.

No assembler I know of is equipped to do this easily, so people either use cumbersome hacks, or define a constant size for their "fixed" banks, solutions that are far from optimal.

Also, I disagree that my proposed solutions are confusing, because I'm intentionally trying to think of solutions that don't affect the common ways of doing things. Don't like the new directives? Don't use them, and things will behave as they always did (as much as there is a standard for these things, anyway). But even if I was changing things radically, I made it very clear since the beginning that this isn't meant to please anyone, this is mostly for my own use.
Re: Writing my own assembler
by on (#228262)
tokumaru wrote:
Right aligning means aligning code to an upper address, useful when you use a mapper that swaps the entire 32KB and you need to simulate a fixed bank near the CPU vectors, containing a reset stub, trampoline routines and other things.


This sounds pretty nice, and I could definitely see myself using it. That said, why does the simulated fixed bank need to be at the upper-end near the vectors? I just always put mine first-thing (ie left-aligned). Is there some disadvantage of how I'm doing it? (asking in good faith, not trying to pick nits and argue)
Re: Writing my own assembler
by on (#228263)
gauauu wrote:
That said, why does the simulated fixed bank need to be at the upper-end near the vectors? I just always put mine first-thing (ie left-aligned). Is there some disadvantage of how I'm doing it?

To me personally, it makes sense to put the fixed stuff up there because of the CPU vectors, which are in the same category (i.e. things that must be present in all banks), but what seals the deal for me is that I use the beginning of the bank for subroutines with timed code, or data that has to be aligned to memory pages for timing reasons, because it's easier to align code/data to page boundaries there.
Re: Writing my own assembler
by on (#228264)
tokumaru wrote:
Repeated labels: I will simply allow labels to repeat if they're defined with two colons rather than one (i.e. SomeLabel:: instead of SomeLabel:).

That might be confused with RGBDS's double colon export syntax. man 5 rgbasm says these are equivalent:
Code:
SomeLabel::
;is the same thing as this
SomeLabel:
  export SomeLabel
Re: Writing my own assembler
by on (#228268)
Yes, I realize that double colons have been used for other purposes, and I don't care. If I cared about how every assembler has used every symbol, there'd be nothing left for me to use.

Repeated colon, repeated label, it just makes sense to me.
Re: Writing my own assembler
by on (#228270)
If it's something you specify in the file like .org, I would think that a right align would have two directives, one that opens it with padding, and a second one that closes it with an address to meet.

Though TBH I think a CC65 linker config property on a segment would be a lot better way to express it.

tokumaru wrote:
Right aligning means aligning code to an upper address, useful when you use a mapper that swaps the entire 32KB and you need to simulate a fixed bank near the CPU vectors, containing a reset stub, trampoline routines and other things.

I'm a little curious how often this code needs to change for you that you consider setting a fixed starting address not a good enough solution. Do those listed things change frequently, or is this more about the "other things"?
Re: Writing my own assembler
by on (#228271)
Quote:
... The reason I'm making this thread is not to advertise my assembler, create hype (as if!), take requests, or anything like that. I'm actually a little insecure about my design ideas and ways to implement them, seeing as I've never written a program like this before, so I'd like to discuss some of these ideas with you guys first, to make sure I'm not missing anything important and making bad decisions left and right. ...

If you're going to simply do whatever you want to do regardless of what feedback people give you, including materials that might give you some ideas or let you think about things differently, then how do we give you constructive feedback, or feedback that you'll find useful?

Rephrased: I'm 100% for people making tools for themselves (incl. not sharing them if they don't want to -- folks should do whatever they wish!) but it's strange to ask for advice/feedback/insights/etc. and then basically say "I don't care, I'm doing what I want, for me". Why should the community provide feedback if that's the modus operandi? How do we give you helpful and constructive feedback?

(I'm asking this with a tremendous amount of respect BTW, and not to start an argument or to get things off-topic. I'm about at a point where I'm probably going to have to make my own disassembler for similar reasons (for the 2nd time in my life nonetheless!). It's so strange how we have better tools today in some regards, but worse in others.)
Re: Writing my own assembler
by on (#228272)
koitsu wrote:
I'm about at a point where I'm probably going to have to make my own disassembler for similar reasons (for the 2nd time in my life nonetheless!).

The last one being Tracer?
Re: Writing my own assembler
by on (#228278)
rainwarrior wrote:
If it's something you specify in the file like .org, I would think that a right align would have two directives, one that opens it with padding, and a second one that closes it with an address to meet.

I'm intentionally trying to avoid making this look like a block directive, but I guess that it kinda is anyway, with the PC clearing and the subsequent required .ORG.

Quote:
Though TBH I think a CC65 linker config property on a segment would be a lot better way to express it.

That'd be great, but I can't do it myself and I can't rely on other people doing it for me. Plus this is far from being the only ca65 drawback that affects me.

Quote:
I'm a little curious how often this code needs to change for you that you consider setting a fixed starting address not a good enough solution. Do those listed things change frequently, or is this more about the "other things"?

Trampolines are added as I write the banked subroutines, which takes a while. Another thing that makes this more complicated is that different banks have different configurations of "fixed" content: there's the stuff that goes in every bank, but banks containing level maps, for example, have subroutines for accessing that data, testing collisions and the like. Banks containing graphics have decompression subroutines. Banks containing object definitions have code to load those definitions. These things aren't supposed to change all the time, but they do sometimes, and the different configurations make this even more annoying to manage.

It's not impossible to do it the "conventional" way, but since I'm making a tool that deals with all the things that bother me, why not make this particular thing automatic so I can worry about managing one less thing?
Re: Writing my own assembler
by on (#228281)
koitsu wrote:
If you're going to simply do whatever you want to do regardless of what feedback people give you, including materials that might give you some ideas or let you think about things differently, then how do we give you constructive feedback, or feedback that you'll find useful?

I do value good criticism, but I ignore what I find irrelevant. I know you were trying to help, but like you said in your post, you didn't know what I was talking about (right-aligning), and didn't seem to acknowledge my effort in preserving the typical/classic functionality of ORG, so what you said didn't help at all.

ASM6 for example allows code/data before the first ORG, as long as the value of the PC isn't needed. I've used that to write the NES header, for example. That's semantically better than doing ORG $7ff0 or whatever, since the header doesn't really get mapped to that or any other address. That would keep working with my proposed solution.

The link you provided to that assembler's documentation was very relevant though, since I have in fact been reading the documentation for various assemblers, taking some cues from here and there, so thanks.

Quote:
I'm 100% for people making tools for themselves (incl. not sharing them if they don't want to -- folks should do whatever they wish!)

I'm not opposed to sharing my tools, but they're hardly ever "release-ready", and I don't want to spend the time necessary to make them that way. This assembler is no different... It's written in JavaScript and has the functionality *I* think is going to speed up *MY* workflow when coding, but if anyone thinks it can be useful for them, great, I'll share. I'm just not going to be making changes I don't think are useful to me personally.

That may sound selfish when put like that, but this forum is essentially meant for people to ask for help with their projects, and my current project is an assembler. If someone doesn't feel like helping because they're not getting anything out of it, then they can simply not help. I don't expect anything back when I help people with their games here, for example.

Quote:
but it's strange to ask for advice/feedback/insights/etc. and then basically say "I don't care, I'm doing what I want, for me".

Doesn't everyone keep the ideas they like, and toss the rest?

Quote:
How do we give you helpful and constructive feedback?

The most important thing is: understand what the problem I'm trying to solve is. Even if it's not a problem for you, due to your workflow being different, try to see why it's a problem for me.

Quote:
It's so strange how we have better tools today in some regards, but worse in others.)

I guess that people have different backgrounds, workflows, expectations and goals, so what's better for one person isn't necessarily better for the other.
Re: Writing my own assembler
by on (#228283)
tokumaru wrote:
Quote:
I'm a little curious how often this code needs to change for you that you consider setting a fixed starting address not a good enough solution. Do those listed things change frequently, or is this more about the "other things"?

Trampolines are added as I write the banked subroutines, which takes a while. Another thing that makes this more complicated is that different banks have different configurations of "fixed" content: there's the stuff that goes in every bank, but banks containing level maps, for example, have subroutines for accessing that data, testing collisions and the like. Banks containing graphics have decompression subroutines. Banks containing object definitions have code to load those definitions. These things aren't supposed to change all the time, but they do sometimes, and the different configurations make this even more annoying to manage.

Hmm, it's a little surprising to me how much stuff you want in there. Like for me with BxROM it only ever contained a few lines of code that write the bank register, because that had to be in a fixed location to work. For things that were trampolines, the rest of that trampoline didn't need to be in any particular place.

Of course I don't really know what kind of mapper you're dealing with, or what other schemes you're using to organize your work. Would having a decompression routine somewhere else in the bank cause a problem?

For my last game I did move some segments around just for the sake of visualization, or in one or two cases to facilitate potentially cleaner IPS patches. For the vast majority of this though the final segment order/position didn't really matter in any functional way. Is this generality different for you?

Actually that's one of the reasons I like CC65's segment/linker system so much. This was really easy to reorganize in the .CFG with almost no impact on the rest of the code.
Re: Writing my own assembler
by on (#228284)
rainwarrior wrote:
Would having a decompression routine somewhere else in the bank cause a problem?

Since there's more than 1 bank with graphics, the decompression routine needs to be in the same place in all of them. Actually, that's not true: since there's a trampoline in each bank, each trampoline could jump to a different location, but honestly, having copies of the same routine scattered around different addresses sounds like a mess to me, and in some cases I might need the timing to remain consistent across the different copies, so I wouldn't want them to be aligned differently relative to the memory pages.

Anyway, in this particular case I guess I could put the routine in the beginning of the banks, since there are no alignment requirements for compressed graphics, but for other kinds of data there might be, so my default is to put stuff at the end.

Quote:
For the vast majority of this though the final segment order/position didn't really matter in any functional way. Is this generality different for you?

I like to know where things are, for the most part, but I often have a significant amount of timing-sensitive code and data (mostly related to vblank updates - I often use every last cycle of vblank time) for which I absolutely have to control the page alignment, and I find that easier to control if I can just look at the INCLUDES that are near an ORG statement with a readable address.

Quote:
Actually that's one of the reasons I like CC65's segment/linker system so much. This was really easy to reorganize in the .CFG with almost no impact on the rest of the code.

I reorganize stuff by moving INCLUDEs around. Every subroutine and data table that I write has its own file, and my main file is just a bunch of ORGs (or SEGMENTs, if using ca65) and INCLUDEs, so I can quickly look at that file and see the entire ROM structure right there, I can see where everything goes, and I can move stuff around by cutting and pasting whenever necessary.
Re: Writing my own assembler
by on (#228285)
koitsu wrote:
A copy of x816's manual is here, along with several other manuals from assemblers. Just remember that x816 was intended for 65816 (which supports 24-bit addressing and native banks), but it should give you some good ideas on how to do things (like .base and how to handle some scope-related bits): https://www.dropbox.com/sh/15z9w4v0s6h7 ... Knina?dl=0

It looks like ASM6 was *heavily* inspired by this assembler! I always wondered where the ideas used in ASM6 came from.
Re: Writing my own assembler
by on (#228286)
x816 was also Norman Yen's 3rd assembler. His earlier works were snesasm, followed shortly after by trasm (Tricks Assembler, which was like snesasm but better). All 3 were written in Turbo Pascal, and source is available for snesasm and trasm but not x816 (hence my quest to get it, but Norman was kind enough to state that it's been lost). Also, don't confuse snesasm with another 65816 cross-assembler from that time *also* called snesasm, which was written in C, nor confuse any of those with SNASM, which was also in C and was a 2-pass assembler (but pretty awful).

I rank x816 quite highly because I dealt with a lot of crappy cross-assemblers over the years (I think I tried a total of 7 or 8 on PC), and x816 was the cream of the crop -- it's what I used to do 6502 code for the Neo Demiforce FF2e intro (when DOS was still a thing), actually. It felt very natural and made a top-notch general-purpose assembler that made starting out doing 65xxx easier, and making SNES-oriented ROMs even easier.

For native 65xxx assemblers, I can really only speak about Merlin 8 (Apple II series), and later Merlin 16 and ORCA/M (Apple IIGS). ORCA/M really spoils you, but was very much a IIGS-oriented assembler (though some people did use it for some general SNES development). Otherwise for older stuff, I started with the built-in Apple II ROM mini-assembler and Monitor, which were convenient for small programs but tedious as hell for long ones (especially if you made any mistakes); if people were to show you it today, you'd probably lose your mind wondering how people tolerated it... but they were built-in to the Apple II, thus essentially "free".

A lot of the general pseudo-ops you see used in all of these programs, and in asm6 and others, tend to be similar because those pseudo-ops date back to even older assemblers (early 90s, late 80s, and beyond). For the SNES-oriented assemblers above, you'll find many from completely different authors who all seem to share similar pseudo-ops -- and that's because everyone wanted to be as compatible as possible with one another (so you could move your source to/from assemblers without too much pain). Else there's a pretty common set of directives that people have gotten used to over the years, but there's no reason those have to be retained as long as the assembler comes with good documentation (ca65 is a great example).

So in short, I'm fairly certain asm6 was inspired by a lot of previous assemblers that Loopy had used. I don't know what systems he grew up working on, but everyone seems to have their own preference.

Footnote: I'm in the process of trying to put together a Dropbox directory of old assemblers as well, since a lot of the old stuff seems to be lost over time as sites go down and the like. There's a lot of old tools/assemblers/disassemblers that are essentially lost because of that. Not everything from the days of floppies got transferred to the Internet. :)
Re: Writing my own assembler
by on (#228287)
tokumaru wrote:
Quote:
For the vast majority of this though the final segment order/position didn't really matter in any functional way. Is this generality different for you?

I like to know where things are, for the most part, but I often have a significant amount of timing-sensitive code and data (mostly related to vblank updates - I often use every last cycle of vblank time) for which I absolutely have to control the page alignment, and I find that easier to control if I can just look at the INCLUDES that are near an ORG statement with a readable address.

Hmm, I wasn't including alignment under order/position because that's separately guaranteed for me with either .align directives or segment alignment (or both). Yes, the alignment definitely matters but there's no code that needs to run on any specific page. I can still reorganize those simply by moving the segment in the CFG without any worry, except when the bank becomes extremely full and I need to manage segmentation (but that's kind of a one time task).

How does right alignment figure into this though? Does that solve a technical problem, or is this just an organizational thing for you? Do you sort of manage your code as two stacks, one growing up from the bottom, and one growing down from the top, with a big hole in the middle? Where does the need to fill a space from the right come in? ("I just want it there" is a valid reason, BTW, I don't want you to feel like I'm demanding that you justify it.)

The one case where I could think I might want it would be trying to fit in DPCM without having to split the 32k space into two halves. (...but I also probably wouldn't try to use DPCM with 32k banking.)
Re: Writing my own assembler
by on (#228288)
rainwarrior wrote:
that's separately guaranteed for me with either .align directives or segment alignment (or both).

Doesn't that result in wasted ROM space?

Quote:
Do you sort of manage your code as two stacks, one growing up from the bottom, and one growing down from the top, with a big hole in the middle?

That's exactly it.

Quote:
Where does the need to fill a space from the right come in?

It's mainly because I have two cases that require careful positioning of code/data: a) stuff that needs to be in multiple banks and therefore must be in the same address every time (CPU vectors, trampolines, data-processing subroutines, etc.) and b) stuff that needs careful alignment relative to the memory pages for timing reasons. I find it very convenient to put each of these groups in opposite ends of the bank, so they don't interfere with each other, and they both grow inward as I add new stuff.

If everything was left-aligned, the fixed stuff would have to come first (or I wouldn't be able to map all instances to the same address), and it'd interfere with the alignment of the code/data following it, or I'd have to waste ROM space with padding to guarantee the alignment.
Re: Writing my own assembler
by on (#228289)
tokumaru wrote:
rainwarrior wrote:
that's separately guaranteed for me with either .align directives or segment alignment (or both).

Doesn't that result in wasted ROM space?

Not really... at least I've never found it significant. However, your proposed alternative is just setting the address with .org, right? Doesn't this have the exact same padding problem?

.align will create padding, but only as much as needed. You can always rearrange stuff around the code to take up that padding. (In general I don't actually start worrying about the small amount of bytes lost to padding until I actually run out of space, though.)

If you have a short routine, you can use .align 32, or .align 16, etc. to get the guarantee you need with less potential padding.

You can also put all your alignment requiring routines in one segment (or after one .align). Like the real problem is just branch crossings, right? Guarantee those with an assert, and you can put as many pages worth of code as you want after a single guaranteed page alignment, and just manually stick a few bytes in between your routines whenever one of those asserts triggers.

Sort of an extension of that idea: you can generally use asserts instead of .align to guarantee it for compilation without any implicit padding, and just add padding as needed.
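
For reference, the two checks discussed here come down to very little arithmetic. The sketch below uses made-up function names (`alignPadding`, `branchCrossesPage`); the page-crossing rule is the usual 6502 one, where a taken branch costs an extra cycle when its target is on a different 256-byte page than the instruction following the branch.

```javascript
// Number of padding bytes an alignment directive would need to emit at `pc`.
function alignPadding(pc, alignment) {
  return (alignment - (pc % alignment)) % alignment;
}

// True if a taken branch at `branchAddr` to `targetAddr` crosses a page.
// The comparison uses the address of the instruction after the 2-byte branch.
function branchCrossesPage(branchAddr, targetAddr) {
  const next = branchAddr + 2;
  return (next >> 8) !== (targetAddr >> 8);
}

const fullPage = alignPadding(0x8003, 256); // worst case is close to a full page
const small = alignPadding(0x8003, 16);     // a smaller alignment wastes far less
const safe = branchCrossesPage(0x80fd, 0x80f0);
const crossing = branchCrossesPage(0x80fe, 0x80f0);
```

An assert built on the second function reports a problem only when a timed branch actually crosses a page, so no bytes are spent on padding until they are needed.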
Re: Writing my own assembler
by on (#228291)
Yeah, I use asserts a lot too and never had an issue with alignment.

You can use this macro instead of .align to verify you're not wasting too much space:

Code:
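.macro align size, max_segmentation
  ; ca65 macro parameters are separated by commas
  .local label
  label:                ; address before any padding is inserted
  .align size
  ; (* - label) is the number of padding bytes .align just emitted
  .if (* - label) > (max_segmentation)
    .error ".align creates excess segmentation."
  .endif
.endmacro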