This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019 Download

posting code in hopes of peer reviews.

posting code in hopes of peer reviews.
by on (#238404)
I plan to post code occasionally to see if i can get feedback on what to do better, what won't work, if you can spot any bug, or remarks about better practices. They might as well be collected into one thread.

So here it goes, first function:

A function to interface with a custom peripheral. It uses 4 daisy-chained serial-to-parallel 8bit registers, which in turn feeds 8 BCD to decimal converters, in turn relaying 8x10 nixie tube cathodes.

Now, the hardware side of the NES is always a bit hard to wrap my head around.. so let's check if all my assumptions are OK, and if my code actually corresponds well enough to those assumptions. What do you think?

Code:
;assumptions:
;-port 2 clk can be used to clock the shift registers (rising edge type).
;-port 2 data 1 is used to feed serial data to shift registers (in turn feeding bcd to decimal decoders)
;-port 2 data 2 is used to enable input on shift registers
;-port 2 data 0 is unused as to not strobe controller 1. maybe pedantic.
;-instead, we may tie data 0 to "output enable", ie. results are not shown on shift register outs until controller is strobed once per frame as an automated quantizer to reduce update rate on the nixie tubes.
;-this means we might be able to update score at any point in the loop; results are still updated on the displau itself on a per frame basis.
;which could save us the "displayPending" var.
;-we're only storing one BCD per array entry. We could bitpack one in each nybble, but i don't care to at this time.
;-cable is known to have all three data lines.

;======

lda displayPending ;might not be warranted to keep a pending var.
beq skipDisplay
dec displayPending ;is now 0 again

ldx #0
ldy #0

displayBigLoop:
lda myScore,y ;y can be 0-7; asserted in loop. each value holds just 0-9 in the lower nybble.
sta temp ;now holds our score buffer

displaySmallLoop:
lda temp
and #1
asl ;moves from bit 0 to bit 1; which we use to feed the shift register with data.
ora #%00000100 ;we use data 2 as input enable on the shift registers.
sta $2016 ; doesn't affect controller 1, and reversely, controller 1 strobes don't affect nixie display
lda #0
sta $2016 ; input enable goes low.
lsr temp 
inx 
cpx #32 ;done with all 8 digits (4 bits * 8 places) in our binary coded decimal?
beq skipDisplay:
txa ;used to check for absence of modulo.
and #7
tay ;preparatory for fetching the next myScore,y
and #3 ;check for absence of modulo - if there is none; we're done with one binary coded decimal.
beq displayBigLoop
jmp displaySmallLoop

skipDisplay:    
;i might tie data 0 to "output enable" on the shift registers so that score changes only switch at max once per frame, each time controller code strobes the ports.
;else, output enable is meant to be tied high.
rts


note:
the shift register i'm using:
http://www.ti.com/lit/ds/symlink/cd4094b.pdf


What do you think? Anything i should work on improving?

Edit: added in "and #7" after posting

Edit 2: I suppose that jmp at the bottom is replacable with bne.
Re: posting code in hopes of peer reviews.
by on (#238407)
FrankenGraphics wrote:
Now, the hardware side of the NES is always a bit hard to wrap my head around.. so let's check if all my assumptions are OK, [...]
  • ;assumptions:
  • port 2 clk can be used to clock the shift registers (rising edge type).
  • port 2 data 1 is used to feed serial data to shift registers (in turn feeding bcd to decimal decoders)
  • port 2 data 2 is used to enable input on shift registers
  • port 2 data 0 is unused as to not strobe controller 1. maybe pedantic.
The NES only has three general purpose outputs, and only one is available on the controller port. If you make an expansion port peripheral, however, this is fine. Do note that "OUT0" isn't specific to the port, but is instead shared among both jacks.

Quote:
sta $2016 ; doesn't affect controller 1, and reversely, controller 1 strobes don't affect nixie display
$4016.

Quote:
txa ;used to check for absence of modulo.
**
and #7
tay ;preparatory for fetching the next myScore,y
and #3 ;check for absence of modulo - if there is none; we're done with one binary coded decimal.
beq displayBigLoop
This won't do the thing you want. You'll end up emitting 4 bits from byte 0, and then 4 bits from byte 4, and repeat that three more times.
I think just inserting "LSR / LSR" where I put the ** should make it work?
Re: posting code in hopes of peer reviews.
by on (#238408)
I think I'd want OUT connected to your shift register input, and CLK connected to your shift register clock?

1. store the bit you want to write to $4016 (OUT holds $4016.0 at both ports)
2. then read $4017 to shift that bit into your shift register (generates CLK low pulse, second port only)
3. Repeat for as many bits are in your shift register chain.

Afterward (or before) you can separately use $4016 to strobe and then read the first controller port without affecting your shift registers, since they will be ignoring OUT unless it gets a CLK.
Re: posting code in hopes of peer reviews.
by on (#238411)
Just an example of making the code slightly more readable... with the comments all lined up and commands indented from labels. And getting rid of a few of the "magic numbers". Well everything is not lined up perfectly in my example because using the code tag is a bit wonky but you get the idea.


Code:
;assumptions:
;-port 2 clk can be used to clock the shift registers (rising edge type).
;-port 2 data 1 is used to feed serial data to shift registers (in turn feeding bcd to decimal decoders)
;-port 2 data 2 is used to enable input on shift registers
;-port 2 data 0 is unused as to not strobe controller 1. maybe pedantic.
;-instead, we may tie data 0 to "output enable", ie. results are not shown on shift register outs until controller is strobed once per frame as an automated quantizer to reduce update rate on the nixie tubes.
;-this means we might be able to update score at any point in the loop; results are still updated on the displau itself on a per frame basis.
;which could save us the "displayPending" var.
;-we're only storing one BCD per array entry. We could bitpack one in each nybble, but i don't care to at this time.
;-cable is known to have all three data lines.

;======

NES_JOY1                                 = $4016
BCD_BITS_PER_DIGIT      = $04
BCD_NUMBER_OF_PLACES      = $08

lda displayPending                   ;might not be warranted to keep a pending var.
beq skipDisplay
dec displayPending                   ;is now 0 again

ldx #0
ldy #0

displayBigLoop:
   lda myScore,y                   ;y can be 0-7; asserted in loop. each value holds just 0-9 in the lower nybble.
   sta temp                        ;now holds our score buffer

displaySmallLoop:
   lda temp
   and #1
   asl                             ;moves from bit 0 to bit 1; which we use to feed the shift register with data.
   ora #%00000100                  ; we use data 2 as input enable on the shift registers.
   sta NES_JOY1                    ;doesn't affect controller 1, and reversely, controller 1 strobes don't affect nixie display
   lda #0
   sta NES_JOY1                    ; input enable goes low.
   lsr temp 
   inx 
   cpx #(BCD_BITS_PER_DIGIT * BCD_NUMBER_OF_PLACES)
   beq skipDisplay:

check_modulo:   
        txa                            ;used to check for absence of modulo.
   and #7
   tay                            ;preparatory for fetching the next myScore,y
   and #3                         ;check for absence of modulo - if there is none; we're done with one binary coded decimal.
   beq displayBigLoop
   jmp displaySmallLoop

skipDisplay:   
                                  ;i might tie data 0 to "output enable" on the shift registers so that score changes only switch at max once per
                                  ;frame, each time controller code strobes the ports.
                                  ;else, output enable is meant to be tied high.
rts
Re: posting code in hopes of peer reviews.
by on (#238449)
Thanks, everybody.
lidnariq wrote:
The NES only has three general purpose outputs, and only one is available on the controller port. If you make an expansion port peripheral, however, this is fine.


At some point i thought i'd use the expansion port, but then didn't feel like modifying the unit, so that approach might rest on the shelf until/if a proper expansion port connector ever becomes available. I completely missed the two other outs aren't available on the controller ports!

lidnariq wrote:
I think just inserting "LSR / LSR" where I put the ** should make it work?

Nice catch! I think i need to restore A afterwards for the beq BigLoop to function, like so?

Code:
txa
lsr ;divide by 4
lsr
and #7
tay ;preparatory for reading the next myScore,y.
txa ;refresh a after the above division.
and #3


rainwarrior wrote:
I think I'd want OUT connected to your shift register input, and CLK connected to your shift register clock?

Ohh, i never figured CLK:s were separately signalled, but that makes total sense for controllers to function. :lol: That should simplify things considerably.

So, basically:

Code:
displaySmallLoop:
lda temp
sta $4016 ;latches 0st bit to both controller ports, but we're only going to clock the second port.
lda $4017 ;generates a clock on the second port; feeds one bit into the shift register.
lsr temp
[...]


only thing is i have no way to control the output enable, but i don't think that matters. I'll just tie it high and see how that works for me.

Also in that case i'll probably keep the pending thing so it is always called right after/before controller reads, in order to not have the nixie display output reset controller 1 in the middle of a joypad read.

hackfresh wrote:
And getting rid of a few of the "magic numbers".

thanks. using constants would for sure have saved me from the double typo lidnariq catched.
Re: posting code in hopes of peer reviews.
by on (#238452)
BIT $4017 might be appropriate for generating a CLK pulse from the read without clobbering A.
Re: posting code in hopes of peer reviews.
by on (#238453)
nice! it gets overwritten in a few lines, but it seems a better practice in general. thanks!

New version:
Code:
;assumptions:
;-it seems okay in theory to call this routine at will. If an NMI interrupts,
;there doesn't seem to be a way joypad reading and nixie output could interfere.
;-all this is assuming we just want a simple score printout of 8 digits, and not some other mixture of HUD vars.
;hypothetically, one or two digits could be tied to output "0" as a score base and relieve logic to output lives, keys, a thematic radiation meter or whatever else.
;-a 1 player solution. a version could be made to use expansion port to allow for 2 players and a score display.

;========================================
;only use these if i need to keep it last in the NMI.
;---------------------------------------
;lda nixiePending ;might not be warranted to keep a pending var.
;beq skipNixie
;dec nixiePending ;is now 0 again
;========================================

   ldx #0
   ldy #0
nixieBigLoop:

   lda myScore,y ;x can be 0-7. each value holds just 0-9.
   sta temp ;holds our score buffer

nixieSmallLoop:
   lda temp
   sta $4016 ;moves 0st bit into both port out latches, but we're only clocking the second port.
   bit $4017 ;generates a clock on the second port; feeds one bit into the shift register.
   lsr temp 
   inx 
   cpx #32 ;done with all 8 digits in our score printout?
   ;- (4 bits * 8 places = 32 shift register bits -modify this circumstances change)
   beq skipNixie
      txa      ;+
      lsr      ;|
      lsr      ;|-- preparatory for getting the next myScore,y.
      and #7   ;|
      tay      ;+
      
      txa                ;+
      and #3             ;|
                         ;|- check if we're done with a binary coded decimal. 
      beq nixieBigLoop   ;|
      bne nixieSmallLoop ;+

skipNixie:    
   rts
Re: posting code in hopes of peer reviews.
by on (#238454)
A more hardware related question - would it be reasonable to ask the NES +5v to supply 4 shift registers and 8 bcd decoders? I'm using a separate supply but feel a bit like i'm wearing both belt and braces.

If i read the spec correctly, the bcd decoders are using 1mA each for supply and a neglible ammount for the data in:s. So that's a little above 8mA. That seems to be the case for the shift registers as well, so a little above 12mA, then.

https://tubehobby.com/datasheets/k155id1.pdf
Re: posting code in hopes of peer reviews.
by on (#238469)
The datasheet says that the typical draw per 7441 (or Soviet equivalent) is 11 (74141) or 21 (7441) mA, not 1mA... Eight of those is ... not certain. Probably ok? I mean, the NES has a 1A regulator, and my measurements show only about 400mA are consumed internally.

I would be a little paranoid about the 7441 / K155s handling high voltage and would add a some high voltage protection on the NES side... Probably something like:

NES+5V ---|>|--- external +5V (any diode that has a peak reverse voltage greater than the supply used for the nixies)
NESsignal ---3k--- external circuitry

The NES controller ports already have over-/under- voltage protection diodes inside (from ground to signal and from signal to +5V), but there's no current limiting for them.
Re: posting code in hopes of peer reviews.
by on (#238625)
This is what i use to get individual bits from 32 byte chunks. I think it is fine, but could use a critical look. I have this particular problem where i use this routine to get a semi-persistent state for any screen ID, but some particular screens on my map don't register a change. I've gone through this piece a few times and don't think the problem is here... but it's also my first bitwise handler and you might know of better practices. And if i'm proven wrong, that's great too.

myTemp1 could just as well be myTemp+1, but anyway...

Code:
Prepare_32byteBitwise:
;===================================================================
;prepare the prerequisites for looking up/writing to a single bit
;in a 32-byte array of flags from a single byte ID. 
;-------------------------------------------------------------------
;Written to be called mostly at screen loads
;or to help with occasional object decisionmaking
;-------------------------------------------------------------------
;expects an ID of 0-255 in A.
;returns a bitmask in myTemp1, and an index ($0-$1f) in X.
;tampers with A,X,Y,myTemp,myTemp1.
;===================================================================
sta myTemp ;used to restore A a few lines down


ldy #1 ;prep first bitmask option
sty myTemp1


and #$7 ;used to get the 3 least significant bits.
tay
lda myTemp ;used to get the 5 most significant bits
lsr
lsr
lsr
tax
 
 
;this loop makes bitmasks out of the last 3 bits of the 1-byte ID

@loop:
cpy #0
beq @skip ;if the three bits are 0, the correct bitmask is already loaded in myTemp
    asl myTemp1 ;shifts bitmask up
    dey
   
    bcc @loop ;brA
    @skip:
   
    ;ldy myTemp1 ;uncomment if you want the bitmask loaded into Y as well.
Prepare_32byteBitwiseEnd: ;Safety label to limit the scope of "@" labels in asm6
rts
Re: posting code in hopes of peer reviews.
by on (#238636)
Your code looks fine to me.

I'd be tempted to replace the loop with a lookup table, which would be faster and I think might be clearer, but it shouldn't matter.
Re: posting code in hopes of peer reviews.
by on (#238644)
lidnariq wrote:
Your code looks fine to me.

I'd be tempted to replace the loop with a lookup table, which would be faster and I think might be clearer, but it shouldn't matter.


I don't think dey affects condition flags, so the loop exit condition is wrong.

Also, if y holds a value 0 to 7 and the accumulator holds a byte, the loop:
Code:
loop:
  lsr
  dey
  bpl loop

will exit with the value of bit "y" in the carry flag, with no further masking or tests required.
Re: posting code in hopes of peer reviews.
by on (#238648)
supercat wrote:
I don't think dey affects condition flags, so the loop exit condition is wrong.

DEY affects zero flag and sign/negative flag, but not carry. (All the INC/DEC instructions work this way.)

http://www.obelisk.me.uk/6502/reference.html#DEY
Re: posting code in hopes of peer reviews.
by on (#238650)
Oh, i see dey only affects n and z, according to http://www.6502.org/tutorials/6502opcodes.html#DEY

but

asl myTemp1

sets carry,
which is why it has seemingly worked for the most time. but it seems i can end up with a fully negating bitmask that way on every 8th incoming ID that way - that would explain my troubles.
Re: posting code in hopes of peer reviews.
by on (#238652)
FrankenGraphics wrote:
asl myTemp1 sets carry,
It should have ended up terminating always with myTemp1 = 0 and C=1...
Re: posting code in hopes of peer reviews.
by on (#238655)
except that at the top of each loop, we check for y to be 0, in that case: we escape the loop to @skip. but if myTemp1 sets carry before that, i think we end up with an empty bitmask.

so to correct myself, it's its the cpy / beq that makes it work, seemingly, and it seems to me the bcc messes this up.


edit. i might be wrong abut the chicken-egg condition.

this seems to work identically. but it's simpler to read. thanks everybody.

@loop
asl myTemp1
dey
bpl @loop

it seems my troubles are elsewhere:

-certain items respawn on certain screens when they ought to be permanently collected, but it seems more related to a specific item type that misbehaves occasionally.
-when altering a bit related to a certain room, revisiting that room teleports the player to a wholly different screen and position unintentionally. - more likely something is looking for a var in the wrong place. maybe even a missed bank switch or something. i'll keep searching.
Re: posting code in hopes of peer reviews.
by on (#238666)
Are you sure about the BPL? That will jump when Y is positive (=including 00). The code in your pre-previous post looked more as if you wanted to stop looping when Y=00, if yes, then better use BNE.
Re: posting code in hopes of peer reviews.
by on (#238670)
hm... for all intents and purposes it doesn't seem to matter? watching RAM in the hex editor, the same bit in the same byte gets picked.
Re: posting code in hopes of peer reviews.
by on (#238676)
No, BPL loops once more than BNE. Unless you still have the CPY+@skip inside of the loop (as in your old code version)?
Then yes, then the @skip would exit upon Y=00 no matter of BPL or BNE (or then you could even use non-conditional JMP @loop).
I would move the @skip stuff outside of the loop, and use BNE at loop end:
Code:
Prepare_32byteBitwise:
 sta  myTemp    ;-memorize value with lower 3bit
 lsr            ;\
 lsr            ; get the 5 most significant bits
 lsr            ;
 tax            ;/
 ldy  #1        ;\init bitmask
 sty  myTemp1   ;/
 lda  myTemp    ;\
 and  #$7       ; isolate lower 3bit and skip if zero
 beq  @skip     ;/
 tay            ;\
@loop:          ;
 asl  myTemp1   ; shifts bitmask up
 dey            ;
 bne  @loop     ;/
@skip:
 rts
Or, replace the whole loop by a single table look-up:
Code:
Prepare_32byteBitwise:
 sta  myTemp    ;-memorize value with lower 3bit
 lsr            ;\
 lsr            ; get the 5 most significant bits
 lsr            ;
 tax            ;/
 lda  myTemp    ;\
 and  #$7       ; isolate lower 3bit and skip if zero
 tay            ; and look-up bitmask table
 lda  @table,y  ;
 sta  myTemp1   ;/
 rts
@table:
 db   1,2,4,8,16,32,64,128   ;(or "dcb" or "defb" or whatever your assembler wants for defining data bytes)
Re: posting code in hopes of peer reviews.
by on (#238677)
i think if the incoming ID to the subroutine is #0 or otherwise has no modulo left from AND #7, a BNE would have the Y-loop wrap around for 255 more iterations.

on the other hand, using bpl, an incoming ID of #7 can push the bitmask into carry, missing every 8th trigger bit.

so this:
@loop:
beq @skip ;true conditional
asl myTemp1
dey
bne @loop ;could be jmp since beq will get us out either way
@skip

seems necessary, after all, short of using the LUT.

edit: Nope, that pushes the problem forward. Not sure what i do wrong here and will try to examine it more carefully at some point, but i think i'll simply resign to using the LUT for now.


All screen triggers + item triggers seem to work just fine now, except for the few rooms that have the teleporting bug which seems to be an isolated issue.
Re: posting code in hopes of peer reviews.
by on (#238679)
FrankenGraphics wrote:
i think if the incoming ID to the subroutine is #0 or otherwise has no modulo left from AND #7, a BNE would have the Y-loop wrap around for 255 more iterations.

Yes. But to prevent that you do have the CPY+BEQ @skip in your code.
Looking at your current example, you have only BEQ, without CPY? That looks wrong!
Or alternately, use the AND+BEQ @skip from my example (AND affects zeroflag too, so you don't need CPY).

You need to pre-check for Y=00 only once before the loop.
Doing it inside of the loop would be useless and slow (instead you can just rely on DEY+BNE at the loop end).

If the table method works is fine, too. And it's faster. Though you'll probably need to get familar with loops sooner or later, too : )
Re: posting code in hopes of peer reviews.
by on (#238685)
FrankenGraphics wrote:
i think if the incoming ID to the subroutine is #0 or otherwise has no modulo left from AND #7, a BNE would have the Y-loop wrap around for 255 more iterations. [


There are two approaches one can take here: plan on the loop running 1-8 times, or handle the zero-case specially and then have it loop 1-7 if the value wasn't zero.

Unless code space is at an absolute premium, the approach of putting a value 0-7 into X and the value with the bit to be tested in A, and then doing:
Code:
    and bitTable,x

will offer an excellent size/speed trade-off since the table will only be eight bytes, and the code to access it three, for a total of 11. If one wants to test the value of bit X of the accumulator without a table, the loop:
Code:
@lp:
    lsr
    dex
    bpl @lp

will do the job in four bytes, leaving the result in C. The result could be moved into the Z and N flags via "ror" instruction if desired. If one wants to convert a value 0-7 into a bit mask in A, the table approach would be 11 bytes, or one could use:
Code:
    lda #0
    sec
@lp:
    rol
    dex
    bpl @lp

If one wanted the bit mask in a location which had been initialized to 1, without disturbing A, one could use:
Code:
    dex
    bmi @done
@lp:
    asl temp
    dex
    bpl @lp
@done:

noting that "dex" is a byte cheaper than "cpx #0". If it's neccessary to have X end up at zero, one could use:
Code:
    cpx #0
    beq @done
@lp:
    asl temp
    dex
    bne @lp
@done:

but note that the code there is 8-9 bytes (depending upon whether "tmp" is in zero page) and unless "temp" needed to be initialized to 1 for some other reason, the 11-byte table approach could be more efficient.