This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019

Taken branch delays interrupt

by on (#63094)
I've discovered that a taken non-page-crossing branch ignores IRQ/NMI during its last clock, so that the next instruction executes before the interrupt is serviced. Any other instruction would service the interrupt before the next instruction in this case. This doesn't occur for a non-taken branch or for one that crosses a page. It also doesn't occur for JMP. The cpu_interrupts_v2 test on the Wiki now tests this behavior.

I encountered this while improving the new PPU synchronization scheme. I was using a HERE: BCC HERE wait loop for NMI, and was having my NMI occur later than expected. When I changed it back to JMP HERE, it worked fine. It made absolutely no sense, as I thought they were identical. I made sure there was no page crossing, that the carry flag wasn't being set, etc. and finally realized that its timing must actually be different. This behavior is probably already known in 6502 circles, maybe even here, but it was definitely news to me.

The test has an IRQ occur at each cycle within a test sequence, starting at some arbitrary point, and shows how many clocks delayed the IRQ was. T+ is how many clocks since the arbitrary starting point the IRQ was requested, and CK is how many clocks delayed it was, also relative to some arbitrary value. Only the relative values of these matter. PC is the saved PC of the next instruction that was on the stack within the IRQ handler, relative to some starting point. The example code has comments showing the offsets, so you can see where the IRQ was actually vectored.

The first three tests show nothing out of the ordinary, but the fourth does:
Code:
        nop
        ; 04
        jmp :+
        ; 07
:       nop
        ; 08
:       jmp :-

test_jmp
T+ CK PC
00 02 04 NOP
01 01 04
02 03 07 JMP
03 02 07
04 01 07
05 02 08 NOP
06 01 08
07 03 08 JMP
08 02 08
09 01 08

        clc
        ; 04
        bcs :+
        ; 06
        nop
        ; 07
:       lda $100
        ; 0A
:       jmp :-

test_branch_not_taken
T+ CK PC
00 02 04 CLC
01 01 04
02 02 06 BCS
03 01 06
04 02 07 NOP
05 01 07
06 04 0A LDA $100
07 03 0A
08 02 0A
09 01 0A JMP

        clc
        ; 0D
        bcc :+
        ; 0F
        nop
        ; 00
:       lda $100
        ; 03
:       jmp :-

test_branch_taken_pagecross
T+ CK PC
00 02 0D CLC
01 01 0D
02 04 00 BCC
03 03 00
04 02 00
05 01 00
06 04 03 LDA $100
07 03 03
08 02 03
09 01 03

        clc
        ; 04
        bcc :+
        ; 06
        nop
        ; 07
:       lda $100
        ; 0A
:       jmp :-

test_branch_taken
T+ CK PC
00 02 04 CLC
01 01 04
02 03 07 BCC
03 02 07
04 05 0A LDA $100 *** This is the special case
05 04 0A
06 03 0A
07 02 0A
08 01 0A
09 03 0A JMP


The timing looks similar to the NOT taken branch. Note how the IRQ being requested during the last cycle of the BCC doesn't cause an IRQ immediately after (07), but rather after the LDA (0A). So you get a 5-cycle delay for this case, even though there are no 5-cycle instructions in the test sequence.
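
The tables above can be reproduced with a very small model. The following Python sketch is not the test ROM's code; the function name `irq_table` and the tuple layout are made up for illustration. It assumes each executed instruction is described by its cycle count, the PC that would be saved if the IRQ were taken right after it, and a flag marking a taken non-page-crossing branch, whose last clock is assumed to push the IRQ past the following instruction. Two such branches in a row are not modeled.

```python
def irq_table(seq, rows):
    """Return (T+, CK, PC) rows for an IRQ requested at each clock.

    seq: list of (cycles, pc_after, defers_last_cycle) tuples, one per
    executed instruction, in execution order (hypothetical layout).
    """
    # Expand the sequence so every clock knows which instruction owns it.
    owner = []  # owner[t] = (index into seq, clocks left including this one)
    for i, (cycles, _, _) in enumerate(seq):
        for k in range(cycles):
            owner.append((i, cycles - k))

    table = []
    for t in range(rows):
        i, left = owner[t]
        _, pc_after, defers = seq[i]
        if defers and left == 1:
            # Last clock of a taken, non-page-crossing branch: the IRQ is
            # not acted on until the next instruction has finished.
            next_cycles, next_pc, _ = seq[i + 1]
            table.append((t, left + next_cycles, next_pc))
        else:
            table.append((t, left, pc_after))
    return table

# test_branch_taken from the listing above: CLC, BCC (taken), LDA $100, JMP
TAKEN = [(2, 0x04, False), (3, 0x07, True), (4, 0x0A, False), (3, 0x0A, False)]

for t, ck, pc in irq_table(TAKEN, 10):
    print(f"{t:02X} {ck:02X} {pc:02X}")
```

With the flag set to False for every entry, the same function gives the ordinary countdown seen in test_jmp, which is what makes the 05 row stand out.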

by on (#63538)
I've since realized that this effectively increases the number of cycles the next instruction takes. It behaves just as if the taken non-page-crossing branch were a two-cycle instruction and the instruction branched to were one cycle longer. This means that if the instruction branched to is a ROL $1234,X (7 cycles), interrupts will be delayed longer than you thought possible: when calculating maximum interrupt latency, you must treat the longest instruction as 8 cycles rather than 7. This is very significant when doing critical timing, and makes me wonder whether the 6502 in general suffers from it as well, and not just the NES CPU.
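
To put numbers on the latency argument, here is a minimal Python sketch. It only restates the model described in this post; `worst_case_delay` and `LONGEST_INSTRUCTION` are names invented for the example, and the count covers only the clocks until the running code is interrupted, not the interrupt sequence itself.

```python
LONGEST_INSTRUCTION = 7  # cycles, e.g. ROL $1234,X (absolute,X read-modify-write)

def worst_case_delay(target_cycles, after_taken_branch):
    """Clocks from the IRQ request until the running code is interrupted.

    target_cycles: length of the instruction that must finish first.
    after_taken_branch: True if that instruction is reached by a taken,
    non-page-crossing branch whose last clock can swallow the request.
    """
    # Normally the worst case is a request on the instruction's first clock.
    # After such a branch, the branch's final clock is added in front.
    return target_cycles + (1 if after_taken_branch else 0)

print(worst_case_delay(LONGEST_INSTRUCTION, False))  # 7
print(worst_case_delay(LONGEST_INSTRUCTION, True))   # 8
print(worst_case_delay(4, True))                     # 5, the LDA $100 case
```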

by on (#63547)
Oh, I guess this makes sense, although it's weird. The 3rd cycle (the one that adds the 2nd fetched byte to PC) is considered part of the next instruction. But does this also apply to branches that cross pages?

You should ask this question at 6502.org, I think.

by on (#63557)
A page-crossing taken branch doesn't have this oddity; it acts like a normal 4-cycle instruction. See timing results in first post. Apparently it only applies to taken non-page-crossing branches.
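
For emulator authors, the three branch cases can be told apart as in the Python sketch below. The function name `branch_timing` and the example addresses are assumptions made for illustration (the listings in the first post only show the low byte of each address); the cycle counts of 2, 3 and 4 are the ones that appear in the test output.

```python
def branch_timing(branch_addr, offset, taken):
    """Return (cycles, defers_irq) for a 6502 relative branch.

    branch_addr: address of the branch opcode.
    offset: signed 8-bit displacement (-128..127).
    taken: whether the branch condition is true.
    """
    next_pc = (branch_addr + 2) & 0xFFFF   # PC after the 2-byte branch
    if not taken:
        return 2, False                    # falls through, normal timing
    target = (next_pc + offset) & 0xFFFF
    if (target & 0xFF00) != (next_pc & 0xFF00):
        return 4, False                    # page crossed, normal timing
    return 3, True                         # taken, same page: the special case

print(branch_timing(0x8004, 1, False))  # (2, False)
print(branch_timing(0x8004, 1, True))   # (3, True)
print(branch_timing(0x80FD, 1, True))   # (4, False)
```

The page comparison is made against the address of the instruction following the branch, which matches the page-crossing listing where the branch ends at 0F and its target is at 00 of the next page.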

by on (#63589)
- What should the correct output be for test 4-nmi_and_dma? I don't know of an emulator that passes it.

by on (#63616)
I updated the cpu_interrupts_v2 test to include the correct output, and also renamed 4-nmi_and_dma to 4-irq_and_dma, since it wasn't NMI that it was testing. If you have further questions about this test, start a new thread, since 4-irq_and_dma isn't related to this branch timing issue (5-branch_delays_irq is the one that is).