This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019 Download

Elapsed clock timer technique

Elapsed clock timer technique
by on (#30087)
I've finally gotten the ultimate clock timing code working on the NES. With it, you can find out how many clocks a piece of code took, only executing it once. It uses the DMC as a timebase, and can measure from 0 to around 3380 clocks. nes_code_timer.zip (ca65 source only)

For example, this measures how long sprite DMA takes:
Code:
jsr time_code_begin
lda #1
sta $4014
jsr time_code_end
jsr print_dec_xa     ; prints 520

I've found some interesting things with it:

* Sprite DMA takes 513/514 clocks, depending on whether it's started on an even or odd clock.

* DMC DMA read normally takes 4 clocks, but only 3 if a memory write is occurring at a critical moment (the write probably cuts the DMA read short).

* Starting a DMC sample can add 3 or 4 clocks if the sample byte needs to be read immediately. It seems to depend on whether the write is on an even or odd clock.

It's also a convenient way to time a short routine (assuming you can quickly upload new code to your NES, that is). Don't expect it to work on emulators yet, as it probably depends on some aspects of the DMC not yet fully documented.

The timing technique is fairly simple. It is composed of two parts: synchronization and elapsed time determination.

Synchronization: Let's say there's a digital clock that only shows minutes (no seconds), and you want to be looking at it right when the minutes change, but you can only look at it for a second at a time, and must wait at least 10 seconds between checks. How do you efficiently arrange it so you're looking at it the moment the minutes change? First, make a note of the current time, then check it every 10 seconds until the time has changed sometime between checks. At that point, begin checking it every 59 seconds. Within ten minutes you will be observing it the moment the minutes change.
Code:
Check   ------C----------C----------C----------C
Minutes ---M-----------M-----------M-----------M


Elapsed time: Now that you looked at the clock the moment the minutes changed, you can go do some short task that takes under a minute, then come back to the clock and determine how long the task took, to an accuracy of one second, even though you can only look at the clock no more often than once every 10 seconds. How? Check the clock once every 59 seconds until the minutes change as you're watching. The number of times you check is the number of seconds the task took.
Code:
Check   -------C----------C----------C----------C----------C
Minutes ---M-----------M-----------M-----------M-----------M


The key concept here is that even though we can't check every second, we can effectively check every second by checking every 59 seconds, since the 59th second is equivalent to the previous second, relative to a change in minutes.
Re: Elapsed clock timer technique
by on (#30103)
Interesting discoveries. A few questions just to make 100% sure I'm on the same page.

Quote:
Sprite DMA takes 513/514 clocks, depending on whether it's started on an even or odd clock.

Would that be 514 clocks if the $4014 write-cycle is on an odd clock, and the usual 513 clocks otherwise?

Quote:
* DMC DMA read normally takes 4 clocks, but only 3 if a memory write is occurring at a critical moment (the write probably cuts the DMA read short).

Any idea on which clock(s) are the deciding factor, or is it too early to say?

Quote:
* Starting a DMC sample can add 3 or 4 clocks if the sample byte needs to be read immediately. It seems to depend on whether the write is on an even or odd clock.

So a write to $4015 that turns ON an unbuffered DMC channel will execute for at least 6 clocks, meaning 3/4 clocks for some preparation work and 3/4 clocks for the actual DMA? Or am I misunderstanding and these clocks are just the result of a possible DMA occuring right away?

by on (#30104)
Quote:
Would that be 514 clocks if the $4014 write-cycle is on an odd clock, and the usual 513 clocks otherwise?


I'm not blargg, but for my best, 512 cycles to transfer data, plus 1 cycle. An extra cycle is added if we're on an odd cycle.

by on (#30127)
Wow! This is a pretty clever idea!

Say a routine takes 12 seconds. At the very moment you're done with that routine, check the clock. 59 seconds later, check it again, and keep doing so every 59 seconds until the clock changes. It will change after 12 checks:

12 + 59 = 11
11 + 59 = 10
10 + 59 = 9
...
1 + 59 = 0

Is this what you're describing? But I am confused about one thing: You say check the clock every 59 seconds, and only look for a second. Wouldn't that actually make the durations each 1 minute (59 + one second = 60), and because of that you'd be checking forever?

by on (#30132)
Quote:
Would that be 514 clocks if the $4014 write-cycle is on an odd clock, and the usual 513 clocks otherwise?

That's what it seems. Basically I get one timing if I have a delay of N before it, the other timing if I increase that to N+1. The question is, what defines even and odd?

Quote:
Quote:
* DMC DMA read normally takes 4 clocks, but only 3 if a memory write is occurring at a critical moment (the write probably cuts the DMA read short).
Any idea on which clock(s) are the deciding factor, or is it too early to say?

Not sure yet. I think it's only one clock of the four, not just having the CPU write fall on any of them. Don't take any of the findings as certain, just some interesting preliminary findings.

Quote:
Say a routine takes 12 seconds. At the very moment you're done with that routine, check the clock. 59 seconds later, check it again, and keep doing so every 59 seconds until the clock changes. It will change after 12 checks: [...] Is this what you're describing?

Yes. I guess I left out the most important thing in my description: the motivation for it. If the CPU could check something every clock, then it could just wait until some periodic event occurred, run the code to be timed, then see how many clocks until the event occurs again, and subtract that from its period. But the CPU can only check every 7 clocks at best (load absolute, branch), so this alternate approach is necessary.

Quote:
You say check the clock every 59 seconds, and only look for a second. Wouldn't that actually make the durations each 1 minute (59 + one second = 60), and because of that you'd be checking forever?

"Do X every T" means that the time between the beginnings of Xn and Xn+1 is T, not that the time between the end of Xn and the beginning of Xn+1 is T, though that's what the computer is told in this case (delay loop between checks, which takes into account the time required to check).

by on (#30138)
Well, the pAPU has an even/odd issue related with cycle counting. Perhaps there's a common even/odd, since they're on 401X..? Just guessing...

by on (#30139)
Quote:
Well, the pAPU has an even/odd issue related with cycle counting. Perhaps there's a common even/odd, since they're on 401X..? Just guessing...

The more I think about this, the more likely it seems. The DMC inserts wait states while it accesses memory, so it would make sense to use the same logic for sprite DMA, and put in the APU's I/O space. The even/odd timing for $4014 was always synchronized the same with the DMC timer as well. So, what happens when sprite DMA and DMC DMA occur at the same time... apparently it only adds 2 clocks in this case, rather than the usual 4. When I do more detailed testing, I'll probably find that in these cases the DMC DMA is getting a junk byte.

by on (#31180)
I just wrote another timing routine which uses the APU length counters as a timebase. This allows timing a piece of code that takes up to around 2 seconds, to an accuracy of 16 clocks. If anyone's interested I'll post the code. I figure it could be useful for timing routines, finding the minimum, average, and maximum time taken.

by on (#31186)
My understanding about how DMA units work (using my limited SNES knowledge as a basis) is as follows:

1. Halt the CPU
2. Synchronize with the DMA unit's own clock
3. Perform the transfer
4. Synchronize with the CPU's clock
5. Allow the CPU to resume

Suppose the DMA unit runs at 1/2 the CPU clock speed (that is, 1 DMA clock = 2 CPU clocks). Also, suppose the beginning of each DMA cycle occurs halfway into a CPU cycle:
Code:
+---+---+---+---+---+---+
|   |   |   |   |   |   | <-- CPU clocks
+-+-+---+-+-+---+-+-+---+-+
  |       |       |       | <-- DMA clocks
  +-------+-------+-------+

When a DMA event occurs in this scenario, the CPU must first be halted. However, it halts in the middle of a DMA cycle (either 1/4 or 3/4 into the cycle). Thus, the DMA unit must wait until the cycle finishes. This means either 1/2 a CPU cycle elapses, or 1-1/2 cycles, depending on which CPU cycle the event occurred on. The DMA now performs its job, then when it finishes it waits until the next CPU cycle finishes before resuming. This delay is always 1/2 a CPU cycle, regardless of when the DMA starts/ends. Thus, the total latency (not including the actual DMA transfer time) is either 0.5 + 0.5 = 1 CPU cycle, or 1.5 + 0.5 = 2 CPU cycles.

On the NES, memory accesses occur when M2 is high. As I understand, this corresponds to the second half of a CPU cycle. Thus, if the DMA unit works as shown above, there would be two memory access periods per DMA cycle - one at the very beginning, and one at the halfway point of the cycle. This might explain why bunnyboy had problems with his PowerPak interfering with sprite DMA - if the memory read occurred too early in the DMA cycle, the address lines might've been unstable, thus causing the PowerPak to get confused.

Just a thought. No idea how accurate this might be.