This page is a mirror of Tepples' nesdev forum mirror (URL TBD).
Last updated on Oct-18-2019 Download

# techniques for tracking world position?

For large scrolling games that include vertical scroll, how do you all handle keeping track of the world position of objects?

I was originally thinking I'd just use 2-byte values for position (worldX, worldY), and just track position relative to the whole large level. But I keep finding myself wanting/needing to divide by 240 when translating from world coordinates to screen coordinates, which is a pretty slow operation (unless somebody knows a math trick that I haven't thought of?)

I'm now wondering if I should instead think in terms of discrete screens, so I instead track which screen I'm on as well as X,Y position within that room. So each level would logically made up of many screens, and I only have to keep track of the player's position within the screen (which I believe is how metroid handles it?). It seems like this way I'll use a lot more ram tracking every object's screen as well as position within the screen, and have to do more bookkeeping dealing with moving between screens.

I'm curious if you folks with more experience have ideas of the pros and cons of both ways of handling it (or are there other techniques I haven't thought of?)
Dont allow Y > 239. It's a simple subtract 240 from low byte. No division needed.

Changed my mind.
Not using values 240-255 for the lower byte of the Y coordinate complicates collision between objects near the bottom of one "screen" and objects on the "screen" below it.

Tokumaru had an idea of making hardware screen coordinates separate in some way.
gauauu wrote:
But I keep finding myself wanting/needing to divide by 240 when translating from world coordinates to screen coordinates, which is a pretty slow operation (unless somebody knows a math trick that I haven't thought of?)

I used to have the exact same problem back when insisted on having fixed synchronization between world coordinates and screen coordinates, and I solved this by using dynamic synchronization instead.

I basically have a CameraY variable which's in world coordinates, and an NTCameraY variable in screen coordinates. Whenever the camera moves, both coordinates are updated by the same amount, and NTCameraY has some extra logic to skip values 240-255. All operations are then relative to these coordinates, and there's no hardcoded synchronization between the level and the screen. If for example I need to render a new column of metatiles, I'll use CameraY to calculate the column's source address in the level map, but I'll use NTCamaraY to calculate the target VRAM address for the column.

The link between world space and screen space is created by giving each one its own anchor, so that the relationship between everything else and the anchors is the same, regardless of the absolute position of each anchor. I hope I'm getting the idea across.

In theory you could do the same in the X axis and make world and screen spaces completely independent, but there's little reason to do that when the name table has such a nice width as 256 and you can save some time by using a single coordinate for both spaces.
dougeff wrote:
Dont allow Y > 239. It's a simple subtract 240 from low byte. No division needed.

Yeah, that was my thinking with option #2 that I mentioned above

tepples wrote:
Not using values 240-255 for the lower byte of the Y coordinate complicates collision between objects near the bottom of one "screen" and objects on the "screen" below it.

And that was one of the bookkeeping issues that I was referring to.

tokumaru wrote:
The link between world space and screen space is created by giving each one its own anchor, so that the relationship between everything else and the anchors is the same, regardless of the absolute position of each anchor. I hope I'm getting the idea across.

Absolutely. This is the solution I started mulling around in my head after posting this. I had gotten as far in my head as having a camera anchor, but hadn't worked out whether I needed a separate anchor for each object, or if they could somehow reuse the relative distance between the world camera and NT camera anchor. (Which really I guess is just a tradeoff of RAM vs CPU time)
Just for fun, here's my post from 2007 when I was trying to solve the same problem: viewtopic.php?f=2&t=3230

I'm kinda surprised to realize after all these years that the solution was given to me by someone else (dvdmth)... I always thought I had the idea myself!

gauauu wrote:
This is the solution I started mulling around in my head after posting this. I had gotten as far in my head as having a camera anchor, but hadn't worked out whether I needed a separate anchor for each object, or if they could somehow reuse the relative distance between the world camera and NT camera anchor. (Which really I guess is just a tradeoff of RAM vs CPU time)

You just need two anchors, one for everything dealing with the game world (objects, collisions, level map) and another one for dealing with the name tables (setting the scroll through \$2000/5 and updating VRAM through \$2006/7). The idea is that each anchor marks the same point (i.e. the top of the camera) in their own space, so a block that's, say, 5 blocks below the world space anchor, will be rendered 5 blocks below the NT space anchor. As long as the relative distance to the anchors is the same, you get all the synchronization you need.
I'd just LUT the division, it's such a common operation.
I don't see why you'd use a LUT for something you can have ready at all times at no performance cost, but that's just me.

Unless you don't understand the other possible solutions well enough, in which case I can understand a programmer using something they're more comfortable with even if it's not optimal. I'm definitely like that, sometimes people here (tepples, mostly) come up with crazy ass algorithms that are faster/smaller than those I can come up with, but I don't use anything I don't fully understand. At least when it comes to NES programming, with more modern stuff I can tolerate a moderate amount of blackboxing.
I use "logical" 256x256 pixels screens and separate screen logics when I do vertical scrolling. It's just a matter of updating both things (level position and screen position) as you scroll. That way you can use byte coordinates and rely on just a circular collision buffer (mine is 256 bytes big as metatiles are 16x16).

A simple situation, when the engine just scroll upwards "pixels" pixels, I just...

Code:
scroll_y -= pixels;
if (scroll_y < 0) scroll_y += 480;
scroll (0, scroll_y);

cam_pos -= pixels;
cam_pos_lsb = LSB (cam_pos);

cam_pos is world-coordinates. scroll_y is screen coordinates. To keep them synchronized, you just update both at the same time.
Division of a variable by a constant, in particular a divion of a variable by 240, is extremely simple and reasonably fast to do on a 6502.

I still think tokumaru's solution is better, but this is the second best.
Did someone write a 16-bit division by 240 routine? (I don't see one in the reference you linked, but it would probably be good to have one around.)
Do you mean this one...

http://6502org.wikidot.com/software-math-intdiv

(unrelated multiplication routine also here)
http://6502org.wikidot.com/software-math-intmul
dougeff wrote:
Do you mean this one...

No, I didn't mean generic division routines, I meant one optimized for division by a constant 240. Though, you can apply the technique on the wiki page bregalad linked, I was just wondering if someone had already worked one out for 240, like the others in Omegamatrix's thread (linked in that wiki article).
Just shift right by 4 then divide by 15.

Pro tip: Store y-values in fixed-point format with the lowest 4 bits as the fractional component. This simplifies a ton of nametable operations, among other things. For x-values, use a whole byte for the fractional component.
One practical problem with 12.4 fixed point, where 16 units represent one pixel, is that your objects' diameters are closer to 256. This means you have to maintain 16-bit precision throughout more of your collision detection.
pubby wrote:
Just shift right by 4 then divide by 15.

Ah, yes that's correct for divison!

Realizing I actually meant "modulo" and not division. Sometimes I forget that you don't necessarily get both from the same calculation. (Though there's probably a similar useful composition of modulos.)
For modulo I'd probably just use repeated subtraction (along with 12.4 fixed point)

Code:
.repeat 5, i
cmp #15 * (%10000 >> i)
bcc :+
sbc #15  * (%10000 >> i)
:
.endrepeat

Takes about 25-30 cycles.
Edit. Removed. It's too late for me to do math.

Edit 2.

I'm starting to think that the data should be 256x256 sized rooms (16x16 tiles). You will only be displaying 256x240 at most, but the data should be the full 256 (16 tiles high). Then, no math required, except to figure out how to shift each room.

I retract my earlier response, pending more contemplation.
The problem is that having "dead zones" in your level map complicates physics and collisions. You have to constantly account for that phantom row when calculating distances, moving objects, and so on. Not to mention it can restrict your usage of metatiles larger than 16x16 pixels, since they'll have part of them cut off when used at the bottom of a room. That's a world of little things I'd rather not worry about. Your solution does help with attribute handling though, I'll give you that!

Being a huge fan of the MVC model of software development, I consider it poor form to let hardware aspects influence the architecture of your game worlds as much as this (i.e. pretending parts of your level map don't exist). IMO, having 2 separate anchors, one for the model and another one for the view is a much classier and hardware-agnostic solution, but to each his own.
This all comes down to
How do you store your levels?
How do you need your objects to behave?
What is the game?

Do you care if an enemy resets once it goes off screen?
Do your levels mostly follow 1 direction but you swing out to the other direction, I.e Mario 3 is mostly L and R but you can go up and down, Radian is mostly U and D but you can scroll L and R a bit?
In those cases you can store all your entities in an array from L to R. Then you keep a Head and Tail index. And each ent def holds a number of chars till next ent.
so
Type 00,14,02,05
Dist 00,05,04,01
So this way you add active entities from the Head, and when Tail moves over one, you remove the entity from being active( unless your entities can move and follow you across screens, like a boo for example at which point you only cull them once they are "out of screen" where out of screen might be screen size + extra)
If you count Chars or Blocks or Screens is up to the game design.
Which is basically the Anchor system tokumaru mentions but you have a Left Anchor and Right Anchor portion to the "window" rather than just a start.

Once you have entities you keep them in Screen Space, and cull as needed. If you are moving Left then you spawn the new enemy off screen on the L, if you are moving R you spawn them off screen on the R, so you don't need to work out where they are, no maths For speed you can keep a OnScreen List and then a OffScreenButActiveList, the first gets full update and animation, the 2nd gets position only and no animation logic, you can also if your system handles it lower their update and collision rate to keep the frame rate rolling.

For something like Gauntlet that won't work, and you start to get into a Quad Tree like activation zone tables, or you carve up things in to large blocks and then add to a active block list, then you keep indexes into the position of those blocks, updating the list as you cross block boundaries.

Or you go with "volumes"/"zones" where you place down some special chars, or an AABB list that you can quickly detect against. Then you put triggers on the zones, so when the player enters the Zone, you spawn A,B,C and X,Y,Z etc then as the player moves around they enter the volumes/zones and that controls what objects are made. Used in things like RPGs where you have doors and towns and save points etc
rainwarrior wrote:
pubby wrote:
Just shift right by 4 then divide by 15.

Ah, yes that's correct for divison!

Yup, and if your Y coord doesn't go above 4096, then you can use a simple LUT for the div-by-15. Otherwise a binary search or the wiki's decomposition.
Quote:
The problem is that having "dead zones" in your level map

I think you misunderstood me. In my concept of 256x256 'rooms'...that is just how the data is stored. The screen is only 256x240, so if you're in the top most of the top 'room' the bottom is cut off. But if you start scrolling down, you see it. Nothing is 'dead zone'.

I'm thinking of a top-down all-direction game like Crystalis.

I almost feel like slapping together a demo, but I don't have time.
Sorry, I completely misread what you wrote. I agree with you on levels built of 256x256-pixel screens/blocks/whatever, but then we're back to the original problem of finding the equivalent position of a block from the level map in screen space. The problem of "figuring out how to shift each room" is what can be solved either by a division by 240 (or 15) or the use of 2 synchronized anchors for the camera.
Quote:
Not to mention it can restrict your usage of metatiles larger than 16x16 pixels, since they'll have part of them cut off when used at the bottom of a room.

Actually Castlevania III does that with it's vertical scrolling screen. It seems to work fine. Mega Man games does that too but the verticall scroll is only "quick" scroll without gameplay.

Quote:
Did someone write a 16-bit division by 240 routine? (I don't see one in the reference you linked, but it would probably be good to have one around.)

I was going to write something, but then I undersood it would be better if whoever whants to implement that actually understood everything. In particular, it depends on many screen heigh you have in total. When supporting a heigh of up to 16 screens for instance, doing modulo is as simple as substracting 240*8, 240*4, 240*2 and 240 to a 16-bit integer whenever doing so will not result in a negative number. Quite simple really.

The trick to first shift right by 4 is clever, I didn't think about that. Does it work for modulo too, or only for division ? If the initial value fits in 12-bits then it's definitely a must, as it will save the substracting the hassle of doing it on 16-bit values, and doing it on a value that fits in the A register will lead to much shorter/faster simpler code. However, if the scroll values do not fit in 12 bit anyway, it's not worth it as it will not simplify the division operation itself.
You can do mod 240 by shifting right by 4, reducing mod 15, shifting left by 4, and taking the 4 bits from the original value.

Or assuming Python 3 division semantics, where // rounds toward negative infinity:
Code:
def mod_240(x):
return (x % 16) + ((x // 16) % 15) * 16

Or in 6502 (untested):

Code:
.proc get_camera_y_mod_240
tmp0 = \$0000

; Load camera_y bits 11-4 into A
lda camera_y_lo
asl a
sta tmp0
lda camera_y_hi
rol a
asl tmp0
rol a
asl tmp0
rol a
asl tmp0
rol a

cmp #240
bcc lt240
sbc #240
bcs not30
lt240:

cmp #120
bcc lt120
sbc #120
lt120:

cmp #60
bcc lt60
sbc #60
lt120:

cmp #30
bcc lt30
sbc #30
lt30:

cmp #15
bcc lt15
sbc #15
lt15:

; Now that we're under \$0F, shift that to the high nibble
asl a
asl a
asl a
asl a
eor camera_y_lo
and #\$F0  ; keep upper 4 bits; take lower 4bits from camera_y_lo
eor camera_y_lo

rts
.endproc

But with all the shifting I'm not sure if this'd be faster than a mod 240 that uses 16-bit arithmetic.
There's a really good algorithm in the thread tokumaru already linked. I'm using a technique very similar to it in my own engine, and as tokumaru also points out in that thread, it's not really bad to just call that once per frame (which makes my code feel more solid than trying to constantly sync up two variables).

I still do keep pixel scrolling and nametable variables separately, of course. It helps that my collision map is stored completely independent from my nametable map (even if they seem very closely related), as 2-bit values stored in groups of 16x16 tiles (= 256x256px). I pretend like the missing rows don't exist for everything other than loading in nametables, and it's working fine for me.
Sumez wrote:
There's a really good algorithm in the thread tokumaru already linked. I'm using a technique very similar to it in my own engine, and as tokumaru also points out in that thread, it's not really bad to just call that once per frame (which makes my code feel more solid than trying to constantly sync up two variables).

So basically, this thread is an exact duplicate of that thread, everything that has been discussed here had already been discussed there, and this was a complete waste of time of ours. Why don't people use the search function.
Bregalad wrote:
So basically, this thread is an exact duplicate of that thread, everything that has been discussed here had already been discussed there, and this was a complete waste of time of ours. Why don't people use the search function.

I attempted a search but didn't find anything. Probably more attempts with different search terms would have found it eventually, but I had no way of knowing whether or not such a thread existed.

I don't think it's really a waste of time to have a similar discussion 10 years later...
Agreed, this is an extremely relevant issue for any games wanting to do vertical scrolling on the NES, so it shouldn't necessarily be hidden away in a 10 year old archive. In fact this is a subject that could easily warrant an entire wiki article.

The title of the old thread doesn't exactly make it easy to find either. You need to pinpoint the exact problem (which can be really hard to envision until you're already knee deep) and figure division by 15 is actually a solution for that problem (which works for me, but the repeated suggestion of synchornizing two individual values is just as good to me).
Sumez wrote:
Agreed, this is an extremely relevant issue for any games wanting to do vertical scrolling on the NES, so it shouldn't necessarily be hidden away in a 10 year old archive.

As opposed to the WWWthreads board datings from before the 2004 switch, it's not an archive, it's the same message board as here.

Quote:
The title of the old thread doesn't exactly make it easy to find either.

Indeed, neither is the title of this very thread.

Quote:
I attempted a search but didn't find anything. Probably more attempts with different search terms would have found it eventually, but I had no way of knowing whether or not such a thread existed.

Alas, the search function is indeed weak, one single misspellings and the results collapse to none.
Bregalad wrote:
So basically, this thread is an exact duplicate of that thread, everything that has been discussed here had already been discussed there, and this was a complete waste of time of ours. Why don't people use the search function.

Well, although the underlying problem was the same for both threads, the first question was very different ("how do I divide by 240" vs "pros and cons of different methods of handling vertical position") Both ended up getting into the same answers, but unless you know they're going to get to the same answers, they are both legitimate separate questions.

Quote:
Agreed, this is an extremely relevant issue for any games wanting to do vertical scrolling on the NES, so it shouldn't necessarily be hidden away in a 10 year old archive. In fact this is a subject that could easily warrant an entire wiki article.

It seems like a common enough problem that an article would make sense.
You can probably use a LUT of where each row of tiles maps into VRAM.
It's true that a LUT is the simplest and fastest solution possible, since you wouldn't even have to manually wrap around any values, but when you have levels up to 128 "rooms" tall, like my engine supports, you'd need a 4KB LUT. Not terrible if you have 256 or 512KB of total PRG-ROM, but still a significant amount to worry about depending on the type of bankswitching you're using and when you need that table to be mapped in.

EDIT: Actually, now that I think of it, a LUT wouldn't work for me, because my engine can do vertically looping levels, which causes the level map to repeat while the screen coordinates can keep going indefinitely.
I just noticed your divide by 240 code from 2007. That's some beautifully written code.
My code? Thanks, I have no idea how I came up with it!

EDIT: I think it's based on the algorithm in slide 5 of this lecture, but with a few optimizations (no restoring because of preliminary CMP, no initial shifting because I'm using half of the divisor).
Quote:
EDIT: Actually, now that I think of it, a LUT wouldn't work for me, because my engine can do vertically looping levels, which causes the level map to repeat while the screen coordinates can keep going indefinitely.

Division wouldn't work either, as the VScroll value will eventually warp around. This is a further proof that having a separate counter is a nice idea (even though division would work for non-looping levels, if you know the maximum size in advance).
I guess you're right. At first I thought that I could safely wrap around at 61440, which's a multiple of 240 and 256 and just past what my division routine from back in the day could handle, but now that I think of it, that would screw up the repetition of the level map since this value is not a power of 2.

Bregalad wrote:
This is a further proof that having a separate counter is a nice idea

Well, I just love this approach... The only drawback I can see is that for everything you need to find the NT position of, you need to calculate the distance between that thing and the level anchor, then add that difference to the screen anchor. Not a big deal when rendering rows and columns of tiles for scrolling, since the offsets from the anchor are fixed (i.e. a row of tiles at the bottom of the screen always starts at anchor + 240, the offset is constant), but if you needed to destroy a block or something in the middle of the screen, you'd need to calculate the offset first.

Dvisions would certainly be slower, because you'd need a different division for each thing you needed the position of, in addition to the divisions needed to find the scroll and the target address for new columns/rows of tiles. You could easily end up with 4 or 5 divisions in a single frame.
All the more reason for keeping your background graphic data completely separated from everything else in your game. You shouldn't need to do this synchronization thing for anything other than scrolling your nametables. Of course, that's easier said than done when you don't have a lot of PRG-ROM space to work with, and I know a lot of people keep the collision data as a part of their nametable data, but I'm glad I decided to split up the two quite early in my project.

My biggest reason for always syncing the "nametable scroll" to the "logical scroll" value via division by 15 is that I want to be able to "spawn" the stage at any given coordinate, so I need this routine anyway, and since it's still only called once per frame as a worst case scenario, there's no real loss for only using that. Keeps my code nice and tidy.
Sumez wrote:
You shouldn't need to do this synchronization thing for anything other than scrolling your nametables.

How do you propose we update the middle of a name table after destroying​ a block or collecting an item then? Characters interact with the level map and items in level space, so that's what you have, and when you need to change or erase them from the name tables you need to convert their positions into screen space, don't you?

Quote:
My biggest reason for always syncing the "nametable scroll" to the "logical scroll" value via division by 15 is that I want to be able to "spawn" the stage at any given coordinate

Totally doable with the 2 anchors approach. In fact, since there's no hardcoded relationship between the 2 separate spaces, you can start the primary anchor anywhere you want in the level, and the secondary anchor anywhere in the name tables, only the lower bits have to match, for obvious reasons. I always initialize my secondary anchor (NTCameraY) to CameraY & 15 (since my meta tiles are 16x16 pixels), no division necessary. Just start on the first NT row every time and it works just fine.

Quote:
so I need this routine anyway, and since it's still only called once per frame as a worst case scenario, there's no real loss for only using that. Keeps my code nice and tidy.

If you think you need the division and you somehow find a division tidier than an addition (to keep 2 anchors synchronized), that's your call, but I honestly think you haven't analyzed the alternate approach in depth yet, and are simply more comfortable with what you already have.
Well, it's purely situational. I won't say there's ever a correct or incorrect way to do it. I actually started out trying to update two variables side by side like you are describing, but I ended up having to do a lot of weird fixes throughout my code because everything was aligned to the nametable data (as in your example where parts of the map are destructible). That's not saying it's a bad approach at all, much more likely my ASM code was just too clumsy to begin with.

What you said confuses me. How does keeping two separate variables updated every time you change the Y scroll give you an advantage when working with logic that's tied to nametable addesses (destructible map tiles), compared to updating the "NTCameraY" once per frame through division?
As far as I see, the end result is the same (ie. the rest of my logic doesn't have to know how NTCameraY was generated), but I think I'm probably misunderstanding how your solution works, which would also explain why I had trouble implementing it, when you didn't.
Sumez wrote:
Well, it's purely situational. I won't say there's ever a correct or incorrect way to do it.

That's true. We usually give a lot of thought to these problems and design everything around the decisions we make, so we can't simply change our methods because someone said something else works better.

Quote:
What you said confuses me. How does keeping two separate variables updated every time you change the Y scroll give you an advantage when working with logic that's tied to nametable addesses (destructible map tiles), compared to updating the "NTCameraY" once per frame through division?

Are you saying that the only difference is the division itself, and from then on everything works the same for both approaches? If so, I agree with you, and the advantage is that you only need and addition to update NTCameraY, instead of a division.

But it's also possible that someone chooses to use multiple divisions, one for each coordinate that needs converting, as opposed to doing things relative to the anchors.
tokumaru wrote:
Sumez wrote:
You shouldn't need to do this synchronization thing for anything other than scrolling your nametables.

How do you propose we update the middle of a name table after destroying​ a block or collecting an item then?

Unless you're routinely doing huge set pieces where you animate huge parts of the background map, destruction should be less common than scrolling, making slightly slower execution of a destruction acceptable in a soft real time[1] system. Large background animations tend to be more for the collapse phase of a falling block puzzle game.

[1] Wikipedia's article defines "soft real time" as "the usefulness of a result degrades after its deadline, thereby degrading the system's quality of service."
I think you should have a register holding the y-position of the top of the name tables, and find breakable blocks relative to it.