So how do you accomplish this on the NES? Well, the sprites are drawn in a prioritized order from the OAM (Object Attribute Memory). Whichever sprite comes first in the OAM will be drawn on top of the ones after. The simplest way to achieve this is to keep a sorted list of all the objects, sorted by their Y coordinate. Then I can copy the sprites to the OAM buffer in this order every frame.
I settled with a simple insertion sort, since it is a very simple algorithm, and performance is very good on almost sorted lists. Since the objects don't warp around the map at high speed, chances are the order of the list won't change much from frame to frame. So I spent this week writing a special sorted buffer and an insertion sort routine. I say this week, since I ended up with some stupid little bug that had me debugging forever. I finally just deleted the routine and rewrote it, and bang! Now it works. Assembly can sure make the simple things crazy hard.
Now on to the next problem. I mentioned in my last post that the NES can only handle 8 sprites at once on a scanline. After that the sprites with lower priority simply disappear. Since I now have 15 characters potentially on the same scanline at the same time, I had to figure out how to handle that. I can't do a simple random order of all the sprites since that would undo all the work of sorting them in the first place. So what to do?
For now I have a simple solution based on the PPU sprite overflow flag. This flag is automatically raised by the PPU when it detects more than 8 sprites on a scanline. It's supposed to be buggy and unreliable, but I think it works OK so far. If this flag is raised, I cycle all objects in groups of four. That way, most overlaps stay fine, but some glitches might occur (and do). They are very small, however and are overshadowed by the flickering anyways.
Here is a little video showing 15 characters animated and walking around randomly on the screen. Youtube can only show 30fps, so the flickering looks a bit strange with the frame blending. Anyways, I think it looks OK.
My plan is to scan through the sorted list and find every occurrence of 5 or more objects that are closer than 32 pixels from each other in the Y direction. That means those objects will have to be cycled and I can leave the rest alone. That way I can minimize the errors in "Y ordering" and only sacrifice it for the ones that are really necessary.