How not to build a perception system pt 2

In pt 1, we looked at a very naive perception system built from first principles, where every agent independently checks all potential targets for visibility each frame. Today we’re going to look at building a system that is more performant. The key concept, is to move the code into a single, centralized system, where we can control how many perception checks are done each frame.

Why is this necessary? Well, if we do everything separately, we effectively end up with some pseudo-code like this:

For each Perceiver:
  GetAll EntitiesThatCanBeSeen
    For each EntityThatCanBeSeen:
      if Perceiver->WantsToSee(Entity) and
         Perceiver->IsEntityInViewCone(Entity) and
         RayCast from Perceiver to EntityThatCanBeSeen is clear then
        TargetIsVisible = true

A Perceiver here is an enemy agent or sensor and an EntityThatCanBeSeen is pretty much every character entity in the game, and potentially more, if you want your agents to be able to see things like mines or bombs etc. The WantsToSee function here, checks if the entity is in fact an enemy or a danger or something the agent cares about.

If we have 2 players and 8 NPCs, that gives 8 Perceivers and 10 EntitiesThatCanBeSeen, which means that inner loop is running 8 * (8+2) = 80 times every frame. If things scale up, and we have 8 players and 60 NPCs, we now run the inner loop 60 * (60+8) = 4080 times every frame! This is what we call O(N * M) scaling – ie the total number of checks is N (number of Perceivers) * M (number of EntitiesThatCanBeSeen). In our case, since every Player and NPC is a character, M (number of EntitiesThatCanBeSeen) is always bigger than N (number of Perceivers). And since the player count is small in relation to the number of NPCs, we can approximate this as O(N * (N + PlayerCount)), which is worse than O(N * N) – which in comp-sci talk is short-hand for “very bad performance”.

This may not be quite as dire as it seems at first glance. Perceiver->WantsToSee() is a cheap function – essentially just comparing 2 enums. Perceiver->IsEntityInViewCone() is another cheap function – just a dot-product and distance check. The real expensive function call is the RayCast – this has to check collisions in the physics system. And while CPUs and physics systems are constantly improving, this is still going to involve a lot more calculations than the other checks. And lastly, GetAll EntitiesThatCanBeSeen can potentially be another expensive function. If this function queries the world to build the list of entities each time, it’s going to be expensive.

Best Practices: When you have multiple tests that need to be performed, it is almost always a good idea to do the cheap tests first so that you can skip the expensive tests if they are not required.

So if we look at our examples above, if all the NPCs are enemies and only care about seeing the players (ie no player ally NPCs), we’re only actually going to be doing 8 * 2 = 16, or 60 * 8 = 480 ray casts. That is far more reasonable, but 480 ray casts per frame at 60fps is 28,800 ray casts per second — which is absolutely not fine on most games!

But in order to really get this under control, we need to build a single system to do the checks, rather than each agent checking independently. But why is this better?

Firstly, if we have a single, central system, we can register Perceivers and EntitiesThatCanBeSeen with the PerceptionSystem. This means we can completely eliminate the GetAll EntitiesThatCanBeSeen calls, as we simply add/remove Perceivers and EntitiesThatCanBeSeen as they are created and destroyed, and do not need to check every frame or multiple times every frame. That handles one of our potentially expensive functions.

Using a centralized system also allows us to control how many tests we do every frame. Doing the checks independently means we do not know how much time our perception checks are going to take each frame. The game may run fine with only a handful of agents, but because of the O(N*N) nature, adding more agents can cause the system to scale exponentially and you could easily go from happy 60 fps to crawling at <10 fps just when things get tough for the player and they really need that responsiveness! But by keeping track of which checks we have done, we can limit the number of checks we perform each frame and thereby control the performance impact and keep things running smoothly.

So our PerceptionSystem class would now look something like this:

class PerceptionSystem
  Array<Perceiver> Perceivers
  Array<Entity> EntitiesThatCanBeSeen
  const int MaxRayCastsPerFrame
  int PerceiverIndex
  int EntityToCheckIndex

func RegisterPerceiver(Perceiver):
  add Perceiver to Perceivers

func DeregisterPerceiver(Perceiver):
  //if we remove a Perceiver before PerceiverIndex, 
  //we need to adjust the Index in order to not skip 
  //a Perceiver after the array has been resized.
  if index of Perceiver in Perceivers < PerceiverIndex:
    PerceiverIndex -= 1
  remove Perceiver from Perceivers. 

func RegisterEntityThatCanBeSeen(Entity):
  add Entity to EntitiesThatCanBeSeen

func DeregisterEntityThatCanBeSeen(Entity):
  //if we remove an Entity before EntityToCheckIndex, 
  //we need to adjust the index in order to not skip 
  //an Entity after the array has been resized
  if index of Entity in EntitiesThatCanBeSeen < EntityToCheckIndex :
    EntityToCheckIndex -= 1
  remove Entity from EntitiesThatCanBeSeen

func Update:
  int NumRayCastsThisFrame = 0
  while PerceiverIndex < Perceivers.Num():
    Perceive = Perceivers[PerceiverIndex]
    while int EntityToCheckIndex < EntitiesThatCanBeSeen.Num()
      Entity = EntitiesThatCanBeSeen[EntityToCheckIndex]
      if Perceiver->WantsToSee(Entity) and
         Perceiver->IsEntityInViewCone(Entity):
        if RayCast from Perceiver to Entity is clear then
          Perceiver→SetTargetIsVisible = true
        else:
          Perceiver→SetTargetIsVisible = false
        NumRayCastsThisFrame += 1
        if NumRayCastsThisFrame ≥ MaxRayCastsPerFrame:
          return
        EntityToCheckIndex += 1
      //reset the index so the next Perceiver starts 
      //at the beginning of the Entity list
      EntityToCheckIndex = 0 
    PerceiverIndex += 1
  //Next frame restart from the beginning
  PerceiverIndex = 0

In this centralized system, MaxRayCastsPerFrame allows us to limit the expensive RayCast checks to a manageable limit that we can set. We will now have a consistent limit on how much CPU time our perception system can take each frame. The two indices, PerceiverIndex and EntityToCheckIndex, keep track of our place in the Perceivers and EntitiesThatCanBeSeen arrays, so that in the next Update, we can continue from where we left off. We effectively spread out the checks over as many frames as necessary to ensure all Perceivers check all Entities.

This does mean that we are introducing a bit of latency into our perception system. Instead of checking a specific Perceiver vs Entity every frame, it may take several frames for the system to cycle through until we check the same Perceiver vs Entity again. Considering that human visual reaction times are typically 100-300ms, running at 60fps (~16ms per frame), if it takes 6-8 frames to loop back around, that’s only about 100-130ms, which is well within human perceptual reaction times.

But that’s not the end of the story. Stay tuned for pt3, where we will go deeper and analyze some issues with this approach, including the latency issue just mentioned, and talk about how to improve it further!

The views, thoughts, and opinions expressed on this website are my own and do not reflect those of my employer or any affiliated organizations.

How not to build a perception system pt 2

Dev-Log 1