Visual scoping operations for physical assembly

Planning is all in the head, right? In a paper published in CogSci (preprint out now), Judith Fan, Marcelo Mattar, David Kirsh and I looked into how people might exploit the way objects are physically arranged to plan better: https://arxiv.org/pdf/2106.05654.pdf

How can we think about planning? Say we want to get from La Jolla to Cathedral City. We can think of this as an abstract graph over states (being in a place) and actions (driving in a direction). [Figure: map_graph]

Planning then is just searching this graph for a path from the start node to the finish node. This is the classical view of planning, and indeed, that’s how machines like your GPS actually plan your routes.
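To make the graph view concrete, here is a minimal sketch of planning as graph search using breadth-first search. The intermediate places and the connections between them are assumptions made up for illustration, not real road data, and real route planners use weighted variants of this idea rather than plain breadth-first search.

```python
from collections import deque

# Illustrative road graph: nodes are places (states), edges are drives (actions).
# The intermediate places and links are assumptions for this sketch only.
ROADS = {
    "La Jolla": ["Escondido", "Oceanside"],
    "Oceanside": ["Temecula"],
    "Escondido": ["Temecula", "Julian"],
    "Julian": ["Borrego Springs"],
    "Borrego Springs": [],
    "Temecula": ["Banning"],
    "Banning": ["Palm Springs"],
    "Palm Springs": ["Cathedral City"],
    "Cathedral City": [],
}


def bfs_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    frontier = deque([[start]])  # queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None  # no route exists


route = bfs_path(ROADS, "La Jolla", "Cathedral City")
print(" -> ".join(route))
```

Every place the search visits costs time, which is why the size of the graph matters so much for how hard planning is.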

However, a map is not just an abstract graph, it is also a physical object that we interact with. We fold maps, or zoom in, or use our hands to bound a region of it. [Figure: map_affordances]

This carves out a region of space to focus on. How can doing that make planning easier?

We use the block construction task, in which you need to build a tower of a certain shape. [Figure: task]

We can make the same abstract planning graph here. The green nodes are goal states, the red ones are dead ends. [Figure: graph]

What happens when we bound off a regular region of the image of the tower environment? We have selected a region of the planning space, too! [Figure: scoping_1]

Now we only need to search in the marked out part of the planning space—a much easier problem than searching for a path along the entire graph!
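A toy sketch can show why searching a marked-out region is cheaper. Everything here is an assumption for illustration rather than the setup from the paper: the world is four columns wide, blocks are one unit tall and one or two columns wide, a state is a tuple of column heights, and the region split at column 2 is chosen by hand. This toy has no dead ends, unlike the real task.

```python
from collections import deque

TARGET = (2, 2, 2, 2)  # hypothetical target: desired height of each column
WIDTHS = (1, 2)        # hypothetical block widths; every block is 1 unit tall


def moves(heights, lo, hi):
    """Yield states reachable by placing one block inside columns [lo, hi)."""
    for w in WIDTHS:
        for x in range(lo, hi - w + 1):
            cols = range(x, x + w)
            base = heights[x]
            # The block must rest flat and must not overshoot the target.
            if all(heights[c] == base and base < TARGET[c] for c in cols):
                new = list(heights)
                for c in cols:
                    new[c] += 1
                yield tuple(new)


def search(start, lo, hi):
    """BFS until columns [lo, hi) match the target.

    Returns (final_state, number_of_states_expanded).
    """
    frontier = deque([start])
    seen = {start}
    expanded = 0
    while frontier:
        state = frontier.popleft()
        if state[lo:hi] == TARGET[lo:hi]:
            return state, expanded
        expanded += 1
        for nxt in moves(state, lo, hi):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None, expanded


empty = (0, 0, 0, 0)

# Without scoping: one search over the whole structure.
full_state, full_cost = search(empty, 0, 4)

# With scoping: build the left half, commit to it, then build the right half.
left_state, left_cost = search(empty, 0, 2)
scoped_state, right_cost = search(left_state, 2, 4)
scoped_cost = left_cost + right_cost

print("expanded without scoping:", full_cost)
print("expanded with scoping:   ", scoped_cost)
```

Both runs end with the same finished tower, but the scoped run only ever considers placements inside the current region, so each of its two searches covers a far smaller graph.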

After we have found a solution, we can focus on a different part of the structure. And because we have committed to what we have already built, the search starts anew on a much smaller graph. [Figure: scoping_1]

This is visual scoping: deciding which part of the problem to plan next by choosing which region of space to focus on next.

To investigate this, we ran a series of simulations in which AI agents built the towers either with or without visual scoping.

On the task presented here, visual scoping can halve the time spent searching for actions and double the rate of success! [Figure: results_1]

However, while visual scoping reduces how much time we need to search for actions, it takes additional time to perform visual scoping itself, because we also have to think about which region to select.

Visual scoping therefore lets a planner trade off two costs: the cost of searching for actions and the cost of breaking the problem down. How much scoping is worthwhile depends on how important it is to the planner to reduce the time spent searching the state-action graph.

So this might help explain why people can solve difficult planning problems seemingly easily: we don’t just search across the abstract planning space, we can also use the actual physical environment we’re in to break the problem into more manageable chunks.

Find the paper here: https://arxiv.org/pdf/2106.05654.pdf, and come see my talk at CogSci2021.