PredRecon: 3d Reconstruction by Purposeful Movement Based on Surface Prediction

June 4, 2023

This article studies the RAPTOR motion planner which improves a local planning for better risk-awareness and active exploration for the navigation in unkown environments.

What is this article about

Today, we will cover this (opens in a new tab) ICRA2023 paper: PredRecon: A Prediction-boosted Planning Framework for Fast and High-quality Autonomous Aerial Reconstruction. Before start, let us you have a 3d scanner, and you want to have 3d mesh model of a chair.

Here is what a human moves to full scanning:

We know how a chair looks like. This will somewhat guide our scanning movement.
We keep track of covered and uncovered region by watching the real-time 3d scanning result. Of course, we move toward the uncovered region to choose the next viewpoint.

If you can understand the above, you might not have much difficulty in understanding this paper (although there are many technical terminologies, you can just study them)

Knowledge helpful for understanding

PCN (opens in a new tab) point completion network
Hierarchical path planning: 2 step planner = global + local
PointNet backbone (opens in a new tab)
Huber loss (opens in a new tab)
Chamfer distance (opens in a new tab) (measure of reconstruction perforamance)
GHPR (opens in a new tab): hiddle point removal from pointcloud
Traveling salesman problem (opens in a new tab)
B-spline representation
Dijkstra algorithm (the way this paper used this is very similar to my previous paper (opens in a new tab))

Structure of the paper

This paper contains two topics: 1) surface prediction module (SPM) and 2) viewpoint planning for a good MVS (multi-view stereo) performance. (here, MVS was used as an synonym for 3d reconstruction)

Section 1: Surface prediction module (SPM)

First, we are interested in predicting the structure of the target. The idea is that having a rough guess of the structure can be helpful when planning the future viewpoints (next section).

You can think of SPM as a kind of PCN combined with a scale estimator network (to assist better performance of PCN). But the authors did several variations.

Step 1: Scale estimation network

Input: incomplete pointcloud
output: normalized incomplete pointcloud

For any network, scale normalization is important. This step estimates the scale (x,y,z) of total point cloud. Although you can think of this as a 3d bounding box (BB) of a given pointcloud, it has the ability to derive BB even for missing regions. As a very simplified example, if we have only half of a circle, it can deduce its total scale as 2*radius.

Step 2: Pointcloud completion network (PCN)

Input: normalized incomplete pointcloud
output: prediction (completed) pointcloud (inverse normalized) + inner space / surface discrimination

With the aid of the step 1 network, this network receives the normalized version of the incomplete point cloud and fill the missing region. Although it has a very similar role with PCN, the authors removed convolution operation of real time feasibility. This network is trained by reducing the Chamfer distance. After filling the missing regions, a post-process step is involved to discriminate internal space (hidden space from outside) and surface. In the following viewpoint planning step, we will not sample view points inside the internal region.

Section 2: Viewpoint planning for multi-view-stereo

In the previous section, we predicted the surface of structure from SPM along with internal space where a viewpoint should not be selected. Using the prediction and the history of the covered region by a robot, we now plan the view path.

Global coverage planning: pick the best views and plan the order to visit them

First step here is to generate a viewpoint skeleton which can cover each uncovered cluster (assume that we computed clusters for surface points).

As can be seen in the above picture, each cluster has a region inside which we can observe the cluster. This region is depicted as gray 2d cones in the figure.
When we sample a set of view points inside each cone, we can choose the best view point which can observe the largest number of uncovered surface points.
Having them, we can compute a single-shot traveling order by ATSP (See eq (9)). The authors used Lin-Kernighan-Helsguan algorithm (opens in a new tab) to solve the TSP.

In this way, we can get a sequence of viewpoints which can observe uncovered region very wall while minimizing total travel distance and the deviation from the previous movement (the authors referred this as global consistency).

Local planning: generate local trajectory between views

Given two viewpoints from the previous step, the second step outputs a local trajectory represented with a B-spline. According to eq (10), the control points of the B-spline are determined so that:

surface points are observed with enough magnitude,
surface points observed with consistent distance along timesteps,
relative view angles along time steps are similar to desired triangulation angles.

The local control point sequence is obtained with Dijkstra algorithm.