6/6/18, 1:55 AM: Computer Vision: Bundle Adjustment
In computer vision applications of 3D reconstruction, bundle adjustment is the joint optimization of the various parameters of the reconstruction. Most commonly, we want to optimize the 3D locations of the viewed features and the camera poses of a stereo reconstruction.

Optimization
Optimization implies that there is an error to minimize in a system. In this case, the system is the 3D reconstruction, containing all of the reconstructed points, and for simplicity of explanation, two camera positions that viewed the reconstructed points. The error in the system is a little less straightforward to explain.

To compute an error in the system, we must first be able to make a measurement and compare it to an observation. The measurement made here is the triangulated position of features viewed by both cameras. Via triangulation, the 3D position of a feature can be found, given the location estimates of both cameras and the bearing of the 3D feature from each camera, obtained from its 2D image coordinates. Several of these 3D points are measured by iteratively matching 2D image features and then triangulating their 3D positions. The 3D point locations are computed assuming the matches, camera locations, and 2D image coordinates are accurate, which in practice they are not, and this is where the error arises. How the error is measured is fairly straightforward. The (probably inaccurate) estimates of the camera poses are known, and their calibration matrices are known, so their projection matrix estimates are known. One may then project a reconstructed 3D point back onto the image plane of each camera (one at a time for now).
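The reprojection step above can be sketched with a pinhole camera model. The intrinsics, pose, and 3D point below are made-up illustrative values, not from any particular reconstruction:

```python
import numpy as np

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Estimated camera pose: identity rotation, small translation along x.
R = np.eye(3)
t = np.array([[0.1], [0.0], [0.0]])

# Projection matrix P = K [R | t].
P = K @ np.hstack([R, t])

def project(P, X):
    """Project a 3D point X (length 3) to 2D pixel coordinates."""
    Xh = np.append(X, 1.0)   # homogeneous coordinates
    x = P @ Xh
    return x[:2] / x[2]      # dehomogenize

# A reconstructed 3D point in front of the camera.
X = np.array([0.5, -0.2, 4.0])
print(project(P, X))  # pixel coordinates (440.0, 200.0)
```

Comparing this projected pixel against the 2D feature that originally produced the point gives the reprojection error discussed next.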

Certainly, because of the assumptions the triangulation process makes about the accuracy of the parameters passed to it, a 3D point reprojected back onto the image plane will not lie exactly on its parent 2D feature. This disparity is called the reprojection error, and it is the error we seek to minimize. The squared reprojection error is summed over all the 3D features seen in all camera frames, a bundle of camera frames, hence the name bundle adjustment. This sum forms the cost function of a nonlinear least-squares minimization problem, and its derivatives are used by the solver to iteratively refine the parameters. For the curious, this is a task well suited to an optimization library, something like Ceres Solver (BSD licensed), by Google.
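As a minimal sketch of such a least-squares minimization, the toy example below refines a single 3D point against two fixed cameras using SciPy's Levenberg-Marquardt-style solver (all matrices and coordinates are invented for illustration; a real bundle adjustment would also optimize the camera poses):

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical calibrated projection matrices for two cameras.
K = np.diag([800.0, 800.0, 1.0])
K[0, 2], K[1, 2] = 320.0, 240.0
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])              # camera 1 at origin
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0], [0]])])  # camera 2 shifted

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Observed 2D features (the "ground truth" detections) in each image,
# generated here from a known true point for the sake of the example.
X_true = np.array([0.2, 0.1, 5.0])
x1_obs = project(P1, X_true)
x2_obs = project(P2, X_true)

def residuals(X):
    # Stack the reprojection errors from both cameras.
    return np.concatenate([project(P1, X) - x1_obs,
                           project(P2, X) - x2_obs])

# Start from a noisy triangulation and let least squares refine it.
X0 = np.array([0.25, 0.05, 4.8])
result = least_squares(residuals, X0)
print(result.x)  # converges near the true point (0.2, 0.1, 5.0)
```

The full problem simply extends this residual vector to every observation, with the camera parameters included among the unknowns.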

Parameters
In order to obtain the cost functor for one observation of a 3D feature, the following parameters are required:
1. Focal length
2. Distortion parameters
3. 3D View pose (translation, rotation)
4. 3D Feature position (translation)
5. 2D Feature position

The first four parameters are used to compute the projected 2D feature position on the image plane of the camera that observed the 3D feature. The last parameter is immutable (unchangeable); it serves as the reference against which the reprojection error is computed. Detection of a 2D feature is the most stable of these measurements, and it is therefore treated as a kind of ground truth to compare the measurement against.
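The parameters above can be gathered into a residual function for one observation, loosely in the style of Ceres' bundle adjustment examples. The exact parameter layout (angle-axis rotation, two radial distortion coefficients) is an assumption for illustration:

```python
import numpy as np

def reprojection_residual(focal, k1, k2, rvec, tvec, point3d, feature2d):
    """Reprojection error for one observation of one 3D feature.

    focal, k1, k2 : focal length and radial distortion parameters
    rvec, tvec    : camera pose (angle-axis rotation, translation)
    point3d       : 3D feature position
    feature2d     : observed (immutable) 2D feature position
    """
    # Rotate the point using Rodrigues' formula (angle-axis rotation).
    theta = np.linalg.norm(rvec)
    if theta > 1e-12:
        axis = rvec / theta
        p = (point3d * np.cos(theta)
             + np.cross(axis, point3d) * np.sin(theta)
             + axis * np.dot(axis, point3d) * (1.0 - np.cos(theta)))
    else:
        p = point3d
    p = p + tvec                        # apply camera translation
    xp, yp = p[0] / p[2], p[1] / p[2]   # perspective division
    r2 = xp ** 2 + yp ** 2
    distortion = 1.0 + k1 * r2 + k2 * r2 ** 2   # radial distortion
    predicted = focal * distortion * np.array([xp, yp])
    return predicted - feature2d        # residual vs. the observed feature
```

One such residual is built per observation, and the optimizer adjusts everything except `feature2d`.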

A unique observation is made whenever a camera sees and recognizes a 3D point of the reconstruction via a 2D feature in its image. Hence, for each 3D point seen by each camera, there exists an observation consisting of a unique combination of the above parameters.

For example, if three 3D features are seen and shared by two camera frames, then six different combinations of the above five parameters exist.
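The counting works out as a simple product over visible (camera, point) pairs; the identifiers below are placeholders:

```python
# Hypothetical bookkeeping: which cameras see which 3D points.
cameras = ["cam0", "cam1"]
points = ["P0", "P1", "P2"]

# Each (camera, point) pair where the point is visible is one observation.
# Here every point is shared by both cameras.
observations = [(c, p) for c in cameras for p in points]
print(len(observations))  # 2 cameras x 3 shared points = 6 observations
```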