Structure from Motion (SfM), as described in recent academic and professional discourse, refers to the use of imagery from cameras located at different positions (multiple perspectives, i.e. the "Motion") to determine "Structure" (e.g. landscapes, monuments, architectural structures). The roots of SfM can be traced back to two key fields, photogrammetry and computer vision (see figure at right).
Stereo photogrammetry is a relatively old technique for measuring and processing lengths and angles in photographs for mapping purposes. Reconstruction efforts were initially attempted using a pair of ground cameras separated by a fixed baseline. – from https://www.satpalda.com/blogs/concepts-of-photogrammetry.
The computer vision community has traditionally been driven by inquiries in biomechanics and AI. Early achievements include recovery of 3D scene structure from stereo pairs, where correspondence is established automatically from two images via an iterative algorithm.
The algorithm searches for unique match points between two images and recovers an intermediate form of 3D depth between them. SfM grew out of a combination of photogrammetry and computer vision, particularly the latter's Simultaneous Localization and Mapping (SLAM), arguably one of the most important algorithms in robotics, with pioneering work done by both the computer vision and robotics research communities. – from: http://www.computervisionblog.com/2016/01/why-slam-matters-future-of-real-time.html.
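The automatic stereo correspondence idea described above can be illustrated with a toy block-matching sketch in Python/NumPy. This is not any particular published algorithm, just a minimal illustration: a small patch from the left image is slid along the same row of the right image, and the horizontal offset (disparity) with the lowest sum-of-squared-differences cost is taken as the match. Real systems add subpixel refinement, robust costs, and smoothness constraints.

```python
import numpy as np

def match_block(left, right, row, col, block=5, max_disp=20):
    """Toy stereo correspondence: find the horizontal disparity of a
    block from the left image by sliding it along the same row of the
    right image and scoring with sum-of-squared-differences (SSD)."""
    h = block // 2
    patch = left[row - h:row + h + 1, col - h:col + h + 1].astype(float)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        c = col - d                      # candidate column in the right image
        if c - h < 0:
            break
        cand = right[row - h:row + h + 1, c - h:c + h + 1].astype(float)
        cost = np.sum((patch - cand) ** 2)
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

# Synthetic stereo pair: the right view shows the scene shifted 4 px,
# so a correct matcher should report disparity 4.
rng = np.random.default_rng(0)
left = rng.integers(0, 255, (40, 60), dtype=np.uint8)
right = np.roll(left, -4, axis=1)
d = match_block(left, right, row=20, col=30)
```

With known camera geometry, depth then follows from disparity as depth = focal length × baseline / disparity, which is how a stereo pair yields the "intermediate form of 3D depth" mentioned above.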
Stereo photogrammetry, computer vision, and the emergence of SfM algorithms and techniques are well described in the literature and so will not be reiterated in detail here. In brief, SfM processing involves a set of complex algorithms that evaluate image sets to estimate the 3D structure of a scene from multiple overlapping images (see Figure 2 at right).
SfM is a relatively new technique for processing large numbers of individual aerial images and generating seamless image/map mosaics of very high resolution. This is a key innovation in mapping and monitoring landscapes from parcel scale (hectares or smaller) up to square kilometers. In cases where hundreds or thousands of individual images are processed together, this represents a true "Big Data" phenomenon. It is important to understand that SfM image mosaics and 3D models are not the result of blending and stitching the original images. Instead, SfM processing identifies and evaluates key elements (individual pixels) among and across adjacent images to generate a "point cloud" of many millions or billions of points. From this point cloud, a new image mosaic map and digital surface model are generated.
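The step from point cloud to digital surface model can be sketched in a few lines of Python/NumPy. This is only a minimal illustration of the idea, not how any particular SfM package implements it: the points are binned into a regular grid, and each cell keeps its highest elevation, which is the usual convention for a surface (as opposed to terrain) model. Production tools also interpolate empty cells and filter outliers.

```python
import numpy as np

def rasterize_dsm(points, cell=1.0):
    """Bin an (N, 3) point cloud into a regular grid of the given cell
    size, keeping the highest elevation per cell (a simple DSM)."""
    xy = np.floor(points[:, :2] / cell).astype(int)
    xy -= xy.min(axis=0)                 # shift grid origin to (0, 0)
    nx, ny = xy.max(axis=0) + 1
    dsm = np.full((nx, ny), np.nan)      # NaN marks cells with no points
    for (i, j), z in zip(xy, points[:, 2]):
        if np.isnan(dsm[i, j]) or z > dsm[i, j]:
            dsm[i, j] = z                # keep the tallest point per cell
    return dsm

# Three hypothetical points; the first two fall in the same 1 m cell,
# so the DSM keeps the higher of their two elevations.
pts = np.array([[0.2, 0.3, 5.0],
                [0.8, 0.6, 7.5],
                [2.5, 0.1, 1.0]])
dsm = rasterize_dsm(pts)
```

The same binning idea, applied to point colors instead of elevations, is how a new orthomosaic image can be rendered from the point cloud rather than stitched from the source photos.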
SfM uses pairwise point matches to estimate the camera position of each view relative to previous or succeeding views. It then links the pairwise matches into longer point tracks spanning multiple views. These tracks serve as inputs to multi-view triangulation, after which the camera positions and 3D scene points are refined together using a bundle adjustment function. The process thus consists of two major sequences: camera position/motion estimation and dense, point-cloud-based scene reconstruction. In the first step, the camera position for each view is estimated using a sparse set of points matched across the views. In the second step, the process iterates over the sequence of views again, using a "bundle block adjustment" to create and adjust a dense set of points across all views and compute a dense 3D reconstruction of the scene.

Software processing of scenes constructed from hundreds of images is a truly "big data" challenge, as the point clouds can hold millions or even billions of key feature pixels. With such large data sets, processing can take anywhere from a few hours to several days, depending on the computing power available. For smaller image sets, including the small objects, microscope views, and scanning electron microscopy scenes described here, processing may take only a few minutes.

The type of camera used for image acquisition is rather incidental in the novel cases of small-object and microscopic imaging. Where the goal is to render landscape-scale features and import the results into a Geographic Information System, however, the specific camera type, model, and lens parameters, as well as GPS location information for each image, become important.
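The multi-view triangulation step named above can be illustrated with the standard linear (DLT) method in Python/NumPy. This is a minimal sketch with made-up camera matrices, not the implementation used by any SfM package: given a point's projections in two views with known projection matrices, a small homogeneous system is solved by SVD to recover the 3D point. Bundle adjustment then refines all such points and camera poses jointly by minimizing reprojection error.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: recover a 3D point from its 2D
    projections x1, x2 in two views with 3x4 projection matrices
    P1, P2. Each projection contributes two rows to a homogeneous
    system A X = 0, solved via SVD."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                          # null-space vector of A
    return X[:3] / X[3]                 # dehomogenize

# Two hypothetical cameras one unit apart along x, identity intrinsics.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])

# Project the ground-truth point into each view, then triangulate back.
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
X = triangulate(P1, P2, x1, x2)
```

With noise-free projections the recovered point matches the original exactly; with real, noisy matches across many views, this linear estimate is what bundle adjustment subsequently polishes.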
It is both interesting and important to point out that the final image mosaic is not a simple stitching and blending of the original images, but a pixel-by-pixel recreation of the composite scene built from many millions of individual pixels commonly recognized across adjacent key image frames. The computed and rendered orthomosaic and Digital Surface Model (DSM) are entirely new products, based on an intensive pixel-by-pixel comparison among the dozens or hundreds of individual high-resolution images collected. Until recently, computing and processing landscape-scale SfM scenes was restricted to users with significant computational systems (workstations and mini-computers). With the ever-increasing power of micro-computers (even laptops), SfM processing can now be accomplished on a relatively small budget. SfM processing of large aerial scenes comprising hundreds of individual camera images, however, can still take many hours, even days, on a modestly configured laptop.
There are a variety of SfM software solutions, but my favorites are Pix4D (which I use for educational/non-profit work) and Agisoft Metashape (which I use for commercial purposes).
In my SfM Photo Gallery (below) I include samples from my educational non-profit work. To date, I have not used Pix4D for commercial work, and will not until I have a commercial license for that software.