Overview & Pipeline
Part A recovers homographies from manually clicked correspondences via least squares, resamples the second image onto the reference canvas with both nearest-neighbor and bilinear interpolation, and composites the layers with feathered alpha masks. Part B swaps the manual correspondences for an automated pipeline: Harris responses are thinned with adaptive non-maximal suppression, each surviving corner gets a blurred 40×40 window that is subsampled to an 8×8 descriptor, Lowe-ratio matching proposes pairs, and four-point RANSAC estimates a homography whose inliers feed the same warping and blending code.
A.1: Shoot and Digitize Pictures
Each dataset was captured handheld with the camera pivoting about a fixed center of projection. I kept successive views closely spaced to preserve strong feature overlap.
A.2: Recover Homographies
I collect correspondences with a browser-based clicking tool and store them in (row, column) coordinates; image 1 is always the reference frame. Each correspondence contributes two equations to the linear system: one enforcing the warped column and one enforcing the warped row. The eight unknowns live in the first two rows and last column of the homography, so the coefficients use the image 2 coordinates while the right-hand side stores the image 1 target. Fixing the bottom-right entry of H to 1 rules out the trivial zero solution and moves the target pixels to the right-hand side, leaving an overdetermined system Ah = b. I solve it with np.linalg.lstsq(A, b)[0], append a trailing 1 to restore the fixed scale, and reshape the result into a 3×3 matrix. At least four correspondences are required, but I typically click eight or more to stabilize the fit.
\[ \underbrace{ \begin{bmatrix} \mathrm{im2x}_1 & \mathrm{im2y}_1 & 1 & 0 & 0 & 0 & -\mathrm{im2x}_1 \,\mathrm{im1x}_1 & -\mathrm{im2y}_1 \,\mathrm{im1x}_1 \\ 0 & 0 & 0 & \mathrm{im2x}_1 & \mathrm{im2y}_1 & 1 & -\mathrm{im2x}_1 \,\mathrm{im1y}_1 & -\mathrm{im2y}_1 \,\mathrm{im1y}_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \mathrm{im2x}_n & \mathrm{im2y}_n & 1 & 0 & 0 & 0 & -\mathrm{im2x}_n \,\mathrm{im1x}_n & -\mathrm{im2y}_n \,\mathrm{im1x}_n \\ 0 & 0 & 0 & \mathrm{im2x}_n & \mathrm{im2y}_n & 1 & -\mathrm{im2x}_n \,\mathrm{im1y}_n & -\mathrm{im2y}_n \,\mathrm{im1y}_n \end{bmatrix} }_{A} \underbrace{ \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} }_{\mathbf{h}} = \underbrace{ \begin{bmatrix} \mathrm{im1x}_1 \\ \mathrm{im1y}_1 \\ \vdots \\ \mathrm{im1x}_n \\ \mathrm{im1y}_n \end{bmatrix} }_{\mathbf{b}} \]
\(\mathrm{im2x}_i\) and \(\mathrm{im2y}_i\) are the column/row of correspondence \(i\) in the second image, while \(\mathrm{im1x}_i\) and \(\mathrm{im1y}_i\) are the matching coordinates in the reference image.
\[ H = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \]
I set the bottom-right entry to \(1\) after solving so the recovered vector reshapes cleanly into a homography matrix.
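For concreteness, here is a minimal sketch of that solver, assuming the matched points arrive as (N, 2) arrays of (x, y) coordinates (the name compute_homography is illustrative):

```python
import numpy as np

def compute_homography(pts2, pts1):
    """Solve for H such that pts1 ~ H @ pts2 in homogeneous coordinates.
    pts2 and pts1 are (N, 2) arrays of matched (x, y) points, N >= 4."""
    A, b = [], []
    for (x2, y2), (x1, y1) in zip(pts2, pts1):
        # One equation enforcing the warped x (column), one the warped y (row).
        A.append([x2, y2, 1, 0, 0, 0, -x2 * x1, -y2 * x1])
        A.append([0, 0, 0, x2, y2, 1, -x2 * y1, -y2 * y1])
        b.extend([x1, y1])

    h = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)[0]
    return np.append(h, 1).reshape(3, 3)  # trailing 1 fixes the scale
```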
A.3: Warp the Images
When rectifying a single object I bound the warp to the reference quadrilateral supplied with the correspondences; otherwise I run all four corners of the original image through H to predict the warped bounding box. Each output pixel walks backwards through the transform: multiply by H^{-1}, divide by the homogeneous scale, and sanity-check the resulting source coordinates. The nearest-neighbor branch rounds those coordinates, while the bilinear branch retrieves the four surrounding samples and blends them with the rectangle-area weights computed in bilinear_interp_helper. The output array is clipped to [0, 255] and converted to uint8 before saving. On the game box example the timings are roughly 0.32 s with nearest neighbor and 0.74 s with bilinear; on the couch rectification the end-to-end warp takes about 0.31 s versus 0.74 s. Bilinear more than doubles the runtime but has much better visual quality (e.g., smoother edges), so I used it for Part B of this project.
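Below is a condensed sketch of the bilinear branch of this inverse warp, assuming H maps image 2 coordinates into the reference frame and the image stores channels last; the rectangle-area weights are inlined here rather than factored into bilinear_interp_helper:

```python
import numpy as np

def warp_image_bilinear(img, H, out_shape):
    """Inverse-warp img onto an (h_out, w_out) canvas: every output pixel
    is pulled back through H^{-1} and sampled bilinearly."""
    h_out, w_out = out_shape
    Hinv = np.linalg.inv(H)

    # Homogeneous grid of output pixels (x = column, y = row).
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])

    # Back-project and divide by the homogeneous scale.
    src = Hinv @ pts
    sx, sy = src[0] / src[2], src[1] / src[2]

    # Sanity-check: keep coordinates that land inside the source image.
    h_src, w_src = img.shape[:2]
    valid = (sx >= 0) & (sx < w_src - 1) & (sy >= 0) & (sy < h_src - 1)

    x0 = np.floor(sx[valid]).astype(int)
    y0 = np.floor(sy[valid]).astype(int)
    dx = (sx[valid] - x0)[:, None]
    dy = (sy[valid] - y0)[:, None]

    # Rectangle-area weights over the four surrounding samples.
    top = (1 - dx) * img[y0, x0] + dx * img[y0, x0 + 1]
    bot = (1 - dx) * img[y0 + 1, x0] + dx * img[y0 + 1, x0 + 1]

    out = np.zeros((h_out, w_out, img.shape[2]))
    out.reshape(-1, img.shape[2])[valid] = (1 - dy) * top + dy * bot
    return np.clip(out, 0, 255).astype(np.uint8)
```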
A.4: Blend Images into a Mosaic
I read both images with their alpha channels intact, recover
H, and warp the partner frame with bilinear
interpolation. To align the two layers I push each image 2
correspondence through H, compare it with the reference
correspondence, and take the mean row/column offsets as a global
shift. That shift determines how both images are stamped onto a shared
canvas. For feathering I feed each alpha mask to
distance_transform_edt, normalize the two distance maps
so their per-pixel sum is one, broadcast the weights across the color
channels, and blend. Every recovered homography is also written to
disk for inspection.
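A sketch of that feathering step, assuming the two layers are already stamped onto the shared canvas and alpha1/alpha2 are their binary coverage masks:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def feather_blend(img1, img2, alpha1, alpha2):
    """Blend two aligned layers with distance-transform feathering:
    each pixel's weight grows with its distance from the mask boundary."""
    d1 = distance_transform_edt(alpha1)
    d2 = distance_transform_edt(alpha2)

    # Normalize so the per-pixel weights sum to one where coverage exists.
    total = d1 + d2
    total[total == 0] = 1  # avoid divide-by-zero outside both masks
    w1 = (d1 / total)[..., None]  # broadcast across color channels
    w2 = (d2 / total)[..., None]

    return (w1 * img1 + w2 * img2).astype(img1.dtype)
```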
B.1: Harris Corner Detection
I reuse the provided Harris detector, then feed its coordinate list to my adaptive non-maximal suppression (ANMS) routine. Corners are sorted by response strength, and each point records its suppression radius: the distance to the nearest corner whose response f_j satisfies f_i < c × f_j (here c = 0.9), i.e., the nearest corner that is sufficiently stronger. The 300 corners with the largest suppression radii survive, spreading the interest points across the frame. The heat map shows the raw Harris response; the overlays compare the dense detections with the ANMS selection. The parameter c controls how much stronger a nearby corner must be to suppress the current one: lowering c demands a noticeably stronger suppressor, while pushing c toward 1.0 lets even marginally stronger neighbors count. After ANMS we are left with a much smaller, well-spread set of interest points rather than the barrage we initially had.
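A compact sketch of the routine, assuming coords and responses arrive as NumPy arrays straight from the Harris detector (names illustrative):

```python
import numpy as np

def anms(coords, responses, n_keep=300, c=0.9):
    """Keep the n_keep corners with the largest suppression radii.
    coords is (N, 2) in (row, col); responses is (N,) Harris strengths."""
    order = np.argsort(-responses)  # strongest first
    coords, responses = coords[order], responses[order]

    radii = np.full(len(coords), np.inf)  # the global maximum is never suppressed
    for i in range(1, len(coords)):
        # A corner j suppresses i when responses[i] < c * responses[j];
        # only earlier (stronger) corners can qualify.
        stronger = c * responses[:i] > responses[i]
        if stronger.any():
            dists = np.linalg.norm(coords[:i][stronger] - coords[i], axis=1)
            radii[i] = dists.min()

    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]
```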
B.2: Feature Descriptor Extraction
Around every ANMS corner I crop a 40×40 RGB window, apply a Gaussian blur with σ = 1, and subsample the result onto an 8×8 grid by taking every fifth pixel. Each descriptor is individually normalized to zero mean and unit standard deviation so Euclidean distance compares structure rather than raw brightness. Below are sample window/descriptor pairs from the High Floor View of Berkeley sequence. The window and descriptor sizes trade off context against descriptor length: 40×40 retains surrounding texture, while 8×8 keeps comparisons lightweight.
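A sketch of the extraction for one corner, assuming a float RGB image and skipping corners within 20 px of the border (border handling is omitted):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptor(img, row, col):
    """Crop a 40x40 window centered on (row, col), blur, subsample to
    8x8, and normalize to zero mean / unit standard deviation."""
    window = img[row - 20:row + 20, col - 20:col + 20]
    blurred = gaussian_filter(window, sigma=(1, 1, 0))  # blur spatial axes only
    patch = blurred[::5, ::5]  # every fifth pixel: 40 / 5 = 8 per axis
    return (patch - patch.mean()) / patch.std()
```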
B.3: Feature Matching
Each descriptor from image 1 is compared against every descriptor from
image 2 with Euclidean distance. I retain both the closest and
second-closest matches; candidates pass the Lowe ratio test when
\(d_1 / d_2 < 0.8\), meaning the best match is
significantly better than the runner-up. The visualizations show both
the surviving correspondences and a few of the paired windows that led
to them. The Lowe ratio threshold itself measures how dominant the
best neighbor must be—lower ratios demand greater separation and
reject ambiguous matches.
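A sketch of the matcher over flattened descriptors; the brute-force distance matrix is fine at a few hundred interest points per image:

```python
import numpy as np

def lowe_match(desc1, desc2, ratio=0.8):
    """Match (N1, D) descriptors against (N2, D) with Euclidean distance
    and the Lowe ratio test; returns passing index pairs (i, j)."""
    dists = np.linalg.norm(desc1[:, None] - desc2[None, :], axis=2)

    matches = []
    for i, row in enumerate(dists):
        first, second = np.argsort(row)[:2]   # closest and second closest
        if row[first] / row[second] < ratio:  # best must clearly beat runner-up
            matches.append((i, first))
    return matches
```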
B.4: RANSAC and Automatic Mosaics
Four-point RANSAC repeatedly samples match quadruples (6k iterations for the Berkeley datasets, 7k for the house), builds a candidate homography, and reprojects every correspondence from image 2 into the reference view to measure the forward reprojection error. Points whose error falls below the dataset-specific threshold (listed with each result) are marked as inliers. The densest inlier set triggers a final least-squares fit using all of its correspondences, and the resulting homography flows through the same bilinear warp and feathered blend from Part A. Each mosaic compares the manual baseline with the fully automatic output. The RANSAC iteration count dictates how long the algorithm hunts for a good consensus, while the epsilon threshold sets the maximum reprojection error—measured in pixels—that an inlier is allowed to have.
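A sketch of the loop, reusing the compute_homography helper sketched in A.2, with the iteration count and pixel threshold exposed as the knobs described above:

```python
import numpy as np

def ransac_homography(pts2, pts1, n_iters=6000, eps=2.0):
    """Four-point RANSAC over matched (N, 2) arrays of (x, y) points;
    eps is the maximum forward reprojection error, in pixels."""
    rng = np.random.default_rng()
    best_inliers = np.zeros(len(pts2), dtype=bool)
    pts2_h = np.hstack([pts2, np.ones((len(pts2), 1))])  # homogeneous

    for _ in range(n_iters):
        sample = rng.choice(len(pts2), 4, replace=False)
        H = compute_homography(pts2[sample], pts1[sample])

        # Reproject every image-2 point into the reference view.
        proj = pts2_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]
        err = np.linalg.norm(proj - pts1, axis=1)

        inliers = err < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers

    # Final least-squares fit over the densest inlier set.
    return compute_homography(pts2[best_inliers], pts1[best_inliers]), best_inliers
```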
Reflection
Implementing Part A highlighted the importance of having high-quality correspondences over merely a high quantity of them. Constructing an output grid that placed the reference image so the warped image would overlap it correctly was another challenge I worked through. For Part B, I initially set an error threshold that was too high, which kept the RANSAC algorithm from finding a good consensus. I learned the importance of tuning these thresholds carefully, erring on the side of accuracy.