Project 4

[Auto]stitching Photo Mosaics


Part 1: Shoot the Pictures

I went to a store and took pictures of mannequins wearing outfits I liked.

Part 2: Recover Homographies

Applying a homography reduces to a matrix multiplication
\[ w \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \]
which can be rewritten as
\[ \begin{aligned} wx' &= ax + by + c \\ wy' &= dx + ey + f \\ w &= gx + hy + 1 \end{aligned} \]
\[ \implies \begin{aligned} (gx + hy + 1)x' &= ax + by + c \\ (gx + hy + 1)y' &= dx + ey + f \end{aligned} \]
\[ \implies \begin{aligned} x' &= ax + by + c - gxx' - hyx' \\ y' &= dx + ey + f - gxy' - hyy' \end{aligned} \]
But this is also just
\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -xx' & -yx' \\ 0 & 0 & 0 & x & y & 1 & -xy' & -yy' \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} \]
so, given \(n\) correspondences, we can write this as a linear system
\[ \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1x_1' & -y_1x_1' \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -x_1y_1' & -y_1y_1' \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_n & y_n & 1 & 0 & 0 & 0 & -x_nx_n' & -y_nx_n' \\ 0 & 0 & 0 & x_n & y_n & 1 & -x_ny_n' & -y_ny_n' \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x_1' \\ y_1' \\ \vdots \\ x_n' \\ y_n' \end{bmatrix} \]
and, since each correspondence contributes two equations, with \(n \ge 4\) correspondences we can solve for the eight homography parameters using least squares.
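In code, this boils down to building the matrix above and calling np.linalg.lstsq. A minimal sketch (the function name and argument layout are my own):

```python
import numpy as np

def compute_homography(src, dst):
    """Fit the homography mapping src -> dst by least squares.
    src, dst: (n, 2) arrays of corresponding (x, y) points, n >= 4."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # One row for the x' equation, one for the y' equation.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    params, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    # params = [a, b, c, d, e, f, g, h]; the bottom-right entry is fixed at 1.
    return np.append(params, 1.0).reshape(3, 3)
```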

Part 3: Warp the Images

To get the dimensions of the warped image, we can apply the homography matrix to the corners of the image and compute the bounding box of the warped image from these points (or we can skip this step altogether and just use the shape of the source image for the warped image).

However, the warped corner coordinates can be negative, in which case we must calculate an offset to ensure that all indices into the warped image are non-negative.

Next, we calculate the height and width of the destination image, and use these dimensions to create indices for the warped image. We then apply the inverse offset to these indices (to align with the bounds of the warped image), and apply the inverse homography matrix to these translated indices. This will map the non-negative indices of the destination image to the source image.

Finally, we interpolate the pixels of the source image to the destination image (I used scipy.interpolate.RectBivariateSpline), and apply a mask to the destination image to remove the parts that are not in the source image.
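Here is a rough sketch of the whole warp for a single-channel float image (the function name and return convention are mine, and border handling is simplified):

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

def warp_image(img, H):
    """Inverse-warp a single-channel float image with homography H.
    Returns the warped image and the (x, y) offset of its top-left corner."""
    h, w = img.shape
    # Forward-warp the corners to get the output bounding box.
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [w - 1, h - 1, 1], [0, h - 1, 1]], float).T
    wc = H @ corners
    wc = wc[:2] / wc[2]
    min_x, min_y = np.floor(wc.min(axis=1)).astype(int)
    max_x, max_y = np.ceil(wc.max(axis=1)).astype(int)
    out_h, out_w = max_y - min_y + 1, max_x - min_x + 1
    # Output pixel grid, shifted by the offset so indices start at zero,
    # then mapped back into the source with the inverse homography.
    xs, ys = np.meshgrid(np.arange(out_w) + min_x, np.arange(out_h) + min_y)
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts
    sx, sy = src[0] / src[2], src[1] / src[2]
    # Interpolate source pixels at the back-projected locations.
    spline = RectBivariateSpline(np.arange(h), np.arange(w), img)
    out = spline.ev(sy, sx).reshape(out_h, out_w)
    # Mask out pixels that fall outside the source image.
    valid = ((sx >= 0) & (sx <= w - 1) &
             (sy >= 0) & (sy <= h - 1)).reshape(out_h, out_w)
    return out * valid, (min_x, min_y)
```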

Part 4: Image Rectification

Here are some results I got from rectifying the images. [ note that I was able to avoid some inconveniences by using the shape of the source image for the warped image :D ].

Part 5: Blend The Images into a Mosaic

I was lazy-ish and did not want to manually label correspondences between images. This led me to implement adaptive non-maximal suppression, feature descriptor extraction, and feature matching.

Here is an example of feature matching between two images.

Using the matched features, I computed the homography between the base image (chosen to be the top) and the query image (the bottom).

I warped the corners of the query image to the base image, then found the bounding box of the entire image after stitching. I computed the translation to apply to the final image to avoid negative indices.

This all boiled down to a mess of calling np.min and np.max on all sorts of corner points and I really hope I never feel so confused with something so simple ever again.
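For posterity, here is roughly what that mess reduces to (a minimal sketch; the function name and shape conventions are my own):

```python
import numpy as np

def mosaic_bounds(H, query_shape, base_shape):
    """The np.min/np.max bookkeeping: bounding box of the stitched mosaic
    and the translation that keeps every index non-negative."""
    hq, wq = query_shape
    hb, wb = base_shape
    # Query-image corners warped into the base image's frame.
    corners = np.array([[0, 0, 1], [wq, 0, 1], [wq, hq, 1], [0, hq, 1]], float).T
    warped = H @ corners
    warped = (warped[:2] / warped[2]).T
    all_pts = np.vstack([warped, [[0, 0], [wb, 0], [wb, hb], [0, hb]]])
    min_xy = np.floor(all_pts.min(axis=0)).astype(int)
    max_xy = np.ceil(all_pts.max(axis=0)).astype(int)
    offset = -np.minimum(min_xy, 0)   # shift applied to avoid negative indices
    canvas_size = max_xy - min_xy     # (width, height) of the stitched canvas
    return offset, canvas_size
```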

ok. cool.

I implemented blending with a two-level Laplacian pyramid. The mask was generated by taking the distance transform of the polygon made by the four corners of each image. I used cv2.fillPoly to fill the polygon formed by the vertices of img1 with 1, then used cv2.distanceTransform to get the distance transform of the polygon. I repeated this for img2, and took the ratio of the distance transforms to get the blending mask.
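A sketch of the mask construction (the helper name is mine; normalizing as \(d_1 / (d_1 + d_2)\) is one way to take the ratio):

```python
import cv2
import numpy as np

def distance_mask(canvas_shape, corners):
    """Distance transform of one image's projected footprint on the canvas.
    canvas_shape: (h, w); corners: (4, 2) array of (x, y) corner points."""
    poly = np.zeros(canvas_shape, dtype=np.uint8)
    cv2.fillPoly(poly, [corners.astype(np.int32)], 1)
    # Distance of each pixel inside the polygon to the nearest zero pixel.
    return cv2.distanceTransform(poly, cv2.DIST_L2, 5)

# Usage (canvas_shape, corners1, corners2 come from the stitching step):
# d1 = distance_mask(canvas_shape, corners1)
# d2 = distance_mask(canvas_shape, corners2)
# alpha = d1 / (d1 + d2 + 1e-8)   # blending weight for img1; 1 - alpha for img2
```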

putting it all together

Part 5.X: Manual Correspondences

I thought I could get away with not manually labeling correspondences between images. I was wrong.

Here are some manually labeled correspondences and the mosaics they yielded.

(after I mentioned that I implemented automatic correspondences, Prof. Efros told me that I should still do this step manually so I can understand the tediousness that motivates automating this process. It seems this fate was unavoidable)

Part 6: Detecting Corner Features in an Image

We use the Harris corner detector that was provided in the starter code (thank you!).

However, the detector returns far too many points! So we implement adaptive non-maximal suppression (ANMS) to keep only the strongest, most evenly spread corners. K-D trees were particularly helpful for the nearest-neighbor searches. Here are the results of taking the top 500 corners.
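Here is a brute-force sketch of ANMS (a K-D tree accelerates the same stronger-neighbor search; the function name and robustness constant are assumptions):

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression.
    coords: (n, 2) corner positions; strengths: (n,) Harris responses.
    Each corner's suppression radius is its distance to the nearest
    sufficiently stronger corner; keep the n_keep largest radii."""
    order = np.argsort(-strengths)
    coords, strengths = coords[order], strengths[order]
    radii = np.full(len(coords), np.inf)
    for i in range(1, len(coords)):
        # Earlier (stronger) corners that pass the robustness test suppress i.
        stronger = coords[:i][strengths[i] < c_robust * strengths[:i]]
        if len(stronger):
            radii[i] = np.sqrt(((stronger - coords[i]) ** 2).sum(axis=1)).min()
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]
```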

Part 7: Extracting Feature Descriptors

We take \(8s \times 8s\) pixel patches centered around each corner returned by ANMS.

We then downsample each patch by taking every \(s\)-th pixel, and normalize each patch to have zero mean and unit variance.
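A sketch of the descriptor extraction (assuming a grayscale float image and corners at least \(4s\) pixels from the border):

```python
import numpy as np

def extract_descriptors(img, corners, s=5):
    """8s x 8s windows sampled every s-th pixel -> 8x8 patches,
    bias/gain normalized. corners: (n, 2) array of (row, col) positions."""
    half = 4 * s
    descs = []
    for r, c in corners:
        patch = img[r - half:r + half:s, c - half:c + half:s]  # 8 x 8
        # Zero mean, unit variance: invariant to affine intensity changes.
        descs.append(((patch - patch.mean()) / (patch.std() + 1e-8)).ravel())
    return np.array(descs)
```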

Part 8: Matching Features Between Images

For every feature descriptor in image \(x\), we find the two most similar feature descriptors in image \(y\). In accordance with Lowe's ratio test, we threshold the ratio \(r = \frac{err_{\text{1st match}}}{err_{\text{2nd match}}}\) to remove bad matches.

We cross-check this by keeping only mutual matches: the best match from \(y\) back to \(x\) must be the descriptor we started from.
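A sketch of the matcher, combining the ratio test with the mutual-best-match cross-check (the threshold value is an assumption):

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_features(desc1, desc2, ratio=0.7):
    """Return (i, j) index pairs matching rows of desc1 to rows of desc2."""
    d = cdist(desc1, desc2, 'sqeuclidean')
    nn = np.argsort(d, axis=1)[:, :2]        # two nearest neighbors in image y
    rows = np.arange(len(desc1))
    # Lowe's ratio test: best match must be much closer than the second best.
    passes = d[rows, nn[:, 0]] / d[rows, nn[:, 1]] < ratio
    matches = np.column_stack([rows[passes], nn[passes, 0]])
    # Cross-check: keep a match only if it is also the best match from y to x.
    best_from_y = np.argmin(d, axis=0)
    mutual = best_from_y[matches[:, 1]] == matches[:, 0]
    return matches[mutual]
```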

Part 9: RANSAC

Some bad correspondences are unavoidable, so we use RANSAC to remove outliers.

The intuition behind RANSAC is to randomly sample a small number of correspondences (four, the minimum needed to determine a homography), compute the homography from this sample of source and destination points, apply the homography to all source points, and count how many warped source points land within some error threshold of their corresponding destination points. The correspondences within this threshold are inliers.

We repeat this process many times and keep the largest set of inliers as our correspondences.
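A sketch of the loop, reusing compute_homography from Part 2 (the iteration count and pixel threshold are assumptions):

```python
import numpy as np

def ransac_homography(src, dst, n_iters=1000, thresh=2.0):
    """4-point RANSAC over matched points. src, dst: (n, 2) arrays.
    compute_homography is the least-squares fit from Part 2."""
    rng = np.random.default_rng(0)
    best = np.zeros(len(src), dtype=bool)
    ones = np.ones((len(src), 1))
    for _ in range(n_iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = compute_homography(src[idx], dst[idx])
        # Warp all source points and measure reprojection error.
        warped = np.hstack([src, ones]) @ H.T
        warped = warped[:, :2] / warped[:, 2:]
        inliers = np.linalg.norm(warped - dst, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on the largest inlier set.
    return compute_homography(src[best], dst[best]), best
```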

Here are some results of RANSAC.

Part 10: More Mosaics

Here are all of the outfits I took pictures of, stitched together.

Here is a comparison of the mosaics from manual and automatic correspondences:

What have I learned?

I thought it was interesting how the math underlying perspective warps is so simple. Considering how drastically the image changes after warping, I thought the whole procedure would be much more complicated.

Ironically, the hardest part of this project was the blending step, which I had expected to be easy since I had done it for project 2. I did not expect to be so challenged by calculating the bounding box of the stitched image, the offset to avoid negative indices, and creating the blending mask.

I think the coolest and most important things I learned are:

  1. to not underestimate the simplicity of things that seem complex
  2. to not underestimate the complexity of things that seem simple

Also, now that I look at the submission requirements, I realize that I should have manually labeled correspondences between images. Another key takeaway is that I should always fully understand the task I have been given before I begin.

Extra: Ozan does not reinvent the wheel

The cv2.Stitcher class is a wrapper around OpenCV's stitching pipeline. I thought it would be fun to use this existing code, along with an HTML script I found online, to create a 360° panorama viewer.