Added notes for Machine Learning for 3D Geometry.
Showing 83 changed files with 1,426 additions and 0 deletions.
169 changes: 169 additions & 0 deletions
...earning for 3D Geometry/01 - Geometric Foundations - Surface Representations.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,169 @@ | ||
# About these notes | ||
These notes are my lecture notes on the Machine Learning for 3D Geometry course held in the summer term 2021 by Prof. Angela Dai. The notes are based on the course slides [^1]. Images are taken from the course slides and related research papers which are referenced in the notes. | ||
|
||
The notes are written in [Obsidian markdown](https://obsidian.md/) and are best viewed in Obsidian. | ||
|
||
[^1]: Angela Dai, Christian Diller, Yawar Siddiqui -- Machine Learning for 3D Geometry, lecture slides | ||
# 01 - Geometric Foundations: Surface Representations | ||
There are several ways to represent 3D shapes, each with their own advantages and disadvantages. Aspects to consider are the memory usage, how efficient operations are, and the constraints of the source data and target application. | ||
|
||
### Voxel grids | ||
Voxels = pixels in 3D. Each voxel stores an attribute like occupancy (boolean), signed distance to the object's surface (SDF), colors, etc. | |
|
||
Advantages: arbitrary topologies, easy to query, easy to operate on neighbors. | ||
|
||
Huge disadvantage: space requirement grows cubically. Sparse surfaces in occupancy grids lead to a lot of empty space (as the resolution grows, the ratio of occupied voxels goes to zero). | ||
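
A rough sketch of both effects, assuming a sphere surface voxelized into an $n^3$ occupancy grid over the unit cube. The sphere, the occupancy test, and the helper name `sphere_shell_occupancy` are illustrative assumptions, not taken from the lecture.

```python
import math

def sphere_shell_occupancy(n, radius=0.4):
    """Count voxels of an n^3 grid over [0,1]^3 that lie near a sphere surface.

    A voxel counts as occupied if the sphere surface passes within half a
    voxel diagonal of the voxel center (a conservative approximation).
    """
    h = 1.0 / n                       # voxel edge length
    tol = 0.5 * math.sqrt(3.0) * h    # half of the voxel diagonal
    occupied = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x, y, z = (i + 0.5) * h, (j + 0.5) * h, (k + 0.5) * h
                d = math.sqrt((x - 0.5) ** 2 + (y - 0.5) ** 2 + (z - 0.5) ** 2)
                if abs(d - radius) <= tol:
                    occupied += 1
    return occupied, n ** 3

ratios = {}
for n in (8, 16, 32):
    occupied, total = sphere_shell_occupancy(n)
    ratios[n] = occupied / total
    print(n, total, occupied, ratios[n])
```

The total voxel count grows like $n^3$, while the number of occupied voxels only grows roughly like $n^2$, so the printed ratio shrinks as the resolution increases.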
|
||
### Point clouds | ||
Set of $(x, y, z)$ locations of points (optionally, additional attributes). Unordered by definition. Can be the result of raw scanning data capture. | ||
|
||
Advantage: More efficiently represents sparse surfaces. | ||
|
||
Disadvantage: No spatial structure; less efficient neighbor queries. | ||
|
||
### Polygon/Triangle meshes | ||
Collection of *vertices*, (*edges*), and *faces*. Represent a *piecewise linear* approximation of a surface. Can approximate very closely if fine enough, though. | ||
|
||
Advantages: arbitrary topologies, easy editing/manipulation, and easy rendering. | ||
|
||
![[circle-approximations.png]] | ||
|
||
Definition: a polygon mesh is a finite set of closed (i.e. end = start point) and simple (not self-intersecting) polygons. Each polygon defines a face of the polygonal mesh. | ||
|
||
*Boundary*: set of edges that belong to only one polygon. The boundary is either empty or consists of closed loops; if it is empty, the polygon mesh is closed. | ||
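
The boundary definition translates directly into a counting exercise. A minimal sketch, assuming faces are given as tuples of vertex indices (the function name `boundary_edges` is hypothetical):

```python
from collections import Counter

def boundary_edges(faces):
    """Return the sorted list of edges that belong to exactly one polygon."""
    counts = Counter()
    for face in faces:
        for i in range(len(face)):
            a, b = face[i], face[(i + 1) % len(face)]
            # Store edges with sorted endpoints so (a, b) and (b, a) coincide.
            counts[(min(a, b), max(a, b))] += 1
    return sorted(edge for edge, c in counts.items() if c == 1)

open_square = [(0, 1, 2), (0, 2, 3)]                   # two triangles, open mesh
tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]  # closed mesh

print(boundary_edges(open_square))
print(boundary_edges(tetrahedron))
```

For the two-triangle square, the shared diagonal is counted twice and drops out, leaving a closed loop of four boundary edges; the tetrahedron has an empty boundary, so it is closed.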
|
||
Triangular meshes: polygons are triangulated. This simplifies data structures/algorithms/rendering, since only triangles need to be considered. | |
|
||
Meshes can have additional attributes - for example, textured meshes. | ||
|
||
##### Mesh data structures | ||
###### STL format (*Triangle List* format) | ||
Binary format, simply a list of triangles described by their corner coordinates: four bytes (one 32-bit float) per coordinate, 9 coordinates per triangle => 36 bytes of corner data per triangle. | |
|
||
No connectivity information! | ||
|
||
###### OBJ format (*Indexed Face Set* format) | ||
First, a list of point coordinates (three numbers per line preceded by a `v`); then, a list of faces (three or more 1-based indices per line, which refer to the points specified above, preceded by an `f`). More primitives, e.g. lines, are also possible. | |
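
A minimal sketch of reading such a file, assuming only `v` and `f` records matter and that indices are positive. The function name `parse_obj` is hypothetical, and this is not a complete OBJ reader (normals, texture coordinates, negative indices, and other primitives are ignored).

```python
def parse_obj(text):
    """Parse the `v` and `f` records of an OBJ string into (vertices, faces)."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        if parts[0] == "v":
            vertices.append(tuple(float(c) for c in parts[1:4]))
        elif parts[0] == "f":
            # Face entries may look like "3", "3/1" or "3/1/2"; keep the vertex
            # index only. OBJ indices are 1-based, so shift them to 0-based.
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:]))
    return vertices, faces

example = """\
# a unit square made of two triangles
v 0 0 0
v 1 0 0
v 1 1 0
v 0 1 0
f 1 2 3
f 1 3 4
"""
vertices, faces = parse_obj(example)
print(vertices)
print(faces)
```

Because faces only store indices, shared vertices are stored once and connectivity can be recovered from the index lists, unlike in the plain triangle list of STL.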
|
||
Other indexed face set formats: OFF, WRL | ||
|
||
### Parametric surfaces and curves | ||
Functions $p: \mathbb{R} \to \mathbb{R}^3$ (curves) or $p: \mathbb{R}^2 \to \mathbb{R}^3$ (surfaces). More advanced: | ||
- Bezier curves | ||
- Splines | ||
- Bezier surfaces | ||
- Bicubic patches | ||
|
||
![[bezier-surface.png]] | ||
|
||
Advantages of Bezier patch meshes: | ||
- requires fewer points than triangle mesh | ||
- easy to manipulate: just transform the control points under a linear transformation to transform the whole mesh | ||
- easy to sample points | ||
- easy to ensure continuity | ||
|
||
Disadvantages: | ||
- hard to determine if point is inside/outside/on surface | ||
- more complex rendering | ||
|
||
### Implicit surfaces | ||
Implicit surfaces are represented by functions that assign values to points which relate to the surface. | ||
|
||
##### Signed distance function | ||
*Signed distance function*: Function $f: \mathbb{R}^m \to \mathbb{R}$ s.t. $f(x)< 0$ on the inside, $f(x) > 0$ on the outside, and $f(x) = 0$ on the surface. Instead of using a function, one can also use a voxel grid with the SDF values filled in. | ||
|
||
If we mostly care about values close to the surface: use a *truncated signed distance field* with N/A values far from the surface. | ||
|
||
##### Operations on SDFs | ||
Very efficient: | ||
- union = $\min$ operation | ||
- intersection = $\max$ operation | ||
- subtraction = $\max(f, -g)$ operation | ||
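
These operations can be sketched with two overlapping spheres. The sphere SDF and the helper names are illustrative assumptions; note also that the results of `min`/`max` have the correct sign everywhere but are in general no longer exact distances.

```python
import math

def sphere_sdf(center, radius):
    """SDF of a sphere: negative inside, zero on the surface, positive outside."""
    def f(p):
        return math.dist(p, center) - radius
    return f

def union(f, g):
    return lambda p: min(f(p), g(p))

def intersection(f, g):
    return lambda p: max(f(p), g(p))

def subtraction(f, g):
    """Shape f with shape g carved out."""
    return lambda p: max(f(p), -g(p))

a = sphere_sdf((0.0, 0.0, 0.0), 1.0)
b = sphere_sdf((1.0, 0.0, 0.0), 1.0)
p = (0.5, 0.0, 0.0)    # inside both spheres
q = (-0.5, 0.0, 0.0)   # inside a only

print(union(a, b)(q), intersection(a, b)(q), subtraction(a, b)(q))
print(union(a, b)(p), intersection(a, b)(p), subtraction(a, b)(p))
```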
|
||
![[sdf-operations.png]] | ||
|
||
Advantages: | ||
- easy operations | ||
- easy to determine if a point is inside/outside/on the surface | ||
Disadvantages: | ||
- hard to sample points on the surface! | ||
|
||
|
||
### Conversion between representations | ||
|
||
##### Point cloud -> Implicit: Poisson Surface Reconstruction | ||
Fit a function $f$ s.t. $f < 0$ on the inside, $f > 0$ on the outside. | ||
|
||
![[points-to-implicit.png|300]] | ||
|
||
Poisson surface reconstruction [Kazhdan et al. ’06]: uses *oriented points* (i.e. points + surface normals) as inputs, computes an indicator function $\chi_M$ (0 or 1). | ||
From the point normals, the *gradient* of $\chi_M$, a vector field $V$, is computed. Then find a function $f$ whose gradient approximates $V$: Solve | ||
|
||
$$\min_f ||\nabla f - V||$$ | ||
|
||
This can be transformed into a *Poisson problem* $\Delta f = \nabla \cdot V$, which is then solved as a least-squares fitting problem. (No more details given.) | |
|
||
##### Implicit -> Mesh: Marching Cubes | ||
Extract the surface belonging to $f = 0$ in form of a mesh. | ||
|
||
Marching cubes algorithm [Lorensen and Cline 1987]: Discretize space into a voxel grid. For each cube, compute the implicit function at the 8 corners. This allows approximating the *zero crossings* (i.e. where the surface crosses the cube edges). | |
|
||
Look up the configuration in a lookup table ($2^8$ possibilities depending on the signs of the 8 values at the corners). | |
|
||
![[marching-cubes-lookup.png]] | ||
|
||
Improve by linearly interpolating the exact position on the cube edges (i.e. if the SDF is -1 on one corner and +10 on the other, the zero crossing should be closer to the first corner). | |
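
The two per-cube ingredients (table index and edge interpolation) can be sketched as follows. The corner ordering and the convention that "inside" means a value below the iso-level are assumptions here; the actual triangle lookup table is omitted.

```python
def cube_index(corner_values, iso=0.0):
    """Pack the inside/outside state of the 8 corners into an 8-bit table index."""
    index = 0
    for bit, value in enumerate(corner_values):
        if value < iso:          # corner is inside the surface
            index |= 1 << bit
    return index

def interpolate_crossing(p0, p1, v0, v1, iso=0.0):
    """Linearly interpolate where the field crosses `iso` on the edge p0-p1."""
    t = (iso - v0) / (v1 - v0)   # v0 != v1 because the edge has a sign change
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))

# The example from the text: SDF -1 at one corner, +10 at the other.
crossing = interpolate_crossing((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), -1.0, 10.0)
print(crossing)
print(cube_index([-1.0] + [1.0] * 7))
```

The crossing lands at $t = 1/11$ of the edge length, i.e. close to the corner with value -1, as expected.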
|
||
Advantages: | ||
- widely applicable | ||
- easy to implement, trivial to parallelize | ||
|
||
Disadvantages: | ||
- can create skinny triangles | ||
- lookup table with many special cases; some ambiguities are resolved arbitrarily | ||
- No sharp features | ||
|
||
##### Mesh -> Point cloud | ||
...to solve problems with bad triangles in meshes, etc. | ||
|
||
Generate point cloud by sampling the mesh: Sample each triangle uniformly with barycentric coordinates, sample triangles with probability proportional to their area. | ||
|
||
If $r_1, r_2 \sim U([0, 1])$ are uniformly sampled, a random point on the triangle is given by | |
|
||
$$p = (1 - \sqrt{r_1}) A + \sqrt{r_1}(1-r_2) B + \sqrt{r_1}\, r_2 C$$ | |
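
Both steps (area-weighted triangle choice, then the barycentric formula above) can be sketched with the standard library only. The function names and the indexed-face-set input layout are assumptions for illustration.

```python
import math
import random

def triangle_area(a, b, c):
    """Area of a 3D triangle: half the norm of the cross product of two edges."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))

def sample_mesh(vertices, faces, n, rng=random):
    """Draw n points from a triangle mesh, uniformly with respect to surface area."""
    areas = [triangle_area(*(vertices[i] for i in face)) for face in faces]
    chosen = rng.choices(faces, weights=areas, k=n)   # area-proportional choice
    points = []
    for ia, ib, ic in chosen:
        a, b, c = vertices[ia], vertices[ib], vertices[ic]
        s = math.sqrt(rng.random())                   # sqrt(r_1)
        r2 = rng.random()
        wa, wb, wc = 1 - s, s * (1 - r2), s * r2      # barycentric weights
        points.append(tuple(wa * a[i] + wb * b[i] + wc * c[i] for i in range(3)))
    return points

rng = random.Random(0)
square_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
square_faces = [(0, 1, 2), (0, 2, 3)]
samples = sample_mesh(square_vertices, square_faces, 200, rng)
print(samples[:3])
```

The square root on $r_1$ is what makes the samples uniform inside each triangle; without it, points would cluster near vertex $A$.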
|
||
|
||
Alternatively: farthest point sampling (sample next point to be farthest from all previously sampled points). However, this depends on the notion of distance (on mesh: discrete geodesic distance = path along edges). | ||
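
A greedy sketch of farthest point sampling. It uses plain Euclidean distance as a stand-in, which is an assumption: on a mesh one would swap in the geodesic distance mentioned above. The function name is hypothetical.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Greedily pick k indices; each new point maximizes its distance to the chosen set."""
    selected = [start]
    # min_dist[i] = distance from points[i] to the closest already-selected point
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected

line = [(float(x), 0.0, 0.0) for x in range(11)]   # points at x = 0, 1, ..., 10
picked = farthest_point_sampling(line, 3)
print(picked)
```

Starting from the point at $x = 0$, the sketch first picks the far end ($x = 10$) and then the midpoint ($x = 5$), which illustrates how the samples spread out evenly.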
|
||
### Geometric operators | ||
How can we describe the geometry of a local observation (i.e. a point and its neighborhood)? | |
|
||
- tangent along a surface | ||
- curvature (limiting circle as three points come together) | ||
|
||
##### On discrete curves | ||
On discrete curves, the problem is that vertices have no well-defined tangent/normal. One reasonable definition: weighted average of the incident edges' normals, weighted by the edge lengths. | |
|
||
$$n_v = \frac{|e_1| n_{e_1} + |e_2| n_{e_2}}{\left\| |e_1| n_{e_1} + |e_2| n_{e_2} \right\|}$$ | |
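
A small 2D sketch of this formula. The convention that an edge normal is the edge direction rotated by -90 degrees (outward for counter-clockwise curves) is an assumption, as are the function names.

```python
import math

def edge_normal(p, q):
    """Unit normal and length of the 2D edge p->q (direction rotated by -90 degrees)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    return (dy / length, -dx / length), length

def vertex_normal(prev_pt, pt, next_pt):
    """Length-weighted average of the two incident edge normals, normalized."""
    (n1, l1), (n2, l2) = edge_normal(prev_pt, pt), edge_normal(pt, next_pt)
    nx = l1 * n1[0] + l2 * n2[0]
    ny = l1 * n1[1] + l2 * n2[1]
    norm = math.hypot(nx, ny)
    return (nx / norm, ny / norm)

# Corner of a counter-clockwise unit square at (1, 0).
n = vertex_normal((0.0, 0.0), (1.0, 0.0), (1.0, 1.0))
print(n)
```

With two edges of equal length, the vertex normal is the bisector of the two edge normals, here pointing diagonally away from the square.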
|
||
![[discrete-curves-normals.png]] | ||
|
||
|
||
##### On point clouds | ||
Estimate the normal by approximating the plane tangent to the surface, as a least-squares fitting problem. | |
|
||
Find neighborhood around point, then estimate a plane by PCA of this neighborhood. (Watch out: Orientation of normal is ambiguous.) | ||
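
A sketch of PCA-based normal estimation, assuming NumPy, a brute-force k-nearest-neighbor search, and an illustrative neighborhood size `k`. The function name is hypothetical.

```python
import numpy as np

def estimate_normal(points, query, k=8):
    """Estimate the surface normal at `query` from its k nearest neighbors via PCA."""
    dists = np.linalg.norm(points - query, axis=1)
    neighbors = points[np.argsort(dists)[:k]]
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered
    # eigh returns eigenvalues in ascending order; the eigenvector of the smallest
    # eigenvalue is the direction of least variance, i.e. the plane normal.
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, 0]

rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(200, 2))
plane = np.column_stack([xy, np.zeros(len(xy))])   # samples of the plane z = 0
normal = estimate_normal(plane, np.array([0.0, 0.0, 0.0]))
print(normal)
```

The result is only defined up to sign, which is exactly the orientation ambiguity noted above; a consistent orientation needs extra information such as the scanner position.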
|
||
##### Mesh Laplacian | ||
Local descriptor that describes connectivity of nodes/edges and surface geometry. | ||
|
||
![[mesh-laplacian.png]] | ||
![[mesh-laplacian-2.png]] | ||
|
||
### Useful Software | ||
- Meshlab (for viewing/processing meshes) | ||
- OpenMesh (for processing meshes) | ||
- CGAL (for computational geometry) | ||
|
||
|
||
|
119 changes: 119 additions & 0 deletions
...ne Learning for 3D Geometry/02 - Shapes - Alignment, Descriptors, Similarity.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
# 02 - Shapes: Alignment, Descriptors, Similarity | ||
|
||
### 3D Shape Alignment (*Registration*) | ||
Often needed: e.g. in | ||
- 3D scanning (combine different scanned parts) | ||
- SLAM = Simultaneous Localization + Mapping (essentially, robot navigation) | ||
- protein structure alignment | ||
|
||
Given: two shapes $A, B$ with overlap; register together by rigid transform s.t. distance is minimized: $\min_T \delta(A, T(B))$ for some distance measure $\delta$. | ||
|
||
Challenges: find both *point correspondences* and *transformation*. | ||
|
||
#### Rigid 3D Alignment: Procrustes | ||
|
||
Goal: find best alignment, given correspondences $(x_i)_i, (y_i)_i$ (even for different shapes). | ||
|
||
![[chair-alignment.png]] | ||
|
||
Namely, find $(R, t)$ that minimizes $\sum_i ||R x_i + t - y_i||_2^2$. Solved as *orthogonal Procrustes problem* (1966). | ||
|
||
Assume a coordinate system centered at the mean of the $(x_i)_i$: then the minimization term becomes | ||
|
||
$$\min_{R, t} \sum_i ||t-y_i||_2^2 - 2\sum_i \langle R x_i, y_i \rangle$$ | ||
|
||
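The first sum is minimized by $t = (\sum_i y_i)/N =: \bar{y}$. To minimize the second term (i.e. to maximize $\sum_i \langle R x_i, y_i \rangle$), define the mean-centered $N \times 3$ matrices $X = (x_1 - \bar{x}, \dots, x_N - \bar{x})^\top$, $Y = (y_1 - \bar{y}, \dots, y_N - \bar{y})^\top$ and compute the SVD of the $3 \times 3$ matrix $Y^\top X = \sum_i (y_i - \bar{y})(x_i - \bar{x})^\top$: $Y^\top X = UDV^\top$. (With the factors in the other order, $X^\top Y$, the roles of $U$ and $V$ swap.) | |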
|
||
Now replace the diagonal matrix $D$ by $S$: either by $S = I$ if $\det(U)\det(V)=1$, or $S = \text{diag}(1, \dots, 1, -1)$ otherwise. Then the minimizer $R$ is given by | ||
|
||
$$R = USV^\top.$$ | ||
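
A NumPy sketch of the whole procedure. One assumption to flag loudly: the cross-covariance is formed as $\sum_i (y_i - \bar{y})(x_i - \bar{x})^\top$ (a $3 \times 3$ matrix), which is the ordering for which $R = USV^\top$ holds; this is my reading rather than something confirmed by the slides. The function name `procrustes_rigid` is hypothetical, and the translation is written for uncentered input as $t = \bar{y} - R\bar{x}$.

```python
import numpy as np

def procrustes_rigid(X, Y):
    """Find R, t minimizing sum_i ||R x_i + t - y_i||^2 for (N, 3) arrays X, Y."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    # 3x3 cross-covariance sum_i y_i x_i^T (assumed ordering, see lead-in)
    U, _, Vt = np.linalg.svd(Yc.T @ Xc)
    # S = I, or diag(1, 1, -1) if U V^T would otherwise be a reflection
    S = np.eye(3)
    S[2, 2] = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    R = U @ S @ Vt
    t = y_mean - R @ x_mean
    return R, t

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([1.0, -2.0, 0.5])
Y = X @ R_true.T + t_true          # y_i = R_true x_i + t_true
R, t = procrustes_rigid(X, Y)
print(np.round(R, 3), np.round(t, 3))
```

On noise-free correspondences like these, the sketch recovers the rotation and translation that generated `Y`.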
|
||
Question: We had problems with this formula in Exercise 1, and had to slightly change it based on the original paper. Is it valid as written here or not? | ||
|
||
#### Obtaining point correspondences | ||
##### Iterative Closest Points [Besl and McKay ’92] | ||
Iterative algorithm: | ||
- assume that the currently closest points correspond | |
- align these correspondences (Procrustes) | ||
- recompute closest points, repeat | ||
|
||
Converges if the initialization is good enough. Optional steps: weight correspondences (by quality?), reject outlier correspondences before aligning. | ||
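
The loop can be sketched in NumPy as follows. This is a bare-bones version under stated assumptions: brute-force closest points, a fixed iteration count instead of a convergence test, no weighting or outlier rejection, and hypothetical function names.

```python
import numpy as np

def best_rigid_transform(X, Y):
    """Procrustes step: R, t minimizing sum_i ||R x_i + t - y_i||^2."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    U, _, Vt = np.linalg.svd((Y - y_mean).T @ (X - x_mean))
    S = np.eye(3)
    S[2, 2] = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    R = U @ S @ Vt
    return R, y_mean - R @ x_mean

def icp(A, B, iterations=20):
    """Align B onto A: returns (R, t) such that B @ R.T + t approximates A."""
    R_total, t_total = np.eye(3), np.zeros(3)
    moved = B.copy()
    for _ in range(iterations):
        # 1. closest point in A for every point of the moved B (brute force)
        d2 = ((moved[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
        matches = A[d2.argmin(axis=1)]
        # 2. align the current correspondences
        R, t = best_rigid_transform(moved, matches)
        # 3. apply the update and accumulate the overall transform
        moved = moved @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
angle = 0.05
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0,            0.0,           1.0]])
B = A @ Rz.T + np.array([0.02, -0.01, 0.03])   # slightly misaligned copy of A
R, t = icp(A, B)
aligned = B @ R.T + t
print(np.linalg.norm(B - A), np.linalg.norm(aligned - A))
```

The example starts from a small misalignment on purpose: with a poor initialization the closest-point assumption fails and the loop can settle in a wrong local minimum.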
|
||
![[iterative-closest-points.png]] | ||
|
||
Runtime: $O(N_A \cdot N_B)$ to find closest points naively, $O(N_A)$ to compute optimal alignment and update. So $O(K N_A N_B)$ overall runtime (where $K$ is the number of iterations, $N_A$ and $N_B$ the number of points in shape $A$ and shape $B$). | ||
|
||
Better runtime with data structures like kd-trees. | ||
|
||
Improved correspondence selection: minimize not pairwise distance, but distance to tangent plane of the surface. This can make the point correspondences more evenly distributed. No closed-form solution anymore, but faster in practice. (No details on how to compute this). | ||
|
||
![[icp-tangent-plane.png]] | ||
|
||
##### Global Registration: Finding ICP Initialization | ||
General strategy: | ||
1. find a good initialization | ||
2. refine with ICP | ||
|
||
Approaches to find an initialization: | ||
|
||
###### Exhaustive search | ||
Just try out "all possible transforms" (or probably, a sufficiently dense subset of all transforms). Of course, this is extremely slow. | ||
|
||
###### Normalization with PCA | ||
Center shapes, use PCA and align such that the principal directions match up. Works well in some cases, but can also go wrong - problems are: | ||
- inconsistent orientation of principal directions (e.g. two cars, for one the principal direction points towards the front, for the other towards the back) | ||
- unstable axes (e.g. cup with handle => principal direction is not the expected top-bottom axis through the cup) | ||
- partial similarity (chair with back vs. barstool without) | ||
|
||
###### Random sampling (RANSAC) | ||
RANSAC: pick random pairs of points, estimate alignment (details a bit unclear). | ||
|
||
###### Matching by invariant features | ||
Identify feature points like corners that describe local geometry ("invariant" because they should be invariant under the transformation). Align these feature points. | ||
|
||
![[feature-points.png]] | ||
|
||
|
||
### Shape Descriptors | ||
Needed for the feature point matching approach: feature descriptors that can capture the information to answer "are these points similar?". | ||
|
||
##### Spin Images [Johnson and Hebert ’99] | ||
To describe a point, create a *spin image* associated with its neighborhood. Neighborhood point contributions are parametrized by a) their (signed) distance to the tangent plane and b) their distance to the normal line through the point. | |
|
||
![[spin-image.png]] | ||
|
||
##### Point Feature Histograms [Rusu et al. ’09] | ||
Find neighbors $(q_i)_i$ of point $p$, compute histograms based on distances, normal, curvature etc. | ||
|
||
|
||
##### Global Shape Similarity and Global Shape Descriptors | ||
Capture models by high-dimensional shape descriptors, compare these descriptors with some similarity measure. | ||
|
||
###### Shape Histograms | ||
Histograms that capture how much surface area resides within concentric shells of different radii. | |
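
A simplified sketch of such a descriptor. It assumes the surface is represented by points sampled uniformly by area, so that point counts stand in for surface area, and it centers the shells at the centroid; both choices and the function name are illustrative assumptions.

```python
import math

def shell_histogram(points, num_shells, max_radius):
    """Fraction of sample points falling into each concentric shell around the centroid."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    counts = [0] * num_shells
    for p in points:
        r = math.dist(p, centroid)
        if r < max_radius:   # points beyond the outermost shell are ignored
            counts[int(r / max_radius * num_shells)] += 1
    return [c / n for c in counts]

pts = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, -3.0, 0.0)]
hist = shell_histogram(pts, num_shells=4, max_radius=4.0)
print(hist)
```

Two such histograms can then be compared bin by bin with any vector distance; since only radii enter, the descriptor does not change when the shape is rotated about its centroid.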
|
||
Can be made a local shape descriptor by restricting the shell radii. | |
|
||
### Non-Rigid shape matching | ||
Goal: find correspondences that preserve the *geodesic distance* on the shapes. In other words: even if the actual shape changes, the lengths of paths along the surface between corresponding points should stay the same. | |
|
||
![[nonrigid-elephant.png]] | ||
|
||
One way to compute something like this: *near isometries preserve local structure*, so use descriptors of local regions and establish mappings between these. A problem: how to choose the scale of a local region? | ||
|
||
#### Intrinsic similarity measures | ||
##### Gromov-Hausdorff distance | ||
The *Hausdorff distance* between two point sets is the *maximum of all minimum distances* $\max_p \min_{q} d(p, q)$ (this is the directed form; the symmetric version takes the maximum over both directions). | |
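
A brute-force sketch of the formula above for finite point sets, together with its symmetrized variant. Euclidean distance and the function names are assumptions for illustration.

```python
import math

def directed_hausdorff(P, Q):
    """max over p in P of the distance from p to its closest point in Q."""
    return max(min(math.dist(p, q) for q in Q) for p in P)

def hausdorff(P, Q):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(P, Q), directed_hausdorff(Q, P))

P = [(0.0, 0.0), (1.0, 0.0)]
Q = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]

print(directed_hausdorff(P, Q), directed_hausdorff(Q, P), hausdorff(P, Q))
```

The example shows why the direction matters: every point of `P` also appears in `Q`, so the distance from `P` to `Q` is zero, while the extra point in `Q` is 2 away from `P`.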
|
||
The *Gromov-Hausdorff distance* is the infimum of Hausdorff distances over all mappings/correspondences. | ||
|
||
|
||
##### Heat kernel signature [Sun et al. ’09] | ||
Heat kernel $k_t(x, y)$: amount of heat transferred from $x$ to $y$ in time $t$. Advantage: invariant under isometric deformations, works at multiple scales. Difficult to use in real-world scenarios with partial/noisy data, though. | |
|
||
![[heat-kernel.png]] | ||
|
||
### Shape Search | ||
Find shapes similar to given shape in shape search engine. Approaches: bag of geometric words, i.e. decompose shape into some parts. | ||
|
||
Retrieve similar shapes through embedding in descriptor space (also see [[05 - Shape Generation#Joint Embedding for Retrieval|Joint Embedding for Retrieval]] and [[05 - Shape Generation#Joint embedding of 3D scans and CAD objects Dahnert et al ‘19|Joint embedding of 3D scans and CAD objects]] in Lecture 5). |
2 changes: 2 additions & 0 deletions
Machine Learning for 3D Geometry/03 - Machine Learning Foundations.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# 03 - Machine Learning Foundations | ||
(I skipped this one because it's only a repetition of I2DL/ML topics) |