Skip to content

Localization Pattern

wzli edited this page Aug 19, 2020 · 26 revisions

Symbol Choice

Although a visual pattern may consist of any set of symbols (such micro-dots of various offsets), binary pixel blocks were the obvious choice for this particular application, primarily because they can be sampled using very low resolution images, in turn significantly reducing real-time processing requirements.

De Bruijn Sequences

For a pattern to be usable for positioning, the minimum requirement is that every local window of that pattern must be unique, otherwise the mapping from local pattern to position will not be 1-to-1 leading to ambiguities.

In the 1-D case, a valid positioning sequence can be defined as any string where every substring of length n is unique. The longest of such strings for a given code length n are called De Bruijn Sequences, which are cyclic sequences of length 2^n that contain every possible code as substring once. There exist various methods to construct De Bruijn Sequences that will not be detailed here. The significant takeaway is that 2^n is established as the theoretical maximum number of positions that can be decoded from a n-bit code.

A 2D generalization also exists called the De Bruijn Torus. They are of size 2^(n^2) and densely contain every matrix of size n x n uniquely. It would seem that this makes the ideal basis for a 2D positioning pattern, but in practice, there are no known algorithms to efficiently decode them. Another shortcoming is that because the pattern is maximally dense, it has no buffer to detect or correct decoding errors, since every single bit flip will map to another position.

1D Separable Pattern

The structure of the 2D pattern is primarily inspired from a method by Bruckstien et al, where the Outer Product of two 1D sequences are taken using XOR as the composition operator. For example:

1 0 0 1
1 0 1 1 0
0 1 0 0 1
1 0 1 1 0
0 1 0 0 1

This mapping provides a simple and efficient way to convert between 1D sequences and 2D patterns while preserving the uniqueness property necessary for localization. Aside from the native performance of XOR operations, a massive benefit is that the problem of decoding a 2D pattern is reduced to separate 1D problems for each axis, which is individually much simpler. With this scheme, 2^2n positions can be decoded from a (n x n)-bit block. This is much less than the theoretical maximum density, but allows for high levels of error correction via redundancy across all rows and columns. However since the XOR operator is only invertible up to a parity, 2D to 1D conversion leaves ambiguity of whether the resultant 1D codes are bitwise inverted. This will later be addressed by extending the uniqueness property of the source sequence to bitwise inversions as well.

XOR Binary Matrix Factorization

To obtain 1D codes from a 2D block generated as mentioned above, it is sufficient to simply take any pair of row and column to be the solution directly (up to an unknown parity). In practice however, this provides no redundancy to bit errors that may exist, so it is desirable to account for all rows and columns to produce the most likely output.

One mathematically sound method for Matrix Factorization of the 2D block is through a Rank 1 Truncated Singular Value Decomposition. This can be performed on a binary matrix by mapping {0, 1} under XOR homomorphically to {-1, 1} under multiplication. It can be thought of as taking the inverse outer product operation while optimizing for least-square error. Although the approach outlines the desired flavor of such methods, it is not the one used because it is hard to justify the inclusion of a heavy linear algebra library for an implementation intended for micro-controllers.

The method used in the current implementation is instead based on binary vector clustering. Borrowing from machine learning terminology, the dataset (ie 2D matrix/block) consists of a set of samples (ie rows) each containing a set of binary features (ie columns). The goal is to assign each sample to one of two clusters such that total assignment error is minimized. The assignment of each row to binary classes then directly map to individual bits of the output 1D column code. To obtain the row code, the process is simply repeated with the input transposed.

The clustering algorithm is based on a fast and simple method called K-Means. Instead of euclidean distance, Hamming Distance is used since it is more efficient to perform bitwise operations on binary vectors. Another optimization is that the two cluster centroids are assumed to be bitwise complements, which halves the number of distance computations and clusters to keep track of.

Directional Sequences

A 2D localization system is required to deduce not only the x and y positions, but also orientation/heading. From purely image based alignment of rows and columns, a rotation estimate of the camera frame with respect to the localization pattern can be obtained up to the nearest 90 degrees. What remains ambiguous is the four cardinal directions.

An even lower level problem is that knowing which way is up/down/left/right is required to decode the binary pattern in the first place, because a block that is rotated 90 degrees will be decoded with x and y axis swapped, and a block that is rotated 180 will be decoded with the codes reversed/flipped.

Usually other visual codes have markers external to the binary encoding that indicate the correct direction, for example the large position markers on 3 corners of a QR code. This approach of marking 3 corners is obviously unsuited for a continuous sliding-window code where there are no boundaries. Another possible solution is to scatter markers akin to small arrows across the whole pattern at a density that ensures every sliding window of a fixed size contains at least one marker. This has the downside of requiring higher image resolutions and more processing to robustly recognize the marker as well as obscuring the background binary code.

The more elegant solution that is implemented instead, is to embed directionality into the underlying binary sequence itself, allowing the direction of the bit pattern to be decoded along with the position. This is done by extending the uniqueness property of the 1D sequence to bitwise reversals as well. For example, by ensuring if substring 001 exists, then 100 does not, and later when querying for 100 fails, the bitwise reversal is attempted instead, thereby returning the position of 001 with the knowledge that the input sequence is reversed. It is then possible to infer the correct quadrant of rotation from the direction of the sequences corresponding to each axis.

Sequence Required Properties

It is established from previous sections that the underlying 1D localization sequence is required to satisfy the following properties:

  1. Every substring of n-bits is unique.
    • Such that each code corresponds to a single position.
  2. If a substring of n-bits exists, then the bitwise complement of that substring does not exist.
    • To resolve the ambiguity of parity resulting from XOR binary matrix factorization.
  3. If a substring of n-bits exists, then the bitwise reversal of that substring does not exist.
    • To determine the direction the code is read from. (ie if the camera rotated by 180 degrees)

Another potential property to consider is the uniqueness of a substring at different scales in order to determine distance between camera and floor pattern. However, this is highly complicated since substrings of all lengths must be considered. The method used for scale estimation in actual implementation is instead based on statistically comparing the amount of bit errors when decoding the input scaled to various lengths.

Sequence Generation

With the required properties defined, the next question is how to find such a sequence. There are no known algorithms in literature that precisely addresses this, particularly the directional property as defined above (please contact author if reader knows otherwise). So we are left to design one, from an engineering approach rather than a theoretical approach since the concern is for practical application.

The method implemented takes inspiration from a related class of sequences called Maximum Length Sequences (MLS/M-sequences), which have the uniqueness property akin to a Debruijn sequence, where every code of length n occurs exactly once as a sub-string of the cyclic sequence (except the zero string is excluded). They can be generated from Linear-Feedback Shift Registers (LFSR), a simple iterative mechanism where a shift register of n bits initially contains a seed state, then a new bit is computed as a linear function of the current state, and used as input to the shift register to generate the next state. The correct choice of linear function ensures a "maximum length sequence", where all representable are states traversed once before falling into a cycle. They are widely used in digital hardware as a means to efficiently generate long pseudo-random binary sequences, since the outcome of the next bit appears random given a long enough sequence although the process is deterministic.

The idea is to modify the LFSR to generate binary sequences with all the properties required, namely uniqueness of each fixed length substring along with their bit-wise complement and reversal. This is done by formulating a function that generates the next bit of the sequence by actively filtering outputs that would otherwise violate the uniqueness properties. A valid sequence is thereby extended one bit at a time from an initial base sequence. Since the bit-length of codes for this application is relatively short, the uniqueness checks can be efficiently performed using a lookup table. In order to find the longest sequences, it would be desirable to avoid binary chains that have been explored before, but in practice the search space is too large to remember all previously traversed solutions, plus it is harder to parallelize the search if there is shared data. So instead a randomized approach is used, to generate the initial seed sequence along a guiding heuristic to select each next bit with probability inversely proportional to number of times the transition state has ever been visited. A minor additional implementation detail is that the sequence can be extended in both directions yielding longer solutions but the idea remains the same.

Sequence Decoding

Ideally, the sequence comes with an efficient decoding method as well. But since finding a sequence is hard enough, and is done via search methods

Inspiration from MLS sequences.

The problem is, the uniqueness property -> Pseudo random sequence. And if something is random, there is no reversible process, if something is pseudo random, then reversing the process is considered computationally infeasible.

calculate lookup table size binary search vs hash or perfect hash cannot be compressed due to pseudo random nature.

Clone this wiki locally