Embedding Projector: fix projector knn computation (#6269)

alicialics · web-flow · commit b72d751bd70e · 2023-04-26T10:13:09.000-07:00
## Motivation for features / changes

Fix a bug in the projector's KNN computation.

## Technical description of changes

If we have computed KNN for 1000 points and then sample 100 points, we cannot reuse the old KNN result, because its neighbor lists could contain points that are not part of the sample.

## Screenshots of UI changes

N/A

## Detailed steps to verify changes work correctly (as executed by you)

1. Build and launch [projector](https://github.com/tensorflow/tensorboard/blob/bbc9e4f29a55d48478c3f23a7d80221b5b1b1e3c/tensorboard/plugins/projector/README.md).
2. Use the default demo tensor (Word2Vec 10K).
3. Change the projection type from PCA to T-SNE. This should compute KNN over 10k sample points with 90 neighbors.
4. Change the projection type from T-SNE to UMAP. By default, UMAP uses 5k sample points and 15 neighbors for KNN. Without this change, an "Initializing UMAP..." screen loads indefinitely. With the change applied, verify that this step results in a successful UMAP rendering.

## Alternate designs / implementations considered
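
## Illustration of the stale-cache problem

The reasoning in the technical description can be sketched in a few lines of TypeScript. This is a minimal illustration, not the projector's code: `Neighbor`, `bruteForceKnn`, and `canReuse` are hypothetical names, the points are 1-D numbers, and the sample is assumed to be the first `sampleSize` points. It shows that a KNN result computed over a larger set can name neighbors outside a smaller sample, and that a strict length check rejects such a cache.

```typescript
// Hypothetical, simplified stand-in for a nearest-neighbor entry.
interface Neighbor {
  index: number; // index into the array the KNN was computed over
  dist: number;
}

/** Brute-force KNN over 1-D points. For illustration only. */
function bruteForceKnn(points: number[], k: number): Neighbor[][] {
  return points.map((p, i) =>
    points
      .map((q, j) => ({index: j, dist: Math.abs(p - q)}))
      .filter((n) => n.index !== i) // a point is not its own neighbor
      .sort((a, b) => a.dist - b.dist || a.index - b.index)
      .slice(0, k)
  );
}

/** Reuse check modeled on the fixed condition: lengths must match exactly. */
function canReuse(
  cached: Neighbor[][] | null,
  sampleSize: number,
  k: number
): boolean {
  const cachedK = cached && cached.length ? cached[0].length : 0;
  return cached != null && cached.length === sampleSize && cachedK >= k;
}

// KNN is first computed over all 10 points.
const allPoints = [0, 9, 1, 8, 2, 7, 3, 6, 4, 5];
const cachedKnn = bruteForceKnn(allPoints, 2);

// Later, only the first 4 points are sampled.
const sampleSize = 4;

// Old behavior: truncate the cached result to the sample size and reuse it.
const staleKnn = cachedKnn.slice(0, sampleSize);
const outOfSample = staleKnn.some((neighbors) =>
  neighbors.some((n) => n.index >= sampleSize)
);
console.log(outOfSample); // true: point 0's neighbors are indices 2 and 4

// Fixed behavior: the cache is rejected, so KNN is recomputed on the sample.
console.log(canReuse(cachedKnn, sampleSize, 2)); // false
const freshKnn = bruteForceKnn(allPoints.slice(0, sampleSize), 2);

// A cache of the same size with at least as many neighbors is still reusable.
console.log(canReuse(cachedKnn, allPoints.length, 1)); // true
```

Shrinking the number of neighbors remains safe because each neighbor list is sorted by distance, so keeping the first `k` entries of a larger list yields the same result as recomputing with a smaller `k`. Shrinking the number of points is not safe, because the surviving lists still index into the larger set.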
diff --git a/tensorboard/plugins/projector/vz_projector/data.ts b/tensorboard/plugins/projector/vz_projector/data.ts
@@ -87,7 +87,7 @@ export interface DataPoint {
   };
 }
 const IS_FIREFOX = navigator.userAgent.toLowerCase().indexOf('firefox') >= 0;
-/** Controls whether nearest neighbors computation is done on the GPU or CPU. */
+/** Maximum sample size for each projection type. */
 export const TSNE_SAMPLE_SIZE = 10000;
 export const UMAP_SAMPLE_SIZE = 5000;
 export const PCA_SAMPLE_SIZE = 50000;
@@ -459,20 +459,14 @@ export class DataSet {
       this.nearest && this.nearest.length ? this.nearest[0].length : 0;
     if (
       this.nearest != null &&
-      this.nearest.length >= data.length &&
+      this.nearest.length === data.length &&
       previouslyComputedNNeighbors >= nNeighbors
     ) {
       return Promise.resolve(
         this.nearest
-          // `this.points` is only set and constructor and `data` is subset of
-          // it. If `nearest` is calculated with N = 1000 sampled points before
-          // and we are asked to calculate KNN ofN = 50, pretend like we
-          // recalculated the KNN for N = 50 by taking first 50 of result from
-          // N = 1000.
-          .slice(0, data.length)
           // NearestEntry has list of K-nearest vector indices at given index.
           // Hence, if we already precomputed K = 100 before and later seek
-          // K-10, we just have ot take the first ten.
+          // K = 10, we just have ot take the first ten.
           .map((neighbors) => neighbors.slice(0, nNeighbors))
       );
     } else {