Node.js package for computing the k nearest neighbors of an input vector under a chosen distance function.
Computations are implemented in Rust for high performance and parallelism.
- Parallelized distance computations
- Fast native system processing
- 14 popular distance functions
- Out-of-the-box support on Linux, macOS, and Windows
- Support for Node 8, 10, 12, and 13
$ npm i unsupervised-knn-js
const { knn } = require('unsupervised-knn-js')
> const { knn } = require('unsupervised-knn-js')
> const neighbors = [
{ label: 'some name', vector: [1, 2, 4, 5] },
{ label: 'name 2', vector: [14, 4, 13, 2] },
{ label: 'another name', vector: [4, 4, 4, 5] },
]
> const target = [1, 2, 3, 4]
> const algo = 'euclidean'
> const k = 2
> knn(algo, k, neighbors, target)
[
{ label: 'some name', distance: 1.4142135623730951 },
{ label: 'another name', distance: 3.872983346207417 }
]
>
The knn function takes 4 parameters:
- Algorithm String
- This is the name of the distance function used to compare the target with every neighbor
- The algorithms currently supported natively are:
'euclidean' // L2 Norm Difference
'cosine' // Cosine Distance
'mae' // Mean-Absolute-Error
'mse' // Mean-Squared-Error
'manhattan' // Sum of Absolute Differences
'ssd' // Sum of Squared Differences
'canberra' // Weighted Manhattan Distance
'hamming' // Sum of Binary Differences
'L3' // L3 Norm Difference
'L4' // L4 Norm Difference
'L5' // L5 Norm Difference
'L10' // L10 Norm Difference
'chebyshev' // L-Infinity Norm Difference
'pearson' // Pearson Correlation Distance
- K-Value
- The number of nearest neighbors to return
- For example, if k = 2, the 2 neighbors closest to the target vector are returned.
- Neighbors
- This is an array of objects where each object represents a neighbor or point
- Each object should have a label and vector field as such:
{ label: 'name or id', vector: [1, 3, 4.5, -4] }
- The following is a valid array of neighbors:
const neighbors = [
  { label: 'some name', vector: [1, 2, 4, 5] },
  { label: 'name 2', vector: [14, 4, 13, 2] },
  { label: 'another name', vector: [4, 4, 4, 5] },
]
- Target
- This is the vector whose closest (most similar) points are to be found
- This should be an array of numbers
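The parameter descriptions above can be turned into a small pre-flight check. The following is a minimal sketch in plain JavaScript; `checkKnnArgs` and `SUPPORTED_ALGORITHMS` are hypothetical names that are not part of this package, and the rules that k is a positive integer and that every vector has the same length as the target are assumptions rather than documented behavior.

```javascript
// Hypothetical helper, NOT part of unsupervised-knn-js.
// Algorithm names are copied from the list of supported algorithms above.
const SUPPORTED_ALGORITHMS = [
  'euclidean', 'cosine', 'mae', 'mse', 'manhattan', 'ssd', 'canberra',
  'hamming', 'L3', 'L4', 'L5', 'L10', 'chebyshev', 'pearson',
]

// Returns a list of human-readable problems; an empty list means the
// arguments match the shapes described in this README.
function checkKnnArgs(algo, k, neighbors, target) {
  const problems = []
  const isNumberArray = (v) =>
    Array.isArray(v) &&
    v.length > 0 &&
    v.every((x) => typeof x === 'number' && Number.isFinite(x))

  if (!SUPPORTED_ALGORITHMS.includes(algo)) {
    problems.push(`unknown algorithm: ${String(algo)}`)
  }
  // Assumption: k should be a positive integer.
  if (!Number.isInteger(k) || k < 1) {
    problems.push('k should be a positive integer')
  }
  if (!isNumberArray(target)) {
    problems.push('target should be a non-empty array of numbers')
  }
  if (!Array.isArray(neighbors)) {
    problems.push('neighbors should be an array of objects')
  } else {
    neighbors.forEach((n, i) => {
      if (n === null || typeof n !== 'object' || !('label' in n) || !isNumberArray(n.vector)) {
        problems.push(`neighbors[${i}] should look like { label, vector: [numbers] }`)
      } else if (Array.isArray(target) && n.vector.length !== target.length) {
        // Assumption: vectors are compared element-wise, so lengths should match.
        problems.push(`neighbors[${i}].vector length differs from target length`)
      }
    })
  }
  return problems
}

console.log(checkKnnArgs('euclidean', 2, [{ label: 'a', vector: [1, 2, 4, 5] }], [1, 2, 3, 4]))
```

Running a check like this before calling `knn` keeps malformed input from reaching the native layer, where error messages may be harder to interpret.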
The function returns an array of objects representing the closest points to the target.
Each object has a label field for identification and a distance field which represents its distance from the target.
[
{ label: 'some name', distance: 1.4142135623730951 },
{ label: 'another name', distance: 3.872983346207417 }
]
This list is sorted in ascending order by the distance field of each object.
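To make the return contract concrete, here is a plain-JavaScript sketch that produces the same shape of result for the Euclidean case: compute a distance per neighbor, sort ascending, keep the first k. `knnReference` and `euclidean` are hypothetical names used only for illustration; the package itself performs this work in Rust and may do it differently.

```javascript
// Illustration only: NOT the package's implementation or API.
// Euclidean (L2) distance between two equal-length numeric arrays.
function euclidean(a, b) {
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i]
    sum += d * d
  }
  return Math.sqrt(sum)
}

// Hypothetical reference function: returns the k closest neighbors as
// { label, distance } objects, sorted by ascending distance.
function knnReference(k, neighbors, target) {
  return neighbors
    .map(({ label, vector }) => ({ label, distance: euclidean(vector, target) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k)
}

const neighbors = [
  { label: 'some name', vector: [1, 2, 4, 5] },
  { label: 'name 2', vector: [14, 4, 13, 2] },
  { label: 'another name', vector: [4, 4, 4, 5] },
]
const result = knnReference(2, neighbors, [1, 2, 3, 4])
console.log(result) // 'some name' first, then 'another name'
```

Because `map` builds a new array before sorting, the caller's `neighbors` array is left in its original order.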
Here is an example of the same data run against different distance functions:
> const { knn } = require('unsupervised-knn-js')
> const neighbors = [
{ label: 'some name', vector: [1, 2, 4, 5] },
{ label: 'another name', vector: [4, 4, 4, 5] },
{ label: 'name 3', vector: [14, 4, 13, 2] },
]
> const target = [1, 2, 3, 4]
> // Euclidean
> knn('euclidean', 3, neighbors, target)
[
{ label: 'some name', distance: 1.4142135623730951 },
{ label: 'another name', distance: 3.872983346207417 },
{ label: 'name 3', distance: 16.64331697709324 }
]
> // Cosine
> knn('cosine', 3, neighbors, target)
[
{ label: 'some name', distance: 0.003993481192393733 },
{ label: 'another name', distance: 0.059777545024485734 },
{ label: 'name 3', distance: 0.35796589482505503 }
]
> // Mean-Absolute-Error
> knn('mae', 3, neighbors, target)
[
{ label: 'some name', distance: 0.5 },
{ label: 'another name', distance: 1.75 },
{ label: 'name 3', distance: 6.75 }
]
> // Mean-Squared-Error
> knn('mse', 3, neighbors, target)
[
{ label: 'some name', distance: 0.5 },
{ label: 'another name', distance: 3.75 },
{ label: 'name 3', distance: 69.25 }
]
> // Manhattan
> knn('manhattan', 3, neighbors, target)
[
{ label: 'some name', distance: 2 },
{ label: 'another name', distance: 7 },
{ label: 'name 3', distance: 27 }
]
> // Sum of Squared Differences
> knn('ssd', 3, neighbors, target)
[
{ label: 'some name', distance: 2 },
{ label: 'another name', distance: 15 },
{ label: 'name 3', distance: 277 }
]
> // Canberra
> knn('canberra', 3, neighbors, target)
[
{ label: 'some name', distance: 0.25396825396825395 },
{ label: 'another name', distance: 1.1873015873015873 },
{ label: 'name 3', distance: 2.158333333333333 }
]
> // Hamming
> knn('hamming', 3, neighbors, target)
[
{ label: 'some name', distance: 2 },
{ label: 'another name', distance: 4 },
{ label: 'name 3', distance: 4 }
]
> // L3 Norm Difference
> knn('L3', 3, neighbors, target)
[
{ label: 'some name', distance: 1.2599210498948732 },
{ label: 'another name', distance: 3.332221851645953 },
{ label: 'name 3', distance: 14.756054203376182 }
]
> // L4 Norm Difference
> knn('L4', 3, neighbors, target)
[
{ label: 'some name', distance: 1.189207115002721 },
{ label: 'another name', distance: 3.1543421455299043 },
{ label: 'name 3', distance: 14.016098305349052 }
]
> // L5 Norm Difference
> knn('L5', 3, neighbors, target)
[
{ label: 'some name', distance: 1.148698354997035 },
{ label: 'another name', distance: 3.0796116495812957 },
{ label: 'name 3', distance: 13.635466232760923 }
]
> // L10 Norm Difference
> knn('L10', 3, neighbors, target)
[
{ label: 'some name', distance: 1.0717734625362931 },
{ label: 'another name', distance: 3.0051723058500506 },
{ label: 'name 3', distance: 13.091355843137347 }
]
> // Chebyshev
> knn('chebyshev', 3, neighbors, target)
[
{ label: 'some name', distance: 1 },
{ label: 'another name', distance: 3 },
{ label: 'name 3', distance: 13 }
]
> // Pearson Correlation Distance
> knn('pearson', 3, neighbors, target)
[
{ label: 'some name', distance: 0.010050506338833642 },
{ label: 'another name', distance: 0.2254033307585166 },
{ label: 'name 3', distance: 1.5685785754425927 }
]
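For readers who want to see where these numbers come from, the sketch below writes out the textbook formulas for five of the distance functions in plain JavaScript. The function names are hypothetical and the formulas are inferred from the algorithm names and the outputs listed above, not taken from the Rust source; for the 'name 3' vector they appear to agree with the listed values up to floating-point rounding.

```javascript
// Illustration only: textbook formulas, NOT the package's Rust code.
// Sums f(a[i], b[i]) over all positions of two equal-length arrays.
const sumOver = (a, b, f) => a.reduce((acc, x, i) => acc + f(x, b[i]), 0)

// Sum of absolute differences.
const manhattan = (a, b) => sumOver(a, b, (x, y) => Math.abs(x - y))

// Lp norm of the difference; p = 3, 4, 5, 10 would correspond to 'L3' .. 'L10'.
const minkowski = (p) => (a, b) =>
  Math.pow(sumOver(a, b, (x, y) => Math.pow(Math.abs(x - y), p)), 1 / p)

// Largest absolute difference in any single position.
const chebyshev = (a, b) => Math.max(...a.map((x, i) => Math.abs(x - b[i])))

// Each absolute difference is weighted by the magnitudes of its operands.
// Assumption: a 0/0 term contributes 0.
const canberra = (a, b) =>
  sumOver(a, b, (x, y) => {
    const denom = Math.abs(x) + Math.abs(y)
    return denom === 0 ? 0 : Math.abs(x - y) / denom
  })

// One minus the cosine of the angle between the two vectors.
const cosine = (a, b) => {
  const dot = sumOver(a, b, (x, y) => x * y)
  const normA = Math.sqrt(sumOver(a, a, (x, y) => x * y))
  const normB = Math.sqrt(sumOver(b, b, (x, y) => x * y))
  return 1 - dot / (normA * normB)
}

const target = [1, 2, 3, 4]
const name3 = [14, 4, 13, 2]
console.log({
  manhattan: manhattan(name3, target),
  L3: minkowski(3)(name3, target),
  chebyshev: chebyshev(name3, target),
  canberra: canberra(name3, target),
  cosine: cosine(name3, target),
})
```

The listed 'mae' value is the Manhattan sum divided by the vector length (27 / 4 = 6.75), and 'mse' relates to 'ssd' in the same way (277 / 4 = 69.25).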
- Even more native distance functions
- Potential implementation of custom distance functions passed in by the user
Ideas and suggestions are welcome!
For changes, please see the Changelog.