Nca temp #1
base: nca
Conversation
Here are some comments!
sklearn/neighbors/nca.py
Neighborhood Component Analysis (NCA) is a machine learning algorithm for
metric learning. It learns a linear transformation in a supervised fashion
to improve the classification accuracy of a stochastic nearest neighbors
rule in the new space.
transformed space
Done
sklearn/neighbors/nca.py
As NCA is optimizing a non-convex objective function, it will
likely end up in a local optimum. Several runs with independent random
init might be necessary to get a good convergence.
Is this a standard warning used in other sklearn classes? LMNN does not have it (and is also non-convex when solving for the linear transformation).
You are right; I just saw it as a note for k-means and thought it could be useful. A warning is surely too much, and even a note may not be necessary, for consistency with LMNN.
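The "several runs with independent random init" idea discussed above can be sketched as a best-of-N restart loop. This is a toy illustration on a hypothetical 1-D non-convex objective, not scikit-learn code: the function, bounds, and number of restarts are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.RandomState(0)

def objective(w):
    # toy non-convex function with several local minima
    return np.sin(3 * w[0]) + (w[0] - 0.5) ** 2

best = None
for _ in range(5):
    w0 = rng.uniform(-3, 3, size=1)         # independent random init
    res = minimize(objective, w0)
    if best is None or res.fun < best.fun:  # keep the best local optimum
        best = res
```

Each run may converge to a different local optimum; keeping the best of several runs is the mitigation the note suggests.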
sklearn/neighbors/nca.py
"Neighbourhood Components Analysis". Advances in Neural Information
Processing Systems. 17, 513-520, 2005.
http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf
"""
You can also add a reference to the Wikipedia page:
https://en.wikipedia.org/wiki/Neighbourhood_components_analysis
Done
sklearn/neighbors/nca.py
http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf
"""

def __init__(self, n_features_out=None, init='identity', max_iter=50,
The default init for LMNN is pca; it is probably better to have the same behavior for consistency.
Done
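The PCA init discussed above would boil down to something like the following sketch. This mirrors the idea (using the principal components as the initial linear transformation), not the final scikit-learn implementation; the shapes and variable names are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)
X = rng.randn(20, 5)                      # 20 samples, 5 features

n_components = 2
pca = PCA(n_components=n_components).fit(X)

# use the principal axes as the initial transformation matrix
transformation = pca.components_          # shape (n_components, n_features)

X_embedded = X.dot(transformation.T)      # samples mapped into PCA space
```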
sklearn/neighbors/nca.py
from ..externals.six import integer_types


class NeighborhoodComponentAnalysis(BaseEstimator, TransformerMixin):
I think the original name in the paper (also used on the Wikipedia page) has an 's' at the end of 'Components'; the class name and other occurrences should probably be changed.
Done
sklearn/neighbors/nca.py
soft = np.exp(-sum_of_squares - logsumexp(-sum_of_squares))
ci = masks[:, y[i]]
p_i_j = soft[ci]
not_ci = np.logical_not(ci)
Can't we just use `~ci` and avoid defining this variable?
Yes you are right, I changed it
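The reviewer's point is that for a boolean NumPy array, the `~` operator is equivalent to `np.logical_not`, so the intermediate variable is unnecessary. A minimal check (toy data, not the PR's code):

```python
import numpy as np

y = np.array([0, 1, 0, 2, 1])
ci = (y == y[0])                 # boolean mask of same-class samples

# for boolean dtype, ~ci inverts elementwise, just like np.logical_not
same = np.array_equal(~ci, np.logical_not(ci))
```

Note this equivalence holds for boolean arrays; on integer arrays `~` is bitwise complement instead, which is why the mask's dtype matters.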
sklearn/neighbors/nca.py
not_ci = np.logical_not(ci)
diff_ci = diffs[i, ci, :]
diff_not_ci = diffs[i, not_ci, :]
sum_ci = diff_ci.T.dot(
Again, `sum_ci` and `sum_not_ci` are distances.
sklearn/neighbors/nca.py
# for every sample, compute its contribution to loss and gradient
for i in range(X.shape[0]):
    diff_embedded = X_embedded[i] - X_embedded
    sum_of_squares = np.einsum('ij,ij->i', diff_embedded,
this variable contains distances (in embedded space), maybe it's good to have this in the variable name for clarity
Done: I called them dist_embedded
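For context on why these are distances: `np.einsum('ij,ij->i', d, d)` computes the row-wise sum of squared entries, i.e. the squared Euclidean distance from sample `i` to every other sample in the embedded space. A small check with made-up data:

```python
import numpy as np

rng = np.random.RandomState(0)
X_embedded = rng.randn(6, 3)     # 6 samples in a 3-d embedded space
i = 0

diff_embedded = X_embedded[i] - X_embedded
# squared Euclidean distances from sample i to all samples
dist_embedded = np.einsum('ij,ij->i', diff_embedded, diff_embedded)

# equivalent to the naive computation, without a temporary (n, d) square
expected = np.sum(diff_embedded ** 2, axis=1)
```

The `einsum` form avoids materializing the squared array, which matters inside a loop over all samples.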
sklearn/neighbors/nca.py
sum_of_squares = np.einsum('ij,ij->i', diff_embedded,
                           diff_embedded)
sum_of_squares[i] = np.inf
soft = np.exp(-sum_of_squares - logsumexp(-sum_of_squares))
Again, maybe find a name which makes it clear that these are exponentiated distances.
Done: I called them exp_dist_embedded
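The line being discussed is a numerically stable softmax over negative squared distances: subtracting `logsumexp` in log-space before exponentiating avoids overflow, and setting the self-distance to `inf` zeroes out `p_ii`. A standalone illustration (toy values, not the PR's data):

```python
import numpy as np
from scipy.special import logsumexp

# squared distances from one sample to all samples; the self-distance
# is set to inf so the sample never picks itself as a neighbor
dist = np.array([0.5, 2.0, np.inf, 3.0])

# stable softmax of -dist: exp(-d_j) / sum_k exp(-d_k)
soft = np.exp(-dist - logsumexp(-dist))
```

`soft` is a proper probability distribution over the other samples: entries are non-negative, sum to 1, and the `inf` entry maps to exactly 0.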
sklearn/neighbors/nca.py
X_embedded = transformation.dot(X.T).T

# for every sample, compute its contribution to loss and gradient
for i in range(X.shape[0]):
maybe a bit more comments describing the steps below would be useful, since this is the heart of the algorithm
I added some, but I will continue to do so, maybe with big-O notations etc.
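To tie the pieces of this review together, here is a commented sketch of the per-sample loss loop being discussed, using the variable names suggested above (`dist_embedded`, `p_i_j`, `ci`). It follows the NCA objective of Goldberger et al. (maximize the expected number of correctly classified points under stochastic neighbor assignment), but the shapes and data here are made up; it is not the PR's final code.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.RandomState(0)
X = rng.randn(8, 4)                        # 8 samples, 4 features
y = rng.randint(0, 2, 8)                   # binary class labels
transformation = np.eye(2, 4)              # maps 4 features down to 2

X_embedded = X.dot(transformation.T)
loss = 0.0

# for every sample, compute its contribution to the loss
for i in range(X.shape[0]):
    # differences and squared distances to sample i, in embedded space
    diff_embedded = X_embedded[i] - X_embedded
    dist_embedded = np.einsum('ij,ij->i', diff_embedded, diff_embedded)
    dist_embedded[i] = np.inf              # a point never picks itself

    # stable softmax over negative distances: neighbor probabilities
    p_i_j = np.exp(-dist_embedded - logsumexp(-dist_embedded))

    ci = (y == y[i])                       # same-class mask for sample i
    loss += p_i_j[ci].sum()                # prob. that i is classified right
```

The gradient contribution (not shown) accumulates outer products of the difference vectors weighted by these probabilities; each `p_i_j[ci].sum()` lies in [0, 1], so the total lies between 0 and the number of samples.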
sklearn/neighbors/tests/test_nca.py
y = random_state.randint(0, n_labels, (n_samples))
point = random_state.randn(num_dims, n_features)
X = random_state.randn(n_samples, n_features)
nca = NeighborhoodComponentsAnalysis(None, init=point)
What is this first `None` parameter for?
Ah yes, indeed, it shouldn't be there... I think it appeared during a refactoring.