Update sentence-transformers.R #21

Open
wants to merge 1 commit into
base: main

tomazweiss

Previous example didn't work.
@samterfa
Collaborator

samterfa commented Jul 8, 2022

We can definitely change course, but the original example worked and seemed clear to me.
`
library(tidyverse)

# Compute sentence embeddings

sentences <- c("Baby turtles are so cute!", "He walks as slowly as a turtle.","The lake is cold today.", "I enjoy swimming in the lake.")
model <- hf_load_sentence_model('paraphrase-MiniLM-L6-v2')
embeddings <- model$encode(sentences)
embeddings

# Get distances between sentences

embeddings %>% dist() %>% as.matrix() %>% as.data.frame() %>% setNames(sentences) %>% mutate(sentence 1 = sentences) %>%
pivot_longer(cols = -sentence 1, names_to = 'sentence 2', values_to = 'distance') %>% filter(distance > 0)

# Cluster sentences

embeddings %>% t() %>% prcomp() %>% pluck('rotation') %>% as.data.frame() %>% mutate(sentence = sentences) %>%
ggplot(aes(PC1, PC2)) + geom_label(aes(PC1, PC2, label = sentence, vjust="inward", hjust="inward")) + theme_minimal()
`

@farach
Owner

farach commented Jul 8, 2022

I ran @samterfa's code above and it worked after I added backticks around `sentence 1`. This matches what we have in the example, so it should be good to go.

```r
library(tidyverse)
library(huggingfaceR)  # provides hf_load_sentence_model()

sentences <- c(
  "Baby turtles are so cute!",
  "He walks as slowly as a turtle.",
  "The lake is cold today.",
  "I enjoy swimming in the lake."
)

# Compute sentence embeddings
model <- hf_load_sentence_model('paraphrase-MiniLM-L6-v2')

embeddings <- model$encode(sentences)
embeddings

# Get distances between sentences
# (`sentence 1` contains a space, so it must be quoted with backticks)
embeddings %>%
  dist() %>%
  as.matrix() %>%
  as.data.frame() %>%
  setNames(sentences) %>%
  mutate(`sentence 1` = sentences) %>%
  pivot_longer(
    cols = -`sentence 1`,
    names_to = 'sentence 2',
    values_to = 'distance'
  ) %>%
  filter(distance > 0)

# Cluster sentences
embeddings %>%
  t() %>%
  prcomp() %>%
  pluck('rotation') %>%
  as.data.frame() %>%
  mutate(sentence = sentences) %>%
  ggplot(aes(PC1, PC2)) +
  geom_label(aes(PC1, PC2, label = sentence, vjust="inward", hjust="inward")) +
  theme_minimal()
```

@tomazweiss's example is really close to this too. @tomazweiss, could you point us to the error you are getting?
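
For anyone following along, the backtick point can be reproduced without loading a model. The snippet below is only a minimal base R sketch: the toy data frame `df` and its contents are made up for illustration and are not part of the package example. It shows that a column name containing a space has to be quoted with backticks wherever R expects a symbol, and that the unquoted form does not even parse.

```r
# Toy data frame (hypothetical, for illustration only).
df <- data.frame(x = 1:3)

# "sentence 1" contains a space, so it is a non-syntactic name.
df[["sentence 1"]] <- c("a", "b", "c")

# Backticks let the name be used where R expects a symbol.
first_value <- df$`sentence 1`[1]

# Without backticks the parser sees `sentence` followed by `1` and stops.
unquoted_result <- tryCatch(
  {
    parse(text = "df$sentence 1")
    "parsed"
  },
  error = function(e) "parse error"
)
```

The same rule applies inside `mutate()` and `pivot_longer()`, which is why the pipeline only runs once the backticks are in place.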

@samterfa
Collaborator

samterfa commented Jul 8, 2022

It looks like @jpcompartir changed the example here. Maybe he was seeing an error?

@tomazweiss
Author

@farach, I was correcting the example in this file:
https://github.com/farach/huggingfaceR/blob/main/R/sentence-transformers.R
which is different from what @samterfa pasted above.

There is a typo (`embddings`), and the example updates the `embeddings` object but then uses the previous version in the plot.
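
The file's example is not reproduced in this thread, so the following is only a sketch of the general pattern, not the actual code from `sentence-transformers.R`. It assumes a small random matrix as a stand-in for the output of `model$encode(sentences)`, and the name `embeddings_pca` is hypothetical. Giving the derived object its own name keeps the raw matrix intact and makes it obvious which version the plot should use.

```r
# Hypothetical stand-in for model$encode(sentences): 4 sentences x 3 dimensions.
set.seed(1)
embeddings <- matrix(rnorm(12), nrow = 4)
sentences <- c("s1", "s2", "s3", "s4")

# Derive the PCA coordinates under a new name instead of overwriting
# `embeddings`, so later steps cannot pick up a stale or misspelled object.
embeddings_pca <- as.data.frame(prcomp(t(embeddings))$rotation)
embeddings_pca$sentence <- sentences

# The plot would then read from `embeddings_pca`, for example:
# ggplot(embeddings_pca, aes(PC1, PC2)) + geom_label(aes(label = sentence))
```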

@jpcompartir
Collaborator

jpcompartir commented Jul 8, 2022

> It looks like @jpcompartir changed the example here. Maybe he was seeing an error?

This just looks like me being careless. By the by, the examples (across the package) were temporarily removed to speed up running check() (I also didn't feel like adding to buildignore). They're likely to be put back in as usage, or to feature in vignettes. So I may do the same here until we're ready to go with a release and tests etc. have been added appropriately.
