Make unidecode optional in sanitize methods #67

Open
wants to merge 2 commits into
base: master


@VMRuiz VMRuiz commented Feb 10, 2021

Allow sanitizing text from non-English websites without losing data.

This change is not backward compatible, as I believe this should be the default behavior in most cases.

Member

@Gallaecio Gallaecio left a comment

Maybe unidecode could be a pipe of its own?

I think sanitize | unidecode (or sanitize | ascii_safe) is more readable than sanitize(ascii_safe=True)
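As a rough illustration of the pipe-style composition suggested here, the sketch below keeps sanitizing and ASCII folding as two separate steps. Everything in it is an assumption for illustration: the `Pipe` class is a minimal stand-in rather than shublang's actual pipe mechanism, `sanitize` only collapses whitespace, and `ascii_safe` uses the standard library's `unicodedata` in place of the real `unidecode` package so that the example has no third-party dependency.

```python
import re
import unicodedata


class Pipe:
    """Hypothetical minimal pipe wrapper: `value | step` calls step's function."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, value):
        # Invoked for `value | self` when `value` does not implement `|` itself.
        return self.func(value)


@Pipe
def sanitize(text):
    # Assumed behavior: collapse whitespace runs and trim; no transliteration.
    return re.sub(r"\s+", " ", text).strip()


@Pipe
def ascii_safe(text):
    # Stand-in for unidecode: decompose accented characters, then drop
    # whatever cannot be encoded as ASCII (the combining marks).
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


raw = "  Caf\u00e9   con  leche \n"

kept = raw | sanitize                 # non-ASCII characters survive
folded = raw | sanitize | ascii_safe  # ASCII folding is opt-in

print(kept)
print(folded)
```

With this shape, callers who want the old ASCII-only output add one extra step, and `sanitize` itself needs no keyword argument.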

Author

VMRuiz commented Feb 10, 2021

Maybe unidecode could be a pipe of its own?

I think sanitize | unidecode (or sanitize | ascii_safe) is more readable than sanitize(ascii_safe=True)

Yes, I think your approach is actually better.

Author

VMRuiz commented Feb 10, 2021

Maybe unidecode could be a pipe of its own?

I think sanitize | unidecode (or sanitize | ascii_safe) is more readable than sanitize(ascii_safe=True)

I have implemented this as a method named ascii_safe. I tried naming it unidecode, but there appeared to be a name collision between the shublang method name and the unidecode function itself.

If you are able to fix it, we could use unidecode instead. I don't have a strong opinion on which name is better.
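One plausible way such a collision can arise, and one possible way around it, is sketched below. This is a guess at the mechanism, not a diagnosis of the actual shublang code: `_library_unidecode` is a hypothetical stand-in for the function exported by the `unidecode` package, implemented with `unicodedata` so the example runs without third-party installs.

```python
import unicodedata


def _library_unidecode(text):
    # Hypothetical stand-in for the function provided by the unidecode package.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Simulates `from unidecode import unidecode`.
unidecode = _library_unidecode


def unidecode(text):
    # Defining a method with the same name rebinds `unidecode`, so this call
    # now refers to the function itself and recurses without end.
    return unidecode(text)


try:
    unidecode("Caf\u00e9")
    collided = False
except RecursionError:
    collided = True

# Possible fix: bind the library function to a different name, which
# simulates `from unidecode import unidecode as _transliterate`.
_transliterate = _library_unidecode


def unidecode(text):
    return _transliterate(text)


fixed = unidecode("Caf\u00e9")
print(collided, fixed)
```

If the real problem has this shape, importing the library function under an alias would allow the method to keep the name unidecode.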
