Support for non-English characters #2

Open
kemege opened this issue Jul 11, 2016 · 2 comments

Comments

@kemege

kemege commented Jul 11, 2016

In JavaScript regular expressions, \w only matches [A-Za-z0-9_].
So it doesn't work well if we put any non-English characters in the tag, like #测试 or #テスト.

Perhaps \w+ should be replaced by something like (?:\w|[^\u0000-\u007F])+ or [^\u0000-\u0029\u0040\u005b-\u0060\u007b-\u007f], as suggested in a Stack Overflow answer?
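
To illustrate the difference, here is a minimal sketch in plain JavaScript. It does not use the plugin itself; the names `asciiOnly` and `widened` are made up for this example, and the anchors `^...$` are added only so that each tag is tested as a whole.

```javascript
// \w on its own is ASCII-only, even for CJK or kana letters.
const asciiOnly = /^\w+$/;

// First pattern suggested above: \w plus any code unit outside ASCII.
const widened = /^(?:\w|[^\u0000-\u007F])+$/;

for (const tag of ["test", "测试", "テスト"]) {
  console.log(tag, asciiOnly.test(tag), widened.test(tag));
}
// test true true
// 测试 false true
// テスト false true
```

One caveat with this approach: `[^\u0000-\u007F]` accepts every non-ASCII character, including non-ASCII punctuation and whitespace such as the ideographic space U+3000, so it is broader than "letters in other scripts".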

@svbergerem
Owner

You are already able to set the accepted characters yourself. See https://github.com/svbergerem/markdown-it-hashtag#advanced and https://github.com/svbergerem/markdown-it-hashtag/blob/master/test/hashtag.js#L23-L27 for some examples. I'll think about changing the default and keep this issue open until I have made my decision.

@Powersource

For reference, Unicode has a definition for hashtags here: https://unicode.org/reports/tr31/#hashtag_identifiers
It's not easy to read, but I think it includes most Unicode characters.
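
As a rough sketch of that direction, JavaScript engines that support Unicode property escapes (the `u` flag) can match on the `XID_Continue` property, which the linked definition builds on. This is a deliberate simplification and not the full hashtag identifier rule: it leaves out the emoji and extra punctuation characters the report also allows. The helper name `findHashtags` is hypothetical and not part of markdown-it-hashtag.

```javascript
// Simplified approximation: a "#" followed by one or more XID_Continue
// characters (letters, digits, marks and "_" in any script).
const hashtag = /#(\p{XID_Continue}+)/gu;

// Hypothetical helper: returns the tag names found in a string, without "#".
function findHashtags(text) {
  return Array.from(text.matchAll(hashtag), (m) => m[1]);
}

console.log(findHashtags("Tags: #test #测试 #テスト #foo_bar"));
// [ 'test', '测试', 'テスト', 'foo_bar' ]
```

Unlike a negated ASCII range, a property-based class rejects whitespace and most punctuation regardless of script.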
