This repository has been archived by the owner on Jul 7, 2024. It is now read-only.

Support Unicode? #27

Open
ghost opened this issue Aug 2, 2018 · 3 comments

Comments

@ghost

ghost commented Aug 2, 2018

So this is a long shot and might not be easy due to Lua, but I am trying to write a program that parses text files made up of, among other things, Greek letters. I had to write a hack to parse λs by reading two bytes out of a string, but I can't write hacks for the general case.

Is it possible to make it effortless to use Unicode characters in strings? Or at least, if not effortless, easy? I honestly have no knowledge of what would need to be done to support this.

For reference, this is the project I've been working on: https://git.sci4me.com/sci4me/lambda_calculus

EDIT: Really, I just need the utf8 library (from Lua 5.3); my concern is that it's not available on LuaJIT. (I also have no idea how to go about it, and changing all of my code to support it seems nontrivial, but idk.)

@tst2005

tst2005 commented Aug 3, 2018

Hello,

Unicode, or even just UTF-8, is a bit of a nightmare.
Before Lua 5.3 was released, I wrote my own UTF-8 solution: lua-utf8.
It is pure Lua code.
It mainly splits the string into a table (each item is one character) and wraps it in an object that can be used like a Lua string: see https://github.com/tst2005/lua-utf8#sample-of-use.
My solution is not the only one that exists, but if your main problem is just getting each UTF-8 character, it should work!
Regards,
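
For readers following along, the "one table item per character" idea can be sketched in a few lines of plain Lua. This is only an illustration and not the lua-utf8 implementation: `split_utf8` is a hypothetical name, and the sketch assumes the input is already valid UTF-8.

```lua
-- Hypothetical helper (not part of lua-utf8): split a UTF-8 string into a
-- table with one entry per encoded character.
-- Assumption: the input is valid UTF-8; nothing is validated here.
local function split_utf8(s)
  local chars = {}
  -- One character = one non-continuation byte followed by zero or more
  -- continuation bytes (0x80-0xBF, written in decimal as \128-\191).
  for ch in s:gmatch("[^\128-\191][\128-\191]*") do
    chars[#chars + 1] = ch
  end
  return chars
end

-- "\206\187" is the two-byte UTF-8 encoding of the Greek letter lambda.
local chars = split_utf8("\206\187x.x")
print(#chars) -- number of characters, as opposed to the byte count #s
```

Decimal escapes are used so that the pattern parses the same way on Lua 5.1, LuaJIT, and Lua 5.3.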

@ghost
Author

ghost commented Aug 3, 2018

@tst2005 Thank you for that. I will try to use your code and see if I can solve my problem.

In the context of Urn, what I would love to see is the same thing that was done for the bit library except for the utf8 library; an Urn implementation that will be used if the utf8 library (from Lua 5.3) isn't available. Might be a ton to ask for but it would provide VM agnosticism and the potential performance benefit if Lua's utf8 library is available.
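
A minimal sketch of that "native if available, otherwise fallback" shape, written in plain Lua rather than Urn. It does not show how Urn actually handles the bit library; `fallback_len` and `utf8_len` are hypothetical names, and only a length function is covered.

```lua
-- Use Lua 5.3's utf8 library when the VM provides it; otherwise fall back
-- to a small pure-Lua version. rawget avoids tripping "strict" global checks.
local native = rawget(_G, "utf8")

-- Hypothetical fallback: count the bytes that are NOT continuation bytes
-- (0x80-0xBF). Assumption: the input is valid UTF-8.
local function fallback_len(s)
  local _, count = s:gsub("[^\128-\191]", "")
  return count
end

-- Pick the implementation once, at load time.
local utf8_len = (type(native) == "table" and native.len) or fallback_len

print(utf8_len("\206\187x.x")) -- character count for "λx.x"
```

One difference to keep in mind: the native `utf8.len` reports invalid byte sequences, while this fallback silently counts them.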

@tst2005

tst2005 commented Aug 3, 2018

@sci4me
Just be aware that my utf8 module doesn't have the same API as the Lua 5.3 utf8 library. Lua 5.3's utf8 is a (very) low-level set of functions. My module is a high-level abstraction that tries to follow the string module API.

My TODO list for lua-utf8:

  • rename the module to avoid conflict
  • internally use the native lua 5.3 utf8 functions if available

I'm also working on a solution for a "universal bit library",
but it is much harder than utf8 because there are different implementations with different APIs.
I started comparing them (which functions have the same name, a different name, or are missing: see https://github.com/tst2005/lua-mini/blob/dev/BIT.md).
I also see some differences between the existing implementations:

  • the supported size (32 bits vs 64 bits): bit32 (added in Lua 5.2, deprecated in 5.3) is 32 bits; LuaJIT's bitop depends on the VM architecture and can be 64 bits; for Lua 5.3's native operators, I dunno
  • (maybe) the behavior when the limit is reached
  • things that I don't remember
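
To illustrate the size difference, here is a rough sketch of one way a wrapper could make the backends agree: reduce every result to an unsigned 32-bit value. It is not taken from lua-mini; `to_u32`, `raw_bnot`, and `bnot` are hypothetical names, and only bitwise NOT is shown.

```lua
-- Reduce any integer-valued number to the unsigned 32-bit range [0, 2^32).
-- Lua's % is a floored modulo, so negative inputs wrap upwards.
local function to_u32(n)
  return n % 4294967296 -- 2^32
end

-- Pick whichever backend this VM offers.
local raw_bnot
if type(rawget(_G, "bit32")) == "table" then
  raw_bnot = bit32.bnot            -- Lua 5.2 (or 5.3 built with compat)
elseif type(rawget(_G, "bit")) == "table" then
  raw_bnot = bit.bnot              -- LuaJIT / BitOp, signed 32-bit results
else
  -- Lua 5.3+ operator, compiled from a string so older parsers do not
  -- reject this file; if that fails, fall back to plain arithmetic.
  local chunk = (loadstring or load)("return function(a) return ~a end")
  raw_bnot = chunk and chunk() or function(a) return -1 - a end
end

-- Same answer on every backend once the result is normalised.
local function bnot(a)
  return to_u32(raw_bnot(a))
end

print(bnot(0))
```

Without the final `to_u32`, the signed backends would return -1 for `bnot(0)` where `bit32` returns 4294967295.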
