Add Unicode support to flexdll (was: Add wide-character version of flexdll_dlopen) #34

nojb · 2017-06-26T11:35:49Z

alainfrisch · 2017-06-26T16:25:19Z

flexdll.c

```diff
+{
+  int exec = mode & FLEXDLL_RTLD_NOEXEC ? 0 : 1;
+  if (!file) return &main_unit;
+  return flexdll_dlopen_aux(ll_dlopen(file, exec), mode);
```


I think this is wrong: ll_dlopen would now be called before setting FLEXDLL_RELOCATE, which can be read by the DLL's entry point.

Indeed, I had missed that!
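To make the ordering problem concrete, here is a minimal portable sketch; it is not flexdll's code. The hypothetical global `relocate_marker` stands in for the FLEXDLL_RELOCATE environment variable, and `stub_ll_dlopen` / `stub_entry_point` are hypothetical stand-ins for ll_dlopen and a DLL entry point that reads the marker while the library is being loaded.

```c
#include <stddef.h>

/* Hypothetical stand-in for the FLEXDLL_RELOCATE environment variable. */
static const char *relocate_marker = NULL;

/* What the (hypothetical) DLL entry point observed when it ran. */
static const char *seen_by_entry_point = NULL;

/* Hypothetical DLL entry point: runs during loading and reads the marker. */
static void stub_entry_point(void) {
  seen_by_entry_point = relocate_marker;
}

/* Hypothetical low-level loader: loading a DLL runs its entry point. */
static void *stub_ll_dlopen(const char *file) {
  (void)file;
  stub_entry_point();
  return (void *)&relocate_marker; /* any non-NULL value serves as a handle */
}

/* Correct order: publish the marker first, then load. */
static void *open_in_order(const char *file) {
  relocate_marker = "relocate";
  return stub_ll_dlopen(file);
}

/* Order from the reviewed diff: load first, so the entry point sees nothing. */
static void *open_too_early(const char *file) {
  void *h = stub_ll_dlopen(file);
  relocate_marker = "relocate";
  return h;
}
```

Whatever the real mechanism, the constraint is the same: anything the entry point may read has to be in place before the loader runs.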

nojb · 2017-06-27T20:01:42Z

OK, I pushed a different approach.

To avoid duplicating any code I extracted the two functions that need to be adapted (flexdll_dlopen and ll_dlopen) into the file flexdll_dlopen-templ.c with some %%VARIABLE%% placeholders. The Makefile generates flexdll_dlopen.c and flexdll_dlwopen.c by making the necessary substitutions (using sed) and those two files are #included from flexdll.c.

What do you think?

alainfrisch · 2017-06-28T12:33:24Z

I'm not a big fan of putting such logic in the build system and introducing a custom syntax. It seems relying on the C preprocessor would work as well (simply include the same file twice, defining macros with different content each time).
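A minimal single-file sketch of the preprocessor approach, assuming a function-generating macro rather than a file included twice (the idea is the same: one body, instantiated once per character type). `DEFINE_BASENAME`, `basename_a` and `basename_w` are hypothetical names, and `wchar_t` is used as a portable stand-in for WCHAR.

```c
#include <string.h>
#include <wchar.h>

/* One generic body, parameterized by function name and character type.
   It returns a pointer to the last path component, accepting both '/'
   and '\\' as separators (as Windows does). */
#define DEFINE_BASENAME(NAME, CHAR_T)                 \
  static const CHAR_T *NAME(const CHAR_T *path) {     \
    const CHAR_T *base = path;                        \
    for (const CHAR_T *p = path; *p; p++) {           \
      if (*p == (CHAR_T)'/' || *p == (CHAR_T)'\\')    \
        base = p + 1;                                 \
    }                                                 \
    return base;                                      \
  }

DEFINE_BASENAME(basename_a, char)    /* narrow instantiation */
DEFINE_BASENAME(basename_w, wchar_t) /* wide instantiation */

#undef DEFINE_BASENAME
```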

Alternatively, you could use a single function taking a flag to choose between A and W variants; the string argument can be a (void*), cast to either (char*) or (WCHAR*) according to the flag.
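A sketch of the single-function alternative, with `path_length` as a hypothetical example of shared logic; `wchar_t` again stands in for WCHAR.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical: one implementation serves both variants. The path arrives
   as an untyped pointer, and `wide` says whether it points to a wchar_t
   string or to a char string. */
static size_t path_length(const void *path, int wide) {
  return wide ? wcslen((const wchar_t *)path)
              : strlen((const char *)path);
}
```

The cost of this design is that the compiler can no longer check the argument: passing a `char *` with `wide` set compiles without complaint.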

nojb · 2017-06-29T21:32:06Z

OK, I pushed another approach. We make the wide version the "main" one and the existing version becomes a small wrapper around the wide one converting between the two encodings.

I will now be testing this to make sure it works :)
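The wrapper pattern might look roughly like this. `open_wide` and `open_narrow` are hypothetical stand-ins for flexdll_wdlopen and flexdll_dlopen; the portable `mbstowcs` (which decodes according to the current C locale) takes the place of the Windows conversion, which would presumably go through `MultiByteToWideChar` with the ANSI code page.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

#define MAX_WPATH 260

/* Hypothetical wide entry point. It only records the path it receives
   so that the wrapper's effect can be observed. */
static wchar_t last_wide_path[MAX_WPATH];

static int open_wide(const wchar_t *path) {
  wcsncpy(last_wide_path, path, MAX_WPATH - 1);
  last_wide_path[MAX_WPATH - 1] = L'\0';
  return 0;
}

/* Hypothetical narrow wrapper: decode the bytes, delegate to the wide
   version, free the temporary. Returns -1 if the bytes are not valid in
   the current locale. */
static int open_narrow(const char *path) {
  /* Decoding never yields more wide characters than there are bytes,
     so strlen + 1 slots always leave room for the terminator. */
  size_t n = strlen(path) + 1;
  wchar_t *wpath = malloc(n * sizeof *wpath);
  if (wpath == NULL)
    return -1;
  if (mbstowcs(wpath, path, n) == (size_t)-1) {
    free(wpath);
    return -1;
  }
  int rc = open_wide(wpath);
  free(wpath);
  return rc;
}
```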

nojb · 2017-06-30T07:25:29Z

Tested by bootstrapping flexdll and ocaml from the official distribution and using the resulting native-code compiler to compile a small example, all of which seemed to work well.

nojb · 2017-06-30T12:52:28Z

I added one more commit with a primitive solution for #36, to be discussed.

alainfrisch · 2017-07-13T07:55:59Z

I think the Cygwin case is broken, as reported by the warning:

```
i686-pc-cygwin-gcc -c -DCYGWIN -o flexdll_cygwin.o flexdll.c
flexdll.c: In function ‘flexdll_wdlopen’:
flexdll.c:380:22: warning: passing argument 1 of ‘ll_dlopen’ from incompatible pointer type [-Wincompatible-pointer-types]
   handle = ll_dlopen(file, exec);
                      ^
flexdll.c:52:14: note: expected ‘const char *’ but argument is of type ‘const WCHAR * {aka const short unsigned int *}’
 static void *ll_dlopen(const char *libname, int for_execution) {
              ^
```

alainfrisch · 2017-07-13T07:58:50Z

Also the tests are failing:

MSVC:

```
$ make demo_msvc
(cd test && LIB="C:/cygwin/home/frisch/SHARED~2/msvc/v9/Lib;C:/cygwin/home/frisch/SHARED~2/winsdk/v7.0/Lib" INCLUDE="C:/cygwin/home/frisch/SHARED~2/msvc/v9/Include;C:/cygwin/home/frisch/SHARED~2/winsdk/v7.0/Include" make clean demo CHAIN=msvc CC="C:/cygwin/home/frisch/SHARED~2/msvc/v9/Bin/cl.exe /nologo /MD -D_CRT_SECURE_NO_DEPRECATE /GS-" O=obj)
make[1]: Entering directory '/home/frisch/flexdll/test'
rm -f *.o *.obj *.dll *.exe *~ *.manifest
C:/cygwin/home/frisch/SHARED~2/msvc/v9/Bin/cl.exe /nologo /MD -D_CRT_SECURE_NO_DEPRECATE /GS- -I.. -c dump.c
dump.c
..\flexdll.h(25) : error C2143: syntax error : missing ')' before '*'
..\flexdll.h(25) : error C2143: syntax error : missing '{' before '*'
..\flexdll.h(25) : error C2059: syntax error : ','
..\flexdll.h(25) : error C2059: syntax error : ')'
```

MINGW:

```
$ make demo_mingw
(cd test && make clean demo CHAIN=mingw CC="i686-w64-mingw32-gcc" O=o)
make[1]: Entering directory '/home/frisch/flexdll/test'
rm -f *.o *.obj *.dll *.exe *~ *.manifest
i686-w64-mingw32-gcc -I.. -c dump.c
In file included from dump.c:14:0:
../flexdll.h:25:29: error: unknown type name ‘WCHAR’
 void *flexdll_wdlopen(const WCHAR *, int);
                             ^
```

alainfrisch · 2017-07-13T08:00:40Z

Tests can be fixed by including windows.h in flexdll.h instead of flexdll.c.

alainfrisch · 2017-07-13T08:03:57Z

The Cygwin problem is more tricky. We want to use Cygwin's dlopen to allow using POSIX paths. But I could not find a "w" variant of it. So it is not clear what to do for flexdll_wdlopen. One could try to recode the WCHAR to the current code page, but this can obviously fail.
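To illustrate why recoding can fail, a sketch using the portable `wcstombs` as a stand-in for a conversion to the current code page (on Windows one would more likely call `WideCharToMultiByte`). `recode_to_locale` is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: recode a wide path into the current locale's
   multibyte encoding (a portable stand-in for the active code page).
   Returns a malloc'd string, or NULL when some character has no
   representation in that encoding. */
static char *recode_to_locale(const wchar_t *wpath) {
  /* Each wide character needs at most MB_CUR_MAX bytes, plus one for '\0'. */
  size_t cap = wcslen(wpath) * MB_CUR_MAX + 1;
  char *out = malloc(cap);
  if (out == NULL)
    return NULL;
  if (wcstombs(out, wpath, cap) == (size_t)-1) { /* unrepresentable character */
    free(out);
    return NULL;
  }
  return out;
}
```

In the default "C" locale, a wide path containing a character such as Ђ (U+0402) would typically be rejected, so the helper returns NULL and the caller has to decide what to do next.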

nojb · 2017-07-13T08:11:55Z

Ah yes, sorry about this. I will fix the tests.

Re the Cygwin issue, a simple possibility is to define a macro (say, TCHAR) which stands for char in Cygwin and WCHAR on WIN32, use it in the signature of flexdll_wdlopen, and make flexdll_wdlopen equal to flexdll_dlopen on Cygwin.

alainfrisch · 2017-07-13T08:23:48Z

For Cygwin, your suggestion means that the caller needs to behave differently depending on whether it is compiled through Cygwin or a native compiler. It also means that only filenames that can be expressed in the current code page can be used. Why not, but at this point, I think it's better to tell users to use only flexdll_dlopen under Cygwin (either hide flexdll_wdlopen or have it fail at runtime). Or we could try to recode to the local code page in ll_dlopen for Cygwin (so that flexdll_wdlopen can be used as long as the filename can be encoded). Yet another option is to add a fallback to LoadLibraryW when the filename cannot be encoded (so the caller can use either a POSIX path that can be encoded in the local code page or an arbitrary Windows path).

nojb · 2017-07-13T08:55:22Z

Yes, you are right - my suggestion was a bit muddled; what I meant is this: in flexdll.h we write the following

```c
#ifdef _WIN32
#include <windows.h> /* provides WCHAR */
#endif

void *flexdll_dlopen(const char *, int);
#ifdef _WIN32
void *flexdll_wdlopen(const WCHAR *, int);
#endif

#if defined(__CYGWIN__) || !defined(_UNICODE)
#define flexdll_tdlopen flexdll_dlopen
#else
#define flexdll_tdlopen flexdll_wdlopen
#endif
```

Then:

- One can use flexdll_dlopen both under Cygwin and natively, and its behaviour is the same as today.
- One can use flexdll_wdlopen natively if one wants to.
- One can use flexdll_tdlopen both under Cygwin and natively to automatically choose the wide or narrow version (consistent with the way the Windows headers work). Under Cygwin this is always flexdll_dlopen.

Personally I think this is simpler than trying to recode under Cygwin and cleaner than using a fallback.
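From the caller's side, the `_UNICODE` switch only works if string literals switch type too, which is what TEXT() does in the Windows headers. A portable sketch of that pairing, in which `open_a`, `open_w`, `open_t` and `FLEX_T` are hypothetical names standing in for flexdll_dlopen, flexdll_wdlopen, flexdll_tdlopen and a TEXT()-like macro:

```c
#include <wchar.h>

/* Hypothetical narrow and wide openers; they return 'A' or 'W' so that
   the selected variant is observable. */
static int open_a(const char *path) { (void)path; return 'A'; }
static int open_w(const wchar_t *path) { (void)path; return 'W'; }

/* Same selection rule as flexdll_tdlopen: narrow unless _UNICODE is
   defined, and always narrow under Cygwin. FLEX_T picks the literal type
   that matches the chosen function. */
#if defined(__CYGWIN__) || !defined(_UNICODE)
#define open_t open_a
#define FLEX_T(s) s
#else
#define open_t open_w
#define FLEX_T(s) L##s
#endif
```

Caller code written as `open_t(FLEX_T("plugin.dll"))` then compiles unchanged in both configurations.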

nojb · 2017-07-13T09:11:16Z

BTW, is https://github.com/alainfrisch/flexdll/blob/master/flexdll.c#L362 correct? Shouldn't it be __CYGWIN__?

alainfrisch · 2017-07-13T09:38:15Z

We pass -DCYGWIN (resp. -DMSVC/-DMINGW) in the Makefile when we compile flexdll.c.

alainfrisch · 2017-07-13T09:40:38Z

I think it's useful to have access to flexdll_wdlopen even under Cygwin. But perhaps it's ok to declare that it doesn't support POSIX paths (so that it can always use LoadLibraryW). Only flexdll_dlopen would support POSIX path (through Cygwin's dlopen).

nojb · 2017-07-17T07:50:10Z

What would be the use case for having flexdll_wdlopen under Cygwin? Is it in case one wants/needs to use both Cygwin's POSIX API and the native Win API?

alainfrisch · 2017-07-17T08:00:20Z

The case I had in mind was simply opening a DLL specified by a POSIX path with "non-local" characters (e.g. "/foo/Ђ").

nojb · 2017-07-17T10:45:41Z

> I think it's useful to have access to flexdll_wdlopen even under Cygwin. But perhaps it's ok to declare that it doesn't support POSIX paths (so that it can always use LoadLibraryW). Only flexdll_dlopen would support POSIX path (through Cygwin's dlopen).

Maybe I am missing something, but what is "POSIX path" in this context? I assumed that Cygwin's dlopen (and other filename-related functions) always assumed the argument to be encoded in the local codepage. They must make an encoding assumption somewhere along the way in order to implement those functions using native Windows APIs, right?

alainfrisch · 2017-07-17T11:49:45Z

POSIX path: a path interpreted by Cygwin (supporting "/foo/bar", following Cygwin symlinks, etc).

I share your interpretation that Cygwin has to make an assumption about the encoding of file names when mapping to the Windows API. I guess they rely on the local codepage (perhaps by calling the *A functions rather than the *W ones).

So:

- It would be a regression for flexdll under Cygwin not to allow POSIX paths in flexdll_dlopen => we should keep the previous behavior and call Cygwin's dlopen in that case.
- It would be a limitation of the Cygwin port if flexdll_wdlopen were not available => let's expose it (it calls LoadLibraryW internally), and document that this function does not support POSIX paths. We can lift this restriction later if we find a way around it (e.g. by manually translating the POSIX path to a Windows path, using a Cygwin-specific function).

nojb · 2017-07-19T18:22:35Z

So, it turns out that Cygwin functions assume their arguments are encoded in the "current locale" and translated internally into UTF-16, so that there is no need for flexdll_wdlopen. Here, "current locale" is the one set at process startup (changes during the lifetime of the process do not change the behaviour of the Cygwin API).

See https://cygwin.com/cygwin-ug-net/setup-locale.html for more.

nojb · 2017-07-22T19:26:03Z

I amended this PR as discussed (no wide version under Cygwin), and cleaned it up. AFAIK, the only remaining issue is that the UTF-16 response file support was not backwards compatible, as it assumes that OCaml strings are UTF-8 encoded. To handle this, I implemented the usual fallback: strings are assumed to be UTF-8, but if this is not the case we assume they are in the local code page and fall back to the usual non-UTF-16 response files.

Let me know if I am forgetting about anything else, otherwise I think this should be good to go!
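The UTF-8 check that such a fallback needs could be sketched in C as follows. `is_valid_utf8`, `choose_encoding` and the enum are hypothetical names; the actual change is on the OCaml side and may differ.

```c
#include <stddef.h>

/* Returns 1 if s[0..len) is well-formed UTF-8 (no overlong forms,
   no surrogates, nothing above U+10FFFF), 0 otherwise. */
static int is_valid_utf8(const unsigned char *s, size_t len) {
  static const unsigned long min_cp[4] = {0, 0x80, 0x800, 0x10000};
  size_t i = 0;
  while (i < len) {
    unsigned char c = s[i];
    size_t extra;
    unsigned long cp;
    if (c < 0x80) { i++; continue; }                      /* ASCII */
    else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
    else return 0;                    /* stray continuation or bad lead byte */
    if (len - i <= extra) return 0;   /* truncated sequence */
    for (size_t k = 1; k <= extra; k++) {
      if ((s[i + k] & 0xC0) != 0x80) return 0;
      cp = (cp << 6) | (s[i + k] & 0x3F);
    }
    if (cp < min_cp[extra]) return 0;             /* overlong encoding */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* UTF-16 surrogates */
    if (cp > 0x10FFFF) return 0;
    i += extra + 1;
  }
  return 1;
}

/* Hypothetical decision mirroring the fallback described above. */
enum response_encoding { RESPONSE_UTF16, RESPONSE_CODEPAGE };

static enum response_encoding choose_encoding(const char *s, size_t len) {
  return is_valid_utf8((const unsigned char *)s, len) ? RESPONSE_UTF16
                                                      : RESPONSE_CODEPAGE;
}
```

A Latin-1 string such as "caf\xE9.obj" is not valid UTF-8, so it would take the code-page path, while its UTF-8 spelling "caf\xC3\xA9.obj" would be written as UTF-16.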

dra27 · 2017-10-20T22:44:45Z

reloc.ml

```diff
+  let cp n = Buffer.add_char b (Char.chr (n land 0xFF)); Buffer.add_char b (Char.chr ((n lsr 8) land 0xFF)) in
+  while !i < String.length s do
+    let n = utf8_next s i in
+    if n <= 0xFFFF then cp n else (cp (0xD7C0 + (n lsl 10)); cp (0xDC00 + (n land 0x3FF)))
```


Is this supposed to be UTF-16 surrogate encoding? The formula for the high surrogate is (very) wrong, I'm afraid - although sufficiently so that I'm wondering if I'm missing something else.

```ocaml
else let n = n - 0x10000 in (cp (0xD800 + (n lsr 10)); ...)
```

is allowing me to run `make world` from a directory called 🐫

Oops, did I make a mistake? I will take a closer look in a bit, thanks!

Indeed, the first term seems to be really wrong - I can't understand what I must have been thinking to be honest... And it did work in the few tests I did -- pure luck I guess!

Can you please make a PR with this fix? Thanks!

Doh! Just for the proverbial record, it's described on Wikipedia.

Over my ☕, I couldn't resist the puzzle: your code works for the high surrogates for exactly 16 of the non-BMP characters (precisely ones where you get the extra 1 required to correct the base bit pattern for 0xD800)... perhaps very appropriately, lots of them are from Linear-B 😄 𐀁 𐁁 𐂁 𐃁 𐄁 𐅁 𐆁 𐊁 𐋁 𐌁 𐍁 𐎁 𐏁
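For reference, the corrected computation written out in C; the real code is the OCaml in reloc.ml, and `put_u16le` / `put_utf16le` are hypothetical names. Since `(n - 0x10000) >> 10` equals `(n >> 10) - 0x40` for every non-BMP `n`, the high surrogate can also be written `0xD7C0 + (n >> 10)`, which would explain the 0xD7C0 constant in the reviewed snippet; the bug there is shifting left instead of right. For example, 🐫 (U+1F42B) encodes as D83D DC2B.

```c
#include <stddef.h>

/* Append one 16-bit code unit in little-endian byte order, like `cp` in
   the OCaml snippet. Returns the new length. */
static size_t put_u16le(unsigned char *out, size_t len, unsigned int u) {
  out[len] = (unsigned char)(u & 0xFF);
  out[len + 1] = (unsigned char)((u >> 8) & 0xFF);
  return len + 2;
}

/* Encode one code point (assumed valid: <= 0x10FFFF and not a surrogate)
   as UTF-16LE. Code points outside the BMP become a surrogate pair. */
static size_t put_utf16le(unsigned char *out, size_t len, unsigned long n) {
  if (n <= 0xFFFF)
    return put_u16le(out, len, (unsigned int)n);
  n -= 0x10000;                                        /* 20 bits remain */
  len = put_u16le(out, len, 0xD800 + (unsigned int)(n >> 10));    /* high */
  return put_u16le(out, len, 0xDC00 + (unsigned int)(n & 0x3FF)); /* low */
}
```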

alainfrisch reviewed Jun 26, 2017


nojb mentioned this pull request Jun 27, 2017

Unicode support for the Windows runtime: Let's do it! ocaml/ocaml#1200

Merged


nojb changed the title ~~Add wide-character version of flexdll_dlopen~~ Add Unicode support to flexdll (was: Add wide-character version of flexdll_dlopen) Jun 30, 2017

alainfrisch closed this Jul 18, 2017

alainfrisch reopened this Jul 18, 2017

nojb added 3 commits July 22, 2017 20:11

Add wide-character version of flexdll_dlopen

9939465

Generate UTF-16 linker response files (#36)

807a9ba

Backwards compatibility: UTF-8 -> CODEPAGE fallback

f3ef5f1

alainfrisch merged commit ecdc6fb into ocaml:master Jul 25, 2017

This was referenced Sep 21, 2017

msvc linker response file should be UTF-16 encoded when possible #36

Closed

flexdll_dlopen needs Unicode-aware version on Windows #33

Closed

dra27 reviewed Oct 20, 2017

