Conversation

angt
Collaborator

@angt angt commented Sep 22, 2025

This is a draft that uses httplib to download, mostly copied from the existing cURL implementation.
To test, build with -DLLAMA_CURL=OFF.
Some features might be missing for now, but it's a starting point.

Signed-off-by: Adrien Gallouët <[email protected]>
@angt angt requested a review from ggerganov as a code owner September 22, 2025 23:17
@angt angt force-pushed the use-cpp-httplib-as-a-curl-alternative-for-downloads branch 2 times, most recently from ecd182e to 0201e99 Compare September 23, 2025 09:00
The existing cURL implementation is intentionally left untouched to
prevent any regressions and to allow for safe, side-by-side testing by
toggling the `LLAMA_CURL` CMake option.

Signed-off-by: Adrien Gallouët <[email protected]>
@angt angt force-pushed the use-cpp-httplib-as-a-curl-alternative-for-downloads branch from 0201e99 to e1f545f Compare September 23, 2025 10:49
@angt
Collaborator Author

angt commented Sep 24, 2025

This is the one that concerns me, since cpp-httplib is currently a required dependency of llama.cpp:

error: "cpp-httplib doesn't support Windows 8 or lower. Please use Windows 10 or later."

@ggerganov
Member

This is the one that concerns me, since cpp-httplib is currently a required dependency of llama.cpp:

error: "cpp-httplib doesn't support Windows 8 or lower. Please use Windows 10 or later."

It shouldn't be too difficult to add a LLAMA_HTTPLIB option and do the same thing we currently do on master when LLAMA_CURL=OFF?

Member

@ggerganov ggerganov left a comment


Looks good. The biggest unknown for me is the Windows workflow - building and releases. I suppose whatever we currently do to provide libcurl we have to do for libssl.

If you plan to bring this to completion, feel free to add yourself to the CODEOWNERS. I see the following TODOs:

  • Extract file downloading implementation from common/arg.cpp to common/download.cpp
  • Remove CURL dependency (+ figure out how to build on Windows)
  • Remove json dependency from common/download.cpp
  • Add CMake option to build without httplib for old Windows support

Comment on lines +699 to +710
static void write_metadata(const std::string & path,
                           const std::string & url,
                           const common_file_metadata & metadata) {
    nlohmann::json metadata_json = {
        { "url", url },
        { "etag", metadata.etag },
        { "lastModified", metadata.last_modified }
    };

    write_file(path, metadata_json.dump(4));
    LOG_DBG("%s: file metadata saved: %s\n", __func__, path.c_str());
}
Member


Same comment as in the previous PR about the json stuff: I hope eventually we will avoid using json for this component - it's a pity we started doing it in the first place.

Collaborator Author


We can definitely remove json here; in fact, just reading and writing the etag is enough.

@angt
Collaborator Author

angt commented Sep 24, 2025

This is the one that concerns me, since cpp-httplib is currently a required dependency of llama.cpp:

error: "cpp-httplib doesn't support Windows 8 or lower. Please use Windows 10 or later."

It shouldn't be too difficult to add a LLAMA_HTTPLIB option and do the same thing we currently do on master when LLAMA_CURL=OFF?

The Windows issue comes from updating httplib (this PR: yhirose/cpp-httplib#2177).
I don’t think keeping the old version would be a good idea, and I don’t believe it’s reasonable to support Windows 8 without llama-server?

Possible solutions could be either patching httplib to restore Windows 8 compatibility, or switching to another HTTP library.

@ggerganov
Member

ggerganov commented Sep 24, 2025

I don’t think keeping the old version would be a good idea

Yes, we should stick with the latest version of httplib.

I don’t believe it’s reasonable to support Windows 8 without llama-server?

The idea is that when LLAMA_HTTPLIB=OFF we build stub download functions that simply print an error saying downloading is not supported. Windows 8 can still run llama-server - it just won't be able to download models.

I suspect that these failing CI workflows are currently happening only for the msys/mingw toolchain. Likely there is a simple fix by tuning the WIN32 preprocessor macros to make httplib happy. Note that the runners are not actually using Windows 8, so it's some sort of mis-detection. Worst case, I think we can safely disable downloading capabilities for these specific builds.

@angt angt requested a review from CISC as a code owner September 25, 2025 07:15
@github-actions github-actions bot added the devops improvements to build systems and github actions label Sep 25, 2025
@ggerganov
Member

Regarding 547fa26 - I suppose this is temporary? We want to keep the upstream version unchanged, so any modifications should be first upstreamed to the original repo.

@angt
Collaborator Author

angt commented Sep 25, 2025

Regarding 547fa26 - I suppose this is temporary? We want to keep the upstream version unchanged, so any modifications should be first upstreamed to the original repo.

Yes, this was only to confirm that everything builds correctly with it.

@angt
Collaborator Author

angt commented Sep 25, 2025

Since cpp-httplib is mandatory for llama-server (with or without the model downloader), we can bump _WIN32_WINNT to 0x0A00 to align with the current restriction.

@ggerganov
Member

Since cpp-httplib is mandatory for llama-server

Oh right, I missed that when I wrote the comment earlier.

we can bump _WIN32_WINNT to 0x0A00 to align with the current restriction.

Yes, let's give this a try.

@angt angt force-pushed the use-cpp-httplib-as-a-curl-alternative-for-downloads branch from 1f97bec to aad19ef Compare September 25, 2025 09:42
@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Sep 25, 2025
@angt
Collaborator Author

angt commented Sep 25, 2025

Note @ggerganov : I've tested the version including commit 547fa26 (cpp-httplib: allow _WIN32_WINNT >= 0x0602) and it works fine under Wine. There should be no issue retargeting Windows 8 if needed.

$ wine build/bin/llama-server.exe -hf unsloth/Qwen3-4B-Instruct-2507-GGUF:Q4_0
it looks like wine32 is missing, you should install it.
multiarch needs to be enabled first.  as root, please
execute "dpkg --add-architecture i386 && apt-get update &&
apt-get install wine32:i386"
0048:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
0048:err:winediag:nodrv_CreateWindow L"The explorer process failed to start."
0048:err:systray:initialize_systray Could not create tray window
common_download_file_single_online: no previous model file found C:\users\angt\AppData\Local\llama.cpp\unsloth_Qwen3-4B
-Instruct-2507-GGUF_Qwen3-4B-Instruct-2507-Q4_0.gguf
common_download_file_single_online: trying to download model from https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507
-GGUF/resolve/main/Qwen3-4B-Instruct-2507-Q4_0.gguf to C:\users\angt\AppData\Local\llama.cpp\unsloth_Qwen3-4B-Instruct-
2507-GGUF_Qwen3-4B-Instruct-2507-Q4_0.gguf.downloadInProgress (server_etag:"aaf2d7f5827dd5d918cc73fefac1d96c704f6b6cd3d
2c36d1e9f5c3ac675d94f", server_last_modified:)...
[>                              ^C                  ]   0%  (15 MB / 2265 MB)

And no issues when linking libssl statically on Windows :)

$ peldd build/bin/llama-server.exe
Dependencies
    ADVAPI32.dll
    bcrypt.dll
    CRYPT32.dll
    KERNEL32.dll
    msvcrt.dll
    WS2_32.dll

Signed-off-by: Adrien Gallouët <[email protected]>
Signed-off-by: Adrien Gallouët <[email protected]>
@angt angt force-pushed the use-cpp-httplib-as-a-curl-alternative-for-downloads branch from aad19ef to e7b5f55 Compare September 25, 2025 13:51
@ggerganov
Member

Hm, the address sanitizer is acting up. Not sure if related though.

@angt
Collaborator Author

angt commented Sep 25, 2025

Hm, the address sanitizer is acting up. Not sure if related though.

I think the binaries were compiled with flags that are not supported by the emulator. All the errors are ILLEGAL:

The following tests FAILED:
	  1 - test-tokenizer-0-bert-bge (ILLEGAL)               main
	  2 - test-tokenizer-0-command-r (ILLEGAL)              main
	  3 - test-tokenizer-0-deepseek-coder (ILLEGAL)         main
	  4 - test-tokenizer-0-deepseek-llm (ILLEGAL)           main
	  5 - test-tokenizer-0-falcon (ILLEGAL)                 main
	  6 - test-tokenizer-0-gpt-2 (ILLEGAL)                  main
	  7 - test-tokenizer-0-llama-bpe (ILLEGAL)              main
	  8 - test-tokenizer-0-llama-spm (ILLEGAL)              main
	  9 - test-tokenizer-0-mpt (ILLEGAL)                    main
	 10 - test-tokenizer-0-phi-3 (ILLEGAL)                  main
	 11 - test-tokenizer-0-qwen2 (ILLEGAL)                  main
	 12 - test-tokenizer-0-refact (ILLEGAL)                 main
	 13 - test-tokenizer-0-starcoder (ILLEGAL)              main
	 21 - test-tokenizer-1-llama-spm (ILLEGAL)              main
	 27 - test-thread-safety (ILLEGAL)                      main
	 28 - test-arg-parser (ILLEGAL)                         main
	 29 - test-gguf (ILLEGAL)                               main
	 30 - test-backend-ops (ILLEGAL)                        main
	 33 - test-barrier (ILLEGAL)                            main
	 34 - test-quantize-fns (ILLEGAL)                       main
	 35 - test-quantize-perf (ILLEGAL)                      main
	 36 - test-rope (ILLEGAL)                               main
Errors while running CTest
Output from these tests are in: /home/runner/work/llama.cpp/llama.cpp/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

I guess it’s just random hardware selection. I can dig into that later.

@ggerganov
Member

I restarted the workflows. If CI is green, I think we are good to merge, correct?

@angt
Collaborator Author

angt commented Sep 25, 2025

I restarted the workflows. If CI is green, I think we are good to merge, correct?

Yes! I can refactor (downloader.cpp) and remove json right after that.

@angt
Collaborator Author

angt commented Sep 25, 2025

I think I can fix https://github.com/ggml-org/llama.cpp/actions/runs/18009748371/job/51252952728?pr=16185

@ggerganov
Member

I think I can fix https://github.com/ggml-org/llama.cpp/actions/runs/18009748371/job/51252952728?pr=16185

Yup, this should be fixed before merging.

@slaren
Member

slaren commented Sep 25, 2025

I cleared the ccache cache of the sanitizer test before you re-ran the CI, I suspect that was the cause.

Signed-off-by: Adrien Gallouët <[email protected]>
Comment on lines 219 to 221
-DCMAKE_SYSTEM_NAME=Linux \
-DGGML_CCACHE=OFF \
-DGGML_NATIVE=OFF \
Member


I don't think I understand how this change fixed the CI for ubuntu-cpu-make?

Collaborator Author


I was a bit extreme on this one and tried everything I could think of to make it work.

The tricky part is CMAKE_SYSTEM_NAME=Linux, which makes CMake believe we are cross-compiling (setting CMAKE_CROSSCOMPILING) without breaking everything else. With that flag, I can disable GGML_NATIVE_DEFAULT:

if (CMAKE_CROSSCOMPILING OR DEFINED ENV{SOURCE_DATE_EPOCH})
    message(STATUS "Setting GGML_NATIVE_DEFAULT to OFF")
    set(GGML_NATIVE_DEFAULT OFF)
else()
    set(GGML_NATIVE_DEFAULT ON)
endif()

then disabling GGML_NATIVE lets us turn INS_ENB off:

if (GGML_NATIVE OR NOT GGML_NATIVE_DEFAULT)
    set(INS_ENB OFF)
else()
    set(INS_ENB ON)
endif()

This way we get the lowest CPU mode. But honestly, -DGGML_NATIVE=OFF should be enough, and I also disabled ccache to increase my chances :)
