[python-package] SegFault on MacOS when pytorch is installed #6595

Open
connortann opened this issue Aug 7, 2024 · 7 comments

@connortann

connortann commented Aug 7, 2024

Description

A segmentation fault occurs on MacOS when lightgbm and pytorch are both installed, depending on the order of imports.

Possibly related: #4229

Reproducible example

To reproduce the issue on GH actions:

# run_tests.yml
jobs:
  run_tests:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: 3.11
      - run: brew install libomp
      - run: pip install pytest torch scikit-learn lightgbm
      - run: pip list
      - run: pytest --noconftest test_bug.py
# test_bug.py
import time

import lightgbm  # Issue only occurs if this import is present
import torch
from sklearn.datasets import fetch_california_housing


def test_something():
    X, y = fetch_california_housing(return_X_y=True)
    torch.tensor(X)
    time.sleep(3)

Leads to Fatal Python error: Segmentation fault. Full output:

Run pytest --noconftest tests/test_bug121101.py
============================= test session starts ==============================
platform darwin -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
rootdir: /Users/runner/work/shap/shap
configfile: pyproject.toml
collected 1 item

Fatal Python error: Segmentation fault

Thread 0x0000000204c1cc00 (most recent call first):
tests/test_bug121101.py
  File "/Users/runner/work/shap/shap/tests/test_bug121101.py", line 12 in test_something
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/python.py", line 159 in pytest_pyfunc_call
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/python.py", line 1627 in runtest
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 174 in pytest_runtest_call
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 242 in <lambda>
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 241 in call_and_report
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 132 in runtestprotocol
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 362 in pytest_runtestloop
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line Fatal Python error: Segmentation fault

337 in _main
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/config/__init__.py", line 175 in main
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/config/__init__.py", line 201 in console_main
  File "/Users/runner/hostedtoolcache/Python/3.11.9/arm64/bin/pytest", line 8 in <module>

Extension modules: numpy._core._multiarray_umath, numpy._core._multiarray_tests, numpy.linalg._umath_linalg, scipy._lib._ccallback_c, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator
Extension modules: , numpy._core._multiarray_umathscipy.sparse._sparsetools, numpy._core._multiarray_tests, _csparsetools, numpy.linalg._umath_linalg, scipy.sparse._csparsetools, scipy._lib._ccallback_c, scipy.linalg._fblas, numpy.random._common, scipy.linalg._flapack, numpy.random.bit_generator, , scipy.linalg.cython_lapacknumpy.random._bounded_integers, , scipy.linalg._cythonized_array_utilsnumpy.random._mt19937, , scipy.linalg._solve_toeplitznumpy.random.mtrand, , numpy.random._philoxscipy.linalg._decomp_lu_cython, numpy.random._pcg64, scipy.linalg._matfuncs_sqrtm_triu, numpy.random._sfc64, scipy.linalg.cython_blas, numpy.random._generator, scipy.linalg._matfuncs_expm, scipy.sparse._sparsetools, scipy.linalg._decomp_update, _csparsetools, , scipy.sparse._csparsetoolsscipy.sparse.linalg._dsolve._superlu, , scipy.linalg._fblasscipy.sparse.linalg._eigen.arpack._arpack, scipy.linalg._flapack, , scipy.linalg.cython_lapackscipy.sparse.linalg._propack._spropack, scipy.linalg._cythonized_array_utils, scipy.sparse.linalg._propack._dpropack, scipy.linalg._solve_toeplitz, scipy.sparse.linalg._propack._cpropack, scipy.linalg._decomp_lu_cython, scipy.sparse.linalg._propack._zpropack, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.sparse.csgraph._tools, scipy.linalg._matfuncs_expm, scipy.sparse.csgraph._shortest_path, scipy.linalg._decomp_update, scipy.sparse.csgraph._traversal, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, , scipy.sparse.csgraph._min_spanning_treescipy.sparse.linalg._propack._spropack, , scipy.sparse.csgraph._flowscipy.sparse.linalg._propack._dpropack, , scipy.sparse.csgraph._matchingscipy.sparse.linalg._propack._cpropack, , scipy.sparse.csgraph._reorderingscipy.sparse.linalg._propack._zpropack, , scipy.sparse.csgraph._toolssklearn.__check_build._check_build, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, 
scipy.sparse.csgraph._matching, , scipy.sparse.csgraph._reorderingscipy.special._ufuncs_cxx, , sklearn.__check_build._check_buildscipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ufuncs_cxx, scipy.special._ellip_harm_2, scipy.special._ufuncs, scipy.spatial._ckdtree, scipy.special._specfun, scipy._lib.messagestream, scipy.special._comb, scipy.spatial._qhull, scipy.special._ellip_harm_2, scipy.spatial._voronoi, scipy.spatial._ckdtree, , scipy.spatial._distance_wrapscipy._lib.messagestream, , scipy.spatial._hausdorffscipy.spatial._qhull, scipy.spatial._voronoi, , scipy.spatial._distance_wrapscipy.spatial.transform._rotation, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.special.cython_special, scipy.stats._stats, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, scipy.stats._sobol, scipy.stats._qmc_cy, , scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNCscipy.stats._mvn, scipy.optimize._cobyla, 
scipy.stats._rcont.rcont, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.stats._unuran.unuran_wrapper, scipy.optimize._lsq.givens_elimination, , scipy.optimize._zeros, scipy.ndimage._nd_imagescipy.optimize._highs.cython.src._highs_wrapper, , scipy.optimize._highs._highs_wrapper_ni_label, , scipy.optimize._highs.cython.src._highs_constantsscipy.ndimage._ni_label, scipy.optimize._highs._highs_constants, sklearn.utils._isfinite, scipy.linalg._interpolative, sklearn.utils.sparsefuncs_fast, scipy.optimize._bglu_dense, sklearn.utils.murmurhash, scipy.optimize._lsap, , sklearn.utils._openmp_helpersscipy.optimize._direct, scipy.integrate._odepack, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics.cluster._expected_mutual_info_fast, scipy.integrate._quadpack, sklearn.metrics._dist_metrics, scipy.integrate._vode, sklearn.metrics._pairwise_distances_reduction._datasets_pair, scipy.integrate._dop, scipy.integrate._lsoda, sklearn.utils._cython_blas, scipy.interpolate._fitpack, sklearn.metrics._pairwise_distances_reduction._base, scipy.interpolate._dfitpack, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, scipy.interpolate._bspl, sklearn.utils._heap, scipy.interpolate._ppoly, sklearn.utils._sorting, scipy.interpolate.interpnd, sklearn.metrics._pairwise_distances_reduction._argkmin, scipy.interpolate._rbfinterp_pythran, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, scipy.interpolate._rgi_cython, scipy.special.cython_special, sklearn.utils._vector_sentinel, scipy.stats._stats, , sklearn.metrics._pairwise_distances_reduction._radius_neighborsscipy.stats._biasedurn, , sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmodescipy.stats._levy_stable.levyst, , scipy.stats._stats_pythransklearn.metrics._pairwise_fast, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, sklearn.utils._random, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, 
torch._C, scipy.stats._rcont.rcont, , scipy.stats._unuran.unuran_wrappertorch._C._fft, , scipy.ndimage._nd_imagetorch._C._linalg, , _ni_labeltorch._C._nested, , scipy.ndimage._ni_labeltorch._C._nn, , sklearn.utils._isfinitetorch._C._sparse, , sklearn.utils.sparsefuncs_fasttorch._C._special, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, sklearn.utils._random, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, , scipy.io.matlab._mio_utilstorch._C._nn, torch._C._sparse, scipy.io.matlab._streams, torch._C._special, scipy.io.matlab._mio5_utils, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, , sklearn.datasets._svmlight_format_fastscipy.io.matlab._mio5_utils, sklearn.datasets._svmlight_format_fast, sklearn.feature_extraction._hashing_fast (total: 130, )sklearn.feature_extraction._hashing_fast
 (total: 130)
/Users/runner/work/_temp/7013399c-b6ff-43a4-b289-cc08191dbadb.sh: line 1:  2783 Segmentation fault: 11  pytest --noconftest tests/test_bug121101.py

Environment info

LightGBM version or commit hash: 4.5.0

Result of pip list:

Package           Version
----------------- --------
certifi           2024.7.4
filelock          3.15.4
fsspec            2024.6.1
iniconfig         2.0.0
Jinja2            3.1.4
joblib            1.4.2
lightgbm          4.5.0
MarkupSafe        2.1.5
mpmath            1.3.0
networkx          3.3
numpy             2.0.1
packaging         24.1
pip               24.2
pluggy            1.5.0
pytest            8.3.2
scikit-learn      1.5.1
scipy             1.14.0
setuptools        65.5.0
sympy             1.13.1
threadpoolctl     3.5.0
torch             2.4.0

Additional Comments

We came across this issue over at the shap repo, trying to run tests with the latest versions of both pytorch and lightgbm. We initially raised this issue on the pytorch issue tracker: pytorch/pytorch#121101 .

However, the underlying issue doesn't seem to be specific to either pytorch or lightgbm; rather, it relates to the mutual compatibility of the two. The issue seems to relate to multiple OpenMP runtimes being loaded.

So, I thought it would be worth raising the issue here too in the hope that it helps us collectively find a fix.

@jameslamb
Collaborator

Thanks for the excellent report @connortann !

Since #6391, import lightgbm on macOS will try to use the already-loaded OpenMP if there is one. So it shouldn't be the case that import lightgbm can cause "multiple OpenMP runtimes being loaded".

(assuming that was a typo in your original report and you really meant "OpenMP", not "OpenML")

Since you have scikit-learn in the environment, import lightgbm will import sklearn. I suspect that scikit-learn may be contributing to this problem. In the past, we've seen that library's handling of its OpenMP dependency contribute to this "multiple OpenMP runtimes being loaded" situation.

To narrow it down further, could you try 2 other tests?

  • import sklearn before / after torch (no lightgbm involved)
  • pip uninstall --yes scikit-learn and then testing import lightgbm before / after torch

I'm sorry to possibly involve yet a THIRD project in your investigation. I'm familiar with these topics and happy to help us all reach a resolution.

You may also find these relevant:

@connortann
Author

connortann commented Aug 7, 2024

Thanks for the response! Yes I think you're right about sklearn being relevant: the bug seems not to occur if sklearn is not imported.

Here's what I tried. The tests pass in all of these situations:

  1. import sklearn, then torch (no lightgbm involved). Tests pass.
import time

import sklearn
import torch
from sklearn.datasets import fetch_california_housing

def test_something():
    X, y = fetch_california_housing(return_X_y=True)
    torch.tensor(X)
    time.sleep(3)
  2. import torch, then sklearn (no lightgbm involved). Tests pass.
import time

import torch
import sklearn
from sklearn.datasets import fetch_california_housing

def test_something():
    X, y = fetch_california_housing(return_X_y=True)
    torch.tensor(X)
    time.sleep(3)
  3. Without sklearn installed; import lightgbm, then torch. Tests pass.
import time

import lightgbm
import torch
import numpy as np
# from sklearn.datasets import fetch_california_housing


def test_something():
    # X, y = fetch_california_housing(return_X_y=True)
    X = np.ones(shape=(200, 20))
    torch.tensor(X)
    time.sleep(3)
  4. Without sklearn installed; import torch, then lightgbm. Tests pass.
# ruff: noqa
# fmt: off
import time

import torch
import lightgbm
import numpy as np
# from sklearn.datasets import fetch_california_housing


def test_something():
    # X, y = fetch_california_housing(return_X_y=True)
    X = np.ones(shape=(200, 20))
    torch.tensor(X)
    time.sleep(3)

So, I think the example above is the minimal reproducer: lightgbm, torch and sklearn!

@vnherdeiro
Contributor

vnherdeiro commented Aug 28, 2024

Adding my two cents to this issue. I managed to reproduce the bug following the setting given by @connortann

Running the following command raises the segfault
python -m pytest test_bug.py
with
torch==2.2.2 scikit-learn==1.5.1 numpy==1.26.4 lightgbm==4.5.0

However, prepending the command with OMP_NUM_THREADS=1 (forcing single-threaded operation) makes the segfault go away.
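
For anyone who wants to apply the same workaround from a script rather than a shell prefix, here is a minimal sketch. It assumes the variable only needs to be present in the child process's environment before any OpenMP runtime initializes; `run_single_threaded` is a hypothetical helper name, not part of pytest or any other library.

```python
import os
import subprocess
import sys


def run_single_threaded(cmd):
    """Run `cmd` in a child process with OMP_NUM_THREADS=1 set.

    Hypothetical helper: it copies the current environment and overrides
    only OMP_NUM_THREADS, mirroring `OMP_NUM_THREADS=1 <cmd>` in a shell.
    """
    env = dict(os.environ, OMP_NUM_THREADS="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Example call (not executed here):
# result = run_single_threaded([sys.executable, "-m", "pytest", "test_bug.py"])
# On POSIX, a negative result.returncode means the child was killed by a
# signal, e.g. -11 for SIGSEGV.
```

Note that this only sidesteps the crash by limiting threading; it does not change which copies of OpenMP get loaded.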

@lorentzenchr
Contributor

@lesteve ping as scikit-learn is involved in the minimal reproducer (openmp related).

@lesteve

lesteve commented Sep 13, 2024

Honestly, @jeremiedbb may be a better person for this on the scikit-learn side. This is quite a tricky topic at the interface of different projects, which make different choices about how to handle OpenMP in wheels, and OpenMP in itself is already tricky.

The root cause is generally that multiple OpenMP runtimes are loaded in the same process. threadpoolctl can highlight this; see this doc and the output below.

One known work-around is to use conda-forge which will use a single OpenMP and avoid most of these issues. I wanted to mention it, even if I understand using conda rather than pip is a non-starter in some use cases.

In this particular case, I played a bit with the code and can reproduce without scikit-learn, i.e. only with LightGBM and PyTorch. To be honest, I have heard of cases that go wrong with PyTorch and scikit-learn for similar reasons, but it's generally a bit hard to get a reproducer ...

I put together a quick repo: https://github.com/lesteve/lightgbm-pytorch-macos-segfault.

In particular, see build log which shows a segfault, python file, workflow YAML file. Importing pytorch before lightgbm works fine, see build log.

Python file:

import pprint
import sys
import platform

import lightgbm
import torch
import threadpoolctl

print('version: ', sys.version, flush=True)
print('platform: ', platform.platform(), flush=True)
pprint.pprint(threadpoolctl.threadpool_info())

print('before torch tensor', flush=True)
t = torch.ones(200_000)
print('after torch tensor', flush=True) 

Output:

version:  3.12.5 (v3.12.5:ff3bc82f7c9, Aug  7 2024, 05:32:06) [Clang 13.0.0 (clang-1300.0.29.30)]
platform:  macOS-14.6.1-arm64-arm-64bit
[{'architecture': 'armv8',
  'filepath': '/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/.dylibs/libopenblas64_.0.dylib',
  'internal_api': 'openblas',
  'num_threads': 3,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23.dev'},
 {'filepath': '/opt/homebrew/Cellar/libomp/18.1.8/lib/libomp.dylib',
  'internal_api': 'openmp',
  'num_threads': 3,
  'prefix': 'libomp',
  'user_api': 'openmp',
  'version': None},
 {'filepath': '/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/torch/lib/libomp.dylib',
  'internal_api': 'openmp',
  'num_threads': 3,
  'prefix': 'libomp',
  'user_api': 'openmp',
  'version': None}]
before torch tensor
/Users/runner/work/_temp/558d95ac-031b-4858-bfb0-b7bb4841e27b.sh: line 1:  1924 Segmentation fault: 11  python test.py

From the threadpoolctl info, you can tell that there are multiple OpenMP runtimes in use: the brew one (from LightGBM) and the PyTorch one bundled in the wheel.
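
To make that check less manual, something like the following sketch could flag the situation automatically. It assumes the input is a list of dicts shaped like the `threadpoolctl.threadpool_info()` output above; `duplicated_runtimes` is a hypothetical helper, and the sample entries are trimmed copies of the output in this comment.

```python
from collections import defaultdict


def duplicated_runtimes(info):
    """Return {internal_api: [filepaths]} for every API that is backed by
    more than one distinct shared library.

    `info` is a list of dicts shaped like threadpoolctl.threadpool_info().
    """
    paths_by_api = defaultdict(set)
    for module in info:
        paths_by_api[module["internal_api"]].add(module["filepath"])
    return {
        api: sorted(paths)
        for api, paths in paths_by_api.items()
        if len(paths) > 1
    }


# Trimmed copies of the three entries printed above.
sample = [
    {"internal_api": "openblas",
     "filepath": "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/.dylibs/libopenblas64_.0.dylib"},
    {"internal_api": "openmp",
     "filepath": "/opt/homebrew/Cellar/libomp/18.1.8/lib/libomp.dylib"},
    {"internal_api": "openmp",
     "filepath": "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/torch/lib/libomp.dylib"},
]

duplicates = duplicated_runtimes(sample)
print(duplicates)
```

In a live process, the same function could be called as `duplicated_runtimes(threadpoolctl.threadpool_info())` after the imports of interest.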

pip list

Package           Version
----------------- ---------
certifi           2024.8.30
filelock          3.16.0
fsspec            2024.9.0
Jinja2            3.1.4
lightgbm          4.5.0
MarkupSafe        2.1.5
mpmath            1.3.0
networkx          3.3
numpy             1.26.4
pip               24.2
scipy             1.14.1
setuptools        74.1.2
sympy             1.13.2
threadpoolctl     3.5.0
torch             2.4.1
typing_extensions 4.12.2

(Edit: sorry pinged the wrong Jérémie originally ...)

@jameslamb
Collaborator

Thanks very much for that! Your example has helped to clarify the picture for me a lot.

Short Summary

torch vendors a libomp.dylib (without library or symbol name mangling) and always prefers that vendored copy to a system installation.

lightgbm searches for a system installation.

As a result, if you've installed both these libraries via wheels on macOS, loading both will result in 2 copies of libomp.dylib being loaded. This may or may not show up as runtime issues... unpredictable, because symbol resolution is lazy by default and therefore depends on the code paths used.

Even if all copies of libomp.dylib loaded into the process are ABI-compatible with each other, there can still be runtime segfaults as a result of mixing symbols from libraries loaded at different memory addresses, I think.

Longer Summary

I investigated this by running the following on my M2 Mac, with Python 3.11. Note that the versions are identical to those from the previous comment.

mkdir ./delete-me
cd ./delete-me

pip download \
  --no-deps \
  'lightgbm==4.5.0' \
  'torch==2.4.1'

unzip ./lightgbm*.whl
unzip ./torch*.whl

otool -l ./lightgbm/lib/lib_lightgbm.dylib
otool -l ./torch/lib/libtorch_cpu.dylib

lightgbm wheels have exactly one library, lib_lightgbm.dylib, with an OpenMP dependency like this:

@rpath/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)

And the following LC_LOAD_DYLIB / LC_RPATH entries

cmd LC_LOAD_DYLIB
name @rpath/libomp.dylib (offset 24)
current version 5.0.0
compatibility version 5.0.0
...
cmd LC_RPATH
path /opt/homebrew/opt/libomp/lib (offset 12)
...
cmd LC_RPATH
path /opt/local/lib/libomp (offset 12)

torch wheels vendor libomp.dylib but without mangling the library name or its symbols.

libc10.dylib
libomp.dylib
libshm.dylib
libtorch.dylib
libtorch_cpu.dylib
libtorch_global_deps.dylib
libtorch_python.dylib

libtorch_cpu.dylib expresses its OpenMP dependency like this:

@rpath/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)

And has the following LC_LOAD_DYLIB / LC_RPATH entries:

cmd LC_LOAD_DYLIB
name @rpath/libomp.dylib (offset 24)
current version 5.0.0
compatibility version 5.0.0
...
cmd LC_RPATH
path @loader_path (offset 12)

So lightgbm will search for libomp.dylib in various places (including where Homebrew likes to put it) and loads the first one found.

torch will ONLY and ALWAYS load exactly the one that its wheels vendor.
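
For comparing more libraries than these two, the relevant entries can be pulled out of `otool -l` output with a small parser. This is only a sketch: it assumes the `cmd` / `name` / `path` line layout shown in the excerpts above, `parse_load_commands` is a hypothetical name, and paths containing spaces would be truncated at the first space.

```python
def parse_load_commands(otool_output):
    """Extract LC_LOAD_DYLIB names and LC_RPATH paths from `otool -l` text.

    Returns a (dylibs, rpaths) tuple of lists, in the order encountered.
    """
    dylibs, rpaths = [], []
    current_cmd = None
    for line in otool_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        if parts[0] == "cmd":
            # Start of a new load command; remember its type.
            current_cmd = parts[1]
        elif parts[0] == "name" and current_cmd == "LC_LOAD_DYLIB":
            dylibs.append(parts[1])
        elif parts[0] == "path" and current_cmd == "LC_RPATH":
            rpaths.append(parts[1])
    return dylibs, rpaths


# Excerpt of the lightgbm output shown above.
lightgbm_excerpt = """\
cmd LC_LOAD_DYLIB
name @rpath/libomp.dylib (offset 24)
current version 5.0.0
compatibility version 5.0.0
cmd LC_RPATH
path /opt/homebrew/opt/libomp/lib (offset 12)
cmd LC_RPATH
path /opt/local/lib/libomp (offset 12)
"""

dylibs, rpaths = parse_load_commands(lightgbm_excerpt)
print(dylibs)
print(rpaths)
```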

💥 2 copies of OpenMP loaded at the same time, and all the issues that come with that.

Why didn't @connortann observe this same behavior?

Not sure why @connortann was not able to reproduce this in #6595 (comment). That comment shows:

Without sklearn installed; import torch then lightgbm. Tests pass

Probably because that example uses different codepaths in torch. Many OpenMP symbols would be resolved only at the first call site (as described in this Stack Overflow answer and the macOS docs it links to), so different code paths can lead to different behavior in terms of which copies of libomp.dylib certain symbols are found in.

How do we fix this?

I think some mix of the following would make this better for users.

Option 1: torch could more aggressively isolate its OpenMP dependency

If torch wants to vendor its own OpenMP in this way, it could further isolate that dependency to only torch's own uses, by doing one of the following:

Option 2a: lightgbm could vendor OpenMP like torch does, but with the added strictness described above

I really do not want to do this, for the reasons mentioned in #6391 and the things linked to it.

Option 2b: torch could stop vendoring OpenMP and use the same LC_RPATH search order lightgbm does

I don't know if this would be palatable for torch. It comes with its own challenges.

Option 3: lightgbm could add something like @loader_path/../../torch/lib earlier in its list of RPATHS

This only works as long as torch is vendoring a version of libomp.dylib that lightgbm is ABI-compatible with.

And it only helps for the narrow case of lightgbm and torch with no other OpenMP-using dependencies. Every other library depending on OpenMP (e.g. xgboost, scikit-learn) would need to do something similar for them to all reliably use that same copy of libomp.dylib at runtime.
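
To illustrate what Option 3 would change, here is a deliberately simplified model of how an `@rpath/` dependency is resolved by trying each LC_RPATH entry in order. It is a sketch, not dyld: it ignores `DYLD_*` environment variables, already-loaded images, and everything else the real loader considers. `resolve_rpath` is a hypothetical function, the `site` path is made up, and `existing_files` stands in for the filesystem.

```python
import posixpath


def resolve_rpath(install_name, rpaths, loader_dir, existing_files):
    """Return the first existing candidate for an @rpath/ install name.

    Each rpath is tried in order, with @loader_path replaced by the
    directory of the library that has the dependency.
    """
    prefix = "@rpath/"
    if not install_name.startswith(prefix):
        return install_name if install_name in existing_files else None
    leaf = install_name[len(prefix):]
    for rpath in rpaths:
        directory = rpath.replace("@loader_path", loader_dir, 1)
        candidate = posixpath.normpath(posixpath.join(directory, leaf))
        if candidate in existing_files:
            return candidate
    return None


site = "/venv/lib/python3.11/site-packages"  # hypothetical location
existing_files = {
    site + "/torch/lib/libomp.dylib",
    "/opt/homebrew/opt/libomp/lib/libomp.dylib",
}
loader_dir = site + "/lightgbm/lib"  # directory holding lib_lightgbm.dylib

current = ["/opt/homebrew/opt/libomp/lib", "/opt/local/lib/libomp"]
option3 = ["@loader_path/../../torch/lib"] + current

before = resolve_rpath("@rpath/libomp.dylib", current, loader_dir, existing_files)
after = resolve_rpath("@rpath/libomp.dylib", option3, loader_dir, existing_files)
print(before)
print(after)
```

With the current RPATH list the model picks the Homebrew copy; with the extra entry first it picks the copy inside torch's wheel, and it falls back to Homebrew when torch is not installed.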

Option 4: OpenMP could be packaged as a wheel that all of these projects depend on (and dynamically link to)

As described in https://pypackaging-native.github.io/key-issues/native-dependencies/blas_openmp/#potential-solutions-or-mitigations. This is the wheel-based equivalent of how conda handles this case, as @lesteve alluded to... you download a single copy of the library into the environment, and everything else dynamically links to it.

I personally would be willing to help with this community effort, though I don't feel qualified to lead it.

Some related discussions (about shared-library-only wheels, not OpenMP) that have been happening in RAPIDS libraries:

@jameslamb jameslamb changed the title SegFault on MacOS when pytorch is installed [python-package] SegFault on MacOS when pytorch is installed Sep 15, 2024
@thomasjpfan
Contributor

thomasjpfan commented Nov 5, 2024

I see Option 4 as the "proper solution", but I see the following barriers:

  1. Community work to get projects on board.
  2. There are multiple implementations of OpenMP (depending on compiler) and currently libraries that vendor can select the one they want.
  3. Deciding how best to pin or restrict the version of this new OpenMP PyPI library, so that libraries that depend on it do not end up with version conflicts.

On a related note, NumPy and SciPy are building an OpenBLAS library shared between them: https://pypi.org/project/scipy-openblas64/. In their case, they only need to coordinate with each other to make sure the user experience is good.

Edit: Looks like NumPy and SciPy are using https://pypi.org/project/scipy-openblas64 only at build time. They still each vendor a separate copy of openblas in their wheels. So if both libraries load openblas, it is loaded twice:

❯ python -m threadpoolctl -i numpy.linalg scipy.linalg
[
  {
    "user_api": "blas",
    "internal_api": "openblas",
    "num_threads": 32,
    "prefix": "libscipy_openblas",
    "filepath": "/home/thomasfan/micromamba/envs/scipy-live/lib/python3.12/site-packages/numpy.libs/libscipy_openblas64_-6bb31eeb.so",
    "version": "0.3.28",
    "threading_layer": "pthreads",
    "architecture": "SkylakeX"
  },
  {
    "user_api": "blas",
    "internal_api": "openblas",
    "num_threads": 32,
    "prefix": "libscipy_openblas",
    "filepath": "/home/thomasfan/micromamba/envs/scipy-live/lib/python3.12/site-packages/scipy.libs/libscipy_openblas-68440149.so",
    "version": "0.3.28",
    "threading_layer": "pthreads",
    "architecture": "SkylakeX"
  }
]
