Skip to content

Scikit-build-core build system rework #1988

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 109 commits into from
Jul 22, 2024
Merged

Scikit-build-core build system rework #1988

merged 109 commits into from
Jul 22, 2024

Conversation

dudoslav
Copy link
Contributor

@dudoslav dudoslav commented Jun 5, 2024

This PR reworks build system for TileDB-Py to scikit-build-core + CMake. To comply with our packaging requirements I made sure that the TILEDB_PATH environment variable is properly handled. In all cases RPATH is set to the location where TileDB core library is present.

Additionally, Github workflows were added to create, test and upload PyPI wheels and sdist packages.


[sc-49832]

@dudoslav
Copy link
Contributor Author

dudoslav commented Jul 4, 2024

Thanks

* we can exclude all of `doc` and `ci`

* seems like this is a local thing that should not have been included?
image

I removed doc and ci folders and made sure that I create wheel in a clean environment. This is the result right now:
https://www.diffchecker.com/9i4Ge9l2/

Another difference that I found is that the original setup.py approach puts the libtiledb.so in a different place. I did not notice any difference when running the wheel or testing it, but I want to mention it:

1720088026_grim
1720088016_grim

@dudoslav
Copy link
Contributor Author

dudoslav commented Jul 4, 2024

I have noticed the following packaging behavior:

When running pipx run build if TileDB core library is found in the system it gets picked up and used as the core TileDB library for the wheel. This however never happens in our CI/CD when deploying wheels, so it should be fine, but it can lead to incorrect versions of TileDB included inside wheel for packages produced manually, for example in a Conda environment with TileDB installed in it.

@dudoslav
Copy link
Contributor Author

Testing if Conda feedstock will work:
conda-forge/tiledb-py-feedstock#224

@dudoslav
Copy link
Contributor Author

Seems like conda packages were able to produce:
conda-forge/tiledb-py-feedstock#224

@jdblischak
Copy link
Contributor

I tested the shared object copying behavior similar to how I did for TileDB-VCF (TileDB-Inc/TileDB-VCF#717).

I ran into 2 issues:

  • When bundling libtiledb.so into the installed Python package, I get the error AttributeError: module 'tiledb.cc' has no attribute 'Config' when running import tiledb
  • When using an external libtiledb.so, the compilation fails
docker run --rm -it ubuntu:22.04

# Setup
apt-get update
apt-get install --yes git python-is-python3 python3 python3-pip python3-venv unzip wget
cd
mkdir downloads install-libtiledb

# Install nightly binary
the_date=2024-04-08
wget --quiet -P downloads/ \
  https://github.com/jdblischak/centralized-tiledb-nightlies/releases/download/$the_date/libtiledb-$the_date.tar.gz
tar -C install-libtiledb -xzf downloads/libtiledb-$the_date.tar.gz
export LD_LIBRARY_PATH=$(pwd)/install-libtiledb/lib

# Clone TileDB-Py
git clone https://github.com/TileDB-Inc/TileDB-Py.git
cd TileDB-Py/
git checkout db/scikit-build-core

# Build and install TileDB-Py **with** bundled libtiledb
python -m pip install -v .
##   -- Downloading TileDB default version ...
##   -- [download 100% complete]
##   -- Adding "libtiledb" into install group
##
##   -- Installing: /tmp/tmppmr6xlbn/wheel/platlib/tiledb/libtiledb.so.2.24
##
## Successfully installed numpy-2.0.0 tiledb-0.30.3.dev109+gd18362b

ls /usr/local/lib/python3.10/dist-packages/tiledb/*.so*
## /usr/local/lib/python3.10/dist-packages/tiledb/cc.cpython-310-x86_64-linux-gnu.so
## /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.cpython-310-x86_64-linux-gnu.so
## /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.so.2.24
## /usr/local/lib/python3.10/dist-packages/tiledb/main.cpython-310-x86_64-linux-gnu.so
ldd /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.cpython-310-x86_64-linux-gnu.so | grep libtiledb
## libtiledb.so.2.24 => /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.so.2.24 (0x00007f3ac2e8a000)
readelf -d /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.cpython-310-x86_64-linux-gnu.so | grep RUNPATH
## 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN]
python -c "import tiledb; tiledb.libtiledb.version()"
## Traceback (most recent call last):
##   File "<string>", line 1, in <module>
##   File "/root/TileDB-Py/tiledb/__init__.py", line 22, in <module>
##     from .array_schema import ArraySchema
##   File "/root/TileDB-Py/tiledb/array_schema.py", line 10, in <module>
##     from .attribute import Attr
##   File "/root/TileDB-Py/tiledb/attribute.py", line 9, in <module>
##     from .ctx import Ctx, CtxMixin
##   File "/root/TileDB-Py/tiledb/ctx.py", line 16, in <module>
##     class Config(lt.Config):
## AttributeError: module 'tiledb.cc' has no attribute 'Config'

python -m pip uninstall --yes tiledb

# Build and install TileDB-Py **without** bundled libtiledb
export TILEDB_PATH=${HOME}/install-libtiledb/
python -m pip install -v .
## -- Found external TileDB core library
## -- Setting RPATH for targets "main" and "libtiledb" to /root/install-libtiledb/lib
## -- Setting RPATH for target "cc" to /root/install-libtiledb/lib
##
##   In file included from /tmp/tmp8dwduzb1/build/tiledb/libtiledb.cc:1248:
##   /root/install-libtiledb/include/tiledb/tiledb.h:2973:34: note: declared here
##    2973 | TILEDB_DEPRECATED_EXPORT int32_t tiledb_array_delete_fragments(
##         |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##   ninja: build stopped: subcommand failed.
##
##   *** CMake build failed
##   error: subprocess-exited-with-error
##
##   × Building wheel for tiledb (pyproject.toml) did not run successfully.
##   │ exit code: 1
##   ╰─> See above for output.
##
##   note: This error originates from a subprocess, and is likely not a problem with pip.
##   full command: /usr/bin/python /usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpkbu_gaxq
##   cwd: /root/TileDB-Py
##   Building wheel for tiledb (pyproject.toml) ... error
##   ERROR: Failed building wheel for tiledb
## Failed to build tiledb
## ERROR: Could not build wheels for tiledb, which is required to install pyproject.toml-based projects

@jdblischak
Copy link
Contributor

@ihnorton explained to me the source of the import error I reported above. It's because I ran it within the Git repo, and by default Python looks for ./module.py and ./module/__init__.py. Thus it ran the local one instead of the one installed in site-packages. Thus the fix it to either run an editable install (-e) or move outside of the Git repo.

The reprex below confirms that the import works fine outside of the Git repo:

docker run --rm -it ubuntu:22.04

# Setup
apt-get update
apt-get install --yes git python-is-python3 python3 python3-pip python3-venv unzip wget
cd

# Clone TileDB-Py
git clone https://github.com/TileDB-Inc/TileDB-Py.git
cd TileDB-Py/
git checkout db/scikit-build-core

# Build and install TileDB-Py **with** bundled libtiledb
python -m pip install -v .

# Run import test outside of Git repo
cd
python -c "import tiledb; print(tiledb.libtiledb.version())"
## (2, 24, 0)

@jdblischak
Copy link
Contributor

Also, I switched to using the 2.24.2 release tarball, and I was able to compile TileDB-Py against the external libtiledb. The shared object isn't copied, and all the linking looks good.

docker run --rm -it ubuntu:22.04

# Setup
apt-get update
apt-get install --yes git python-is-python3 python3 python3-pip python3-venv unzip wget
cd
mkdir downloads install-libtiledb

# Install 2.24.2 release binary
# https://github.com/TileDB-Inc/TileDB/releases/tag/2.24.2
wget --quiet -P downloads/ \
  https://github.com/TileDB-Inc/TileDB/releases/download/2.24.2/tiledb-linux-x86_64-2.24.2-76cd03c.tar.gz
tar -C install-libtiledb -xzf downloads/tiledb-linux-x86_64-2.24.2-76cd03c.tar.gz
export LD_LIBRARY_PATH=$(pwd)/install-libtiledb/lib

# Clone TileDB-Py
git clone https://github.com/TileDB-Inc/TileDB-Py.git
cd TileDB-Py/
git checkout db/scikit-build-core

# Build and install TileDB-Py **without** bundled libtiledb
export TILEDB_PATH=${HOME}/install-libtiledb/
python -m pip install -vv .
##  -- Found external TileDB core library
##  -- Setting RPATH for targets "main" and "libtiledb" to /root/install-libtiledb/lib
##  -- Setting RPATH for target "cc" to /root/install-libtiledb/lib

ls /usr/local/lib/python3.10/dist-packages/tiledb/*.so*
## /usr/local/lib/python3.10/dist-packages/tiledb/cc.cpython-310-x86_64-linux-gnu.so
## /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.cpython-310-x86_64-linux-gnu.so
## /usr/local/lib/python3.10/dist-packages/tiledb/main.cpython-310-x86_64-linux-gnu.so

ldd /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.cpython-310-x86_64-linux-gnu.so | grep libtiledb
## libtiledb.so.2.24 => /root/install-libtiledb/lib/libtiledb.so.2.24 (0x00007f7dda6f6000)

readelf -d /usr/local/lib/python3.10/dist-packages/tiledb/libtiledb.cpython-310-x86_64-linux-gnu.so | grep RUNPATH
##  0x000000000000001d (RUNPATH)            Library runpath: [/root/install-libtiledb/lib]

cd
python -c "import tiledb; print(tiledb.libtiledb.version())"
## (2, 24, 2)

@jdblischak
Copy link
Contributor

Lastly, I got this PR working in a branch of my centralized-tiledb-nightlies, so I'll be able to quickly migrate the nightly TileDB-Py build after this is merged.

message(STATUS "Downloading TileDB default version ...")
# Download latest release
fetch_prebuilt_tiledb(
VERSION 2.24.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: we need to move this out at least to the top of the cmakelist if not the workflow file.

Copy link
Member

@ihnorton ihnorton Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I'm not sure if this is being validated correctly, because this is the hash for 2.24.2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, it is definitely being validated 😆 (I grabbed the wrong one)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a variable named TILEDB_VERSION? This variable can set on the top of the file

Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to provide a way to override SETUPTOOLS_SCM_PRETEND_VERSION at the workflow level in order to avoid having to make temporary tags (to get properly named wheels from a QA branch).

Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this branch here and noticed the manylinux wheels are ~20MB larger than the current ones on PyPI for the last release.

@dudoslav
Copy link
Contributor Author

The wheel size is most probably due to strip=False and build_type=RelWithDebInfo. Latest commit removed these options, lets see if the size will get smaller.

@dudoslav
Copy link
Contributor Author

1721652445_grim
Latest wheels are about 15MB in size

Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants