Flatnav python bindings #14

Merged 9 commits on Nov 23, 2023
Empty file added .github/workflows/wheels.yaml
Empty file.
6 changes: 5 additions & 1 deletion .gitignore
@@ -12,7 +12,11 @@ build
.env

# Python wheel related folders/files
flatnav_pyton/flatnav.egg-info
flatnav_python/flatnav.egg-info/
flatnav_python/poetry.lock
flatnav_python/dist
flatnav_python/__pycache__


# other files
data/
33 changes: 15 additions & 18 deletions CMakeLists.txt
@@ -21,34 +21,32 @@ set(CMAKE_CXX_STANDARD 17)
# I added compiler flags for ASan (address sanitizer). It is supposed to be very
# fast, but if we find it slow, we can remove it for good or use compiler
# directives to skip analyzing functions __attribute__((no_sanitize_address))

# https://clang.llvm.org/docs/AddressSanitizer.html Compiler flags
set(CMAKE_CXX_FLAGS
"${CMAKE_CXX_FLAGS} \
-Xclang -std=c++17 \
-Wall -Ofast \
-std=c++17 \
-Ofast \
-DHAVE_CXX0X \
-DNDEBUG \
-openmp \
-L/opt/homebrew/opt/libomp/lib \
-I/opt/homebrew/opt/libomp/include \
-lomp \
-fopenmp \
-fpic \
-w \
-ffast-math \
-funroll-loops \
-ftree-vectorize \
-g \
-fsanitize=address")
-ftree-vectorize")

# set(OpenMP_CXX_FLAGS "-fopenmp") set(OpenMP_CXX_LIB_NAMES "omp")
# link_libraries(omp)
option(CMAKE_BUILD_TYPE "Build type" Release)
if(CMAKE_BUILD_TYPE STREQUAL "Debug")
# Add debug compile flags
message(STATUS "Building in Debug mode")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -Wall -fsanitize=address")
endif()

include(ExternalProject)
include(FeatureSummary)
include(FetchContent)

find_package(Git REQUIRED)
# find_package(OpenMP REQUIRED)

option(USE_GIT_PROTOCOL
"If behind a firewall turn this off to use HTTPS instead." OFF)
@@ -127,12 +125,13 @@ include_directories(${PROJECT_BINARY_DIR}/ep/include)

set(CNPY_LIB ${PROJECT_BINARY_DIR}/ep/lib/libcnpy.a)

find_package(OpenMP)
find_package(OpenMP REQUIRED)
if(OpenMP_FOUND)
message(STATUS "OpenMP Found. Building the Package using the system OpenMP.")
else()
message(
"OpenMP Not Found. Building the Package using LLVM's OpenMP. This is slower than the system OpenMP."
FATAL_ERROR
"OpenMP Not Found. Building the Package using LLVM's OpenMP. This is slower than the system OpenMP."
)
endif(OpenMP_FOUND)

@@ -217,7 +216,6 @@ set(HEADERS
${PROJECT_SOURCE_DIR}/flatnav/distances/inner_products_from_hnswlib.h
${PROJECT_SOURCE_DIR}/flatnav/distances/SquaredL2Distance.h
${PROJECT_SOURCE_DIR}/flatnav/distances/SquaredL2DistanceSpecializations.h
${PROJECT_SOURCE_DIR}/flatnav/distances/SQDistance.h
${PROJECT_SOURCE_DIR}/flatnav/util/ExplicitSet.h
${PROJECT_SOURCE_DIR}/flatnav/util/GorderPriorityQueue.h
${PROJECT_SOURCE_DIR}/flatnav/util/reordering.h
@@ -237,8 +235,7 @@ set_target_properties(FLAT_NAV_LIB PROPERTIES LINKER_LANGUAGE CXX)

if(BUILD_EXAMPLES)
message(STATUS "Building examples for Flatnav")
foreach(CONSTRUCT_EXEC construct_npy query query_npy
cereal_tests)
foreach(CONSTRUCT_EXEC construct_npy query_npy cereal_tests)
add_executable(${CONSTRUCT_EXEC}
${PROJECT_SOURCE_DIR}/tools/${CONSTRUCT_EXEC}.cpp ${HEADERS})
add_dependencies(${CONSTRUCT_EXEC} FLAT_NAV_LIB)
48 changes: 5 additions & 43 deletions README.md
@@ -70,51 +70,13 @@ correct distance is computed.
The most straightforward way to include a new dataset for this evaluation is to put it into either the ANN-Benchmarks (NPY) format or to put it into the Big ANN-Benchmarks format. The NPY format requires a float32 2-D Numpy array for the train and test sets and an integer array for the ground truth. The Big ANN-Benchmarks format uses the following binary representation. For the train and test data, there is a 4-byte little-endian unsigned integer number of points followed by a 4-byte little-endian unsigned integer number of dimensions. This is followed by a flat list of `num_points * num_dimensions` values, where each value is a 32-bit float or an 8-bit integer (depending on the dataset type). The ground truth files consist of a 32-bit integer number of queries, followed by a 32-bit integer number of ground truth results for each query. This is followed by a flat list of ground truth results.
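The Big ANN-Benchmarks layout described above can be sketched as a small reader. This is a minimal illustration, not code from this repository: the function name `read_flat_matrix` is hypothetical, and the choice of little-endian unsigned 32-bit integers for the ground-truth header and results is an assumption, since the paragraph only says "32-bit integer".

```python
import struct


def read_flat_matrix(path, value_format="f"):
    """Read a file laid out as: count header, width header, flat list of values.

    value_format is a struct code (an assumption of this sketch):
      "f" -> 32-bit float, "B" -> unsigned 8-bit integer,
      "I" -> unsigned 32-bit integer (assumed here for ground-truth results).
    Returns a list of rows, each row a tuple of `width` values.
    """
    with open(path, "rb") as f:
        header = f.read(8)
        if len(header) != 8:
            raise ValueError("file is too short to contain the 8-byte header")
        # Two 4-byte little-endian unsigned integers: number of rows, row width.
        num_rows, width = struct.unpack("<II", header)
        count = num_rows * width
        value_size = struct.calcsize("<" + value_format)
        payload = f.read(count * value_size)

    if len(payload) != count * value_size:
        raise ValueError("file is shorter than its header claims")

    # Decode the flat list, then slice it into rows of `width` values.
    flat = struct.unpack(f"<{count}{value_format}", payload)
    return [flat[i * width:(i + 1) * width] for i in range(num_rows)]
```

For a train or test file, the rows are points and the width is the number of dimensions (`read_flat_matrix(path, "f")` or `read_flat_matrix(path, "B")`). For a ground truth file, the rows are queries and the width is the number of results per query, read under the assumption above with `read_flat_matrix(path, "I")`.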


## Python Binding Instructions
We also provide python bindings for a subset of index types. This is very much a work in progress - the default build may or may not work with a given Pyton configuration. While we've successfully built the bindings on Windows, Linux and MacOS, this will still probably require some customization of the build system. To begin with, follow these instructions:
## Python Binding Instructions
We also provide python bindings for a subset of index types. We've successfully built the bindings on Linux and MacOS, and if there is interest,
we can also support Windows. To generate the python bindings you will need a stable installation of [poetry](https://python-poetry.org/).

1. `$ cd python_bindings`
2. `$ make python-bindings`
3. `$ export PYTHONPATH=$(pwd)/build:$PYTHONPATH`
4. `$ python3 python_bindings/test.py`
Then, follow instructions [here](/flatnav_python/README.md) on how to build the library. There are also examples for how to use the library
to build an index and run queries on top of it [here](/flatnav_python/test_index.py).

You are likely to encounter compilation issues depending on your Python configuration. See below for notes and instructions on how to get this working.

### Note on python bindings:
The python bindings require pybind11 to compile. This can be installed with `pip3 install pybind11`. The command `python3 -m pybind11 --includes` which is included in the Makefile gets the correct include flags for the `pybind11/pybind11.h` header file, as well as the include flags for the `Python.h` header file. On most Linux platforms, the paths in the Makefile should point to the correct include directories for this to work (for the system Python). If the `Python.h` file is not located at the specified include paths (e.g. for a non-system Python installation), then another include path may need to be added (specified by the PYTHON_INC_FLAGS variable in the Makefile). The headers may also need to be installed with `$ sudo apt-get install python3-dev`.

If you encounter the following error:

`ld: can't open output file for writing: ../build/flatnav.so, errno=2 for architecture x86_64`

The reason is likely that you forgot to make the build directory. Run `mkdir build` in the top-level flatnav directory and re-build the Python bindings.

### Special Instructions for MacOS

On MacOS, the default installation directory (`/usr/lib`) is where the global, system Python libraries are located, but this is often not where we want to perform the installation. If the user has installed their own (non-system) version of Python via Homebrew or a similar tool, the actual Python libraries will be located somewhere else. This will result in many errors similar to the following:

```
Undefined symbols for architecture x86_64:
"_PyBaseObject_Type...
```

This happens because homebrew does not install into the global installation directory, and we need to explicitly link the libpython object files on MacOS. To fix it, you will need the location of `libpython*.dylib` (where `*` stands in for the Python version). To find them, run

`sudo find / -iname "libpython*"`

And pick the one corresponding to the version of Python you use. Once you've located the library, add the following to the Makefile:

`PYTHON_LINK_FLAGS := -L /path/to/directory/containing/dylib/ -lpythonX.Y`

For example, on an Intel MacBook, I installed Python 3.9 using Homebrew and found:

`/usr/local/Cellar/[email protected]/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/libpython3.9.dylib`

This means that my link flags are:

`PYTHON_LINK_FLAGS := -L /usr/local/Cellar/[email protected]/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/config-3.9-darwin/ -lpython3.9`

If you installed Python in some other place (or if you use the system Python on MacOS), you will probably have a different, non-standard location for `libpython.dylib`. Note that building python bindings on M1 Macs is a work-in-progress, given the switch from x86 to arm64.



19 changes: 15 additions & 4 deletions bin/build.sh
@@ -1,9 +1,13 @@
#!/bin/bash

# Make sure we are at the root directory
cd "$(dirname "$0")/.."

BUILD_TESTS=OFF
BUILD_EXAMPLES=OFF
BUILD_BENCHMARKS=OFF
MAKE_VERBOSE=0
CMAKE_BUILD_TYPE=Release

function print_usage() {
echo "Usage ./build.sh [OPTIONS]"
@@ -13,6 +17,7 @@ function print_usage() {
echo " -e, --examples: Build examples"
echo " -v, --verbose: Make verbose"
echo " -b, --benchmark: Build benchmarks"
echo " -bt, --build_type: Build type (Debug, Release, RelWithDebInfo, MinSizeRel)"
echo " -h, --help: Print this help message"
echo ""
echo "Example Usage:"
@@ -22,18 +27,19 @@ function print_usage() {

function check_clang_installed() {
if [[ ! -x "$(command -v clang)" ]]; then
echo "clang is not installed. You should have clang installed first.Exiting..."
exit 1
echo "clang is not installed. Installing it..."
./bin/install_clang.sh
fi
}

# Process the options and arguments
# Process the options and arguments
while [[ "$#" -gt 0 ]]; do
case $1 in
-t|--tests) BUILD_TESTS=ON; shift ;;
-e|--examples) BUILD_EXAMPLES=ON; shift ;;
-v|--verbose) MAKE_VERBOSE=1; shift ;;
-b|--benchmark) BUILD_BENCHMARKS=ON; shift ;;
-bt|--build_type) CMAKE_BUILD_TYPE=$2; shift; shift ;;
*) print_usage ;;
esac
done
@@ -49,6 +55,8 @@ if [[ "$(uname)" == "Darwin" ]]; then
echo "Using LLVM clang"
export CC=/opt/homebrew/opt/llvm/bin/clang
export CXX=/opt/homebrew/opt/llvm/bin/clang++
export LDFLAGS="-L/opt/homebrew/opt/libomp/lib"
export CPPFLAGS="-I/opt/homebrew/opt/libomp/include"
elif [[ "$(uname)" == "Linux" ]]; then
echo "Using system clang"
else
@@ -60,7 +68,10 @@ echo "Using CC=${CC} and CXX=${CXX} compilers for building."

mkdir -p build
cd build && cmake \
-DCMAKE_C_COMPILER=${CC} \
Review comment (Owner, Author):
Passing these here directly so that there is no confusion when someone has both GCC and clang.

-DCMAKE_CXX_COMPILER=${CXX} \
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} \
-DBUILD_TESTS=${BUILD_TESTS} \
-DBUILD_EXAMPLES=${BUILD_EXAMPLES} \
-DBUILD_BENCHMARKS=${BUILD_BENCHMARKS} ..
-DBUILD_BENCHMARKS=${BUILD_BENCHMARKS} ..
make -j VERBOSE=${MAKE_VERBOSE}
44 changes: 32 additions & 12 deletions bin/install_clang.sh
@@ -5,24 +5,44 @@ command_exists() {
type "$1" &> /dev/null ;
}

function install_clang_mac() {
# Install clang and clang-format on Darwin
if ! command_exists brew; then
echo "Homebrew not found. Homebrew should be installed first."
exit 1
fi
brew install llvm
}

function install_clang_linux() {
# Install clang and clang-format on Linux
if ! command_exists apt; then
echo "apt not found. apt should be installed first."
exit 1
fi
echo "Installing clang and clang-format..."
sudo apt update
sudo apt install -y clang clang-format
}


# Check for clang
if ! command_exists clang++; then
echo "clang++ not found. Installing..."
sudo apt update
sudo apt install -y clang
else
echo "clang++ already installed."
fi

if ! command_exists clang-format; then
echo "clang-format not found. Installing..."
sudo apt update
sudo apt install -y clang-format
if [[ "$(uname)" == "Darwin" ]]; then
install_clang_mac
elif [[ "$(uname)" == "Linux" ]]; then
install_clang_linux
else
echo "Unsupported OS."
exit 1
fi
else
echo "clang-format already installed."
fi
echo "clang/clang++ already installed."
fi

# Check for libomp-dev
# Check for libomp-dev. This is required for OpenMP support.
PKG_STATUS=$(dpkg-query -W --showformat='${Status}\n' libomp-dev | grep "install ok installed")
if [ "" == "$PKG_STATUS" ]; then
echo "libomp-dev not found. Installing..."
66 changes: 0 additions & 66 deletions bin/run_anns.sh

This file was deleted.
