Commit 5a47de9

ethanglaser authored and mihaic committed
bump version to 0.0.6 (#416)
GitOrigin-RevId: cce32b8e0efdc4a77db0b009f52b0686c0f1b240
1 parent a4fc334 commit 5a47de9

7 files changed, +254 -250 lines changed

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ project(svs
 # - /bindings/python/tests/test_common.py
 # Manually keep in-sync with:
 # - /bindings/python/setup.py
-VERSION 0.0.5
+VERSION 0.0.6
 )

 set(SVS_LIB svs_devel)

HISTORY.md

Lines changed: 246 additions & 0 deletions
@@ -1,3 +1,249 @@
# SVS 0.0.4 Release Notes

Note that `pysvs` has been renamed to `svs` since this release.

## Major Changes

### Serialization Update

The serialization strategy for all SVS serialized objects has been updated from `v0.0.1` to
`v0.0.2`. In order to be compatible, all previously saved objects will need to be updated.

Updating can be done using the `pysvs` upgrade tool:
```python
import pysvs
pysvs.upgrader.upgrade("path-to-save-directory")
```
Some notes about the upgrade process are given below.

* If the object does not need upgrading, no changes will be made.
* If the object *does* need upgrading, then only the TOML file will be modified. The upgrade
  tool will first create a backup (by changing the extension to `.backup.toml`) and
  then attempt an upgrade.

  If the upgrade fails, please let the maintainers know so we can fix the upgrade tool.

  Furthermore, if a backup file already exists, the upgrade process will abort.
* Objects upgraded to the `v0.0.2` serialization format are *forwards* compatible.
  This means that they can still be loaded by *older* versions of `pysvs`.
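
The backup rule in the notes above can be pictured with a short sketch. This is not the `pysvs.upgrader` implementation: the function name `backup_then_upgrade` and the `needs_upgrade` and `upgrade_fn` callbacks are hypothetical, and only the Python standard library is used.

```python
import shutil
from pathlib import Path


def backup_then_upgrade(toml_path, needs_upgrade, upgrade_fn):
    """Hypothetical illustration of the notes above; not the pysvs upgrader.

    Returns True if the file was upgraded, False if nothing needed to change.
    """
    toml_path = Path(toml_path)
    text = toml_path.read_text()

    # Objects that are already up to date are left untouched.
    if not needs_upgrade(text):
        return False

    # "config.toml" -> "config.backup.toml"
    backup_path = toml_path.with_suffix(".backup.toml")

    # Abort rather than overwrite an existing backup.
    if backup_path.exists():
        raise FileExistsError(f"backup already exists: {backup_path}")

    # Back up first, then attempt the upgrade of the TOML file only.
    shutil.copy2(toml_path, backup_path)
    toml_path.write_text(upgrade_fn(text))
    return True
```

Copying before writing means a failed upgrade still leaves the original contents recoverable from the `.backup.toml` file.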

**Why is this change needed?**

This change is necessary to support efficient introspective loading, where serialized
objects can be inspected for load compatibility. This, in turn, enables automatic loading
of previously serialized SVS objects.

### Build System and Testing

Included reference results for the Vamana index require Intel(R) MKL 2024.1 for reproducibility in testing.
Linking against Intel(R) MKL 2023.X may cause LeanVec tests to fail.

Reference results for your version of Intel(R) MKL can be regenerated using
```sh
# Build the test generators
mkdir build
cd build
CC=gcc-11 CXX=g++-11 cmake .. -DCMAKE_BUILD_TYPE=Release -DSVS_BUILD_BENCHMARK_TEST_GENERATORS=YES -DSVS_EXPERIMENTAL_LEANVEC=YES
make -j

# Run the test generator executable.
./benchmark/svs_benchmark vamana_test_generator ../tools/benchmark_inputs/vamana/test-generator.toml vamana_reference.toml 5 ../data/test_dataset
cp ./vamana_reference.toml ../data/test_dataset/reference/vamana_reference.toml
```

### Logging Infrastructure

SVS has switched to using [spdlog](https://github.com/gabime/spdlog) for its logging needs.
As such, users will now have control of what messages get logged and where they are logged.
The default interface for configuring logging is through the environment variables `SVS_LOG_LEVEL` and `SVS_LOG_SINK`.
Valid values for `SVS_LOG_LEVEL` in order of increasing severity are shown below:

| `SVS_LOG_LEVEL`       | Description                                                   |
| --------------------- | ------------------------------------------------------------- |
| ``TRACE``             | Tracing control flow through functions. Verbose.              |
| ``DEBUG``             | Verbose logging useful for debugging.                         |
| ``INFO``              | Informative prints for long-running processes.                |
| ``WARN`` (default)    | Diagnostic prints that may need to be addressed by the user.  |
| ``ERROR``             | Program errors.                                               |
| ``CRITICAL``          | Critical information.                                         |
| ``OFF``               | Disable logging.                                              |

Logging sinks control where logged messages get sent and can be controlled using `SVS_LOG_SINK` with the following values.

| `SVS_LOG_SINK`            | Description                               |
| ------------------------- | ----------------------------------------- |
| ``stdout`` (default)      | Send all messages to ``stdout``           |
| ``stderr``                | Send all messages to ``stderr``           |
| ``null``                  | Suppress all logging messages.            |
| ``file:/path/to/file``    | Send all messages to `/path/to/file`.     |

Additionally, both the C++ library and `pysvs` contain APIs for customizing logging that supersede the environment variables.
In C++, any `std::shared_ptr<spdlog::logger>` can be used if desired.

Finally, if environment variable based initialization is not desired, it can be disabled by providing `-DSVS_INITIALIZE_LOGGER=NO` to CMake at configuration time.
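
As a reading aid for the two tables, the following sketch interprets the environment variables the way the tables describe them. It is only an illustration and not how SVS parses them: the function name `read_logging_environment` is hypothetical, values are matched exactly as spelled in the tables, and rejecting unknown values with an error is an assumption (the notes do not say what SVS does in that case).

```python
import os

# Values taken from the tables above.
LOG_LEVELS = ("TRACE", "DEBUG", "INFO", "WARN", "ERROR", "CRITICAL", "OFF")
SIMPLE_SINKS = ("stdout", "stderr", "null")


def read_logging_environment(environ=os.environ):
    """Illustrative only: return (level, sink_kind, sink_path)."""
    # Documented default level is WARN.
    level = environ.get("SVS_LOG_LEVEL", "WARN")
    if level not in LOG_LEVELS:
        raise ValueError(f"unknown SVS_LOG_LEVEL: {level!r}")

    # Documented default sink is stdout.
    sink = environ.get("SVS_LOG_SINK", "stdout")
    if sink in SIMPLE_SINKS:
        return level, sink, None

    # "file:/path/to/file" sends all messages to /path/to/file.
    prefix = "file:"
    if sink.startswith(prefix) and len(sink) > len(prefix):
        return level, "file", sink[len(prefix):]

    raise ValueError(f"unknown SVS_LOG_SINK: {sink!r}")
```

With neither variable set, the sketch yields the documented defaults (`WARN` to `stdout`); with `SVS_LOG_SINK=file:/tmp/svs.log` the path component is `/tmp/svs.log`.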

### Performance Enhancements

* Generally improved the performance of uncompressed distance computations with run-time lengths for Intel(R) AVX-512 based systems.
* Fixed a performance pathology for run-time dimensional sequential LVQ4(xN) when the number of dimensions is not a multiple of 16.

## `pysvs` (Python)

### Additions and Changes

* Reloading a previously saved index no longer requires exact reconstruction of the original
  loader.

  Previously, if an index was constructed and saved using the following
  ```python
  # Load data using online compression.
  loader = pysvs.VectorDataLoader(...)
  lvq = pysvs.LVQLoader(loader, primary = 4, residual = 8)

  # Build the index.
  parameters = pysvs.VamanaBuildParameters(...)
  index = pysvs.Vamana.build(parameters, lvq, pysvs.L2, num_threads = 10)

  # Save the result to three directories.
  index.save("config", "graph", "data")
  ```
  then the index had to be reloaded using
  ```python
  lvq = pysvs.LVQLoader("data", primary = 4, residual = 8)
  index = pysvs.Vamana("config", pysvs.GraphLoader("graph"), lvq, distance = pysvs.L2)
  ```
  Now, the following will work
  ```python
  index = pysvs.Vamana(
      "config",
      "graph", # No longer need explicit `pysvs.GraphLoader`
      "data",  # SVS discovers this is LVQ data automatically
      distance = pysvs.L2
  )
  ```
  Tailoring the run-time parameters of reloaded data (for example, the strategy and padding
  used by LVQ) is also easier, because the identifying parameters are now inferred automatically:
  ```python
  lvq = pysvs.LVQLoader(
      "data", # Parameters `primary`, `residual`, and `dims` are discovered automatically
      strategy = pysvs.LVQStrategy.Sequential,
      padding = 0
  )
  index = pysvs.Vamana("config", "graph", lvq)
  ```

* The Vamana and DynamicVamana indexes now have a reconstruction interface.
  This has the form
  ```python
  index = pysvs.Vamana(...)
  vectors = index.reconstruct(I)
  ```
  where `I` is an arbitrary-dimensional `numpy` array of `uint64` indices.
  This API returns reconstructed vectors as a `numpy` array with the shape
  ```python
  vectors.shape == (*I.shape, index.dimensions())
  ```

  In particular, the following now works:
  ```python
  I, D = index.search(...)
  vectors = index.reconstruct(I)
  ```
  **Requirements**
  * For `pysvs.Vamana`, the indices in `I` must all be in `[0, index.size())`.
  * For `pysvs.DynamicVamana`, the indices in `I` must be in `index.all_ids()`.

  **Reconstruction Semantics**
  * Uncompressed data is returned directly (potentially promoting to `float32`).
  * LVQ compressed data is reconstructed using the highest precision possible. For
    two-level datasets, both levels will be used.
  * LeanVec datasets will reconstruct using the full-precision secondary dataset.

* Added an upgrade tool `pysvs.upgrader.upgrade` to upgrade the serialization layout of SVS
  objects.
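
The shape rule and the ID requirements of the reconstruction interface can be restated as plain Python, without needing `pysvs` or `numpy`. The helper names below are hypothetical and exist only to spell out the contract described in the notes; they are not part of the library.

```python
def reconstructed_shape(ids_shape, dimensions):
    """Shape of the array returned for an index array of shape `ids_shape`."""
    return (*ids_shape, dimensions)


def check_static_ids(ids, size):
    """Static index rule: every id must lie in the half-open range [0, size)."""
    return all(0 <= i < size for i in ids)


def check_dynamic_ids(ids, all_ids):
    """Dynamic index rule: every id must currently be present in the index."""
    present = set(all_ids)
    return all(i in present for i in ids)
```

For example, a search over 10 queries returning 5 neighbors each gives an `I` of shape `(10, 5)`; on a 128-dimensional index, `reconstructed_shape((10, 5), 128)` is `(10, 5, 128)`.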

## `libsvs` (C++)

### Changes

* Overhauled object loading. Context free classes should now accept a
  `svs::lib::ContextFreeLoadTable` and contextual classes should take a
  `svs::lib::LoadTable`.

  While most of the top level API remains unchanged, users are encouraged to look at
  the definitions of these classes in `include/svs/lib/saveload/load.h` to understand
  their capabilities and API.
* Added a new optional loading function `try_load -> svs::lib::Expected` which tries to load
  an object from a table and fails gracefully without an exception if it cannot.

  This API enables discovery and matching of previously serialized objects, allowing
  implementation of the auto-loading functionality in `pysvs`.

* Added the following member functions to `svs::Vamana` and `svs::DynamicVamana`
  ```c++
  void reconstruct_at(svs::data::SimpleDataView<float> data, std::span<const uint64_t> ids);
  ```
  which will reconstruct the vector indices in `ids` into the destination `data`.

  See the description in the release notes for `pysvs` regarding the semantics of
  reconstruction.

* Type erased orchestrators may now be compiled with support for multiple query types.
  For example,
  ```c++
  auto index = svs::Vamana::assemble<svs::lib::Types<svs::Float16, float>>(...);
  ```
  will compile an orchestrator capable of processing queries of either 16-bit or 32-bit
  floating-point values. The old syntax of
  ```c++
  auto index = svs::Vamana::assemble<float>(...);
  ```
  is still supported and yields an index capable of only processing a single query type.

  The augmented methods are given below:
  * `svs::Vamana::assemble`
  * `svs::Vamana::build`
  * `svs::DynamicVamana::assemble`
  * `svs::DynamicVamana::build`
  * `svs::Flat::assemble`
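
To see why a non-throwing `try_load` helps with discovery, here is a small sketch of the pattern, written in Python rather than C++ so it can be run without building SVS. It does not mirror the real `svs::lib` API: the loader functions, the `schema` key and the tuple results are all hypothetical, and `None` stands in for the "unexpected" state of `svs::lib::Expected`.

```python
def try_load_lvq(table):
    """Hypothetical loader: returns an object, or None if `table` does not match."""
    if table.get("schema") != "lvq":
        return None
    return ("lvq", table["primary"], table["residual"])


def try_load_uncompressed(table):
    """Hypothetical loader for plain, uncompressed data."""
    if table.get("schema") != "uncompressed":
        return None
    return ("uncompressed", table["eltype"])


def auto_load(table, loaders=(try_load_lvq, try_load_uncompressed)):
    """Return the result of the first loader that accepts `table`."""
    for loader in loaders:
        result = loader(table)
        if result is not None:
            return result
    # Only reached when every candidate declined the table.
    raise ValueError("no registered loader matches this table")
```

Because a mismatch is reported as a value instead of an exception, the caller can probe several candidate types cheaply and only treats the situation as an error once all of them have declined.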

## Object Serialization Changes

* The implementation of two-level LVQ has changed from bitwise extension to true cascaded
  application of scalar quantization. See the discussion on
  [this PR](https://github.com/intel/ScalableVectorSearch/pull/28).

  Consequently, previously saved two-level LVQ datasets have had their serialization version
  incremented from `v0.0.2` to `v0.0.3` and will need to be regenerated.

* The data structure `svsbenchmark::vamana::BuildJob` has been updated from `v0.0.3` to
  `v0.0.4`. This change is backwards compatible, but users of this class are encouraged to
  upgrade as soon as possible.

  1. This change drops the `search_window_size` array and adds a field `preset_parameters`
     which must be an array of `svs::index::vamana::VamanaSearchParameters`. This is done to
     provide more fine-grained control of preset search parameters (including split-buffer,
     visited-set, and prefetching) more in line with `svsbenchmark::vamana::SearchJob`.

     Version `v0.0.3` with `search_window_size` will be compatible until the next minor version
     of SVS with the semantics of constructing a non-split-buffered
     `svs::index::vamana::VamanaSearchParameters` with no visited filter and no prefetching.

  2. Added a field `save_directory` into which the constructed index will be saved.
     This field can be left as the empty string to indicate that no saving is desired.

     The benchmarking framework will ensure that all requested save directories are unique
     and that the parents of the requested directories exist.

     Older serialization versions will default to no saving.

  An example of the new format can be obtained by running
  ```
  ./svs_benchmark vamana_static_build --example
  ```
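
The two checks on `save_directory` described above (uniqueness, and existence of each parent directory) can be sketched as follows. This is an illustration of the stated rules, not the benchmarking framework's code: the function name `validate_save_directories` is hypothetical, and comparing paths literally (without resolving symlinks or relative components) is an assumption made to keep the sketch short.

```python
from pathlib import Path


def validate_save_directories(save_directories):
    """Check the two rules stated above; empty strings mean "do not save"."""
    requested = [Path(d) for d in save_directories if d != ""]

    # Rule 1: every requested directory must be unique.
    if len(set(requested)) != len(requested):
        raise ValueError("save directories must be unique")

    # Rule 2: the parent of every requested directory must already exist.
    for directory in requested:
        if not directory.parent.is_dir():
            raise ValueError(f"parent directory does not exist: {directory.parent}")

    return requested
```

Running both checks before any build starts means a typo in one job's path is reported up front rather than after a long index construction.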

# SVS 0.0.3 Release Notes

Highlighted Features
