Release v0.21.0 · wi2trier/cbrkit

0.21.0 (2025-02-05)

⚠ BREAKING CHANGES

The library has largely been rewritten, so expect breaking changes beyond the ones listed here. Please refer to the README and the tests for more information.
The function cbrkit.reuse.build now expects a retriever function instead of a similarity function so that more logic can be shared between the phases.
To better support the new retrieval functions, the arguments limit, min_similarity, and max_similarity of the function cbrkit.retrieval.build have been removed. Instead, wrap your call of cbrkit.retrieval.build with the new function cbrkit.retrieval.dropout that now exposes these arguments.
The functions apply and mapply have been removed to better support processing multiple queries at once. They have been replaced by the functions apply_query and apply_queries. Both new functions return the same result object, so the return value of apply_queries is not identical to that of the previous mapply function. The functions apply and apply_query, however, share the same return type.
The number of processes to use for retrieval is no longer passed to the apply functions, but instead given to the build function.
CBRkit now provides additional modules for adapt, reuse, cycle, and eval.
We added support for logging via the standard library.
There is a new synthesis module that provides tight integration with various LLM providers. This can, for instance, be used to develop RAG applications using CBR.
Loading and dumping cases has been reworked: we now provide generators to construct serialization and deserialization functions.
Caching of similarity values has been added. To use it, wrap your existing similarity function with the new cbrkit.sim.cache wrapper.
A new embedding module cbrkit.sim.embed has been added that provides a better interface to compose string-based similarity functions that rely on vectors. It also includes a cache that can be stored on disk.
Similarity functions for graphs have been overhauled and now provide a more consistent interface.

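Because logging goes through the standard library, it can be configured with the usual logging calls. The sketch below assumes that cbrkit's loggers are named after its module paths and therefore live under a "cbrkit" namespace; that logger name is an assumption and is not stated in these notes.

```python
import logging

# Print INFO-level messages (and above) from all libraries to the console.
logging.basicConfig(level=logging.INFO)

# Assumption: cbrkit's loggers are children of a logger named "cbrkit".
# Raising only this logger to DEBUG leaves other libraries at INFO.
logger = logging.getLogger("cbrkit")
logger.setLevel(logging.DEBUG)
```
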
Features

adapt: add openai function to adapt cases (38cbb26)
adapt: add similarity delta to pipe function (9d58252)
add docstrings to export (3991a51)
add dumpers module for serializing casebases (475f532)
add dumpers, anthropic provider, update docs (#215) (0f440c5)
add generation submodule to handle provider-specific code (068b6ff)
add global handling of asyncio event loop (0ed704a)
add initial version of rag module (f803b3b)
add integration with voyageai (6d7b4eb)
add logging (32fde3d)
add methods to perform entire r4 cycles more easily (eb08557)
add openapi schema generator (611370d)
add rag support to api and cli (57a0334)
add support for factories (529aa1a)
add support for factories to more functions (2794ffc)
add transpose_value wrappers (ebbabc9)
api: allow passing paths for casebase/query (8cdcbb4)
api: support passing files (2993fac)
api: switch query parameters to request body (9604428)
convert results to pydantic models (3b1c5e0)
dumpers: make markdown function generic (33d627c)
embed/openai: add lazy loading (7e5c783)
embed: add lazy loading for cache (98790cc)
eval: add helper for arbitrary scores (79ce192)
eval: add proper support for relevance levels (c414112)
eval: allow conversion of retrieval result to qrels (c8a7cb5)
eval: allow custom metric functions (fde869a)
generate: add memory to openai (e645fde)
helpers: add getitem_or_getattr (0b94774)
helpers: allow conversion of functions to base models (b206c1f)
improve genai providers (8da6269)
improve handling of multiprocessing (81ac32c)
improve logging and multiprocessing (3b78deb)
integrate processing of query collections into the core of cbrkit (b8df8ee)
make cbrkit project layout more consistent (b738d6f)
multiprocessing: allow boolean values (e9e3827)
openai: add support for tool calling via unions (0b3e29e)
optimize multiprocessing (87ee55f)
rag: add model similar to retrieval/reuse (ca76c73)
retrieval: add dropout function (3f50dbf)
retrieval: add openai function for estimating the similarity (803ff75)
retrieval: add sentence transformers reranker (bd05b2e)
retrieval: add transpose helper to simplify conversion of cases (216eca3)
retrieval: use async clients for cohere and voyage ai (6b37814)
reuse: allow passing multiple adaptation functions to builder (6493923)
reuse: allow passing similarities from earlier steps (6732b38)
reuse: introduce dropout function similar to retrieval (c3254c6)
rework reuse phase and update apply helpers (f4a11e8)
rework type structure and improve genai/rag modules (b94e898)
sim/embed: add chunking/truncation for openai (406c7bc)
sim/graphs: add dtw alignment (#214) (bc44cb7)
sim/graphs: add dtw and smith-waterman functions (#201) (040b702)
sim/graphs: add initial version of exhaustive mapping (818a356)
sim/graphs: add local sims, update astar heuristics (fe51441)
sim/graphs: add precompute function (5ba52d7)
sim/graphs: add smith (#218) (771fe3b)
sim/graphs: make it easier to define node similarities (0c95d3a)
sim/graphs: rewrite astar algorithm (9c788f6)
sim/strings: add vector database (07727f0)
sim: add cache method (8c9992f)
sim: add default sim for attribute value (686f270)
sim: update interface for embed and taxonomy functions (6af97f9)
sim: update table helpers and move to wrappers (868bd8d)
synthesis/providers: add delay parameter (0fad3c5)
synthesis: add google provider (1724ad0)
synthesis: allow chunking with overlap (ee5bebd)
synthesis: run chunk helper in parallel (b91c9c0)
various improvements (da308f2)

Bug Fixes

adapt/generic: add strategy to pipe function (d943f59)
api: response types failed to validate (a5fe024)
api: simplify definition of retrievers/reusers (371cd78)
astar: convert to batch sim func (cfc0d42)
astar: improve logic (1643549)
astar: make naming and exports more consistent (007dfb0)
astar: restructure legal mapping funcs, add sim precomputer (8d54903)
chunkify: check arguments (90b8f38)
cli: disable pretty exceptions (215070b)
cli: use dumpers for exporting (2293948)
convert some lambdas to real functions (10994c3)
correctly construct pydantic models (fc49b1a)
correctly set sentence_transformers metadata (d3511cc)
default to structured outputs for openai (247bb05)
dumpers: properly get name for markdown code block (40ba442)
embed: add autodump to cache (f7dc949)
embed: add lazy loading to sentence transformers (0cf74ec)
embed: add logging (9156846)
embed: autodump only if new texts are found (bd9d05b)
embed: check hash before dumping cache (118611b)
embed: remove unneeded lazy loading (45738c0)
embed: use modified time instead of hash to detect changes (4ee3963)
eval: add kendall tau (95eda58)
eval: add mean_score function (9ac7c30)
eval: improve conversion of scores to qrels (579de7f)
eval: improve metric generation (c14f48e)
export default aggregator (f4d4473)
extend support for lazy loading (645cad4)
formatting and typing improvements (d49f3e5)
genai/prompts: add transpose function (dc02582)
graph: enhance serialization (d6111a4)
graphs: add converter callbacks to dump/load (d96e9d8)
graphs: add load/dump (e61bd92)
graphs: drop SerializedNode (b4e0939)
helpers: add log_batch (cf17591)
helpers: correctly handle bool values for multiprocessing (b02e749)
helpers: optimize loading of callable maps (22e9443)
improve dumpers, especially for graphs (cb9f6df)
improve eval module (a85f817)
improve handling of defaults for tables (f082511)
improve logging during multiprocessing (fb038ba)
improve synthesis (00e95c2)
improve typing of generic tables (7ca8740)
improve vector db (31afcc1)
keep casebase/query in result object when dumping (3cfd623)
loaders: correctly handle files in directories (afa26f2)
loaders: properly handle io (088f3c9)
loaders: properly load binary data (33c3424)
log only if more than one batch is processed (10d8371)
make dumper argument ordering more consistent (5e90214)
make reuse/retrieval functions more robust (e7437b9)
minor improvements (f1c25b0)
model: add default_query to top result class (9d33c17)
model: store unfiltered casebase as well (73ccb69)
move from TypedDict to BaseModel/dataclass (2afce5c)
openai: use not_given where necessary (e39f297)
prompts: allow giving functions as instructions (1646548)
prompts: remove dedent (8abce29)
re-add logging to astar (aa99c92)
remove factories that are no longer needed (a420695)
remove synthesis-based retriever/reuser until a better interface is defined (7d813c5)
restore similarity filtering behavior (7ec627d)
result export (58ba034)
retrieval: improve metadata (76d90e4)
retrieval: optimize sentence transformers (553a018)
sim/astar: add default to max-calls (e2fc133)
sim/astar: correctly compute sim and loop over the open set (28cb22b)
sim/astar: force to map all edges in select2 (87ea326)
sim/astar: remove optimization for edge expansion (f8c1409)
sim/collections: allow dtw for arbitrary types (#204) (5f7585e)
sim/collections: update types for dtw (74737f1)
sim/embed: correctly convert to float (32925d5)
sim/embed: correctly load/dump cached store (22c8ffc)
sim/embed: generalize helper functions (3bb7c10)
sim/graphs: generalize graph sim (7212929)
sim/graphs: improve is_sequential and conditionally import alignment metrics (33b25e4)
sim/graphs: improve isomorphism (a6bac6b)
sim/graphs: merge node_data_sim and node_obj_sim (9b912ad)
sim/graphs: swap x and y in some cases (070e8fb)
sim/graphs: use dicts for graph sim return value (4fffbc0)
sim/graphs: use optional dependencies for alignment (4ea0005)
sim/strings: gracefully handle empty batches (6f7e86e)
sim/strings: optimize computation of semantic similarities (cfa6505)
sim: add type_equality function (c49a3af)
sim: do not serialize cache (9ef0b26)
sim: expand functionality of dynamic table (e79cb33)
sim: improve table similarities (977d798)
small improvements for sentence transformers (7877c27)
synthesis: add logging (c59839a)
synthesis: openai message construction (959d33e)
synthesis: properly use init vars (dd4ddf1)
synthesis: update openai parameters (9eac912)
taxonomy: allow paths (b35e488)
typing: use np.float64 (fdbd45b)
update text loaders (c60219c)
use rag functions in retrieve/adapt and add chunking (f737a2b)

Miscellaneous Chores

add notable changes (1f1bd17)
