v0.21.0

github-actions released this 05 Feb 14:06 · 28 commits to main since this release

0.21.0 (2025-02-05)

⚠ BREAKING CHANGES

  • Most of the library has been rewritten, so expect breaking changes beyond those listed here. Please refer to the README and the tests for more information.
  • The function cbrkit.reuse.build now expects a retriever function instead of a similarity function so that more logic can be shared between the phases.
  • To better support the new retrieval functions, the arguments limit, min_similarity, and max_similarity of the function cbrkit.retrieval.build have been removed. Instead, wrap your call to cbrkit.retrieval.build in the new function cbrkit.retrieval.dropout, which now exposes these arguments.
  • The functions apply and mapply have been removed to better support processing multiple queries at once. They have been replaced by the functions apply_query and apply_queries. Both new functions return the same type of result object, so the return value of apply_queries differs from that of the previous mapply function, whereas apply and apply_query share the same return type.
  • The number of processes to use for retrieval is no longer passed to the apply functions, but instead given to the build function.
  • CBRkit now provides additional modules for adapt, reuse, cycle, and eval.
  • We added support for logging via the standard library.
  • There is a new synthesis module that provides tight integration with various LLM providers. This can for instance be used to develop RAG applications using CBR.
  • Loading and dumping cases has been reworked: we now provide generators to construct serialization and deserialization functions.
  • Caching of similarity values has been added. To use it, wrap your existing similarity function with the new cbrkit.sim.cache wrapper.
  • A new embedding module cbrkit.sim.embed has been added that provides a better interface for composing string-based similarity functions that rely on vectors. It also includes a cache that can be stored on disk.
  • Similarity functions for graphs have been overhauled and now provide a more consistent interface.
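
The wrapper pattern behind the move of limit, min_similarity, and max_similarity from the build step to a separate dropout step can be sketched in plain Python. This is only an illustration of the idea: the functions build, dropout, and closeness below are hypothetical stand-ins defined in the snippet itself, they do not call cbrkit, and their signatures and return types are assumptions that may differ from the real cbrkit.retrieval.build and cbrkit.retrieval.dropout. Please check the Readme for the actual API.

```python
from collections.abc import Callable, Mapping
from typing import Optional

# Hypothetical type aliases for this sketch (not cbrkit's actual types).
Casebase = Mapping[str, float]
Retriever = Callable[[Casebase, float], dict[str, float]]


def build(sim_func: Callable[[float, float], float]) -> Retriever:
    """Stand-in builder: returns a retriever that scores every case."""

    def retriever(casebase: Casebase, query: float) -> dict[str, float]:
        return {key: sim_func(case, query) for key, case in casebase.items()}

    return retriever


def dropout(
    retriever: Retriever,
    limit: Optional[int] = None,
    min_similarity: Optional[float] = None,
    max_similarity: Optional[float] = None,
) -> Retriever:
    """Stand-in wrapper: filters and truncates the scores of any retriever."""

    def wrapped(casebase: Casebase, query: float) -> dict[str, float]:
        scores = retriever(casebase, query)
        # Rank cases from most to least similar.
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        # Drop cases outside the allowed similarity range.
        kept = [
            (key, sim)
            for key, sim in ranked
            if (min_similarity is None or sim >= min_similarity)
            and (max_similarity is None or sim <= max_similarity)
        ]
        # Slicing with None keeps everything when no limit is set.
        return dict(kept[:limit])

    return wrapped


def closeness(x: float, y: float) -> float:
    """Toy similarity in (0, 1]: identical values score 1.0."""
    return 1.0 / (1.0 + abs(x - y))


casebase = {"a": 1.0, "b": 2.0, "c": 5.0, "d": 9.0}
retriever = dropout(build(closeness), limit=2, min_similarity=0.2)
result = retriever(casebase, 2.0)
```

In this toy setup, result contains only the two best-scoring cases, b and a, in that order. Because the filtering lives in its own wrapper, the same dropout logic can be applied to any retriever regardless of how its similarities are computed, which matches the motivation given above for removing these arguments from the build function.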

Features

  • adapt: add openai function to adapt cases (38cbb26)
  • adapt: add similarity delta to pipe function (9d58252)
  • add docstrings to export (3991a51)
  • add dumpers module for serializing casebases (475f532)
  • add dumpers, anthropic provider, update docs (#215) (0f440c5)
  • add generation submodule to handle provider-specific code (068b6ff)
  • add global handling of asyncio event loop (0ed704a)
  • add initial version of rag module (f803b3b)
  • add integration with voyageai (6d7b4eb)
  • add logging (32fde3d)
  • add methods to perform entire r4 cycles more easily (eb08557)
  • add openapi schema generator (611370d)
  • add rag support to api and cli (57a0334)
  • add support for factories (529aa1a)
  • add support for factories to more functions (2794ffc)
  • add transpose_value wrappers (ebbabc9)
  • api: allow passing paths for casebase/query (8cdcbb4)
  • api: support passing files (2993fac)
  • api: switch query parameters to request body (9604428)
  • convert results to pydantic models (3b1c5e0)
  • dumpers: make markdown function generic (33d627c)
  • embed/openai: add lazy loading (7e5c783)
  • embed: add lazy loading for cache (98790cc)
  • eval: add helper for arbitrary scores (79ce192)
  • eval: add proper support for relevance levels (c414112)
  • eval: allow conversion of retrieval result to qrels (c8a7cb5)
  • eval: allow custom metric functions (fde869a)
  • generate: add memory to openai (e645fde)
  • helpers: add getitem_or_getattr (0b94774)
  • helpers: allow conversion of functions to base models (b206c1f)
  • improve genai providers (8da6269)
  • improve handling of multiprocessing (81ac32c)
  • improve logging and multiprocessing (3b78deb)
  • integrate processing of query collections into the core of cbrkit (b8df8ee)
  • make cbrkit project layout more consistent (b738d6f)
  • multiprocessing: allow boolean values (e9e3827)
  • openai: add support for tool calling via unions (0b3e29e)
  • optimize multiprocessing (87ee55f)
  • rag: add model similar to retrieval/reuse (ca76c73)
  • retrieval: add dropout function (3f50dbf)
  • retrieval: add openai function for estimating the similarity (803ff75)
  • retrieval: add sentence transformers reranker (bd05b2e)
  • retrieval: add transpose helper to simplify conversion of cases (216eca3)
  • retrieval: use async clients for cohere and voyage ai (6b37814)
  • reuse: allow passing multiple adaptation functions to builder (6493923)
  • reuse: allow passing similarities from earlier steps (6732b38)
  • reuse: introduce dropout function similar to retrieval (c3254c6)
  • rework reuse phase and update apply helpers (f4a11e8)
  • rework type structure and improve genai/rag modules (b94e898)
  • sim/embed: add chunking/truncation for openai (406c7bc)
  • sim/graphs: add dtw alignment (#214) (bc44cb7)
  • sim/graphs: add dtw and smith-waterman functions (#201) (040b702)
  • sim/graphs: add initial version of exhaustive mapping (818a356)
  • sim/graphs: add local sims, update astar heuristics (fe51441)
  • sim/graphs: add precompute function (5ba52d7)
  • sim/graphs: add smith (#218) (771fe3b)
  • sim/graphs: make it easier to define node similarities (0c95d3a)
  • sim/graphs: rewrite astar algorithm (9c788f6)
  • sim/strings: add vector database (07727f0)
  • sim: add cache method (8c9992f)
  • sim: add default sim for attribute value (686f270)
  • sim: update interface for embed and taxonomy functions (6af97f9)
  • sim: update table helpers and move to wrappers (868bd8d)
  • synthesis/providers: add delay parameter (0fad3c5)
  • synthesis: add google provider (1724ad0)
  • synthesis: allow chunking with overlap (ee5bebd)
  • synthesis: run chunk helper in parallel (b91c9c0)
  • various improvements (da308f2)

Bug Fixes

  • adapt/generic: add strategy to pipe function (d943f59)
  • api: response types failed to validate (a5fe024)
  • api: simplify definition of retrievers/reusers (371cd78)
  • astar: convert to batch sim func (cfc0d42)
  • astar: improve logic (1643549)
  • astar: make naming and exports more consistent (007dfb0)
  • astar: restructure legal mapping funcs, add sim precomputer (8d54903)
  • chunkify: check arguments (90b8f38)
  • cli: disable pretty exceptions (215070b)
  • cli: use dumpers for exporting (2293948)
  • convert some lambdas to real functions (10994c3)
  • correctly construct pydantic models (fc49b1a)
  • correctly set sentence_transformers metadata (d3511cc)
  • default to structured outputs for openai (247bb05)
  • dumpers: properly get name for markdown code block (40ba442)
  • embed: add autodump to cache (f7dc949)
  • embed: add lazy loading to sentence transformers (0cf74ec)
  • embed: add logging (9156846)
  • embed: autodump only if new texts are found (bd9d05b)
  • embed: check hash before dumping cache (118611b)
  • embed: remove unneeded lazy loading (45738c0)
  • embed: use modified time instead of hash to detect changes (4ee3963)
  • eval: add kendall tau (95eda58)
  • eval: add mean_score function (9ac7c30)
  • eval: improve conversion of scores to qrels (579de7f)
  • eval: improve metric generation (c14f48e)
  • export default aggregator (f4d4473)
  • extend support for lazy loading (645cad4)
  • formatting and typing improvements (d49f3e5)
  • genai/prompts: add transpose function (dc02582)
  • graph: enhance serialization (d6111a4)
  • graphs: add converter callbacks to dump/load (d96e9d8)
  • graphs: add load/dump (e61bd92)
  • graphs: drop SerializedNode (b4e0939)
  • helpers: add log_batch (cf17591)
  • helpers: correctly handle bool values for multiprocessing (b02e749)
  • helpers: optimize loading of callable maps (22e9443)
  • improve dumpers, especially for graphs (cb9f6df)
  • improve eval module (a85f817)
  • improve handling of defaults for tables (f082511)
  • improve logging during multiprocessing (fb038ba)
  • improve synthesis (00e95c2)
  • improve typing of generic tables (7ca8740)
  • improve vector db (31afcc1)
  • keep casebase/query in result object when dumping (3cfd623)
  • loaders: correctly handle files in directories (afa26f2)
  • loaders: properly handle io (088f3c9)
  • loaders: properly load binary data (33c3424)
  • log only if more than one batch is processed (10d8371)
  • make dumper argument ordering more consistent (5e90214)
  • make reuse/retrieval functions more robust (e7437b9)
  • minor improvements (f1c25b0)
  • model: add default_query to top result class (9d33c17)
  • model: store unfiltered casebase as well (73ccb69)
  • move from TypedDict to BaseModel/dataclass (2afce5c)
  • openai: use not_given where necessary (e39f297)
  • prompts: allow giving functions as instructions (1646548)
  • prompts: remove dedent (8abce29)
  • re-add logging to astar (aa99c92)
  • remove factories that are no longer needed (a420695)
  • remove synthesis-based retriever/reuser until a better interface is defined (7d813c5)
  • restore similarity filtering behavior (7ec627d)
  • result export (58ba034)
  • retrieval: improve metadata (76d90e4)
  • retrieval: optimize sentence transformers (553a018)
  • sim/astar: add default to max-calls (e2fc133)
  • sim/astar: correctly compute sim and loop over the open set (28cb22b)
  • sim/astar: force to map all edges in select2 (87ea326)
  • sim/astar: remove optimization for edge expansion (f8c1409)
  • sim/collections: allow dtw for arbitrary types (#204) (5f7585e)
  • sim/collections: update types for dtw (74737f1)
  • sim/embed: correctly convert to float (32925d5)
  • sim/embed: correctly load/dump cached store (22c8ffc)
  • sim/embed: generalize helper functions (3bb7c10)
  • sim/graphs: generalize graph sim (7212929)
  • sim/graphs: improve is_sequential and conditionally import alignment metrics (33b25e4)
  • sim/graphs: improve isomorphism (a6bac6b)
  • sim/graphs: merge node_data_sim and node_obj_sim (9b912ad)
  • sim/graphs: swap x and y in some cases (070e8fb)
  • sim/graphs: use dicts for graph sim return value (4fffbc0)
  • sim/graphs: use optional dependencies for alignment (4ea0005)
  • sim/strings: gracefully handle empty batches (6f7e86e)
  • sim/strings: optimize computation of semantic similarities (cfa6505)
  • sim: add type_equality function (c49a3af)
  • sim: do not serialize cache (9ef0b26)
  • sim: expand functionality of dynamic table (e79cb33)
  • sim: improve table similarities (977d798)
  • small improvements for sentence transformers (7877c27)
  • synthesis: add logging (c59839a)
  • synthesis: openai message construction (959d33e)
  • synthesis: properly use init vars (dd4ddf1)
  • synthesis: update openai parameters (9eac912)
  • taxonomy: allow paths (b35e488)
  • typing: use np.float64 (fdbd45b)
  • update text loaders (c60219c)
  • use rag functions in retrieve/adapt and add chunking (f737a2b)

Miscellaneous Chores