Changes from 39 commits
79 commits
e011555
workflow fix
CompromisedKiwi Jan 20, 2026
bfb16fa
coarce migration
CompromisedKiwi Jan 21, 2026
54e43c1
underline
CompromisedKiwi Jan 21, 2026
894bb73
c++17
CompromisedKiwi Jan 22, 2026
b0a1ca0
rename
CompromisedKiwi Jan 22, 2026
8e7e017
save
CompromisedKiwi Jan 23, 2026
fa1d7d2
Merge branch 'main' into yzh/migrate_doc_node
CompromisedKiwi Jan 28, 2026
7ea108e
undo workflow fix
CompromisedKiwi Jan 28, 2026
f0f6657
refactor
CompromisedKiwi Jan 30, 2026
1854448
adaptor
CompromisedKiwi Jan 30, 2026
1484fc7
finish doc_node init
CompromisedKiwi Feb 2, 2026
a69f82f
children
CompromisedKiwi Feb 3, 2026
a6cfceb
doc_node hpp
CompromisedKiwi Feb 4, 2026
0170a0e
DocNode done
CompromisedKiwi Feb 4, 2026
459cfd4
pending review
CompromisedKiwi Feb 5, 2026
5ea167c
NodeTransform done
CompromisedKiwi Feb 6, 2026
e4070f8
rename
CompromisedKiwi Feb 6, 2026
6017ffa
save
CompromisedKiwi Feb 7, 2026
cc7ab7e
Merge branch 'main' into yzh/migrate_doc_node
CompromisedKiwi Feb 7, 2026
615b7b0
Module
CompromisedKiwi Feb 7, 2026
0b193c8
map_params
CompromisedKiwi Feb 10, 2026
0d88ea6
save
CompromisedKiwi Feb 10, 2026
02cbec4
Integrate utf8proc to split text to readable chars.
CompromisedKiwi Feb 10, 2026
af7e617
UnicodeProcessor
CompromisedKiwi Feb 12, 2026
1c7ee82
text splitter base cpp finish
CompromisedKiwi Feb 13, 2026
9ef9bd8
keys
CompromisedKiwi Feb 13, 2026
068ca98
export
CompromisedKiwi Feb 13, 2026
19e00dd
sentence_splitter
CompromisedKiwi Feb 13, 2026
e0c3acc
compile_options
CompromisedKiwi Feb 24, 2026
06aa586
tests in cpp side
CompromisedKiwi Feb 24, 2026
a214e35
libstdc++.so.6
CompromisedKiwi Feb 27, 2026
e865ab6
DocNode manage itself.
CompromisedKiwi Feb 27, 2026
2fd8583
finish cpp side tests
CompromisedKiwi Mar 2, 2026
ac9dad3
cpp env switch
CompromisedKiwi Mar 4, 2026
4ab5a93
no need to test cpp override
CompromisedKiwi Mar 4, 2026
b38affc
cpp tests passed.
CompromisedKiwi Mar 5, 2026
79218fb
merge
CompromisedKiwi Mar 5, 2026
ee3ecbc
install and third parties so.
CompromisedKiwi Mar 5, 2026
42252a7
Reuse python side tests.
CompromisedKiwi Mar 6, 2026
06eabd4
LD_PRELOAD
CompromisedKiwi Mar 11, 2026
fa73e50
feat: add cpp_class decorator for C++ class replacement
CompromisedKiwi Mar 12, 2026
08f3333
docnode cpp ext repaired
CompromisedKiwi Mar 12, 2026
2c893df
save
CompromisedKiwi Mar 13, 2026
f850d15
RegisterMap
CompromisedKiwi Mar 17, 2026
9e709da
NodeTransform refactor
CompromisedKiwi Mar 17, 2026
1024d0e
no node_transform
CompromisedKiwi Mar 17, 2026
81c9aaa
simplify
CompromisedKiwi Mar 18, 2026
25d0c83
new TextSplitterBaseCPPImpl
CompromisedKiwi Mar 18, 2026
7680fff
cpp tests passed
CompromisedKiwi Mar 18, 2026
980d0ad
python tests passed
CompromisedKiwi Mar 19, 2026
0dec57a
change tiktoken cache dir outside
CompromisedKiwi Mar 19, 2026
b5c4ba3
Merge branch 'main' into yzh/migrate_doc_node
CompromisedKiwi Mar 20, 2026
1e1087a
GIL
CompromisedKiwi Mar 20, 2026
13a167d
linting
CompromisedKiwi Mar 23, 2026
043fd1b
cpp_build_and_python_regression
CompromisedKiwi Mar 23, 2026
5ccc9a7
fatal: could not read Username for
CompromisedKiwi Mar 23, 2026
a9417f3
no LAZYLLM_DATA
CompromisedKiwi Mar 23, 2026
c1d03cd
add lazyllm_data
CompromisedKiwi Mar 23, 2026
2b5393e
no rerun
CompromisedKiwi Mar 23, 2026
3fbe4b0
basic tests regression
CompromisedKiwi Mar 23, 2026
787cddd
basic tests regression done
CompromisedKiwi Mar 23, 2026
ed4c5d5
purify cpp_proxy
CompromisedKiwi Apr 7, 2026
8ae6609
no adaptor
CompromisedKiwi Apr 8, 2026
f25a1d4
docnode simplification
CompromisedKiwi Apr 8, 2026
9e2cb42
UUID
CompromisedKiwi Apr 8, 2026
6e32f84
fix
CompromisedKiwi Apr 8, 2026
369eada
fix
CompromisedKiwi Apr 8, 2026
326c068
cpp_proxy simplification
CompromisedKiwi Apr 9, 2026
5a9d1b7
no view
CompromisedKiwi Apr 9, 2026
1b74fa4
save
CompromisedKiwi Apr 9, 2026
4600b72
save
CompromisedKiwi Apr 10, 2026
a105fc1
dynamic cpp member signature checking
CompromisedKiwi Apr 10, 2026
62f2793
save
CompromisedKiwi Apr 11, 2026
d82edc8
cpp proxy class name could be specified.
CompromisedKiwi Apr 11, 2026
20b3eb5
include(FetchContent)
CompromisedKiwi Apr 11, 2026
7608a63
Forbiden R value
CompromisedKiwi Apr 11, 2026
47a2680
else if
CompromisedKiwi Apr 11, 2026
754388e
inline is implicitly specified
CompromisedKiwi Apr 11, 2026
7e895b7
new tests
CompromisedKiwi Apr 11, 2026
1 change: 1 addition & 0 deletions .github/workflows/main.yml
@@ -754,6 +754,7 @@ jobs:
cpp_ext_test:
name: C++ Extension Test (${{ matrix.os }})
needs: [ clone ]
if: always()
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false

This is an AI-generated suggestion; please verify before applying.

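[critical] [logic] `if: always()` makes the cpp_ext_test job run even when `needs: [clone]` fails, so later steps may execute without a proper clone and fail unpredictably.

Suggestion: If the intent is to run even when other jobs fail, but only when clone has succeeded, change it to: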


auto reviewed by BOT (claude-opus-4-6)

6 changes: 0 additions & 6 deletions .github/workflows/publish_release.yml
@@ -191,12 +191,6 @@ jobs:
name: repo-with-docs
path: ./repo_artifact

- name: Install Python dev headers (Ubuntu only)
if: startsWith(matrix.os, 'ubuntu')
run: |
sudo apt-get update
sudo apt-get install -y python3-dev

- name: Extract repo-with-docs
run: |
set -ex
2 changes: 2 additions & 0 deletions .gitignore
@@ -7,6 +7,7 @@ test/
dist/
tmp/
build
.cache/
*.lock
*.db
mkdocs.yml
@@ -64,3 +65,4 @@ docs/zh/assets
build*
lazyllm_cpp.egg-info/
!build*.sh
lazyllm/cpp_lib/
48 changes: 40 additions & 8 deletions csrc/CMakeLists.txt
@@ -1,23 +1,48 @@
cmake_minimum_required(VERSION 3.16)
project(LazyLLMCPP LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)

find_package(Python3 COMPONENTS Interpreter Development.Module REQUIRED)
find_package(pybind11 CONFIG REQUIRED)
# Third party libs
include(cmake/third_party.cmake)

# Config lazyllm_core lib with pure cpp code.
file(GLOB_RECURSE LAZYLLM_CORE_SOURCES CONFIGURE_DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/src/*.cpp")
file(GLOB_RECURSE LAZYLLM_CORE_SOURCES CONFIGURE_DEPENDS
"${CMAKE_CURRENT_SOURCE_DIR}/core/src/*.cpp")
add_library(lazyllm_core STATIC ${LAZYLLM_CORE_SOURCES})
target_include_directories(lazyllm_core PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include)
target_include_directories(lazyllm_core PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/core/include)
target_link_libraries(lazyllm_core PUBLIC xxhash)
target_link_libraries(lazyllm_core PUBLIC tiktoken)
target_link_libraries(lazyllm_core PUBLIC utf8proc)
target_compile_options(lazyllm_core PRIVATE -Werror -Wshadow)

# Config lazyllm_adaptor lib which maintains callback invocations.
file(GLOB_RECURSE LAZYLLM_ADAPTOR_SOURCES CONFIGURE_DEPENDS
"${CMAKE_CURRENT_SOURCE_DIR}/adaptor/*.cpp")
add_library(lazyllm_adaptor STATIC ${LAZYLLM_ADAPTOR_SOURCES})
target_include_directories(lazyllm_adaptor PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/adaptor)
target_link_libraries(lazyllm_adaptor PUBLIC pybind11::headers Python3::Python lazyllm_core)
target_compile_options(lazyllm_adaptor PRIVATE -Werror -Wshadow)

# Config lazyllm_cpp lib with binding information.
set(LAZYLLM_BINDING_SOURCES binding/lazyllm.cpp binding/doc.cpp)
file(GLOB_RECURSE LAZYLLM_BINDING_SOURCES CONFIGURE_DEPENDS
"${CMAKE_CURRENT_SOURCE_DIR}/binding/*.cpp")
set(INTERFACE_TARGET_NAME lazyllm_cpp)
pybind11_add_module(${INTERFACE_TARGET_NAME} ${LAZYLLM_BINDING_SOURCES})
target_link_libraries(${INTERFACE_TARGET_NAME} PRIVATE lazyllm_core)
target_link_libraries(${INTERFACE_TARGET_NAME} PRIVATE lazyllm_core lazyllm_adaptor)
target_compile_options(${INTERFACE_TARGET_NAME} PRIVATE -Werror -Wshadow)

# Ensure lazyllm_cpp can find third-party shared libraries under lazyllm/cpp_lib.
set(_lazyllm_cpp_rpath "$ORIGIN/cpp_lib")
if (APPLE)
set(_lazyllm_cpp_rpath "@loader_path/cpp_lib")
endif()
set_target_properties(${INTERFACE_TARGET_NAME} PROPERTIES
BUILD_RPATH "${_lazyllm_cpp_rpath}"
INSTALL_RPATH "${_lazyllm_cpp_rpath}"
)

if (CMAKE_BUILD_TYPE STREQUAL "Debug")
# SHOW_SYMBOL
@@ -26,7 +51,14 @@ if (CMAKE_BUILD_TYPE STREQUAL "Debug")
endif()

# Install
install(TARGETS ${INTERFACE_TARGET_NAME} LIBRARY DESTINATION lazyllm)
install(TARGETS ${INTERFACE_TARGET_NAME}
LIBRARY DESTINATION lazyllm COMPONENT lazyllm_cpp

This is an AI-generated suggestion; please verify before applying.

[medium] [logic] Install rules for shared libraries omit ARCHIVE destination, so Windows import libraries (.lib) won't be installed correctly.

Suggestion: Add ARCHIVE DESTINATION clauses to both install commands to ensure import libraries are correctly placed.


auto reviewed by BOT (claude-opus-4-6)

RUNTIME DESTINATION lazyllm COMPONENT lazyllm_cpp
)
install(TARGETS tiktoken utf8proc
LIBRARY DESTINATION lazyllm/cpp_lib COMPONENT lazyllm_cpp
RUNTIME DESTINATION lazyllm/cpp_lib COMPONENT lazyllm_cpp
)


# TESTS
File renamed without changes.
2 changes: 2 additions & 0 deletions csrc/adaptor/adaptor.cpp
@@ -0,0 +1,2 @@
#include "adaptor_base_wrapper.hpp"
#include "document_store.hpp"
37 changes: 37 additions & 0 deletions csrc/adaptor/adaptor_base_wrapper.hpp
@@ -0,0 +1,37 @@
#pragma once

#include <any>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

#include <pybind11/pybind11.h>

#include "adaptor_base.hpp"


namespace lazyllm {

class LAZYLLM_HIDDEN AdaptorBaseWrapper : public AdaptorBase {
pybind11::object _py_obj;
public:
AdaptorBaseWrapper(const pybind11::object &obj) : _py_obj(obj) {}
virtual ~AdaptorBaseWrapper() = default;

std::any call(
const std::string& func_name,
const std::unordered_map<std::string, std::any>& args) const override final
{
pybind11::gil_scoped_acquire gil;
pybind11::object func = pybind11::getattr(_py_obj, func_name.c_str(), pybind11::none());
return call_impl(func_name, func, args);
}

virtual std::any call_impl(
const std::string& func_name,
const pybind11::object& func,
const std::unordered_map<std::string, std::any>& args) const = 0;
};

}
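
The wrapper above keeps `call` final and leaves only `call_impl` to subclasses, with arguments passed as a string-keyed map of `std::any`. The following is a minimal sketch of that dispatch shape in plain C++, with no pybind11 involved. `DispatcherBase`, `FakeStore`, `count_docs`, and `fails_with_bad_cast` are hypothetical names used only for illustration; they are not part of the PR. The point it illustrates is that `std::any_cast` must name the exact stored type, so callers and `call_impl` have to agree on both key names and value types.

```cpp
#include <any>
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

using Args = std::unordered_map<std::string, std::any>;

// Hypothetical stand-in for the adaptor base: one public entry point,
// one private hook that subclasses implement.
class DispatcherBase {
public:
    virtual ~DispatcherBase() = default;
    // In the PR, this is the place where the GIL is acquired and the
    // Python attribute is looked up before delegating.
    std::any call(const std::string& func_name, const Args& args) const {
        return call_impl(func_name, args);
    }
private:
    virtual std::any call_impl(const std::string& func_name, const Args& args) const = 0;
};

class FakeStore final : public DispatcherBase {
private:
    std::any call_impl(const std::string& func_name, const Args& args) const override {
        if (func_name == "is_group_active") {
            // Throws std::bad_any_cast unless the value is exactly std::string.
            return std::any_cast<std::string>(args.at("group")) == "chunk";
        }
        if (func_name == "count_docs") {
            const auto& ids = std::any_cast<const std::vector<std::string>&>(args.at("doc_ids"));
            return static_cast<int>(ids.size());
        }
        throw std::runtime_error("Unknown function: " + func_name);
    }
};

// Returns true when the call fails because an argument holds the wrong type.
bool fails_with_bad_cast(const DispatcherBase& d, const std::string& name, const Args& args) {
    try {
        d.call(name, args);
    } catch (const std::bad_any_cast&) {
        return true;
    }
    return false;
}
```

Storing a string literal in the map yields a `const char*`, not a `std::string`, so the same call that succeeds with `std::string("chunk")` fails with `"chunk"`. A missing key surfaces separately as `std::out_of_range` from `at`.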
119 changes: 119 additions & 0 deletions csrc/adaptor/document_store.hpp
@@ -0,0 +1,119 @@
#pragma once

#include <any>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

#include <pybind11/pybind11.h>
#include <pybind11/stl.h>

#include "adaptor_base_wrapper.hpp"
#include "doc_node.hpp"

namespace lazyllm {

struct NodeGroup {
enum class Type {
ORIGINAL, CHUNK, SUMMARY, IMAGE_INFO, QUESTION_ANSWER, OTHER
};
std::string _parent;
std::string _display_name;
Type _type;
NodeGroup(
const std::string& parent,
const std::string& display_name,
const Type& type = Type::ORIGINAL) :
_parent(parent), _display_name(display_name), _type(type) {}
};

class LAZYLLM_HIDDEN DocumentStore : public AdaptorBaseWrapper {
public:
DocumentStore() = delete;
explicit DocumentStore(
const pybind11::object& store,
const std::unordered_map<std::string, NodeGroup> &map) :
AdaptorBaseWrapper(store), _node_groups_map(map) {}

// Cache-aware factory to avoid rebuilding adaptor for the same Python store.
static std::shared_ptr<DocumentStore> from_store(
const pybind11::object& store, const std::unordered_map<std::string, NodeGroup>& map) {
if (store.is_none()) return nullptr;

pybind11::gil_scoped_acquire gil;
PyObject *key = store.ptr();
auto &cache = store_cache();
auto it = cache.find(key);
if (it != cache.end()) {
if (auto existing = it->second.lock())
return existing;
}
auto created = std::make_shared<DocumentStore>(store, map);
cache[key] = created;
return created;
}

DocNode::Children get_node_children(const DocNode* node) const {
DocNode::Children out;
auto& kb_id = std::any_cast<std::string&>(node->_p_global_metadata->at(std::string(RAGMetadataKeys::KB_ID)));
auto& doc_id = std::any_cast<std::string&>(node->_p_global_metadata->at(std::string(RAGMetadataKeys::DOC_ID)));
auto& group_name = node->_group_name;
for(auto& [current_group_name, group] : _node_groups_map) {
if (group._parent != group_name) continue;
if (!std::any_cast<bool>(call("is_group_active", {{"group", current_group_name}}))) continue;
auto nodes_in_group = std::any_cast<std::vector<PDocNode>>(call("get_nodes", {
{"group_name", current_group_name},
{"kb_id", kb_id},
{"doc_ids", std::vector<std::string>({doc_id})}
}));

std::vector<PDocNode> children;
children.reserve(nodes_in_group.size());
for (auto n : nodes_in_group)
if (n->get_parent_node() == node) children.push_back(n);
out[current_group_name] = children;
}
return out;
}

private:
std::unordered_map<std::string, NodeGroup> _node_groups_map;

std::any call_impl(
const std::string& func_name,
const pybind11::object& func,
const std::unordered_map<std::string, std::any>& args) const override
{
if (func_name == "is_group_active") {
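// std::any has no pybind11 caster; unwrap the group name before the call.
return func(std::any_cast<std::string>(args.at("group"))).cast<bool>();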
}
else if (func_name == "get_node") {
return func(
pybind11::arg("group_name") = std::any_cast<std::string>(args.at("group_name")),
pybind11::arg("uids") = std::vector<std::string>({std::any_cast<std::string>(args.at("uid"))}),
pybind11::arg("kb_id") = std::any_cast<std::string>(args.at("kb_id")),
pybind11::arg("display") = true
).cast<pybind11::list>()[0].cast<DocNode*>();
}
else if (func_name == "get_nodes") {
return func(
pybind11::arg("group_name") = std::any_cast<std::string>(args.at("group_name")),
pybind11::arg("kb_id") = std::any_cast<std::string>(args.at("kb_id")),
// Callers pass "doc_ids" as a std::vector<std::string> (see get_node_children).
pybind11::arg("doc_ids") = std::any_cast<std::vector<std::string>>(args.at("doc_ids"))
).cast<std::vector<DocNode*>>();
}
else if (func_name == "get_node_children") {
return get_node_children(std::any_cast<DocNode*>(args.at("node")));
}

throw std::runtime_error("Unknown DocumentStore function: " + func_name);
}

// Cache by Python object identity to ensure one wrapper per store instance.
static std::unordered_map<PyObject *, std::weak_ptr<DocumentStore>> &store_cache() {
static std::unordered_map<PyObject *, std::weak_ptr<DocumentStore>> cache;
return cache;
}
};

} // namespace lazyllm
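
`from_store` above caches one wrapper per store identity and holds it through a `std::weak_ptr`, so the cache never keeps a wrapper alive on its own. Below is a minimal sketch of that pattern in plain C++. `Store` and `Wrapper` are hypothetical stand-ins for the Python store object and `DocumentStore`. The `std::mutex` is an assumption swapped in for the GIL that guards the cache in the PR, and erasing the expired entry is an addition that the diff does not make.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <unordered_map>

struct Store { int id; };  // hypothetical stand-in for the wrapped store object

class Wrapper {
public:
    explicit Wrapper(const Store* key) : _key(key) {}

    // Returns the live wrapper for this store if one exists, else creates one.
    static std::shared_ptr<Wrapper> from_store(const Store* key) {
        if (key == nullptr) return nullptr;
        std::lock_guard<std::mutex> lock(cache_mutex());  // assumption: replaces the GIL
        auto& cache = store_cache();
        auto it = cache.find(key);
        if (it != cache.end()) {
            if (auto existing = it->second.lock()) return existing;
            cache.erase(it);  // drop the expired entry before re-creating
        }
        auto created = std::make_shared<Wrapper>(key);
        cache[key] = created;  // stored as weak_ptr: the cache owns nothing
        return created;
    }

    const Store* key() const { return _key; }

private:
    const Store* _key;

    static std::mutex& cache_mutex() {
        static std::mutex m;
        return m;
    }
    // Function-local static avoids static initialization order issues.
    static std::unordered_map<const Store*, std::weak_ptr<Wrapper>>& store_cache() {
        static std::unordered_map<const Store*, std::weak_ptr<Wrapper>> cache;
        return cache;
    }
};
```

Two calls with the same store return the same `shared_ptr` target while any owner is alive; once all owners release it, the next call builds a fresh wrapper. In the PR the wrapper holds a `pybind11::object` reference to the store, so the `PyObject*` key cannot be reused by another object while a live wrapper exists.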
Expand Up @@ -28,6 +28,6 @@ void addDocStr(py::object obj, std::string docs) {
}
}

void exportDoc(py::module& m) {
void exportAddDocStr(py::module& m) {
m.def("add_doc", &addDocStr, "Add docstring to a function or method", py::arg("obj"), py::arg("docs"));
}