feat: MarkdownHeaderSplitter #9660
@OGuggenbuehl this definitely looks like an interesting approach! I've left an initial set of comments, but to review further I'd appreciate it if you could add a set of tests like the ones we have for the [...]. This will help me review the actual splitting algorithm, since it's easier to understand with examples.
use haystack logging Co-authored-by: Sebastian Husch Lee <[email protected]>
remove temp toc Co-authored-by: Sebastian Husch Lee <[email protected]>
…enbuehl/haystack into feature/md-header-splitter
ba90272 to 7ef16a7
56dd0a0 to d7d4f18
abb0a84 to 44e0454
@sjrl I have been thinking about whether keeping
Good question! It does sound different from what our other splitters do and almost fits better into the
I feel that this would be a more appropriate approach - also because it would improve the separation of concerns by component. I suggest I:
@OGuggenbuehl that sounds good!
Proposed Changes:
Implement MarkdownHeaderSplitter, a component that splits Markdown (.md) Documents into chunks based on their headers.
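
For readers unfamiliar with the idea, here is a minimal sketch of header-based splitting in plain Python. It is not the implementation in this PR: the function name `split_by_headers` and the returned dictionary keys are hypothetical, and the sketch assumes ATX-style headers (`#` to `######`), ignores any text before the first header, and does not skip `#` lines inside fenced code blocks.

```python
import re
from typing import Dict, List

# Hypothetical pattern for illustration: an ATX header is 1 to 6 '#' characters,
# at least one space or tab, then the header text up to the end of the line.
HEADER_RE = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def split_by_headers(text: str) -> List[Dict[str, object]]:
    """Return one chunk per header: its title, its level, and the text below it."""
    matches = list(HEADER_RE.finditer(text))
    chunks: List[Dict[str, object]] = []
    for i, match in enumerate(matches):
        # The chunk body runs from the end of this header line
        # to the start of the next header (or the end of the text).
        body_start = match.end()
        body_end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append(
            {
                "header": match.group(2),
                "level": len(match.group(1)),
                "content": text[body_start:body_end].strip(),
            }
        )
    return chunks


if __name__ == "__main__":
    sample = "# Title\nIntro text.\n\n## Usage\nRun it.\n"
    for chunk in split_by_headers(sample):
        print(chunk)
```

Keeping the header text and level alongside each chunk is one way to let downstream steps attach them as metadata; whether the PR does this is not stated here.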
How did you test it?
unit tests
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:, and added ! in case the PR includes breaking changes.