Implement lakeFS document loader #1
7091a77 to 69bbfdf
- The lakeFS client uses the default endpoint and credentials from the environment, not the ones we pass in the code.
- UnstructuredLakeFSLoader doesn't hold a client to perform API calls.
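One way to address both points is to resolve the connection settings once, letting explicit arguments win over the environment, and hand the result to the loader so it keeps it for later calls. The following is only a minimal sketch: `ClientConfig`, `resolve_config`, `ExampleLoader`, and the `EXAMPLE_*` environment variable names are hypothetical and are not part of the lakeFS SDK or of this PR.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical connection settings (not a lakeFS SDK type)."""

    endpoint: str
    access_key: str
    secret_key: str


def resolve_config(
    endpoint: Optional[str] = None,
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> ClientConfig:
    """Prefer explicitly passed values; fall back to the environment."""
    env = os.environ if env is None else env
    # The EXAMPLE_* variable names are placeholders for illustration only.
    return ClientConfig(
        endpoint=endpoint or env.get("EXAMPLE_ENDPOINT", ""),
        access_key=access_key or env.get("EXAMPLE_ACCESS_KEY", ""),
        secret_key=secret_key or env.get("EXAMPLE_SECRET_KEY", ""),
    )


class ExampleLoader:
    """Hypothetical loader that keeps the resolved settings it was given."""

    def __init__(self, config: ClientConfig) -> None:
        self.config = config

    def endpoint_in_use(self) -> str:
        # Every API call would be built from self.config, never from defaults.
        return self.config.endpoint


config = resolve_config(
    endpoint="http://localhost:8000",
    access_key="key",
    secret_key="secret",
    env={"EXAMPLE_ENDPOINT": "http://from-env:8000"},
)
loader = ExampleLoader(config)
```

Here the explicit endpoint takes precedence over the one in `env`, and the loader carries the settings with it instead of relying on process-wide defaults.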
added some suggestions and questions
    return partition(filename=local_path)
else:
    with tempfile.TemporaryDirectory() as temp_dir:
        file_path = f"{temp_dir}/{self.path.split('/')[-1]}"
format alternative path under temporary directory
- file_path = f"{temp_dir}/{self.path.split('/')[-1]}"
+ file_path = os.path.join(temp_dir, os.path.basename(self.path))
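The suggestion above can be tried in isolation. This is a small sketch using only the standard library; the helper name `temp_destination` and the sample object path are made up for illustration.

```python
import os
import tempfile


def temp_destination(temp_dir: str, object_path: str) -> str:
    """Build the local file path for an object inside a temporary directory."""
    # os.path.basename keeps only the final path component, and
    # os.path.join inserts the platform's separator between the parts.
    return os.path.join(temp_dir, os.path.basename(object_path))


with tempfile.TemporaryDirectory() as temp_dir:
    destination = temp_destination(temp_dir, "docs/reports/summary.pdf")
    file_name = os.path.basename(destination)
    parent_dir = os.path.dirname(destination)
```

Compared with the f-string version, this avoids hard-coding `/` as the separator when building the local path.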
tests/unit_tests/test_lakefs.py
Outdated
import requests_mock
from requests_mock.mocker import Mocker

from langchain_community.document_loaders.lakefs import LakeFSLoader
The import from langchain_community.document_loaders.lakefs is not what I saw running in the sample code. Based on the code, it was from langchain_lakefs.document_loaders.
tests/unit_tests/test_lakefs.py
Outdated
# endpoint: str = "endpoint"
endpoint: str = "http://localhost:8000"
pick one and remove the comment
lgtm - minor comments
 @pytest.fixture
 def mock_unstructured_local() -> Any:
     with patch(
-        "langchain_community.document_loaders.lakefs.UnstructuredLakeFSLoader"
+        "langchain_lakefs.document_loaders.UnstructuredLakeFSLoader"
indent
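The fixture in this thread patches the loader by its dotted path, so the target string has to name the module where the class is actually looked up; that is why the path moves along with the package. A self-contained sketch of that behavior, assuming a throwaway module: `example_loaders` and `ExampleLoader` are hypothetical names, not part of langchain_lakefs.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in module registered under a made-up name.
module = types.ModuleType("example_loaders")


class ExampleLoader:
    def load(self):
        return ["real"]


module.ExampleLoader = ExampleLoader
sys.modules["example_loaders"] = module


def load_documents():
    # The class is looked up on the module at call time, so a patch on
    # "example_loaders.ExampleLoader" is what this function sees.
    return module.ExampleLoader().load()


with patch("example_loaders.ExampleLoader") as mock_loader:
    mock_loader.return_value.load.return_value = ["mocked"]
    patched_result = load_documents()

# Outside the context manager the original class is restored.
unpatched_result = load_documents()
```

If the target string pointed at a different module, the function would keep using the real class and the test would quietly exercise unmocked code.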
tests/unit_tests/test_lakefs.py
Outdated
@@ -140,3 +96,4 @@ def test_load(self, mocker: Mocker) -> None:
         loader.set_path(self.path)
         documents = loader.load()
         self.assertEqual(len(documents), 2)
+        self.assertEqual(len(documents[0].metadata),5)
- self.assertEqual(len(documents[0].metadata),5)
+ self.assertEqual(len(documents[0].metadata), 5)
This PR takes the lakeFS implementation from langchain_community and implements it here:
All of the implementation and tests are taken as-is, with the following changes: