Feat elasticsearch japanese #12194

Open
wants to merge 6 commits into
base: main

fujita-h
Contributor

Summary

Fixes #12193

This pull request adds a new vector type, ELASTICSEARCH_JA. It registers the type with the existing vector store system, implements its configuration and vector classes, and modifies the Docker setup to support it.

Integration of new vector type ELASTICSEARCH_JA:

  • api/controllers/console/datasets/datasets.py: Added VectorType.ELASTICSEARCH_JA to the list of supported vector types.
  • api/core/rag/datasource/vdb/vector_type.py: Added ELASTICSEARCH_JA to the VectorType class.
  • api/core/rag/datasource/vdb/vector_factory.py: Added a case for VectorType.ELASTICSEARCH_JA to return the ElasticSearchJaVectorFactory.
  • api/core/rag/datasource/vdb/elasticsearch/elasticsearch_ja_vector.py: Created a new file to define ElasticSearchJaConfig, ElasticSearchJaVector, and ElasticSearchJaVectorFactory classes, providing the implementation for the new vector type.

Configuration and setup changes:

  • docker/.env.example: Updated the supported values for VECTOR_STORE and modified the ELASTICSEARCH_HOST default value.
  • docker/docker-compose.yaml: Added the elasticsearch-ja profile, updated the elasticsearch service configuration to include the new profile, and added resource limits and entrypoint configurations.
  • docker/elasticsearch/docker-entrypoint.sh: Created a new script to install necessary plugins for elasticsearch-ja and run the original entrypoint script.
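The description does not name the plugins that the entrypoint script installs. Assuming one of them is Elasticsearch's `analysis-kuromoji` plugin, the index settings for Japanese analysis might look like the sketch below. The function name `build_ja_index_settings`, the analyzer name `ja_analyzer`, and the choice of filters are illustrative assumptions, not taken from this PR.

```python
def build_ja_index_settings(analyzer_name: str = "ja_analyzer") -> dict:
    """Return index settings with a custom Japanese analyzer.

    Assumes the analysis-kuromoji plugin is installed on the cluster;
    without it, Elasticsearch would reject the kuromoji_* components.
    """
    return {
        "analysis": {
            "analyzer": {
                analyzer_name: {
                    "type": "custom",
                    # Morphological tokenizer provided by analysis-kuromoji.
                    "tokenizer": "kuromoji_tokenizer",
                    "filter": [
                        "kuromoji_baseform",        # reduce inflected forms to base form
                        "kuromoji_part_of_speech",  # drop particles and similar tokens
                        "cjk_width",                # normalize full-width and half-width forms
                        "lowercase",
                    ],
                }
            }
        }
    }
```

A dictionary like this would be passed as the settings when the index is created, and the text field in the mapping would then reference the analyzer by name.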

Checklist

Important

Please review the checklist below before submitting your pull request.

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

A trial license is a temporary license that lets you try out commercial features; restrictions apply once the trial period (usually 30 days) ends.
Continuing to run on a trial license after the trial period has expired may violate the license.
Elasticsearch can consume a large amount of available memory unless you explicitly limit it.
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. 👻 feat:rag Embedding related issue, like qdrant, weaviate, milvus, vector database. labels Dec 29, 2024
Development

Successfully merging this pull request may close these issues.

Feat: Add support for Elasticsearch with Japanese language analysis