Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove deprecated APIs #81

Merged
merged 1 commit into from
Mar 20, 2025
Merged

Remove deprecated APIs #81

merged 1 commit into from
Mar 20, 2025

Conversation

dliubarskyi
Copy link
Member

@dliubarskyi dliubarskyi commented Mar 20, 2025

Change

This is part of langchain4j/langchain4j#2726

Removed:

  • BertTokenizer -> Replaced by: HuggingFaceTokenizer
  • OnnxEmbeddingModel(Path) -> Replaced by: OnnxEmbeddingModel(Path, Path, PoolingMode)
  • OnnxEmbeddingModel(String) -> Replaced by: OnnxEmbeddingModel(Path, Path, PoolingMode)

General checklist

  • There are no breaking changes
  • I have added unit and/or integration tests for my change
  • The tests cover both positive and negative cases
  • I have manually run all the unit and integration tests in the module I have added/changed, and they are all green
  • I have added/updated the documentation
  • I have added an example in the examples repo (only for "big" features)
  • I have added/updated Spring Boot starter(s) (if applicable)

Sorry, something went wrong.

Copy link

Hi @dliubarskyi, thank you very much for your PR! ❤️
I'm a bot powered by Google AI Gemini gemini-2.0-flash-exp.
The maintainers of LangChain4j will perform a thorough code review as soon as they can, but in the meantime, here’s a preliminary review from me. I hope you find it helpful.

Potential Issues

Potential Breaking Changes

  • Removal of deprecated BertTokenizer and OnnxEmbeddingModel constructors. Users relying on these deprecated APIs will need to migrate to the HuggingFaceTokenizer or the new OnnxEmbeddingModel constructors.

Potential Design Issues

  • The removal of the deprecated BertTokenizer might force users to migrate to HuggingFaceTokenizer, potentially increasing the project's dependency on HuggingFace's tokenization implementation. This could introduce compatibility issues if HuggingFace's implementation changes significantly in the future.

Potential Bugs

  • The HuggingFaceTokenizer.estimateTokenCountInMessage method now handles different message types (SystemMessage, UserMessage, AiMessage, ToolExecutionResultMessage). A potential bug could be that the handling of these different message types is not consistent or complete, leading to incorrect token counts for certain message types.

Testing

Changes in this PR are sufficiently tested: ❌

Suggested Positive Test Scenarios

    • Given: A valid text.
    • When: Calling estimateTokenCountInText(text)
    • Then: The tokenizer should correctly estimate the number of tokens.
    • Given: A ChatMessage of type SystemMessage, UserMessage, AiMessage and ToolExecutionResultMessage.
    • When: Calling estimateTokenCountInMessage(message)
    • Then: The tokenizer should correctly estimate the number of tokens.

Suggested Negative Test Scenarios

    • Given: An invalid path to the tokenizer file.
    • When: Creating an instance of OnnxEmbeddingModel with an invalid tokenizer path.
    • Then: The OnnxEmbeddingModel should throw an exception during initialization.
    • Given: A null or empty text.
    • When: Calling estimateTokenCountInText(null) or estimateTokenCountInText("")
    • Then: The tokenizer should return 0 or throw an IllegalArgumentException.

Suggested Corner Case Test Scenarios

    • Given: A very long text that exceeds the maximum token length supported by the underlying tokenizer.
    • When: Calling estimateTokenCountInText(text)
    • Then: The tokenizer should handle the overflow gracefully, either by truncating the text or by throwing an appropriate exception.
    • Given: A text containing special characters or unicode characters.
    • When: Calling estimateTokenCountInText(text)
    • Then: The tokenizer should correctly tokenize these characters.

Documentation

Changes in this PR are sufficiently documented: ✅

@dliubarskyi dliubarskyi merged commit 60ae836 into main Mar 20, 2025
1 of 2 checks passed
@dliubarskyi dliubarskyi deleted the remove-deprecated-APIs branch March 20, 2025 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant