@rawwerks

Summary

Previously, the handelize function only accepted filenames containing ASCII alphanumeric characters [a-zA-Z0-9]. This caused indexing to fail for files with non-ASCII names.

Before: Files like 腾讯混元.md, 日本語.md, café-notes.md would throw an error during indexing.

After: Unicode letters and numbers are now supported in filenames across all languages.

Changes

  • Updated validation regex to use Unicode property escapes [\p{L}\p{N}] which match any Unicode letter or number
  • Updated character replacement regex to preserve Unicode letters/numbers instead of stripping them
  • Updated tests to verify Unicode filename support
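For readers unfamiliar with Unicode property escapes, the following is a minimal sketch of the two regex changes listed above. It is not the project's actual code: `handelizeSketch`, the two regex constants, the dash replacement, and the NFC normalization step are all assumptions made for illustration. Only the `[\p{L}\p{N}]` character class and the "no valid filename content" error message come from this PR.

```typescript
// Hypothetical sketch; NOT the project's real handelize implementation.
// The `u` flag is required for \p{...} property escapes (ES2018+).

// Validation: the input must contain at least one Unicode letter or number.
const HAS_LETTER_OR_NUMBER = /[\p{L}\p{N}]/u;

// Replacement: runs of anything that is not a letter, number, or dot
// collapse to a single dash (assumed behavior for this sketch).
const NOT_ALLOWED = /[^\p{L}\p{N}.]+/gu;

function handelizeSketch(path: string): string {
  // Assumption: normalize to NFC so that "e" + combining accent becomes a
  // single letter; combining marks are \p{M}, not \p{L}, and would
  // otherwise be replaced.
  const normalized = path.normalize("NFC");

  if (!HAS_LETTER_OR_NUMBER.test(normalized)) {
    throw new Error("no valid filename content");
  }

  return normalized
    .split("/") // handle each folder segment separately
    .map((segment) =>
      segment
        .replace(NOT_ALLOWED, "-") // keep letters/numbers from any script
        .replace(/^-+|-+$/g, "") // trim leading/trailing dashes
    )
    .filter((segment) => segment.length > 0)
    .join("/");
}

console.log(handelizeSketch("腾讯混元.md")); // => "腾讯混元.md"
console.log(handelizeSketch("my notes/日本語 file.md")); // => "my-notes/日本語-file.md"
```

With the old ASCII-only class `[a-zA-Z0-9]`, the validation step in a function shaped like this would reject `腾讯混元.md` outright, because none of its characters before the extension separator and none of the CJK characters match; `\p{L}` matches letters in any script, so the name passes through unchanged.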

Test Plan

  • All existing handelize tests pass
  • New tests added for Chinese, Japanese, Korean filenames
  • Mixed Unicode/ASCII filenames work correctly
  • Folder paths with Unicode work correctly

Example

// Before: throws "no valid filename content"
handelize("腾讯混元.md")

// After: Unicode filenames are returned unchanged
handelize("腾讯混元.md") // => "腾讯混元.md"
handelize("日本語-notes.md") // => "日本語-notes.md"
handelize("café-notes.md") // => "café-notes.md"

🤖 Generated with Claude Code

Previously, the handelize function only accepted filenames containing
ASCII alphanumeric characters [a-zA-Z0-9]. This caused indexing to fail
for files with non-ASCII names, such as Chinese (腾讯混元.md), Japanese
(日本語.md), Korean (한국어.md), or accented characters (café.md, naïve.md).

Changes:
- Updated validation regex to use Unicode property escapes [\p{L}\p{N}]
  which match any Unicode letter or number
- Updated character replacement regex to preserve Unicode letters/numbers
  instead of stripping them
- Updated tests to verify Unicode filename support

This enables indexing documents in any language without requiring ASCII
characters in the filename.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@neoromantic

@tobi I'd appreciate it if you could merge this one!
