Commit 002f3f1
Improve Chinese character support with robust Unicode property union
Replace hard-coded Unicode range with comprehensive Unicode property approach
to fix incomplete Han character coverage in MCP tool name formatting.
Changes:
- Replace \u4e00-\u9fa5 range with union of Unicode Script and Block properties
- Use \p{IsHan} + \p{InCJK_Unified_Ideographs} + \p{InCJK_Compatibility_Ideographs}
- Fix boundary case where \u9fff was incorrectly excluded by script-only approach
- Add comprehensive test coverage for all Han character blocks and edge cases
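The union described above can be sketched as a single negated character class. This is a minimal illustration, not the committed code: the class name `HanNameSanitizer`, the method `sanitize`, and the extra allowed ASCII set (`a-zA-Z0-9_-`) are assumptions made for the example.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names and the allowed ASCII set are assumptions.
final class HanNameSanitizer {

    // Matches every character that is NOT in the union of:
    //   - the Han script (\p{IsHan})
    //   - the CJK Unified Ideographs block
    //   - the CJK Compatibility Ideographs block
    //   - ASCII letters, digits, '_' and '-'
    private static final Pattern DISALLOWED = Pattern.compile(
            "[^\\p{IsHan}\\p{InCJK_Unified_Ideographs}"
                    + "\\p{InCJK_Compatibility_Ideographs}a-zA-Z0-9_-]");

    // Removes every disallowed character from the given name.
    static String sanitize(String name) {
        return DISALLOWED.matcher(name).replaceAll("");
    }

    public static void main(String[] args) {
        // Han characters and U+9FFF are kept; Hiragana and '!' are dropped.
        System.out.println(sanitize("tool-\u4e2d\u6587\u9fff\u3042!"));
    }
}
```

Because the block property is matched independently of the script property, U+9FFF is retained even on a JDK whose Unicode tables do not yet assign it to the Han script.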
Technical details:
- Addresses Unicode Script vs Block classification differences across JDK versions
- \u9fff (鿿) is in CJK Unified Ideographs block but not Han script in some JDKs
- Union approach ensures complete coverage while maintaining exclusion of other scripts
- Future-proof solution that automatically includes new Han characters in Unicode updates
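The script-versus-block difference can be inspected directly with the JDK's `Character.UnicodeScript` and `Character.UnicodeBlock` APIs. The sketch below is only a diagnostic aid; the class name `ScriptVsBlock` and the method `describe` are hypothetical. The script reported for U+9FFF depends on the Unicode version bundled with the running JDK, so no output is shown for it.

```java
// Hypothetical diagnostic class; not part of the commit.
final class ScriptVsBlock {

    // Returns "U+XXXX script=... block=..." for a code point.
    static String describe(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return String.format("U+%04X script=%s block=%s", codePoint, script, block);
    }

    public static void main(String[] args) {
        // U+4E00: first code point of the CJK Unified Ideographs block.
        System.out.println(describe(0x4E00));
        // U+9FFF: last code point of the block; its script value varies by JDK.
        System.out.println(describe(0x9FFF));
        // U+F900: first code point of the CJK Compatibility Ideographs block.
        System.out.println(describe(0xF900));
    }
}
```

Running this on different JDK releases is one way to confirm whether a script-only pattern would miss a given code point.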
Test coverage added:
- CJK Unified Ideographs boundary cases (\u4e00, \u9fff)
- CJK Extension A characters (\u3400)
- CJK Compatibility Ideographs (\uf900)
- Mixed character block scenarios
- Proper exclusion verification for non-Han scripts (Hiragana, Emoji, etc.)
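The cases listed above can be expressed as a small table-driven check. This is a dependency-free sketch rather than the project's actual test class; `HanCoverageCheck`, `isHan`, and the specific non-Han samples (Hiragana U+3042, Hangul U+AC00, emoji U+1F600) are assumptions chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-alone check; the real tests are not reproduced here.
final class HanCoverageCheck {

    // Same union of script and block properties as described in the commit message.
    private static final Pattern HAN = Pattern.compile(
            "[\\p{IsHan}\\p{InCJK_Unified_Ideographs}"
                    + "\\p{InCJK_Compatibility_Ideographs}]");

    // True if the single code point is accepted by the union pattern.
    static boolean isHan(int codePoint) {
        return HAN.matcher(new String(Character.toChars(codePoint))).matches();
    }

    public static void main(String[] args) {
        Map<Integer, Boolean> cases = new LinkedHashMap<>();
        cases.put(0x4E00, true);   // CJK Unified Ideographs, first code point
        cases.put(0x9FFF, true);   // CJK Unified Ideographs, last code point
        cases.put(0x3400, true);   // CJK Extension A
        cases.put(0xF900, true);   // CJK Compatibility Ideographs
        cases.put(0x3042, false);  // Hiragana
        cases.put(0xAC00, false);  // Hangul
        cases.put(0x1F600, false); // Emoji
        for (Map.Entry<Integer, Boolean> c : cases.entrySet()) {
            if (isHan(c.getKey()) != c.getValue()) {
                throw new AssertionError(String.format("U+%04X", c.getKey()));
            }
        }
        System.out.println("all cases passed");
    }
}
```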
Fixes incomplete Chinese character support while maintaining backward compatibility
and minimal risk profile of the original change.
Signed-off-by: shishuiwuhen2009
Signed-off-by: Mark Pollack <[email protected]>
Auto-cherry-pick to 1.0.x
Fixes spring-projects#4192
1 parent 98470b6 commit 002f3f1
2 files changed, +119 -2 lines changed
- mcp/common/src
  - main/java/org/springframework/ai/mcp
  - test/java/org/springframework/ai/mcp
Lines changed: 4 additions & 2 deletions
Diff content not preserved. Original lines 83-84 removed; new lines 83-86 added.
Lines changed: 115 additions & 0 deletions
Diff content not preserved. New lines 96-210 added after line 95.