Bug #3846: Release search version sorting is lexicographic instead of natural numeric order #3847
YianZhao wants to merge 2 commits into eclipse-sw360:main from
Signed-off-by: Alex <[email protected]>

    " return lower.replace(/\\d+/g, function(match) {" +
    " var normalized = match.replace(/^0+(?!$)/, '');" +
    " var length = normalized.length.toString();" +
    " while (length.length < 6) { length = '0' + length; }" +
    " return '{' + length + normalized + '}';" +
    " });" +
Would you mind explaining the magic here?
This builds a natural sort key for version-like strings. It lowercases the input, then rewrites every numeric chunk into a sortable token: first its length (zero-padded), then the normalized number itself. That way lexicographic sorting compares numbers by numeric magnitude instead of plain string order, so for example 1.2.10 sorts after 1.2.3.
- `lower`: first convert everything to lowercase, so case differences do not affect sorting.
- `replace(/\d+/g, ...)`: finds each contiguous numeric segment in the string.
- `match.replace(/^0+(?!$)/, '')`: removes leading zeros while still preserving a single `0`, for example `0012 -> 12` and `000 -> 0`.
- `normalized.length`: gets the length of that normalized numeric string.
- Then the length is left-padded to a fixed width of 6 digits, for example `2 -> 000002` and `10 -> 000010`.
- Finally it returns `'{' + length + normalized + '}'`.
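The per-match steps in the list above can be sketched in Java (the handler's own language) as one small helper. This is an illustration of the transformation, not the PR's code: the class name and `toSortToken` are hypothetical, and `String.format(Locale.ROOT, "%06d", ...)` stands in for the JS `while` padding loop.

```java
import java.util.Locale;

class NumericTokenDemo {
    // Hypothetical helper doing what the JS callback does for one numeric match.
    static String toSortToken(String digits) {
        // Strip leading zeros; (?!$) forbids the zero run from consuming the whole string,
        // so "000" keeps a single "0".
        String normalized = digits.replaceFirst("^0+(?!$)", "");
        // Left-pad the length to 6 digits (Locale.ROOT keeps the padding digits ASCII).
        String length = String.format(Locale.ROOT, "%06d", normalized.length());
        // Length prefix first, then the number itself, wrapped in braces.
        return "{" + length + normalized + "}";
    }

    public static void main(String[] args) {
        System.out.println(toSortToken("0012")); // {00000212}
        System.out.println(toSortToken("000"));  // {0000010}
        System.out.println(toSortToken("3"));    // {0000013}
        System.out.println(toSortToken("10"));   // {00000210}
    }
}
```

The braces are kept only to mirror the JS output; the ordering itself comes from the fixed-width length prefix placed before the digits.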
The purpose of this is to make plain string sorting behave like numeric sorting for embedded numbers.
For example:
- `1.2.3` → the numeric segment `3` becomes `{0000013}`
- `1.2.10` → the numeric segment `10` becomes `{00000210}`

Because the comparison looks at the length first, and then the value, this ensures that:

- `3` is smaller than `10`
- you do not get the usual lexicographic problem where `"10" < "3"`
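To see the comparison in isolation, this short Java sketch compares the raw pair `"10"`/`"3"` with the rewritten tokens worked out above, and shows a plain `String` sort putting `1.2.10` before `1.2.3`. The class and `sortedCopy` are hypothetical names used only for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;

class LexicographicPitfall {
    // Hypothetical helper: returns a copy sorted by plain String order (character by character).
    static List<String> sortedCopy(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(null); // a null comparator means natural String ordering
        return copy;
    }

    public static void main(String[] args) {
        // '1' < '3', so the raw string "10" sorts before "3".
        System.out.println("10".compareTo("3") < 0);                 // true
        // Rewritten tokens: the first differing character is in the length prefix ('1' vs '2'),
        // so the 1-digit number sorts before the 2-digit one.
        System.out.println("{0000013}".compareTo("{00000210}") < 0); // true
        // Raw version strings hit the same problem: 1.2.10 lands before 1.2.3.
        System.out.println(sortedCopy(List.of("1.2.3", "1.2.10")));  // [1.2.10, 1.2.3]
    }
}
```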
Thanks for asking. `6` is just a fixed padding width for the length prefix, so all rewritten numeric tokens have a comparable shape.
For example:
- `3` -> length `1` -> `000001`
- `10` -> length `2` -> `000002`
- `123` -> length `3` -> `000003`
I chose 6 simply as a sufficiently large constant for expected version segments, not because it has special meaning. The goal is only to keep the length field fixed-width so lexicographic comparison works reliably. If we want, I can replace it with a named constant or add a short comment to make that clearer.
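A minimal sketch of why the width must be fixed, with `6` pulled into a named constant as offered above. `LENGTH_PREFIX_WIDTH`, the helper names, and the exception on overflow are assumptions for illustration; the JS loop simply stops padding once the length already has 6 or more digits. Without padding, the prefix `"10"` compares before `"9"`, so a 10-digit number would sort ahead of a 9-digit one.

```java
class LengthPrefixWidth {
    // Hypothetical named constant; the value 6 comes from the PR.
    static final int LENGTH_PREFIX_WIDTH = 6;

    // Length prefix without padding, shown only to illustrate the failure mode.
    static String unpaddedPrefix(String digits) {
        return Integer.toString(digits.length());
    }

    // Length prefix left-padded with zeros to LENGTH_PREFIX_WIDTH digits.
    static String paddedPrefix(String digits) {
        String length = Integer.toString(digits.length());
        if (length.length() > LENGTH_PREFIX_WIDTH) {
            // Assumed guard, not in the PR: a longer prefix would break the fixed-width ordering.
            throw new IllegalArgumentException("numeric segment too long: " + digits.length() + " digits");
        }
        return "0".repeat(LENGTH_PREFIX_WIDTH - length.length()) + length;
    }

    public static void main(String[] args) {
        String nineDigits = "999999999";  // 9 digits
        String tenDigits = "1000000000";  // 10 digits, numerically larger
        // Unpadded: "10" < "9" as strings, so the larger number would sort first.
        System.out.println(unpaddedPrefix(tenDigits).compareTo(unpaddedPrefix(nineDigits)) < 0); // true
        // Padded: "000010" > "000009", so the larger number sorts last as intended.
        System.out.println(paddedPrefix(tenDigits).compareTo(paddedPrefix(nineDigits)) > 0);     // true
    }
}
```

Any fixed width works as long as every length fits in it; 6 digits covers numeric segments up to 999,999 digits long.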

    static String normalizeVersionForSort(String version) {
normalizeVersionForSort(String) is currently used by unit tests as a Java mirror of the JS index normalization logic, to ensure both implementations stay consistent.
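For illustration, a Java mirror of the JS index logic could look roughly like this. Only the method name `normalizeVersionForSort(String)` comes from the PR; the body, the `LENGTH_PREFIX_WIDTH` constant, mapping `null` to an empty string, and `Locale.ROOT` lowercasing are assumptions, so treat it as a sketch rather than the implementation under review. `main` checks the comparisons the PR description lists for its regression tests.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionSortKey {
    // Assumed name for the padding width; the value 6 matches the JS index function.
    private static final int LENGTH_PREFIX_WIDTH = 6;
    // Contiguous ASCII digit runs, like /\d+/g in the JS version.
    private static final Pattern DIGIT_RUN = Pattern.compile("\\d+");

    // Sketch of a Java mirror of the JS normalization; null handling is an assumption.
    static String normalizeVersionForSort(String version) {
        if (version == null) {
            return "";
        }
        String lower = version.toLowerCase(Locale.ROOT);
        Matcher matcher = DIGIT_RUN.matcher(lower);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            // Drop leading zeros but keep a lone "0".
            String normalized = matcher.group().replaceFirst("^0+(?!$)", "");
            // Fixed-width length prefix, then the number itself, wrapped in braces.
            String length = String.format(Locale.ROOT, "%0" + LENGTH_PREFIX_WIDTH + "d", normalized.length());
            matcher.appendReplacement(out, Matcher.quoteReplacement("{" + length + normalized + "}"));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalizeVersionForSort("1.2.10")); // {0000011}.{0000012}.{00000210}
        // The comparisons listed for the PR's regression tests.
        System.out.println(normalizeVersionForSort("1.2").compareTo(normalizeVersionForSort("1.10")) < 0);       // true
        System.out.println(normalizeVersionForSort("1.02").equals(normalizeVersionForSort("1.2")));              // true
        System.out.println(normalizeVersionForSort("alpha2").compareTo(normalizeVersionForSort("alpha10")) < 0); // true
    }
}
```

Java's `\d` (without `UNICODE_CHARACTER_CLASS`) and JavaScript's `\d` both match only ASCII digits, so a mirror written this way agrees with the index function on what counts as a numeric segment.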
Fixes #3846
What happened
Release search sorted by version used lexicographic order, so versions like `1.10` could be ordered before `1.2`.

Root cause

`version_sort` was indexed from the raw `doc.version` string.

What changed

- `ReleaseSearchHandler` Lucene index function: `version_sort` is now built from a natural sort key instead of the raw version string.
- `ReleaseSearchHandlerTest` regression tests:
  - `1.2 < 1.10`
  - `1.02 == 1.2`
  - `alpha2 < alpha10`

Verification