Bug #3846: Release search version sorting is lexicographic instead of natural numeric order #3847

Open
YianZhao wants to merge 2 commits into eclipse-sw360:main from YianZhao:bug/release-version-natural-sort
@YianZhao
Contributor

Fixes #3846

What happened

Release search sorted by version used lexicographic order, so versions like 1.10 could be ordered before 1.2.
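
A minimal Java sketch of the symptom, assuming only that the sort compares the raw version strings character by character. The class and method names are hypothetical and not part of SW360; the real ordering is done by the search index, not by this code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: reproduces plain string ordering of version values.
final class LexicographicSortDemo {

    /** Returns a copy of the input sorted by plain String order. */
    static List<String> sortRaw(List<String> versions) {
        List<String> sorted = new ArrayList<>(versions);
        // String.compareTo compares character by character, so "1.10" < "1.2"
        // because '1' < '2' at the third character.
        sorted.sort(String::compareTo);
        return sorted;
    }

    public static void main(String[] args) {
        // prints [1.10, 1.2, 1.9]
        System.out.println(sortRaw(List.of("1.2", "1.10", "1.9")));
    }
}
```

The expected natural order would be 1.2, 1.9, 1.10.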

Root cause

version_sort was indexed from the raw doc.version string.

What changed

  • Added version normalization for sorting in the ReleaseSearchHandler Lucene index function.
  • Added a matching Java normalization utility, used by the tests.
  • Ensured the numeric-length prefix is padded to 6 digits (same behavior as the JS logic).
  • Added ReleaseSearchHandlerTest regression tests:
    • natural numeric segment sort (1.2 < 1.10)
    • leading zero equivalence (1.02 == 1.2)
    • numeric suffix sort (alpha2 < alpha10)
    • explicit 6-digit length-prefix assertion

Verification

mvn -pl backend/common "-Dbase.deploy.dir=." -Dtest=ReleaseSearchHandlerTest test

@GMishx added the needs code review and needs general test labels on Mar 12, 2026
Member

@GMishx left a comment

A few questions:

Comment on lines +52 to +57
" return lower.replace(/\\d+/g, function(match) {" +
" var normalized = match.replace(/^0+(?!$)/, '');" +
" var length = normalized.length.toString();" +
" while (length.length < 6) { length = '0' + length; }" +
" return '{' + length + normalized + '}';" +
" });" +
Member

Would you mind explaining the magic here?

Contributor Author

This builds a natural sort key for version-like strings. It lowercases the input, then rewrites every numeric chunk into a sortable token: first its length (zero-padded), then the normalized number itself. That way lexicographic sorting compares numbers by numeric magnitude instead of plain string order, so for example 1.2.10 sorts after 1.2.3.

  • lower: first convert everything to lowercase, so case differences do not affect sorting.
  • replace(/\d+/g, ...): finds each contiguous numeric segment in the string.
  • match.replace(/^0+(?!$)/, ''): removes leading zeros, while still preserving a single 0.
    • for example, 0012 -> 12
    • and 000 -> 0
  • normalized.length: gets the length of that normalized numeric string.
  • Then the length is left-padded to a fixed width of 6 digits.
    • 2 -> 000002
    • 10 -> 000010
  • Finally, it returns '{' + length + normalized + '}'.

The purpose of this is to make plain string sorting behave like numeric sorting for embedded numbers.

For example:

  • 1.2.3 → the numeric segment 3 becomes {0000013} (length prefix 000001, then 3)
  • 1.2.10 → the numeric segment 10 becomes {00000210} (length prefix 000002, then 10)

Because the comparison looks at the length first, and then the value, this ensures that:

  • 3 is smaller than 10
  • and you do not get the usual lexicographic problem where "10" < "3"
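
The steps above can be sketched in Java as follows. This is an illustrative mirror of the JS snippet under review, not the PR's actual normalizeVersionForSort; the class name, method names, and the use of String.format for padding are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: builds a string key whose plain String order
// matches natural numeric order of the embedded digit runs.
final class VersionSortKeySketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    // Strips leading zeros but keeps a single "0" for an all-zero run.
    private static final Pattern LEADING_ZEROS = Pattern.compile("^0+(?!$)");

    static String sortKey(String version) {
        String lower = version.toLowerCase(Locale.ROOT);
        Matcher m = DIGITS.matcher(lower);
        StringBuilder key = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the non-numeric text before this digit run unchanged.
            key.append(lower, last, m.start());
            String normalized = LEADING_ZEROS.matcher(m.group()).replaceFirst("");
            // Token layout: '{' + 6-digit length + digits + '}'.
            key.append('{')
               .append(String.format(Locale.ROOT, "%06d", normalized.length()))
               .append(normalized)
               .append('}');
            last = m.end();
        }
        key.append(lower, last, lower.length());
        return key.toString();
    }

    /** Returns a copy of the input ordered by the generated keys. */
    static List<String> sortNaturally(List<String> versions) {
        List<String> sorted = new ArrayList<>(versions);
        sorted.sort(Comparator.comparing(VersionSortKeySketch::sortKey));
        return sorted;
    }
}
```

With this sketch, sortKey("1.2.10") yields {0000011}.{0000012}.{00000210}, and sortNaturally(List.of("1.10", "1.2", "1.9")) returns 1.2, 1.9, 1.10.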

" return lower.replace(/\\d+/g, function(match) {" +
" var normalized = match.replace(/^0+(?!$)/, '');" +
" var length = normalized.length.toString();" +
" while (length.length < 6) { length = '0' + length; }" +
Member

Why the magic number 6?

Contributor Author

Thanks for asking. 6 is just a fixed padding width for the length prefix, so that all rewritten numeric tokens have a comparable shape.

For example:

  • 3 -> length 1 -> 000001
  • 10 -> length 2 -> 000002
  • 123 -> length 3 -> 000003

I chose 6 simply as a sufficiently large constant for expected version segments, not because it has special meaning. The goal is only to keep the length field fixed-width so lexicographic comparison works reliably. If we want, I can replace it with a named constant or add a short comment to make that clearer.
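
To illustrate why the length field has to be fixed-width at all, here is a small Java sketch comparing a padded and an unpadded length prefix. The constant name LENGTH_PREFIX_WIDTH and both helper methods are hypothetical and only serve this example.

```java
import java.util.Locale;

// Hypothetical sketch: shows the effect of padding the length prefix.
final class LengthPrefixWidthSketch {
    /** Assumed name for the padding width; 6 matches the value in the PR. */
    static final int LENGTH_PREFIX_WIDTH = 6;

    /** Encodes an already zero-stripped digit run as padded length + digits. */
    static String padded(String digits) {
        String format = "%0" + LENGTH_PREFIX_WIDTH + "d";
        return String.format(Locale.ROOT, format, digits.length()) + digits;
    }

    /** Same encoding with an unpadded length field, for comparison only. */
    static String unpadded(String digits) {
        return digits.length() + digits;
    }

    public static void main(String[] args) {
        String nine = "999999999";   // 9 digits
        String ten = "1000000000";   // 10 digits, numerically larger
        // Unpadded: "9..." vs "10...", so the larger number sorts first (wrong).
        System.out.println(unpadded(nine).compareTo(unpadded(ten)) > 0);
        // Padded: "000009..." vs "000010...", so the smaller number sorts first.
        System.out.println(padded(nine).compareTo(padded(ten)) < 0);
    }
}
```

Both lines print true. A width of 6 keeps the length field fixed for digit runs of up to 999,999 characters, which is far beyond any realistic version segment.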

};
}

static String normalizeVersionForSort(String version) {
Member

Unused function declared?

Contributor Author

normalizeVersionForSort(String) is currently used by unit tests as a Java mirror of the JS index normalization logic, to ensure both implementations stay consistent.
