Skip to content

Conversation

zane-neo
Copy link
Collaborator

Description

Move HttpClientFactory to common to expose to other components(skills)

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

compileOnly group: 'com.networknt' , name: 'json-schema-validator', version: '1.4.0'
// Multi-tenant SDK Client
compileOnly "org.opensearch:opensearch-remote-metadata-sdk:${opensearch_build}"
compileOnly (group: 'software.amazon.awssdk', name: 'netty-nio-client', version: "2.30.18") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any impact to other plugins which depend on ml-commons? Like neural-search, flow-framework

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now it's only skills.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbwiddis I know you wanted dependent plugins to get informed if in ml-commons we change any dependencies.

Copy link
Member

@dbwiddis dbwiddis Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dhrubo-os .

@zane-neo Highly recommend you use the version from the OpenSearch version catalog for any awssdk components. I believe 2.30.18 is CVE-impacted. (see https://mvnrepository.com/artifact/software.amazon.awssdk/netty-nio-client/2.30.18)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ylwu-amzn
Copy link
Collaborator

CI failed

@zane-neo
Copy link
Collaborator Author

CI failed

The CI failed since opensearch-remote-metadata-sdk used ThreadContextAccess which is not for public use, I've created this PR to fix: opensearch-project/opensearch-remote-metadata-sdk#254

Copy link

codecov bot commented Sep 25, 2025

Codecov Report

❌ Patch coverage is 78.12500% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.93%. Comparing base (7e04653) to head (0d1424d).

Files with missing lines Patch % Lines
...arch/ml/common/httpclient/MLHttpClientFactory.java 78.12% 1 Missing and 6 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #4175      +/-   ##
============================================
- Coverage     81.94%   81.93%   -0.01%     
- Complexity     9197     9200       +3     
============================================
  Files           789      789              
  Lines         39625    39626       +1     
  Branches       4394     4400       +6     
============================================
- Hits          32469    32466       -3     
- Misses         5277     5279       +2     
- Partials       1879     1881       +2     
Flag Coverage Δ
ml-commons 81.93% <78.12%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@zane-neo zane-neo temporarily deployed to ml-commons-cicd-env September 25, 2025 06:40 — with GitHub Actions Inactive
@zane-neo zane-neo temporarily deployed to ml-commons-cicd-env September 25, 2025 06:40 — with GitHub Actions Inactive
@zane-neo zane-neo force-pushed the fix-websearchtool-security-issue branch from 2dd3690 to 0d1424d Compare September 26, 2025 09:13
@zane-neo zane-neo temporarily deployed to ml-commons-cicd-env September 26, 2025 09:15 — with GitHub Actions Inactive
@zane-neo zane-neo temporarily deployed to ml-commons-cicd-env September 26, 2025 09:15 — with GitHub Actions Inactive
@zane-neo zane-neo temporarily deployed to ml-commons-cicd-env September 26, 2025 09:15 — with GitHub Actions Inactive
@zane-neo zane-neo temporarily deployed to ml-commons-cicd-env September 26, 2025 09:15 — with GitHub Actions Inactive
@zane-neo zane-neo temporarily deployed to ml-commons-cicd-env September 26, 2025 10:17 — with GitHub Actions Inactive
@zane-neo zane-neo temporarily deployed to ml-commons-cicd-env September 26, 2025 10:17 — with GitHub Actions Inactive
dhrubo-os
dhrubo-os previously approved these changes Sep 26, 2025
@dhrubo-os dhrubo-os dismissed their stale review September 26, 2025 17:37

Let's make the changes what was suggested.

Copy link
Collaborator

@dhrubo-os dhrubo-os left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specific Code Review - IP Validation Logic

I've reviewed the IP validation changes and have specific concerns about the implementation:

Performance Concern - Lines 105-107

private static boolean eqCheckValue(byte input, int targetValue) {
    int original = input & 0xff;
    return original == targetValue || parseWithRadix(original, 8) == targetValue || parseWithRadix(original, 16) == targetValue;
}

Issue: For every byte comparison, this method attempts 3 different parsing operations. Since IP addresses are typically decimal, the octal and hex parsing will frequently fail and fall back to exception handling in parseWithRadix().

Unnecessary Exception Handling - Lines 121-127

private static int parseWithRadix(int input, int radix) {
    try {
        return Integer.parseInt(String.valueOf(input), radix);
    } catch (NumberFormatException e) {
        return input;
    }
}

Issue: This method will throw NumberFormatException for common IP octets like 192, 168, 127 when parsed as octal (since these contain digits 8,9 which are invalid in octal). Using exceptions for normal control flow is an anti-pattern.

Complexity Without Clear Benefit - Lines 109-119

private static boolean rangeCheckValue(byte input) {
    int original = input & 0xff;
    if (original >= 16 && original <= 31) {
        return true;
    } else {
        int octalValue = parseWithRadix(original, 8);
        if (octalValue >= 16 && octalValue <= 31) {
            return true;
        } else {
            int hexValue = parseWithRadix(original, 16);
            return hexValue >= 16 && hexValue <= 31;
        }
    }
}

Issue: The nested if-else with multiple radix parsing attempts makes the code harder to understand and debug. Standard IP addresses use decimal notation, so the octal/hex parsing adds complexity without clear benefit.

Recommendation

Consider simplifying the IP validation to use straightforward decimal comparisons since IP addresses are standardized as decimal:

private static boolean isPrivateIPv4(byte[] bytes) {
    int first = bytes[0] & 0xff;
    int second = bytes[1] & 0xff;
    
    // 127.0.0.1, 10.x.x.x, 172.16-31.x.x, 192.168.x.x, 169.254.x.x
    return (first == 127 && second == 0) ||
           (first == 10) ||
           (first == 172 && second >= 16 && second <= 31) ||
           (first == 192 && second == 168) ||
           (first == 169 && second == 254);
}

This would be more performant and easier to maintain while achieving the same security goals.

int original = input & 0xff;
return original == targetValue || parseWithRadix(original, 8) == targetValue || parseWithRadix(original, 16) == targetValue;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance concern: This method attempts octal and hex parsing for every byte comparison, which will frequently throw NumberFormatException for common IP octets (192, 168, 127, etc.) since they contain invalid octal digits. Consider using simple decimal comparison since IP addresses are standardized as decimal.

private static int parseWithRadix(int input, int radix) {
try {
return Integer.parseInt(input, 8);
return Integer.parseInt(String.valueOf(input), radix);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anti-pattern: Using exceptions for normal control flow. For common IP octets like 192, 168, parsing as octal will always throw NumberFormatException. This impacts performance and makes debugging harder.

} else {
int hexValue = parseWithRadix(original, 16);
return hexValue >= 16 && hexValue <= 31;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complexity concern: The nested if-else with multiple radix parsing attempts makes this harder to understand and maintain. Since IP addresses use decimal notation, the octal/hex parsing adds unnecessary complexity.

MLHttpClientFactory.validate(HTTP, "177.0.0.2", 80, PRIVATE_IP_DISABLED);
MLHttpClientFactory.validate(HTTP, "::ffff", 80, PRIVATE_IP_DISABLED);
MLHttpClientFactory.validate(HTTP, "172.32.0.1", 80, PRIVATE_IP_ENABLED);
MLHttpClientFactory.validate(HTTP, "172.2097152", 80, PRIVATE_IP_ENABLED);
Copy link
Contributor

@brianf-aws brianf-aws Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this 172.2097152 a spelling mistake if so how come it didn't throw an exception? We may need to tune the logic to catch these edge cases

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants