Skip to content

fix(cpp): Handle npos in PathToDirectory for S3 URIs without query strings#888

Open
shivendra-dev54 wants to merge 14 commits intoapache:mainfrom
shivendra-dev54:881-pathtodir-can-throw
Open

fix(cpp): Handle npos in PathToDirectory for S3 URIs without query strings#888
shivendra-dev54 wants to merge 14 commits intoapache:mainfrom
shivendra-dev54:881-pathtodir-can-throw

Conversation

@shivendra-dev54
Copy link
Contributor

Reason for this PR

Fixes #881

PathToDirectory was crashing with a std::out_of_range exception when parsing valid S3 URIs that do not contain a query string (?). This happens because find_last_of('?') returns std::string::npos, which was being passed directly into path.substr().

What changes are included in this PR?

Added an explicit check for std::string::npos in PathToDirectory. When an S3 URI without a query string is processed, it now correctly identifies the entire path as the prefix and sets the suffix to an empty string, safely avoiding the out-of-bounds crash.

Are these changes tested?

Yes, the C++ test suite was compiled and run locally, and all tests pass successfully.

Are there any user-facing changes?

No.

Critical Fix: Fixes a bug that causes a crash (std::out_of_range exception) when parsing valid S3 URIs without query strings.

@shivendra-dev54
Copy link
Contributor Author

@Sober7135
check the code and also the title of the PR

@codecov-commenter
Copy link

codecov-commenter commented Feb 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.67%. Comparing base (b21da29) to head (b5b723b).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #888      +/-   ##
============================================
+ Coverage     80.60%   80.67%   +0.07%     
  Complexity      615      615              
============================================
  Files            94       94              
  Lines         10709    10709              
  Branches       1055     1055              
============================================
+ Hits           8632     8640       +8     
+ Misses         1837     1829       -8     
  Partials        240      240              
Flag Coverage Δ
cpp 71.05% <100.00%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Sober7135
Copy link
Contributor

Yes, the C++ test suite was compiled and run locally, and all tests pass successfully.

This is not quite accurate, since no test case was added for this bug.

Testing this bug is also somewhat complicated, because PathToDirectory is a static helper inside an anonymous namespace rather than a public API. In practice, we cannot test it directly in the current test suite.

More importantly, this bug is only exercised through GraphInfo::Load(const std::string& path): Load() first reads the graph metadata file, and then calls PathToDirectory(path). So to reproduce this bug in a test, we need to drive the full GraphInfo::Load code path rather than test the helper in isolation.

Result<std::shared_ptr<GraphInfo>> GraphInfo::Load(const std::string& path) {
std::string no_url_path;
GAR_ASSIGN_OR_RAISE(auto fs, FileSystemFromUriOrPath(path, &no_url_path));
GAR_ASSIGN_OR_RAISE(auto yaml_content,
fs->ReadFileToValue<std::string>(no_url_path));
GAR_ASSIGN_OR_RAISE(auto graph_meta, Yaml::Load(yaml_content));
std::string default_name = "graph";
std::string default_prefix = PathToDirectory(path);
no_url_path = PathToDirectory(no_url_path);
return ConstructGraphInfo(graph_meta, default_name, default_prefix, fs,
no_url_path);
}

I can think of two possible solutions:

  • Set up a mock S3 server so that we can exercise the full workflow and reliably trigger this bug.
  • Expose this function — for example, move it into a namespace like graphar::details / graphar::internal, and avoid installing that header into the user system.

Actually, I am not familiar with this kind of problem.

Any advice? CC @yangxk1

@shivendra-dev54
Copy link
Contributor Author

shivendra-dev54 commented Mar 3, 2026

@Sober7135
I have added test as you mentioned in the 2nd possible solution

here is what I did

  • moved the PathToDirectory function to namespace util to expose it for tests.
  • updated util.h with the function declaration.
  • added test as you suggested.
  • added it to the cpp/test/CMakeLists.txt.
  • checked with the new tests.

But I have added that function to graphar::util namespace
let me know if you want me to change that

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a crash (std::out_of_range exception) in PathToDirectory when parsing valid S3 URIs that don't contain a query string (?). The function previously passed std::string::npos directly to path.substr(), causing the exception. The fix adds an explicit check for npos and moves the function from a file-local static in graph_info.cc to the graphar::util namespace to improve reusability and testability.

Changes:

  • Added a npos guard in PathToDirectory so S3 URIs without ? are handled safely
  • Moved PathToDirectory from anonymous namespace in graph_info.cc to graphar::util, declared in util.h
  • Added a new test file test_util.cc covering S3 with/without query strings, local paths, and relative paths

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
cpp/src/graphar/graph_info.cc Removed old static PathToDirectory, added new graphar::util::PathToDirectory with the npos fix, updated call sites
cpp/src/graphar/util.h Added declaration for PathToDirectory in graphar::util
cpp/test/test_util.cc New test file with 5 test cases for PathToDirectory
cpp/test/CMakeLists.txt Registered the new test target
cpp/CMakeLists.txt Trivial trailing newline addition

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

std::string PathToDirectory(const std::string& path) {
if (path.rfind("s3://", 0) == 0) {
size_t t = path.find_last_of('?');
// Your fix here
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a leftover debug/placeholder comment that should be removed before merging. It doesn't describe the logic and reads like a TODO marker from development.

Copilot uses AI. Check for mistakes.
Comment on lines +1471 to +1491
namespace util {
std::string PathToDirectory(const std::string& path) {
if (path.rfind("s3://", 0) == 0) {
size_t t = path.find_last_of('?');
// Your fix here
std::string prefix = (t == std::string::npos) ? path : path.substr(0, t);
std::string suffix = (t == std::string::npos) ? "" : path.substr(t);

const size_t last_slash_idx = prefix.rfind('/');
if (std::string::npos != last_slash_idx) {
return prefix.substr(0, last_slash_idx + 1) + suffix;
}
} else {
const size_t last_slash_idx = path.rfind('/');
if (std::string::npos != last_slash_idx) {
return path.substr(0, last_slash_idx + 1);
}
}
return path;
}
} // namespace util
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PathToDirectory function definition is placed in graph_info.cc, but it is declared in util.h and the natural implementation file for graphar::util functions is util.cc (which already exists at cpp/src/graphar/util.cc and contains the other graphar::util function definitions). Moving the definition to util.cc would be consistent with the existing codebase organization and avoid mixing unrelated concerns in graph_info.cc.

Copilot uses AI. Check for mistakes.

namespace graphar {

TEST_CASE_METHOD(GlobalFixture, "PathUtilTest") {
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using GlobalFixture here means these pure string-manipulation tests will throw a std::runtime_error if the GAR_TEST_DATA environment variable is not set, even though these tests don't use any test data files at all. Consider using a plain TEST_CASE instead of TEST_CASE_METHOD(GlobalFixture, ...) so these unit tests can run without that environment dependency.

Copilot uses AI. Check for mistakes.
@shivendra-dev54
Copy link
Contributor Author

@yangxk1
I have incorporated all the changes mentioned by the copilot
But the python mac build is still failing just like the #906

@shivendra-dev54
Copy link
Contributor Author

@yangxk1
please do review this if it meets the issue expectation or not

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PathToDirectory can throw for valid S3 URIs without ?

5 participants