Skip to content

feat!: Improve command line argument handling #56

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

sitaowang1998
Copy link
Collaborator

@sitaowang1998 sitaowang1998 commented Jan 15, 2025

Description

spider_worker and spider_scheduler do not print help message when --help is required or no argument is provided. They also throw uncaught messages when value is missing or unknown argument is provided. Improve the command line argument handling to resolve #51.

Validation performed

  • GitHub workflow pass
  • Unit tests pass in dev container
  • Integration tests pass in dev container
  • spider_worker and spider_scheduler print help message when --help is provided
  • spider_worker and spider_scheduler print help message when no argument is provided
  • spider_worker and spider_scheduler print meaningful error message when argument or value missing
  • spider_worker and spider_scheduler print meaningful error message when unknown argument is provided

Summary by CodeRabbit

  • New Features

    • Enhanced argument parsing for scheduler and worker components with clearer usage instructions and help messages.
  • Bug Fixes

    • Improved error handling and messaging for missing required parameters.
    • Standardized command-line argument naming from --storage_url to --storage-url across documentation and tests.
  • Refactor

    • Updated function signatures to accept parameters directly and return success status.
    • Replaced hardcoded command-line option strings with centralized constants.
    • Streamlined and consolidated argument parsing logic across components.
    • Added consistent inclusion of program options and formatting utilities for command-line handling.

Copy link
Contributor

coderabbitai bot commented Jan 15, 2025

Walkthrough

The pull request refactors argument parsing in spider_scheduler and spider_worker to use output parameters and boolean success indicators instead of returning a variables map. It centralizes error handling, adds required argument checks, and prints usage/help messages on errors or --help flag. It also introduces constants for CLI options and updates related documentation and tests for consistent argument naming.

Changes

File Change Summary
src/spider/scheduler/scheduler.cpp Refactored parse_args to output parameters (host, port, storage_url) with boolean return; added error handling and usage display; added <iostream>; removed unused include
src/spider/worker/worker.cpp Refactored parse_args to output parameters (host, storage_url, libs) with boolean return; enhanced error handling and usage display; added <iostream>; removed unused include
src/spider/utils/ProgramOptions.hpp Added new header defining CLI option and help message constants for scheduler, worker, and task executor
src/spider/worker/task_executor.cpp Replaced hardcoded CLI option strings with constants from ProgramOptions.hpp
src/spider/worker/TaskExecutor.hpp Updated constructors to use formatted CLI option strings with constants instead of hardcoded literals; added includes for fmt/format.h and ProgramOptions.hpp
tests/client/client-test.cpp Replaced hardcoded CLI option strings with constants for parsing and error messages
docs/src/dev-docs/testing.md Changed parameter name from storage_url to storage-url for consistency
docs/src/user-docs/guides-quick-start.md Updated command-line argument name from --storage_url to --storage-url in usage examples
tests/integration/test_client.py Changed subprocess command argument from --storage_url to --storage-url
tests/integration/test_scheduler_worker.py Updated scheduler and worker subprocess command argument from --storage_url to --storage-url
tests/integration/test_signal.py Changed subprocess command argument from --storage_url to --storage-url
src/spider/CMakeLists.txt Added utils/ProgramOptions.hpp to source lists of scheduler, worker, and task executor components; reordered StopFlag.hpp inclusion

Assessment against linked issues

Objective Addressed Explanation
Print CLI usage info [#51]
Handle --help flag [#51]
Catch parsing exceptions [#51]
Provide clear error messages [#51]

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes detected related to the objectives from the linked issues.

Suggested reviewers

  • davidlion
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/spider/scheduler/scheduler.cpp (1)

41-44: Consider using std::format for usage message.

The usage message could benefit from using std::format for better maintainability, consistent with the project's use of fmt library elsewhere.

-char const* const cUsage
-        = "Usage: spider_scheduler --host <host> --port <port> --storage_url <url>";
+char const* const cUsage = std::format(
+        "Usage: {} --host <host> --port <port> --storage_url <url>",
+        "spider_scheduler");
src/spider/worker/worker.cpp (1)

53-55: Consider using std::format for usage message.

The usage message could benefit from using std::format for better maintainability, consistent with the project's use of fmt library elsewhere.

-char const* const cUsage = "Usage: spider_worker --host <host> --storage_url <storage_url> --libs <libs>";
+char const* const cUsage = std::format(
+        "Usage: {} --host <host> --storage_url <storage_url> --libs <libs>",
+        "spider_worker");
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between becbf55 and 3c5e18b.

📒 Files selected for processing (2)
  • src/spider/scheduler/scheduler.cpp (3 hunks)
  • src/spider/worker/worker.cpp (3 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
src/spider/worker/worker.cpp

[performance] 97-97: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)
  • GitHub Check: lint
🔇 Additional comments (6)
src/spider/scheduler/scheduler.cpp (3)

5-5: LGTM!

The addition of the <iostream> header is necessary for the new error output functionality.


172-172: LGTM!

The main function correctly uses the new parse_args signature.


45-96: Verify error handling for edge cases.

The argument parsing logic looks good, but let's verify the handling of edge cases:

  1. When only --help is provided
  2. When invalid port numbers are supplied
✅ Verification successful

Edge cases are properly handled in the argument parsing implementation

The code demonstrates robust handling of all key edge cases:

  • Help flag (--help) displays usage information and available options
  • Port validation is enforced through the unsigned short type
  • Missing or invalid arguments show clear error messages with usage instructions
  • Distinct error codes for different failure scenarios
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for test cases covering argument parsing edge cases
rg -l "TestSchedulerArgParse|test.*parse.*args" -g "test_*.cpp"

Length of output: 214


Script:

#!/bin/bash
# Find test files
fd -e cpp -e hpp test

# Search for help flag handling
rg -A 5 "\"help\".*program_options"

# Search for port validation or error handling
rg -A 10 "port.*program_options::value"

Length of output: 549


Script:

#!/bin/bash
# Check scheduler test files content
cat tests/scheduler/test-SchedulerServer.cpp
cat tests/scheduler/test-SchedulerPolicy.cpp

# Look for main function implementation
rg -l "int\s+main\s*\(" --type cpp

Length of output: 11927


Script:

#!/bin/bash
# Find and check the main scheduler program file
fd -e cpp scheduler$ src/
cat src/spider/scheduler/scheduler.cpp

Length of output: 8522

src/spider/worker/worker.cpp (3)

8-8: LGTM!

The addition of the <iostream> header is necessary for the new error output functionality.


360-360: LGTM!

The main function correctly uses the new parse_args signature.


56-100: Verify handling of empty --libs argument.

The argument parsing logic looks good, but let's verify the handling of edge cases:

  1. When only --help is provided
  2. When --libs is provided without any values
🧰 Tools
🪛 cppcheck (2.10-2)

[performance] 97-97: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
src/spider/worker/worker.cpp (4)

54-55: Enhance the usage message with more descriptive information.

The current usage message is concise but could be more helpful. Consider adding brief descriptions of what each argument expects.

-char const* const cUsage
-        = "Usage: spider_worker --host <host> --storage_url <storage_url> --libs <libs>";
+char const* const cUsage
+        = "Usage: spider_worker --host <host_address> --storage_url <mysql_url> --libs <path_to_libraries>\n"
+          "Example: spider_worker --host localhost:8080 --storage_url mysql://user:pass@localhost/spider --libs /path/to/lib1.so,/path/to/lib2.so";

103-108: Standardize error message format.

Error messages should follow a consistent format and provide clear guidance.

-        std::cerr << "spider_worker: " << e.what() << "\n";
-        std::cerr << cUsage << "\n";
-        std::cerr << "Try 'spider_worker --help' for more information.\n";
+        std::cerr << "Error: " << e.what() << "\n"
+                  << cUsage << "\n"
+                  << "For more information, run: spider_worker --help\n";

368-368: Maintain consistency in variable naming.

The variable worker_addr is used for the host parameter, which could be confusing. Consider using consistent naming throughout the codebase.

-    if (!parse_args(argc, argv, worker_addr, storage_url, libs)) {
+    if (!parse_args(argc, argv, host_addr, storage_url, libs)) {

Also update the variable declaration above:

-    std::string worker_addr;
+    std::string host_addr;

57-108: Consider implementing structured error handling.

The current error handling approach mixes concerns between argument parsing, validation, and error reporting. Consider implementing a dedicated error handling structure.

Suggestions:

  1. Create a dedicated ArgumentParser class to encapsulate parsing logic
  2. Define an ArgumentError class hierarchy for different types of errors
  3. Implement a separate error formatter for consistent error message formatting

This would improve maintainability and make it easier to add new argument validation rules in the future.

🧰 Tools
🪛 cppcheck (2.10-2)

[performance] 97-97: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c5e18b and ff98aaf.

📒 Files selected for processing (1)
  • src/spider/worker/worker.cpp (3 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
src/spider/worker/worker.cpp

[performance] 97-97: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)
  • GitHub Check: lint

// clang-format off
desc.add_options()
("help", "spider scheduler")
(
"host",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's replace these argument names with constants since we're using them in multiple places.

Comment on lines 81 to 82
if (!variables.contains("host") && !variables.contains("port")
&& !variables.contains("storage_url"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Boost's program-options already validates the presence of these arguments since we marked them required.
  • We should validate that the user doesn't pass in an empty value for host and storage-url. If they do, we should print a specific error message for each one.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Boost's program-options validation is not flexible enough. For example, if user does not any argument or only provide --help, Boost cannot correctly print the help message but treat it as arguments not provided and throw exceptions.
  • I'll add more validation for arguments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comments as on the other file.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff98aaf and df31fe2.

📒 Files selected for processing (11)
  • docs/src/dev-docs/testing.md (1 hunks)
  • docs/src/user-docs/guides-quick-start.md (4 hunks)
  • src/spider/CMakeLists.txt (3 hunks)
  • src/spider/scheduler/scheduler.cpp (4 hunks)
  • src/spider/utils/ProgramOptions.hpp (1 hunks)
  • src/spider/worker/TaskExecutor.hpp (3 hunks)
  • src/spider/worker/task_executor.cpp (3 hunks)
  • src/spider/worker/worker.cpp (4 hunks)
  • tests/client/client-test.cpp (2 hunks)
  • tests/integration/test_client.py (2 hunks)
  • tests/integration/test_scheduler_worker.py (1 hunks)
✅ Files skipped from review due to trivial changes (4)
  • tests/integration/test_scheduler_worker.py
  • tests/integration/test_client.py
  • docs/src/dev-docs/testing.md
  • docs/src/user-docs/guides-quick-start.md
🧰 Additional context used
🪛 Cppcheck (2.10-2)
src/spider/worker/task_executor.cpp

[performance] 97-97: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

src/spider/scheduler/scheduler.cpp

[performance] 97-97: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

src/spider/worker/worker.cpp

[performance] 97-97: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)


[performance] 116-116: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)
  • GitHub Check: lint
🔇 Additional comments (7)
tests/client/client-test.cpp (1)

23-27: LGTM!

The changes correctly replace string literals with constants from ProgramOptions.hpp, improving maintainability and consistency.

Also applies to: 57-61

src/spider/worker/TaskExecutor.hpp (1)

56-63: LGTM!

The changes correctly use fmt::format for constructing command line arguments and consistently use the new constants from ProgramOptions.hpp.

Also applies to: 104-111

src/spider/worker/task_executor.cpp (1)

38-58: LGTM!

The changes correctly replace string literals with constants from ProgramOptions.hpp, improving maintainability and consistency. The error handling remains robust.

Also applies to: 94-110

src/spider/scheduler/scheduler.cpp (1)

43-107: LGTM! The command-line argument handling has been improved.

The changes align with the PR objectives by:

  • Using constants from ProgramOptions.hpp for consistent naming.
  • Displaying help message when no arguments are provided.
  • Providing meaningful error messages for missing or invalid arguments.
  • Validating required arguments and empty values.
🧰 Tools
🪛 Cppcheck (2.10-2)

[performance] 97-97: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

src/spider/worker/worker.cpp (2)

56-125: LGTM! The command-line argument handling has been improved.

The changes align with the PR objectives by:

  • Using constants from ProgramOptions.hpp for consistent naming.
  • Displaying help message when no arguments are provided.
  • Providing meaningful error messages for missing or invalid arguments.
  • Validating required arguments and empty values.
🧰 Tools
🪛 Cppcheck (2.10-2)

[performance] 97-97: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)


[performance] 116-116: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)


92-95: ⚠️ Potential issue

Fix the argument presence check logic.

The current condition is incorrect as it checks if all arguments are missing. It should check if any required argument is missing or if help is requested.

Apply this diff to fix the logic:

-        if (!variables.contains(spider::core::cHostOption.data())
-            && !variables.contains(spider::core::cStorageUrlOption.data())
-            && !variables.contains(spider::core::cLibsOption.data()))
+        if (variables.count(spider::core::cHelpOption.data())) {
+            std::cout << spider::core::cWorkerUsage << "\n";
+            std::cout << desc << "\n";
+            return false;
+        }
+
+        if (!variables.contains(spider::core::cHostOption.data())
+            || !variables.contains(spider::core::cStorageUrlOption.data())
+            || !variables.contains(spider::core::cLibsOption.data()))

Likely invalid or redundant comment.

src/spider/CMakeLists.txt (1)

57-57: LGTM! The CMake changes support the improved command-line argument handling.

The changes correctly add ProgramOptions.hpp to the required source sets, ensuring that the command-line argument handling improvements are properly integrated into the build system.

Also applies to: 69-69, 116-118

Comment on lines +14 to +15
constexpr std::string_view cWorkerUsage
= {"Usage: spider_worker --host <host> --storage-url <storage_url> --libs <libs>"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix inconsistent argument naming in worker usage message.

The usage message shows storage_url with an underscore, but the actual option is defined with a hyphen as storage-url. This inconsistency could confuse users.

Apply this diff to fix the inconsistency:

-        = {"Usage: spider_worker --host <host> --storage-url <storage_url> --libs <libs>"};
+        = {"Usage: spider_worker --host <host> --storage-url <storage-url> --libs <libs>"};
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
constexpr std::string_view cWorkerUsage
= {"Usage: spider_worker --host <host> --storage-url <storage_url> --libs <libs>"};
constexpr std::string_view cWorkerUsage
= {"Usage: spider_worker --host <host> --storage-url <storage-url> --libs <libs>"};

Comment on lines +20 to +22
constexpr std::string_view cTaskExecutorUsage
= {"Usage: spider_task_executor --func <function> --task-id <task_id> --storage-url "
"<storage_url> --libs <libs>"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix inconsistent argument naming in task executor usage message.

The option task-id is defined with a hyphen, but task_id appears with an underscore in the usage message.

Apply this diff to fix the inconsistency:

-        = {"Usage: spider_task_executor --func <function> --task-id <task_id> --storage-url "
+        = {"Usage: spider_task_executor --func <function> --task-id <task-id> --storage-url "
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
constexpr std::string_view cTaskExecutorUsage
= {"Usage: spider_task_executor --func <function> --task-id <task_id> --storage-url "
"<storage_url> --libs <libs>"};
constexpr std::string_view cTaskExecutorUsage
= {"Usage: spider_task_executor --func <function> --task-id <task-id> --storage-url "
"<storage_url> --libs <libs>"};

Comment on lines 79 to 82
if (false == variables.contains(spider::core::cHostOption.data())
&& false == variables.contains(spider::core::cPortOption.data())
&& false == variables.contains(spider::core::cStorageUrlOption.data()))
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix the argument presence check logic.

The current condition is incorrect as it checks if all arguments are missing. It should check if any required argument is missing or if help is requested.

Apply this diff to fix the logic:

-        if (false == variables.contains(spider::core::cHostOption.data())
-            && false == variables.contains(spider::core::cPortOption.data())
-            && false == variables.contains(spider::core::cStorageUrlOption.data()))
+        if (variables.count(spider::core::cHelpOption.data())) {
+            std::cout << spider::core::cSchedulerUsage << "\n";
+            std::cout << desc << "\n";
+            return false;
+        }
+
+        if (false == variables.contains(spider::core::cHostOption.data())
+            || false == variables.contains(spider::core::cPortOption.data())
+            || false == variables.contains(spider::core::cStorageUrlOption.data()))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (false == variables.contains(spider::core::cHostOption.data())
&& false == variables.contains(spider::core::cPortOption.data())
&& false == variables.contains(spider::core::cStorageUrlOption.data()))
{
if (variables.count(spider::core::cHelpOption.data())) {
std::cout << spider::core::cSchedulerUsage << "\n";
std::cout << desc << "\n";
return false;
}
if (false == variables.contains(spider::core::cHostOption.data())
|| false == variables.contains(spider::core::cPortOption.data())
|| false == variables.contains(spider::core::cStorageUrlOption.data()))
{

@sitaowang1998 sitaowang1998 changed the title feat: Improve command line argument handling feat!: Improve command line argument handling Feb 4, 2025
@sitaowang1998 sitaowang1998 requested a review from a team as a code owner April 14, 2025 15:40
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
src/spider/scheduler/scheduler.cpp (1)

93-100: ⚠️ Potential issue

Fix the argument presence check logic.

The current condition is incorrect as it checks if all arguments are missing. It should check if any required argument is missing or if help is requested.

Apply this diff to fix the logic:

-        if (false == variables.contains(std::string(spider::core::cHostOption))
-            && false == variables.contains(std::string(spider::core::cPortOption))
-            && false == variables.contains(std::string(spider::core::cStorageUrlOption)))
+        if (variables.count(spider::core::cHelpOption.data())) {
+            std::cout << spider::core::cSchedulerUsage << "\n";
+            std::cout << desc << "\n";
+            return false;
+        }
+
+        if (false == variables.contains(std::string(spider::core::cHostOption))
+            || false == variables.contains(std::string(spider::core::cPortOption))
+            || false == variables.contains(std::string(spider::core::cStorageUrlOption)))
src/spider/worker/worker.cpp (1)

117-124: ⚠️ Potential issue

Fix the argument validation logic and add proper help handling.

The current condition has several issues:

  1. It doesn't handle the --help flag properly
  2. The && operators check if ALL arguments are missing, but this should show usage when ANY required argument is missing OR when help is requested
  3. Missing explicit help flag detection

Apply this diff to fix the validation logic:

-        if (!variables.contains(std::string(spider::core::cHostOption))
-            && !variables.contains(std::string(spider::core::cStorageUrlOption))
-            && !variables.contains(std::string(spider::core::cLibsOption)))
-        {
+        if (variables.contains("help")) {
             std::cout << spider::core::cWorkerUsage << "\n";
             std::cout << desc << "\n";
             return false;
+        }
+
+        if (argc == 1) {
+            std::cout << spider::core::cWorkerUsage << "\n";
+            std::cout << desc << "\n";
+            return false;
         }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08a3d5d and 2788feb.

📒 Files selected for processing (10)
  • docs/src/user-docs/guides-quick-start.md (4 hunks)
  • src/spider/CMakeLists.txt (3 hunks)
  • src/spider/scheduler/scheduler.cpp (4 hunks)
  • src/spider/worker/TaskExecutor.hpp (3 hunks)
  • src/spider/worker/task_executor.cpp (3 hunks)
  • src/spider/worker/worker.cpp (4 hunks)
  • tests/client/client-test.cpp (2 hunks)
  • tests/integration/test_client.py (2 hunks)
  • tests/integration/test_scheduler_worker.py (1 hunks)
  • tests/integration/test_signal.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • docs/src/user-docs/guides-quick-start.md
  • src/spider/CMakeLists.txt
  • src/spider/worker/TaskExecutor.hpp
  • src/spider/worker/task_executor.cpp
  • tests/client/client-test.cpp
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/spider/scheduler/scheduler.cpp (1)
src/spider/worker/worker.cpp (2)
  • parse_args (81-150)
  • parse_args (81-87)
🪛 Pylint (3.3.7)
tests/integration/test_signal.py

[refactor] 39-39: Consider using 'with' for resource-allocating operations

(R1732)

tests/integration/test_client.py

[refactor] 30-30: Consider using 'with' for resource-allocating operations

(R1732)

tests/integration/test_scheduler_worker.py

[refactor] 45-45: Consider using 'with' for resource-allocating operations

(R1732)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: lint
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)
🔇 Additional comments (13)
tests/integration/test_signal.py (1)

36-36: LGTM! Argument standardization aligns with PR objectives.

The change from --storage_url to --storage-url correctly implements the hyphenated naming convention as part of the command line argument standardization effort.

Also applies to: 44-44

tests/integration/test_scheduler_worker.py (1)

42-42: LGTM! Consistent argument standardization.

The changes from --storage_url to --storage-url maintain consistency with the command line argument naming standardization across the codebase.

Also applies to: 50-50

tests/integration/test_client.py (1)

27-27: LGTM! Complete argument standardization in client tests.

All instances of --storage_url have been correctly updated to --storage-url, maintaining consistency with the hyphenated naming convention.

Also applies to: 35-35, 64-64

src/spider/scheduler/scheduler.cpp (4)

57-63: LGTM! Excellent refactoring of function signature.

The change from returning boost::program_options::variables_map to using output parameters and returning a boolean success indicator improves the API design and makes error handling clearer.


66-82: LGTM! Good use of centralized constants.

Using constants from spider::core namespace (like cHelpOption, cHostOption, etc.) instead of hardcoded strings improves maintainability and consistency across the codebase.


104-112: LGTM! Excellent validation for empty arguments.

The explicit validation for empty host and storage_url with specific error messages provides better user feedback than relying solely on Boost.Program_options validation.


115-120: LGTM! Robust error handling implementation.

The try-catch block properly handles boost::program_options::error exceptions and provides helpful error messages along with usage instructions.

src/spider/worker/worker.cpp (6)

9-9: LGTM!

Adding the <iostream> header is necessary for the console output functionality used in the refactored argument parsing.


48-48: LGTM!

Including the centralized ProgramOptions.hpp header aligns with the effort to standardize command-line option definitions across the project.


81-87: LGTM!

The refactored function signature using output parameters and boolean return value is a good design choice that makes success/failure more explicit and eliminates the need to return complex data structures.


89-107: LGTM!

Using centralized constants from spider::core for option names and messages improves maintainability and consistency across the codebase.


144-149: LGTM!

The exception handling properly catches boost::program_options errors and provides helpful feedback to users with appropriate usage messages.


473-473: LGTM!

The main function call has been correctly updated to match the new parse_args signature with output parameters.

return false;
}

boost::program_options::notify(variables);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Move notify() call before validation checks.

The notify() call should happen before the validation checks to ensure that required arguments are properly validated by boost::program_options.

+        boost::program_options::notify(variables);
+
         if (!variables.contains(std::string(spider::core::cHostOption))
             && !variables.contains(std::string(spider::core::cStorageUrlOption))
             && !variables.contains(std::string(spider::core::cLibsOption)))
         {
             std::cout << spider::core::cWorkerUsage << "\n";
             std::cout << desc << "\n";
             return false;
         }
-
-        boost::program_options::notify(variables);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
boost::program_options::notify(variables);
// Ensure all semantic actions and required‐option checks run before manual validation
boost::program_options::notify(variables);
if (!variables.contains(std::string(spider::core::cHostOption))
&& !variables.contains(std::string(spider::core::cStorageUrlOption))
&& !variables.contains(std::string(spider::core::cLibsOption)))
{
std::cout << spider::core::cWorkerUsage << "\n";
std::cout << desc << "\n";
return false;
}
🤖 Prompt for AI Agents
In src/spider/worker/worker.cpp at line 126, move the call to
boost::program_options::notify(variables) to occur before any validation checks
on the program options. This ensures that required arguments are properly
processed and validated by boost::program_options before custom validation logic
runs.

Comment on lines +128 to +141
if (host.empty()) {
std::cerr << spider::core::cHostEmptyMessage << "\n";
return false;
}

if (storage_url.empty()) {
std::cerr << spider::core::cStorageUrlEmptyMessage << "\n";
return false;
}

if (libs.empty()) {
std::cerr << spider::core::cLibsEmptyMessage << "\n";
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Remove redundant validation checks.

Since the options are marked as required() in the option descriptions, boost::program_options will automatically validate their presence when notify() is called. The manual empty checks are redundant and the notify() call will throw an exception for missing required arguments, which is already handled in the catch block.

-        if (host.empty()) {
-            std::cerr << spider::core::cHostEmptyMessage << "\n";
-            return false;
-        }
-
-        if (storage_url.empty()) {
-            std::cerr << spider::core::cStorageUrlEmptyMessage << "\n";
-            return false;
-        }
-
-        if (libs.empty()) {
-            std::cerr << spider::core::cLibsEmptyMessage << "\n";
-            return false;
-        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (host.empty()) {
std::cerr << spider::core::cHostEmptyMessage << "\n";
return false;
}
if (storage_url.empty()) {
std::cerr << spider::core::cStorageUrlEmptyMessage << "\n";
return false;
}
if (libs.empty()) {
std::cerr << spider::core::cLibsEmptyMessage << "\n";
return false;
}
🤖 Prompt for AI Agents
In src/spider/worker/worker.cpp around lines 128 to 141, remove the manual empty
string checks for host, storage_url, and libs since these options are already
marked as required() in the boost::program_options setup. The notify() call will
automatically validate their presence and throw exceptions if missing, which are
handled in the existing catch block. This eliminates redundant validation code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

spider_scheduler and spider_worker don't print CLI usage info
2 participants