Skip to content

[Feature] Add per-gpu mode switch in CLI#9

Merged
ricky-chaoju merged 14 commits intoInfinirc:mainfrom
delwiv:feature/cli-per-gpu-mode
Mar 9, 2026
Merged

[Feature] Add per-gpu mode switch in CLI#9
ricky-chaoju merged 14 commits intoInfinirc:mainfrom
delwiv:feature/cli-per-gpu-mode

Conversation

@delwiv
Copy link
Copy Markdown
Contributor

@delwiv delwiv commented Mar 4, 2026

Hello, I've added per-gpu mode switch with CLI interface, as I was missing this feature.

We cannot go under 30% fan speed, but my fans are pretty loud, so I wanted to switch to auto mode when temps is low (< 35-45°), and switch to curve when hitting this temp.

It was done with AI help (Qwen3.5 27B and Opencode), I insisted on being clear, not pollute existing code or docs, this is not AI slope.

Included an /utils directory to include script and systemd unit

Add Makefile targets for utils, and an arg to install script for optional utils

The Chinese README was updated by Qwen, which I believe, isn't too bad at that, but I can't guarantee it is correct.

Here is an AI summary :

Overview

Adds per-GPU fan mode switching via CLI (nvfd <gpu_index> auto|curve|manual <speed>), enabling granular GPU fan control and temperature-aware automation.

Key Changes

  • Per-GPU CLI commands: Switch individual GPUs to auto/curve/manual modes
  • Immediate effect: Mode changes apply immediately and persist to config.json
  • Utility script: nvfd-fan-control.sh with hysteresis (threshold-up/down) to prevent mode thrashing
  • Makefile integration: install-utils/uninstall-utils targets
  • Documentation: Updated README with monitoring section and latency details (5-15s total)

Addresses Review Feedback

  1. ✅ Fixed fragile control flow (explicit keyword checks, no atoi() side effects)
  2. ✅ Curve mode now applies immediately (curve_apply_to_gpu() function)
  3. ✅ Removed redundant atoi() calls
  4. ✅ Fixed help table formatting
  5. ✅ Consistent error handling across all paths

Backward Compatible

All existing commands work unchanged. Tested on dual-GPU system (RTX 2080 Ti + RTX 3090).

Installation

sudo scripts/install.sh --with-utils  # Install with utilities
# or
sudo make install-utils               # Manual install

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the nvfd utility by introducing per-GPU fan mode switching capabilities via the command-line interface. This allows for granular control over individual NVIDIA GPU fan behavior, enabling users to select between driver-controlled 'auto' mode, a custom 'curve' mode, or a fixed 'manual' speed. The changes also include a practical example script that showcases how this new functionality can be used to implement an intelligent, temperature-aware fan control system, ultimately improving acoustic comfort during periods of low GPU activity.

Highlights

  • Enhanced CLI for Per-GPU Fan Control: New command-line interface options have been added to nvfd, allowing users to set individual GPUs to 'auto', 'curve', or 'manual' fan modes directly from the CLI.
  • Dynamic Fan Mode Switching Script Example: An example bash script (nvfd-fan-control.sh) is provided, demonstrating how to leverage the new CLI features to automatically switch GPU fan modes between 'auto' and 'curve' based on temperature thresholds, aiming for quieter operation during low GPU utilization.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • README.md
    • Documented the new per-GPU mode switching feature in the main features list.
    • Expanded the CLI command reference to include nvfd <gpu_index> auto, nvfd <gpu_index> curve, and nvfd <gpu_index> manual <speed>.
    • Added usage examples for per-GPU fan mode control.
  • src/display.c
    • Modified display_help to include new CLI options for per-GPU fan mode control.
  • src/main.c
    • Introduced new command-line argument parsing for nvfd <gpu_index> auto, nvfd <gpu_index> curve, and nvfd <gpu_index> manual <speed>.
    • Added logic to set individual GPU fan modes (auto, curve, manual) and persist these settings via config_write_gpu.
    • Included validation for GPU index and fan speed values.
Activity
  • The pull request was opened by delwiv.
  • The author provided a detailed description, including an example bash script and systemd unit for automatic fan control, highlighting the practical application of the new feature.
  • No external review comments or activity have been recorded yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds a useful feature for per-GPU fan mode control from the CLI. The implementation in src/main.c introduces new command-line argument parsing logic. While it works for the happy path, I've found some issues with error handling for invalid inputs, which could lead to confusing error messages for users. I've provided a detailed comment with a suggested refactoring to make the argument parsing more robust and maintainable. The documentation updates in README.md and src/display.c are clear.

@delwiv delwiv changed the title Add per-gpu mode switch in CLI [Feature] Add per-gpu mode switch in CLI Mar 4, 2026
@ricky-chaoju
Copy link
Copy Markdown
Contributor

Thanks for the contribution and the detailed explanation!

I'll review the PR and test it on my setup tomorrow morning and get back to you with feedback.

Copy link
Copy Markdown
Contributor

@ricky-chaoju ricky-chaoju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @delwiv, thanks for the contribution! The feature idea is great and useful. However there are some structural issues with the implementation that need to be addressed before merging. Please see inline comments below.

Summary of issues

  1. Fragile control flow — The new per-GPU logic is placed inside the fallthrough else block and relies on atoi() returning 0 for non-numeric strings like "auto"/"curve" to reach the new branches. This works by accident but is brittle and hard to follow.

  2. curve mode doesn't take effect immediatelyauto calls fan_reset_to_auto() but curve only writes config without applying the fan curve. If the daemon isn't running, the command has no immediate effect.

  3. Redundant atoi() call — In the argc == 4 branch, gpu_index = atoi(argv[1]) is called again even though it was already set at line 216.

  4. Help table formatting broken — The manual row is missing the | separator.

  5. Inconsistent error handling — Some error paths call gpu_shutdown(); return 1; while others just fall through.

Suggested approach

Instead of appending new branches after the speed validation, handle mode keywords inside the existing argc == 3 block, before treating argv[2] as a speed:

} else if (argc == 3) {
    gpu_index = atoi(argv[1]);
    if (gpu_index >= 0 && gpu_index < (int)device_count) {
        if (strcmp(argv[2], "auto") == 0) {
            // ... per-GPU auto
        } else if (strcmp(argv[2], "curve") == 0) {
            // ... per-GPU curve
        } else {
            // original fixed speed logic
            speed = atoi(argv[2]);
            // ...
        }
    }
} else if (argc == 4) {
    // nvfd <gpu_index> manual <speed>
    // ...
}

This makes the intent explicit and doesn't rely on atoi() side effects. Happy to discuss further!

Comment thread src/main.c Outdated
Comment thread src/main.c Outdated
Comment thread src/main.c Outdated
Comment thread src/display.c Outdated
@delwiv
Copy link
Copy Markdown
Contributor Author

delwiv commented Mar 6, 2026

@ricky-chaoju Thank you for the review, I'm going to check this later today

@delwiv
Copy link
Copy Markdown
Contributor Author

delwiv commented Mar 6, 2026

@ricky-chaoju It got quite big from where it started, I took the freedom to add a new root folder : utils, to store the script and systemd unit, which can be optionally installed.

The main.c code is much cleaner now, based on your review.

The original description were updated to reflect actual state of this PR.

Copy link
Copy Markdown
Contributor

@ricky-chaoju ricky-chaoju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing all 5 issues from the previous review. The code is much cleaner now. A few remaining issues:

Issues

1. Silent failure when gpu_index is out of range (src/main.c)

In the argc == 3 block, if gpu_index is invalid (e.g. nvfd 99 auto), the code silently does nothing — no error message, no help output. The else branch for an invalid gpu_index is missing.

Please add an else branch that prints an error like "Invalid GPU index. Use 'nvfd list' to see available GPUs.".

2. Makefileuninstall unconditionally depends on uninstall-utils

uninstall: uninstall-utils

This means make uninstall always runs uninstall-utils, even if the user never installed utils. It's harmless now since it's just rm -f, but if uninstall-utils later adds systemctl disable or similar, it could cause issues. Consider removing the dependency and documenting make uninstall-utils as a separate step.

3. Minor: nvfd-fan-control.sh — no root check

The script writes to /var/run/nvfd-fan-control.lock which requires root, but there's no early root check. If run without root, the error message from exec 200>"$LOCKFILE" won't be very user-friendly.

The rest looks good. Nice work on the curve_apply_to_gpu() addition and the overall restructuring.

Comment thread src/main.c
@delwiv
Copy link
Copy Markdown
Contributor Author

delwiv commented Mar 9, 2026

@ricky-chaoju : I've processed your pertinent remarks, and have a few ones :

  • I see a small problem : we don't keep track of an utils installation, so a user (that would not have RTFM) could uninstall but keep utils unit that would then fail at least until next boot where the After=nvfd.service hook would never trigger.
  • The config.json (or a new utils.json) could track installed utils to automatically remove them when main uninstall is called. It makes sense to uninstall only utils, but none to uninstall main and keep utils.
  • This may be out of scope for this PR

@ricky-chaoju
Copy link
Copy Markdown
Contributor

Thanks for addressing all the previous review items — the code is much cleaner now.

A few remaining nits before we merge:

  1. Duplicate .PHONY in Makefile — There are two .PHONY lines (the new one and the original). Please merge them into one.

  2. nvfd-fan-control.service missing trailing newline — The file doesn't end with a newline character.

  3. argc == 4 error message is misleading — Running nvfd 99 manual 50 (invalid GPU index) prints "Invalid command: manual", but manual is valid — it's the GPU index that's wrong. Please split the checks:

    } else if (argc == 4) {
        int gpu_idx = atoi(argv[1]);
        if (strcmp(argv[2], "manual") != 0) {
            printf("Invalid command: %s\n", argv[2]);
            // ...
        } else if (gpu_idx < 0 || gpu_idx >= (int)device_count) {
            printf("Invalid GPU index. Use 'nvfd list' to see available GPUs.\n");
            // ...
        } else {
            speed = atoi(argv[3]);
            gpu_index = gpu_idx;
        }
    }
  4. README.zh-TW.md has simplified Chinese — "二进制文件" should be "二進位檔案".


Regarding the utils uninstall tracking issue you raised — good catch, I agree that uninstalling main while keeping utils makes no sense and would leave a broken service. Let's handle that in a follow-up issue to keep this PR focused.

@ricky-chaoju
Copy link
Copy Markdown
Contributor

LGTM! Thanks for the fix.

@ricky-chaoju ricky-chaoju merged commit 6905084 into Infinirc:main Mar 9, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants