SFollow #567
base: main
Changes from all commits
d79c5d6
296bd0a
8ec4c24
e439b0f
37f50c4
774c7d3
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,128 @@ | ||
| # FRE SFollow Module | ||
|
|
||
| The FRE SFollow module monitors SLURM job output in real time. It queries job information with `scontrol` and follows the job's standard output file with `less +F`. | ||
|
|
||
| ## Features | ||
|
|
||
| - Query SLURM job information using job ID | ||
| - Extract standard output file path from job information | ||
| - Follow output files in real-time using `less +F` | ||
| - Validation mode to check job status without following | ||
| - Comprehensive error handling and user feedback | ||
|
|
||
| ## Usage | ||
|
|
||
| ### Command Line Interface | ||
|
|
||
| The sfollow module integrates with the FRE CLI framework and can be used as follows: | ||
|
|
||
| ```bash | ||
| # Follow a job's output in real-time | ||
| fre sfollow 12345 | ||
|
|
||
| # Validate a job exists and has output without following | ||
| fre sfollow 12345 --validate | ||
|
|
||
| # Short form of validate flag | ||
| fre sfollow 12345 -v | ||
| ``` | ||
|
|
||
| ### Examples | ||
|
|
||
| ```bash | ||
| # Monitor a running job | ||
| fre sfollow 135549171 | ||
|
|
||
| # Check if a job has output available | ||
| fre sfollow 135549171 --validate | ||
| ``` | ||
|
|
||
| ## Module Structure | ||
|
|
||
| ``` | ||
| fre/sfollow/ | ||
| ├── __init__.py # Module initialization | ||
| ├── sfollow.py # Core functionality | ||
| ├── fresfollow.py # Click CLI interface | ||
| ├── README.md # This file | ||
| └── tests/ # Unit tests | ||
| ├── __init__.py | ||
| ├── test_sfollow.py # Tests for core functionality | ||
| └── test_fresfollow.py # Tests for CLI interface | ||
| ``` | ||
|
|
||
| ## Core Functions | ||
|
|
||
| ### `get_job_info(job_id: str) -> str` | ||
|
Member
this is bordering more on content worthy of the
||
| Retrieves job information from SLURM by running `scontrol show jobid=<job_id>` and returns the command's raw standard output. | ||
|
|
||
| ### `parse_stdout_path(scontrol_output: str) -> Optional[str]` | ||
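| Parses the standard output file path from scontrol output by looking for a line that starts with `StdOut=` (after stripping whitespace). Returns `None` if no such line is found or if the path is empty or `/dev/null`. | ||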
|
|
||
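As a minimal sketch of the parsing rule described above: the sample text below is a hypothetical, heavily abbreviated stand-in for `scontrol` output (real output has many more fields), and the function mirrors the logic in this PR rather than importing it.

```python
from typing import Optional

# Hypothetical, abbreviated scontrol output used only for illustration.
SAMPLE = """JobId=12345 JobName=example
   JobState=RUNNING Reason=None
   StdErr=/home/user/logs/example.o12345
   StdIn=/dev/null
   StdOut=/home/user/logs/example.o12345
"""


def parse_stdout_path(scontrol_output: str) -> Optional[str]:
    """Return the StdOut path, or None if it is missing, empty, or /dev/null."""
    for line in scontrol_output.splitlines():
        line = line.strip()
        if line.startswith("StdOut="):
            # Keep everything after the first '=' so paths containing '=' survive.
            path = line.split("=", 1)[1].strip()
            if path and path != "/dev/null":
                return path
    return None


print(parse_stdout_path(SAMPLE))
print(parse_stdout_path("   StdOut=/dev/null"))
```

Note that `StdIn=/dev/null` is ignored because only lines beginning with `StdOut=` are considered.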
| ### `follow_output_file(file_path: str) -> None` | ||
| Follows the output file with `less +F` for real-time monitoring. The call blocks until the user exits `less` (typically with `q`). | ||
|
|
||
| ### `follow_job_output(job_id: str) -> Tuple[bool, str]` | ||
| Main entry point: gets the job information, parses the stdout path, follows the output file, and returns a `(success, message)` tuple. | ||
|
|
||
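The way the core functions compose can be sketched as follows. This is not the module's implementation: the three callables are passed in as parameters (a hypothetical arrangement chosen here so the flow can run without SLURM or `less`), and only the `FileNotFoundError` branch of the error handling is shown.

```python
from typing import Callable, Optional, Tuple


def follow_job_output_sketch(
    job_id: str,
    get_info: Callable[[str], str],
    parse_path: Callable[[str], Optional[str]],
    follow: Callable[[str], None],
) -> Tuple[bool, str]:
    """Query the job, locate its stdout file, follow it, and report the outcome."""
    try:
        path = parse_path(get_info(job_id))
        if path is None:
            return False, f"Could not find standard output file for job {job_id}"
        follow(path)  # blocks until the viewer exits in the real module
        return True, f"Successfully followed job {job_id} output"
    except FileNotFoundError as exc:
        return False, f"File error: {exc}"


# Stand-ins for the real functions, for illustration only.
ok, message = follow_job_output_sketch(
    "12345",
    get_info=lambda job_id: "StdOut=/tmp/job.out",
    parse_path=lambda text: text.split("=", 1)[1],
    follow=lambda path: None,
)
print(ok, message)
```

A caller such as the CLI can map the boolean to an exit code (0 for success, 1 otherwise), which is what `sfollow_cli` does in this PR.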
| ## Dependencies | ||
|
|
||
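| - `subprocess` - For running system commands | ||
| - `os` - For file system operations | ||
| - `logging` - For status messages | ||
| - `sys` - For CLI exit codes | ||
| - `click` - For command line interface (third-party; the other modules listed are part of the standard library) | ||
| - `typing` - For type hints | ||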
|
|
||
| ## Requirements | ||
|
|
||
| - SLURM workload manager with `scontrol` command available | ||
| - `less` command available for file following | ||
| - Python 3.7+ (`subprocess.run` is called with `capture_output` and `text`, both added in Python 3.7) | ||
|
|
||
| ## Error Handling | ||
|
|
||
| The module includes comprehensive error handling for: | ||
|
|
||
| - SLURM command failures (job not found, permission issues) | ||
| - Missing commands (`scontrol`, `less`) | ||
| - File system errors (output file not found, permission issues) | ||
| - User interrupts (Ctrl+C) | ||
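One detail worth keeping in mind for the SLURM failure case: the third positional argument of `subprocess.CalledProcessError` is `output`, and the exception's string form is built from the command and return code only, so a custom message passed there does not appear in `str(e)`. The sketch below shows that behaviour and one possible alternative; `JobQueryError` and `wrap_scontrol_error` are hypothetical names introduced for illustration, not part of this PR.

```python
import subprocess


class JobQueryError(RuntimeError):
    """Hypothetical exception type that carries a readable message."""


def wrap_scontrol_error(job_id: str, err: subprocess.CalledProcessError) -> JobQueryError:
    """Build an error whose str() includes what scontrol wrote to stderr."""
    detail = (err.stderr or "").strip()
    return JobQueryError(f"Failed to get job information for job {job_id}: {detail}")


# The custom text lands in .output and is absent from str(original).
original = subprocess.CalledProcessError(
    1, ["scontrol", "show", "jobid=12345"], "custom message"
)
print(repr(original.output))
print(str(original))

# Wrapping keeps the scheduler's own explanation visible to the user.
failed = subprocess.CalledProcessError(
    1,
    ["scontrol", "show", "jobid=12345"],
    output="",
    stderr="slurm_load_jobs error: Invalid job id specified\n",
)
print(wrap_scontrol_error("12345", failed))
```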
|
|
||
| ## Testing | ||
|
|
||
| Run the unit tests using: | ||
|
|
||
| ```bash | ||
| # Run all tests in the sfollow module | ||
| python -m pytest fre/sfollow/tests/ | ||
|
|
||
| # Run specific test files | ||
| python -m pytest fre/sfollow/tests/test_sfollow.py | ||
| python -m pytest fre/sfollow/tests/test_fresfollow.py | ||
|
|
||
| # Run with coverage | ||
| python -m pytest fre/sfollow/tests/ --cov=fre.sfollow | ||
| ``` | ||
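Because the core functions shell out to `scontrol` and `less`, unit tests normally need to replace `subprocess.run`. The following is a minimal sketch of that approach using `unittest.mock`; the `get_job_info` defined here is a local stand-in that builds the same command as the function in this PR, and the test name is illustrative rather than taken from the PR's test suite.

```python
import subprocess
from unittest import mock


def get_job_info(job_id: str) -> str:
    """Stand-in that runs the same scontrol command as the module's function."""
    cmd = ["scontrol", "show", f"jobid={job_id}"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def test_get_job_info_builds_expected_command() -> None:
    fake = subprocess.CompletedProcess(
        args=[], returncode=0, stdout="JobId=12345\n", stderr=""
    )
    # Patch subprocess.run so no real scontrol binary is required.
    with mock.patch("subprocess.run", return_value=fake) as run:
        assert get_job_info("12345") == "JobId=12345\n"
    run.assert_called_once_with(
        ["scontrol", "show", "jobid=12345"],
        capture_output=True,
        text=True,
        check=True,
    )


test_get_job_info_builds_expected_command()
```

In the real test files the patch target would be the name as seen from the module under test rather than this local stand-in.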
|
|
||
| ## Development | ||
|
|
||
| When adding new functionality: | ||
|
|
||
| 1. Update the core functions in `sfollow.py` | ||
| 2. Update the CLI interface in `fresfollow.py` if needed | ||
| 3. Add comprehensive unit tests | ||
| 4. Update this README with new features | ||
| 5. Ensure all tests pass | ||
|
|
||
| ## Integration with FRE-CLI | ||
|
|
||
| The sfollow module is integrated into the main FRE-CLI framework through: | ||
|
|
||
| 1. Entry in `fre/fre.py` lazy_subcommands dictionary | ||
| 2. CLI command defined in `fresfollow.py` | ||
| 3. Module structure following FRE-CLI conventions | ||
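The lazy-subcommand registration can be pictured roughly as follows. This sketch assumes the dictionary maps a command name to a dotted import path that is resolved only when the command is invoked, as in Click's documented lazy-loading pattern; the exact keys, values, and loader in `fre/fre.py` are not shown in this PR, so the mapping below is hypothetical.

```python
import importlib
from typing import Any, Dict

# Hypothetical entry; the real value in fre/fre.py may use a different path format.
lazy_subcommands: Dict[str, str] = {
    "sfollow": "fre.sfollow.fresfollow.sfollow_cli",
}


def resolve(import_path: str) -> Any:
    """Import 'package.module.attr' on demand and return the named attribute."""
    module_name, _, attr = import_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Demonstrated with a standard-library target so the sketch runs anywhere.
dumps = resolve("json.dumps")
print(dumps({"job_id": 12345}))
```

The benefit of this arrangement is that importing the top-level CLI does not import every subcommand's module up front.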
|
|
||
| ## Author | ||
|
|
||
| Tom Robinson | ||
| NOAA | GFDL | ||
| 2025 | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| # SFollow module initialization |
|
Member
generally prefer
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| """ | ||
| FRE SFollow CLI - Click command line interface for following SLURM job output | ||
|
|
||
| This module provides the Click CLI interface for the sfollow functionality. | ||
|
|
||
| authored by Tom Robinson | ||
| NOAA | GFDL | ||
| 2025 | ||
| """ | ||
|
|
||
| import sys | ||
| import click | ||
| from .sfollow import follow_job_output | ||
|
|
||
|
|
||
| @click.command(help=click.style(" - follow SLURM job output in real-time", fg=(34, 139, 34))) | ||
| @click.argument('job_id', type=str, required=True) | ||
| @click.option('--validate', '-v', is_flag=True, | ||
| help='Validate job exists and has output file without following') | ||
| def sfollow_cli(job_id, validate): | ||
| """ | ||
| Follow the standard output of a SLURM job in real-time. | ||
|
|
||
| This command queries the SLURM scheduler for job information, | ||
| extracts the standard output file path, and follows it using | ||
| 'less +F' for real-time monitoring. | ||
|
|
||
| :param job_id: The SLURM job ID to follow | ||
| :type job_id: str | ||
| :param validate: If True, only validate the job without following | ||
| :type validate: bool | ||
|
|
||
| Examples: | ||
| fre sfollow 12345 | ||
| fre sfollow 12345 --validate | ||
| """ | ||
| if validate: | ||
| click.echo(f"Validating job {job_id}...") | ||
| # For validation, we'll just try to get the job info and stdout path | ||
| try: | ||
| from .sfollow import get_job_info, parse_stdout_path | ||
| job_info = get_job_info(job_id) | ||
| stdout_path = parse_stdout_path(job_info) | ||
|
|
||
| if stdout_path: | ||
| click.echo(f"✓ Job {job_id} found") | ||
| click.echo(f"✓ Standard output file: {stdout_path}") | ||
| sys.exit(0) | ||
| else: | ||
| click.echo(f"✗ No standard output file found for job {job_id}") | ||
| sys.exit(1) | ||
| except Exception as e: | ||
| click.echo(f"✗ Error validating job {job_id}: {e}") | ||
| sys.exit(1) | ||
| else: | ||
| # Follow the job output | ||
| success, message = follow_job_output(job_id) | ||
|
|
||
| if success: | ||
| click.echo(f"✓ {message}") | ||
| sys.exit(0) | ||
| else: | ||
| click.echo(f"✗ {message}") | ||
| sys.exit(1) | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| sfollow_cli() |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,140 @@ | ||
| """ | ||
| FRE SFollow - Monitor SLURM job output in real-time | ||
|
|
||
| This module provides functionality to follow the standard output of SLURM jobs. | ||
| It queries job information using scontrol and then follows the output file using less +F. | ||
|
|
||
| authored by Tom Robinson | ||
| NOAA | GFDL | ||
| 2025 | ||
| """ | ||
|
|
||
| import os | ||
| import subprocess | ||
| import logging | ||
| from typing import Optional, Tuple | ||
|
|
||
|
|
||
| def get_job_info(job_id: str) -> str: | ||
| """ | ||
| Retrieve job information from SLURM using scontrol command. | ||
|
|
||
| :param job_id: The SLURM job ID to query | ||
| :type job_id: str | ||
| :raises subprocess.CalledProcessError: If scontrol command fails | ||
| :raises FileNotFoundError: If scontrol command is not found | ||
| :return: Raw output from scontrol show jobid command | ||
| :rtype: str | ||
|
|
||
| .. note:: This function requires scontrol to be available in the system PATH | ||
| """ | ||
| try: | ||
| cmd = ["scontrol", "show", f"jobid={job_id}"] | ||
| result = subprocess.run(cmd, capture_output=True, text=True, check=True) | ||
| return result.stdout | ||
| except subprocess.CalledProcessError as e: | ||
| raise subprocess.CalledProcessError( | ||
| e.returncode, | ||
| e.cmd, | ||
| output=e.output, | ||
| stderr=f"Failed to get job information for job {job_id}: {e.stderr}" | ||
| ) from e | ||
| except FileNotFoundError as exc: | ||
| raise FileNotFoundError( | ||
| "scontrol command not found. Please ensure SLURM is installed and in PATH." | ||
| ) from exc | ||
|
|
||
|
|
||
| def parse_stdout_path(scontrol_output: str) -> Optional[str]: | ||
| """ | ||
| Parse the standard output file path from scontrol output. | ||
|
|
||
| :param scontrol_output: Raw output from scontrol show jobid command | ||
| :type scontrol_output: str | ||
| :return: Path to the standard output file, or None if not found | ||
| :rtype: Optional[str] | ||
|
|
||
| .. note:: This function looks for a line starting with 'StdOut=' and extracts the file path; an empty path or '/dev/null' yields None | ||
| """ | ||
| for line in scontrol_output.split('\n'): | ||
| line = line.strip() | ||
| if line.startswith('StdOut='): | ||
| # Split on '=' and take everything after the first '=' | ||
| parts = line.split('=', 1) | ||
| if len(parts) == 2: | ||
| stdout_path = parts[1].strip() | ||
| # Handle case where path might be /dev/null or other special cases | ||
| if stdout_path and stdout_path != '/dev/null': | ||
| return stdout_path | ||
| return None | ||
|
|
||
|
|
||
| def follow_output_file(file_path: str) -> None: | ||
| """ | ||
| Follow the output file using less +F command. | ||
|
|
||
| :param file_path: Path to the file to follow | ||
| :type file_path: str | ||
| :raises FileNotFoundError: If the output file doesn't exist | ||
| :raises subprocess.CalledProcessError: If less command fails | ||
|
|
||
| .. note:: This function uses 'less +F' which follows the file and updates in real-time | ||
| .. warning:: This function will block until the user exits less (typically with 'q') | ||
| """ | ||
| if not os.path.exists(file_path): | ||
| raise FileNotFoundError(f"Output file not found: {file_path}") | ||
|
|
||
| try: | ||
| # Use less +F to follow the file | ||
| subprocess.run(["less", "+F", file_path], check=True) | ||
| except subprocess.CalledProcessError as e: | ||
| raise subprocess.CalledProcessError( | ||
| e.returncode, | ||
| e.cmd, | ||
| output=e.output, | ||
| stderr=f"Failed to follow output file {file_path}" | ||
| ) from e | ||
| except FileNotFoundError as exc: | ||
| raise FileNotFoundError( | ||
| "less command not found. Please ensure less is installed and in PATH." | ||
| ) from exc | ||
|
|
||
|
|
||
| def follow_job_output(job_id: str) -> Tuple[bool, str]: | ||
| """ | ||
| Main function to follow a SLURM job's standard output. | ||
|
|
||
| This function combines getting job information, parsing the stdout path, | ||
| and following the output file. | ||
|
|
||
| :param job_id: The SLURM job ID to follow | ||
| :type job_id: str | ||
| :return: Tuple of (success, message) indicating whether the operation succeeded | ||
| :rtype: Tuple[bool, str] | ||
|
|
||
| .. note:: This is the main entry point for the follow functionality | ||
| """ | ||
| try: | ||
| # Get job information | ||
| job_info = get_job_info(job_id) | ||
|
|
||
| # Parse stdout path | ||
| stdout_path = parse_stdout_path(job_info) | ||
|
|
||
| if stdout_path is None: | ||
| return False, f"Could not find standard output file for job {job_id}" | ||
|
|
||
| logging.info(f"Following output file: {stdout_path}") | ||
thomas-robinson marked this conversation as resolved.
|
||
| logging.info("Press 'q' to quit, Ctrl+C to interrupt following") | ||
thomas-robinson marked this conversation as resolved.
|
||
|
|
||
| # Follow the output file | ||
| follow_output_file(stdout_path) | ||
|
|
||
| return True, f"Successfully followed job {job_id} output" | ||
|
|
||
| except subprocess.CalledProcessError as e: | ||
| return False, f"SLURM error: {e}" | ||
| except FileNotFoundError as e: | ||
| return False, f"File error: {e}" | ||
| except KeyboardInterrupt: | ||
| return True, f"Following job {job_id} interrupted by user" | ||
| except Exception as e: | ||
| return False, f"Unexpected error: {e}" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| # SFollow tests initialization |
some of this is not needed, like the module structure. or the listed dependencies that all come from the python standard library.