1 change: 1 addition & 0 deletions coveragerc
@@ -8,6 +8,7 @@ omit =
fre/catalog/tests/*
fre/check/*
fre/cmor/tests/*
fre/sfollow/tests/*
fre/list_/tests/*
fre/make/tests/*
fre/pp/tests/*
3 changes: 2 additions & 1 deletion fre/fre.py
@@ -34,7 +34,8 @@
"make": ".make.fremake.make_cli",
"app": ".app.freapp.app_cli",
"cmor": ".cmor.frecmor.cmor_cli",
- "analysis": ".analysis.freanalysis.analysis_cli"},
+ "analysis": ".analysis.freanalysis.analysis_cli",
+ "sfollow": ".sfollow.fresfollow.sfollow_cli"},
help = click.style(
"'fre' is the main CLI click group. It houses the other tool groups as lazy subcommands.",
fg = 'cyan')
128 changes: 128 additions & 0 deletions fre/sfollow/README.md
Review comment (Member):

Some of this is not needed, like the module structure or the listed dependencies, which all come from the Python standard library.

@@ -0,0 +1,128 @@
# FRE SFollow Module

The FRE SFollow module monitors SLURM job output in real time. It queries job information with `scontrol` and follows the job's standard output file with `less +F`.

## Features

- Query SLURM job information using job ID
- Extract standard output file path from job information
- Follow output files in real-time using `less +F`
- Validation mode to check job status without following
- Error handling for SLURM failures, missing commands, and missing output files, with status messages for the user

## Usage

### Command Line Interface

The sfollow module integrates with the FRE CLI framework and can be used as follows:

```bash
# Follow a job's output in real-time
fre sfollow 12345

# Validate a job exists and has output without following
fre sfollow 12345 --validate

# Short form of validate flag
fre sfollow 12345 -v
```

### Examples

```bash
# Monitor a running job
fre sfollow 135549171

# Check if a job has output available
fre sfollow 135549171 --validate
```

## Module Structure

```
fre/sfollow/
├── __init__.py              # Module initialization
├── sfollow.py               # Core functionality
├── fresfollow.py            # Click CLI interface
├── README.md                # This file
└── tests/                   # Unit tests
    ├── __init__.py
    ├── test_sfollow.py      # Tests for core functionality
    └── test_fresfollow.py   # Tests for CLI interface
```

## Core Functions

### `get_job_info(job_id: str) -> str`
Review comment (Member):

This is bordering more on content worthy of the docs/*rst files.

Retrieves job information from SLURM using the `scontrol show jobid` command.

### `parse_stdout_path(scontrol_output: str) -> Optional[str]`
Parses the standard output file path from scontrol output by looking for the line that starts with `StdOut=`. Returns `None` if no such line exists or if the path is `/dev/null`.
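
The parsing rule can be tried without a SLURM installation. The following is a minimal sketch: it restates the parsing logic from `sfollow.py` as a standalone function, and the `sample` text is a hypothetical excerpt (real `scontrol show jobid` output contains many more fields).

```python
from typing import Optional


def parse_stdout_path(scontrol_output: str) -> Optional[str]:
    """Return the StdOut path from scontrol output, or None if absent or /dev/null."""
    for line in scontrol_output.splitlines():
        line = line.strip()
        if line.startswith("StdOut="):
            # Split on the first '=' only, so paths containing '=' survive intact.
            path = line.split("=", 1)[1].strip()
            if path and path != "/dev/null":
                return path
    return None


# Hypothetical excerpt used only for illustration.
sample = """JobId=12345 JobName=demo
   StdErr=/home/user/demo.12345.err
   StdIn=/dev/null
   StdOut=/home/user/demo.12345.out
"""

print(parse_stdout_path(sample))                              # /home/user/demo.12345.out
print(parse_stdout_path("JobId=1\n   StdOut=/dev/null\n"))    # None
```

Because the match uses `startswith` after stripping whitespace, a line such as `StdErr=...` that merely contains a path is never mistaken for the standard output entry.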

### `follow_output_file(file_path: str) -> None`
Follows the output file using `less +F` command for real-time monitoring.

### `follow_job_output(job_id: str) -> Tuple[bool, str]`
Main entry point: chains the three functions above to follow a SLURM job's output and returns a `(success, message)` tuple.

## Dependencies

- `subprocess` - For running system commands
- `os` - For file system operations
- `logging` - For status messages
- `typing` - For type hints
- `click` - For command line interface (the only third-party package)

## Requirements

- SLURM workload manager with `scontrol` command available
- `less` command available for file following
- Python 3.7+ (the code calls `subprocess.run` with `capture_output` and `text`, both added in 3.7)

## Error Handling

The module handles:

- SLURM command failures (job not found, permission issues)
- Missing commands (`scontrol`, `less`)
- File system errors (output file not found, permission issues)
- User interrupts (Ctrl+C)

## Testing

Run the unit tests using:

```bash
# Run all tests in the sfollow module
python -m pytest fre/sfollow/tests/

# Run specific test files
python -m pytest fre/sfollow/tests/test_sfollow.py
python -m pytest fre/sfollow/tests/test_fresfollow.py

# Run with coverage
python -m pytest fre/sfollow/tests/ --cov=fre.sfollow
```

## Development

When adding new functionality:

1. Update the core functions in `sfollow.py`
2. Update the CLI interface in `fresfollow.py` if needed
3. Add comprehensive unit tests
4. Update this README with new features
5. Ensure all tests pass

## Integration with FRE-CLI

The sfollow module is integrated into the main FRE-CLI framework through:

1. Entry in `fre/fre.py` lazy_subcommands dictionary
2. CLI command defined in `fresfollow.py`
3. Module structure following FRE-CLI conventions

## Author

Tom Robinson
NOAA | GFDL
2025
1 change: 1 addition & 0 deletions fre/sfollow/__init__.py
@@ -0,0 +1 @@
# SFollow module initialization
68 changes: 68 additions & 0 deletions fre/sfollow/fresfollow.py
Review comment (Member):

Generally prefer `raise` to `sys.exit`, if possible. Also, this is more logic than most of the click CLI entry points contain; moving it into the function being called would be better.
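
One way to read this suggestion, as a minimal sketch and not the change the author made: the validation branch moves into a plain function that raises on failure, so the click entry point only has to call it. The names `validate_job`, `JobValidationError`, `fake_get_info`, and `fake_parse` are hypothetical; the two callables are injected only so the example runs without SLURM, and in the PR they would be `get_job_info` and `parse_stdout_path`.

```python
from typing import Callable, Optional


class JobValidationError(Exception):
    """Hypothetical exception type for this sketch; not part of the PR."""


def validate_job(
    job_id: str,
    get_info: Callable[[str], str],
    parse_path: Callable[[str], Optional[str]],
) -> str:
    """Return the stdout path recorded for job_id, or raise if there is none."""
    stdout_path = parse_path(get_info(job_id))
    if not stdout_path:
        # Raise instead of calling sys.exit(1); the caller decides how to report it.
        raise JobValidationError(f"No standard output file found for job {job_id}")
    return stdout_path


# Hypothetical stand-ins for get_job_info and parse_stdout_path.
def fake_get_info(job_id: str) -> str:
    return f"StdOut=/tmp/demo.{job_id}.out"


def fake_parse(text: str) -> Optional[str]:
    return text.partition("=")[2] or None


print(validate_job("12345", fake_get_info, fake_parse))  # /tmp/demo.12345.out
```

With this shape, the entry point reduces to a call plus `click.echo`, and an uncaught failure can be converted to `click.ClickException`, which click reports with a non-zero exit status.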

@@ -0,0 +1,68 @@
"""
FRE SFollow CLI - Click command line interface for following SLURM job output

This module provides the Click CLI interface for the sfollow functionality.

authored by Tom Robinson
NOAA | GFDL
2025
"""

import sys
import click
from .sfollow import follow_job_output


@click.command(help=click.style(" - follow SLURM job output in real-time", fg=(34, 139, 34)))
@click.argument('job_id', type=str, required=True)
@click.option('--validate', '-v', is_flag=True,
              help='Validate job exists and has output file without following')
def sfollow_cli(job_id, validate):
    """
    Follow the standard output of a SLURM job in real-time.

    This command queries the SLURM scheduler for job information,
    extracts the standard output file path, and follows it using
    'less +F' for real-time monitoring.

    :param job_id: The SLURM job ID to follow
    :type job_id: str
    :param validate: If True, only validate the job without following
    :type validate: bool

    Examples:
        fre sfollow 12345
        fre sfollow 12345 --validate
    """
    if validate:
        click.echo(f"Validating job {job_id}...")
        # For validation, we'll just try to get the job info and stdout path
        try:
            from .sfollow import get_job_info, parse_stdout_path
            job_info = get_job_info(job_id)
            stdout_path = parse_stdout_path(job_info)

            if stdout_path:
                click.echo(f"✓ Job {job_id} found")
                click.echo(f"✓ Standard output file: {stdout_path}")
                sys.exit(0)
            else:
                click.echo(f"✗ No standard output file found for job {job_id}")
                sys.exit(1)
        except Exception as e:
            click.echo(f"✗ Error validating job {job_id}: {e}")
            sys.exit(1)
    else:
        # Follow the job output
        success, message = follow_job_output(job_id)

        if success:
            click.echo(f"✓ {message}")
            sys.exit(0)
        else:
            click.echo(f"✗ {message}")
            sys.exit(1)


if __name__ == "__main__":
    sfollow_cli()
140 changes: 140 additions & 0 deletions fre/sfollow/sfollow.py
@@ -0,0 +1,140 @@
"""
FRE SFollow - Monitor SLURM job output in real-time

This module provides functionality to follow the standard output of SLURM jobs.
It queries job information using scontrol and then follows the output file using less +F.

authored by Tom Robinson
NOAA | GFDL
2025
"""

import os
import subprocess
import logging
from typing import Optional, Tuple


def get_job_info(job_id: str) -> str:
    """
    Retrieve job information from SLURM using scontrol command.

    :param job_id: The SLURM job ID to query
    :type job_id: str
    :raises subprocess.CalledProcessError: If scontrol command fails
    :raises FileNotFoundError: If scontrol command is not found
    :return: Raw output from scontrol show jobid command
    :rtype: str

    .. note:: This function requires scontrol to be available in the system PATH
    """
    try:
        cmd = ["scontrol", "show", f"jobid={job_id}"]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout
    except subprocess.CalledProcessError as e:
        raise subprocess.CalledProcessError(
            e.returncode,
            e.cmd,
            stderr=f"Failed to get job information for job {job_id}: {e.stderr}",
        ) from e
    except FileNotFoundError as exc:
        raise FileNotFoundError(
            "scontrol command not found. Please ensure SLURM is installed and in PATH."
        ) from exc


def parse_stdout_path(scontrol_output: str) -> Optional[str]:
    """
    Parse the standard output file path from scontrol output.

    :param scontrol_output: Raw output from scontrol show jobid command
    :type scontrol_output: str
    :return: Path to the standard output file, or None if not found
    :rtype: Optional[str]

    .. note:: This function looks for lines starting with 'StdOut=' and extracts the file path
    """
    for line in scontrol_output.split('\n'):
        line = line.strip()
        if line.startswith('StdOut='):
            # Split on '=' and take everything after the first '='
            parts = line.split('=', 1)
            if len(parts) == 2:
                stdout_path = parts[1].strip()
                # Handle case where path might be /dev/null or other special cases
                if stdout_path and stdout_path != '/dev/null':
                    return stdout_path
    return None


def follow_output_file(file_path: str) -> None:
    """
    Follow the output file using less +F command.

    :param file_path: Path to the file to follow
    :type file_path: str
    :raises FileNotFoundError: If the output file doesn't exist
    :raises subprocess.CalledProcessError: If less command fails

    .. note:: This function uses 'less +F' which follows the file and updates in real-time
    .. warning:: This function will block until the user exits less (typically with 'q')
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Output file not found: {file_path}")

    try:
        # Use less +F to follow the file
        subprocess.run(["less", "+F", file_path], check=True)
    except subprocess.CalledProcessError as e:
        raise subprocess.CalledProcessError(
            e.returncode,
            e.cmd,
            stderr=f"Failed to follow output file {file_path}",
        ) from e
    except FileNotFoundError as exc:
        raise FileNotFoundError(
            "less command not found. Please ensure less is installed and in PATH."
        ) from exc


def follow_job_output(job_id: str) -> Tuple[bool, str]:
    """
    Main function to follow a SLURM job's standard output.

    This function combines getting job information, parsing the stdout path,
    and following the output file.

    :param job_id: The SLURM job ID to follow
    :type job_id: str
    :return: Tuple of (success, message) indicating whether the operation succeeded
    :rtype: Tuple[bool, str]

    .. note:: This is the main entry point for the follow functionality
    """
    try:
        # Get job information
        job_info = get_job_info(job_id)

        # Parse stdout path
        stdout_path = parse_stdout_path(job_info)

        if stdout_path is None:
            return False, f"Could not find standard output file for job {job_id}"

        logging.info(f"Following output file: {stdout_path}")
        logging.info("Press 'q' to quit, Ctrl+C to interrupt following")

        # Follow the output file
        follow_output_file(stdout_path)

        return True, f"Successfully followed job {job_id} output"

    except subprocess.CalledProcessError as e:
        return False, f"SLURM error: {e}"
    except FileNotFoundError as e:
        return False, f"File error: {e}"
    except KeyboardInterrupt:
        return True, f"Following job {job_id} interrupted by user"
    except Exception as e:
        return False, f"Unexpected error: {e}"
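
The functions in this file shell out to `scontrol`, so exercising them on a machine without SLURM needs a stand-in for `subprocess.run`. The following is a minimal sketch using `unittest.mock`; it is not taken from the PR's test files (which are not shown here). It uses a trimmed standalone copy of the query, and the helper name `run_with_fake_scontrol` and the sample text are hypothetical.

```python
import subprocess
from unittest import mock


def get_job_info(job_id: str) -> str:
    """Trimmed standalone copy of the scontrol query, without error wrapping."""
    cmd = ["scontrol", "show", f"jobid={job_id}"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def run_with_fake_scontrol(job_id: str, fake_stdout: str) -> str:
    """Call get_job_info with subprocess.run replaced by a canned result."""
    fake = subprocess.CompletedProcess(
        args=["scontrol"], returncode=0, stdout=fake_stdout, stderr=""
    )
    with mock.patch("subprocess.run", return_value=fake) as patched:
        output = get_job_info(job_id)
    # Confirm the command line that would have been sent to SLURM.
    patched.assert_called_once_with(
        ["scontrol", "show", f"jobid={job_id}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return output


print(run_with_fake_scontrol("12345", "   StdOut=/tmp/demo.12345.out\n"))
```

Setting `side_effect=FileNotFoundError` on the patch instead of `return_value` would exercise the missing-`scontrol` path in the same way.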
1 change: 1 addition & 0 deletions fre/sfollow/tests/__init__.py
@@ -0,0 +1 @@
# SFollow tests initialization