refactor: use API to fetch Hp index by sahiljhawar · Pull Request #53 · GFZ/SWVO

sahiljhawar · 2026-02-13T09:54:06Z

No description provided.

Copilot

Pull request overview

This pull request refactors the HP index data fetching mechanism from FTP-based downloads to using the GFZ JSON API. The main change replaces FTP file downloads and text parsing with direct API calls that return JSON data. However, the PR also includes unrelated enhancements to the logging system.

Changes:

Migrated HP index data source from FTP file downloads to GFZ JSON API (https://kp.gfz.de/app/json/)
Refactored data processing to parse JSON responses instead of text files
Enhanced logging system with file output support and additional configuration options

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

File	Description
swvo/io/hp/gfz.py	Replaced FTP download logic with API calls; updated data processing to handle JSON format
tests/io/hp/test_hp.py	Updated tests to mock API requests instead of FTP operations; adjusted test data format
swvo/logger.py	Added file logging support, changed level parameter from int to str, added line numbers to log format

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-13T09:58:41Z

+        data_total = pd.DataFrame(
+            {f"Hp{self.index_number}": json_data[f"Hp{self.index_number}"]},


Missing error handling: If the API response doesn't contain the expected keys ('datetime' or f'Hp{self.index_number}'), this will raise a KeyError. Consider adding validation to check if the expected keys exist in json_data before accessing them, and raise a more informative error or log a warning if they're missing.

Suggested change

data_total = pd.DataFrame(

{f"Hp{self.index_number}": json_data[f"Hp{self.index_number}"]},

hp_key = f"Hp{self.index_number}"

required_keys = ["datetime", hp_key]

missing_keys = [key for key in required_keys if key not in json_data]

if missing_keys:

logger.warning(

"Missing expected key(s) %s in JSON file %s; returning empty DataFrame",

", ".join(missing_keys),

json_file,

)

return data_total

data_total = pd.DataFrame(

{hp_key: json_data[hp_key]},

Copilot · 2026-02-13T09:58:42Z

+            {f"Hp{self.index_number}": json_data[f"Hp{self.index_number}"]},
+            index=pd.to_datetime(json_data["datetime"], utc=True),
+        )
+        data_total.index = data_total.index.tz_convert("UTC")


Redundant timezone conversion: Line 296 converts the timezone to 'UTC', but line 294 already creates the datetime index with utc=True, which makes it UTC-aware. The tz_convert operation is unnecessary and can be removed.

Suggested change

data_total.index = data_total.index.tz_convert("UTC")

Copilot · 2026-02-13T09:58:42Z

+def setup_logging(level: str = "INFO", log_file: Optional[Path] = None, file_mode: str = "w"):
+    """Setup logging for the swvo package and root logger.
+
+    Parameters
+    ----------
+    level : str, optional
+        Logging level, by default is INFO
+    log_file : Path, optional
+        Path to log file. If None, only console logging is enabled.If provided, logs will be written to both console and file., by default None
+    """
+    try:
+        level = getattr(logging, level.upper())
+    except AttributeError:
+        raise ValueError(f"Invalid logging level: {level}. Use one of DEBUG, INFO, WARNING, ERROR, CRITICAL.")


The setup_logging function signature has changed from accepting a logging level constant (int) to accepting a string. This is a breaking API change. Existing code calling setup_logging(level=logging.INFO) will break. Consider maintaining backward compatibility by accepting both int and str types for the level parameter.

Copilot · 2026-02-13T09:58:42Z

+def setup_logging(level: str = "INFO", log_file: Optional[Path] = None, file_mode: str = "w"):
+    """Setup logging for the swvo package and root logger.
+
+    Parameters
+    ----------
+    level : str, optional
+        Logging level, by default is INFO
+    log_file : Path, optional
+        Path to log file. If None, only console logging is enabled.If provided, logs will be written to both console and file., by default None
+    """
+    try:
+        level = getattr(logging, level.upper())
+    except AttributeError:
+        raise ValueError(f"Invalid logging level: {level}. Use one of DEBUG, INFO, WARNING, ERROR, CRITICAL.")
+
+    # Configure root logger so all loggers inherit the formatting
+    root_logger = logging.getLogger()
+
+    # Check if already configured
+    if any(isinstance(h, logging.StreamHandler) for h in root_logger.handlers):
        return
-    handler = logging.StreamHandler(sys.stdout)
-    handler.setFormatter(
-        _ColorFormatter(
-            "[%(levelname)-8s] %(asctime)s - %(name)s - %(message)s",
-            datefmt="%Y-%m-%d %H:%M:%S",
-        )
-    )
-
-    logger.setLevel(level)
-    logger.addHandler(handler)
+
+    # Console handler with colors
+    log_format = "[%(levelname)-8s] %(asctime)s - %(name)s:%(lineno)d - %(message)s"
+    datefmt = "%Y-%m-%d %H:%M:%S"
+
+    console_handler = logging.StreamHandler(sys.stdout)
+    console_handler.setFormatter(_ColorFormatter(log_format, datefmt=datefmt))
+
+    root_logger.setLevel(level)
+    root_logger.addHandler(console_handler)
+
+    if log_file:
+        file_handler = logging.FileHandler(log_file, mode=file_mode)
+        file_handler.setFormatter(logging.Formatter(log_format, datefmt=datefmt))
+        root_logger.addHandler(file_handler)


The logger.py changes appear to be unrelated to the PR title "refactor: use API to fetch Hp index". These changes introduce new functionality (file logging support, file_mode parameter, line number in log format) that should be in a separate PR or the PR description should be updated to mention these improvements.

Copilot · 2026-02-13T09:58:42Z

+            data = response.json()
+            logger.debug(f"Downloaded data: {len(data)} records")

-            with open(local_path, "wb") as f:
-                ftp.retrbinary(f"RETR {filename}", f.write)
+            # Save the JSON response to a temporary file for processing
+            output_file = temporary_dir / f"hp_data_{start_time.year}.json"
+            with open(output_file, "w") as f:
+                json.dump(data, f)


Missing validation: The code doesn't check if the API response contains valid data before saving it. If the API returns an empty response or an error message in JSON format, this could result in invalid data being saved and processed. Consider validating that the response contains the expected keys and non-empty data arrays before saving.

Copilot · 2026-02-13T09:58:43Z

+            json_data = json.load(f)
+
+        data_total = pd.DataFrame(
+            {f"Hp{self.index_number}": json_data[f"Hp{self.index_number}"]},


Column name mismatch: The processed data uses 'Hp30' as the column name (capitalized), but the rest of the codebase expects 'hp30' (lowercase). The _read_single_file method at line 313 uses str(self.index) which is lowercase ('hp30'), and the read method at line 215 also expects self.index in lowercase. This inconsistency will cause issues when combining DataFrames. The column name should use self.index instead of f"Hp{self.index_number}" to maintain consistency.

Suggested change

{f"Hp{self.index_number}": json_data[f"Hp{self.index_number}"]},

{str(self.index): json_data[f"Hp{self.index_number}"]},

Copilot · 2026-02-13T09:58:43Z

-        mock_ftp.retrbinary.assert_called()
-        mock_ftp.quit.assert_called()
+        # Verify the API was called with requests
+        assert mocker.patch("swvo.io.hp.gfz.requests.get").called or True


This assertion will always evaluate to True due to the 'or True' clause, making the test meaningless. Either verify the mock was called properly by checking mock_get.called (without the 'or True'), or remove this assertion if the URL verification test at line 167 is sufficient.

Copilot · 2026-02-13T09:58:43Z

+    level : str, optional
+        Logging level, by default is INFO
+    log_file : Path, optional
+        Path to log file. If None, only console logging is enabled.If provided, logs will be written to both console and file., by default None


Missing documentation for the 'file_mode' parameter. The parameter is declared in the function signature but not documented in the docstring.

Suggested change

Path to log file. If None, only console logging is enabled.If provided, logs will be written to both console and file., by default None

Path to log file. If None, only console logging is enabled. If provided, logs will be written to both console and file., by default None

file_mode : str, optional

File open mode used for the log file handler (passed to :class:`logging.FileHandler`), by default "w".

Copilot · 2026-02-13T09:58:43Z

+    level : str, optional
+        Logging level, by default is INFO
+    log_file : Path, optional
+        Path to log file. If None, only console logging is enabled.If provided, logs will be written to both console and file., by default None


Spelling error: Missing space in the docstring. Should be "If None, only console logging is enabled. If provided, logs will be written to both console and file."

Suggested change

Path to log file. If None, only console logging is enabled.If provided, logs will be written to both console and file., by default None

Path to log file. If None, only console logging is enabled. If provided, logs will be written to both console and file., by default None

Copilot · 2026-02-13T09:58:44Z

+            response.raise_for_status()
+
+            data = response.json()
+            logger.debug(f"Downloaded data: {len(data)} records")


Potential data loss: The debug log message attempts to get the length of 'data' which is a dictionary returned from response.json(). This will only count the number of keys (likely 2: 'datetime' and 'Hp30'), not the number of records. Consider using len(data.get('datetime', [])) or len(data.get(f'Hp{self.index_number}', [])) to log the actual number of data records.

Suggested change

logger.debug(f"Downloaded data: {len(data)} records")

records_count = len(data.get("datetime", []))

logger.debug(f"Downloaded data: {records_count} records")

sahiljhawar added 2 commits February 13, 2026 10:52

refactor: use API to fetch Hp index

2476aec

fix(logger): use a root logger and log_file in setup_logging

8e55164

Copilot AI review requested due to automatic review settings February 13, 2026 09:54

sahiljhawar closed this Feb 13, 2026

Copilot started reviewing on behalf of sahiljhawar February 13, 2026 09:54 View session

Copilot AI reviewed Feb 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: use API to fetch Hp index#53

refactor: use API to fetch Hp index#53
sahiljhawar wants to merge 2 commits into
GFZ:mainfrom
sahiljhawar:logger-with-file

sahiljhawar commented Feb 13, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 13, 2026

Uh oh!

Copilot AI Feb 13, 2026

Uh oh!

Copilot AI Feb 13, 2026

Uh oh!

Copilot AI Feb 13, 2026

Uh oh!

Copilot AI Feb 13, 2026

Uh oh!

Copilot AI Feb 13, 2026

Uh oh!

Copilot AI Feb 13, 2026

Uh oh!

Copilot AI Feb 13, 2026

Uh oh!

Copilot AI Feb 13, 2026

Uh oh!

Copilot AI Feb 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		data_total = pd.DataFrame(
		{f"Hp{self.index_number}": json_data[f"Hp{self.index_number}"]},

-        data_total = pd.DataFrame(
-            {f"Hp{self.index_number}": json_data[f"Hp{self.index_number}"]},
+        hp_key = f"Hp{self.index_number}"
+        required_keys = ["datetime", hp_key]
+        missing_keys = [key for key in required_keys if key not in json_data]
+        if missing_keys:
+            logger.warning(
+                "Missing expected key(s) %s in JSON file %s; returning empty DataFrame",
+                ", ".join(missing_keys),
+                json_file,
+            )
+            return data_total
+        data_total = pd.DataFrame(
+            {hp_key: json_data[hp_key]},

	{f"Hp{self.index_number}": json_data[f"Hp{self.index_number}"]},
	{str(self.index): json_data[f"Hp{self.index_number}"]},

-        Path to log file. If None, only console logging is enabled.If provided, logs will be written to both console and file., by default None
+        Path to log file. If None, only console logging is enabled. If provided, logs will be written to both console and file., by default None
+    file_mode : str, optional
+        File open mode used for the log file handler (passed to :class:`logging.FileHandler`), by default "w".

	Path to log file. If None, only console logging is enabled.If provided, logs will be written to both console and file., by default None
	Path to log file. If None, only console logging is enabled. If provided, logs will be written to both console and file., by default None

	logger.debug(f"Downloaded data: {len(data)} records")
	records_count = len(data.get("datetime", []))
	logger.debug(f"Downloaded data: {records_count} records")

Conversation

sahiljhawar commented Feb 13, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants