Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add authenticated IPFS RPC connections to IPFSStore #35

Merged
merged 18 commits into from
Mar 20, 2025

Conversation

TheGreatAlgo
Copy link
Collaborator

@TheGreatAlgo TheGreatAlgo commented Mar 13, 2025

Summary by CodeRabbit

  • New Features

    • Enhanced authentication options for secure IPFS API interactions. Users can now authenticate using basic credentials, bearer tokens, or API keys with customizable header support.
    • Improved IPFS configuration in the GitHub Actions workflow to support multiple authentication methods.
  • Tests

    • Added a new test function to validate the behavior of the IPFSStore with various authentication methods, ensuring correct handling of valid and invalid credentials.

Copy link

coderabbitai bot commented Mar 13, 2025

Walkthrough

This pull request enhances the IPFSStore class by introducing new authentication parameters in its constructor. The modifications include support for basic authentication, bearer token, API key, and a customizable API key header. A new private method, _get_request_headers, constructs the required request headers based on the provided credentials. Additionally, the save method has been updated to utilize these headers and authentication parameters when interacting with the IPFS API. A new test function for authentication handling and dataset operations has also been added. Furthermore, the GitHub Actions workflow has been updated to install and configure Nginx as a reverse proxy for the IPFS service.

Changes

File Change Summary
py_hamt/store.py Updated the IPFSStore.__init__ method to accept basic_auth, bearer_token, api_key, and api_key_header parameters; added a new private method _get_request_headers to create authentication headers; modified the save method to build the URL separately and include the constructed headers and authentication parameters in the API call.
tests/test_zarr_ipfs.py Updated the random_zarr_dataset fixture to include scope="module"; added a new test function test_authenticated_gateway to validate authentication handling and dataset operations with the updated IPFSStore.
.github/workflows/run-checks.yaml Added steps to install and configure Nginx as a reverse proxy for the IPFS service, enforcing API key authentication.

Possibly related PRs

  • Add network stats to ipfs store #31: The changes in the main PR enhance the IPFSStore class with new authentication methods, while the retrieved PR adds debugging capabilities to the same class, indicating a direct modification to the constructor and methods of IPFSStore.
  • Zarr encryption with transform stage #25: The changes in the main PR, which enhance authentication methods in the IPFSStore class, are related to the new test function in the retrieved PR that tests the behavior of IPFSStore with various authentication methods.
  • Allow for specifying whether to pin in IPFSStore #10: The changes in the main PR enhance the IPFSStore class with new authentication methods, while the retrieved PR modifies the same class to introduce a pinning feature; thus, they are related through modifications to the same class and methods.

Suggested reviewers

  • TheGreatAlgo

Poem

In the code fields where I happily roam,
I found new keys to call my home.
Tokens and headers in a neat parade,
Secure hops across each line are made.
I nibble bugs and skip the rest—
Coding’s a garden, and I'm truly blessed!
Hop on, dear coder, and do your best!

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@codecov-commenter
Copy link

codecov-commenter commented Mar 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (70dc3d3) to head (dba091f).

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #35   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         4           
  Lines          447       461   +14     
=========================================
+ Hits           447       461   +14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70dc3d3 and 0f17eaa.

📒 Files selected for processing (1)
  • py_hamt/store.py (3 hunks)
🧰 Additional context used
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag add-authentication
py_hamt/store.py

[error] 124-126: Coverage failure: total of 99 is less than fail-under=100. Missing coverage on lines 124 and 126.

🔇 Additional comments (4)
py_hamt/store.py (4)

79-84: Authentication parameters well-structured and documented

The addition of multiple authentication options (basic auth, bearer token, API key) with clear type hints and defaults is a good approach. This provides flexibility for users to choose the authentication method that best suits their IPFS API setup.


108-116: Well-documented authentication attributes

The authentication-related attributes are properly initialized and documented with helpful docstrings that explain their purpose.


144-155: Authentication integration looks good

The updates to the save method correctly integrate the new authentication capabilities by using the headers from _get_request_headers() and the basic authentication when available. This implementation properly separates the URL building from the request execution.


124-126:

❓ Verification inconclusive

Missing test coverage for authentication logic

The CI pipeline shows that lines 124 and 126 lack test coverage, bringing the total coverage to 99% (below the required 100%). These lines handle the bearer token and API key authentication logic.

You should add test cases that cover both authentication methods:

  1. Testing with a bearer token
  2. Testing with an API key

🏁 Script executed:

#!/bin/bash
# Search for existing tests for IPFSStore to see what needs to be extended
rg -A 5 "test.*IPFSStore" --type=python

Length of output: 72


Authentication Test Coverage Missing – Please Add Necessary Tests

The CI pipeline indicates that the authentication logic in py_hamt/store.py (lines 124–126) isn’t exercised by any tests. In your current setup, the initial search for tests using:

rg -A 5 "test.*IPFSStore" --type=python

produced an error (“unrecognized file type: python”). This suggests that either the grep parameters need adjustment or — more critically — that there are no test cases covering this functionality.

Action Items:

  • Add test cases for both branches of the authentication logic:
    • Bearer Token Authentication: Ensure that when a bearer token is provided, the header "Authorization": "Bearer <token>" is correctly set.
    • API Key Authentication: Verify that when an API key is provided, the header defined by self.api_key_header is properly assigned.
  • Revise the test search command if needed (for example, using a glob pattern such as below) to validate the existence of tests in your repository:
    rg -g "*.py" -A5 "IPFSStore"

Please confirm that tests for both methods are added and ensure they are executed in the CI pipeline to boost coverage to 100%.

🧰 Tools
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag add-authentication

[error] 124-126: Coverage failure: total of 99 is less than fail-under=100. Missing coverage on lines 124 and 126.

Copy link
Collaborator

@abidsikder abidsikder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR looks good so far, small and functional, the best qualities for a PR.

The tests are currently failing since this does not include code coverage and so does not rigorously test the functionality provided. A test file should be created that creates a temporary authenticated gateway and then creates an IPFSStore that can talk to that authenticated gateway and then tests some ordinary HAMT key setting and retrieval functionality.

py_hamt/store.py Outdated
Comment on lines 118 to 127
def _get_request_headers(self) -> dict[str, str]:
"""Helper method to construct authentication headers"""
headers = {}

# Apply authentication based on provided credentials
if self.bearer_token:
headers["Authorization"] = f"Bearer {self.bearer_token}"
elif self.api_key:
headers[self.api_key_header] = self.api_key
return headers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is not used elsewhere, nor does it abstract away any complex functionality, for code style I would prefer it including right next to the headers variable initialization in line 147

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/test_zarr_ipfs.py (3)

142-180: Consider reducing code duplication with test_encryption

This new test function duplicates most of the logic from test_encryption with the only difference being the authentication parameters. Consider refactoring to reduce duplication, perhaps by creating a helper function that takes authentication parameters.

-def test_authenticated_gateway(random_zarr_dataset: tuple[str, xr.Dataset]):
-    zarr_path, expected_ds = random_zarr_dataset
-    test_ds = xr.open_zarr(zarr_path)
-
-    with pytest.raises(ValueError, match="Encryption key is not 32 bytes"):
-        create_zarr_encryption_transformers(bytes(), bytes())
-
-    encryption_key = bytes(32)
-    # Encrypt only precipitation, not temperature
-    encrypt, decrypt = create_zarr_encryption_transformers(
-        encryption_key, header="sample-header".encode(), exclude_vars=["temp"]
-    )
-    hamt = HAMT(
-        store=IPFSStore(
-            api_key="test",
-            bearer_token="Test", 
-            basic_auth=("test", "test")
-        ), transformer_encode=encrypt, transformer_decode=decrypt
-    )
-    test_ds.to_zarr(store=hamt, mode="w")
-
-    hamt.make_read_only()
-    loaded_ds = xr.open_zarr(store=hamt)
-    xr.testing.assert_identical(loaded_ds, expected_ds)
-
-    # Now trying to load without a decryptor, xarray should be able to read the metadata and still perform operations on the unencrypted variable
-    print("Attempting to read and print metadata of partially encrypted zarr")
-    ds = xr.open_zarr(
-        store=HAMT(store=IPFSStore(          
-            api_key="test",
-            bearer_token="Test",
-            basic_auth=("test", "test")
-        ), root_node_id=hamt.root_node_id, read_only=True)
-    )
-    print(ds)
-    assert ds.temp.sum() == expected_ds.temp.sum()
-    # We should be unable to read precipitation values which are still encrypted
-    with pytest.raises(Exception):
-        ds.precip.sum()
+def _test_encryption_with_store(random_zarr_dataset: tuple[str, xr.Dataset], store_params: dict = None):
+    """Helper function to test encryption with different store configurations"""
+    zarr_path, expected_ds = random_zarr_dataset
+    test_ds = xr.open_zarr(zarr_path)
+
+    with pytest.raises(ValueError, match="Encryption key is not 32 bytes"):
+        create_zarr_encryption_transformers(bytes(), bytes())
+
+    encryption_key = bytes(32)
+    # Encrypt only precipitation, not temperature
+    encrypt, decrypt = create_zarr_encryption_transformers(
+        encryption_key, header="sample-header".encode(), exclude_vars=["temp"]
+    )
+    
+    store_params = store_params or {}
+    store = IPFSStore(**store_params)
+    
+    hamt = HAMT(
+        store=store, transformer_encode=encrypt, transformer_decode=decrypt
+    )
+    test_ds.to_zarr(store=hamt, mode="w")
+
+    hamt.make_read_only()
+    loaded_ds = xr.open_zarr(store=hamt)
+    xr.testing.assert_identical(loaded_ds, expected_ds)
+
+    # Now trying to load without a decryptor, xarray should be able to read the metadata and still perform operations on the unencrypted variable
+    print("Attempting to read and print metadata of partially encrypted zarr")
+    read_store = IPFSStore(**store_params)
+    ds = xr.open_zarr(
+        store=HAMT(store=read_store, root_node_id=hamt.root_node_id, read_only=True)
+    )
+    print(ds)
+    assert ds.temp.sum() == expected_ds.temp.sum()
+    # We should be unable to read precipitation values which are still encrypted
+    with pytest.raises(Exception):
+        ds.precip.sum()
+
+def test_encryption(random_zarr_dataset: tuple[str, xr.Dataset]):
+    """Test encryption without authentication"""
+    _test_encryption_with_store(random_zarr_dataset)
+
+def test_authenticated_gateway(random_zarr_dataset: tuple[str, xr.Dataset]):
+    """Test encryption with authentication"""
+    auth_params = {
+        "api_key": "test",
+        "bearer_token": "Test",
+        "basic_auth": ("test", "test")
+    }
+    _test_encryption_with_store(random_zarr_dataset, auth_params)
🧰 Tools
🪛 Ruff (0.8.2)

179-179: pytest.raises(Exception) should be considered evil

(B017)


155-159: Test authentication methods separately

Currently, you're testing all authentication methods (API key, bearer token, and basic auth) simultaneously. This doesn't verify that each method works independently. Consider adding separate tests for each authentication method to ensure comprehensive coverage.

def test_api_key_authentication(random_zarr_dataset):
    auth_params = {"api_key": "test"}
    _test_encryption_with_store(random_zarr_dataset, auth_params)

def test_bearer_token_authentication(random_zarr_dataset):
    auth_params = {"bearer_token": "Test"}
    _test_encryption_with_store(random_zarr_dataset, auth_params)

def test_basic_auth_authentication(random_zarr_dataset):
    auth_params = {"basic_auth": ("test", "test")}
    _test_encryption_with_store(random_zarr_dataset, auth_params)

179-180: Use a more specific exception type

Using pytest.raises(Exception) is considered too broad as it will catch any exception, not just the specific one you're expecting. This could mask bugs where the wrong type of exception is raised.

-    with pytest.raises(Exception):
-        ds.precip.sum()
+    # Replace 'Exception' with the specific exception type that's expected, e.g.:
+    with pytest.raises((ValueError, RuntimeError)):
+        ds.precip.sum()
🧰 Tools
🪛 Ruff (0.8.2)

179-179: pytest.raises(Exception) should be considered evil

(B017)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4244a20 and 34d89a1.

📒 Files selected for processing (2)
  • py_hamt/store.py (4 hunks)
  • tests/test_zarr_ipfs.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • py_hamt/store.py
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_zarr_ipfs.py

179-179: pytest.raises(Exception) should be considered evil

(B017)

🔇 Additional comments (1)
tests/test_zarr_ipfs.py (1)

155-174:

❓ Verification inconclusive

Address uncovered lines in store.py

According to the PR objectives, there are 2 uncovered lines in py_hamt/store.py. Consider adding tests that specifically target these uncovered code paths to improve the coverage back to 100%.


🏁 Script executed:

#!/bin/bash
# Script to identify the uncovered lines in store.py
# and analyze what authentication scenarios might be missing

# Check if we can locate the coverage report
fd -t f -e xml "coverage" || echo "No coverage report found"

# Look at the store.py file to understand the authentication logic
cat py_hamt/store.py | grep -A 30 "def _get_request_headers"

# Check for potential conditional branches in authentication logic
ast-grep --pattern 'if $condition { $$$body }' py_hamt/store.py

Length of output: 154


Attention: Uncovered authentication branches in store.py – Manual verification needed
The current investigation did not produce output showing the specific uncovered lines in py_hamt/store.py (notably within the _get_request_headers function). Could you please manually verify the following?

  • Extract and review the entire _get_request_headers function to pinpoint the two uncovered conditional branches in the authentication logic.
  • Confirm whether both the token and basic authentication scenarios have corresponding tests. If not, please add tests to cover these paths and bring the coverage to 100%.

For your inspection, please run these additional shell scripts:

Once you’ve confirmed the uncovered paths and updated the tests accordingly, we can revisit this review item.


🏁 Script executed:

#!/bin/bash
# Extract the _get_request_headers function from py_hamt/store.py for manual review
sed -n '/def _get_request_headers/,/^def /p' py_hamt/store.py

# Additionally, output the conditional branches within that function (if/else statements)
sed -n '/def _get_request_headers/,/^def /p' py_hamt/store.py | grep -E 'if |else'

Length of output: 148


Action Required: Manual Verification of Uncovered Code Paths in store.py
The initial automated checks did not reveal the contents of the _get_request_headers function or its conditional branches. Please manually verify the following in py_hamt/store.py:

  • Confirm that the _get_request_headers function exists and check its implementation for conditional branches (e.g., branches handling token authentication vs. basic authentication).
  • Identify the two uncovered lines mentioned in the report and ensure that tests cover both scenarios.
  • If either authentication scenario is missing corresponding tests, add tests to cover these branches and restore full (100%) coverage.

To aid with this verification, please execute the following additional shell script to try to extract the function and its conditionals:

#!/bin/bash
# Verify the presence and content of the _get_request_headers function in py_hamt/store.py

# Locate the function definition
rg "def\s+_get_request_headers" py_hamt/store.py

# Extract the function content (from its definition until an empty line or next function definition)
sed -n '/def _get_request_headers/,/^\s*$/p' py_hamt/store.py

# Optionally, highlight conditional statements within that function to inspect the authentication branches
sed -n '/def _get_request_headers/,/^\s*$/p' py_hamt/store.py | grep -E 'if |else'

After reviewing and ensuring that both authentication scenarios are properly tested, update the tests to cover any uncovered paths as needed.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
tests/test_zarr_ipfs.py (2)

142-179: New test for authenticated gateway has code duplication

This new test function largely duplicates the logic from test_encryption, making the same assertions and following the same structure, but with added authentication parameters. Consider refactoring to reduce duplication, possibly using pytest's parameterization.

@pytest.mark.parametrize("auth_params", [
    {},  # No authentication
    {
        "api_key": "test",
        "bearer_token": "Test",
        "basic_auth": ("test", "test")
    }  # With authentication
])
def test_encryption_with_options(random_zarr_dataset: tuple[str, xr.Dataset], auth_params):
    zarr_path, expected_ds = random_zarr_dataset
    test_ds = xr.open_zarr(zarr_path)

    with pytest.raises(ValueError, match="Encryption key is not 32 bytes"):
        create_zarr_encryption_transformers(bytes(), bytes())

    encryption_key = bytes(32)
    # Encrypt only precipitation, not temperature
    encrypt, decrypt = create_zarr_encryption_transformers(
        encryption_key, header="sample-header".encode(), exclude_vars=["temp"]
    )
    hamt = HAMT(
        store=IPFSStore(**auth_params), transformer_encode=encrypt, transformer_decode=decrypt
    )
    test_ds.to_zarr(store=hamt, mode="w")

    hamt.make_read_only()
    loaded_ds = xr.open_zarr(store=hamt)
    xr.testing.assert_identical(loaded_ds, expected_ds)

    # Now trying to load without a decryptor, xarray should be able to read the metadata and still perform operations on the unencrypted variable
    print("Attempting to read and print metadata of partially encrypted zarr")
    ds = xr.open_zarr(
        store=HAMT(store=IPFSStore(**auth_params), root_node_id=hamt.root_node_id, read_only=True)
    )
    print(ds)
    assert ds.temp.sum() == expected_ds.temp.sum()
    # We should be unable to read precipitation values which are still encrypted
    with pytest.raises(Exception):
        ds.precip.sum()
🧰 Tools
🪛 Ruff (0.8.2)

178-178: pytest.raises(Exception) should be considered evil

(B017)


155-159: Consider testing each authentication method separately

The test is using all authentication methods (API key, bearer token, basic auth) simultaneously, which doesn't validate that each method works independently. Consider testing each authentication method separately to ensure proper implementation.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 34d89a1 and 1b7df72.

📒 Files selected for processing (1)
  • tests/test_zarr_ipfs.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_zarr_ipfs.py

178-178: pytest.raises(Exception) should be considered evil

(B017)

🔇 Additional comments (1)
tests/test_zarr_ipfs.py (1)

142-179: Missing verification for authentication functionality

The test doesn't verify that authentication is actually working. Consider adding assertions to validate that requests are made with proper headers or test with invalid credentials to ensure authentication failures are handled correctly.

Would you like to add a test case with invalid credentials to verify that authentication failure is handled properly?

🧰 Tools
🪛 Ruff (0.8.2)

178-178: pytest.raises(Exception) should be considered evil

(B017)

Comment on lines 170 to 173
store=HAMT(store=IPFSStore(
api_key="test",
basic_auth=("test", "test")
), root_node_id=hamt.root_node_id, read_only=True)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Authentication parameters inconsistency

The authentication parameters used for reading (lines 170-173) differ from those used for writing (lines 155-159). The read operation omits the bearer_token. This inconsistency may lead to test failures if all authentication parameters are required.

    ds = xr.open_zarr(
        store=HAMT(store=IPFSStore(          
            api_key="test",
+           bearer_token="Test",
            basic_auth=("test", "test")
        ), root_node_id=hamt.root_node_id, read_only=True)
    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
store=HAMT(store=IPFSStore(
api_key="test",
basic_auth=("test", "test")
), root_node_id=hamt.root_node_id, read_only=True)
ds = xr.open_zarr(
store=HAMT(store=IPFSStore(
api_key="test",
bearer_token="Test",
basic_auth=("test", "test")
), root_node_id=hamt.root_node_id, read_only=True)
)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/test_zarr_ipfs.py (1)

154-160: 🛠️ Refactor suggestion

Test authentication methods separately

To ensure proper test coverage of the authentication functionality, test each authentication method separately rather than combining them. This helps isolate which authentication method is being used and confirms each works independently.

    hamt = HAMT(
        store=IPFSStore(
-            api_key="test",
-            bearer_token="Test",
-            basic_auth=("test", "test")
+            bearer_token="Test"  # Test only bearer token first
        ), transformer_encode=encrypt, transformer_decode=decrypt
    )
🧹 Nitpick comments (3)
tests/test_zarr_ipfs.py (3)

181-183: Authentication parameters inconsistency

When reading the data back, no authentication parameters are provided to IPFSStore, but authentication parameters were used when writing the data. For comprehensive testing, consider using the same authentication parameters for both write and read operations.

    ds = xr.open_zarr(
-        store=HAMT(store=IPFSStore(), root_node_id=hamt.root_node_id, read_only=True)
+        store=HAMT(store=IPFSStore(api_key="test"), root_node_id=hamt.root_node_id, read_only=True)
    )

142-188: Reduce code duplication with test_encryption

The test_authenticated_gateway function duplicates much of the code from test_encryption. Consider refactoring to extract common testing logic into helper functions to improve maintainability.

def _test_encryption_with_store(random_zarr_dataset, store_factory):
    """Helper function to test encryption with a given store factory function"""
    zarr_path, expected_ds = random_zarr_dataset
    test_ds = xr.open_zarr(zarr_path)
    
    encryption_key = bytes(32)
    encrypt, decrypt = create_zarr_encryption_transformers(
        encryption_key, header="sample-header".encode(), exclude_vars=["temp"]
    )
    hamt = HAMT(store=store_factory(), transformer_encode=encrypt, transformer_decode=decrypt)
    
    # Test write and read operations
    test_ds.to_zarr(store=hamt, mode="w")
    hamt.make_read_only()
    loaded_ds = xr.open_zarr(store=hamt)
    xr.testing.assert_identical(loaded_ds, expected_ds)
    
    # Test without decryptor
    ds = xr.open_zarr(
        store=HAMT(store=store_factory(no_auth=True), root_node_id=hamt.root_node_id, read_only=True)
    )
    assert ds.temp.sum() == expected_ds.temp.sum()
    with pytest.raises(ValueError):  # Specify expected exception
        ds.precip.sum()
🧰 Tools
🪛 Ruff (0.8.2)

187-187: pytest.raises(Exception) should be considered evil

(B017)


142-188:

❓ Verification inconclusive

Add targeted tests for authentication mechanisms

The new test_authenticated_gateway function is a good addition for testing authenticated operations, but it doesn't explicitly verify the core authentication functionality. The PR shows a coverage gap in py_hamt/store.py (99.78% coverage with 1 line uncovered).

Consider adding more targeted test assertions to verify that the authentication parameters are being correctly used to generate request headers. This could involve mocking the request layer or adding specific tests for each authentication method.


🏁 Script executed:

#!/bin/bash
# Check for uncovered lines in store.py related to authentication
fd -e py "store\.py$" | xargs grep -n -A 5 -B 5 "_get_request_headers"

Length of output: 70


Action: Request Manual Verification for Authentication Header Tests

The current automated check did not locate any instances of _get_request_headers in py_hamt/store.py. This raises uncertainty about the targeted authentication logic mentioned in the review comment. Before finalizing the test improvements, please manually verify the following:

  • Confirm whether a function (such as _get_request_headers or an equivalent mechanism) exists in py_hamt/store.py that handles the generation of authentication request headers.
  • If such a function exists, add tests that explicitly validate the construction of authentication headers (e.g., by mocking the request layer to check for the correct inclusion of API key, bearer token, and basic auth).
  • If the authentication header creation is implemented differently, adjust test assertions accordingly to ensure comprehensive coverage.
🧰 Tools
🪛 Ruff (0.8.2)

187-187: pytest.raises(Exception) should be considered evil

(B017)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b7df72 and 30d13bb.

📒 Files selected for processing (1)
  • tests/test_zarr_ipfs.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_zarr_ipfs.py

187-187: pytest.raises(Exception) should be considered evil

(B017)

🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag add-authentication
tests/test_zarr_ipfs.py

[error] 1-1: Ruff formatting check failed. The file would be reformatted.

🔇 Additional comments (1)
tests/test_zarr_ipfs.py (1)

187-188: Specify the expected exception type

Using pytest.raises(Exception) is too broad and may catch unexpected exceptions. Specify the exact exception type that should be raised when attempting to access encrypted data.

-    with pytest.raises(Exception):
+    with pytest.raises((ValueError, KeyError, RuntimeError)):  # Replace with actual exception types
        ds.precip.sum()
🧰 Tools
🪛 Ruff (0.8.2)

187-187: pytest.raises(Exception) should be considered evil

(B017)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/test_zarr_ipfs.py (1)

178-188: 🛠️ Refactor suggestion

Specify the expected exception type instead of using pytest.raises(Exception)

Using pytest.raises(Exception) is too broad and may catch unexpected exceptions. This is also flagged by the static analysis tool.

Specify the exact exception type that should be raised when attempting to access encrypted data:

-with pytest.raises(Exception):
+with pytest.raises((ValueError, KeyError, RuntimeError)):  # Replace with actual exception types
    ds.precip.sum()
🧰 Tools
🪛 Ruff (0.8.2)

187-187: pytest.raises(Exception) should be considered evil

(B017)

🧹 Nitpick comments (5)
.github/workflows/run-checks.yaml (1)

28-42: Authentication setup looks good but consider adding a small comment about restarts

The GitHub workflow now correctly configures multiple authentication methods for IPFS, which is essential for testing the new authentication features. The implementation sets up Bearer token, API key, and Basic authentication with test credentials.

Consider adding a comment to clarify whether the IPFS daemon needs to be restarted after these configuration changes, as this might be useful information for developers running these commands locally:

 # Basic auth with test:test
 echo "test:test" > ~/.ipfs/credentials
 BASIC_AUTH_BASE64=$(base64 ~/.ipfs/credentials | tr -d '\n')
 ipfs config --json Gateway.HTTPHeaders.Basic-Auth "{\"Authorization\": \"Basic $BASIC_AUTH_BASE64\"}"

+# Note: These changes take effect immediately without needing to restart the IPFS daemon
tests/test_zarr_ipfs.py (1)

142-177: Authentication test setup looks good but inconsistent auth methods in tests

The new test for authenticated gateways effectively tests the authentication functionality. However, the test only verifies a subset of authentication combinations.

Consider testing each authentication method independently to ensure they all work correctly on their own:

 # Test with just api key
 hamt = HAMT(
     store=IPFSStore(api_key="test"),
     transformer_encode=encrypt,
     transformer_decode=decrypt,
 )
 test_ds.to_zarr(store=hamt, mode="w")

 hamt.make_read_only()
 loaded_ds = xr.open_zarr(store=hamt)
 xr.testing.assert_identical(loaded_ds, expected_ds)
+
+# Test with just bearer token
+hamt = HAMT(
+    store=IPFSStore(bearer_token="Test"),
+    transformer_encode=encrypt,
+    transformer_decode=decrypt,
+)
+test_ds.to_zarr(store=hamt, mode="w")
+
+hamt.make_read_only()
+loaded_ds = xr.open_zarr(store=hamt)
+xr.testing.assert_identical(loaded_ds, expected_ds)
+
+# Test with just basic auth
+hamt = HAMT(
+    store=IPFSStore(basic_auth=("test", "test")),
+    transformer_encode=encrypt,
+    transformer_decode=decrypt,
+)
+test_ds.to_zarr(store=hamt, mode="w")
+
+hamt.make_read_only()
+loaded_ds = xr.open_zarr(store=hamt)
+xr.testing.assert_identical(loaded_ds, expected_ds)
py_hamt/store.py (3)

139-149: Authentication implementation looks solid but consider using the load method too

The authentication implementation in the save method looks solid. However, the load method doesn't use the authentication credentials when retrieving data from the gateway.

Consider applying the same authentication logic to the load method to ensure consistent authentication behavior:

def load(self, id: CID) -> bytes:  # type: ignore
    """
    This retrieves the raw bytes by calling the provided HTTP gateway.
    ```python
    from py_hamt import IPFSStore
    from multiformats import CID

    ipfs_store = IPFSStore()
    # This is just an example CID
    cid = CID.decode("bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze")
    data = ipfs_store.load(cid)
    print(data)
    ```
    """
+    # Apply authentication based on provided credentials
+    headers = {}
+    if self.bearer_token:
+        headers["Authorization"] = f"Bearer {self.bearer_token}"
+    elif self.api_key:
+        headers[self.api_key_header] = self.api_key
+
+    # Prepare request parameters
+    auth = self.basic_auth if self.basic_auth else None
+
    response = requests.get(
-        f"{self.gateway_uri_stem}/ipfs/{str(id)}", timeout=self.timeout_seconds
+        f"{self.gateway_uri_stem}/ipfs/{str(id)}",
+        headers=headers,
+        auth=auth,
+        timeout=self.timeout_seconds
    )
    response.raise_for_status()

    if self.debug:
        self.total_received += len(response.content)  # type: ignore

    return response.content

139-145: Consider extracting authentication header logic into a helper method

The authentication header logic is only applied in the save method. If it's used in other methods (like the load method as suggested), it would be better to extract it into a helper method.

Extract the authentication header logic into a helper method:

+def _get_request_headers(self) -> dict[str, str]:
+    """Helper method to construct authentication headers"""
+    headers = {}
+    if self.bearer_token:
+        headers["Authorization"] = f"Bearer {self.bearer_token}"
+    elif self.api_key:
+        headers[self.api_key_header] = self.api_key
+    return headers

def save(self, data: bytes, cid_codec: str) -> CID:
    # ...
    pin_string: str = "true" if self.pin_on_add else "false"

-    # Apply authentication based on provided credentials
-    headers = {}
-    if self.bearer_token:
-        headers["Authorization"] = f"Bearer {self.bearer_token}"
-    elif self.api_key:
-        headers[self.api_key_header] = self.api_key
+    # Get authentication headers
+    headers = self._get_request_headers()

    # Prepare request parameters
    url = f"{self.rpc_uri_stem}/api/v0/add?hash={self.hasher}&pin={pin_string}"
    auth = self.basic_auth if self.basic_auth else None

141-144: Consider handling the case when both bearer_token and api_key are provided

The current implementation uses the bearer token only if it's present, and falls back to the API key. It doesn't handle the case when both are provided, which might be confusing.

Add a comment to clarify the precedence when both authentication methods are provided:

# Apply authentication based on provided credentials
headers = {}
if self.bearer_token:
+    # Bearer token takes precedence over API key if both are provided
    headers["Authorization"] = f"Bearer {self.bearer_token}"
elif self.api_key:
    headers[self.api_key_header] = self.api_key
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30d13bb and 318583b.

📒 Files selected for processing (3)
  • .github/workflows/run-checks.yaml (1 hunks)
  • py_hamt/store.py (4 hunks)
  • tests/test_zarr_ipfs.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_zarr_ipfs.py

187-187: pytest.raises(Exception) should be considered evil

(B017)

🔇 Additional comments (3)
tests/test_zarr_ipfs.py (1)

181-183:

❓ Verification inconclusive

Authentication parameters inconsistency when reading

The code instantiates IPFSStore for reading without any authentication parameters, which differs from the write operation.

If the IPFS gateway requires authentication for reading as well, this might lead to test failures. Consider modifying the code to include authentication parameters when reading:

ds = xr.open_zarr(
-    store=HAMT(store=IPFSStore(), root_node_id=hamt.root_node_id, read_only=True)
+    store=HAMT(store=IPFSStore(api_key="test"), root_node_id=hamt.root_node_id, read_only=True)
)

Let's verify if the Gateway is configured to require authentication for reads:


🏁 Script executed:

#!/bin/bash
# Check if the Gateway is configured to require authentication for reads
ipfs config show | grep -A 10 Gateway.HTTPHeaders

Length of output: 146


Attention: Manual Verification Needed for IPFS Authentication on Read

The current test instantiates IPFSStore for reading without authentication parameters, which is inconsistent with the write operation. Our attempt to verify the gateway configuration using ipfs config show failed due to the command not being found. This prevents automated confirmation of whether the IPFS gateway requires authentication for reads.

  • Action Required: Please manually verify if the IPFS gateway requires authentication when reading. If authentication is indeed required, update the instantiation accordingly. For example:
    ds = xr.open_zarr(
-       store=HAMT(store=IPFSStore(), root_node_id=hamt.root_node_id, read_only=True)
+       store=HAMT(store=IPFSStore(api_key="test"), root_node_id=hamt.root_node_id, read_only=True)
    )
py_hamt/store.py (2)

65-74: Authentication docstring is thorough and well explained

The updated class docstring clearly explains the authentication features that have been added. This is a good improvement to the documentation.


84-122: Well-structured authentication parameters with clear documentation

The authentication parameters are well-structured and have clear documentation explaining their purpose. Good job on providing a customizable API key header option.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
tests/test_zarr_ipfs.py (1)

197-198: ⚠️ Potential issue

Specify the expected exception type

Using pytest.raises(Exception) is too broad and may catch unexpected exceptions. Specify the exact exception type that should be raised when attempting to save with an invalid API key.

-    with pytest.raises(Exception):
+    with pytest.raises(requests.exceptions.HTTPError, match="401 Client Error: Unauthorized"):
        test_ds.to_zarr(store=hamt, mode="w")

This change will help ensure that the test is specifically catching HTTP 401 errors from the Nginx auth configuration, which is what we expect when using an invalid API key.

🧰 Tools
🪛 Ruff (0.8.2)

197-197: pytest.raises(Exception) should be considered evil

(B017)

🧹 Nitpick comments (2)
.github/workflows/run-checks.yaml (2)

29-63: Great addition of Nginx reverse proxy with auth!

The configuration correctly sets up a reverse proxy on port 5002 with API key authentication using the X-API-Key header. This aligns well with the authentication features being added to the IPFSStore class.

One minor issue to fix:

-          sudo nginx -t
+          sudo nginx -t

Remove the trailing whitespace at line 63.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 63-63: trailing spaces

(trailing-spaces)


64-74: Ensure testing of all authentication methods

The Nginx setup is configured to test API key authentication, but the PR mentions supporting basic authentication and bearer token as well. Consider extending this configuration to test all authentication methods being added to the IPFSStore class.

You could add additional location blocks in the Nginx configuration to test different auth methods, like:

+              location /api/v0/basic/ {
+                  # Basic Auth
+                  auth_basic "Restricted Area";
+                  auth_basic_user_file /etc/nginx/.htpasswd;
+                  
+                  # Proxy to IPFS RPC API
+                  proxy_pass http://127.0.0.1:5001/api/v0/;
+                  proxy_set_header Host $host;
+                  proxy_set_header X-Real-IP $remote_addr;
+              }
+              
+              location /api/v0/bearer/ {
+                  # Bearer token auth
+                  if ($http_authorization != "Bearer test") {
+                      return 401 "Unauthorized: Invalid bearer token";
+                  }
+                  
+                  # Proxy to IPFS RPC API
+                  proxy_pass http://127.0.0.1:5001/api/v0/;
+                  proxy_set_header Host $host;
+                  proxy_set_header X-Real-IP $remote_addr;
+              }

And include a command to create the basic auth credentials:

sudo sh -c "echo -n 'test:' > /etc/nginx/.htpasswd"
sudo sh -c "openssl passwd -apr1 test >> /etc/nginx/.htpasswd"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 318583b and 0c3a7a1.

📒 Files selected for processing (2)
  • .github/workflows/run-checks.yaml (1 hunks)
  • tests/test_zarr_ipfs.py (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/run-checks.yaml

[error] 63-63: trailing spaces

(trailing-spaces)

🪛 Ruff (0.8.2)
tests/test_zarr_ipfs.py

197-197: pytest.raises(Exception) should be considered evil

(B017)


208-208: pytest.raises(Exception) should be considered evil

(B017)

🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag add-authentication
tests/test_zarr_ipfs.py

[error] 189-189: requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: http://127.0.0.1:5002/api/v0/add?hash=blake3&pin=false

🔇 Additional comments (2)
tests/test_zarr_ipfs.py (2)

142-170: Good test coverage for API key authentication

This test case effectively validates the IPFSStore with API key authentication against the Nginx reverse proxy. The encryption setup and verification are also well-structured.


201-209:

✅ Verification successful

Inconsistent authentication parameters

The authentication parameters used for reading (lines 203) differ from those used for writing (lines 190-193). The read operation omits the authentication parameters. This works because you're reading from the local store, but it creates confusing test logic.

Additionally, the exception type for line 208 should be specified more precisely.

    # Now trying to load without a decryptor, xarray should be able to read the metadata and still perform operations on the unencrypted variable
    print("Attempting to read and print metadata of partially encrypted zarr")
    ds = xr.open_zarr(
-        store=HAMT(store=IPFSStore(), root_node_id=hamt.root_node_id, read_only=True)
+        store=HAMT(
+            store=IPFSStore(rpc_uri_stem="http://127.0.0.1:5002", api_key="test"),
+            root_node_id=hamt.root_node_id, 
+            read_only=True
+        )
    )
    print(ds)
    assert ds.temp.sum() == expected_ds.temp.sum()
    # We should be unable to read precipitation values which are still encrypted
-    with pytest.raises(Exception):
+    with pytest.raises((ValueError, RuntimeError), match="Unable to decode array"):
        ds.precip.sum()

🌐 Web query:

What exceptions does xarray raise when trying to access encrypted or corrupt data arrays?

💡 Result:

When working with encrypted or corrupt data arrays in Xarray, several exceptions can be raised depending on the underlying file format and the specific issue encountered:

Common Exceptions

  1. OSError/IOError
    Occurs when Xarray cannot access the file due to encryption, incorrect permissions, or file corruption. This is raised by low-level I/O libraries like netCDF4 or h5py.
    Example:

    OSError: Unable to open file (truncated file: e.g., corrupt data)
  2. ValueError
    Triggered during operations like concatenation with inconsistent coordinates or invalid chunks.

    • Example from invalid chunk configuration:
      ValueError: Expected all values to be non-negative. Got (-1, -1) instead.
      ([Source][1])
  3. NetCDF/HDF5 Backend Errors
    Errors from libraries like netCDF4 or h5py propagate through Xarray:

    • RuntimeError for invalid NetCDF4/HDF5 file structures.
    • HDF5ExtError for corrupted HDF5 files.
  4. KeyError
    Raised when accessing variables/dimensions that don’t exist in a corrupt dataset.

  5. NotImplementedError
    Occurs for unsupported operations, such as indexing with method="nearest" on slices.


Handling Encrypted Data

Xarray does not natively support encrypted data. Attempting to open encrypted files without decryption will result in OSError or format-specific parsing errors (e.g., NetCDF: Unknown file format).


Corrupt Data Scenarios

NetCDF/HDF5 Files

  • During Read:

    ValueError: Failed to decode time variable 't' (CF conventions violation)

    Caused by invalid metadata or corrupted time encoding.

  • During Write:

    HDF5ExtError: Unable to create dataset (corrupted chunk)

    Common when writing to partially corrupted Zarr/NetCDF stores ([Example][6]).

Zarr Stores

  • Invalid Chunks:
    ValueError: Chunk size must be a positive integer
  • Consolidated Metadata Issues:
    KeyError: 'Variable not found in consolidated metadata'

Debugging Tips

  1. Use xr.open_dataset(..., engine='h5netcdf') with decode_cf=False to bypass metadata validation.
  2. Check for missing/corrupt chunks using xr.Dataset.chunks.
  3. For Zarr, verify metadata with zarr.open_group(..., mode='r').

For example, corrupted coordinates might raise:

ValueError: Could not find dimension coordinates to concatenate datasets

([Source][4]).

Always handle exceptions with try/except blocks when working with unreliable data sources.

Citations:


Ensure consistent authentication and precise exception checking in test

  • Update the HAMT initialization in tests/test_zarr_ipfs.py (line 203) to include the same authentication parameters used during writing (as set in lines 190–193). This avoids confusion by making the test logic consistent.
  • Replace the broad pytest.raises(Exception) (line 208) with a more precise check that catches ValueError and RuntimeError and verifies the error message contains "Unable to decode array". This aligns with Xarray’s behavior when encountering encrypted (or corrupt) data.
🧰 Tools
🪛 Ruff (0.8.2)

208-208: pytest.raises(Exception) should be considered evil

(B017)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
tests/test_zarr_ipfs.py (2)

200-210: 🛠️ Refactor suggestion

Specify the expected exception type.

Using pytest.raises(Exception) is too broad and may catch unexpected exceptions. Specify the exact exception type that should be raised when authentication fails.

-    with pytest.raises(Exception):
+    with pytest.raises((requests.exceptions.HTTPError)):
        hamt = HAMT(
            store=IPFSStore(      
                rpc_uri_stem = "http://127.0.0.1:5002",     
                api_key="badKey",
            ),
            transformer_encode=encrypt,
            transformer_decode=decrypt,
        )

Remember to add import requests at the top of the file if not already present.

🧰 Tools
🪛 Ruff (0.8.2)

201-201: pytest.raises(Exception) should be considered evil

(B017)


211-220: 🛠️ Refactor suggestion

Specify the expected exception type for encrypted data access.

Using pytest.raises(Exception) is too broad when testing encryption functionality. Specify the exact exception type that should be raised when attempting to access encrypted data.

-    with pytest.raises(Exception):
+    with pytest.raises((ValueError, RuntimeError)):  # Replace with actual exception type
        ds.precip.sum()
🧰 Tools
🪛 Ruff (0.8.2)

219-219: pytest.raises(Exception) should be considered evil

(B017)

🧹 Nitpick comments (1)
.github/workflows/run-checks.yaml (1)

80-80: Fix trailing whitespace.

There is a trailing whitespace at the end of line 80 that should be removed.

-      sudo nginx -t
+      sudo nginx -t
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 80-80: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c3a7a1 and 65d33e7.

📒 Files selected for processing (2)
  • .github/workflows/run-checks.yaml (1 hunks)
  • tests/test_zarr_ipfs.py (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/run-checks.yaml

[error] 80-80: trailing spaces

(trailing-spaces)

🪛 Ruff (0.8.2)
tests/test_zarr_ipfs.py

201-201: pytest.raises(Exception) should be considered evil

(B017)


219-219: pytest.raises(Exception) should be considered evil

(B017)

🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag add-authentication
tests/test_zarr_ipfs.py

[error] 1-1: Ruff formatting check failed. 1 file would be reformatted.

🔇 Additional comments (4)
.github/workflows/run-checks.yaml (2)

28-80: Authentication configuration is well-structured and comprehensive.

The Nginx configuration properly implements multiple authentication methods (API key, Bearer token, Basic Auth) for the IPFS API proxy. This aligns well with the authentication functionality being added to the IPFSStore class.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 80-80: trailing spaces

(trailing-spaces)


81-91: Startup sequence is correctly ordered.

The sequence of starting Nginx, restarting IPFS daemon, and adding a sleep interval ensures that both services are properly initialized before running tests.

tests/test_zarr_ipfs.py (2)

142-170: Test implementation for API key authentication is robust.

The test correctly verifies that the API key authentication works by successfully writing and reading data through the authenticated gateway.


171-198: Authentication method tests are comprehensive.

The tests for bearer token and basic authentication are well-implemented, ensuring all authentication methods supported by the IPFSStore class are verified.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/test_zarr_ipfs.py (1)

172-172: 🛠️ Refactor suggestion

Specify the expected exception type in pytest.raises()

Using pytest.raises(Exception) is too broad and may catch unexpected exceptions. Specify the exact exception types that you expect to be raised.

-    with pytest.raises(Exception):
+    with pytest.raises((requests.exceptions.HTTPError, RuntimeError)):  # Update with actual exception types

Apply this change to all four instances in the test (lines 172, 199, 225, and 243).

Also applies to: 199-199, 225-225, 243-243

🧰 Tools
🪛 Ruff (0.8.2)

172-172: pytest.raises(Exception) should be considered evil

(B017)

🧹 Nitpick comments (1)
tests/test_zarr_ipfs.py (1)

174-180: Consider testing API key with custom header

The test covers API key authentication with the default header, but doesn't test with a custom api_key_header parameter. Consider adding a test case for this scenario to ensure complete coverage.

+    # Test with API key and custom header
+    hamt = HAMT(
+        store=IPFSStore(
+            rpc_uri_stem="http://127.0.0.1:5002",
+            api_key="test",
+            api_key_header="X-Custom-API-Key",
+        ),
+        transformer_encode=encrypt,
+        transformer_decode=decrypt,
+    )
+    test_ds.to_zarr(store=hamt, mode="w")
+
+    hamt.make_read_only()
+    loaded_ds = xr.open_zarr(store=hamt)
+    xr.testing.assert_identical(loaded_ds, expected_ds)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65d33e7 and c9be3d4.

📒 Files selected for processing (1)
  • tests/test_zarr_ipfs.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_zarr_ipfs.py

172-172: pytest.raises(Exception) should be considered evil

(B017)


199-199: pytest.raises(Exception) should be considered evil

(B017)


225-225: pytest.raises(Exception) should be considered evil

(B017)


243-243: pytest.raises(Exception) should be considered evil

(B017)

🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag add-authentication
tests/test_zarr_ipfs.py

[error] 1-1: Ruff formatting check failed. 1 file would be reformatted.

🔇 Additional comments (3)
tests/test_zarr_ipfs.py (3)

142-245: LGTM! Good job implementing the authentication test function.

The test covers all three authentication methods: API key, bearer token, and basic auth, including tests for invalid credentials. This provides comprehensive coverage for the authentication functionality.

🧰 Tools
🪛 Ruff (0.8.2)

172-172: pytest.raises(Exception) should be considered evil

(B017)


199-199: pytest.raises(Exception) should be considered evil

(B017)


225-225: pytest.raises(Exception) should be considered evil

(B017)


243-243: pytest.raises(Exception) should be considered evil

(B017)


159-160: Fix spacing around equals signs in keyword arguments

There are inconsistent spaces around the equals signs in keyword arguments. Remove the spaces to match the project's style guidelines.

-            rpc_uri_stem = "http://127.0.0.1:5002",
+            rpc_uri_stem="http://127.0.0.1:5002",

This formatting issue is causing the pipeline failure. Please run ruff format to automatically fix all formatting issues.

Also applies to: 175-176, 187-187, 203-203, 213-213, 229-229


237-239:

❓ Verification inconclusive

Consider adding authentication parameters when reading

The test reads from the IPFS store without providing authentication parameters, which differs from the write operation. Consider adding the same authentication parameters to verify they're required for both operations.


🏁 Script executed:

#!/bin/bash
# Check if authentication is required for reading operations
grep -n "_get_request_headers" $(fd "store.py$") | head -10

Length of output: 118


Review Authentication Consistency Verification

The current test in tests/test_zarr_ipfs.py (lines 237–239) instantiates the IPFS store for reading without supplying authentication parameters. Our initial grep for _get_request_headers in relevant store files (e.g., py_hamt/store.py) did not yield clear evidence regarding the use of authentication in the read path. Please manually verify the following:

  • Confirm whether authentication parameters (or corresponding headers) are indeed required for the read operation, similar to the write path.
  • If authentication is enforced for writing via mechanisms implemented in functions like _get_request_headers, ensure the read operation reflects this requirement for consistency.
  • Update the test or the store instantiation accordingly if the authentication should be applied.

@Faolain Faolain requested review from abidsikder and Faolain March 15, 2025 02:04
@abidsikder abidsikder changed the title fix: add auth Add authenticated IPFS RPC connections to IPFSStore Mar 20, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
py_hamt/store.py (1)

139-155: Authentication integration in the save method looks good, but should consider extracting header logic

The authentication logic is correctly implemented, but consider extracting the header construction into a private method for better maintainability and reusability.

 class IPFSStore(Store):
+    def _get_request_headers(self) -> dict[str, str]:
+        """Helper method to construct authentication headers"""
+        headers = {}
+        if self.bearer_token:
+            headers["Authorization"] = f"Bearer {self.bearer_token}"
+        elif self.api_key:
+            headers[self.api_key_header] = self.api_key
+        return headers

     def save(self, data: bytes, cid_codec: str) -> CID:
         # ...
         pin_string: str = "true" if self.pin_on_add else "false"

-        # Apply authentication based on provided credentials
-        headers = {}
-        if self.bearer_token:
-            headers["Authorization"] = f"Bearer {self.bearer_token}"
-        elif self.api_key:
-            headers[self.api_key_header] = self.api_key
+        # Get authentication headers
+        headers = self._get_request_headers()

         # Prepare request parameters
         url = f"{self.rpc_uri_stem}/api/v0/add?hash={self.hasher}&pin={pin_string}"

Note: I see from past review comments that there was a suggestion both for and against extracting this logic. It's ultimately a matter of code style preference.

.github/workflows/run-checks.yaml (2)

28-80: Good Nginx setup for authentication testing

The Nginx configuration properly sets up a reverse proxy with authentication checks that match the authentication methods implemented in the IPFSStore class.

There's a trailing whitespace on line 80 that should be removed:

- sudo nginx -t
-      
+ sudo nginx -t
+
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 80-80: trailing spaces

(trailing-spaces)


35-72: Security considerations for authentication credentials

Using hardcoded credentials (test) is acceptable for testing, but consider using environment variables or GitHub secrets for a more secure pattern, especially if this might be reused in other environments.

- if ($http_x_api_key = "test") {
+ if ($http_x_api_key = "$ENV{TEST_API_KEY}") {

And similar changes for the other authentication methods.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9be3d4 and dba091f.

📒 Files selected for processing (3)
  • .github/workflows/run-checks.yaml (1 hunks)
  • py_hamt/store.py (4 hunks)
  • tests/test_zarr_ipfs.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_zarr_ipfs.py
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/run-checks.yaml

[error] 80-80: trailing spaces

(trailing-spaces)

🔇 Additional comments (6)
py_hamt/store.py (5)

69-74: Documentation enhancement looks good!

The expanded class docstring clearly describes the new authentication methods supported by the IPFSStore class.


84-89: Authentication parameters implementation looks appropriate

The constructor has been properly extended with well-typed authentication parameters that match the documentation.


113-122: Well-documented instance variables

Each authentication-related instance variable has a clear docstring explaining its purpose.


147-147: Good variable naming for the URL

Moving the URL to a separate variable with a descriptive name improves readability.


182-205:

❓ Verification inconclusive

Consider adding authentication for load method

The load method uses the HTTP Gateway but doesn't include any authentication parameters. If the gateway also requires authentication, consider updating this method to include authentication headers.


🏁 Script executed:

#!/bin/bash
# Check if the gateway URL might need authentication by testing access patterns in tests
rg -A 3 "gateway_uri_stem" tests/

Length of output: 33


Below is a revised review comment. Since the initial search for test references to authentication (using “gateway_uri_stem”) returned no output, and no evidence of authentication headers is seen elsewhere in the codebase, it remains unclear whether the HTTP gateway requires authentication. In light of this, please manually verify if the gateway requires authentication; if it does, consider enhancing the load method to pass authentication headers (or other credentials) to the GET request.

  • Action Point: Confirm with the HTTP gateway documentation or other parts of the project whether authentication is needed.
  • If required: Update the method to include configurable authentication (e.g., by accepting an optional authentication parameter and passing corresponding headers to requests.get).

Attention: Manual verification is advised for the authentication requirement, as current test searches (e.g. for “gateway_uri_stem” and “Authorization”) do not yield evidence of such use.

.github/workflows/run-checks.yaml (1)

81-92: Service restart logic looks good

The steps to restart the services and wait for them to initialize are appropriate and necessary for the tests to run correctly.

@abidsikder abidsikder merged commit 3f5d0c9 into main Mar 20, 2025
2 checks passed
@abidsikder abidsikder deleted the add-authentication branch March 20, 2025 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants