Skip to content

feat(cdk): connector builder support for file uploader #503

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

aldogonzalez8
Copy link
Contributor

@aldogonzalez8 aldogonzalez8 commented Apr 23, 2025

What

We want to implement support for the connector builder to communicate about the files that will be uploaded for the streams.

Resolves https://github.com/airbytehq/airbyte-internal-issues/issues/12424

How

Current Implementation:

  • Runtime (emit_connector_builder_messages=False):
    • FileUploader is instantiated with a standard FileWriter, which performs the actual file writing.
  • Connector Builder (emit_connector_builder_messages=True):
    • FileUploader is instantiated with a NoopFileWriter, preventing actual file writes.
    • This FileUploader instance is then wrapped by ConnectorBuilderFileUploader.
    • ConnectorBuilderFileUploader.upload() first calls the wrapped FileUploader.upload() (which populates record.file_reference but doesn't write the file) and then performs the Connector Builder-specific action of copying file_reference attributes into record.data.

Summary by CodeRabbit

  • New Features

    • Introduced a flexible file upload system with support for custom file writing strategies and enhanced file reference metadata in records.
    • Added new file writer options, including a no-operation writer and a local filesystem writer, to control file storage behavior.
    • Improved integration for connector builder workflows, enabling file reference fields to be included in record data.
  • Bug Fixes

    • Enhanced test coverage to ensure correct file reference handling and metadata propagation in connector builder scenarios.

Copy link
Contributor

coderabbitai bot commented Apr 23, 2025

📝 Walkthrough

Walkthrough

This update introduces a modular file uploading architecture by defining new abstract base classes (FileUploader, FileWriter) and several concrete implementations (DefaultFileUploader, ConnectorBuilderFileUploader, LocalFileSystemFileWriter, NoopFileWriter). The control flow for file uploads is refactored to allow injection of different file writing strategies. Type annotations are updated throughout to reflect these changes. A new test is added to verify the correct behavior when emitting connector builder messages, ensuring file reference metadata is copied into record data. Public exports are consolidated via a new __init__.py in the file uploader package.

Changes

File(s) Change Summary
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py,
airbyte_cdk/sources/declarative/extractors/record_selector.py
Updated imports and type annotations to use DefaultFileUploader instead of FileUploader. Refactored logic in create_file_uploader to instantiate and return either a ConnectorBuilderFileUploader or a DefaultFileUploader based on configuration.
airbyte_cdk/sources/declarative/retrievers/file_uploader/__init__.py New module file added to re-export core file uploader and writer classes, defining the package’s public API.
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_uploader.py Introduced new abstract base class FileUploader with an abstract upload method.
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_writer.py Introduced new abstract base class FileWriter with an abstract write method.
airbyte_cdk/sources/declarative/retrievers/file_uploader/default_file_uploader.py Refactored previous FileUploader implementation into DefaultFileUploader, now inheriting from FileUploader and accepting a FileWriter dependency. File writing is delegated to the injected file_writer.
airbyte_cdk/sources/declarative/retrievers/file_uploader/connector_builder_file_uploader.py Added new ConnectorBuilderFileUploader class, which wraps a DefaultFileUploader and copies file reference attributes into record data post-upload.
airbyte_cdk/sources/declarative/retrievers/file_uploader/local_file_system_file_writer.py Added LocalFileSystemFileWriter class implementing file writing to the local filesystem.
airbyte_cdk/sources/declarative/retrievers/file_uploader/noop_file_writer.py Added NoopFileWriter class implementing a no-op file writer that returns a fixed size (-1).
unit_tests/sources/declarative/file/test_file_stream.py Enhanced tests: imported new classes, refactored assertions, and added a test verifying correct file reference metadata propagation when emitting connector builder messages.

Sequence Diagram(s)

sequenceDiagram
    participant ModelToComponentFactory
    participant DefaultFileUploader
    participant ConnectorBuilderFileUploader
    participant FileWriter
    participant Record

    ModelToComponentFactory->>DefaultFileUploader: Instantiate with FileWriter
    ModelToComponentFactory->>ConnectorBuilderFileUploader: (if emit_connector_builder_messages) Wrap DefaultFileUploader
    Record->>DefaultFileUploader: upload(record)
    DefaultFileUploader->>FileWriter: write(file_path, content)
    FileWriter-->>DefaultFileUploader: file_size
    DefaultFileUploader-->>Record: Update file_reference
    ConnectorBuilderFileUploader->>DefaultFileUploader: upload(record)
    ConnectorBuilderFileUploader->>Record: Copy file_reference fields into data
Loading

Would you like to see a more detailed breakdown of the new test logic or perhaps a diagram focusing on the decorator pattern used for the connector builder uploader? Wdyt?


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa0a832 and c8e29d1.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/file/test_file_stream.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • unit_tests/sources/declarative/file/test_file_stream.py
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Analyze (python)
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (6)
airbyte_cdk/sources/declarative/retrievers/file_uploader/noop_file_writer.py (1)

10-15: Implementation looks good!

The NoopFileWriter correctly implements the abstract method from BaseFileWriter and returns 0 to indicate no bytes were written, which aligns with its purpose to avoid actual file operations.

Consider enhancing the method docstring to explain behavior rather than class purpose:

    def write(self, file_path: Path, content: bytes) -> int:
        """
-        Noop file writer
+        No-operation implementation that returns 0 bytes written without 
+        writing any content to the file system.
        """
        return 0
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_writer.py (1)

10-18: Implementation looks good but lacks error handling

The FileWriter correctly implements writing bytes to a file and returns the size. However, it doesn't handle potential file I/O exceptions that might occur (disk full, permission issues, etc).

Would adding some basic error handling provide better diagnostics, wdyt?

def write(self, file_path: Path, content: bytes) -> int:
    """
    Writes the file to the specified location
    """
-    with open(str(file_path), "wb") as f:
-        f.write(content)
-
-    return file_path.stat().st_size
+    try:
+        with open(str(file_path), "wb") as f:
+            f.write(content)
+        
+        return file_path.stat().st_size
+    except (IOError, OSError) as e:
+        logger.error(f"Failed to write file to {file_path}: {str(e)}")
+        raise
airbyte_cdk/sources/declarative/retrievers/file_uploader/__init__.py (1)

1-15: Good module exports

The package initialization file correctly exports all relevant classes. One minor suggestion - would it make sense to organize the __all__ list in the same order as the imports for consistency and readability, wdyt?

__all__ = [
-    "FileUploader",
-    "FileWriter",
-    "NoopFileWriter",
-    "ConnectorBuilderFileUploader",
-    "BaseFileUploader",
-    "BaseFileWriter",
+    "BaseFileUploader",
+    "BaseFileWriter",
+    "ConnectorBuilderFileUploader",
+    "FileUploader",
+    "FileWriter",
+    "NoopFileWriter",
]
airbyte_cdk/sources/declarative/retrievers/file_uploader/base_file_writer.py (1)

14-19: Could we clarify the return value in the docstring?

The write method returns an integer but doesn't specify what this integer represents. Could we update the docstring to clarify that this is the number of bytes written? This would make the contract more explicit for implementers, wdyt?

    def write(self, file_path: Path, content: bytes) -> int:
        """
        Writes the file to the specified location
+       :return: The number of bytes written
        """
        ...
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_uploader.py (1)

90-90: Should we add error handling around file writing?

The file writing operation could fail for various reasons (disk full, permissions, etc.). Would it make sense to add some error handling here? Maybe catch and log specific exceptions or wrap them in a more descriptive custom exception, wdyt?

-            file_size_bytes = self.file_writer.write(full_path, content=response.content)
+            try:
+                file_size_bytes = self.file_writer.write(full_path, content=response.content)
+            except Exception as e:
+                logger.error(f"Failed to write file to {full_path}: {e}")
+                raise ValueError(f"Failed to upload file: {str(e)}") from e
airbyte_cdk/sources/declarative/retrievers/file_uploader/connector_builder_file_uploader.py (1)

22-26: Could we add a docstring to the upload method and handle potential edge cases?

The implementation looks good, but would you consider adding a docstring to explain the behavior more explicitly? Also, what happens if file_reference is None? Could we add a quick check to handle that case, wdyt?

    def upload(self, record: Record) -> None:
+        """
+        Uploads a file using the wrapped file_uploader and copies file_reference attributes to record.data.
+        
+        This enables connector builder to communicate about files without writing them.
+        """
        self.file_uploader.upload(record=record)
+        if record.file_reference is None:
+            return
        for file_reference_key, file_reference_value in record.file_reference.__dict__.items():
            if not file_reference_key.startswith("_"):
                record.data[file_reference_key] = file_reference_value  # type: ignore
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f29eff and 0fbf122.

📒 Files selected for processing (10)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3 hunks)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/base_file_uploader.py (1 hunks)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/base_file_writer.py (1 hunks)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/connector_builder_file_uploader.py (1 hunks)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/file_uploader.py (2 hunks)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/file_writer.py (1 hunks)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/noop_file_writer.py (1 hunks)
  • airbyte_cdk/sources/declarative/yaml_declarative_source.py (2 hunks)
  • unit_tests/sources/declarative/file/test_file_stream.py (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (7)
airbyte_cdk/sources/declarative/retrievers/file_uploader/__init__.py (6)
airbyte_cdk/sources/declarative/retrievers/file_uploader/base_file_uploader.py (1)
  • BaseFileUploader (12-22)
airbyte_cdk/sources/declarative/retrievers/file_uploader/base_file_writer.py (1)
  • BaseFileWriter (9-19)
airbyte_cdk/sources/declarative/retrievers/file_uploader/connector_builder_file_uploader.py (1)
  • ConnectorBuilderFileUploader (14-26)
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_uploader.py (1)
  • FileUploader (32-101)
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_writer.py (1)
  • FileWriter (10-18)
airbyte_cdk/sources/declarative/retrievers/file_uploader/noop_file_writer.py (1)
  • NoopFileWriter (10-15)
airbyte_cdk/sources/declarative/retrievers/file_uploader/noop_file_writer.py (2)
airbyte_cdk/sources/declarative/retrievers/file_uploader/base_file_writer.py (2)
  • BaseFileWriter (9-19)
  • write (15-19)
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_writer.py (1)
  • write (11-18)
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_writer.py (2)
airbyte_cdk/sources/declarative/retrievers/file_uploader/base_file_writer.py (2)
  • BaseFileWriter (9-19)
  • write (15-19)
airbyte_cdk/sources/declarative/retrievers/file_uploader/noop_file_writer.py (1)
  • write (11-15)
airbyte_cdk/sources/declarative/retrievers/file_uploader/base_file_uploader.py (3)
airbyte_cdk/sources/types.py (1)
  • Record (21-72)
airbyte_cdk/sources/declarative/retrievers/file_uploader/connector_builder_file_uploader.py (1)
  • upload (22-26)
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_uploader.py (1)
  • upload (55-101)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (6)
airbyte_cdk/sources/declarative/retrievers/file_uploader/base_file_uploader.py (1)
  • BaseFileUploader (12-22)
airbyte_cdk/sources/declarative/retrievers/file_uploader/connector_builder_file_uploader.py (1)
  • ConnectorBuilderFileUploader (14-26)
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_uploader.py (1)
  • FileUploader (32-101)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
  • FileUploader (2069-2091)
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_writer.py (1)
  • FileWriter (10-18)
airbyte_cdk/sources/declarative/retrievers/file_uploader/noop_file_writer.py (1)
  • NoopFileWriter (10-15)
airbyte_cdk/sources/declarative/retrievers/file_uploader/connector_builder_file_uploader.py (3)
airbyte_cdk/sources/types.py (2)
  • Record (21-72)
  • data (35-36)
airbyte_cdk/sources/declarative/retrievers/file_uploader/base_file_uploader.py (2)
  • BaseFileUploader (12-22)
  • upload (18-22)
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_uploader.py (2)
  • FileUploader (32-101)
  • upload (55-101)
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_uploader.py (5)
airbyte_cdk/sources/declarative/retrievers/file_uploader/base_file_uploader.py (1)
  • BaseFileUploader (12-22)
airbyte_cdk/sources/declarative/retrievers/file_uploader/base_file_writer.py (2)
  • BaseFileWriter (9-19)
  • write (15-19)
airbyte_cdk/sources/declarative/requesters/requester.py (1)
  • Requester (29-144)
airbyte_cdk/sources/declarative/retrievers/file_uploader/noop_file_writer.py (1)
  • write (11-15)
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_writer.py (1)
  • write (11-18)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Analyze (python)
🔇 Additional comments (10)
airbyte_cdk/sources/declarative/retrievers/file_uploader/base_file_uploader.py (1)

11-22: Solid abstract base class definition

The BaseFileUploader abstract base class is well-defined with a clear contract for subclasses. The @dataclass annotation together with the abstract method provides a good foundation for file uploader implementations.

airbyte_cdk/sources/declarative/yaml_declarative_source.py (1)

27-27: LGTM! Clean parameter passing implementation

The parameter addition and passing to the superclass looks good. The or False fallback ensures we always have a boolean value even if None is provided.

Also applies to: 40-40

airbyte_cdk/sources/declarative/retrievers/file_uploader/file_uploader.py (2)

25-37: LGTM! Clean abstraction implementation

The inheritance from BaseFileUploader and the comprehensive docstring explaining the class's purpose and behavior look great.


42-42: LGTM! Proper dependency injection

Using the abstract BaseFileWriter type for dependency injection follows good design principles, allowing for different file writing strategies to be injected.

airbyte_cdk/sources/declarative/retrievers/file_uploader/connector_builder_file_uploader.py (1)

13-20: LGTM! Well-designed decorator pattern implementation

The class follows the decorator pattern well, wrapping a FileUploader instance and extending its behavior without modifying it. The docstring clearly explains its purpose.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

484-490: Import statement expanded to support file upload abstraction

These import changes are part of the new file upload abstraction layer. Nice refactoring to support the connector builder mode with specialized classes for different responsibilities.


3601-3601: Return type updated to use abstraction instead of concrete implementation

Changing the return type from FileUploader to BaseFileUploader follows good OOP principles by programming to an interface. This allows different implementations to be returned based on the context, which is exactly what you're doing here. Nice job!


3615-3629: Conditional file upload behavior based on connector builder mode

The implementation properly branches based on whether connector builder messages are enabled:

  1. Uses NoopFileWriter to prevent actual file writing in connector builder mode
  2. Wraps the FileUploader with ConnectorBuilderFileUploader to copy file reference metadata into record data

This looks like a clean implementation of the feature described in the PR objectives. The connector builder can now communicate about files without writing them.

unit_tests/sources/declarative/file/test_file_stream.py (2)

37-37: Added emit_connector_builder_messages parameter to helper functions

The test helper functions have been updated to propagate the new emit_connector_builder_messages parameter through to the YamlDeclarativeSource constructor. This is essential for testing both modes of operation.

Also applies to: 46-46, 56-57, 61-65


200-243: Added test for connector builder file upload functionality

Excellent test coverage for the new connector builder file upload functionality. The test:

  1. Verifies file references are created without actual file writing (file_size_bytes == 0)
  2. Confirms file reference attributes are properly copied into the record data
  3. Tests the complete flow with mocked HTTP requests

This test aligns perfectly with the PR objectives and provides good verification of the feature.

aldogonzalez8 and others added 2 commits April 23, 2025 11:41
…file_writer.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@aldogonzalez8
Copy link
Contributor Author

aldogonzalez8 commented Apr 23, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

🟦 Job completed successfully (no changes).

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of comments but overall it looks like we are on track

@@ -24,6 +24,7 @@ def __init__(
catalog: Optional[ConfiguredAirbyteCatalog] = None,
config: Optional[Mapping[str, Any]] = None,
state: Optional[List[AirbyteStateMessage]] = None,
emit_connector_builder_messages: Optional[bool] = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting. I guess this isn't needed yet (until we do #460) but we wanted to preemptively set this up, is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just needed a better patching, so there's no need to make any changes for now in this class. Removed this.

from pathlib import Path


class BaseFileWriter(ABC):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base does not seem in our vocabulary for interfaces. Could we have something like this?

  • Parent: FileWriter
  • LocalFileSystemFileWriter
  • NoopFileWriter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, i have fixed names to match our vocabulary.

@aldogonzalez8
Copy link
Contributor Author

/autofix

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
airbyte_cdk/sources/declarative/retrievers/file_uploader/default_file_uploader.py (1)

1-1: 💡 Verification agent

🧩 Analysis chain

Formatting issue flagged by linter

The pipeline shows a Ruff formatting error. Would you mind running the formatter to fix this?


🏁 Script executed:

#!/bin/bash
# Description: Show what formatting issues Ruff is detecting
ruff format --diff airbyte_cdk/sources/declarative/retrievers/file_uploader/default_file_uploader.py

Length of output: 858


Apply Ruff formatting to default_file_uploader.py

It looks like Ruff wants to split that long NotImplementedError call into multiple lines. Could you apply the formatter to fix this? wdyt?

  • File: airbyte_cdk/sources/declarative/retrievers/file_uploader/default_file_uploader.py at line 72

Suggested diff:

@@ -72,7 +72,9 @@
         if self.content_extractor:
-            raise NotImplementedError("Content extraction is not yet implemented. The content_extractor component is currently not supported.")
+            raise NotImplementedError(
+                "Content extraction is not yet implemented. The content_extractor component is currently not supported."
+            )

Maybe run:

ruff format --fix airbyte_cdk/sources/declarative/retrievers/file_uploader/default_file_uploader.py

to apply these changes?

🧰 Tools
🪛 GitHub Actions: Linters

[error] 1-1: Ruff formatting check failed. File would be reformatted.

🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/retrievers/file_uploader/local_file_system_file_writer.py (1)

10-18: Implementation looks good, but should we add error handling?

The file writing and size calculation logic is correct, but I'm wondering if we should add some error handling for common file operations issues? If a file can't be written due to permissions or disk space issues, this will propagate exceptions directly to the caller - is that intentional behavior?

The stat() call could also fail if something happens between writing and checking the size. What do you think about adding a try/except block, wdyt?

 def write(self, file_path: Path, content: bytes) -> int:
     """
     Writes the file to the specified location
     """
-    with open(str(file_path), "wb") as f:
-        f.write(content)
-
-    return file_path.stat().st_size
+    try:
+        with open(str(file_path), "wb") as f:
+            f.write(content)
+        
+        return file_path.stat().st_size
+    except (IOError, OSError) as e:
+        # Log the error with more context
+        logging.error(f"Failed to write file {file_path}: {str(e)}")
+        raise
airbyte_cdk/sources/declarative/retrievers/file_uploader/default_file_uploader.py (2)

32-37: Great job on splitting file uploading from file writing!

I like the refactoring that introduces the abstract base class hierarchy. The docstring explains the basic responsibility well, but could we maybe expand it to clarify how this fits into the connector builder pattern? Perhaps mention when you'd use different writer implementations? This would help future devs understand the design choices better, wdyt?


75-75: Error message is clear, but what about alternatives?

The error message is improved, but could we perhaps add a hint about what approaches users should take instead of using content extraction? Something like "Please use X approach instead" would be super helpful for developers hitting this error, wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06e5581 and e860e3f.

📒 Files selected for processing (10)
  • airbyte_cdk/sources/declarative/extractors/record_selector.py (2 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4 hunks)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/connector_builder_file_uploader.py (1 hunks)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/default_file_uploader.py (3 hunks)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/file_uploader.py (1 hunks)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/file_writer.py (1 hunks)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/local_file_system_file_writer.py (1 hunks)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/noop_file_writer.py (1 hunks)
  • unit_tests/sources/declarative/file/test_file_stream.py (5 hunks)
✅ Files skipped from review due to trivial changes (2)
  • airbyte_cdk/sources/declarative/extractors/record_selector.py
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/init.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • airbyte_cdk/sources/declarative/retrievers/file_uploader/connector_builder_file_uploader.py
  • unit_tests/sources/declarative/file/test_file_stream.py
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🧰 Additional context used
🧬 Code Graph Analysis (4)
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_uploader.py (3)
airbyte_cdk/sources/types.py (1)
  • Record (21-72)
airbyte_cdk/sources/declarative/retrievers/file_uploader/connector_builder_file_uploader.py (1)
  • upload (22-26)
airbyte_cdk/sources/declarative/retrievers/file_uploader/default_file_uploader.py (1)
  • upload (55-101)
airbyte_cdk/sources/declarative/retrievers/file_uploader/local_file_system_file_writer.py (2)
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_writer.py (2)
  • FileWriter (9-19)
  • write (15-19)
airbyte_cdk/sources/declarative/retrievers/file_uploader/noop_file_writer.py (1)
  • write (12-16)
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_writer.py (2)
airbyte_cdk/sources/declarative/retrievers/file_uploader/local_file_system_file_writer.py (1)
  • write (11-18)
airbyte_cdk/sources/declarative/retrievers/file_uploader/noop_file_writer.py (1)
  • write (12-16)
airbyte_cdk/sources/declarative/retrievers/file_uploader/noop_file_writer.py (2)
airbyte_cdk/sources/declarative/retrievers/file_uploader/file_writer.py (2)
  • FileWriter (9-19)
  • write (15-19)
airbyte_cdk/sources/declarative/retrievers/file_uploader/local_file_system_file_writer.py (1)
  • write (11-18)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/retrievers/file_uploader/noop_file_writer.py

[error] 1-1: Ruff formatting check failed. File would be reformatted.

airbyte_cdk/sources/declarative/retrievers/file_uploader/default_file_uploader.py

[error] 1-1: Ruff formatting check failed. File would be reformatted.

⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Analyze (python)
🔇 Additional comments (6)
airbyte_cdk/sources/declarative/retrievers/file_uploader/noop_file_writer.py (2)

5-7: Looks good, but should we be more explicit with the import?

The import from .file_writer is correct, but a previous reviewer suggested importing directly from .base_file_writer instead. Since we appear to have moved on with this approach, I'm fine with it, but want to ensure this was intentional.


10-12: Good call on setting NOOP_FILE_SIZE to -1

Setting this to -1 makes it very clear in the ConnectorBuilder UI that this is a test/placeholder value rather than a real file size of 0 bytes. This addresses a previous reviewer's question about making it obvious this is a Connector Builder test.

airbyte_cdk/sources/declarative/retrievers/file_uploader/file_writer.py (1)

9-19: Clean abstract interface definition

The abstract base class is well-defined with an appropriate method signature and clear docstring. The return type of int to represent the file size in bytes makes sense and aligns with both implementations.

airbyte_cdk/sources/declarative/retrievers/file_uploader/file_uploader.py (1)

11-22: Well-designed abstract interface with @DataClass

The abstract class definition with the dataclass decorator looks good. This will allow concrete implementations to easily define and initialize their properties.

Your method signature for upload() correctly handles a Record object and updates its file_reference property (based on the implementation in the concrete classes I can see in the snippets).

airbyte_cdk/sources/declarative/retrievers/file_uploader/default_file_uploader.py (2)

42-42: Looks good - type-safe dependency injection

Nice implementation of dependency injection for the file writer. This abstraction will make testing much easier!


90-90: Clean delegation to file writer - nice!

The implementation correctly delegates to the injected file writer. This makes the code more flexible and testable.

@aldogonzalez8 aldogonzalez8 requested a review from maxi297 April 24, 2025 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants