feat: Add `JsonParser` component to declarative framework #166

pnilan · 2024-12-11T16:08:30Z

What

Adds a new JsonParser component, which inherits from the Parser interface.
Resolves feat(JsonParser) - Create new JsonParser component #164

Recommended Reviewing Order:

declarative_component_schema.yaml
composite_raw_decoder.py
model_to_component_factory.py
test_composite_decoder.py

Summary by CodeRabbit

Release Notes

New Features
- Added JSON parsing capabilities to the declarative components framework.
- Introduced JsonParser for handling JSON data with flexible encoding support.
Improvements
- Enhanced data handling with new parsing methods, including support for various data types and error handling.
- Improved readability and consistency of documentation.
Testing
- Added comprehensive unit tests for new JSON parsing functionality.
- Expanded test coverage for error handling in JSON parsing.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

1754-1757: Consider adding parameter validation and documentation, wdyt?

The implementation looks good, but could benefit from:

Docstring explaining the purpose and parameters
Validation for supported encoding values

Here's a suggested improvement:

    @staticmethod
    def create_json_parser(model: JsonParserModel, config: Config, **kwargs: Any) -> JsonParser:
+       """Creates a JsonParser instance that parses JSON data with the specified encoding.
+       
+       Args:
+           model: The JsonParser model containing configuration
+           config: The connector configuration
+           **kwargs: Additional keyword arguments
+           
+       Returns:
+           JsonParser: A configured JSON parser instance
+       """
+       if model.encoding and model.encoding.lower() not in ['utf-8', 'utf-16', 'ascii']:
+           raise ValueError(f"Unsupported encoding: {model.encoding}")
        encoding = model.encoding or "utf-8"
        return JsonParser(encoding=encoding)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 51118f1 and 34a710d.

📒 Files selected for processing (2)

airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (1 hunks)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py

⏰ Context from checks skipped due to timeout of 90000ms (8)

GitHub Check: Check: 'source-pokeapi' (skip=false)
GitHub Check: Check: 'source-the-guardian-api' (skip=false)
GitHub Check: Check: 'source-shopify' (skip=false)
GitHub Check: Check: 'source-hardcoded-records' (skip=false)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (Fast)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Analyze (python)

🔇 Additional comments (3)

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

75-75: LGTM!

The import of JsonParser from the composite_raw_decoder module is correctly placed with other decoder imports.

251-253: LGTM!

The import of JsonParserModel follows the established pattern of importing models from the declarative_component_schema module.

529-529: LGTM!

The JsonParserModel is correctly mapped to its factory method in the PYDANTIC_MODEL_TO_CONSTRUCTOR dictionary.

maxi297

My understanding is that from here on, we will parse the JSON twice if pagination is defined: once during dpath extraction and once during pagination.

I don't think we need to fix that now, but I'd like to:

Make sure we use an efficient JSON parser (orjson)
Poke our performance master on this ticket to make sure he knows: @artem1205

I think the longer-term solution would be to have a response object so that we can keep the result of the parsing in memory, but it will not be easy to do since our interfaces depend on requests.Response.
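
The double-parsing concern above can be illustrated with a minimal sketch. The `CachedJsonResponse` wrapper below is hypothetical (it is not part of the CDK or of `requests`); it only shows how keeping the parsed body in memory would let record extraction and pagination share a single parse. It uses the standard-library `json` module so the sketch runs anywhere; swapping in `orjson.loads` would be an assumption about the installed dependencies.

```python
import json
from typing import Any, Optional


class CachedJsonResponse:
    """Hypothetical wrapper that parses a response body at most once."""

    def __init__(self, content: bytes, encoding: str = "utf-8") -> None:
        self._content = content
        self._encoding = encoding
        self._parsed: Optional[Any] = None
        self._is_parsed = False
        self.parse_count = 0  # exposed only to make the caching observable

    def json(self) -> Any:
        # Parse on first access, then reuse the in-memory result.
        if not self._is_parsed:
            self._parsed = json.loads(self._content.decode(self._encoding))
            self._is_parsed = True
            self.parse_count += 1
        return self._parsed


response = CachedJsonResponse(b'{"records": [{"id": 1}], "next_page": 2}')
records = response.json()["records"]      # record extraction
next_page = response.json()["next_page"]  # pagination reuses the same parse
```

Both reads go through `json()`, but the body is decoded and parsed only on the first call.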

airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)
136-147: Consider adding more edge cases, wdyt?

The test implementation looks solid! A few suggestions to make it even more robust:

Add edge cases like empty objects {} and nested structures

Include error cases (invalid JSON, encoding errors)

Validate that the encoding parameter is being respected

Here's a possible enhancement to the test parameters:
     "data",
     [
         ({"data-type": "string"}),
         ([{"id": 1}, {"id": 2}]),
         ({"id": 170_141_183_460_469_231_731_687_303_715_884_105_727}),
+        ({}),  # empty object
+        ({"nested": {"foo": {"bar": "baz"}}}),  # nested structure
     ],
     ids=[
         "test_with_buffered_io_base_data_containing_string",
         "test_with_buffered_io_base_data_containing_list",
         "test_with_buffered_io_base_data_containing_int128",
+        "test_with_empty_object",
+        "test_with_nested_structure",
     ],
And maybe add a separate test for error cases:
def test_json_parser_with_invalid_data():
    parser = JsonParser(encoding="utf-8")
    with pytest.raises(json.JSONDecodeError):
        next(parser.parse(BufferedReader(BytesIO(b"invalid json"))))
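
For context on what these tests exercise, here is a minimal sketch of a parser with the same call shape as in the test above (`parse` takes a buffered stream and is consumed with `next`). `SketchJsonParser` and `to_stream` are hypothetical stand-ins, not the CDK's `JsonParser`; in particular, yielding list items one by one and raising on invalid input are assumptions.

```python
import json
from dataclasses import dataclass
from io import BufferedIOBase, BufferedReader, BytesIO
from typing import Any, Generator, MutableMapping


@dataclass
class SketchJsonParser:
    """Hypothetical stand-in: decodes the whole stream, then parses it as JSON."""

    encoding: str = "utf-8"

    def parse(self, data: BufferedIOBase) -> Generator[MutableMapping[str, Any], None, None]:
        body = json.loads(data.read().decode(self.encoding))
        if isinstance(body, list):
            # Assumption: a top-level list is emitted record by record.
            yield from body
        else:
            yield body


def to_stream(payload: Any, encoding: str = "utf-8") -> BufferedReader:
    """Test helper (hypothetical): serialize a payload into a buffered byte stream."""
    return BufferedReader(BytesIO(json.dumps(payload).encode(encoding)))


parser = SketchJsonParser(encoding="utf-16")
parsed_records = list(parser.parse(to_stream([{"id": 1}, {"id": 2}], "utf-16")))
```

Encoding the payload with the same `encoding` the parser is configured with is one way to check that the parameter is actually respected.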

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 34a710d and 060178a.

📒 Files selected for processing (2)

airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (2 hunks)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Analyze (python)

🔇 Additional comments (2)

unit_tests/sources/declarative/decoders/test_composite_decoder.py (2)

7-7: LGTM! Clean import additions.

The new imports are well-organized and necessary for the JsonParser test implementation.

Also applies to: 17-17

123-135: Great test parameterization! 🎯

The test cases cover a good variety of data types and the test IDs are very descriptive.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1756-1760: Consider enhancing error handling for edge cases.

The implementation is clean and follows the pattern of other parser creation methods. However, we could make it more robust by validating the encoding value. What do you think about adding validation to ensure the encoding is supported by Python's codec registry? Here's a suggestion:
 @staticmethod
 def create_json_parser(model: JsonParserModel, config: Config, **kwargs: Any) -> JsonParser:
     encoding = model.encoding or "utf-8"
+    try:
+        "test".encode(encoding)
+    except LookupError as e:
+        raise ValueError(f"Unsupported encoding: {encoding}") from e
     return JsonParser(encoding=encoding)
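
A variation on the suggestion above, as a sketch: `codecs.lookup` asks Python's codec registry directly, which avoids encoding a throwaway string. The helper name `validate_encoding` is hypothetical and not part of the CDK.

```python
import codecs


def validate_encoding(encoding: str) -> str:
    """Return the encoding unchanged if the codec registry knows it, else raise ValueError."""
    try:
        codecs.lookup(encoding)
    except LookupError as e:
        raise ValueError(f"Unsupported encoding: {encoding}") from e
    return encoding
```

One difference worth noting: `codecs.lookup` also accepts non-text codecs such as `base64`, whereas `"test".encode("base64")` raises `LookupError`, so a stricter check may be needed if only text encodings should pass.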

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 060178a and bf8dd26.

📒 Files selected for processing (3)

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (12 hunks)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

airbyte_cdk/sources/declarative/declarative_component_schema.yaml
airbyte_cdk/sources/declarative/models/declarative_component_schema.py

⏰ Context from checks skipped due to timeout of 90000ms (7)

GitHub Check: Check: 'source-pokeapi' (skip=false)
GitHub Check: Check: 'source-the-guardian-api' (skip=false)
GitHub Check: Check: 'source-shopify' (skip=false)
GitHub Check: Check: 'source-hardcoded-records' (skip=false)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (Fast)

🔇 Additional comments (3)

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

75-75: LGTM!

The import is correctly placed in the decoders section.

251-253: LGTM!

The model import follows the established pattern.

529-529: LGTM!

The constructor mapping is added in alphabetical order.

artem1205

LGTM overall, approving to unblock. please take a look at my latest comment

airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ce2c28 and 7e7b2c4.

📒 Files selected for processing (3)

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (2 hunks)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

airbyte_cdk/sources/declarative/declarative_component_schema.yaml

⏰ Context from checks skipped due to timeout of 90000ms (8)

GitHub Check: Check: 'source-pokeapi' (skip=false)
GitHub Check: Check: 'source-the-guardian-api' (skip=false)
GitHub Check: Check: 'source-shopify' (skip=false)
GitHub Check: Check: 'source-hardcoded-records' (skip=false)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (Fast)
GitHub Check: Analyze (python)

🔇 Additional comments (5)

airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (2)

10-10: LGTM! The new imports look good.

The added imports support the new JsonParser implementation with error handling capabilities.

Also applies to: 13-15

Line range hint 1582-1582: LGTM! The GzipParser integration is correct.

The JsonParser is properly added as a valid inner_parser type for GzipParser.

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)

740-748: LGTM! The FlattenFields implementation looks clean.

The class follows the established pattern with proper type hints and default values.

1176-1182: LGTM! The JsonParser model looks good.

The implementation is consistent with other parser models in the file.

1617-1617: LGTM! The component integrations are properly implemented.

JsonParser is correctly added as a valid parser type in the Union types.

airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py

docs/RELEASES.md

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
2907-2920: Consider enhancing the JsonParser component definition.

The implementation looks good, but we could make it more robust. What do you think about:

Adding validation for the encoding field to restrict it to common encodings (utf-8, ascii, etc.)?

Including an optional allow_invalid_json boolean property to handle malformed JSON gracefully?

Adding an example showing how to handle JSON with different encodings?

Here's a suggested enhancement:
  JsonParser:
    title: JsonParser
    description: Parser used for parsing str, bytes, or bytearray data and returning data in a dictionary format.
    type: object
    additionalProperties: true
    required:
      - type
    properties:
      type:
        type: string
        enum: [JsonParser]
      encoding:
        type: string
        default: utf-8
+       enum: [utf-8, ascii, utf-16, utf-32]
+       description: The character encoding to use when decoding the input data.
+     allow_invalid_json:
+       type: boolean
+       default: false
+       description: When set to true, attempts to handle malformed JSON by skipping invalid records instead of failing.
+   examples:
+     - type: JsonParser
+       encoding: utf-16
+     - type: JsonParser
+       encoding: utf-8
+       allow_invalid_json: true
What do you think about these additions? They would make the component more flexible and easier to use correctly.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

1756-1759: Consider adding a docstring for better documentation?

The implementation looks good, but would you consider adding a docstring to document the parameters and return type? This would help maintain consistency with other factory methods, wdyt?

Here's a suggested addition:
    @staticmethod
    def create_json_parser(model: JsonParserModel, config: Config, **kwargs: Any) -> JsonParser:
+        """Creates a JsonParser instance.
+
+        Args:
+            model: The JsonParser model containing the configuration
+            config: The connector configuration
+            **kwargs: Additional keyword arguments
+
+        Returns:
+            JsonParser: A configured JsonParser instance
+        """
        encoding = model.encoding if model.encoding else "utf-8"
        return JsonParser(encoding=encoding)
2569-2589: Consider making the error message more concise?

The validation logic looks solid, but the error message is quite lengthy. Would you consider a more concise version that maintains clarity? For example:
-    _UNSUPPORTED_DECODER_ERROR = (
-        "Specified decoder of {decoder_type} is not supported for pagination."
-        "Please set as `JsonDecoder`, `XmlDecoder`, or a `CompositeRawDecoder` with an inner_parser of `JsonParser` or `GzipParser` instead."
-        "If using `GzipParser`, please ensure that the lowest level inner_parser is a `JsonParser`."
-    )
+    _UNSUPPORTED_DECODER_ERROR = (
+        "Decoder {decoder_type} is not supported for pagination. Use `JsonDecoder`, `XmlDecoder`, "
+        "or `CompositeRawDecoder` with `JsonParser` or `GzipParser(JsonParser)` as inner_parser."
+    )
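
The rule this error message describes (a `GzipParser` chain must bottom out in a `JsonParser`) can be sketched as a small recursive check. The classes below are simplified, hypothetical stand-ins for the CDK's parsers, and `is_supported_for_pagination` is an illustrative name, not the factory's actual method.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class JsonParser:  # stand-in, not the CDK class
    encoding: str = "utf-8"


@dataclass
class OtherParser:  # stand-in for any non-JSON parser
    encoding: str = "utf-8"


@dataclass
class GzipParser:  # stand-in, not the CDK class
    inner_parser: Union["GzipParser", JsonParser, OtherParser]


def is_supported_for_pagination(parser: object) -> bool:
    # Unwrap nested GzipParser layers until the lowest-level parser is reached.
    while isinstance(parser, GzipParser):
        parser = parser.inner_parser
    return isinstance(parser, JsonParser)
```

With these stand-ins, `GzipParser(GzipParser(JsonParser()))` passes the check, while `GzipParser(OtherParser())` does not.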
1041-1044: Consider extracting common decoder handling logic?

I notice similar code patterns for handling decoders in both create_cursor_pagination and create_offset_increment. Would you consider extracting this into a helper method to reduce duplication? Something like:
def _prepare_pagination_decoder(self, decoder: Decoder) -> Decoder:
    """Prepares a decoder for pagination by wrapping it in PaginationDecoderDecorator if needed."""
    if isinstance(decoder, PaginationDecoderDecorator):
        return decoder
    return PaginationDecoderDecorator(decoder=decoder)
This could simplify both methods:
decoder = self._prepare_pagination_decoder(decoder)
inner_decoder = decoder.decoder if isinstance(decoder, PaginationDecoderDecorator) else decoder
Also applies to: 1954-1961

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f550f2 and 27bf5a7.

📒 Files selected for processing (3)

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3 hunks)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (8 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

airbyte_cdk/sources/declarative/models/declarative_component_schema.py

⏰ Context from checks skipped due to timeout of 90000ms (8)

GitHub Check: Check: 'source-pokeapi' (skip=false)
GitHub Check: Check: 'source-the-guardian-api' (skip=false)
GitHub Check: Check: 'source-shopify' (skip=false)
GitHub Check: Check: 'source-hardcoded-records' (skip=false)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (Fast)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Analyze (python)

🔇 Additional comments (5)

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

681-681: LGTM! Minor formatting improvement.

The enum formatting has been cleaned up by removing unnecessary whitespace.

Line range hint 2889-2906: LGTM! Consistent component reference.

The JsonParser has been correctly added to the list of allowed parsers in the CompositeRawDecoder component.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

75-76: LGTM! Clean import additions.

The new imports for JsonParser and Parser are well-organized within the existing decoders import block.

252-254: LGTM! Consistent model import.

The JsonParserModel import follows the established pattern for model imports.

530-530: LGTM! Clean mapping addition.

The JsonParserModel to create_json_parser mapping is correctly added to the constructor dictionary.

natikgadzhi

A couple nits, perhaps nuke the allowAdditionalFields? Otherwise looks good, shipit.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py

pnilan added 4 commits December 10, 2024 15:59

initial JsonParser component

e68f36f

update parser

a8a7bb3

add tests for json parser

254f877

update parser and tests to yield empty dict if unparseable.

8df239a

github-actions bot added the enhancement New feature or request label Dec 11, 2024

coderabbitai bot approved these changes Dec 11, 2024

pnilan added 6 commits December 11, 2024 15:36

chore: format code

92574df

Merge branch 'main' into pnilan/declarative/parsers

82a15c9

Merge branch 'main' into pnilan/declarative/parsers

0b3b5e1

conform tests

9fd93cb

initial test updates

1892a03

update JsonParser and relevant tests

51118f1

chore: format/type-check

34a710d

coderabbitai bot reviewed Jan 10, 2025

pnilan requested review from maxi297 and artem1205 January 10, 2025 22:56

pnilan changed the title ~~feat: Adds Parser interface and JsonParser component to declarative framework~~ feat: Add JsonParser component to declarative framework Jan 10, 2025

maxi297 reviewed Jan 11, 2025

pnilan added 2 commits January 14, 2025 08:24

remove orjson from composite_raw_decoder file

060178a

Merge branch 'main' into pnilan/declarative/parsers

bf8dd26

coderabbitai bot reviewed Jan 14, 2025

pnilan added 2 commits January 14, 2025 08:37

chore: format code

d9b6df3

add additional test

f20fffc

pnilan temporarily deployed to DockerHub January 14, 2025 21:02 — with GitHub Actions Inactive

artem1205 approved these changes Jan 14, 2025

add JsonParser to GzipDecoder and CompositeRawDecoder "anyOf" list

7e7b2c4

coderabbitai bot requested changes Jan 14, 2025

pnilan added 2 commits January 14, 2025 14:08

update to simplify orjson/json parsing

23cbfb7

chore: type-check

1c2a832

pnilan temporarily deployed to DockerHub January 14, 2025 22:15 — with GitHub Actions Inactive

pnilan temporarily deployed to PyPi January 14, 2025 22:15 — with GitHub Actions Inactive

pnilan added 3 commits January 14, 2025 14:36

unlock CompositeRawDecoder w/ JsonParser support for pagination

66aaae9

update conditional validations for decoders/parsers for pagination

00cf7b1

remove errant print

b7aa78f

chore: coderabbitai suggestions

7b41732

pnilan commented Jan 15, 2025

update parservalidation method

3f550f2

pnilan temporarily deployed to PyPi January 15, 2025 21:41 — with GitHub Actions Inactive

pnilan temporarily deployed to DockerHub January 15, 2025 21:41 — with GitHub Actions Inactive

Merge branch 'main' into pnilan/declarative/parsers

27bf5a7

coderabbitai bot reviewed Jan 15, 2025

Merge branch 'main' into pnilan/declarative/parsers

e691f79

natikgadzhi approved these changes Jan 16, 2025

Update airbyte_cdk/sources/declarative/declarative_component_schema.yaml

bb63934

Co-authored-by: Natik Gadzhi <[email protected]>

coderabbitai bot approved these changes Jan 16, 2025

pnilan merged commit 40a9f1e into main Jan 16, 2025
19 checks passed

pnilan deleted the pnilan/declarative/parsers branch January 16, 2025 05:42

coderabbitai bot mentioned this pull request Jan 16, 2025

feat: Adds ZipfileDecoder component #169

Merged

coderabbitai bot mentioned this pull request Feb 6, 2025

feat(low-code): Add API Budget #314

Merged
