-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add JsonParser
component to declarative framework
#166
Conversation
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as off-topic.
This comment was marked as off-topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1754-1757
: Consider adding parameter validation and documentation, wdyt?The implementation looks good, but could benefit from:
- Docstring explaining the purpose and parameters
- Validation for supported encoding values
Here's a suggested improvement:
@staticmethod def create_json_parser(model: JsonParserModel, config: Config, **kwargs: Any) -> JsonParser: + """Creates a JsonParser instance that parses JSON data with the specified encoding. + + Args: + model: The JsonParser model containing configuration + config: The connector configuration + **kwargs: Additional keyword arguments + + Returns: + JsonParser: A configured JSON parser instance + """ + if model.encoding and not model.encoding.lower() in ['utf-8', 'utf-16', 'ascii']: + raise ValueError(f"Unsupported encoding: {model.encoding}") encoding = model.encoding or "utf-8" return JsonParser(encoding=encoding)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
75-75
: LGTM!The import of
JsonParser
from the composite_raw_decoder module is correctly placed with other decoder imports.
251-253
: LGTM!The import of
JsonParserModel
follows the established pattern of importing models from the declarative_component_schema module.
529-529
: LGTM!The
JsonParserModel
is correctly mapped to its factory method in thePYDANTIC_MODEL_TO_CONSTRUCTOR
dictionary.
JsonParser
component to declarative framework
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that from there, we will start parsing the json twice if there is pagination defined: once during dpath extraction and once during pagination.
I don't think we need to fix now that but I'd like to:
- Make sure we use an efficient JSON parser (orjson)
- Poke our master performance on this ticket to make sure he knows @artem1205
I think the longer term solution would be to have a response
object so that we can keep the result of the parsing in-memory but it will not be easy to do since our interfaces depends on the requests.Response.
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)
136-147
: Consider adding more edge cases, wdyt?The test implementation looks solid! A few suggestions to make it even more robust:
- Add edge cases like empty objects
{}
and nested structures- Include error cases (invalid JSON, encoding errors)
- Validate that the encoding parameter is being respected
Here's a possible enhancement to the test parameters:
"data", [ ({"data-type": "string"}), ([{"id": 1}, {"id": 2}]), ({"id": 170_141_183_460_469_231_731_687_303_715_884_105_727}), + ({}), # empty object + ({"nested": {"foo": {"bar": "baz"}}}), # nested structure ], ids=[ "test_with_buffered_io_base_data_containing_string", "test_with_buffered_io_base_data_containing_list", "test_with_buffered_io_base_data_containing_int128", + "test_with_empty_object", + "test_with_nested_structure", ],And maybe add a separate test for error cases:
def test_json_parser_with_invalid_data(): parser = JsonParser(encoding="utf-8") with pytest.raises(json.JSONDecodeError): next(parser.parse(BufferedReader(BytesIO(b"invalid json"))))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py
(2 hunks)unit_tests/sources/declarative/decoders/test_composite_decoder.py
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (2)
7-7
: LGTM! Clean import additions.The new imports are well-organized and necessary for the JsonParser test implementation.
Also applies to: 17-17
123-135
: Great test parameterization! 🎯The test cases cover a good variety of data types and the test IDs are very descriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1756-1760
: Consider enhancing error handling for edge cases.The implementation is clean and follows the pattern of other parser creation methods. However, we could make it more robust by validating the encoding value. What do you think about adding validation to ensure the encoding is supported by Python's codec registry? Here's a suggestion:
@staticmethod def create_json_parser(model: JsonParserModel, config: Config, **kwargs: Any) -> JsonParser: encoding = model.encoding or "utf-8" + try: + "test".encode(encoding) + except LookupError as e: + raise ValueError(f"Unsupported encoding: {encoding}") from e return JsonParser(encoding=encoding)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(2 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(12 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte_cdk/sources/declarative/declarative_component_schema.yaml
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
75-75
: LGTM!The import is correctly placed in the decoders section.
251-253
: LGTM!The model import follows the established pattern.
529-529
: LGTM!The constructor mapping is added in alphabetical order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, approving to unblock. please take a look at my latest comment
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(3 hunks)airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py
(2 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/declarative_component_schema.yaml
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (5)
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (2)
10-10
: LGTM! The new imports look good.The added imports support the new JsonParser implementation with error handling capabilities.
Also applies to: 13-15
Line range hint
1582-1582
: LGTM! The GzipParser integration is correct.The JsonParser is properly added as a valid inner_parser type for GzipParser.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)
740-748
: LGTM! The FlattenFields implementation looks clean.The class follows the established pattern with proper type hints and default values.
1176-1182
: LGTM! The JsonParser model looks good.The implementation is consistent with other parser models in the file.
1617-1617
: LGTM! The component integrations are properly implemented.JsonParser is correctly added as a valid parser type in the Union types.
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
2907-2920
: Consider enhancing the JsonParser component definition.The implementation looks good, but we could make it more robust. What do you think about:
- Adding validation for the
encoding
field to restrict it to common encodings (utf-8, ascii, etc.)?- Including an optional
allow_invalid_json
boolean property to handle malformed JSON gracefully?- Adding an example showing how to handle JSON with different encodings?
Here's a suggested enhancement:
JsonParser: title: JsonParser description: Parser used for parsing str, bytes, or bytearray data and returning data in a dictionary format. type: object additionalProperties: true required: - type properties: type: type: string enum: [JsonParser] encoding: type: string default: utf-8 + enum: [utf-8, ascii, utf-16, utf-32] + description: The character encoding to use when decoding the input data. + allow_invalid_json: + type: boolean + default: false + description: When set to true, attempts to handle malformed JSON by skipping invalid records instead of failing. + examples: + - type: JsonParser + encoding: utf-16 + - type: JsonParser + encoding: utf-8 + allow_invalid_json: trueWhat do you think about these additions? They would make the component more flexible and easier to use correctly.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
1756-1759
: Consider adding a docstring for better documentation?The implementation looks good, but would you consider adding a docstring to document the parameters and return type? This would help maintain consistency with other factory methods, wdyt?
Here's a suggested addition:
@staticmethod def create_json_parser(model: JsonParserModel, config: Config, **kwargs: Any) -> JsonParser: + """Creates a JsonParser instance. + + Args: + model: The JsonParser model containing the configuration + config: The connector configuration + **kwargs: Additional keyword arguments + + Returns: + JsonParser: A configured JsonParser instance + """ encoding = model.encoding if model.encoding else "utf-8" return JsonParser(encoding=encoding)
2569-2589
: Consider making the error message more concise?The validation logic looks solid, but the error message is quite lengthy. Would you consider a more concise version that maintains clarity? For example:
- _UNSUPPORTED_DECODER_ERROR = ( - "Specified decoder of {decoder_type} is not supported for pagination." - "Please set as `JsonDecoder`, `XmlDecoder`, or a `CompositeRawDecoder` with an inner_parser of `JsonParser` or `GzipParser` instead." - "If using `GzipParser`, please ensure that the lowest level inner_parser is a `JsonParser`." - ) + _UNSUPPORTED_DECODER_ERROR = ( + "Decoder {decoder_type} is not supported for pagination. Use `JsonDecoder`, `XmlDecoder`, " + "or `CompositeRawDecoder` with `JsonParser` or `GzipParser(JsonParser)` as inner_parser." + )
1041-1044
: Consider extracting common decoder handling logic?I notice similar code patterns for handling decoders in both
create_cursor_pagination
andcreate_offset_increment
. Would you consider extracting this into a helper method to reduce duplication? Something like:def _prepare_pagination_decoder(self, decoder: Decoder) -> Decoder: """Prepares a decoder for pagination by wrapping it in PaginationDecoderDecorator if needed.""" if isinstance(decoder, PaginationDecoderDecorator): return decoder return PaginationDecoderDecorator(decoder=decoder)This could simplify both methods:
decoder = self._prepare_pagination_decoder(decoder) inner_decoder = decoder.decoder if isinstance(decoder, PaginationDecoderDecorator) else decoderAlso applies to: 1954-1961
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(3 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(3 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
681-681
: LGTM! Minor formatting improvement.The enum formatting has been cleaned up by removing unnecessary whitespace.
Line range hint
2889-2906
: LGTM! Consistent component reference.The
JsonParser
has been correctly added to the list of allowed parsers in theCompositeRawDecoder
component.airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
75-76
: LGTM! Clean import additions.The new imports for
JsonParser
andParser
are well-organized within the existing decoders import block.
252-254
: LGTM! Consistent model import.The
JsonParserModel
import follows the established pattern for model imports.
530-530
: LGTM! Clean mapping addition.The
JsonParserModel
tocreate_json_parser
mapping is correctly added to the constructor dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple nits, perhaps nuke the allowAdditionalFields
? Otherwise looks good, shipit.
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
Co-authored-by: Natik Gadzhi <[email protected]>
What
JsonParser
component which inherits from theParser
interface.JsonParser
component #164Recommended Reviewing Order:
Summary by CodeRabbit
Release Notes
New Features
JsonParser
for handling JSON data with flexible encoding support.Improvements
Testing