Contributor

@lkuchars lkuchars commented Sep 10, 2025

Summary

NIFI-14953

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using ./mvnw clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@lkuchars lkuchars marked this pull request as ready for review September 10, 2025 20:29
Contributor

@exceptionfactory exceptionfactory left a comment

Thanks for this additional validation @lkuchars.

As this requires writing out temporary files, it seems like it could be a bit expensive as written, given that customValidate can be triggered many times. One option is to implement cached checking of the schema so that it only runs the compilation when the Schema Text property changes. Another option is to implement this as a manual verify method, requiring user interaction. It seems better to consider the verify method approach, to avoid complexity around caching, and because this validation only applies to the Schema Text option, but I'm open to considering either approach.
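
The cached-checking option mentioned in this comment could be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this pull request: `CachedSchemaValidator` and its `Function<String, String>` compile step are hypothetical stand-ins for the schema compilation, where a `null` return is assumed to mean the schema compiled.

```java
import java.util.Objects;
import java.util.function.Function;

// Hypothetical helper: remembers the outcome for the last schema text seen,
// so repeated validation calls with unchanged text skip the expensive step.
final class CachedSchemaValidator {
    // Assumed contract: returns null when the schema compiles, otherwise an error message.
    private final Function<String, String> compileStep;
    private String lastSchemaText;
    private String lastError;
    private boolean cached;
    private int compileCount;

    CachedSchemaValidator(final Function<String, String> compileStep) {
        this.compileStep = Objects.requireNonNull(compileStep);
    }

    // Runs the compile step only when the schema text differs from the previous call.
    synchronized String validate(final String schemaText) {
        if (!cached || !Objects.equals(lastSchemaText, schemaText)) {
            lastError = compileStep.apply(schemaText);
            lastSchemaText = schemaText;
            cached = true;
            compileCount++;
        }
        return lastError;
    }

    // Exposed only so the sketch can show how often the compile step ran.
    synchronized int getCompileCount() {
        return compileCount;
    }
}
```

A single-entry cache like this is enough when only the most recent property value matters; a map keyed by schema text would be the alternative if several values need to be remembered.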

    return compiledSchema;

} catch (final IllegalStateException e) {
    throw new SchemaCompilationException(e); // Illegal state exception is thrown by the wire library for schema issues
Contributor

A message should be included along with the IllegalStateException cause

Contributor Author

fixed
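
For illustration, wrapping the cause together with a message might look like the sketch below. The exception class here is a hypothetical stand-in declared as unchecked for brevity (the real `SchemaCompilationException` may differ), and the message wording and the `compile` helper are assumptions, not the text used in the pull request.

```java
// Hypothetical stand-in for the exception class named in the diff.
class SchemaCompilationException extends RuntimeException {
    SchemaCompilationException(final String message, final Throwable cause) {
        super(message, cause);
    }
}

final class SchemaCompileSketch {
    // Simulates a compile step that reports schema problems as IllegalStateException.
    static String compile(final String schemaText) {
        try {
            if (schemaText == null || schemaText.isBlank()) {
                throw new IllegalStateException("unable to parse schema");
            }
            return "compiled:" + schemaText.length();
        } catch (final IllegalStateException e) {
            // Keep the original exception as the cause and add context in the message.
            throw new SchemaCompilationException("Protobuf schema compilation failed: " + e.getMessage(), e);
        }
    }
}
```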


final ValidationResult invalidResult = verifyExactlyOneValidationError();

assertTrue(invalidResult.getExplanation().contains("Message name 'test.NonExistentMessage' cannot be found"));
Contributor

Recommend reducing the search string:

Suggested change
- assertTrue(invalidResult.getExplanation().contains("Message name 'test.NonExistentMessage' cannot be found"));
+ assertTrue(invalidResult.getExplanation().contains("test.NonExistentMessage"));

Contributor Author

fixed

    return problems;
}

// Compile the schema to validate it's correct. The method is used only for validation purposes.
Contributor

This comment mostly reiterates the method name, and the usage in validation is clear, so recommend removing this comment.

Contributor Author

fixed


@Override
protected Collection<ValidationResult> customValidate(final ValidationContext validationContext) {
    final List<ValidationResult> problems = new ArrayList<>(super.customValidate(validationContext));
Contributor

Results can also indicate a valid status, so the variable name should be changed.

Suggested change
- final List<ValidationResult> problems = new ArrayList<>(super.customValidate(validationContext));
+ final List<ValidationResult> results = new ArrayList<>(super.customValidate(validationContext));

Contributor Author

fixed

final PropertyValue schemaTextProperty = validationContext.getProperty(SCHEMA_TEXT);
final String schemaTextValue = schemaTextProperty.getValue();

if (validationContext.isExpressionLanguageSupported(SCHEMA_TEXT.getName())
Contributor

Is the isExpressionLanguageSupported check needed?

Contributor Author

removed


if (validationContext.isExpressionLanguageSupported(SCHEMA_TEXT.getName())
        && validationContext.isExpressionLanguagePresent(schemaTextValue)) {
    return Collections.emptyList();
Contributor

Creating the results array and then having multiple return statements is not ideal. It would be better to refactor this method to have a single return at the end.

Contributor Author

If we refactor to have a single return, we would need to use nested if-else statements or boolean flags, which would actually make the code less readable and more complex. The early returns here serve a legitimate purpose - they represent different validation scenarios where further validation is either unnecessary or impossible.
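
As a rough illustration of the early-return shape being defended here, the sketch below uses plain strings in place of `ValidationResult` objects. The class name, the parameters, and the placeholder format check are all hypothetical; only the "Schema Text value is missing" wording comes from this review.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, dependency-free stand-in for the validation method under discussion.
final class EarlyReturnValidationSketch {
    static List<String> validate(final String schemaText, final boolean expressionLanguagePresent) {
        final List<String> results = new ArrayList<>();

        // Scenario 1: the value is only known at runtime, so nothing more can be checked.
        if (expressionLanguagePresent) {
            return results;
        }

        // Scenario 2: nothing to compile, so report the missing value and stop.
        if (schemaText == null || schemaText.isBlank()) {
            results.add("Schema Text value is missing");
            return results;
        }

        // Scenario 3: the value is present, so run the (placeholder) format check.
        if (!schemaText.contains("message")) {
            results.add("Schema Text does not define a message");
        }
        return results;
    }
}
```

Expressing the same flow with a single return would need an `if`/`else if`/`else` chain around these three scenarios, which is the trade-off the reply describes.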

.subject(SCHEMA_TEXT.getDisplayName())
.input(schemaTextValue)
.valid(false)
.explanation("Schema Text cannot be empty when using \"Use 'Schema Text' Property\" strategy")
Contributor

Recommend avoiding the reference to the other property name:

Suggested change
- .explanation("Schema Text cannot be empty when using \"Use 'Schema Text' Property\" strategy")
+ .explanation("Schema Text value is missing")

Contributor Author

fixed

}

try {
    // Try to compile the schema to validate it's valid protobuf format
Contributor

Suggested change
- // Try to compile the schema to validate it's valid protobuf format
+ // Try to compile the schema to validate protobuf format

Contributor Author

fixed

@lkuchars
Contributor Author

@exceptionfactory thank you for taking the time to look at this. It would be nice to validate the schema without manual intervention. I think the original ProtobufReader worked that way, and it used a cache. I think your suggestion to use a cache is spot on.
I'll just remove the throwaway compiler:
// Create a throwaway schema compiler for validation to avoid polluting the main compiler cache.
final ProtobufSchemaCompiler validationCompiler = new ProtobufSchemaCompiler(getIdentifier() + "_validation", getLogger());
I think my concerns about polluting the global compiler cache were unjustified.
I can use the schema compiler initialized in onEnabled. That way, subsequent validation calls will use the cache implemented in ProtobufSchemaCompiler.
WDYT?

@exceptionfactory
Contributor

Thanks for the reply @lkuchars. Using the same schemaCompiler instance could work, but it would have to be instantiated somewhere else, because OnEnabled methods are invoked manually, after validation.

@lkuchars lkuchars closed this Sep 23, 2025
@lkuchars lkuchars force-pushed the NIFI-14953-add-proto-schema-validation branch from 78c0d7a to 1c995f4 Compare September 23, 2025 20:41
@lkuchars lkuchars reopened this Sep 23, 2025
@lkuchars
Contributor Author

I moved the initialization of the schemaCompiler to the init() override.
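
The ordering concern can be sketched with a small lifecycle simulation. Everything in this block is a hypothetical stand-in: `SketchSchemaCompiler` is not the real `ProtobufSchemaCompiler`, and the `init`, `customValidate`, and `onEnabled` methods only mimic the call order described in this thread (initialization first, validation before enabling).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical compiler stand-in with an internal cache keyed by schema text.
final class SketchSchemaCompiler {
    private final Map<String, Boolean> cache = new HashMap<>();
    private int compileCount;

    boolean isValid(final String schemaText) {
        return cache.computeIfAbsent(schemaText, text -> {
            compileCount++; // counts only real (uncached) compilations
            return text.contains("message"); // placeholder for real compilation
        });
    }

    int getCompileCount() {
        return compileCount;
    }
}

// Hypothetical service stand-in showing where the compiler is created.
final class SketchReaderService {
    private SketchSchemaCompiler schemaCompiler;

    // Called once when the component is created, before any validation.
    void init() {
        schemaCompiler = new SketchSchemaCompiler();
    }

    // May be called many times, and before onEnabled().
    boolean customValidate(final String schemaText) {
        return schemaCompiler.isValid(schemaText);
    }

    // Called later, when the user enables the service; the compiler already exists.
    void onEnabled() {
    }

    int getCompileCount() {
        return schemaCompiler.getCompileCount();
    }
}
```

Because the compiler is created in `init()`, repeated `customValidate` calls with the same text reuse the cached outcome even though `onEnabled()` has not run yet.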
