-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First pass at [Ruby][Mongo] validate simple filter rules #3174 #430
Conversation
https://github.com/elastic/enterprise-search-team/issues/3174 write tests for starts_with, ends_with
# Conflicts: # lib/connectors/base/simple_rules_parser.rb # lib/connectors/mongodb/mongo_rules_parser.rb # spec/connectors/mongodb/mongo_rules_parser_spec.rb
https://github.com/elastic/enterprise-search-team/issues/3174 getting in main and fixing 100500 tests
https://github.com/elastic/enterprise-search-team/issues/3174 fixing another 100500 tests
https://github.com/elastic/enterprise-search-team/issues/3174 added pre-validation for starts_with, ends_with
lib/core/filtering/simple_rule.rb
Outdated
raise "#{e.key} is required: #{e.message}" | ||
end | ||
|
||
def self.flex_fetch(hash, key, default_value = nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe @vidok said this will kill kittens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah if we do it in a monkey patch. =) But here, we don't.
Frankly, I don't see a good way to do it with key types unknown. Moreover, even the
hash[key] || hash.fetch(key.to_sym, nil) || hash.fetch(key.to_s, nil)
won't work properly in the case where the key = boolean false (learned the hard way, therefore this ugly code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When resolving merge conflicts, I convinced myself we didn't need this. We can just rely on string keys.
lib/core/filtering/simple_rule.rb
Outdated
@rule = SimpleRule.flex_fetch(rule_hash, RULE) | ||
@value = SimpleRule.flex_fetch(rule_hash, VALUE) | ||
@id = SimpleRule.flex_fetch(rule_hash, ID) | ||
@order = SimpleRule.flex_fetch(rule_hash, ORDER, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to add order
here, I suggest we require it, and not provide a default value. Otherwise we risk non-deterministic ordering in specs if we don't have an explicit order field defined and they all get :order => 0
from this default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking was that if we don't have order, then the ordering probably doesn't matter (like it doesn't now for pre-processing) and we can safely assume some default value. But you're probably right.
@@ -11,23 +11,144 @@ | |||
|
|||
module Connectors | |||
module Base | |||
class FilteringRulesValidationError < StandardError; end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this go in errors.rb?
def initialize(rules) | ||
@rules = (rules || []).map(&:with_indifferent_access).filter { |r| r[:id] != 'DEFAULT' }.sort_by { |r| r[:order] } | ||
begin | ||
sorted = (rules || []).map { |r| SimpleRule.new(r) }.filter { |r| r.id != 'DEFAULT' }.sort_by(&:order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change. I'd avoided it because of the order
requirement, but you've solved that now.
Should we unify this sorting with the one done by the PostProcessingEngine to DRY up our logic?
# # check for overlapping ranges | ||
# ranges = field_rules.filter { |r| r[:rule] == '>' || r[:rule] == '<' } | ||
# ranges.each_with_index do |r, i| | ||
# next if i == ranges.size - 1 | ||
# next_r = ranges[i + 1] | ||
# if r[:value] == next_r[:value] | ||
# raise FilteringRulesValidationError.new("Contradicting rules for field: #{field}. Can't have overlapping ranges.") | ||
# end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this block be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it should, it's a WIP to document the thought process. Frankly, the ranges feel so cumbersome to validate... there's just too many cases.
https://github.com/elastic/enterprise-search-team/issues/3174 added some range pre-validation some of the processed cases aren't covered by tests (drop_invalid_range_rules)
…ring-rules # Conflicts: # lib/core/filtering/post_process_engine.rb # lib/core/filtering/simple_rule.rb # spec/connectors/mongodb/mongo_rules_parser_spec.rb
@mchernyavskaya I failed to get this ready to merge. It too me too long to figure out how to resolve the merge conflicts (I couldn't rebase your branch and I still don't really understand why), and I'm just out of time. I think we'll just need to save this issue for 8.7 :( |
@seanstory no worries. Taking out of the DM: I feel like I'm going into a rabbit hole with these filtering rules (and specifically, with validation). I'm second-guessing this pre and post processing model more and more, so maybe that's something we should revisit. |
Checked the PR and spoke to @mchernyavskaya about it. It indeed feels like a rabbit hole - number of possible combinations of invalid rules is potentially really high, the complexity of validating the rules is humongous. I think same problem can be probably approached from a different perspective - let users enter any rules they want but provide good logging and tracing to see what query was generated (maybe even why) and why the expected data was not ingested. It's probably good to validate each individual rule though just for sanity purposes - for example that RANGE filter is actually [X; Y], not [-INF, X], [Y; +INF] |
@artem-shelkovnikov yep - I also proposed something like this in Ingestion Sync agenda - want it at least to be up for a discussion. |
I believe we can close this now, as it was implemented in a separate PR |
Closes https://github.com/elastic/enterprise-search-team/issues/3174
Basic high-level validation of simple filtering rules. It has a lot of edge cases and we can't cover all, so it's a best-effort attempt to provide at least some rough coverage.
The PR also includes pre-processing of
start_with
andend_with
rules in mongo (the first PR for simple rules didn't cover those as they were specified as post-processing in the implementation document, but since technically it's just a kind of a regex, I didn't see why we can't do it in pre-processing for Mongo).Checklists
Pre-Review Checklist
Related Pull Requests
For Elastic Internal Use Only
connectors_utility
andconnectors_service
) and included into Enterprise Search and tested that Enterprise Search works well with new gem versions. Instruction can be found here