Searches with "not" in a KeywordIndex do not return records that do not contain a value for the index #148

Open
wesleybl opened this issue Jan 29, 2024 · 3 comments
@wesleybl
Member

BUG/PROBLEM REPORT / FEATURE REQUEST

Searches with "not" in a KeywordIndex do not return records that do not contain a value for the index, for example Plone's Subject index.

See this issue: plone/Products.CMFPlone#3895

What I did:

  1. Create a document with the tag "Bulletin".
  2. Create a Python script in the ZMI with the content:
return str(context.portal_catalog(Subject={"not": ["Bulletin"]}))
  3. Run the script multiple times.

What I expect to happen:

The search must return all content that does not contain the "Bulletin" Subject, including objects that do not have any Subject.

What actually happened:

Content that does not have a Subject is not returned.

What version of Python and Zope/Addons I am using:

Zope 5.9
Python: 3.11

jensens added a commit that referenced this issue Jan 13, 2025
@jensens
Member

jensens commented Jan 13, 2025

Actually it is more difficult. We (@gogobd and I) ran into the same issue.

We are using a query where one part is {"internal_tags": {"not": "hidden"}}.
This resulted in a behavior where the query randomly delivered 288 or 36 results.
We broke this down to run on a test instance with a single thread. Here, after each site reload, it alternates between 288 and 36 results.

Next we found that the order of indexes returned by the query plan alternates too. On one call the internal_tags index was first, on the next it was further toward the end, and then first again. So we dug deeper.

Unindex.query_index gets a parameter resultset. If a resultset is given, the code path differs from when it is empty. If the index is the first in the plan, resultset is empty; otherwise it is passed in:

  • If no resultset is passed in, the index creates a query on itself with all keywords except the not ones. It returns all document-ids it knows about minus the ones containing the not values.
  • If a resultset is passed in, it returns all values of the resultset minus the ones containing the not values.

An index only knows about documents it has indexed a value for; documents with no such value or no such attribute are unknown to it. The two code paths therefore produce completely different results.
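To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model, not the actual Unindex code: the class ToyKeywordIndex and its methods index_doc and query_not are hypothetical names invented for illustration, and only the branching on resultset mirrors the behavior described above.

```python
# Toy model of the two code paths; NOT the real Unindex implementation.


class ToyKeywordIndex:
    """Maps keyword -> set of document ids, like a much simplified KeywordIndex."""

    def __init__(self):
        self._index = {}

    def index_doc(self, docid, keywords):
        # A document without keywords leaves no trace in the index.
        for kw in keywords:
            self._index.setdefault(kw, set()).add(docid)

    def query_not(self, excluded, resultset=None):
        # Collect every document id that carries one of the excluded keywords.
        hits = set()
        for kw in excluded:
            hits |= self._index.get(kw, set())
        if resultset is None:
            # First in the plan: start from the ids the index knows about.
            known = set().union(*self._index.values())
            return known - hits
        # Later in the plan: start from the ids handed over by earlier indexes.
        return set(resultset) - hits


index = ToyKeywordIndex()
index.index_doc(1, ["hidden"])
index.index_doc(2, ["public"])
index.index_doc(3, [])  # document without any internal_tags

all_docs = {1, 2, 3}
first_in_plan = index.query_not(["hidden"])
later_in_plan = index.query_not(["hidden"], resultset=all_docs)
```

In this model document 3 only shows up when a resultset is passed in, which matches the alternating result counts we observed.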

We wrote a (failing) test to demonstrate this in branch issue-148.

@gogobd

gogobd commented Jan 13, 2025

As @jensens just explained, we had a problem with "not" in combination with "KeywordIndex".

The internal order of the individual keyword indices changes the result. When the "not" KeywordIndex query is done first, it "misses" all objects that don't even have a value for that index - but those objects should be part of the result, because they fulfill the "not" requirement. So if the Catalog "Plan" switches the order of the individual queries, the result changes.

jensens added the bug label Jan 13, 2025
@gogobd

gogobd commented Jan 16, 2025

We "fixed" this issue by providing an indexer for all Dexterity Types that would index an "empty marker" if no value is present. That way the "not" query can "see" all content - even the content that doesn't even have the indexed field, in our case it is called "internal_tags".

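  <adapter
      factory=".internaltags.internal_tags"
      name="internal_tags"
      />

from plone.dexterity.interfaces import IDexterityContent
from plone.indexer import indexer


@indexer(IDexterityContent)
def internal_tags(obj):
    """Return the internal_tags field value or '__empty__' if the value is missing or empty"""
    # aq_explicit avoids picking up internal_tags from a parent via acquisition.
    tags = getattr(obj.aq_explicit, "internal_tags", None)
    return tags or "__empty__"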
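
Why the marker helps can be sketched with a small plain-Python model. This is only an illustration under simplifying assumptions, not catalog code: toy_index, keywords_for and query_not are hypothetical names, and the only thing taken from the indexer above is the "__empty__" fallback.

```python
EMPTY_MARKER = "__empty__"  # same marker string as in the indexer above


def keywords_for(tags):
    """Mirror the indexer's fallback: missing or empty tags become the marker."""
    return list(tags) if tags else [EMPTY_MARKER]


# keyword -> set of document ids (toy stand-in for a KeywordIndex)
toy_index = {}
docs = {1: ["hidden"], 2: ["public"], 3: None}  # document 3 has no tags
for docid, tags in docs.items():
    for kw in keywords_for(tags):
        toy_index.setdefault(kw, set()).add(docid)


def query_not(excluded, resultset=None):
    """Return candidate ids minus those carrying an excluded keyword."""
    hits = set().union(*(toy_index.get(kw, set()) for kw in excluded))
    if resultset is None:
        # No resultset: start from every id the index knows about.
        base = set().union(*toy_index.values())
    else:
        # Resultset given: start from the ids passed in.
        base = set(resultset)
    return base - hits


without_resultset = query_not(["hidden"])
with_resultset = query_not(["hidden"], resultset=docs.keys())
```

Because every document now has at least one keyword, the index "knows" all of them, and both branches return the same ids regardless of where the index sits in the plan.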
