TheBobBobs
Contributor

Summary

Create indexes on columns commonly used in queries:
  • Tag.name, Tag.shorthand: finding tags by name
  • TagParent.child_id: fetching tag hierarchies
  • TagEntry.entry_id: fetching entries with their tags

Change the query in Library.search_tags so that it uses the indexes above.
Sort results before applying the limit, so that tags that should be prioritized are not truncated.
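
As a rough illustration of the first point, the sketch below uses the standard-library sqlite3 module rather than the project's ORM code. The table name `tags`, the index name `ix_tags_name`, and the helper `query_plan` are assumptions made for this example and are not taken from the PR. It shows how an index on a name column changes the query plan for a lookup by name.

```python
import sqlite3

# Hypothetical schema for illustration only; not the project's real tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL, shorthand TEXT)"
)
conn.executemany(
    "INSERT INTO tags (name, shorthand) VALUES (?, ?)",
    [(f"tag{i}", None) for i in range(1000)],
)


def query_plan(sql: str, params: tuple = ()) -> str:
    """Return the 'detail' column of EXPLAIN QUERY PLAN, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


lookup = "SELECT id FROM tags WHERE name = ?"

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(lookup, ("tag42",))

# With an index on the filtered column, the plan names the index instead.
conn.execute("CREATE INDEX ix_tags_name ON tags (name)")
plan_after = query_plan(lookup, ("tag42",))

print(plan_before)
print(plan_after)
```

Whether a given query can actually use an index depends on how it filters (for example, wrapping the column in a function may prevent it), which is presumably why the query itself was also changed.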

Tasks Completed

  • Platforms Tested:
    • Linux x86
  • Tested For:
    • Basic functionality
    • PyInstaller executable

Comment on lines 1081 to 1082
if limit > 0 and not name:
    query = query.limit(limit).order_by(func.lower(Tag.name))
Collaborator

This causes the sorting to happen after truncating the results, which differs from the behaviour when searching, no?

Contributor Author

Yes, this was not sorting by len(text) before truncating. The previous implementation only sorted priority results by len, so I've updated sort_key to do that. This makes the order_by statement and sort_key produce the same results when no query is provided.
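
To make the ordering issue concrete, here is a minimal sketch in plain Python. The `sort_key` below (shorter text first, then case-insensitive alphabetical) is an assumed stand-in, not the PR's actual implementation, and the sample names are invented.

```python
def sort_key(text: str) -> tuple[int, str]:
    # Assumed key for illustration: shorter names first, then alphabetical.
    return (len(text), text.lower())


# Invented sample data, in arbitrary (unsorted) order.
names = ["landscape photo", "landmark", "land", "landing page", "landfill"]
limit = 2

# Truncating first keeps whatever happened to come first, then sorts only that.
truncate_then_sort = sorted(names[:limit], key=sort_key)

# Sorting first guarantees the best-ranked names survive the limit.
sort_then_truncate = sorted(names, key=sort_key)[:limit]

print(truncate_then_sort)  # ['landmark', 'landscape photo']
print(sort_then_truncate)  # ['land', 'landfill']
```

The shortest match, "land", is lost when the limit is applied before the sort.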

Comment on lines 1098 to 1106
tags.sort(key=lambda t: sort_key(t[1]))
seen_ids = set()
tag_ids = []
for row in tags:
    id = row[0]
    if id in seen_ids:
        continue
    tag_ids.append(id)
    seen_ids.add(id)
Collaborator

Couldn't this be written as the following?

tags = dict(tags)
tag_ids = sorted(tags.keys(), key=lambda t: sort_key(tags[t]))
del tags  # not sure if this makes a difference, but `tags` could become quite large and triggering gc on it sooner can't hurt

This is both simpler code-wise and should be faster, since it sorts only the deduplicated list.

Contributor Author

This is so it will use the order from Tag.name or TagAlias.name depending on which comes first for each tag.

Collaborator

Computerdores, Sep 16, 2025

In that case, the following should work, since dict.keys() maintains insertion order and dict deduplicates by key:

tags.sort(key=lambda t: sort_key(t[1]))
tag_ids = dict(tags).keys()  # get the deduplicated list of ids
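
A small check of this equivalence, using invented (tag_id, matched_text) rows and an assumed `sort_key`; neither is taken from the PR. A key that appears more than once in the pairs keeps the position of its first occurrence in the dict, so the result matches the explicit seen-set loop.

```python
def sort_key(text: str) -> tuple[int, str]:
    # Assumed key for illustration: shorter text first, then alphabetical.
    return (len(text), text.lower())


def dedupe_with_set(rows: list[tuple[int, str]]) -> list[int]:
    """Explicit loop: keep each id at the position of its first occurrence."""
    seen_ids: set[int] = set()
    tag_ids: list[int] = []
    for tag_id, _text in rows:
        if tag_id in seen_ids:
            continue
        tag_ids.append(tag_id)
        seen_ids.add(tag_id)
    return tag_ids


# Invented rows: tag 1 matches twice (say, via its name and via an alias).
tags = [(1, "photograph"), (2, "photo booth"), (1, "photo"), (3, "pho")]

rows = sorted(tags, key=lambda t: sort_key(t[1]))
ids_loop = dedupe_with_set(rows)
ids_dict = list(dict(rows))  # dict keeps the first insertion position per key

print(ids_loop)  # [3, 1, 2]
print(ids_dict)  # [3, 1, 2]
```

Note that dict.keys() returns a view rather than a list, so it needs wrapping in list() if the caller indexes into it or expects a list.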

Collaborator

Other than that, I think this is good to merge.

Comment on lines 223 to 228
direct_tags, ancestor_tags = self.lib.search_tags(name=query, limit=tag_limit)

if query and query.strip():
    for tag in raw_results:
        if tag.name.lower().startswith(query_lower):
            priority_results.add(tag)
all_results = [t for t in direct_tags if t.id not in self.exclude]
for tag in ancestor_tags:
    if tag.id not in self.exclude:
        all_results.append(tag)
Collaborator

This code previously handled self.exclude being None. Is there a reason you removed that?

Contributor Author

Its type is list[int] and I couldn't find any code that could cause it to be None.

Collaborator

good reason ^^, I missed that

Comment on lines 225 to 228
all_results = [t for t in direct_tags if t.id not in self.exclude]
for tag in ancestor_tags:
    if tag.id not in self.exclude:
        all_results.append(tag)
Collaborator

Suggested change
- all_results = [t for t in direct_tags if t.id not in self.exclude]
- for tag in ancestor_tags:
-     if tag.id not in self.exclude:
-         all_results.append(tag)
+ all_results = [t for t in direct_tags if t.id not in self.exclude]
+ all_results += [t for t in ancestor_tags if t.id not in self.exclude]

Contributor Author

Ended up doing this to avoid creating extra lists.
all_results.extend(t for t in ancestor_tags if t.id not in self.exclude)
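
For reference, a self-contained sketch of this pattern. The `Tag` dataclass and the sample data are hypothetical stand-ins, and `exclude` plays the role of `self.exclude`. Passing a generator expression to list.extend appends the filtered items without building a temporary list first, which is the difference from `+=` with a list comprehension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    # Hypothetical stand-in for the project's Tag model.
    id: int
    name: str


direct_tags = [Tag(1, "animal"), Tag(2, "plant")]
ancestor_tags = [Tag(3, "organism"), Tag(4, "thing")]
exclude = [2, 4]  # ids to leave out of the results

# Direct matches first, then ancestors, skipping excluded ids in both.
all_results = [t for t in direct_tags if t.id not in exclude]
all_results.extend(t for t in ancestor_tags if t.id not in exclude)

print([t.id for t in all_results])  # [1, 3]
```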

CyanVoxel added the labels TagStudio: Library, TagStudio: Tags, TagStudio: Search, and Type: Performance on Sep 15, 2025
CyanVoxel moved this to 👀 In review in TagStudio Development on Sep 15, 2025
Labels
  • TagStudio: Library (relating to the TagStudio library system)
  • TagStudio: Search (the TagStudio search engine)
  • TagStudio: Tags (relating to the TagStudio tag system)
  • Type: Performance (an issue or change related to performance)
Projects
Status: 👀 In review
3 participants