PathQL is a declarative, composable, and efficient query language for filesystem operations in Python. It enables expressive, readable, and powerful queries over files and directories, inspired by pathlib and modern query languages. PathQL is designed for performance (stat caching), extensibility, and testability, with robust operator overloading and a familiar, Pythonic API requiring no (or VERY few) interactions and conversions with datetimes, timestaps and vaguaries of the os.stat_result object.
- Declarative Query Syntax: Compose filters using Python operators (
&,|,~, etc.) - Pathlib-like Naming: Filters and queries mimic
pathlibconventions (e.g.,Stem,Suffix,FileType) - Stat Caching: Each file is stat-ed at-most, once, even if there are many checks against "stat-able" parts of a file.
- Threaded Filesystem Search: Query engine uses a producer-consumer model with a dedicated thread for filesystem crawling and a main thread for filtering, improving responsiveness and throughput for large directory trees.
- Extensible Filters: Easily add new filters for custom logic
- Comprehensive Testing: Robust, parameterized pytest suite with coverage
PathQL lets you write queries against the file system in a composable form:
from pathql import AgeYears, Suffix, Query
for f in Query(where_expr=(AgeYears() > 1) & (Suffix() == ".bak"), from_paths="c:/logs").files():
print(f"Files to delete - {f.resolve()}")from pathql import AgeDays, Size, FileType, Suffix, Query
for f in Query(where_expr=(AgeDays() == 0) & (Size() > "10 mb") & (Suffix()=="log"), from_paths="C:/logs", threaded=True).files():
print(f"Files to zip - {f.resolve()}")from pathql import AgeDays, Size, Suffix, Query,ResultField
# Use fluent interface
result_set = (
Query()
.from_paths("C:/logs")
.where((AgeDays() == 0) & (Size() > "10 mb") & (Suffix() == "log"))
.threaded(True)
.select()
)
print(f"Number of files to zip: {result_set.count_()}")
print(f"Largest file size: {result_set.max(ResultField.SIZE)} bytes")
print(f"Oldest file: {result_set.min(ResultField.MTIME)}")These examples show how PathQL hides the guts of the pathlib module and the os module from you with readable filter expressions. The real power comes with actions that allow you to apply a function to all of the files that match your query, in parallel.
Filters are composable objects that match files based on many of the attributes exposed by the pathlib Package. Attributes such as size, age, suffix, stem, or type. Each filter can be combined using Python's over-loadable boolean operators. Those filters can be composed in arbitrarily complex ways using boolean operators and methods that operate on queries.
PathQL allows you to compose complex queries using the & (AND), | (OR), and ~ (NOT) operators. These operators work analogously to Python's built-in and, or, and not, but are implemented using operator overloading to allow chaining and composition of filter objects.
&(AND): Combines two filters so that both must match for a file to be included.|(OR): Combines two filters so that either filter matching will include the file.~(NOT): Inverts the result of a filter, matching files that do not satisfy the filter.- Any(q1,q2,..) and All(q1,q2,... composable queries allow for short circuited sub query composition.
Precedence
The query language implemented by PathQL should be thought of as a mathematical system that uses pythons mathematical operators, as such the rules of precedence follow those for '&', | and ^, which have different precedence than the and/or operators (which cannot be overloaded).
You must say
(AgeDays() < 2) & (Suffix() =="txt")
for precedence to work as you expect, or use the All/Any methods.
Short-Circuiting Behavior
PathQL's filter composition is designed to short-circuit, just like Python's boolean operators:
- For
A & B, if filterAdoes not match, filterBis not evaluated. - For
A | B, if filterAmatches, filterBis not evaluated. - For chained filters (e.g.,
A & B & C), evaluation stops at the first filter that fails for AND, or the first that succeeds for OR. - For
All(A,B,C)orAny(A,B,C), the operation will short circuit once aAllsees aFalseorAnysees aTrue.
Place expensive filters at right most position in expressions so they can be short circuited.
Example 1
from pathql.filters.base import Filter, AndFilter, OrFilter
class ExpensiveFilter(Filter):
def match(self, path, now=None, stat_result=None):
# Simulate a slow operation
import time
time.sleep(1)
return True
class TrueFilter(Filter):
def match(self, path, now=None, stat_result=None):
return True
class FalseFilter(Filter):
def match(self, path, now=None, stat_result=None):
return False
# ExpensiveFilter will NOT be called because CheapFilter fails
combined = FalseFilter() & ExpensiveFilter()
result = combined.match(pathlib.Path("somefile.txt")) # Fast, no delay
# ExpensiveFilter will NOT be called because CheapFilter succeeds
combined = TrueFilter() | ExpensiveFilter()
result = combined.match(pathlib.Path("somefile.txt")) # Fast, no delayExample 2
# ExpensiveFilter will NOT be called because CheapFilter fails. All Function mimics 'and'
combined = All(FalseFilter(), ExpensiveFilter())
result = combined.match(pathlib.Path("somefile.txt")) # Fast, no delay
# ExpensiveFilter will NOT be called because CheapFilter succeeds. Any function mimics 'or'
combined = Any(TrueFilter(), ExpensiveFilter())
result = combined.match(pathlib.Path("somefile.txt")) # Fast, no delayNote:
Filters should be pure functions without side effects, as short-circuiting means some filters may not be executed depending on the logic. While most filters that come with pathql run fairly quickly, real world use cases involve opening files and scanning contents for state information. Eliminating these checks can be VERY valuable.
Summary:
- Use
&,|, and~to build complex queries. - Follow precedence rules (Size()>"1mb")
- Short-circuiting ensures efficient evaluation and skips unnecessary work.
- Place expensive filters down stream in your logic chains Fast=(Ext,Name,FileDate) slow = (Size,Type,Date,Age)
Size() <= 1_000_000— files up to 1MBSuffix() == [".png", ".jpg"}]— files with .png or .jpg extensionStem()=="report_*")— files whose stem matches a glob pattern (e.g., starts with "report_")FileType().file— 'file, 'dir', 'link'AgeMinutes() < 10— modified in the last 10 minutesBetween(Size(), 1000, 2000)— files with size >= 1000 and < 2000 bytes (inclusive lower, exclusive upper)YearFIlter(2024)MonthFIlter("May")DayFilter(23)HourFilter(12)MinuteFilter(59)SecondFilter(40)FileDate().modified > dt.datetime(2025,1,1)
The Between filter matches files whose attribute (such as size or age) falls within a specified range: inclusive on the lower bound, exclusive on the upper bound. This filter works with values that can be reduced to comparable values. At this time Size and FileDate work with between.
Example: Find files with size >= 1KB and < 2KB:
from pathql.filters import Between, Size
from pathql.query import Query
for path in Query("/some/dir", Between(Size(), 1000, 2000) & Between(FileDate(),dt.datetime(2024,1,1),dt.datetime(2024,2,1))):
print(path)
else:
print("No files were found")The Query class is the driver for PathQL. It walks the filesystem, gathers stat metadata, and applies filter expressions to decide which files to yield. It exposes a small API and supports both single-threaded and producer/consumer (threaded) modes for flexible performance characteristics.
- Purpose: efficiently find filesystem entries that match a filter expression. - Model: the filesystem walk (producer) is separated from filtering/consumption (consumer) so IO-bound directory traversal can proceed concurrently with CPU bound filter evaluation.
- Key features: stat caching, optional threading, recursion control, and
eager or lazy collection via
selectandfilesrespectively.
Query(filter_expr)— construct a Query with a filter expression (anyFilter).Query.match(path, now=None, stat_result=None)— run the filter expression against a singlepath. Useful for programmatic checks or tests.nowdefaults to the current datetimestat_resultcan be provided to avoid an additionalstat()call.Query.files(path, recursive=True, files_only=True, now=None, threaded=False)— lazily yield matchingpathlib.Pathobjects. Setthreaded=Trueto enable the producer/consumer mode.Query.select(path, ...)— eagerly collect matches into aResultSet.
With pathql you get threading basically for free. Just ask for it when you run a query and your query will use 5 threads, though you have control over this. For custom filters that are IO bound threading will help, if you are running a free-threaded version of python it will also help on compute bound problems.
- Single-threaded (
threaded=False) walks the tree and filters each entry in a single loop. Simpler and predictable; good for small trees or when threading is not desired. - Threaded (
threaded=True) starts a producer thread that walks the filesystem and pushes(path, stat_result)tuples into a boundedqueue.Queue(default maxsize: 10). The consumer (caller) reads from the queue and runs the filter expression. This can improve throughput when the filesystem is slow or when filter evaluation is relatively expensive. - The producer places a
Nonesentinel on the queue when traversal completes; the consumer stops after seeing the sentinel and joins the producer thread.
- The producer (threaded mode) or the single-threaded walker performs
path.stat()once per file (at most) and passes thestat_resultalong with the path to the filters. Filters should accept and refer the providedstat_resultto avoid extra system calls. stat-resultis only generated when it is actually used and if other filter expressions require stat, they use a cached version.- All filters in PathQL accept optional
nowandstat_resultparameters so that a singlenowtimestamp can be used for consistent comparisons and so callers can reuse stat results.
IN large directory trees with 100's or 1000's of files the time changes between the beginning and end of a query, sometimes by many seconds or more. This can lead to race conditions when ages are calculated. The design decision that was made is that at the start of a query the current time is read (or a user provided one is used) for all datetime calculations. This is imperfect because there are many possible issues. For example, a file existed when you started the run but was deleted when you finished. There are many ways that this problem could be solved with varying degrees of "quality".
recursive=True(default) usesrglob("*")to walk the tree; setrecursive=Falseto restrict to the top-level directory viaglob("*").files_only=True(default) filters out non-files; setfiles=Falseto include directories in the results.
PathQL’s ResultSet is a materialized list of pathlib.Path objects returned by Query.select(). It provides aggregation, sorting, and statistical methods for analyzing query results. You can use ResultField to specify which file attribute to operate on.
| Operation | Description |
|---|---|
count_() |
Number of items in the result set |
sort_(field, ascending) |
Sort by a field (ascending/descending) |
top_n(field, n) |
Top N items by field |
bot_n(field, n) |
Bottom N items by field |
max(field) |
Maximum value for a field |
min(field) |
Minimum value for a field |
average(field) |
Average value for a field |
median(field) |
Median value for a field |
| Field | Description |
|---|---|
SIZE |
File size in bytes |
MTIME |
Modification time (timestamp) |
CTIME |
Creation time (timestamp) |
ATIME |
Access time (timestamp) |
MTIME_DT |
Modification time (datetime) |
CTIME_DT |
Creation time (datetime) |
ATIME_DT |
Access time (datetime) |
NAME |
File name |
SUFFIX |
File extension |
STEM |
File stem (name without extension) |
PATH |
Full path as string |
PARENT |
Parent directory |
PARENTS |
All parent directories (tuple) |
PARENTS_STEM_SUFFIX |
(Parents, stem, suffix) tuple |
from pathql.query import Query
from pathql.result_fields import ResultField
# Get a result set of all .log files
result_set = Query(where_expr=Suffix(".log"), from_paths="./logs").select()
# Count files
print(result_set.count_())
# Get top 3 largest files
top_3 = result_set.top_n(ResultField.SIZE, 3)
# Sort by creation time (ascending)
sorted_by_ctime = result_set.sort_(ResultField.CTIME, ascending=True)
# Get file with latest modification time
latest = result_set.max(ResultField.MTIME)The File filter matches the full filename using shell-style glob patterns (via fnmatch). This means you can use wildcards like *, ?, and character sets, but curly-brace expansion (e.g., foo.{jpg,png}) is not supported. Matching is case-insensitive.
Examples:
File("*.jpg") # matches "photo.jpg", "archive.backup.jpg"
File("foo.*") # matches "foo.jpg", "foo.png", "foo.bmp"
File("*.tar.gz") # matches "archive.tar.gz", "foo.bar.tar.gz"
File("foo.*.back") # matches "foo.tif.back", "foo.txt.back"
File("*report*") # matches any file with "report" in the nameNote: Patterns like foo.{jpg,png} will NOT expand to match multiple extensions. Use wildcards or run multiple queries if you need to match several extensions.
The Suffix filter match file extensions by checking if the filename ends with the full extension, including the dot. Patterns like 'bmp' or '.bmp' are both accepted, but are normalized internally to '.bmp'.
- Any leading dot in the extension pattern is added if missing, so 'bmp' and '.bmp' are treated identically.
- Matching is case-insensitive.
- The filter matches if the filename ends with the normalized pattern (e.g., '.bmp'), regardless of how many dots precede it.
- Wildcards (e.g.,
*,?) are supported in Suffix patterns viafnmatch.
Examples:
Suffix(".jpg") # matches "photo.jpg", "archive.backup.jpg"
Suffix("jpg") # also matches "photo.jpg", "archive.backup.jpg"
Suffix(".tar.gz") # matches "archive.tar.gz", "foo.bar.tar.gz"
Suffix(".tif.back") # matches "image.tif.back", "foo.tif.back"
Suffix(".back") # matches "foo.back", "image.tif.back", "foo.txt.back"
Suffix(".foo") # matches "bar.foo", "baz.bar.foo"
Suffix("*.gz") # matches any file ending in ".gz"Note on curly-brace expansion: Suffix supports simple brace expansion inside a single string. For example, Suffix("{tif,jpg}") expands to the two patterns
".tif" and ".jpg" (you can also include a base, e.g. ".{tif,jpg}"). This is a convenience for common multi-extension cases. By contrast, the
File filter does NOT perform curly-brace expansion (it uses fnmatch on the entire filename) — use multiple Suffix filters or explicit patterns for complex filename matching.
Multi-part extensions like '.tar.gz' or '.tif.back' are supported and will match files ending with those exact strings. Wildcards are supported for flexible matching.
For more advanced extension filtering, you can combine Suffix with other filters:
from pathql.query import Query
for path in Query("/photos", Suffix('.jpg') & CameraMakeFilter("Canon")):
print(path)Example: brace expansion with Suffix:
# matches files ending with .tif or .jpg
for p in Query("/images", Suffix("{tif,jpg}")):
print(p)For more advanced logic, subclass Filter and implement the match() method. This gives you full control over how files are evaluated.
Example: Custom Filter for minimum line count
from pathql.filters.base import Filter
from pathql.query import Query
from pathql.filters import Suffix
class MinLinesFilter(Filter):
def __init__(self, min_lines: int):
self.min_lines = min_lines
def match(self, path: pathlib.Path, now=None, stat_result=None) -> bool:
try:
return sum(1 for _ in path.open()) >= self.min_lines
except (IOError,PermissionError,FileNotFoundError):
return False
# Place Suffix first for short-circuiting, then MinLinesFilter
for p in Query(where_expr=Suffix(".txt") & MinLinesFilter(100), from_paths="./docs"):
print(p)See test/test_override.py for more examples and patterns.
PathQL provides powerful filters for matching files based on specific parts of their modification, creation, or access datetime. These filters allow precise queries for year, month, day, hour, minute, and second.
You almost never need to interact with the raw datetime objects if you are dealing with time scales relative to today. The filter constructors and defaults are designed so that typical queries (by year, month, day, etc.) will just work. You specify the part you care about (e.g., YearFilter(2025), MonthFilter("may")), and PathQL will filter data in that time period. If that time period is in the future (you ask for May and it is April)
- base: The reference datetime for the filter. By default, this is the current date/time, but you can set it to any
datetimeobject. This is useful for queries like "files from the same month last year" or "files from a specific day." - offset: An integer that shifts the
baseby the specified number of years, months, days, etc., depending on the filter. For example,offset=-1with aMonthFilterand the default base will match last month. - attr: Which file timestamp to use. Accepts user-friendly names (
"modified","created","accessed") or raw stat attribute names ("st_mtime", etc.). This lets you filter by modification, creation, or access time as needed.
YearFilter(year, base=None, offset=0, attr="modified")MonthFilter(month, base=None, offset=0, attr="created")DayFilter(day, base=None, offset=0, attr="accessed")HourFilter(hour, base=None, offset=0, attr="modified")MinuteFilter(minute, base=None, offset=0, attr="modified")SecondFilter(second, base=None, offset=0, attr="modified")
- value (
intorstr): The part to match (e.g., year=2025, month="may" or 5, day=16, etc.) - base (
datetime, optional): Reference datetime for offset calculation. Defaults to now. - offset (
int, optional): Offset to apply to the base datetime (e.g., offset=1 for next year/month/day). - attr (
str, optional): Which stat attribute to use. Accepts "modified", "created", "accessed" or "st_mtime", "st_ctime", "st_atime".
- The filter extracts the relevant part(s) from the file's stat timestamp (using the specified attr).
- It compares the extracted part(s) to the filter's value(s).
- For
MonthFilter, both string names ("may", "january") and integers (1-12) are supported. - For
DayFilter,HourFilter,MinuteFilter,SecondFilter, the filter matches only if all parts (year, month, day, etc.) match exactly.
| FilterName | base_time | offset | attr | Description |
|---|---|---|---|---|
| HourFilter | datetime(2022,10,28,11,0) | 2 | mtime | Files will match if modified in the hour 13:00 on 2022-10-28 |
| HourFilter | None (current_time) | 0 | ctime | Files will match if created in the current hour |
| DayFilter | datetime(2022,1,1) | 0 | atime | Files will match if accessed on 2022-01-01 |
| DayFilter | None (current_time) | -1 | mtime | Files will match if modified yesterday |
| MonthFilter | datetime(2023,6,1) | -1 | ctime | Files will match if created in May 2023 |
| MonthFilter | None (current_time) | 0 | mtime | Files will match if modified this month |
| YearFilter | datetime(2020,1,1) | 1 | mtime | Files will match if modified in 2021 |
| YearFilter | None (current_time) | 0 | ctime | Files will match if created this year |
Example usage:
HourFilter(base_time=datetime(2022,10,28,11,0), offset=2, attr="mtime")
HourFilter(base_time=None, offset=0, attr="ctime")
DayFilter(base_time=datetime(2022,1,1), offset=0, attr="atime")
DayFilter(base_time=None, offset=-1, attr="mtime")
MonthFilter(base_time=datetime(2023,6,1), offset=-1, attr="ctime")
MonthFilter(base_time=None, offset=0, attr="mtime")
YearFilter(base_time=datetime(2020,1,1), offset=1, attr="mtime")
YearFilter(base_time=None, offset=0, attr="ctime")PathQL provides convenient age-based filters that match files based on their modification age. These are higher-level than the datetime-part filters and are useful for queries like "files older than 30 days" or "files modified in the last 2 hours".
Supported filters:
AgeMinutes— age in minutesAgeHours— age in hoursAgeDays— age in daysAgeYears— age in years
Comparison semantics:
- Only
<,<=,>, and>=style comparisons are supported for age filters. The library treats<and>as inclusive (so they behave like<=and>=respectively) for convenience. Equality (==) and inequality (!=) comparisons are intentionally unsupported and will raiseTypeError.
Examples:
from pathql import AgeDays, AgeHours, AgeYears,Query
# Files older than 30 days
for p in Query("/var/log", AgeDays() > 30):
print(p)
# Files modified in the last 2 hours
for p in Query("/tmp", AgeHours() < 2):
print(p)
# Files older than one year
for p in Query("/archive", AgeYears() >= 1):
print(p)
Note: age filters use the file modification time (st_mtime) and accept an optional now parameter when used programmatically (useful for reproducible tests). They also accept an optional stat_result to avoid extra stat() calls when you already have file metadata.
- "created" →
st_ctime - "accessed" →
st_atime - You can also use the raw stat attribute names directly.
from pathql import YearFilter, MonthFilter, DayFilter, Query
import datetime as dt
# Files modified in 2025
query = YearFilter(2025)
for path in Query("/data", query):
print(path)
# Files created in May
query = MonthFilter("may", attr="created")
for path in Query("/data", query):
print(path)
# Files modified on October 16, 2025
query = DayFilter(base=dt.datetime(2025, 10, 16))
for path in Query("/data", query):
print(path)import datetime as dt
from pathql import DayFilter, MonthFilter,Query
# Files modified yesterday (-1day from now)
query = DayFilter(offset=-1)
for path in Query("/data", query).files():
print(path)
# Files modified last month (-1 month from this month)
query = MonthFilter(offset=-1)
for path in Query("/data" query).files():
print(path)
# Files modified the day before Christmas 2023
christmas = dt.datetime(2022, 12, 25)
query = DayFilter(base=christmas, offset=-1)
for path in Query("/data", query).files():
print(path)Ideally, if you have a datetime value in your code you will never need to deal with time spans or deltas or complex data calculations.
PathQL supports filtering files by age using either filesystem timestamps or the date encoded in the filename. Using filename dating allows for a (possibly) more reliable way to target creation time in log files. Instead of using the operating system timing data we use the file naming convention where the file name starts with YYYY-MM-DD-HH-, YYYY-MM-DD_, YY-MM_, YY_ Using this time can get around issues with creation tme not being reliable across operating systems, modification times being changed by mistake. System clocks not be correct.
For the purpose of matching if a valid file date is not found the match does not occur, it is not an error.
These filters use the date encoded in the filename (e.g. YYYY-MM-DD_HH_{ArchiveName}.{EXT}):
FilenameAgeMinutesFilenameAgeHoursFilenameAgeDaysFilenameAgeYears
Example usage:
from pathql.filters.fileage import FilenameAgeDays
# Find files whose filename date is less than 10 days old
filter = FilenameAgeDays(operator.lt, 10)
filter.match("2024-05-01_report.txt")Filename-based filters allow you to query files based on their logical age, independent of filesystem metadata.
PathQL provides utilities for batch file operations on lists of pathlib.Path objects (or ResultSet). You can use a query to select files, then copy, move, delete or zip them:
from pathql.actions import copy_files, move_files, delete_files
from pathql.query import Query
from pathql.filters import Suffix
# Select all .txt files recursively from 'src'
files = Query(Suffix() == "txt").select(src, recursive=True)
destination = Path("dst")
copy_files(files, destination)
move_files(files, destination)
delete_files(files)All file actions (copy_files, move_files, delete_files) in PathQL are designed for robust batch processing. When you perform an action on a list of files:
- Exceptions are caught per file: If an error occurs (such as
IOError,PermissionError,OSError,FileNotFoundError, orNotADirectoryError), it is caught for that specific file. - Result reporting: The return value is a
FileActionResultobject, which contains:success: a list of files that were processed successfully.failed: a list of files that failed to process.errors: a dictionary mapping each failed file to the exception that was raised.status: True iffailedlist is empty.
- No silent failures: By default, exceptions are caught and reported in the result object. If you set
ignore_access_exception=False, the first exception encountered will be raised immediately. - Status property: You can check
result.statusto see if all files were processed successfully (Trueif no failures).
Example:
result = copy_files(files, destination)
if not result.status:
print("Some files failed:")
for path, exc in result.errors.items():
print(f"{path}: {exc}")This design ensures you always know which files succeeded, which failed, and why—making batch file operations safe and transparent.
Note: If you perform a combined action (such as
zip_move_files), the returnedFileActionResultwill reflect the status for both the zip and the move actions. Thesuccess,failed, anderrorsfields will include results from all underlying operations.
You can create zip archives from lists of files, with options to preserve directory structure and chain actions (zip+delete, zip+move, zip+copy):
from pathql.actions import zip_files, zip_delete_files
from pathql.query import Query
from pathql.filters import Suffix
# Select all .txt files recursively from 'src'
files = Query(Suffix() == "txt").select(src, recursive=True)
root = Path("src")
target_zip = Path("archive.zip")
# Zip files to a target folder
zip_files(files, root, target_zip, preserve_dir_structure=True)
# Zip files to target folder and delete files left behind.
zip_delete_files(files, root, target_zip)PathQL has a small CLI app called pql.py1 that can be run from the command line using uv. It is a basic app that pulls in pathql and PIL and runs from the command line. It has a custom filter object that looks into image file metadata and extracts the color depth allowing you filter files by color depth. The demo enables color depth age and size filtering.
cli % uv run --script pql.py "*.jpg" RGB
black_1kb.jpg
This command will:
- Build a virtual environment with python 3.14 and install pathql and PIL
- Search for files matching the pattern
*.jpgin the current directory - Filter for images with color mode
RGB
You can add more options, such as file size and age:
cli % uv run --script pql.py "*.jpg" RGB --size-min 10kb --size-max 2mb --min-age 0 --max-age 365This will:
- Build a virtual environment with python 3.14 and install pathql and PIL
- Find JPEG images with RGB color mode
- Only include files between 10KB and 2MB
- Only include files created between 0 and 365 days ago
- Return nothing because the file size constraint missed.
- pattern (positional): Glob pattern for files (e.g.,
"*.jpg") - col_mode (positional): Comma-separated color modes (e.g.,
"RGB,RGBA,L") - --root: Folder to search (default: current directory)
- --size-min: Minimum file size in bytes (default: 0)
- --size-max: Maximum file size in bytes (default: 1TB)
- --min-age: Minimum file creation age in days (default: 0)
- --max-age: Maximum file creation age in days (default: 100000)
PathQL filter classes (like Size, AgeDays, etc.) intentionally override Python's comparison and boolean operator methods (__eq__, __lt__, etc.) to return filter objects, not bool. This enables expressive, composable query logic:
Size() > "10MB" # returns a filter, not a boolThis pattern is common in query builders and DSLs, but it conflicts with Python's type system and expectations for these dunder methods (which are supposed to return bool). As a result, type checkers (like mypy, Pyright) and some linters will report errors or warnings about return types and operator overloads.
Why?
- Python expects
__eq__,__lt__, etc. to returnboolfor built-in types. - DSLs return filter objects so you can build up complex queries using operators.
- This is deliberate and necessary for PathQL's design, but it is not type-safe according to Python's current type system.
What to do?
- Ignore these type errors for filter classes, or use
# type: ignorecomments. - Document this behavior for users and contributors.
- If you need strict
boolresults, use explicit methods or evaluate filters directly.
Community Note: This is a known limitation in Python's type system. A future PEP may address DSL-friendly operator overloads, but for now, this pattern will always cause typechecker friction.
Project-level conventions, contributor guidance, and release steps are maintained in AI_CONTEXT.md in the repository root. Please consult that file for the latest instructions on coding style, testing, and release procedures.
- Added fluent interface for query.
query =Query().from_paths("c:/logs").where(Suffix() == 'txt').select(). - Updated Stem()/Suffix() to support
==/!=operators allowingStem() == "txt" - Improved test coverage.