@KKould KKould commented Nov 6, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Upgrade nom to version 8.0.0 and use a first-token check to reduce branch traversal in expr_element.

Before (columns: fastest │ slowest │ median │ mean │ samples │ iters):
├─ deep_function_call  802.2 µs      │ 1.207 ms      │ 842 µs        │ 850.6 µs      │ 100     │ 100
├─ deep_query          242.3 µs      │ 426.3 µs      │ 254.2 µs      │ 257.3 µs      │ 100     │ 100
├─ large_query         1.104 ms      │ 1.264 ms      │ 1.14 ms       │ 1.142 ms      │ 100     │ 100
├─ large_statement     1.097 ms      │ 1.2 ms        │ 1.15 ms       │ 1.148 ms      │ 100     │ 100
╰─ wide_expr           282.4 µs      │ 368.6 µs      │ 298 µs        │ 298.7 µs      │ 100     │ 100

After upgrading nom to 8.0.0:
├─ deep_function_call  747.4 µs      │ 1.1 ms        │ 771 µs        │ 776.5 µs      │ 100     │ 100
├─ deep_query          102.4 µs      │ 171.2 µs      │ 108 µs        │ 109.4 µs      │ 100     │ 100
├─ large_query         630.8 µs      │ 733 µs        │ 650 µs        │ 652.4 µs      │ 100     │ 100
├─ large_statement     621.5 µs      │ 687.5 µs      │ 642.9 µs      │ 645 µs        │ 100     │ 100
╰─ wide_expr           212.4 µs      │ 461.1 µs      │ 223.4 µs      │ 229.8 µs      │ 100     │ 100

After (nom_language version):
├─ deep_function_call  242.8 µs      │ 525.3 µs      │ 258.9 µs      │ 262.8 µs      │ 100     │ 100
├─ deep_query          235.6 µs      │ 364.8 µs      │ 244.8 µs      │ 249.3 µs      │ 100     │ 100
├─ large_query         362.9 µs      │ 451.6 µs      │ 376.5 µs      │ 379.7 µs      │ 100     │ 100
├─ large_statement     364.8 µs      │ 418.4 µs      │ 380.2 µs      │ 382.8 µs      │ 100     │ 100
╰─ wide_expr           96.97 µs      │ 270.2 µs      │ 102.8 µs      │ 105.3 µs      │ 100     │ 100

After (Pratt parser version, current):
├─ deep_function_call  81.55 µs      │ 290.1 µs      │ 91.42 µs      │ 94.28 µs      │ 100     │ 100
├─ deep_query          237.3 µs      │ 460.7 µs      │ 250.4 µs      │ 255.7 µs      │ 100     │ 100
├─ large_query         150.1 µs      │ 282.1 µs      │ 163.9 µs      │ 167 µs        │ 100     │ 100
├─ large_statement     149.7 µs      │ 185.4 µs      │ 161.4 µs      │ 161.8 µs      │ 100     │ 100
├─ wide_embedding      2.435 ms      │ 2.8 ms        │ 2.503 ms      │ 2.514 ms      │ 100     │ 100
╰─ wide_expr           30.73 µs      │ 45.14 µs      │ 31.18 µs      │ 31.78 µs      │ 100     │ 100
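
To put the tables in proportion, here is a small sketch that computes before/after ratios. It compares the first table with the last one and assumes the third column is the median; that column order is an assumption about the benchmark tool, and the `speedup` helper is a name introduced only for this illustration.

```rust
/// Speedup factor: how many times faster `after_us` is than `before_us`.
fn speedup(before_us: f64, after_us: f64) -> f64 {
    before_us / after_us
}

fn main() {
    // (benchmark, before, after): third-column values in µs, copied from
    // the "before" table and the Pratt parser table.
    let rows = [
        ("deep_function_call", 842.0, 91.42),
        ("deep_query", 254.2, 250.4),
        ("large_query", 1140.0, 163.9),
        ("large_statement", 1150.0, 161.4),
        ("wide_expr", 298.0, 31.18),
    ];
    for (name, before, after) in rows {
        println!("{name:<20} {:.1}x", speedup(before, after));
    }
}
```

Ratios near 1.0 (as for deep_query) indicate no meaningful change between the two measurements.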

By using the first token of the current input to select only the branches that can possibly match, we avoid trying every branch. This brings a significant improvement, cutting the time spent in the wide_expr benchmark by almost half.

if let Some(token_0) = i.tokens.first() {
    use TokenKind::*;

    macro_rules! try_dispatch {
        ($($pat:pat => $body:expr),+ $(,)?) => {{
            if let Some(result) = try_token!(token_0, $($pat => $body),+) {
                // Return early on success or on an unrecoverable failure;
                // on a recoverable error, fall through to the code below.
                if matches!(&result, Ok(_) | Err(nom::Err::Failure(_))) {
                    return result;
                }
            }
        }};
    }

    try_dispatch!(
        IS => with_span!(rule!(#is_null | #is_distinct_from)).parse(i),
        IN => with_span!(rule!(#in_list | #in_subquery)).parse(i),
        LIKE => with_span!(rule!(#like_subquery | #binary_op)).parse(i),
        EXISTS => with_span!(exists).parse(i),
        BETWEEN => with_span!(between).parse(i),
        CAST | TRY_CAST => with_span!(cast).parse(i),
        // ... (remaining arms elided)
    );
}
// Try-parsing a function call is very expensive and can easily overflow the stack,
// so we manually check here that the second token is LParen before entering it.
if i.tokens
    .get(1)
    .map(|token| token.kind == LParen)
    .unwrap_or(false)
{
    return with_span!(function_call).parse(i);
}

with_span!(alt((rule!(
    #column_ref : "<column>"
    | #map_access : "[<key>] | .<key> | :<key>"
    | #literal : "<literal>"
),)))
.parse(i)
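
The dispatch idea above can be sketched without nom. The following is a minimal, std-only illustration under stated assumptions: `TokenKind`, `Element`, and `dispatch` are hypothetical names invented for this sketch, not the types used in the PR, and the real parser handles far more token kinds and branches.

```rust
// Hypothetical token kinds; the real parser has many more.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenKind {
    Ident,
    LParen,
    RParen,
    Number,
    Exists,
    Cast,
}

// Hypothetical result of choosing a branch.
#[derive(Debug, PartialEq)]
enum Element {
    ExistsExpr,
    CastExpr,
    FunctionCall,
    ColumnRef,
    Literal,
}

/// Pick a branch from the first token (and, for function calls, the
/// second token) instead of trying every alternative in turn.
fn dispatch(tokens: &[TokenKind]) -> Option<Element> {
    use TokenKind::*;

    // First-token dispatch: keyword-led branches are chosen directly.
    match tokens.first()? {
        Exists => return Some(Element::ExistsExpr),
        Cast => return Some(Element::CastExpr),
        _ => {}
    }

    // One-token lookahead: only take the function-call branch when the
    // second token is an opening parenthesis.
    if tokens.get(1) == Some(&LParen) {
        return Some(Element::FunctionCall);
    }

    // Fallback: the few remaining cheap alternatives.
    match tokens.first()? {
        Ident => Some(Element::ColumnRef),
        Number => Some(Element::Literal),
        _ => None,
    }
}

fn main() {
    use TokenKind::*;
    // `f ( 1 )` is routed straight to the function-call branch.
    println!("{:?}", dispatch(&[Ident, LParen, Number, RParen]));
}
```

The point of the design is that each input pays for at most one keyword comparison and one lookahead before reaching the fallback alternatives, rather than attempting and backtracking out of every branch.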

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@KKould KKould changed the title perf: improve the performance of time functions by parsing them using… perf: improve the performance of time functions by parsing them using hard code Nov 6, 2025
@KKould KKould changed the title perf: improve the performance of time functions by parsing them using hard code : improve the performance of time functions by parsing them using hard code Nov 6, 2025
@KKould KKould changed the title : improve the performance of time functions by parsing them using hard code feat: improve the performance of time functions by parsing them using hard code Nov 6, 2025
@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Nov 6, 2025
@KKould KKould self-assigned this Nov 6, 2025
…ith nom_language. Use the first token check to reduce branch traversal in expr_element.
@KKould KKould force-pushed the perf/parse_time_function branch from 6114ae0 to e4187d3 Compare November 10, 2025 06:54
@KKould KKould changed the title feat: improve the performance of time functions by parsing them using hard code feat: upgrade nom to version 8.0.0 and accelerate expr_element using the first token. Nov 10, 2025
@KKould KKould requested review from TCeason, b41sh, Copilot and sundy-li and removed request for sundy-li November 10, 2025 11:23

Copilot AI left a comment

Pull Request Overview

This PR upgrades the nom parser combinator library from version 7 to version 8, along with updating nom-rule from 0.4 to 0.5.1. The upgrade includes API migrations to accommodate nom 8's new trait system and error reporting improvements.

Key changes:

  • Migration from nom 7 to nom 8 API, including the new Input trait and Parser trait with associated types
  • Addition of .parse() calls throughout the codebase to invoke parsers using nom 8's Parser trait
  • Implementation of custom Input trait for the token slice type
  • Performance optimizations through dispatch macros that reduce backtracking
  • Improved error messages in test outputs with more specific token expectations
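
The need for explicit `.parse()` calls can be illustrated with a std-only model of a parser trait that has an associated output type. This is a simplified sketch: `Parser`, `Map`, `map`, and `digit` are stand-ins defined here for illustration and are not nom's actual definitions (the string error type in particular is an assumption made to keep the example short).

```rust
/// Simplified stand-in for a parser trait with an associated output type.
trait Parser<I> {
    type Output;
    fn parse(&mut self, input: I) -> Result<(I, Self::Output), String>;
}

/// Any function or closure of the right shape is itself a parser.
impl<I, O, F> Parser<I> for F
where
    F: FnMut(I) -> Result<(I, O), String>,
{
    type Output = O;
    fn parse(&mut self, input: I) -> Result<(I, O), String> {
        self(input)
    }
}

/// A combinator that is a struct rather than a closure, so it cannot be
/// called like a function and must be run through `.parse()`.
struct Map<P, G> {
    parser: P,
    f: G,
}

impl<I, P, G, O2> Parser<I> for Map<P, G>
where
    P: Parser<I>,
    G: FnMut(P::Output) -> O2,
{
    type Output = O2;
    fn parse(&mut self, input: I) -> Result<(I, O2), String> {
        let (rest, out) = self.parser.parse(input)?;
        Ok((rest, (self.f)(out)))
    }
}

/// Build a `Map` combinator from an inner parser and a mapping function.
fn map<P, G>(parser: P, f: G) -> Map<P, G> {
    Map { parser, f }
}

/// Parse a single decimal digit from the front of the input.
fn digit(input: &str) -> Result<(&str, u32), String> {
    let mut chars = input.chars();
    match chars.next().and_then(|c| c.to_digit(10)) {
        Some(d) => Ok((chars.as_str(), d)),
        None => Err(format!("expected a digit at {input:?}")),
    }
}

fn main() {
    let mut doubled = map(digit, |d: u32| d * 2);
    // `doubled("7x")` would not compile: `Map` is not a function.
    println!("{:?}", doubled.parse("7x"));
}
```

In this model a plain function can still be invoked directly, but anything built from combinators has to go through the trait method, which is why a migration of this kind tends to touch many call sites.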

Reviewed Changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| Cargo.toml | Updated nom to 8.0.0 and nom-rule to 0.5.1 |
| Cargo.lock | Locked updated dependency versions |
| src/query/ast/src/parser/input.rs | Implemented nom 8's Input trait for custom Input type |
| src/query/ast/src/parser/common.rs | Updated parser signatures to use nom 8's Parser trait with associated types, added parser_fn helper |
| src/query/ast/src/parser/expr.rs | Contains bugs: Incorrect use of INVERTED keyword instead of INTERVAL for interval literal parsing, added dispatch optimizations |
| src/query/ast/src/parser/*.rs | Added .parse() calls to invoke parsers using nom 8's API |
| src/query/ast/tests/it/testdata/*.txt | Updated error messages reflecting nom 8's improved error reporting |
| src/query/ast/benches/bench.rs | Updated benchmark results showing significant performance improvements |

Comments suppressed due to low confidence (1)

src/query/ast/src/parser/expr.rs:1492

  • The variable name inverted_expr is misleading. This parser handles the INVERTED keyword followed by a literal string and casts it to an Interval type, but the name suggests it's related to inverted indexes or inverted data structures. The previous code used interval_expr for handling INTERVAL <literal_string> syntax. Consider renaming this to interval_string_expr or similar to clarify that it parses interval literals in string form.
    let inverted_expr = map(
        rule! {
            INVERTED ~ #consumed(literal_string)
        },
        |(_, (span, date))| ExprElement::Cast {
            expr: Box::new(Expr::Literal {
                span: transform_span(span.tokens),
                value: Literal::String(date),
            }),
            target_type: TypeName::Interval,
        },
    );

@KKould KKould force-pushed the perf/parse_time_function branch from 82aecc5 to 21eb23f Compare November 11, 2025 03:44
… number of branches and stack usage (otherwise, stack overflow is extremely likely).
@KKould KKould force-pushed the perf/parse_time_function branch from 5dbc507 to e6e01b6 Compare November 11, 2025 18:45
@KKould KKould marked this pull request as ready for review November 12, 2025 07:36

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@KKould KKould force-pushed the perf/parse_time_function branch 2 times, most recently from ccb06c2 to 1d1a8e4 Compare November 13, 2025 04:01
…in try_dispatch, create array_number function for numeric arrays, handle negative numbers directly
@KKould KKould force-pushed the perf/parse_time_function branch from 1d1a8e4 to 6b623ab Compare November 13, 2025 05:45
@KKould KKould requested a review from b41sh November 13, 2025 08:59
@BohuTANG BohuTANG merged commit 525ef26 into databendlabs:main Nov 14, 2025
87 checks passed

Labels

pr-feature this PR introduces a new feature to the codebase

4 participants