F.Moser edited this page Apr 20, 2021 · 5 revisions

Lucene Query Parser for Javascript

This parser supports many of the constructs contained in the Lucene Query Syntax.

Supported features

  • conjunction operators: AND, OR, ||, &&, NOT
  • prefix operators: +, -
  • quoted values: "foo bar"
  • named fields: foo:bar
  • range expressions: foo:[bar TO baz], foo:{bar TO baz}
  • proximity search expressions: "foo bar"~5
  • boost expressions: foo^5, "foo bar"^5
  • fuzzy search expressions: foo~, foo~0.5
  • parentheses grouping: (foo OR bar) AND baz
  • field groups: foo:(bar OR baz)

Expression Tree Structure and Usage

The parser returns the query as an expression tree. Each node in the tree is a dictionary (a plain JavaScript object).

There are three basic types of expression dictionaries: node, field and range expressions.

Node Expression

A node expression generally has the following structure:

{ 
    'left' : dictionary,     // field expression or node
    'operator': string,      // operator value
    'right': dictionary,     // field expression OR node 
    'field': string          // field name (for field group syntax) [OPTIONAL]
}

Field Expression

A field expression has the following structure:

{
    'field': string,         // field name
    'term': string,          // term value
    'prefix': string,        // prefix operator (+/-) [OPTIONAL]
    'boost': float,          // boost value, (value > 1 must be integer) [OPTIONAL]
    'similarity': float,     // similarity value, (value must be > 0 and < 1) [OPTIONAL]
    'proximity': integer     // proximity value [OPTIONAL]
}

Range Expression

A range expression has the following structure:

{
    'field': string,         // field name
    'term_min': string,      // minimum value (left side) of range
    'term_max': string,      // maximum value (right side) of range
    'inclusive': boolean,        // true: range is inclusive ([...]) or false: exclusive ({...})
    'inclusive_min': boolean,    // true: min value is inclusive ([...) or false: exclusive ({...)
    'inclusive_max': boolean     // true: max value is inclusive (...]) or false: exclusive (...})
}
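The three structures above can be told apart by the keys they carry. The following is a minimal sketch of a tree walker built on that idea. The helper names `expressionType` and `collectLeaves` are hypothetical and not part of the parser, and the sample tree is written by hand in the documented shape rather than captured from real parser output.

```javascript
// Classify an expression dictionary by the keys documented above.
// (Hypothetical helper, not part of the parser itself.)
function expressionType(expr) {
  if (expr === null || typeof expr !== 'object') return 'unknown';
  if ('term_min' in expr || 'term_max' in expr) return 'range';
  if ('term' in expr) return 'field';
  if ('left' in expr || 'right' in expr || 'operator' in expr) return 'node';
  return 'unknown';
}

// Depth-first walk that collects every field and range expression (the leaves).
function collectLeaves(expr, leaves = []) {
  switch (expressionType(expr)) {
    case 'node':
      // Either side of a node may be missing, so check before descending.
      if (expr.left) collectLeaves(expr.left, leaves);
      if (expr.right) collectLeaves(expr.right, leaves);
      break;
    case 'field':
    case 'range':
      leaves.push(expr);
      break;
    default:
      break;
  }
  return leaves;
}

// Hand-written tree in the documented shape (an assumption, not parser output),
// roughly what a query such as `title:foo AND year:[2000 TO 2010]` describes.
const sampleTree = {
  left: { field: 'title', term: 'foo' },
  operator: 'AND',
  right: {
    left: { field: 'year', term_min: '2000', term_max: '2010', inclusive: true }
  }
};

const sampleLeaves = collectLeaves(sampleTree);
console.log(sampleLeaves.map((leaf) => `${leaf.field} (${expressionType(leaf)})`));
```

Checking for `term_min`/`term_max` before `term` keeps range expressions from being mistaken for field expressions, and a node that carries a `field` key (field group syntax) is still classified as a node because it has no `term`.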

Other Notes and Concerns

Default Field Name

Terms that are not given an explicit field name (unnamed/default fields) have the field value <implicit>.
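Code that consumes the tree will usually want to map `<implicit>` to a real field. A minimal sketch, assuming a caller-chosen default field; `resolveField` and the `'body'` default are hypothetical names used only for illustration:

```javascript
// Marker the parser uses for terms without an explicit field name.
const IMPLICIT_FIELD = '<implicit>';

// Return the field to search: the caller's default for unnamed terms,
// otherwise the field name that appeared in the query.
function resolveField(expr, defaultField) {
  return expr.field === IMPLICIT_FIELD ? defaultField : expr.field;
}

// Hand-written field expressions in the documented shape.
const unnamedTerm = { field: '<implicit>', term: 'foo' };
const namedTerm = { field: 'title', term: 'foo' };

console.log(resolveField(unnamedTerm, 'body')); // body
console.log(resolveField(namedTerm, 'body'));   // title
```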

Wildcards

Wildcards (fo*, f?o) will be part of the term value.
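Because wildcards are left inside the term string, interpreting them is up to the caller. The sketch below shows one possible approach: it turns a term into a regular expression, following the Lucene convention that `*` matches any run of characters and `?` matches a single character. `hasWildcard` and `wildcardToRegExp` are hypothetical helpers, not part of the parser.

```javascript
// True when the term contains a Lucene wildcard character.
function hasWildcard(term) {
  return /[*?]/.test(term);
}

// Convert a wildcard term into an anchored RegExp.
// `*` becomes `.*`, `?` becomes `.`, everything else is matched literally.
function wildcardToRegExp(term) {
  const pattern = Array.from(term)
    .map((ch) => {
      if (ch === '*') return '.*';
      if (ch === '?') return '.';
      // Escape characters that are special inside a regular expression.
      return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    })
    .join('');
  return new RegExp('^' + pattern + '$');
}

// Hand-written field expression in the documented shape.
const wildcardExpr = { field: '<implicit>', term: 'fo*' };

if (hasWildcard(wildcardExpr.term)) {
  const matcher = wildcardToRegExp(wildcardExpr.term);
  console.log(matcher.test('foobar')); // true
  console.log(matcher.test('bar'));    // false
}
```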

Escaping Special Characters

Escaping special characters as described in the Lucene documentation is not supported and will, generally speaking, break the parser.

See issue #1 for more details.
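Until escaping is supported, one defensive option is to look for backslash escapes before handing a query to the parser. The following is only a sketch of such a pre-check; `findEscapes` is a hypothetical helper, and what to do with a flagged query (reject it, strip the backslashes, and so on) is left to the caller.

```javascript
// Report every backslash escape sequence found in a raw query string.
// A trailing lone backslash is reported as well.
function findEscapes(query) {
  return Array.from(query.matchAll(/\\.?/g), (match) => ({
    index: match.index,
    sequence: match[0]
  }));
}

// In source code each backslash is doubled; the actual query is title:\(1\+1\)
const riskyQuery = 'title:\\(1\\+1\\)';
const escapes = findEscapes(riskyQuery);

if (escapes.length > 0) {
  console.log('Query contains escape sequences the parser may not handle:', escapes);
}
```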

Incomplete Implementation of - and + Operators

Currently, the - and + operators only function on the terms themselves, not on field names or clauses.

For example, field:-"term", field:+term and field:(+term -"term2") work fine, but -(field:term) +(field:term2) and -field:term +field:term2 do not.

See issue #3 for more details.
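For the supported forms, the operator ends up in the `prefix` key of the field expression. A minimal sketch of reading it, using the usual Lucene meaning of `+` (required) and `-` (prohibited); `occurrence` is a hypothetical helper and the sample expressions are written by hand in the documented shape:

```javascript
// Map the optional prefix of a field expression to its Lucene meaning.
function occurrence(fieldExpr) {
  switch (fieldExpr.prefix) {
    case '+':
      return 'required';
    case '-':
      return 'prohibited';
    default:
      // No prefix key, or an unexpected value: treat the term as optional.
      return 'optional';
  }
}

// Hand-written field expressions (an assumption, not captured parser output).
const requiredTerm = { field: 'field', term: 'term', prefix: '+' };
const prohibitedTerm = { field: 'field', term: 'term2', prefix: '-' };
const plainTerm = { field: 'field', term: 'term3' };

console.log([requiredTerm, prohibitedTerm, plainTerm].map(occurrence));
```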

Conjunction Operators Before Terms and Multiple Operators

Conjunction operators that appear at the beginning of a query violate the logic of the syntax and are currently "mostly" ignored. Only the last element of the query is returned.

Multiple consecutive operators in any position are also not supported. Only one of them is returned, and no error is generated.

For example:

    
      Query: OR
      Return: { "operator": "OR" } 

      Query: OR AND
      Return: { "operator": "AND" } 

      Query: OR AND foo
      Return: { "left": { "field": "<implicit>", "term": "foo" } } 

      Query: foo:bar OR NOT baz:qux
      Return: { "operator": "NOT" } 

See issue #19 for more details.
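Since no error is raised, callers may want to check the result themselves. In the examples above, the results that lost their terms all lack a `left` key, so a minimal sketch of a sanity check could look like this (`hasDroppedTerms` is a hypothetical helper, and the rule is inferred from these examples only):

```javascript
// True when a top-level parse result carries no left-hand expression,
// which in the examples above means the terms of the query were lost.
function hasDroppedTerms(result) {
  if (result === null || typeof result !== 'object') return true;
  return !('left' in result);
}

// Results copied from the examples above.
const operatorOnly = { operator: 'NOT' };
const singleTerm = { left: { field: '<implicit>', term: 'foo' } };

console.log(hasDroppedTerms(operatorOnly)); // true
console.log(hasDroppedTerms(singleTerm));   // false
```

A query such as `OR AND foo` cannot be caught this way, because its result looks like that of a normal single-term query.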

Unit Tests

To run the unit tests, just open SpecRunner.html in any browser. Unit tests are built with Jasmine.

Grammar

The parser is auto-generated by PEG.js, a PEG parser generator for JavaScript, from the grammar file lucene-query-parser.grammar.

To test the grammar without using the generated parser, or to modify it, try out PEG.js online. This is a handy way to test an arbitrary query and inspect the result, or to debug a problem the parser has with a particular input.