Home
This parser supports many of the constructs contained in the Lucene Query Syntax.
- conjunction operators: AND, OR, ||, &&, NOT
- prefix operators: +, -
- quoted values: "foo bar"
- named fields: foo:bar
- range expressions: foo:[bar TO baz], foo:{bar TO baz}
- proximity search expressions: "foo bar"~5
- boost expressions: foo^5, "foo bar"^5
- fuzzy search expressions: foo~, foo~0.5
- parentheses grouping: (foo OR bar) AND baz
- field groups: foo:(bar OR baz)
The parser returns the query as an expression tree whose nodes are dictionaries.
There are three basic types of expression dictionary: node, field, and range expressions.
A node expression generally has the following structure:
{
'left': dictionary, // field expression or node
'operator': string, // operator value
'right': dictionary, // field expression or node
'field': string // field name (for field group syntax) [OPTIONAL]
}
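As a rough illustration of how a caller might consume this structure, the sketch below walks a tree recursively and collects its leaf expressions. The tree literal is hand-written from the structure described above (it is an assumption, not captured parser output), and collectLeaves is a hypothetical helper, not part of the parser.

```javascript
// Hand-written tree (an assumption, not real parser output) for a query
// such as: foo:bar AND (baz OR qux)
const tree = {
  left: { field: 'foo', term: 'bar' },
  operator: 'AND',
  right: {
    left: { field: '<implicit>', term: 'baz' },
    operator: 'OR',
    right: { field: '<implicit>', term: 'qux' }
  }
};

// Hypothetical helper: treats anything with a 'left' or 'operator' key as a
// node expression and everything else as a leaf.
function collectLeaves(expr, out = []) {
  if (!expr) return out; // a missing 'left' or 'right' is skipped
  if ('left' in expr || 'operator' in expr) {
    collectLeaves(expr.left, out);
    collectLeaves(expr.right, out);
  } else {
    out.push(expr);
  }
  return out;
}

const terms = collectLeaves(tree).map((leaf) => leaf.term);
console.log(terms); // [ 'bar', 'baz', 'qux' ]
```

Leaves are visited left to right, so the terms come back in query order.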
A field expression has the following structure:
{
'field': string, // field name
'term': string, // term value
'prefix': string, // prefix operator (+/-) [OPTIONAL]
'boost': float, // boost value (values > 1 must be integers) [OPTIONAL]
'similarity': float, // similarity value (must be > 0 and < 1) [OPTIONAL]
'proximity': integer // proximity value [OPTIONAL]
}
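The constraints noted in the comments above can be checked mechanically. The following is a minimal sketch of such a check; checkFieldExpression is a hypothetical helper written for illustration, and it encodes only the rules stated in this document.

```javascript
// Hypothetical helper: returns a list of problems found in a field
// expression, based only on the constraints documented above.
function checkFieldExpression(expr) {
  const problems = [];
  if (typeof expr.field !== 'string') problems.push('field must be a string');
  if (typeof expr.term !== 'string') problems.push('term must be a string');
  if ('prefix' in expr && expr.prefix !== '+' && expr.prefix !== '-') {
    problems.push('prefix must be + or -');
  }
  if ('boost' in expr) {
    const b = expr.boost;
    if (typeof b !== 'number' || (b > 1 && !Number.isInteger(b))) {
      problems.push('boost above 1 must be an integer');
    }
  }
  if ('similarity' in expr && !(expr.similarity > 0 && expr.similarity < 1)) {
    problems.push('similarity must be > 0 and < 1');
  }
  if ('proximity' in expr && !Number.isInteger(expr.proximity)) {
    problems.push('proximity must be an integer');
  }
  return problems;
}

// Hand-written examples (assumed shapes, not captured parser output).
console.log(checkFieldExpression({ field: 'foo', term: 'bar', boost: 5 }));
console.log(checkFieldExpression({ field: 'foo', term: 'bar', boost: 2.5, similarity: 1.5 }));
```

The first call reports no problems; the second reports one for the boost and one for the similarity.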
A range expression has the following structure:
{
'field': string, // field name
'term_min': string, // minimum value (left side) of range
'term_max': string, // maximum value (right side) of range
'inclusive': boolean, // true: range is inclusive ([...]) or false: exclusive ({...})
'inclusive_min': boolean, // true: min value is inclusive ([...) or false: exclusive ({...)
'inclusive_max': boolean // true: max value is inclusive (...]) or false: exclusive (...})
}
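To show how the two per-bound flags might be used, here is a small sketch that tests whether a value falls inside a range expression. inRange is a hypothetical helper, the range literals are hand-written, and comparing the bounds as numbers is an assumption made for this example (the parser itself stores them as strings).

```javascript
// Hypothetical helper: checks a value against a range expression using
// inclusive_min and inclusive_max. The default comparison is numeric,
// which is an assumption; pass another comparator for other orderings.
function inRange(range, value, compare = (a, b) => Number(a) - Number(b)) {
  const vsMin = compare(value, range.term_min);
  const vsMax = compare(value, range.term_max);
  const aboveMin = range.inclusive_min ? vsMin >= 0 : vsMin > 0;
  const belowMax = range.inclusive_max ? vsMax <= 0 : vsMax < 0;
  return aboveMin && belowMax;
}

// Hand-written shapes for price:[10 TO 20] and price:{10 TO 20}.
const inclusiveRange = {
  field: 'price', term_min: '10', term_max: '20',
  inclusive: true, inclusive_min: true, inclusive_max: true
};
const exclusiveRange = {
  field: 'price', term_min: '10', term_max: '20',
  inclusive: false, inclusive_min: false, inclusive_max: false
};

console.log(inRange(inclusiveRange, 10)); // true
console.log(inRange(exclusiveRange, 10)); // false
console.log(inRange(exclusiveRange, 15)); // true
```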
Unnamed (default) fields have the field name <implicit>.
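A consumer will usually want to map the <implicit> placeholder to a real default field. The sketch below does this on a copy of the tree; withDefaultField and the field name 'text' are hypothetical, and the input literal is hand-written rather than real parser output.

```javascript
const IMPLICIT = '<implicit>';

// Hypothetical helper: returns a copy of the tree in which every
// '<implicit>' field name is replaced by the given default field.
function withDefaultField(expr, defaultField) {
  if (!expr || typeof expr !== 'object') return expr;
  const copy = { ...expr };
  if (copy.field === IMPLICIT) copy.field = defaultField;
  if ('left' in copy) copy.left = withDefaultField(copy.left, defaultField);
  if ('right' in copy) copy.right = withDefaultField(copy.right, defaultField);
  return copy;
}

// Hand-written shape for a query such as: foo OR title:bar
const parsed = {
  left: { field: IMPLICIT, term: 'foo' },
  operator: 'OR',
  right: { field: 'title', term: 'bar' }
};

const resolved = withDefaultField(parsed, 'text');
console.log(resolved.left.field);  // text
console.log(resolved.right.field); // title
```

Because the helper copies each dictionary, the original tree is left unchanged.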
Wildcards (fo*, f?o) are kept as part of the term value.
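Since wildcards are not broken out into their own structure, the caller has to interpret them. One possible approach, sketched here, converts a term into a regular expression using the usual Lucene meaning (* for any run of characters, ? for a single character). Both helpers are hypothetical and not part of the parser.

```javascript
// Hypothetical helper: true when the term contains a wildcard character.
function hasWildcard(term) {
  return /[*?]/.test(term);
}

// Hypothetical helper: converts a wildcard term into an anchored RegExp.
function wildcardToRegExp(term) {
  const pattern = term
    .replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters except * and ?
    .replace(/\*/g, '.*')                 // * matches zero or more characters
    .replace(/\?/g, '.');                 // ? matches exactly one character
  return new RegExp('^' + pattern + '$');
}

console.log(hasWildcard('fo*'));                  // true
console.log(wildcardToRegExp('fo*').test('foo')); // true
console.log(wildcardToRegExp('f?o').test('fo'));  // false
```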
Escaping special characters, as described in the Lucene documentation, is not supported and will generally break the parser.
See issue #1 for more details.
Currently, the - and + operators only work on terms themselves, not on field names or clauses.
So field:-"term", field:+term, and field:(+term -"term2") will work fine, but -(field:term) +(field:term2) and -field:term +field:term2 will not.
See issue #3 for more details.
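Because the prefix is recorded on each field expression, a consumer can sort terms by it. The sketch below groups terms using the usual Lucene reading of the prefixes (+ required, - prohibited). groupByPrefix is a hypothetical helper, and the input list is a hand-written guess at the field expressions produced for field:(+term -"term2" other).

```javascript
// Hypothetical helper: groups the terms of field expressions by prefix.
function groupByPrefix(fieldExpressions) {
  const groups = { required: [], prohibited: [], optional: [] };
  for (const expr of fieldExpressions) {
    if (expr.prefix === '+') groups.required.push(expr.term);
    else if (expr.prefix === '-') groups.prohibited.push(expr.term);
    else groups.optional.push(expr.term); // no prefix recorded
  }
  return groups;
}

// Assumed shapes, not captured parser output.
const expressions = [
  { field: 'field', term: 'term', prefix: '+' },
  { field: 'field', term: 'term2', prefix: '-' },
  { field: 'field', term: 'other' }
];

console.log(groupByPrefix(expressions));
```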
Conjunction operators that appear at the beginning of the query violate the logic of the syntax, and are currently "mostly" ignored. The last element will be returned.
Also, multiple operators in any position are not supported and only the first one will be returned. No error will be generated.
For example:
Query: OR
Return: { "operator": "OR" }
Query: OR AND
Return: { "operator": "AND" }
Query: OR AND foo
Return: { "left": { "field": "<implicit>", "term": "foo" } }
Query: foo:bar OR NOT baz:qux
Return: { "operator": "NOT" }
See issue #19 for more details.
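Since no error is raised in these cases, a caller may want to detect the degenerate results itself. The following sketch flags a result that carries an operator without both operands. hasDanglingOperator is a hypothetical helper, and it assumes every operator in a well-formed result sits between a left and a right expression.

```javascript
// Hypothetical helper: true when a result has an operator but is missing
// its left or right operand, as in the examples above.
function hasDanglingOperator(result) {
  if (!result || typeof result !== 'object') return false;
  if (!('operator' in result)) return false;
  return !('left' in result && 'right' in result);
}

// Results copied from the examples above.
console.log(hasDanglingOperator({ operator: 'OR' })); // true
console.log(hasDanglingOperator({ left: { field: '<implicit>', term: 'foo' } })); // false

// Hand-written shape (an assumption) for a well-formed query: foo AND bar
console.log(hasDanglingOperator({
  left: { field: '<implicit>', term: 'foo' },
  operator: 'AND',
  right: { field: '<implicit>', term: 'bar' }
})); // false
```

Note that this only inspects the top-level result; a nested node would need the same check applied recursively.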
To run the unit tests, just open SpecRunner.html in any browser. Unit tests are built with Jasmine.
The parser is generated by PEG.js, a PEG parser generator written in JavaScript, from the grammar file lucene-query-parser.grammar.
To test or modify the grammar without using the generated parser, try out PEG.js online. It is a handy way to run an arbitrary query and inspect the result, or to debug a problem the parser has with a particular input.