|
| 1 | +[[phrase-matching]] |
| 2 | +=== Phrase Matching |
| 3 | + |
| 4 | +In the same way that the `match` query is the go-to query for standard |
| 5 | +full-text search, the `match_phrase` query((("proximity matching", "phrase matching")))((("phrase matching")))((("match_phrase query"))) is the one you should reach for |
| 6 | +when you want to find words that are near each other: |
| 7 | + |
| 8 | +[source,js] |
| 9 | +-------------------------------------------------- |
| 10 | +GET /my_index/my_type/_search |
| 11 | +{ |
| 12 | + "query": { |
| 13 | + "match_phrase": { |
| 14 | + "title": "quick brown fox" |
| 15 | + } |
| 16 | + } |
| 17 | +} |
| 18 | +-------------------------------------------------- |
| 19 | +// SENSE: 120_Proximity_Matching/05_Match_phrase_query.json |
| 20 | + |
| 21 | +Like the `match` query, the `match_phrase` query first analyzes the query |
| 22 | +string to produce a list of terms. It then searches for all the terms, but |
| 23 | +keeps only documents that contain _all_ of the search terms, in the same |
| 24 | +_positions_ relative to each other. A query for the phrase `quick fox` |
| 25 | +would not match any of our documents, because no document contains the word |
| 26 | +`quick` immediately followed by `fox`. |
| 27 | + |
| 28 | +[TIP] |
| 29 | +================================================== |
| 30 | + |
| 31 | +The `match_phrase` query can also be written as a `match` query with type |
| 32 | +`phrase`: |
| 33 | + |
| 34 | +[source,js] |
| 35 | +-------------------------------------------------- |
| 36 | +"match": { |
| 37 | + "title": { |
| 38 | + "query": "quick brown fox", |
| 39 | + "type": "phrase" |
| 40 | + } |
| 41 | +} |
| 42 | +-------------------------------------------------- |
| 43 | +// SENSE: 120_Proximity_Matching/05_Match_phrase_query.json |
| 44 | + |
| 45 | +================================================== |
| 46 | + |
| 47 | +==== Term Positions |
| 48 | + |
| 49 | +When a string is analyzed, the analyzer returns not((("phrase matching", "term positions")))((("match_phrase query", "position of terms")))((("position-aware matching"))) only a list of terms, but |
| 50 | +also the _position_, or order, of each term in the original string: |
| 51 | + |
| 52 | +[source,js] |
| 53 | +-------------------------------------------------- |
| 54 | +GET /_analyze?analyzer=standard |
| 55 | +Quick brown fox |
| 56 | +-------------------------------------------------- |
| 57 | +// SENSE: 120_Proximity_Matching/05_Term_positions.json |
| 58 | + |
| 59 | +This returns the following: |
| 60 | + |
| 61 | +[role="pagebreak-before"] |
| 62 | +[source,js] |
| 63 | +-------------------------------------------------- |
| 64 | +{ |
| 65 | + "tokens": [ |
| 66 | + { |
| 67 | + "token": "quick", |
| 68 | + "start_offset": 0, |
| 69 | + "end_offset": 5, |
| 70 | + "type": "<ALPHANUM>", |
| 71 | + "position": 1 <1> |
| 72 | + }, |
| 73 | + { |
| 74 | + "token": "brown", |
| 75 | + "start_offset": 6, |
| 76 | + "end_offset": 11, |
| 77 | + "type": "<ALPHANUM>", |
| 78 | + "position": 2 <1> |
| 79 | + }, |
| 80 | + { |
| 81 | + "token": "fox", |
| 82 | + "start_offset": 12, |
| 83 | + "end_offset": 15, |
| 84 | + "type": "<ALPHANUM>", |
| 85 | + "position": 3 <1> |
| 86 | + } |
| 87 | + ] |
| 88 | +} |
| 89 | +-------------------------------------------------- |
| 90 | +<1> The `position` of each term in the original string. |
| 91 | + |
| 92 | +Positions can be stored in the inverted index, and position-aware queries like |
| 93 | +the `match_phrase` query can use them to match only documents that contain |
| 94 | +all the words in exactly the order specified, with no words in between. |
| 95 | + |
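To make the idea of stored positions concrete, here is a minimal sketch of a positional inverted index. It is a toy model, not how Elasticsearch or Lucene store postings: the `analyze` and `buildPositionalIndex` functions and the two sample documents are assumptions made for illustration, and the tokenizer is a crude stand-in for the `standard` analyzer.

```javascript
// Hypothetical stand-in for an analyzer: lowercases and splits on non-word
// characters. The real standard analyzer does more; this is illustration only.
function analyze(text) {
  return text
    .toLowerCase()
    .split(/\W+/)
    .filter(Boolean)
    .map((token, i) => ({ token, position: i + 1 })); // 1-based, as in the listing above
}

// Build a toy positional inverted index: term -> Map(docId -> [positions]).
function buildPositionalIndex(docs) {
  const index = new Map();
  for (const [docId, text] of Object.entries(docs)) {
    for (const { token, position } of analyze(text)) {
      if (!index.has(token)) index.set(token, new Map());
      const postings = index.get(token);
      if (!postings.has(docId)) postings.set(docId, []);
      postings.get(docId).push(position);
    }
  }
  return index;
}

// Sample documents (assumed for this sketch).
const index = buildPositionalIndex({
  1: "Quick brown fox",
  2: "The brown fox is quick",
});
```

In this sketch, `index.get("quick")` maps document `"1"` to `[1]` and document `"2"` to `[5]`. A plain term lookup needs only the document IDs; a position-aware query also reads the position lists.
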
| 96 | +==== What Is a Phrase |
| 97 | + |
| 98 | +For a document to be considered a((("match_phrase query", "documents matching a phrase")))((("phrase matching", "criteria for matching documents"))) match for the phrase ``quick brown fox,'' the following must be true: |
| 99 | + |
| 100 | +* `quick`, `brown`, and `fox` must all appear in the field. |
| 101 | + |
| 102 | +* The position of `brown` must be `1` greater than the position of `quick`. |
| 103 | + |
| 104 | +* The position of `fox` must be `2` greater than the position of `quick`. |
| 105 | + |
| 106 | +If any of these conditions is not met, the document is not considered a match. |
| 107 | + |
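The conditions above can be sketched as a small position check. This is an illustrative model only, not the Lucene implementation: the `matchesPhrase` function and the shape of its input (term positions for one field of one document) are assumptions.

```javascript
// positionsByTerm: positions of each term within ONE field of ONE document,
// e.g. { quick: [1], brown: [2], fox: [3] } (hypothetical input shape).
// phraseTerms: the analyzed query terms, in query order.
function matchesPhrase(positionsByTerm, phraseTerms) {
  if (phraseTerms.length === 0) return false;

  // Condition 1: every term must appear in the field.
  const allPresent = phraseTerms.every(
    (term) => Array.isArray(positionsByTerm[term]) && positionsByTerm[term].length > 0
  );
  if (!allPresent) return false;

  // Remaining conditions: for some occurrence of the first term, the i-th term
  // must sit exactly i positions after it.
  return positionsByTerm[phraseTerms[0]].some((start) =>
    phraseTerms.every((term, i) => positionsByTerm[term].includes(start + i))
  );
}

const titlePositions = { quick: [1], brown: [2], fox: [3] };

console.log(matchesPhrase(titlePositions, ["quick", "brown", "fox"])); // true
console.log(matchesPhrase(titlePositions, ["quick", "fox"]));          // false
```

The second call fails because `fox` would have to be at position `2`, directly after `quick`, but it is at position `3`.
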
| 108 | +[TIP] |
| 109 | +================================================== |
| 110 | + |
| 111 | +Internally, the `match_phrase` query uses the low-level `span` query family to |
| 112 | +do position-aware matching. ((("match_phrase query", "use of span queries for position-aware matching")))((("span queries")))Span queries are term-level queries, so they have |
| 113 | +no analysis phase; they search for the exact term specified. |
| 114 | + |
| 115 | +Thankfully, most people never need to use the `span` queries directly, as the |
| 116 | +`match_phrase` query is usually good enough. However, certain specialized |
| 117 | +fields, like patent searches, use these low-level queries to perform very |
| 118 | +specific, carefully constructed positional searches. |
| 119 | + |
| 120 | +================================================== |
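For comparison, a hand-written span query for the same phrase might look roughly like the request body below, built here as a JavaScript object. Treat it as a sketch: it is not necessarily what `match_phrase` generates internally, the `title` field is carried over from the earlier example, and the variable name `spanPhraseQuery` is made up. Because span queries skip analysis, the terms are written already lowercased.

```javascript
// Hypothetical hand-written equivalent of the phrase "quick brown fox".
// Terms must already be in their indexed form (lowercased here), because
// span queries do not analyze their input.
const spanPhraseQuery = {
  query: {
    span_near: {
      clauses: [
        { span_term: { title: "quick" } },
        { span_term: { title: "brown" } },
        { span_term: { title: "fox" } },
      ],
      slop: 0,        // no other positions allowed between the terms
      in_order: true, // terms must appear in the listed order
    },
  },
};

// Print the JSON body that would be sent to the _search endpoint.
console.log(JSON.stringify(spanPhraseQuery, null, 2));
```

Compared with the one-line `match_phrase` version, every term and its order must be spelled out by hand, which is why these queries are usually reserved for carefully constructed positional searches.
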