-[[fuzziness]]
-=== Fuzziness
+[[fuzziness]]
+=== Fuzziness

-_Fuzzy matching_ treats two words that are ``fuzzily'' similar as if they were
-the same word.((("typoes and misspellings", "fuzziness, defining"))) First, we need to define what((("fuzziness"))) we mean by _fuzziness_.
+_Fuzzy matching_ treats two words that are ``fuzzily'' similar as if they were the same word.
+((("typoes and misspellings", "fuzziness, defining"))) First, we need to define what we mean by ((("fuzziness")))_fuzziness_.

-In 1965, Vladimir Levenshtein developed the
-http://en.wikipedia.org/wiki/Levenshtein_distance[Levenshtein distance], which
-measures ((("Levenshtein distance")))the number of single-character edits required to transform
-one word into the other. He proposed three types of one-character edits:
+In 1965, Vladimir Levenshtein developed the
+http://en.wikipedia.org/wiki/Levenshtein_distance[Levenshtein distance], which measures the number of single-character edits required to transform one word into another ((("Levenshtein distance"))).
+He proposed three types of single-character edits:

-* _Substitution_ of one character for another: _f_ox -> _b_ox
+* _Substitution_ of one character for another: _f_ox -> _b_ox

-* _Insertion_ of a new character: sic -> sic_k_
+* _Insertion_ of a new character: sic -> sic_k_

-* _Deletion_ of a character:: b_l_ack -> back
+* _Deletion_ of a character: b_l_ack -> back

 http://en.wikipedia.org/wiki/Frederick_J._Damerau[Frederick Damerau]
-later expanded these operations ((("Damerau, Frederick J.")))to include one more:
+later extended these operations to include one more ((("Damerau, Frederick J."))):

-* _Transposition_ of two adjacent characters: _st_ar -> _ts_ar
+* _Transposition_ of two adjacent characters: _st_ar -> _ts_ar

-For example, to convert the word `bieber` into `beaver` requires the
-following steps:
+For example, converting `bieber` into `beaver` requires the following steps:

-1. Substitute `v` for `b`: bie_b_er -> bie_v_er
-2. Substitute `a` for `i`: b_i_ever -> b_a_ever
-3. Transpose `a` and `e`: b_ae_ver -> b_ea_ver
+1. Replace `b` with `v`: bie_b_er -> bie_v_er
+2. Replace `i` with `a`: b_i_ever -> b_a_ever
+3. Transpose `a` and `e`: b_ae_ver -> b_ea_ver

-These three steps represent a
-http://bit.ly/1ymgZPB[Damerau-Levenshtein edit distance]
-of 3.
+These three steps represent a
+http://bit.ly/1ymgZPB[Damerau-Levenshtein edit distance] of 3.

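As a rough check on the edit-distance arithmetic above, the following Python sketch computes the restricted Damerau-Levenshtein distance (the "optimal string alignment" variant, which allows substitution, insertion, deletion, and transposition of adjacent characters). The function name `osa_distance` is an assumption made for this illustration, and the code is not meant to reflect Elasticsearch's internal implementation.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between the first i chars of a
    # and the first j chars of b.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    for source, target in [("fox", "box"), ("star", "tsar"), ("bieber", "beaver")]:
        print(source, "->", target, osa_distance(source, target))
```

Each of the single-edit examples listed earlier (`fox`/`box`, `sic`/`sick`, `black`/`back`, `star`/`tsar`) comes out at a distance of 1 under this sketch, while `bieber`/`beaver` comes out at 3.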
-Clearly, `bieber` is a long way from `beaver`—they are too far apart to be
-considered a simple misspelling. Damerau observed that 80% of human
-misspellings have an edit distance of 1. In other words, 80% of misspellings
-could be corrected with a _single edit_ to the original string.
+Clearly, `bieber` is a long way from `beaver`; they are too far apart to be considered a simple misspelling.
+Damerau observed that 80% of human misspellings have an edit distance of 1. In other words, 80% of misspellings
+could be corrected back to the original string with a _single edit_.

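To make "a single edit" concrete, the sketch below enumerates every string that is exactly one substitution, insertion, deletion, or adjacent transposition away from a word. It assumes a lowercase ASCII alphabet, and `single_edits` is a hypothetical helper written only for this example; it is not meant to reflect how Elasticsearch evaluates fuzzy queries.

```python
import string


def single_edits(word: str, alphabet: str = string.ascii_lowercase) -> set:
    """Return all strings one edit away from `word` (excluding `word` itself)."""
    # Every way of splitting the word into a left and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]

    deletions = {left + right[1:] for left, right in splits if right}
    transpositions = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    }
    substitutions = {
        left + char + right[1:]
        for left, right in splits
        if right
        for char in alphabet
    }
    insertions = {left + char + right for left, right in splits for char in alphabet}

    return (deletions | transpositions | substitutions | insertions) - {word}


if __name__ == "__main__":
    neighbours = single_edits("fox")
    print("box" in neighbours)
    print(len(neighbours))
```

Even for a three-letter word the candidate set is sizeable, which hints at why allowing a second edit widens the set of matches so quickly.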
-Elasticsearch supports a maximum edit distance, specified with the `fuzziness`
-parameter, of 2.
+Elasticsearch supports a maximum edit distance of 2, specified with the `fuzziness` parameter.

-Of course, the impact that a single edit has on a string depends on the
-length of the string. Two edits to the word `hat` can produce `mad`, so
-allowing two edits on a string of length 3 is overkill. The `fuzziness`
-parameter can be set to `AUTO`, which results in the following maximum edit distances:
+Of course, the impact of a single edit on a string depends on the string's length. Two edits to `hat` can produce `mad`,
+so allowing two edits on a string of length 3 is overkill. The `fuzziness`
+parameter can be set to `AUTO`, which results in the following maximum edit distances:

-* `0` for strings of one or two characters
-* `1` for strings of three, four, or five characters
-* `2` for strings of more than five characters
+* `0` for strings of 1 or 2 characters
+* `1` for strings of 3, 4, or 5 characters
+* `2` for strings of more than 5 characters

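The `AUTO` thresholds listed above can be written as a small lookup. This is only a sketch of the rule as stated in this section; `auto_fuzziness` is a hypothetical name, and the length is counted in Python characters, which is an assumption about how term length is measured.

```python
def auto_fuzziness(term: str) -> int:
    """Maximum edit distance for `term` under the AUTO rule described above."""
    length = len(term)
    if length <= 2:
        return 0  # one or two characters: exact match only
    if length <= 5:
        return 1  # three to five characters: one edit
    return 2      # more than five characters: two edits


if __name__ == "__main__":
    for term in ("at", "hat", "black", "bieber"):
        print(term, auto_fuzziness(term))
```

Under this rule `hat` is allowed only one edit, so it can no longer match `mad`.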
-Of course, you may find that an edit distance of `2` is still overkill, and
-returns results that don't appear to be related. You may get better results,
-and better performance, with a maximum `fuzziness` of `1`.
+Of course, you may find that an edit distance of `2` is still overkill and returns results that do not appear to be related.
+Setting `fuzziness` to `1` may give you better results and better performance.
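As an illustration of capping the edit distance, the sketch below builds a request body for a `match` query with the `fuzziness` parameter set to `1`. The field name `text`, the search term, and the helper `fuzzy_match_body` are assumptions made for this example; the section above does not say which query type or field is used, so treat this as one plausible shape rather than the required one.

```python
import json

# Values this section mentions for the `fuzziness` parameter.
ALLOWED_FUZZINESS = (0, 1, 2, "AUTO")


def fuzzy_match_body(field: str, text: str, fuzziness=1) -> dict:
    """Build a search request body for a fuzzy `match` query (hypothetical helper)."""
    if fuzziness not in ALLOWED_FUZZINESS:
        raise ValueError(f"fuzziness must be one of {ALLOWED_FUZZINESS}")
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": fuzziness,
                }
            }
        }
    }


if __name__ == "__main__":
    # `text` is an assumed field name; replace it with a field from your mapping.
    body = fuzzy_match_body("text", "bieber", fuzziness=1)
    print(json.dumps(body, indent=2))
```

Passing `fuzziness="AUTO"` instead would defer the choice to the length-based thresholds described earlier.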