Commit eecb1b2

Update 20_Fuzziness.md
add translation for 270_Fuzzy_matching/20_Fuzziness
1 parent 9ae88cc commit eecb1b2

1 file changed, +30 -37 lines changed

270_Fuzzy_matching/20_Fuzziness.md

Lines changed: 30 additions & 37 deletions
@@ -1,53 +1,46 @@
-[[fuzziness]]
-=== Fuzziness
+[[模糊]]
+=== Fuzziness

-_Fuzzy matching_ treats two words that are ``fuzzily'' similar as if they were
-the same word.((("typoes and misspellings", "fuzziness, defining"))) First, we need to define what((("fuzziness"))) we mean by _fuzziness_.
+_Fuzzy matching_ treats two words that are ``fuzzily'' similar as if they were the same word.
+((("typoes and misspellings", "fuzziness, defining"))) First, we need to define what we mean by _fuzziness_((("fuzziness"))).

-In 1965, Vladimir Levenshtein developed the
-http://en.wikipedia.org/wiki/Levenshtein_distance[Levenshtein distance], which
-measures ((("Levenshtein distance")))the number of single-character edits required to transform
-one word into the other. He proposed three types of one-character edits:
+In 1965, Vladimir Levenshtein developed the
+http://en.wikipedia.org/wiki/Levenshtein_distance[Levenshtein distance], which measures the number of single-character edits required to transform one word into another((("Levenshtein distance"))).
+He proposed three types of single-character edits:

-* _Substitution_ of one character for another: _f_ox -> _b_ox
+* _Substitution_ of one character for another: _f_ox -> _b_ox

-* _Insertion_ of a new character: sic -> sic_k_
+* _Insertion_ of a new character: sic -> sic_k_

-* _Deletion_ of a character:: b_l_ack -> back
+* _Deletion_ of a character: b_l_ack -> back
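The three edit types above are exactly the operations counted by the classic dynamic-programming computation of Levenshtein distance. The following is a minimal Python sketch for illustration only: the function name `levenshtein` is hypothetical, and this is not how Elasticsearch itself evaluates fuzzy matches.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character substitutions, insertions,
    and deletions needed to turn `a` into `b`."""
    # prev[j] holds the distance between the current prefix of `a`
    # and the first j characters of `b`; start with the empty prefix of `a`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


# One edit of each kind from the list above:
print(levenshtein("fox", "box"))     # substitution: 1
print(levenshtein("sic", "sick"))    # insertion: 1
print(levenshtein("black", "back"))  # deletion: 1
# Without a transposition operation, swapping two letters costs 2:
print(levenshtein("star", "tsar"))   # 2
```

In this sketch a swap such as `star` -> `tsar` costs two edits, which is the gap Damerau's additional operation fills.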

http://en.wikipedia.org/wiki/Frederick_J._Damerau[Frederick Damerau]
-later expanded these operations ((("Damerau, Frederick J.")))to include one more:
+later expanded these operations to include one more((("Damerau, Frederick J."))):

-* _Transposition_ of two adjacent characters: _st_ar -> _ts_ar
+* _Transposition_ of two adjacent characters: _st_ar -> _ts_ar

-For example, to convert the word `bieber` into `beaver` requires the
-following steps:
+For example, converting `bieber` into `beaver` requires the following steps:

-1. Substitute `v` for `b`: bie_b_er -> bie_v_er
-2. Substitute `a` for `i`: b_i_ever -> b_a_ever
-3. Transpose `a` and `e`: b_ae_ver -> b_ea_ver
+1. Substitute `v` for `b`: bie_b_er -> bie_v_er
+2. Substitute `a` for `i`: b_i_ever -> b_a_ever
+3. Transpose `a` and `e`: b_ae_ver -> b_ea_ver

-These three steps represent a
-http://bit.ly/1ymgZPB[Damerau-Levenshtein edit distance]
-of 3.
+These three steps represent a
+http://bit.ly/1ymgZPB[Damerau-Levenshtein edit distance] of 3.
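Adding Damerau's transposition to the Levenshtein recurrence gives the sketch below. Strictly, it computes the *optimal string alignment* variant (also called restricted Damerau-Levenshtein distance), which never edits the same substring twice; for `star`/`tsar` and `bieber`/`beaver` it yields the same values as the unrestricted definition. The function name `osa_distance` is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein's three edits plus
    transposition of two adjacent characters (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between a[:i] and b[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # i deletions
    for j in range(cols):
        d[0][j] = j  # j insertions
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Transposition: the last two characters of a[:i] appear as
            # the last two characters of b[:j] in swapped order.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("star", "tsar"))      # one transposition: 1
print(osa_distance("bieber", "beaver"))  # 3, matching the three steps above
```

Swapping `st` for `ts` now costs one edit rather than the two substitutions plain Levenshtein distance would charge.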

-Clearly, `bieber` is a long way from `beaver`&#x2014;they are too far apart to be
-considered a simple misspelling. Damerau observed that 80% of human
-misspellings have an edit distance of 1. In other words, 80% of misspellings
-could be corrected with a _single edit_ to the original string.
+Clearly, `bieber` is a long way from `beaver`&#x2014;they are too far apart to be considered a simple misspelling.
+Damerau observed that 80% of human misspellings have an edit distance of 1. In other words, 80% of misspellings can be corrected with a _single edit_
+to the original string.
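To make "a single edit" concrete, the sketch below enumerates every string exactly one substitution, insertion, deletion, or adjacent transposition away from a word. It assumes a lowercase `a`-`z` alphabet purely for illustration, and the function name `single_edits` is hypothetical. Brute-force enumeration like this is only practical for short words and is not how Elasticsearch finds fuzzy matches.

```python
import string


def single_edits(word: str, alphabet: str = string.ascii_lowercase) -> set[str]:
    """All strings reachable from `word` with exactly one substitution,
    insertion, deletion, or transposition of two adjacent characters."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    substitutes = {left + c + right[1:]
                   for left, right in splits if right for c in alphabet}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    # Substituting a letter with itself (or swapping two equal letters)
    # reproduces the original word, which is not an edit.
    return (deletes | transposes | substitutes | inserts) - {word}


print("sick" in single_edits("sic"))        # True: one insertion away
print("beaver" in single_edits("bieber"))   # False: three edits away
```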

-Elasticsearch supports a maximum edit distance, specified with the `fuzziness`
-parameter, of 2.
+Elasticsearch supports a maximum edit distance of 2, specified with the `fuzziness` parameter.

-Of course, the impact that a single edit has on a string depends on the
-length of the string. Two edits to the word `hat` can produce `mad`, so
-allowing two edits on a string of length 3 is overkill. The `fuzziness`
-parameter can be set to `AUTO`, which results in the following maximum edit distances:
+Of course, the impact of a single edit on a string depends on the length of the string. Two edits to the word `hat` can produce `mad`,
+so allowing two edits on a string of length 3 is overkill. The `fuzziness`
+parameter can be set to `AUTO`, which results in the following maximum edit distances:

-* `0` for strings of one or two characters
-* `1` for strings of three, four, or five characters
-* `2` for strings of more than five characters
+* `0` for strings of one or two characters
+* `1` for strings of three, four, or five characters
+* `2` for strings of more than five characters
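The `AUTO` rule listed above is a simple function of term length. The sketch below mirrors that list; it is not Elasticsearch's code, and the function name `auto_max_edits` and its `low`/`high` parameters are hypothetical, with defaults chosen only so that they reproduce the three bullets.

```python
def auto_max_edits(term: str, low: int = 3, high: int = 6) -> int:
    """Maximum edit distance under the AUTO rule described above.

    With the default thresholds: 0 edits for 1-2 characters,
    1 edit for 3-5 characters, 2 edits for more than 5 characters.
    """
    length = len(term)
    if length < low:
        return 0
    if length < high:
        return 1
    return 2


for term in ["ox", "hat", "beaver"]:
    print(term, auto_max_edits(term))
# ox 0
# hat 1
# beaver 2
```

Under this rule `hat` allows only one edit, so `mad`, which is two substitutions away, would not be treated as a match.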

-Of course, you may find that an edit distance of `2` is still overkill, and
-returns results that don't appear to be related. You may get better results,
-and better performance, with a maximum `fuzziness` of `1`.
+Of course, you may find that an edit distance of `2` is still overkill and returns results that don't appear to be related.
+Setting `fuzziness` to `1` may give you better results and better performance.
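To see why `2` can be too permissive, the sketch below filters a small vocabulary by edit distance from a query term. The vocabulary is a made-up list, not real index data, and the helper names `levenshtein` and `fuzzy_matches` are hypothetical. For brevity it uses plain Levenshtein distance, so a transposition counts as two edits here; that is a simplification of the Damerau-Levenshtein distance discussed above. Elasticsearch does not scan its terms this way; the sketch only shows which terms each setting would admit.

```python
def levenshtein(a: str, b: str) -> int:
    """Substitutions, insertions, and deletions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            # deletion, insertion, substitution (free if characters match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def fuzzy_matches(term: str, vocabulary: list[str], fuzziness: int) -> list[str]:
    """Terms from `vocabulary` within `fuzziness` edits of `term`."""
    return [w for w in vocabulary if levenshtein(term, w) <= fuzziness]


# Hypothetical vocabulary, for illustration only.
vocabulary = ["hat", "hats", "cat", "hot", "mad", "map", "help"]
print(fuzzy_matches("hat", vocabulary, fuzziness=1))
# ['hat', 'hats', 'cat', 'hot']
print(fuzzy_matches("hat", vocabulary, fuzziness=2))
# ['hat', 'hats', 'cat', 'hot', 'mad', 'map']
```

Raising the limit from one edit to two pulls in `mad` and `map`, which share little with `hat` beyond one letter.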
