Commit 0d3e016

Author: aiddroid
Parents: 10565ce + eecb1b2 (merge commit)

File tree: 2 files changed (+43, -60 lines)

270_Fuzzy_matching/10_Intro.md (+13, -23)

[[fuzzy-matching]]
== Typos and Misspellings

We expect a query on structured data like dates and prices to return only
documents that match exactly. ((("typoes and misspellings", "fuzzy matching")))((("fuzzy matching"))) However, good
full-text search shouldn't have the same restriction. Instead, we can widen
the net to include words that _may_ match, but use the relevance score to
push the better matches to the top of the result set.

In fact, full-text search ((("full text search", "fuzzy matching"))) that only matches exactly will probably
frustrate your users. Wouldn't you expect a search for ``quick brown fox'' to
match a document containing ``fast brown foxes,'' ``Johnny Walker'' to match
``Johnnie Walker,'' or ``Arnold Shcwarzenneger'' to match
``Arnold Schwarzenegger''?

If documents exist that _do_ contain exactly what the user has queried, they
should appear at the top of the result set, but weaker matches can be
included further down the list. If no documents match exactly, at least we
can show the user potential matches; they may even be what the user
originally intended!

We have already looked at diacritic-free matching in <<token-normalization>>,
word stemming in <<stemming>>, and synonyms in <<synonyms>>, but all of those
approaches presuppose that words are spelled correctly, or that there is only
one way to spell each word.

Fuzzy matching allows for query-time matching of misspelled words, while
phonetic token filters at index time can be used for _sounds-like_ matching.

270_Fuzzy_matching/20_Fuzziness.md (+30, -37)

[[fuzziness]]
=== Fuzziness

_Fuzzy matching_ treats two words that are ``fuzzily'' similar as if they
were the same word. ((("typoes and misspellings", "fuzziness, defining")))((("fuzziness"))) First, we need to
define what we mean by _fuzziness_.

In 1965, Vladimir Levenshtein developed the
http://en.wikipedia.org/wiki/Levenshtein_distance[Levenshtein distance],
which measures ((("Levenshtein distance"))) the number of single-character
edits required to transform one word into the other. He proposed three types
of one-character edits:

* _Substitution_ of one character for another: _f_ox -> _b_ox

* _Insertion_ of a new character: sic -> sic_k_

* _Deletion_ of a character: b_l_ack -> back

http://en.wikipedia.org/wiki/Frederick_J._Damerau[Frederick Damerau]
((("Damerau, Frederick J."))) later expanded these operations to include one
more:

* _Transposition_ of two adjacent characters: _st_ar -> _ts_ar

For example, converting the word `bieber` into `beaver` requires the
following steps:

1. Substitute `v` for `b`: bie_b_er -> bie_v_er
2. Substitute `a` for `i`: b_i_ever -> b_a_ever
3. Transpose `a` and `e`: b_ae_ver -> b_ea_ver

These three steps represent a
http://bit.ly/1ymgZPB[Damerau-Levenshtein edit distance] of 3.
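
The edit-distance arithmetic above is easy to check in code. Below is a
minimal Python sketch of the restricted Damerau-Levenshtein distance (the
optimal string alignment variant), built from the four single-character edits
listed above with a small dynamic-programming table. It illustrates the
metric itself; it is not the implementation Elasticsearch uses internally.

[source,python]
----
def damerau_levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, substitutions, and
    transpositions of adjacent characters needed to turn `a` into `b`."""
    # d[i][j] = distance between the first i chars of `a` and first j of `b`
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i                      # delete everything: i edits
    for j in range(len(b) + 1):
        d[0][j] = j                      # insert everything: j edits
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

print(damerau_levenshtein("bieber", "beaver"))  # 3
print(damerau_levenshtein("fox", "box"))        # 1 (one substitution)
print(damerau_levenshtein("star", "tsar"))      # 1 (one transposition)
----

Running it confirms that `bieber` -> `beaver` costs 3 edits, while
`star` -> `tsar` costs only 1 thanks to the transposition operation.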

Clearly, `bieber` is a long way from `beaver`; they are too far apart to be
considered a simple misspelling. Damerau observed that 80% of human
misspellings have an edit distance of 1. In other words, 80% of misspellings
could be corrected with a _single edit_ to the original string.

Elasticsearch supports a maximum edit distance, specified with the
`fuzziness` parameter, of 2.

Of course, the impact that a single edit has on a string depends on the
length of the string. Two edits to the word `hat` can produce `mad`, so
allowing two edits on a string of length 3 is overkill. The `fuzziness`
parameter can be set to `AUTO`, which results in the following maximum edit
distances (mirrored by the short sketch after this list):

* `0` for strings of one or two characters
* `1` for strings of three, four, or five characters
* `2` for strings of more than five characters
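
As a quick paraphrase of the `AUTO` rules, the mapping from term length to
the allowed number of edits can be written as a tiny helper. This only
mirrors the list above; it is not Elasticsearch code.

[source,python]
----
def auto_max_edits(term: str) -> int:
    """Edit distance allowed for `term` under `fuzziness: AUTO`,
    following the three length rules listed above."""
    n = len(term)
    if n <= 2:
        return 0   # one or two characters: exact match only
    if n <= 5:
        return 1   # three, four, or five characters: one edit
    return 2       # more than five characters: up to two edits

for word in ["at", "fox", "quick", "misspelling"]:
    print(word, "->", auto_max_edits(word))   # 0, 1, 1, 2
----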

Of course, you may find that an edit distance of `2` is still overkill, and
returns results that don't appear to be related. You may get better results,
and better performance, with a maximum `fuzziness` of `1`.
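
To show where the `fuzziness` parameter fits in practice, here is a minimal
sketch of a `match` query that tolerates misspellings, sent to the `_search`
endpoint with the Python `requests` library. The index name `my_index`, the
field name `text`, and the use of `requests` against a local node are
assumptions for illustration; only the shape of the query body follows the
standard query DSL.

[source,python]
----
import requests

# Hypothetical index and field names; adjust them to your own mapping.
query = {
    "query": {
        "match": {
            "text": {
                "query": "quick brown fox",
                "fuzziness": "AUTO"   # or a fixed edit distance such as 1 or 2
            }
        }
    }
}

# Assumes an Elasticsearch node listening on localhost:9200.
response = requests.post("http://localhost:9200/my_index/_search", json=query)
for hit in response.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"])
----

Setting `"fuzziness": 1` instead of `"AUTO"` applies the stricter limit
recommended in the paragraph above.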
