PyThaiNLP
diff --git a/AUTHORS.rst b/AUTHORS.rst
Lines changed: 4 additions & 0 deletions
diff --git a/MANIFEST.in b/MANIFEST.in
Lines changed: 1 addition & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 14 additions & 17 deletions
diff --git a/README.rst b/README.rst
Lines changed: 4 additions & 11 deletions
diff --git a/appveyor.yml b/appveyor.yml
Lines changed: 1 addition & 1 deletion
diff --git a/docs/pythainlp-1-5-eng.md b/docs/pythainlp-1-5-eng.md
Lines changed: 23 additions & 1 deletion
diff --git a/docs/pythainlp-1-5-thai.md b/docs/pythainlp-1-5-thai.md
Lines changed: 34 additions & 13 deletions
@@ -7,6 +7,10 @@ Development Lead
 
 * Wannaphong Phatthiyaphaibun <[email protected]>
 
+* Korakot Chaovavanich
+
+* Charin Polpanumas
+
 TCC & THAI SOUNDEX CODE
 ------------
 
 
@@ -5,4 +5,4 @@ include README.rst
 
 recursive-include tests *
 recursive-exclude * __pycache__
-recursive-exclude * *.py[co]
+recursive-exclude * *.py[co]
@@ -1,12 +1,11 @@
+![PyThaiNLP Logo](https://avatars0.githubusercontent.com/u/32934255?s=200&v=4)
+
 # PyThaiNLP
-[![PyPI Downloads](https://img.shields.io/pypi/dm/pythainlp.png)]
-[![Codacy Badge](https://api.codacy.com/project/badge/Grade/50fa9d87f4fb4a95aac62b398aa374fa)](https://www.codacy.com/app/wannaphongcom/pythainlp?utm_source=github.com&utm_medium=referral&utm_content=wannaphongcom/pythainlp&utm_campaign=badger)
-[![pypi](https://img.shields.io/pypi/v/pythainlp.svg)](https://pypi.python.org/pypi/pythainlp)
-[![Build Status](https://travis-ci.org/wannaphongcom/pythainlp.svg?branch=develop)](https://travis-ci.org/wannaphongcom/pythainlp)
-[![Build status](https://ci.appveyor.com/api/projects/status/uxerymgggp1uch0p?svg=true)](https://ci.appveyor.com/project/wannaphongcom/pythainlp)
-[![Coverage Status](https://coveralls.io/repos/github/wannaphongcom/pythainlp/badge.svg)](https://coveralls.io/github/wannaphongcom/pythainlp)
 
-Homepags : https://sites.google.com/view/pythainlp/
+[![Codacy Badge](https://api.codacy.com/project/badge/Grade/cb946260c87a4cc5905ca608704406f7)](https://www.codacy.com/app/pythainlp/pythainlp_2?utm_source=github.com&utm_medium=referral&utm_content=PyThaiNLP/pythainlp&utm_campaign=Badge_Grade)
+[![pypi](https://img.shields.io/pypi/v/pythainlp.svg)](https://pypi.python.org/pypi/pythainlp)
+[![Build Status](https://travis-ci.org/PyThaiNLP/pythainlp.svg?branch=develop)](https://travis-ci.org/PyThaiNLP/pythainlp)
+[![Build status](https://ci.appveyor.com/api/projects/status/9g3mfcwchi8em40x?svg=true)](https://ci.appveyor.com/project/wannaphongcom/pythainlp-9y1ch)
+[![Coverage Status](https://coveralls.io/repos/github/PyThaiNLP/pythainlp/badge.svg?branch=dev)](https://coveralls.io/github/PyThaiNLP/pythainlp?branch=dev)
 
 Thai natural language processing in Python.
 
@@ -17,7 +16,7 @@ It supports both Python 2.7 and Python 3.
 
 ### Capability
 
-- Thai segmentation
+- Thai word segmentation
 - Thai wordnet
 - Thai Character Clusters (TCC) and ETCC
 - Thai stop word
@@ -37,11 +36,9 @@ and much more.
 $ pip install pythainlp
 ```
 
-more read on [https://sites.google.com/view/pythainlp/english/install](https://sites.google.com/view/pythainlp/english/install)
-
 ### Documentation
 
-Read on [https://sites.google.com/view/pythainlp/english](https://sites.google.com/view/pythainlp/english)
+Read the documentation at [https://github.com/PyThaiNLP/pythainlp/tree/dev/docs](https://github.com/PyThaiNLP/pythainlp/tree/dev/docs)
 
 ### License
 
@@ -57,7 +54,7 @@ Natural language processing, or language processing
 
 Supports Python 2.7 and Python 3
 
-  - GitHub homepage:  [https://github.com/wannaphongcom/pythainlp](https://github.com/wannaphongcom/pythainlp)
+  - GitHub homepage:  [https://github.com/PyThaiNLP/pythainlp/](https://github.com/PyThaiNLP/pythainlp/)
 
 ### Capabilities
   - Thai word segmentation
@@ -84,19 +81,19 @@ Natural language processing, or language processing
 $ pip install pythainlp
 ```
 
-Read more at [https://sites.google.com/view/pythainlp/ภาษาไทย/install](https://sites.google.com/view/pythainlp/ภาษาไทย/install)
-
-
 ### Documentation
 
-Read the documentation at  [https://sites.google.com/view/pythainlp/ภาษาไทย](https://sites.google.com/view/pythainlp/ภาษาไทย)
+Read the documentation at  [https://github.com/PyThaiNLP/pythainlp/tree/dev/docs](https://github.com/PyThaiNLP/pythainlp/tree/dev/docs)
 
 ### License
 
 Apache Software License 2.0
 
+Developed by PyThaiNLP
+
+### Logo
 
-Developed by Wannaphong Phatthiyaphaibun
+Designed by วรุตม์ พสุธาดล, from the contest at https://www.facebook.com/groups/408004796247683/permalink/475864542795041/ and https://www.facebook.com/groups/408004796247683/permalink/474262752955220/
 
 ### Support
 
 
@@ -1,6 +1,6 @@
-=========
-PyThaiNLP 1.5
-=========
+================
+PyThaiNLP 1.6.0
+================
 
 Thai natural language processing in Python.
 
@@ -15,17 +15,10 @@ It supports both Python 2.7 and Python 3.
 
 pip install pythainlp
 
-Homepags : `https://sites.google.com/view/pythainlp/ <https://sites.google.com/view/pythainlp/>`_
 
 GitHub : https://github.com/wannaphongcom/pythainlp
 
-Development Lead
-----------------
 
-* Wannaphong Phatthiyaphaibun <[email protected]>
-
-
-License
-~~~~~~~
+**License**
 
 Apache Software License 2.0
@@ -18,4 +18,4 @@ install:
 
 test_script:
   - "%PYTHON%/Scripts/pip.exe --version"
-  - "%PYTHON%/python.exe setup.py test"
+  - "%PYTHON%/python.exe setup.py test"
@@ -41,6 +41,28 @@ d=word_tokenize(text,engine='pylexto') # ['ผม', 'รัก', 'คุณ', '
 e=word_tokenize(text,engine='newmm') # ['ผม', 'รัก', 'คุณ', 'นะ', 'ครับ', 'โอเค', 'บ่', 'พวกเรา', 'เป็น', 'คนไทย', 'รัก', 'ภาษาไทย', 'ภาษา', 'บ้านเกิด']
 ```
 
+#### dict_word_tokenize
+
+```python
+from pythainlp.tokenize import dict_word_tokenize
+dict_word_tokenize(text,file,engine)
+```
+
+Tokenizes text using a user-supplied dictionary file.
+
+text : str, the text to tokenize.
+
+file : path to the file used as the word list for tokenization.
+
+engine : the tokenizer to use.
+
+- newmm
+- wordcutpy : uses wordcutpy (https://github.com/veer66/wordcutpy)
+- mm
+- longest-matching
+
+Example: https://gist.github.com/wannaphongcom/1e862583051bf0464b6ef4ed592f739c
+
 #### sent_tokenize
 
 Thai Sentence Tokenizer
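The `dict_word_tokenize` entry added in this hunk describes tokenizing against a user-supplied word file. To illustrate what dictionary-based longest matching does with such a file, here is a minimal Python 3 sketch. It assumes a UTF-8 file with one word per line; `load_dictionary` and `longest_matching` are hypothetical helpers, not PyThaiNLP's API or its implementation.

```python
import os
import tempfile


def load_dictionary(path):
    """Read a UTF-8 word list (one word per line), skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def longest_matching(text, words):
    """Greedy left-to-right longest matching against a set of words.

    At each position, take the longest dictionary word starting there;
    if no word matches, emit a single character and move on.
    """
    max_len = max((len(w) for w in words), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: one unknown character
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in words:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens


if __name__ == "__main__":
    # Write a tiny hypothetical dictionary file for the demo.
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".txt", delete=False
    ) as f:
        f.write("ผม\nรัก\nคน\nไทย\nคนไทย\n")
        path = f.name
    try:
        words = load_dictionary(path)
        print(longest_matching("ผมรักคนไทย", words))  # ['ผม', 'รัก', 'คนไทย']
    finally:
        os.remove(path)
```

Removing คนไทย from the word list makes the same input come out as `['ผม', 'รัก', 'คน', 'ไทย']`, the same split pattern as the `wordcutpy` output shown in the Thai docs.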
@@ -238,7 +260,7 @@ credit Korakot Chaovavanich (from https://gist.github.com/korakot/0b772e09340cac
 **Example**
 
 ```python
->>> from pythainlp.soundex import LK82
+>>> from pythainlp.soundex import LK82,Udom83
 >>> print(LK82('รถ'))
 ร3000
 >>> print(LK82('รด'))
 
@@ -16,22 +16,20 @@ pip install pythainlp
 
 **How to install on Windows**
 
-Install pyicu using a .whl file from [http://www.lfd.uci.edu/~gohlke/pythonlibs/#pyicu](http://www.lfd.uci.edu/~gohlke/pythonlibs/#pyicu)
+Installing pythainlp on Windows requires pyicu, which can be very difficult to install.
+The easiest way is to use a wheel:
 
-If you use 64-bit Python 3.5, download PyICU‑1.9.7‑cp35‑cp35m‑win_amd64.whl, then open cmd and run
+1. Download the wheel that matches your Python from [http://www.lfd.uci.edu/~gohlke/pythonlibs/#pyicu](http://www.lfd.uci.edu/~gohlke/pythonlibs/#pyicu). For example,
+  for 64-bit Python 3.6.1 on Windows, use PyICU-1.9.7-cp36-cp36m-win_amd64.whl
 
-```
-pip install PyICU‑1.9.7‑cp35‑cp35m‑win_amd64.whl
-```
+2. `pip install PyICU-1.9.7-cp36-cp36m-win_amd64.whl`
 
-Then run
-
-```
-pip install pythainlp
-```
+3. `pip install pythainlp`
 
 **Installing on Mac**
 
+**Using ICU 58.2 is recommended, because ICU 59.1 has problems with PyICU.**
+
 ```sh
 $ brew install icu4c --force
 $ brew link --force icu4c
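Step 1 of the Windows instructions in this hunk says to download the PyICU wheel that matches your Python. The wheel file name encodes that match as python tag, ABI tag and platform tag. The sketch below, assuming CPython on Windows, computes those tags for the running interpreter. `pyicu_wheel_tags` and `wheel_filename` are hypothetical helpers, and the `m` ABI-suffix rule is an assumption about how the wheels on that page are named.

```python
import struct
import sys


def pyicu_wheel_tags(major, minor, is_64bit):
    """Return (python_tag, abi_tag, platform_tag) for a CPython wheel on Windows.

    Assumption: CPython before 3.8 uses an 'm' suffix on the ABI tag
    (e.g. cp36m); 3.8 and later drop it (e.g. cp38).
    """
    python_tag = "cp{}{}".format(major, minor)
    abi_tag = python_tag + ("m" if (major, minor) < (3, 8) else "")
    platform_tag = "win_amd64" if is_64bit else "win32"
    return python_tag, abi_tag, platform_tag


def wheel_filename(project, version, tags):
    """Build a wheel file name using plain ASCII hyphens."""
    return "{}-{}-{}.whl".format(project, version, "-".join(tags))


if __name__ == "__main__":
    # Pointer size tells us whether this interpreter is 32- or 64-bit.
    is_64bit = struct.calcsize("P") * 8 == 64
    tags = pyicu_wheel_tags(sys.version_info[0], sys.version_info[1], is_64bit)
    print(wheel_filename("PyICU", "1.9.7", tags))
```

File names copied from a web page can contain non-breaking hyphens (U+2011) instead of `-`, in which case the `pip install` command will not find the downloaded file; building the name with ASCII hyphens avoids that.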
@@ -57,7 +55,7 @@ text is the text as a string (str)
 engine is the Thai word segmentation engine. PyThaiNLP currently provides 6 engines:
 
 1. icu - the original PyThaiNLP engine (low accuracy); this is the default
-2. dict - segments using the dictionary in thaiword.txt from the corpus (medium accuracy). Returns False if the text cannot be segmented.
+2. dict - segments using the dictionary in thaiword.txt from the corpus (medium accuracy). **Returns False if the text cannot be segmented.**
 3. longest-matching - segments using longest matching
 4. mm - segments Thai words using the Maximum Matching algorithm (old API)
 5. newmm - segments Thai words using the Maximum Matching algorithm; new code, developed further from Korakot Chaovavanich's code at https://www.facebook.com/groups/408004796247683/permalink/431283740586455/
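Item 2 in this hunk says the `dict` engine returns False when text cannot be segmented, and item 4 names the Maximum Matching algorithm. A minimal sketch of one common formulation of maximal matching (dynamic programming that picks the segmentation with the fewest dictionary words) shows where such a False can come from. This is an illustration under that assumption, not PyThaiNLP's `dict` or `mm` code; `maximal_matching` is a hypothetical name.

```python
def maximal_matching(text, words):
    """Split text into the fewest dictionary words, or return False.

    best[i] holds a fewest-words segmentation of text[:i], or None when
    text[:i] cannot be built entirely from dictionary words.
    """
    max_len = max((len(w) for w in words), default=0)
    best = [None] * (len(text) + 1)
    best[0] = []
    for i in range(1, len(text) + 1):
        # Only words of length <= max_len can end at position i.
        for j in range(max(0, i - max_len), i):
            if best[j] is not None and text[j:i] in words:
                candidate = best[j] + [text[j:i]]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    # Like the dict engine described above, signal failure with False.
    return best[-1] if best[-1] is not None else False


if __name__ == "__main__":
    words = {"ผม", "รัก", "คน", "ไทย", "คนไทย"}
    print(maximal_matching("ผมรักคนไทย", words))  # ['ผม', 'รัก', 'คนไทย']
    print(maximal_matching("ผมรักX", words))  # False
```

Unlike greedy longest matching, the dynamic program can back off from a long prefix: with the words `ab`, `abc` and `cd`, `maximal_matching("abcd", ...)` returns `['ab', 'cd']`, whereas taking `abc` first would leave an unmatched `d`.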
@@ -77,8 +75,31 @@ b=word_tokenize(text,engine='dict') # ['ผม', 'รัก', 'คุณ', 'น
 c=word_tokenize(text,engine='mm') # ['ผม', 'รัก', 'คุณ', 'นะ', 'ครับ', 'โอเค', 'บ่', 'พวกเรา', 'เป็น', 'คนไทย', 'รัก', 'ภาษาไทย', 'ภาษา', 'บ้านเกิด']
 d=word_tokenize(text,engine='pylexto') # ['ผม', 'รัก', 'คุณ', 'นะ', 'ครับ', 'โอเค', 'บ่', 'พวกเรา', 'เป็น', 'คนไทย', 'รัก', 'ภาษาไทย', 'ภาษา', 'บ้านเกิด']
 e=word_tokenize(text,engine='newmm') # ['ผม', 'รัก', 'คุณ', 'นะ', 'ครับ', 'โอเค', 'บ่', 'พวกเรา', 'เป็น', 'คนไทย', 'รัก', 'ภาษาไทย', 'ภาษา', 'บ้านเกิด']
+g=word_tokenize(text,engine='wordcutpy') # ['ผม', 'รัก', 'คุณ', 'นะ', 'ครับ', 'โอเค', 'บ่', 'พวกเรา', 'เป็น', 'คน', 'ไทย', 'รัก', 'ภาษา', 'ไทย', 'ภาษา', 'บ้านเกิด']
 ```
 
+#### dict_word_tokenize
+
+```python
+from pythainlp.tokenize import dict_word_tokenize
+dict_word_tokenize(text,file,engine)
+```
+
+Tokenizes text using data supplied by the user.
+
+text is the text to tokenize.
+
+file is the path to the file to use as the tokenization word database.
+
+engine is the tokenizer to use:
+
+- newmm: tokenize with newmm
+- wordcutpy: tokenize with wordcutpy (https://github.com/veer66/wordcutpy)
+- mm: tokenize with mm
+- longest-matching: tokenize with longest matching
+
+Usage example: https://gist.github.com/wannaphongcom/1e862583051bf0464b6ef4ed592f739c
+
 #### sent_tokenize
 
 Splits Thai text into sentences.
@@ -338,7 +359,7 @@ from pythainlp.change import *
 **Usage**
 
 ```python
->>> from pythainlp.soundex import LK82
+>>> from pythainlp.soundex import LK82,Udom83
 >>> print(LK82('รถ'))
 ร3000
 >>> print(LK82('รด'))
@@ -374,7 +395,7 @@ from pythainlp.sentiment import sentiment
 sentiment(str)
 ```
 
-Takes a str and returns pos, neg or neutral.
+Takes a str and returns pos or neg.
 
 ### Util