From 62bbb08326ac87838a259661f8866f25e1f475cf Mon Sep 17 00:00:00 2001
From: Ping Zhou
Date: Sun, 24 Sep 2023 18:36:45 -0700
Subject: [PATCH] typo

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 181f51a..614e05d 100644
--- a/README.md
+++ b/README.md
@@ -102,7 +102,7 @@ I've included downloading and preprocessing steps in the `prepare_data.sh` scrip
 
 ## Tokenization
 
-Each Chinease character in the text is regarded as a token, so I don't use a tokenizer in this project. IMHO, a Chinese character is not a "letter" - it's more like a word or subword. Therefore, each Chinese character should be treated as a token.
+Each Chinese character in the text is regarded as a token, so I don't use a tokenizer in this project. IMHO, a Chinese character is not a "letter" - it's more like a word or subword. Therefore, each Chinese character should be treated as a token.
 
 ## Shuffling & Sampling
 
@@ -201,4 +201,4 @@ Lots of stuff I'm thinking about! Just to name a few:
 
 ## Contact
 
-Interested in discussion? Just connect with me and let's chat!
\ No newline at end of file
+Interested in discussion? Just connect with me and let's chat!