Commit

typo
zhoupingjay committed Sep 25, 2023
1 parent 8007a8e commit 62bbb08
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -102,7 +102,7 @@ I've included downloading and preprocessing steps in the `prepare_data.sh` scrip

## Tokenization

-Each Chinease character in the text is regarded as a token, so I don't use a tokenizer in this project. IMHO, a Chinese character is not a "letter" - it's more like a word or subword. Therefore, each Chinese character should be treated as a token.
+Each Chinese character in the text is regarded as a token, so I don't use a tokenizer in this project. IMHO, a Chinese character is not a "letter" - it's more like a word or subword. Therefore, each Chinese character should be treated as a token.

## Shuffling & Sampling

@@ -201,4 +201,4 @@ Lots of stuff I'm thinking about! Just to name a few:

## Contact

-Interested in discussion? Just connect with me and let's chat!
+Interested in discussion? Just connect with me and let's chat!
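
For readers of the Tokenization hunk above: treating each character as a token amounts to a plain character-to-ID lookup. The sketch below is only an illustration of that idea, not the project's actual code; the helper names `build_vocab`, `encode`, and `decode` are hypothetical, and it assumes the vocabulary is built from the training text itself.

```python
def build_vocab(text):
    """Map every distinct character in `text` to an integer ID (hypothetical helper)."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}   # character -> ID
    itos = {i: ch for ch, i in stoi.items()}       # ID -> character
    return stoi, itos


def encode(text, stoi):
    """Turn a string into a list of token IDs, one ID per character."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Turn a list of token IDs back into a string."""
    return "".join(itos[i] for i in ids)


if __name__ == "__main__":
    corpus = "床前明月光"
    stoi, itos = build_vocab(corpus)
    ids = encode("明月", stoi)
    print(len(ids), decode(ids, itos))
```

Under this scheme a two-character string always encodes to two IDs, and a character missing from the vocabulary raises a `KeyError`, so a real pipeline would likely need an unknown-character fallback.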
