Background Information

A comparative table of jōyō kanji, traditional Chinese characters and simplified Chinese characters may be found at this drive link, which is linked from this blog post. It contains 1945 kanji and information about the corresponding Chinese characters.

This table has not been updated to reflect the late 2010 revision of the jōyō kanji, in which 196 characters were added and 5 were removed, bringing the total to 2136 characters.

Process Overview

Data is exported from the PDF and cleaned up, and further information for each kanji is then collected online. The end result is a csv table that can be imported into Anki.

This transformation is done with a series of scripts.

Scripts

We start with the input file comparitive jōyō kanji table.txt and apply the following scripts (in order):

comparative_list_text_to_csv.py

This converts the extracted jōyō kanji table into a csv table whose format is better suited to flashcards. Each row contains the actual characters where they exist, rather than a quotation mark indicating that a character is the same as the kanji. An extra entry_type field indicates how the three characters relate to one another (i.e. which are the same and which differ).

This produces comparitive jōyō kanji table.csv.
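The row transformation described above can be sketched as follows. This is a minimal Python sketch, not the actual contents of comparative_list_text_to_csv.py: the column names, the set of ditto marks, and the entry_type labels are all hypothetical.

```python
import csv
import io

# Assumption: marks that may appear in the extracted text meaning
# "same as the kanji". The real table may use only one of these.
DITTO_MARKS = {'"', "\u201d", "\u3003"}  # ", ” and 〃

# Hypothetical column names for the output csv.
FIELDNAMES = ["kanji", "traditional", "simplified", "entry_type"]


def resolve(kanji: str, char: str) -> str:
    """Replace a ditto mark with the kanji itself; keep real characters."""
    char = char.strip()
    return kanji if char in DITTO_MARKS else char


def classify(kanji: str, trad: str, simp: str) -> str:
    """Label which of the three characters match (hypothetical labels)."""
    if kanji == trad == simp:
        return "all_same"
    if kanji == simp:
        return "traditional_differs"
    if kanji == trad:
        return "simplified_differs"
    if trad == simp:
        return "kanji_differs"
    return "all_differ"


def make_row(kanji: str, traditional: str, simplified: str) -> dict:
    """Build one output row with ditto marks filled in and entry_type set."""
    kanji = kanji.strip()
    trad = resolve(kanji, traditional)
    simp = resolve(kanji, simplified)
    return {
        "kanji": kanji,
        "traditional": trad,
        "simplified": simp,
        "entry_type": classify(kanji, trad, simp),
    }


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize rows to csv text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, make_row("国", "國", '"') fills in 国 as the simplified character and labels the row traditional_differs. Storing an explicit entry_type means the card template does not have to recompute which characters differ.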

main.js

This is a web scraper that collects information about each kanji from this site. It appends the scraped data to the csv generated by the previous script. All of the information it gathers is stored as a single json entry per row of the csv table, so that it can easily be reformatted offline for practical use.

This produces web_scraper_output.csv.
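main.js is written in JavaScript; for consistency with the other examples, here is a Python sketch of the same idea of storing everything scraped for a kanji in one json cell. The fetch_info callable, the kanji column name and the web_info column name are hypothetical, and no real network request is made.

```python
import csv
import io
import json
from typing import Callable


def append_web_info(csv_text: str, fetch_info: Callable[[str], dict]) -> str:
    """Add a 'web_info' column holding one json object per row.

    fetch_info is a hypothetical stand-in for the scraping step: it takes
    a kanji and returns whatever was collected for it as a dict.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["web_info"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        info = fetch_info(row["kanji"])
        # ensure_ascii=False keeps kana and kanji readable in the csv;
        # the csv writer quotes the json so its commas are safe.
        row["web_info"] = json.dumps(info, ensure_ascii=False)
        writer.writerow(row)
    return out.getvalue()
```

Keeping the raw scrape in one column means the scraper never has to know the final card layout; all formatting decisions are deferred to the parsing step.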

web_info_parser.py

Takes the csv with the added web information (stored as json) and converts it into a format that can be imported into Anki as flashcards.

This produces parsed_output.csv.
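A minimal sketch of the flattening step, assuming the json column is named web_info and contains list-valued keys such as meanings, onyomi and kunyomi (all hypothetical names; the real scraper output may differ). Lists are joined with "; " so each key becomes a single field, and no header row is written.

```python
import csv
import io
import json

# Hypothetical keys inside the json column.
INFO_KEYS = ["meanings", "onyomi", "kunyomi"]


def format_value(value) -> str:
    """Flatten a json value into a single field; missing keys become ''."""
    if isinstance(value, list):
        return "; ".join(str(v) for v in value)
    return "" if value is None else str(value)


def to_anki_csv(csv_text: str) -> str:
    """Replace the json column with one column per key in INFO_KEYS."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out)
    for row in reader:
        info = json.loads(row.pop("web_info"))
        base = list(row.values())  # remaining columns, in original order
        extra = [format_value(info.get(key)) for key in INFO_KEYS]
        writer.writerow(base + extra)
    return out.getvalue()
```

The column order should line up with the field order of the Anki note type, or be mapped to it in Anki's import dialog.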

TODOs

It would be good if the changes made to the jōyō kanji set in the 2010 revision were accounted for.
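One way to account for the revision, assuming the added and removed characters are available as separate lists, is to drop the removed kanji, append the new ones, and check the resulting count. apply_revision is a hypothetical helper, and the new rows would still need their Chinese-character columns filled in.

```python
def apply_revision(kanji: list[str], added: list[str], removed: set[str]) -> list[str]:
    """Return the revised kanji list: removals dropped, additions appended."""
    missing = removed - set(kanji)
    if missing:
        raise ValueError(f"removed kanji not in the table: {sorted(missing)}")
    kept = [k for k in kanji if k not in removed]
    clashes = set(added) & set(kept)
    if clashes:
        raise ValueError(f"added kanji already present: {sorted(clashes)}")
    revised = kept + list(added)
    # Sanity check: for the real data this should be 1945 + 196 - 5 = 2136.
    expected = len(kanji) + len(added) - len(removed)
    if len(revised) != expected:
        raise ValueError(f"expected {expected} kanji, got {len(revised)} (duplicates in input?)")
    return revised
```

Appending keeps the sketch simple; the official list may be ordered differently (by reading), so a real update would probably re-sort the rows.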

More information could be scraped online, in particular the stroke order.

It may be possible to collect more useful information (e.g. common compounds, example sentences) from other flashcard sets and merge it into this set.
