A comparative table of jōyō kanji, traditional Chinese characters and simplified Chinese characters may be found at this drive link, linked to from this blog post. It contains 1945 kanji and information regarding the corresponding Chinese characters.
This table has not been updated to match the late 2010 revision of the jōyō kanji, in which 196 characters were added and 5 characters were removed, for a total of 2136 characters.
Data has been exported from the PDF and cleaned up. More information for each kanji is collected from online sources. The end result is a csv table that can be imported into Anki.
This transformation is done with a series of scripts.
We start with the input file comparitive jōyō kanji table.txt and apply the following scripts (in order):
- comparative_list_text_to_csv.py
This converts the extracted jōyō kanji table into a csv table with a format that is more suitable for flashcards. Each row contains the actual Chinese characters, where they exist, rather than a quotation mark indicating that a character is the same as the kanji. There is an extra entry_type field, which indicates how the three characters relate to one another (i.e. which are the same and which are different).
This produces comparitive jōyō kanji table.csv.
- main.js
This is a web scraper that collects information about each kanji from this site. It appends to the csv generated by the previous script. Everything it gathers is stored as a single JSON entry in the csv table, so that it can be easily reformatted offline for practical use.
This produces web_scraper_output.csv.
- web_info_parser.py
Takes the csv with the added web information (in a json format) and turns it into something that can be imported into Anki as flashcards.
This produces parsed_output.csv.
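The first and last steps above can be sketched roughly as follows. This is only an illustration and not the code in the actual scripts. The function names, the entry_type labels, the `web_info` column name and the JSON keys (`meaning`, `onyomi`, `kunyomi`) are all assumptions made for the example. The only detail taken from the description is that a quotation mark stands for "same as the kanji".

```python
import json

DITTO = '"'  # marker meaning "same as the kanji" in the extracted table


def resolve(kanji, traditional, simplified):
    """Replace ditto marks with the kanji itself and label the relationship.

    The entry_type labels returned here are hypothetical; the real script
    may use different names.
    """
    trad = kanji if traditional == DITTO else traditional
    simp = kanji if simplified == DITTO else simplified
    if not trad or not simp:
        entry_type = "missing"  # no corresponding Chinese character
    elif kanji == trad == simp:
        entry_type = "all_same"
    elif kanji == trad:
        entry_type = "kanji_eq_traditional"
    elif kanji == simp:
        entry_type = "kanji_eq_simplified"
    elif trad == simp:
        entry_type = "chinese_same"
    else:
        entry_type = "all_different"
    return trad, simp, entry_type


def flatten(row, json_column="web_info", keys=("meaning", "onyomi", "kunyomi")):
    """Expand the JSON blob stored in one csv column into separate fields.

    `json_column` and `keys` are assumed names, chosen for illustration.
    """
    info = json.loads(row[json_column]) if row.get(json_column) else {}
    flat = {k: v for k, v in row.items() if k != json_column}
    for key in keys:
        value = info.get(key, "")
        # Lists (e.g. several meanings) are joined into one flashcard field.
        flat[key] = "; ".join(value) if isinstance(value, list) else str(value)
    return flat
```

For example, `resolve("語", '"', "语")` fills in the traditional character as 語 and labels the row as one where the kanji matches the traditional form but not the simplified one. Keeping the scraped data as a single JSON column until this last step means the choice of flashcard fields can change without re-running the scraper.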
It would be good if the changes to the jōyō kanji set were accounted for.
There is more information that could be scraped from online, namely the stroke order.
It may be possible to collect more useful information (e.g. common compounds, example sentences) from other flashcard sets and combine it with this set.