(Currently maintained mainly for my own use, so not yet well organized or documented.)
A collection of R libraries and resources for text analysis (mining) and visualization,
with emphasis on processing multilingual texts (in UTF-8, specifically Russian and French).
Easy-to-run code is provided, including code for fetching texts from the web and splitting them into sections for easier section-by-section analysis.
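As a minimal sketch of the section-splitting step (not the code from this repository), the base R example below starts a new section at every line that matches a heading pattern. The helper name `split_by_sections`, the heading pattern "Глава <number>", the sample lines, and the commented-out URL are all illustrative assumptions; it also assumes the script is saved and run in a UTF-8 locale.

```r
# Hypothetical helper: split a character vector of lines into sections,
# starting a new section at every line that matches `heading_pattern`.
split_by_sections <- function(lines, heading_pattern = "^Глава [0-9]+") {
  is_heading <- grepl(heading_pattern, lines)
  section_id <- cumsum(is_heading)  # 0 = lines before the first heading
  sections <- split(lines, section_id)
  # Name each section after its heading line; lines before the first
  # heading are collected under "(front matter)".
  names(sections) <- ifelse(
    names(sections) == "0",
    "(front matter)",
    vapply(sections, function(s) s[1], character(1))
  )
  sections
}

# A text could be fetched from the web first (placeholder URL, an assumption):
# book <- readLines("https://example.org/book.txt", encoding = "UTF-8")

# Made-up sample lines, for illustration only
book <- c("Две жизни",
          "Глава 1", "Первая строка.", "Вторая строка.",
          "Глава 2", "Третья строка.")

sections <- split_by_sections(book)
names(sections)
sapply(sections, length)
```

Each element of `sections` is a character vector that can then be analysed on its own, for example with `lapply(sections, some_analysis)`, where `some_analysis` stands for any per-section function.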
Includes:
- various sentiment/emotion analysis techniques.
- compilable code from the udpipe and quanteda vignettes, all redone with Russian texts.
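To give a feel for the simplest kind of sentiment technique, here is a hedged base R sketch of lexicon-based scoring on Russian text. It is not necessarily one of the methods used in this repository: the four-word lexicon is invented for illustration, `score_sentiment` is a hypothetical helper, and a real analysis would rely on a published lexicon and proper lemmatization (for example via udpipe). It assumes a UTF-8 locale.

```r
# Toy lexicon, invented for illustration only: word -> polarity
lexicon <- c("радость" = 1, "любовь" = 1, "страх" = -1, "горе" = -1)

# Hypothetical helper: sum the polarities of all lexicon words in `text`
score_sentiment <- function(text, lexicon) {
  # Lower-case, then split on any run of non-letter characters
  tokens <- strsplit(tolower(text), "[^\\p{L}]+", perl = TRUE)[[1]]
  tokens <- tokens[nzchar(tokens)]
  # Words missing from the lexicon index to NA and are ignored
  sum(lexicon[tokens], na.rm = TRUE)
}

score_positive <- score_sentiment("Радость и любовь сильнее, чем страх.", lexicon)
score_negative <- score_sentiment("Горе и страх", lexicon)
```

Because the lookup is by exact word form, inflected forms such as "радости" are not matched; this is the main reason to lemmatize Russian text before scoring.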
Based on:
See also https://github.com/gorodnichy/LA-R-Keras for using neural-network-based (TensorFlow) techniques for text classification.
Raison d'être:
One day I will be able to write R code that automatically determines that two books written in different languages (e.g. "Book of Joy" by the Dalai Lama and Desmond Tutu, written in English in 2017, and "Two Lives" by Concordia Antarova, written in Russian fifty years before that) or written for different audiences (e.g. "The Chronicles of Narnia" by C. S. Lewis and "The Tibetan Book of Living and Dying" by Sogyal Rinpoche) are actually about the same thing...