LA-R-text

Learn and Apply Text Analysis in R, including on texts from the Web

(Currently, mainly for myself, so not yet well organized or documented)

Collection of R libraries and resources for text analysis (mining) and visualization,
including processing of multilingual texts (in UTF-8, specifically Russian and French).

Easy-to-run code is provided, including code for getting texts from the web and splitting them by sections for easier section-by-section analysis.
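As an illustration of the section-splitting idea, here is a minimal base-R sketch. It is not the code from this repository: the sample text, the `heading_pattern` regular expression, and the `split_by_section` helper are all hypothetical, and it assumes every section starts with a line such as "Chapter 1".

```r
# Hypothetical sample text; in practice the lines could come from readLines()
# applied to a local file or a URL.
text_lines <- c(
  "Chapter 1",
  "It was a bright cold day.",
  "The clocks were striking.",
  "",
  "Chapter 2",
  "Outside, the wind blew."
)

# Assumption: a section starts at any line matching this heading pattern.
heading_pattern <- "^Chapter [0-9]+$"

split_by_section <- function(lines, pattern) {
  lines <- lines[nzchar(trimws(lines))]      # drop blank lines
  is_heading <- grepl(pattern, lines)
  section_id <- cumsum(is_heading)           # 0 marks text before the first heading
  body <- !is_heading & section_id > 0       # keep only lines that belong to a section
  titles <- lines[is_heading]
  # One character vector of body lines per section, in document order.
  sections <- split(lines[body],
                    factor(section_id[body], levels = seq_along(titles)))
  names(sections) <- titles
  sections
}

sections <- split_by_section(text_lines, heading_pattern)

# Example of a per-section statistic: number of whitespace-separated words.
words_per_section <- vapply(
  sections,
  function(s) length(unlist(strsplit(s, "\\s+"))),
  integer(1)
)
print(words_per_section)
```

Each element of `sections` can then be passed to whatever per-section analysis is needed. Text that appears before the first heading is dropped in this sketch.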

Books:

https://www.tidytextmining.com

Text mining packages (used in tidytext book)

tm
quanteda
lexicon?
qdap
syuzhet
https://github.com/trinker/sentimentr (see its vignette, which compares sentimentr, syuzhet, meanr, and Stanford)

https://www.datacamp.com/courses/intro-to-text-mining-bag-of-words
https://www.datacamp.com/courses/string-manipulation-in-r-with-stringr
https://www.datacamp.com/courses/sentiment-analysis-in-r-the-tidy-way - by Julia Silge - Ch1 DONE

Includes:

Various sentiment/emotion analysis techniques.
Runnable code from the udpipe and quanteda vignettes, all redone with Russian texts.
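To make the simplest of these techniques concrete, the sketch below does lexicon-based sentiment scoring in base R: tokenize, look each token up in a word list, and count matches. The six-word `lexicon`, the `score_sentiment` helper, and the sample texts are hypothetical and exist only for illustration. A real analysis would use a published lexicon and one of the packages listed above.

```r
# Hypothetical toy lexicon; real analyses would use a published lexicon instead.
lexicon <- data.frame(
  word = c("good", "happy", "love", "bad", "sad", "hate"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

score_sentiment <- function(text, lexicon) {
  # Lowercase, then split on anything that is not a letter or an apostrophe.
  tokens <- unlist(strsplit(tolower(text), "[^[:alpha:]']+"))
  tokens <- tokens[nzchar(tokens)]
  # Look up each token; tokens missing from the lexicon become NA.
  matched <- lexicon$sentiment[match(tokens, lexicon$word)]
  positive <- sum(matched == "positive", na.rm = TRUE)
  negative <- sum(matched == "negative", na.rm = TRUE)
  c(positive = positive, negative = negative, net = positive - negative)
}

# Hypothetical texts, one per section.
texts <- c(
  section_1 = "I love this happy little book.",
  section_2 = "A sad, bad ending. I hate it, but the prose is good."
)

# One row per text, with positive, negative, and net counts as columns.
scores <- t(sapply(texts, score_sentiment, lexicon = lexicon))
print(scores)
```

Plain word counting like this ignores negation ("not good") and intensifiers, which is one reason to compare it against other approaches.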

Based on:

See also https://github.com/gorodnichy/LA-R-Keras for using neural network (TensorFlow) based techniques for text classification.

Plagiarism detection:

Data-sets:

https://cran.r-project.org/web/packages/Rpoet - Wrapper for the 'PoetryDB' API http://poetrydb.org
https://cran.r-project.org/web/packages/gutenbergr/vignettes/intro.html - Wrapper for http://www.gutenberg.org/
https://cran.rstudio.com/web/packages/rplos/index.html

You may also find these resources useful:

The CRAN Task View on Natural Language Processing (https://cran.r-project.org/web/views/NaturalLanguageProcessing.html) suggests many R packages related to text mining, especially around the tm package.
You could match the wikipedia column in gutenberg_author to Wikipedia content with the WikipediR package - https://cran.r-project.org/web/packages/WikipediR/index.html or to pageview statistics with the wikipediatrend package - https://cran.r-project.org/web/packages/wikipediatrend/index.html
If you’re considering an analysis based on author name, you may find the humaniformat (for extraction of first names) and gender (prediction of gender from first names) packages useful. (Note that humaniformat has a format_reverse function for reversing “Last, First” names).
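The name handling mentioned above can be sketched in base R. This is only a stand-in for the idea and not the humaniformat implementation: `reverse_name` and `first_token` are hypothetical helpers, the author strings are made-up examples, and the sketch assumes at most one comma separating the last name from the rest.

```r
# Hypothetical helper: turn "Last, First" into "First Last".
# Names without a comma are returned unchanged.
reverse_name <- function(x) {
  has_comma <- grepl(",", x, fixed = TRUE)
  x[has_comma] <- sub("^\\s*([^,]+?)\\s*,\\s*(.+?)\\s*$", "\\2 \\1",
                      x[has_comma], perl = TRUE)
  x
}

# Hypothetical helper: take everything before the first whitespace
# as the first name.
first_token <- function(x) sub("\\s.*$", "", trimws(x))

# Made-up examples in the two formats an author column might contain.
authors <- c("Austen, Jane", "Dickens, Charles", "Mark Twain")

reversed <- reverse_name(authors)
firsts <- first_token(reversed)

print(data.frame(author = authors, reversed = reversed, first_name = firsts))
```

The extracted first names are the kind of input a gender-prediction step would take. Salutations, initials, and multi-part names need the more careful parsing that a dedicated package provides.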

Processing Texts from Social media

Facebook: Rfacebook provides an interface to the Facebook API. (K)
Google+: plusser has been designed to facilitate the retrieval of Google+ profiles, pages and posts. It also provides search facilities. Currently a Google+ API key is required for accessing Google+ data. tuber provides bindings for the YouTube API. Only on GitHub for now. (K)
RedditExtractoR can retrieve data from the Reddit API.
Rlinkedin: is an R client for the LinkedIn API.
Tumblr: tumblR (GitHub) is an R client for the Tumblr API (https://www.tumblr.com/docs/en/api/v2). Tumblr is a microblogging platform and social networking website, https://www.tumblr.com. (K)
Twitter: RTwitterAPI (not on CRAN) and twitteR provide an interface to the Twitter web API. streamR: This package provides a series of functions that allow R users to access Twitter's filter, sample, and user streams, and to parse the output into data frames. OAuth authentication is supported. (K) Additionally, RKlout is an interface to Klout API v2. It fetches Klout Score for a Twitter Username/handle in real time. Klout is a silly ranking of Twitter influence.
SocialMediaLab provides a convenient wrapper around many other social media clients and enables the construction of network structures from those data.
SocialMediaMineR is an analytic tool that returns information about the popularity of a URL on social media sites.

https://cran.r-project.org/web/views/:

Topic modeling

Hands-on: a five-day text mining course for humanists and social scientists in R
Reading Tea Leaves: How Humans Interpret Topic Models
Excellent one: https://tm4ss.github.io/docs/Tutorial_1_Web_scraping.html - https://github.com/tm4ss / https://github.com/tm4ss/tm4ss.github.io
- Tutorial 1: Web crawling and scraping
- Tutorial 2: Processing of textual data
- Tutorial 3: Frequency analysis
- Tutorial 4: Key term extraction
- Tutorial 5: Co-occurrence analysis
- Tutorial 6: Topic Models
- Tutorial 7: Classification
- Tutorial 8: Part-of-Speech tagging / Named Entity Recognition
Another one: https://slcladal.github.io/topicmodels.html#ref-silge2017text - https://github.com/SLCLADAL / https://github.com/SLCLADAL/SLCLADAL.github.io
- Text Analysis and Distant Reading
- Concordancing (keywords-in-context)
- Network Analysis
- Co-occurrence and Collocation Analysis
- Topic Modeling
- Sentiment Analysis
- Tagging and Parsing
