gorodnichy/LA-R-text

LA-R-text

Learn and Apply Text Analysis in R, including on texts from the web

(Currently, mainly for myself, so not yet well organized or documented)

Collection of R libraries and resources for text analysis (mining) and visualization,
including processing of multilingual texts (in UTF-8, specifically Russian and French).

Easy-to-run code is provided, including code for getting texts from the web and splitting them by sections for easier section-by-section analysis.
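As an illustration of the section-splitting idea, here is a minimal base-R sketch. It is not the code from this repository: the function name `split_by_sections`, the `heading_pattern` argument, and the sample text are all hypothetical, and it assumes that sections start at lines that look like "Chapter 1", "Chapter 2", and so on.

```r
# Hypothetical helper (not from this repository): split a character vector
# of lines into sections, starting a new section at every heading line.
split_by_sections <- function(lines, heading_pattern = "^Chapter [0-9]+") {
  is_heading <- grepl(heading_pattern, lines)
  # cumsum() gives every line the index of the most recent heading;
  # lines before the first heading get index 0.
  section_id <- cumsum(is_heading)
  sections <- split(lines, section_id)
  # Collapse each section back into a single string.
  vapply(sections, paste, character(1), collapse = "\n")
}

# Made-up sample text. For a real page, the lines could instead come from
# readLines(url("..."), encoding = "UTF-8"), with the address left to the reader.
lines <- c("Preface text",
           "Chapter 1", "First line.", "Second line.",
           "Chapter 2", "Third line.")

sections <- split_by_sections(lines)
print(names(sections))
print(sections[["1"]])
```

Each element of `sections` can then be analysed on its own, for example by passing it to a tokenizer. For Russian or French texts the heading pattern would need to match the headings actually used in the source.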

Books:

Text mining packages (used in tidytext book)

  • https://www.datacamp.com/courses/intro-to-text-mining-bag-of-words
  • https://www.datacamp.com/courses/string-manipulation-in-r-with-stringr
  • https://www.datacamp.com/courses/sentiment-analysis-in-r-the-tidy-way (by Julia Silge; Ch1 DONE)

Includes:

  • various sentiment/emotion analysis techniques.
  • runnable code from the udpipe and quanteda vignettes, all redone with Russian texts.

Based on:

See also https://github.com/gorodnichy/LA-R-Keras for using neural-network-based (TensorFlow) techniques for text classification.

Plagiarism detection:

Data-sets:


You may also find these resources useful:


Processing Texts from Social media

  • Facebook: Rfacebook provides an interface to the Facebook API. (K)
  • Google+: plusser has been designed to facilitate the retrieval of Google+ profiles, pages and posts. It also provides search facilities. Currently a Google+ API key is required for accessing Google+ data. tuber provides bindings for the YouTube API. Only on GitHub for now. (K)
  • RedditExtractoR can retrieve data from the Reddit API.
  • LinkedIn: Rlinkedin is an R client for the LinkedIn API.
  • Tumblr: tumblR (GitHub) is an R client for the Tumblr API (https://www.tumblr.com/docs/en/api/v2). Tumblr is a microblogging platform and social networking website (https://www.tumblr.com). (K)
  • Twitter: RTwitterAPI (not on CRAN) and twitteR provide an interface to the Twitter web API. streamR provides a series of functions that allow R users to access Twitter's filter, sample, and user streams, and to parse the output into data frames. OAuth authentication is supported. (K) Additionally, RKlout is an interface to the Klout API v2. It fetches the Klout Score for a Twitter username/handle in real time. Klout is a silly ranking of Twitter influence.
  • SocialMediaLab provides a convenient wrapper around many other social media clients and enables the construction of network structures from those data.
  • SocialMediaMineR is an analytic tool that returns information about the popularity of a URL on social media sites.

https://cran.r-project.org/web/views/:

Topic modeling
