Metadata for R packages: This is our final project submission for the GR5702: Exploratory Data Analysis and Visualization course at Columbia University.
Finding a good R library is demanding, and ggplot2 might not be the answer for everything. So, we have tasked ourselves with using statistical methods to explore and visualize the entirety of CRAN packages, applying what we learnt from STATGR5702 at Columbia University.
Results (our findings) will influence our builds.
- Data: Find appropriate parameters to judge a package.
  - Clean the `available` package's data.
  - Retain features that describe a package's relevance well.
- Results: Research the data.
  - Relevance is measured via trending usage data and a package's usage history, both available via the `pkgsearch` package.
  - Provide a comprehensive understanding of factors influencing a package's popularity and characteristics within the ecosystem.
- Interactive Graphs:
  - Build a keystroke animation using `d3` that animates the dependency graph of the queried package.
  - Build a package suggester by tokenizing keywords associated with each package's metadata.
  - Build a package backlink tracer, which shows how many times another package found this package useful.
- Data Cleaning is in the Data section.
- Research Results are in the Results section.
- Playable Graphs are in the Interactive Graph section.
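
As a rough illustration of the package suggester idea (tokenizing the keywords in each package's metadata and matching them against a query), here is a minimal Python sketch. It is not the project's own R/d3 code. The sample descriptions, the stopword list, the overlap-count scoring, and the names `tokenize` and `suggest` are all assumptions made for illustration.

```python
import re

# Hypothetical sample metadata (package name -> short description).
# The real project would draw this from CRAN metadata instead.
PACKAGES = {
    "ggplot2": "Create elegant data visualisations using the grammar of graphics",
    "dplyr": "A grammar of data manipulation",
    "shiny": "Web application framework for R",
}

# Assumed stopword list; a real one would be larger.
STOPWORDS = {"a", "an", "the", "of", "for", "using", "and", "r"}


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, and drop stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def suggest(query, packages, top_n=3):
    """Rank packages by how many distinct query tokens occur in their description."""
    query_tokens = set(tokenize(query))
    scores = {}
    for name, description in packages.items():
        overlap = query_tokens & set(tokenize(description))
        if overlap:
            scores[name] = len(overlap)
    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


if __name__ == "__main__":
    print(suggest("grammar of graphics", PACKAGES))
```

With this sample data, the query "grammar of graphics" ranks `ggplot2` ahead of `dplyr`, because two query tokens match the former and one matches the latter. A fuller version might weight rare tokens more heavily than common ones.
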
- Bhargav Kantheti (bk2899)
- Ryuichiro Sonoda (rs4493)
We are using data from two R packages:
- available CRAN: This package lets us "Check if the Title of a Package is Available, Appropriate and Interesting".
- pkgsearch CRAN: This package helps us "Search CRAN metadata about packages by keyword, popularity, recent activity, package name and more. Uses the 'R-hub' search server, see https://r-pkg.org and the CRAN metadata database, that contains information about CRAN packages."
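
CRAN package metadata includes each package's declared dependencies, and a backlink tracer can be thought of as inverting that relation: for every package, collect the packages that depend on it. The Python sketch below shows this inversion under stated assumptions. It is not the project's implementation, and the dependency map `DEPENDS_ON`, the package names in it, and the function names are hypothetical.

```python
# Hypothetical dependency map: package -> packages it depends on.
# Real data would come from the dependency fields in CRAN metadata.
DEPENDS_ON = {
    "pkgA": ["pkgC"],
    "pkgB": ["pkgA", "pkgC"],
    "pkgC": [],
    "pkgD": ["pkgB"],
}


def build_backlinks(depends_on):
    """Invert the map: package -> sorted list of packages that depend on it."""
    backlinks = {}
    for package, dependencies in depends_on.items():
        # Make sure every package has an entry, even with zero backlinks.
        backlinks.setdefault(package, set())
        for dependency in dependencies:
            backlinks.setdefault(dependency, set()).add(package)
    return {name: sorted(users) for name, users in backlinks.items()}


def backlink_counts(depends_on):
    """Count how many packages list each package as a direct dependency."""
    return {name: len(users) for name, users in build_backlinks(depends_on).items()}


if __name__ == "__main__":
    for name, count in sorted(backlink_counts(DEPENDS_ON).items()):
        print(name, count)
```

This counts direct backlinks only. Following the inverted map transitively would give the full set of packages that rely on a package indirectly.
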