June 1, 2016
-
Collecting data from an API (All exercises with http://www.omdbapi.com/)
- What is an API? (https://zapier.com/learn/apis/)
-
High-level summary of HTTP: how it works, what it looks like, and how students use it daily even if they do not realize it.
-
HTTP and R with httr (https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html)
-
Install and load
httr
-
HTTP verbs
- GET, POST, PUT, DELETE
GET()
- saves a response object: whatever the server returns
- How to assemble a query
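As a rough sketch of how a query can be assembled: the parameters go in a named list and httr takes care of URL encoding. The names t (title) and r (response format) are OMDb query parameters; the apikey value is a placeholder, on the assumption that OMDb now requires a key with every request.

```r
library(httr)

base_url <- "http://www.omdbapi.com/"

# Query parameters as a named list; httr URL-encodes the values.
# "YOUR_KEY" is a placeholder, not a working key.
params <- list(t = "Frozen", r = "json", apikey = "YOUR_KEY")

# modify_url() builds the URL that GET() would request,
# without making any network call.
query_url <- modify_url(base_url, query = params)
print(query_url)

# The actual request (needs network access and a valid key):
# resp <- GET(base_url, query = params)
```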
-
HTTP structure
-
The data sent back from the server consists of three parts:
-
The status line
http_status(), status_code()
- deciphering status codes
-
Headers
headers()
-
The body
content()
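One possible way to look at the three parts of a response side by side is a small helper like the one below. The function name inspect_response() is hypothetical (it is not part of httr); it only calls the httr accessors status_code(), headers(), and content().

```r
library(httr)

# Hypothetical helper: collect the three parts of an httr response.
inspect_response <- function(resp) {
  list(
    status       = status_code(resp),                 # e.g. 200 or 404
    content_type = headers(resp)[["content-type"]],   # header lookup is case-insensitive
    body         = content(resp, as = "text", encoding = "UTF-8")
  )
}

# Usage (needs network access; the query values are placeholders):
# resp <- GET("http://www.omdbapi.com/", query = list(t = "Frozen", apikey = "YOUR_KEY"))
# str(inspect_response(resp))
```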
-
-
Data Formats
-
JSON
- Context: How to interpret the structure. When it is used. Why use it.
content(json, as = "parsed", type = "application/json")
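To show how JSON structure maps onto R objects without a network call, here is a minimal sketch using jsonlite. The JSON string is a made-up sample, not real OMDb output. JSON objects become named lists and arrays of strings become character vectors.

```r
library(jsonlite)

# Made-up sample record, for illustration only.
json_text <- '{"Title": "Example Movie", "Year": "2013", "Genres": ["Animation", "Comedy"]}'

movie <- fromJSON(json_text)

movie$Title      # a single string
movie$Genres     # a character vector of length 2
names(movie)     # the keys of the JSON object
```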
-
XML
- Context: How to interpret the structure. When it is used. Why use it.
content(xml, as = "parsed")
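A parsed XML body can be explored with the xml2 package. The sketch below parses a made-up XML string (not real OMDb output) so that it runs offline, then reads attributes with an XPath query.

```r
library(xml2)

# Made-up sample document, for illustration only.
xml_text <- '<root><movie title="Example Movie" year="2013"/></root>'

doc   <- read_xml(xml_text)
movie <- xml_find_first(doc, "//movie")   # XPath: first <movie> element

xml_name(movie)            # element name
xml_attr(movie, "title")   # value of the title attribute
xml_attr(movie, "year")    # attributes are always returned as strings
```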
-
-
Other Verbs
- A brief reminder and description/demo of
POST()
PUT()
DELETE()
HEAD()
PATCH()
-
Wrapping an API with an R package (All exercises build on code in previous part)
-
Packages that wrap web APIs
- Tour several inspiring examples from rOpenSci
-
The quickest way to make a package (https://www.rstudio.com/resources/webinars/rstudio-essentials-webinar-series-programming-part-3/)
-
R Packages - a reminder that an R package is just an agreed-upon file format and does not need to be hosted on CRAN. An example of an R package on GitHub.
-
To make a package (with a handout)
- In the RStudio IDE, open a new session
- File > New Project > New Directory > Package
- Edit DESCRIPTION
- Edit R files
- Build > More > Document
- Build > More > Build Source Package
-
To load a package (on same handout)
- Packages > Install > Install From > Package Archive File > Install
-
-
Tips for APIs (https://cran.r-project.org/web/packages/httr/vignettes/api-packages.html)
-
Strategy
-
Negotiating Content
- What is content negotiation?
accept_json()
accept_xml()
-
Handling errors
warn_for_status()
stop_for_status()
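Content negotiation and error handling can be combined in one wrapper function, which is roughly the shape an API package function tends to take. This is only a sketch: omdb_get() is a hypothetical name, and the apikey parameter assumes OMDb requires a key.

```r
library(httr)

# Hypothetical wrapper around one OMDb endpoint.
omdb_get <- function(title, api_key) {
  resp <- GET(
    "http://www.omdbapi.com/",
    query = list(t = title, apikey = api_key),
    accept_json()            # ask the server for JSON via the Accept header
  )
  stop_for_status(resp)      # turn HTTP errors (4xx, 5xx) into R errors
  content(resp, as = "parsed", type = "application/json")
}

# stop_for_status() also accepts a bare numeric status code,
# which makes its behaviour visible without a network call.
result <- tryCatch(
  stop_for_status(404L),
  error = function(e) "request failed"
)
print(result)
```

warn_for_status() can be swapped in where a failed request should not halt the calling code.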
-
Authentication
-
Background context for web authentication
-
Basic authentication
- How it works
authenticate("username", "password")
- Weaknesses
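The main weakness of basic authentication is easy to demonstrate: the credentials are only base64-encoded, not encrypted, so anyone who can read the request can decode them. The sketch below uses jsonlite's base64 helpers for the demonstration.

```r
library(jsonlite)

credentials <- "username:password"

# Basic auth sends "Authorization: Basic <base64 of user:password>".
encoded <- base64_enc(credentials)
header  <- paste("Basic", encoded)
print(header)

# Decoding needs no secret, so the scheme is only safe over HTTPS.
decoded <- rawToChar(base64_dec(encoded))
print(decoded)
```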
-
OAuth1
- How it works
- No longer common
oauth1.0_token()
-
OAuth2
- How it works
- Strengths
oauth2.0_token()
-
Best practices for API keys
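One common practice is to keep the key out of source code and read it from an environment variable. The sketch below uses base R only; the variable name OMDB_API_KEY and the function get_api_key() are hypothetical.

```r
# Hypothetical helper: read an API key from an environment variable
# and fail with a clear message if it is missing.
get_api_key <- function(var = "OMDB_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Set the ", var, " environment variable (for example in ~/.Renviron).",
         call. = FALSE)
  }
  key
}

# Usage, after adding a line such as OMDB_API_KEY=... to ~/.Renviron:
# key <- get_api_key()
```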
-
-
-
-
Web Scraping with R (All exercises with IMDB)
-
What if data is on a web page but there is no API? You can attempt to extract the data from the structure of the web page.
- What is a web page?
- HTML basics
- CSS basics
- Strategy: identify information by its CSS selector
-
rvest (https://cran.r-project.org/web/packages/rvest/vignettes/selectorgadget.html)
-
SelectorGadget
- Installation instructions
- Demo
-
Reading a page's DOM
read_html()
-
Extracting information
html_nodes()
html_text()
html_name()
html_attr()
html_children()
html_table()
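The extraction functions can be tried offline on a small inline page before pointing them at a real site. The HTML below is made up for illustration (it is not IMDB markup), and the CSS selectors match only this sample.

```r
library(rvest)

# Made-up page, for illustration only.
page <- read_html('<html><body>
  <h1 class="title">Example Movie</h1>
  <a class="cast" href="/name/a">Actor A</a>
  <a class="cast" href="/name/b">Actor B</a>
  <table>
    <tr><th>Year</th><th>Rating</th></tr>
    <tr><td>2013</td><td>7.5</td></tr>
  </table>
</body></html>')

# Select nodes by CSS selector, then pull out text or attributes.
title <- page %>% html_nodes("h1.title") %>% html_text()
cast  <- page %>% html_nodes("a.cast") %>% html_text()
links <- page %>% html_nodes("a.cast") %>% html_attr("href")

# html_table() on a node set returns a list of data frames, one per table.
ratings <- page %>% html_nodes("table") %>% html_table()
ratings <- ratings[[1]]
```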
-
-
Perhaps more practice here than in the other parts
-