---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(comotext)
```
# comotext: R Tools to Extract COVID-19-Related Resolutions and Policies in the Philippines <img src="man/figures/comotext.png" width = "200px" align="right" />
<!-- badges: start -->
[](https://www.tidyverse.org/lifecycle/#experimental)
[](https://travis-ci.org/como-ph/comotext)
[](https://ci.appveyor.com/project/como-ph/comotext)
[](https://github.com/como-ph/comotext/actions)
[](https://codecov.io/gh/como-ph/comotext?branch=master)
[](https://zenodo.org/badge/latestdoi/255823130)
<!-- badges: end -->
To assess the possible impact of various COVID-19 prediction models on the Philippine government's response, text from resolutions, press releases and legislation has been collected using data mining approaches implemented in R. This package includes the functions used in this data mining process and the datasets of text that have been collected and processed for use in text analysis.
## Installation
`comotext` is still in active development. You can install the development version of `comotext` from [GitHub](https://github.com/como-ph/comotext) with:
```{r install, echo = TRUE, eval = FALSE}
## Install remotes first if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github("como-ph/comotext")
```
## Usage
### Datasets
`comotext` currently has 28 datasets of COVID-19-related resolutions and policies in the Philippines. These datasets contain 28 resolutions made by the Inter-Agency Task Force for the Management of Emerging Infectious Diseases (IATF).
A description of the available datasets can be found [here](https://como-ph.github.io/comotext/reference/index.html#section-datasets).
A table of the ```r nrow(iatfLinks)``` IATF resolutions and the URLs to download them can be generated using the function `get_iatf_links()` as follows:
```{r usage, echo = TRUE, eval = TRUE}
get_iatf_links()
```
`comotext` also holds 1 dataset of all [Department of Health](http://www.doh.gov.ph) press releases to date. A description of the `dohRelease` dataset can be found [here](https://como-ph.github.io/comotext/reference/dohRelease.html). This dataset was generated using the `get_doh_release()` function included in `comotext` (see description below). A related dataset, `dohLinks`, holds the relative URL links for each of the press releases on the [Department of Health](http://www.doh.gov.ph) website to date. It was produced using the `get_doh_links()` function included in `comotext` (see description below). A description of the `dohLinks` dataset can be found [here](https://como-ph.github.io/comotext/reference/dohLinks.html).
### Extracting text data from press releases
Press releases issued by the [Department of Health](https://www.doh.gov.ph) are publicly available via their [website](https://www.doh.gov.ph/press-releases). On the press releases page, the links to the press release texts sit in a paginated panel. Each page of the panel contains links to 28 press releases, ordered in reverse chronological order.
The function `get_doh_links()` extracts the relative URL links to each of the press releases on the specified page or pages of the press releases panel. To get the links for the press releases on page 1 of the panel, we use:
```{r usage1, echo = TRUE, eval = TRUE}
get_doh_links(pages = 1)
```
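If the links returned are relative, they need a base URL before they can be opened directly in a browser or passed to another tool. The following is a minimal sketch of that step in base R. The data frame, its `link` column and the example paths are hypothetical stand-ins, not actual output from `get_doh_links()`; check the real column names before adapting it.

```r
## Hypothetical stand-in for the output of get_doh_links(); the column
## name and the paths are illustrative assumptions, not real data
links <- data.frame(
  link = c("/press-release/example-one", "/press-release/example-two"),
  stringsAsFactors = FALSE
)

## Base URL of the DoH website
base_url <- "https://www.doh.gov.ph"

## Prepend the base URL only where the link is not already absolute
is_absolute <- grepl("^https?://", links$link)
links$full_url <- ifelse(is_absolute, links$link, paste0(base_url, links$link))

links$full_url
```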
The function `get_doh_release()` creates a dataset of the text of a press release, given the URL of that press release and its date of release. This information is provided by `get_doh_links()`. To get the text data of the first press release on page 1 of the press release panel, we use:
```{r usage2, echo = TRUE, eval = TRUE}
## Extract URLs from DoH press releases page 1
prURL <- get_doh_links(pages = 1)
## Extract text from first press release
get_doh_release(df = prURL[1, ])
```
To get all the [DoH](https://www.doh.gov.ph) press releases available from their [website](https://www.doh.gov.ph/press-releases), use:
```{r usage3, echo = TRUE, eval = FALSE}
## Extract URLs
pr <- get_doh_links(pages = 1:25)
## Extract all press releases text
pressRelease <- NULL
for (i in seq_len(nrow(pr))) {
  currentPR <- get_doh_release(df = pr[i, ])
  pressRelease <- rbind(pressRelease, currentPR)
}
```
```{r usage4, echo = FALSE, eval = TRUE}
dohRelease
```
This produces the same dataset as `dohRelease` included in `comotext`.
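The loop above grows `pressRelease` with one `rbind()` call per press release. An equivalent pattern collects the results in a list and binds them once at the end. The sketch below illustrates this with a hypothetical stand-in, `fetch_release()`, in place of `get_doh_release()` so that it runs without network access; the toy columns are assumptions, not the structure of `dohRelease`.

```r
## Hypothetical table of links, standing in for get_doh_links() output
pr <- data.frame(
  id = 1:3,
  link = c("/a", "/b", "/c"),
  stringsAsFactors = FALSE
)

## Hypothetical stand-in for get_doh_release(): takes a one-row data frame
## and returns a data frame with two lines of text for that release
fetch_release <- function(df) {
  data.frame(
    id = df$id,
    linenumber = 1:2,
    text = paste("Text from", df$link, "line", 1:2),
    stringsAsFactors = FALSE
  )
}

## Apply the function to each row, then bind all results in one step
releases <- lapply(seq_len(nrow(pr)), function(i) fetch_release(df = pr[i, ]))
pressRelease <- do.call(rbind, releases)

nrow(pressRelease)
```

Swapping `fetch_release()` for `get_doh_release()` should give the same result as the loop, with fewer intermediate copies of the growing data frame.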
### Concatenating text datasets
The datasets described above can be processed and analysed on their own or as a combined corpus of text data. `comotext` provides convenience functions that concatenate all or specific text datasets available from the `comotext` package.
#### Concatenating datasets based on a specific search term
The `combine_docs` function lets the user specify search terms for identifying datasets provided by the `comotext` package. The `docs` argument takes a vector of search terms that are matched against the names of the datasets. Datasets whose names contain these search terms are concatenated and returned.
```{r usage5, echo = TRUE, eval = TRUE}
combine_docs(docs = "resolution")
```
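The name-matching idea described above can be sketched in base R. This is not the `combine_docs` implementation: `combine_by_term()` and the toy datasets are hypothetical, and the choice to match any of the supplied terms is an assumption made for illustration.

```r
## Toy datasets held in their own environment (names are illustrative)
datasets <- new.env()
datasets$resolution01 <- data.frame(
  id = 1, text = "First resolution text", stringsAsFactors = FALSE
)
datasets$resolution02 <- data.frame(
  id = 2, text = "Second resolution text", stringsAsFactors = FALSE
)
datasets$pressNotes <- data.frame(
  id = 99, text = "A press note", stringsAsFactors = FALSE
)

## Hypothetical helper: find datasets whose names contain any search term
## (terms are treated as regular expressions) and bind them row-wise
combine_by_term <- function(terms, envir) {
  all_names <- sort(ls(envir))
  pattern <- paste(terms, collapse = "|")
  matched <- grep(pattern, all_names, value = TRUE)
  do.call(rbind, unname(mget(matched, envir = envir)))
}

combined <- combine_by_term(terms = "resolution", envir = datasets)
combined$id
```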
The `combine_iatf` function is a specialised wrapper of the `combine_docs` function that specifically returns datasets containing IATF resolutions. An additional argument `res` allows users to specify which IATF resolutions to return. To get IATF resolutions 10, 11, and 12, call `combine_iatf` as follows:
```{r usage6, echo = TRUE, eval = TRUE}
combine_iatf(docs = "resolution", res = 10:12)
```
To check if only resolutions 10 to 12 have been returned:
```{r usage7, echo = TRUE, eval = TRUE}
combine_iatf(docs = "resolution", res = 10:12)[ , c("type", "id")]
```
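The same check can be made programmatically by comparing the distinct `id` values against the requested resolutions. The sketch below runs on a toy data frame with `type` and `id` columns and assumes that `id` holds the resolution number; the values are made up for illustration and are not output from `combine_iatf`.

```r
## Toy stand-in for the output of combine_iatf(); values are illustrative
res <- data.frame(
  type = rep("resolution", 6),
  id = rep(10:12, each = 2),
  stringsAsFactors = FALSE
)

requested <- 10:12

## TRUE only if the set of ids returned equals the set requested
only_requested <- setequal(unique(res$id), requested)
only_requested
```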