shorten readme for CRAN, minor corrections
jamesdunham committed May 30, 2017
1 parent 7aba43f commit 6523686
Showing 2 changed files with 54 additions and 736 deletions.
326 changes: 30 additions & 296 deletions README.Rmd
@@ -1,12 +1,12 @@
---
output:
md_document:
variant: markdown_github
output: github_document
---
[![Build Status](https://travis-ci.org/jamesdunham/dgo.svg?branch=master)](https://travis-ci.org/jamesdunham/dgo)
[![Build status](https://ci.appveyor.com/api/projects/status/1ta36kmoqen98k87?svg=true)](https://ci.appveyor.com/project/jamesdunham/dgo)
[![codecov](https://codecov.io/gh/jamesdunham/dgo/branch/master/graph/badge.svg)](https://codecov.io/gh/jamesdunham/dgo)

# Introduction

dgo is an R package for the dynamic estimation of group-level opinion. The
package can be used to estimate subpopulation groups' average latent
conservatism (or other latent trait) from individuals' responses to dichotomous
@@ -44,317 +44,53 @@ knitr::opts_chunk$set(

# Installation

dgo requires a working installation of [RStan](http://mc-stan.org/interfaces/rstan.html).
If you don't already have RStan, follow its
"[Getting Started](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started)"
guide before continuing.

dgo can be installed from [GitHub](https://github.com/jamesdunham/dgo) using
[devtools](https://github.com/hadley/devtools/):
dgo can be installed from CRAN:

```{r, eval = FALSE}
if (!require(devtools, quietly = TRUE)) install.packages("devtools")
devtools::install_github("jamesdunham/dgo")
install.packages("dgo")
```

# Getting started
Or get the latest version from [GitHub](https://github.com/jamesdunham/dgo)
using [devtools](https://github.com/hadley/devtools/):

```{r}
library(dgo)
```{r, eval = FALSE}
if (!require(devtools, quietly = TRUE)) install.packages("devtools")
devtools::install_github("jamesdunham/dgo")
```

The minimal workflow from raw data to estimation is:

1. shape input data using the `shape` function; and
2. pass the result to the `dgirt` function to estimate a latent trait (e.g.,
conservatism) or `dgmrp` function to estimate opinion on a single survey
question.

dgo requires a working installation of [RStan](http://mc-stan.org/interfaces/rstan.html).
If you don't already have RStan, follow its
"[Getting Started](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started)" guide.

### Set RStan options
# Usage

These are RStan's recommended options on a local, multicore machine with excess
RAM:
Load the package and set RStan's recommended options for a local, multicore
machine with excess RAM:

```{r}
library(dgo)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
```

## Abortion Attitudes

### Prepare input data with `shape`

DGIRT models are *dynamic*, so we need to specify which variable in the data
represents time. They are also *group-level*, with groups defined by one
variable for respondents' local geographic area and one or more variables for
respondent characteristics.

The `time_filter` and `geo_filter` arguments optionally subset the data.
Finally, `shape` requires the names of the survey identifier and survey weight
variables in the data.

```{r}
dgirt_in_abortion <- shape(opinion,
item_names = "abortion",
time_name = "year",
geo_name = "state",
group_names = "race3",
geo_filter = c("CA", "GA", "LA", "MA"),
id_vars = "source")
```

The reshaped and subsetted data can be summarized in a few ways before model
fitting.

```{r}
summary(dgirt_in_abortion)
```

Response counts by state:

```{r}
get_n(dgirt_in_abortion, by = c("state"))
```

Response counts by item-year:

```{r}
get_item_n(dgirt_in_abortion, by = "year")
```

### Fit a model with `dgirt` or `dgmrp`

`dgirt` and `dgmrp` fit estimation models to data from `shape`. `dgirt` can be
used to estimate a latent variable based on responses to multiple survey
questions (e.g., latent policy conservatism), while `dgmrp` can be used to
estimate public opinion on an individual survey question (e.g., abortion) using
a dynamic multi-level regression and post-stratification (MRP) model. In this
case, we use `dgmrp` to model abortion attitudes.

Under the hood, these functions use RStan for MCMC sampling, and arguments can
be passed to RStan's `stan` via the `...` argument of `dgirt` and `dgmrp`. This
will almost always be desirable, at a minimum to specify the number of sampler
iterations, chains, and cores.

```{r, warning = FALSE, message = FALSE, results = 'hide'}
dgmrp_out_abortion <- dgmrp(dgirt_in_abortion, iter = 1500, chains = 4, cores =
4, seed = 42)
```

The model results are held in a `dgirtfit` object. Methods from RStan like
`extract` are available if needed because `dgirtfit` is a subclass of `stanfit`.
But dgo provides its own methods for typical post-estimation tasks.

### Work with `dgirt` or `dgmrp` results

For a high-level summary of the result, use `summary`.

```{r}
summary(dgmrp_out_abortion)
```

To summarize posterior samples, use `summarize`. The default output gives
summary statistics for the `theta_bar` parameters, which represent the mean of
the latent outcome for the groups defined by time, local geographic area, and
the demographic characteristics specified in the earlier call to `shape`.

```{r}
head(summarize(dgmrp_out_abortion))
```

Alternatively, `summarize` can apply arbitrary functions to posterior samples
for whatever parameter is given by its `pars` argument. Enclose function names
with quotes. For convenience, `"q_025"` and `"q_975"` give the 2.5th and 97.5th
posterior quantiles.

```{r}
summarize(dgmrp_out_abortion, pars = "xi", funs = "var")
```

To access posterior samples in tabular form use `as.data.frame`. By default,
this method returns post-warmup samples for the `theta_bar` parameters, but like
other methods takes a `pars` argument.

```{r}
head(as.data.frame(dgmrp_out_abortion))
```

To poststratify the results use `poststratify`. The following example uses the
group population proportions bundled as `annual_state_race_targets` to reweight
and aggregate estimates to strata defined by state-years.

Read `help("poststratify")` for more details.

```{r}
poststratify(dgmrp_out_abortion, annual_state_race_targets, strata_names =
c("state", "year"), aggregated_names = "race3")
```

To plot the results use `dgirt_plot`. This method plots summaries of posterior
samples by time period. By default, it shows a 95% credible interval around
posterior medians for the `theta_bar` parameters, for each local geographic
area. For this (unconverged) toy example we omit the CIs.

```{r dgmrp_plot, fig.show = 'hide'}
dgirt_plot(dgmrp_out_abortion, y_min = NULL, y_max = NULL)
```

![](https://raw.githubusercontent.com/jamesdunham/dgo/master/README/dgmrp_plot-1.png)

Output from `dgirt_plot` can be customized to some extent using objects from the
ggplot2 package.

```{r dgmrp_plot_plus, fig.show = 'hide'}
dgirt_plot(dgmrp_out_abortion, y_min = NULL, y_max = NULL) + theme_classic()
```

![](https://raw.githubusercontent.com/jamesdunham/dgo/master/README/dgmrp_plot_plus-1.png)

`dgirt_plot` can also plot the `data.frame` output from `poststratify`. This
requires arguments that identify the relevant variables in the `data.frame`.
Below, `poststratify` aggregates over the demographic grouping variable `race3`,
resulting in a `data.frame` of estimates by state-year. So, in the subsequent
call to `dgirt_plot`, we pass the names of the state and year variables. The
`group_names` argument is `NULL` because there are no grouping variables left
after aggregating over `race3`.

```{r dgmrp_plot_ps, fig.show = 'hide'}
ps <- poststratify(dgmrp_out_abortion, annual_state_race_targets, strata_names =
c("state", "year"), aggregated_names = "race3")
head(ps)
dgirt_plot(ps, group_names = NULL, time_name = "year", geo_name = "state")
```

![](https://raw.githubusercontent.com/jamesdunham/dgo/master/README/dgmrp_plot_ps-1.png)

## Policy Liberalism

### Prepare input data with `shape`

```{r}
dgirt_in_liberalism <- shape(opinion, item_names = c("abortion",
"affirmative_action","stemcell_research" , "gaymarriage_amendment",
"partialbirth_abortion") , time_name = "year", geo_name = "state",
group_names = "race3", geo_filter = c("CA", "GA", "LA", "MA"))
```

The reshaped and subsetted data can be summarized in a few ways before model
fitting.

```{r}
summary(dgirt_in_liberalism)
```

Response counts by item-year:

```{r}
get_item_n(dgirt_in_liberalism, by = "year")
```

### Fit a model with `dgirt`

`dgirt` and `dgmrp` fit estimation models to data from `shape`. `dgirt` can be
used to estimate a latent variable based on responses to multiple survey
questions (e.g., latent policy conservatism), while `dgmrp` can be used to
estimate public opinion on an individual survey question using a dynamic
multi-level regression and post-stratification (MRP) model.

Under the hood, these functions use RStan for MCMC sampling, and arguments can
be passed to RStan's `stan` via the `...` argument of `dgirt` and `dgmrp`. This
will almost always be desirable, at a minimum to specify the number of sampler
iterations, chains, and cores.

```{r, warning = FALSE, message = FALSE, results = 'hide'}
dgirt_out_liberalism <- dgirt(dgirt_in_liberalism, iter = 3000, chains = 4,
cores = 4, seed = 42)
```

The model results are held in a `dgirtfit` object. Methods from RStan like
`extract` are available if needed because `dgirtfit` is a subclass of `stanfit`.
But dgo provides its own methods for typical post-estimation tasks.

### Work with `dgirt` results

For a high-level summary of the result, use `summary`.

```{r}
summary(dgirt_out_liberalism)
```

To summarize posterior samples, use `summarize`. The default output gives
summary statistics for the `theta_bar` parameters, which represent the mean of
the latent outcome for the groups defined by time, local geographic area, and
the demographic characteristics specified in the earlier call to `shape`.

```{r}
head(summarize(dgirt_out_liberalism))
```

Alternatively, `summarize` can apply arbitrary functions to posterior samples
for whatever parameter is given by its `pars` argument. Enclose function names
with quotes. For convenience, `"q_025"` and `"q_975"` give the 2.5th and 97.5th
posterior quantiles.

```{r}
summarize(dgirt_out_liberalism, pars = "xi", funs = "var")
```

To access posterior samples in tabular form use `as.data.frame`. By default,
this method returns post-warmup samples for the `theta_bar` parameters, but like
other methods takes a `pars` argument.

```{r}
head(as.data.frame(dgirt_out_liberalism))
```

To poststratify the results use `poststratify`. The following example uses the
group population proportions bundled as `annual_state_race_targets` to reweight and aggregate
estimates to strata defined by state-years. Read `help("poststratify")` for more
details.

```{r}
poststratify(dgirt_out_liberalism, annual_state_race_targets, strata_names = c("state",
"year"), aggregated_names = "race3")
```

To plot the results use `dgirt_plot`. This method plots summaries of posterior
samples by time period. By default, it shows a 95% credible interval around
posterior medians for the `theta_bar` parameters, for each local geographic
area. For this (unconverged) toy example we omit the CIs.

```{r dgirt_plot, fig.show = 'hide'}
dgirt_plot(dgirt_out_liberalism, y_min = NULL, y_max = NULL)
```

![](https://raw.githubusercontent.com/jamesdunham/dgo/master/README/dgirt_plot-1.png)

`dgirt_plot` can also plot the `data.frame` output from `poststratify`. This
requires arguments that identify the relevant variables in the `data.frame`.
Below, `poststratify` aggregates over the demographic grouping variable `race3`,
resulting in a `data.frame` of estimates by state-year. So, in the subsequent
call to `dgirt_plot`, we pass the names of the state and year variables. The
`group_names` argument is `NULL` because there are no grouping variables left
after aggregating over `race3`.
The minimal workflow from raw data to estimation is:

```{r dgirt_plot_ps, fig.show = 'hide'}
ps <- poststratify(dgirt_out_liberalism, annual_state_race_targets, strata_names = c("state",
"year"), aggregated_names = "race3")
head(ps)
dgirt_plot(ps, group_names = NULL, time_name = "year", geo_name = "state")
```
1. shape input data using the `shape()` function; and
2. pass the result to the `dgirt()` function to estimate a latent trait (e.g.,
conservatism) or `dgmrp()` function to estimate opinion on a single survey
question.
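
A minimal sketch of this two-step workflow, reusing the abortion example from the longer version of this README. The bundled `opinion` data, the variable names, and the sampler settings are all taken from that example; it is a toy run that should not be expected to converge, and it assumes dgo and RStan are already installed.

```r
library(dgo)

# Step 1: shape the bundled `opinion` data, declaring the item, time,
# geographic, and demographic grouping variables, and keeping four states.
dgirt_in_abortion <- shape(opinion,
  item_names = "abortion",
  time_name = "year",
  geo_name = "state",
  group_names = "race3",
  geo_filter = c("CA", "GA", "LA", "MA"),
  id_vars = "source")

# Step 2: estimate opinion on the single item with dgmrp(); the extra
# arguments are passed through to RStan's sampler.
dgmrp_out_abortion <- dgmrp(dgirt_in_abortion, iter = 1500, chains = 4,
  cores = 4, seed = 42)

# Summarize posterior samples for the default `theta_bar` parameters.
head(summarize(dgmrp_out_abortion))
```

To estimate a latent trait from several items instead, the same shaped-data pattern applies with multiple `item_names` and a call to `dgirt()` in place of `dgmrp()`.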

![](https://raw.githubusercontent.com/jamesdunham/dgo/master/README/dgirt_plot_ps-1.png)
See the [package site](https://jdunham.io/dgo) for worked examples.

## Troubleshooting
# Troubleshooting

Please [report issues](https://github.com/jamesdunham/dgo/issues) that you
encounter.

* OS X only: RStan creates temporary files during estimation in a location
given by `tempdir`, typically an arbitrary location in `/var/folders`. If a
model runs for days, these files can be cleaned up while still needed, which
induces an error. A good solution is to set a safer path for temporary
given by `tempdir()`, typically an arbitrary location in `/var/folders`. If
a model runs for days, these files can be cleaned up while still needed,
which induces an error. A good solution is to set a safer path for temporary
files, using an environment variable checked at session startup. For help
setting environment variables, see the Stack Overflow question
[here](https://stackoverflow.com/questions/17107206/change-temporary-directory).
@@ -363,7 +99,7 @@ encounter.

* Models fitted before October 2016 (specifically <
[#8e6a2cf](https://github.com/jamesdunham/dgo/commit/8e6a2cfbe00b2cd4a908b3067241e06124d143cd))
using dgirtfit are not fully compatible with dgo. Their contents can be
using dgirt are not fully compatible with dgo. Their contents can be
extracted without using dgo, however, with the `$` indexing operator. For
example: `as.data.frame(dgirtfit_object$stan.cmb)`.

@@ -372,13 +108,11 @@ encounter.
compilation. These are safe to ignore, or can be suppressed by following the
linked instructions.
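
For the OS X temporary-file issue listed first, a quick check of where the current session writes temporary files may help before starting a long run. This is only a sketch: it inspects the session and changes nothing. R reads the `TMPDIR`, `TMP`, and `TEMP` environment variables at startup, so one common approach (an assumption here, not something this README prescribes) is to set `TMPDIR` to a persistent directory in `~/.Renviron` and restart R.

```r
# Directory this R session uses for temporary files; it is fixed at startup.
tempdir()

# Environment variables R checks in turn at startup when choosing that
# directory; unset variables are returned as empty strings.
Sys.getenv(c("TMPDIR", "TMP", "TEMP"))
```

If `tempdir()` still points into `/var/folders` after a restart, the environment variable was not picked up at session startup.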

## Contributing and citing
# Contributing and citing

dgo is under development and we welcome
[suggestions](https://github.com/jamesdunham/dgo/issues).

The package citation is

> Dunham, James, Devin Caughey, and Christopher Warshaw. 2017. dgo: Dynamic
> Estimation of Group-level Opinion. R package.
> https://jamesdunham.github.io/dgo/.
> Estimation of Group-level Opinion. R package. https://jdunham.io/dgo/.