Merge pull request #14 from PIP-Technical-Team/DEV_website_mean
Dev website mean
giorgiacek authored Feb 8, 2024
2 parents a09525b + 442ba64 commit 1737d5e
Showing 2 changed files with 130 additions and 30 deletions.
1 change: 1 addition & 0 deletions R/pipgd_pov.R
@@ -191,6 +191,7 @@ pipgd_pov_gap_nv <- function(params = NULL,
weight = weight,
complete = TRUE,
popshare = popshare,
mean = mean,
povline = povline,
lorenz = lorenz)
} else {
159 changes: 129 additions & 30 deletions vignettes/gd_functions.Rmd
@@ -20,15 +20,20 @@ library(pipster)

## Overview

This vignette shows an overview of the `pipster` package functions for grouped data. Grouped data are consumption expenditure or income organized in intervals or bins, such as deciles or percentiles. In order to estimate poverty and inequality measures from grouped data, one has to derive a continuous Lorenz curve and use it together with mean welfare to build a full distribution. `pipster` provides a series of functions to estimate poverty and inequality measures, based on the methodology of [Datt (1998)](http://ebrary.ifpri.org/utils/getfile/collection/p15738coll2/id/125673/filename/125704.pdf):

- `pipgd_pov_headcount()` (FGT0)

- `pipgd_pov_gap()` (FGT1)

- `pipgd_pov_severity()` (FGT2)

- `pipgd_gini()`

- `pipgd_mld()`

- `pipgd_watts()`

It also provides a series of functions to calculate distributional measures and to select and validate the best Lorenz curve for subsequent estimation:

- `pipgd_welfare_share_at()`
@@ -42,22 +47,26 @@ It also provides a series of functions to calculate distributional measures and
- `pipgd_select_lorenz()`

## Sample Grouped Data

In this vignette, we explore several typical scenarios in which the `pipster` package can be used. In each scenario, we use a sample dataset, `pip_gd`, which ships with the package and is taken from [Datt (1998)](http://ebrary.ifpri.org/utils/getfile/collection/p15738coll2/id/125673/filename/125704.pdf). The dataset shows the distribution of consumption expenditure in rural India in 1983. Its variables are the following:

- **W**: Weights, share of population, sum up to 100.
- **X**: Welfare vector with mean welfare by group.
- **P**: Cumulative share of population.
- **L**: Cumulative share of welfare.
- **R**: Share of welfare, sum up to 1.
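
The relationships among these columns can be sketched in base R. The numbers below are hypothetical (they are not the `pip_gd` values), and the object names are chosen for illustration only:

```r
# Hypothetical grouped data with five groups (NOT the pip_gd values)
W <- c(20, 20, 20, 20, 20)        # population share of each group, sums to 100
X <- c(40, 70, 100, 140, 250)     # mean welfare of each group, sorted ascending

P <- cumsum(W) / sum(W)           # cumulative share of population
R <- (W * X) / sum(W * X)         # share of total welfare held by each group
L <- cumsum(R)                    # cumulative share of welfare

print(data.frame(W, X, P, L, R))
```

Because the groups are sorted by welfare, the cumulative welfare share never exceeds the cumulative population share, which is what places the Lorenz curve below the 45-degree line.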

```{r data, echo=FALSE}
pip_gd |>
print()
```

## Case 1: Simple Welfare Analysis and Lorenz Curve

### 1.1 Welfare share at a given population share

One simple use case is calculating the welfare share of a specific share of the population, which can be achieved using `pipgd_welfare_share_at()`:

```{r popshare}
# Calculate the welfare share at a given population share
selected_popshare <- 0.5
@@ -68,6 +77,7 @@ welfare_share_50 <- pipgd_welfare_share_at(welfare = pip_gd$L,
```

When `complete = FALSE`, the output is a list. The results can be accessed like so:

```{r popshare-results}
# Format the string with the given values
formatted_message <- sprintf("%.0f%% of the population owns %.0f%% of welfare.",
@@ -76,7 +86,9 @@ formatted_message <- sprintf("%.0f%% of the population owns %.0f%% of welfare.",
print(formatted_message)
```

### 1.2 Quantile share vs cumulative share

`pipster` has a selection of functions to calculate welfare shares. When `n` is declared, `pipgd_quantile_welfare_share()` will calculate the share of welfare owned by a specific share of the population, while `pipgd_welfare_share_at()` will return the cumulative share:

```{r quantile-vs-cumulative}
@@ -100,12 +112,14 @@ df_combined <- data.frame(
print(df_combined)
```
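
The distinction between the two outputs can be illustrated without `pipster`: quantile shares are the first differences of cumulative shares. A minimal base R sketch, using hypothetical quintile values rather than results from `pip_gd`:

```r
# Hypothetical cumulative welfare shares at the 20%, 40%, ..., 100% population marks
cumulative_share <- c(0.07, 0.18, 0.35, 0.58, 1.00)

# Share held by each quintile on its own: difference of consecutive cumulative shares
quantile_share <- diff(c(0, cumulative_share))

print(data.frame(quintile = 1:5, quantile_share, cumulative_share))
```

Summing the quantile shares with `cumsum()` recovers the cumulative shares, so either representation can be derived from the other.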

### 1.3 Estimate and Plot the Lorenz Curve

`pipster` can also be used to estimate a Lorenz curve from grouped data. One possible workflow:

1. First, generate the parameters using `pipgd_params()`
2. Validate the parameters using `pipgd_validate_lorenz()`
3. Generate the Lorenz curve using the validated parameters with `pipgd_lorenz_curve()`

```{r lorenz-validate}
# Validate Lorenz curve.
@@ -126,7 +140,6 @@ formatted_message <- sprintf("%s used for distribution statistics and %s used fo
print(formatted_message)
```


```{r lorenz-plot}
# Plot the Lorenz Curve
lorenz_curve_data <- pipgd_lorenz_curve(params = validated_lorenz)
@@ -143,45 +156,131 @@ plot(lorenz_curve_data$lorenz_curve$points,
abline(0, 1, col = 'red', lty = 2)
```
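
To give a sense of what parameter estimation involves, the sketch below fits the General Quadratic Lorenz curve described in Datt (1998) by ordinary least squares without an intercept. It is a base R illustration under stated assumptions, not `pipster`'s internal implementation: the coefficients `a0`, `b0`, `c0` are hypothetical, the points are generated from the closed-form curve rather than taken from `pip_gd`, and all object names are made up for this example.

```r
# Hypothetical General Quadratic coefficients, used only to generate example points
a0 <- 0.9; b0 <- -1.45; c0 <- 0.2
e0 <- -(a0 + b0 + c0 + 1)
m0 <- b0^2 - 4 * a0
n0 <- 2 * b0 * e0 - 4 * c0

# Closed-form General Quadratic Lorenz curve L(p)
gq_lorenz <- function(p) -0.5 * (b0 * p + e0 + sqrt(m0 * p^2 + n0 * p + e0^2))

# Grouped-data points; the final point (P = 1, L = 1) is left out of the regression
P <- seq(0.1, 0.9, by = 0.1)
L <- gq_lorenz(P)

# OLS without intercept on: L(1 - L) = a(P^2 - L) + b L(P - 1) + c(P - L)
fit <- lm(I(L * (1 - L)) ~ 0 + I(P^2 - L) + I(L * (P - 1)) + I(P - L))
gq_coef <- unname(coef(fit))
print(round(gq_coef, 4))
```

Since the example points lie exactly on the curve, the regression recovers the coefficients that generated them; with real grouped data the fit is approximate, which is why a validation step is part of the workflow.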

## Case 2: Poverty Profiling Manual vs Pipster

`pipster` allows the user to estimate poverty measures quickly and accurately using the Lorenz curve. To demonstrate its use, we manually calculate FGT(0), FGT(1), and FGT(2), and then replicate the results using only `pipster` functions.

### 2.1 Manual parameters

Following Datt (1998), we first derive the necessary parameters from the Lorenz curve using `pipgd_lorenz_curve()`:

```{r manual-data}
# STEP 0 : assign variables
cum_welfare <- pip_gd$L
cum_pop <- pip_gd$P
# STEP 1: Estimate Lorenz Curve
lorenz_curve_params <- pipgd_lorenz_curve(welfare = cum_welfare,
weight = cum_pop,
complete = TRUE)
print(lorenz_curve_params$selected_lorenz$for_pov)
```

`pipster` suggests using `lb`, the Lorenz beta, to estimate poverty measures. We use `lq` instead, so that our results can be compared with those reported in the article. We then retrieve the parameters and assign them to objects:

```{r parameters}
# parameters
m <- lorenz_curve_params$gd_params$lq$key_values$m
n <- lorenz_curve_params$gd_params$lq$key_values$n
r <- lorenz_curve_params$gd_params$lq$key_values$r
s1 <- lorenz_curve_params$gd_params$lq$key_values$s1
s2 <- lorenz_curve_params$gd_params$lq$key_values$s2
a <- lorenz_curve_params$gd_params$lq$reg_results$coef[[1]]
b <- lorenz_curve_params$gd_params$lq$reg_results$coef[[2]]
c <- lorenz_curve_params$gd_params$lq$reg_results$coef[[3]]
z <- 89 # the poverty line for rural India, 1983.
mu <- 109.9 # the actual mean of the sample.
# helpful combinations
z_div_mu <- z/mu
mu_div_z <- mu/z
```
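
For reference, the key values retrieved above relate to the regression coefficients $a$, $b$, and $c$ of the General Quadratic Lorenz curve as follows. The definitions are restated from Datt (1998); that they correspond one-to-one to the `key_values` elements stored by `pipster` is an assumption based on the matching names.

$$\begin{aligned}
e &= -(a+b+c+1), & m &= b^2-4a, \\
n &= 2be-4c, & r &= \left(n^2-4me^2\right)^{1/2}, \\
s_1 &= \frac{r-n}{2m}, & s_2 &= -\frac{r+n}{2m}.
\end{aligned}$$

With these, the Lorenz curve itself can be written as

$$L(p)=-\frac{1}{2}\left[b p+e+\left(m p^2+n p+e^2\right)^{1 / 2}\right]$$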

## Case 2: Poverty Profiling
### 2.1 Poverty Headcount
First, we can apply the `pipgd_pov_headcount()` function to determine the proportion of the population living below a specified poverty line. According to Datt (1998), the rural poverty line for India in 1983 is Rs. 89:
In `pipster`, we can apply the `pipgd_pov_headcount()` function to determine the proportion of the population living below a specified poverty line. The poverty headcount can be calculated manually as follows:

```{r headcount}
poverty_line <- 89
$$H=-\frac{1}{2 m}\left[n+r(b+2 (z / \mu))\left\{(b+2 (z / \mu))^2-m\right\}^{-1 / 2}\right]$$
Manually:
```{r headcount-manual}
H <- -(1/(2*m)) * (n + r*(b + 2*(z_div_mu)) * ((b + 2*z_div_mu)^2 - m)^(-1/2))
print(paste0("The poverty headcount is ", round(H*100,2), "%"))
```
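
One way to sanity-check the headcount formula is to recall that $H$ is the point at which the slope of the Lorenz curve equals $z/\mu$. The self-contained sketch below uses hypothetical General Quadratic coefficients (not the ones estimated from `pip_gd`) and illustrative object names, and compares a numerical derivative of $L(p)$ at $H$ with $z/\mu$:

```r
# Hypothetical GQ coefficients (illustrative, NOT estimated from pip_gd)
a <- 0.9; b <- -1.45; c_coef <- 0.2
z <- 89; mu <- 109.9

# Key values as defined in Datt (1998)
e <- -(a + b + c_coef + 1)
m <- b^2 - 4 * a
n <- 2 * b * e - 4 * c_coef
r <- sqrt(n^2 - 4 * m * e^2)

# Closed-form headcount
k <- b + 2 * z / mu
H <- -(1 / (2 * m)) * (n + r * k / sqrt(k^2 - m))

# GQ Lorenz curve and a central-difference estimate of its slope at H
lorenz_gq <- function(p) -0.5 * (b * p + e + sqrt(m * p^2 + n * p + e^2))
h <- 1e-6
slope_at_H <- (lorenz_gq(H + h) - lorenz_gq(H - h)) / (2 * h)

print(c(H = H, slope_at_H = slope_at_H, z_over_mu = z / mu))
```

If the two last values printed agree, the closed-form $H$ is consistent with the first-order condition $L'(H) = z/\mu$ for the chosen coefficients.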

Using `pipster`, we simply do:
```{r headcount-pipster}
headcount1 <- pipgd_pov_headcount(welfare = pip_gd$L,
weight = pip_gd$P,
mean = 109.9,
povline = poverty_line)
print(headcount1)
mean = mu,
povline = z,
lorenz = 'lq')
print((paste0("The poverty headcount is ", round(headcount1$headcount*100,2), "%")))
```

However, one might want to calculate the poverty line using `povertyline = mean * times_mean` instead. When defining these parameters, it is important not to define a poverty line as well,
otherwise the parameter `times_mean` will be ignored:
Alternatively, the poverty line can be defined as a multiple of the mean (`povline = mean * times_mean`) by passing `times_mean`. In that case, do not also pass a poverty line; otherwise the `times_mean` parameter is ignored:

```{r headcount-times}
headcount2 <- pipgd_pov_headcount(welfare = pip_gd$L,
weight = pip_gd$P,
mean = 109.9,
times_mean = 0.8)
mean = mu,
times_mean = 0.8,
lorenz = 'lq')
print(headcount2)
```
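
For concreteness, the poverty line implied by `times_mean` is simply the product of the mean and the multiplier. A short base R check, using the mean quoted earlier in this vignette (the object name `implied_povline` is illustrative):

```r
mu <- 109.9                          # mean welfare used in this vignette
times_mean <- 0.8                    # multiplier applied to the mean
implied_povline <- mu * times_mean   # poverty line implied by the two arguments
print(implied_povline)
```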

### 2.2 Poverty Gap
Next, we use the `pipgd_pov_gap()` function to calculate the poverty gap index. This index measures the average shortfall of the population from the poverty line, expressed as a percentage of the poverty line.
```{r gap}
poverty_line <- 89
Next, we use the `pipgd_pov_gap()` function to calculate the poverty gap index. This index measures the average shortfall of the population from the poverty line, expressed as a percentage of the poverty line. It can be calculated as follows:

$$PG = H - (\mu / z) L(H)$$
Manually:
```{r gap-manual}
# First we calculate the value of the Lorenz curve at H:
L_at_H <- pipgd_welfare_share_at(welfare = cum_welfare,
weight = cum_pop,
popshare = H)$dist_stats$welfare_share_at
# Then we calculate the poverty gap:
PG = H - mu_div_z*L_at_H
print(paste0("The poverty gap is ", round(PG*100,2), "%"))
```
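
The expression for $PG$ follows from writing the poverty gap as an integral over the poor population and using the fact that welfare at percentile $p$ is $y(p) = \mu L'(p)$. A short derivation, included here as a reading aid:

$$\begin{aligned}
PG &= \int_0^H\left(1-\frac{y(p)}{z}\right) dp = H-\frac{\mu}{z} \int_0^H L'(p)\, dp \\
&= H-\frac{\mu}{z}\left[L(H)-L(0)\right] = H-(\mu / z) L(H)
\end{aligned}$$

The last step uses $L(0) = 0$.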

Using `pipster`, we simply do:
```{r gap-pipster}
gap <- pipgd_pov_gap(welfare = pip_gd$L,
weight = pip_gd$P,
mean = 109.9,
povline = 89)
mean = mu,
povline = z,
lorenz = 'lq')
print(gap)
print((paste0("The poverty gap is ", round(gap$pov_gap*100,2), "%")))
```

### 2.3 Poverty Severity
Finally, we utilize the `pipgd_pov_severity()` function to assess the poverty severity index. This index considers the squared poverty gap, placing more weight on the welfare of the poorest. It can be calculated as follows:

$$\begin{aligned}
& P_2=2(P G)-H \\
& -\left(\frac{\mu}{z}\right)^2\left[a H+b L(H)-\left(\frac{r}{16}\right) \ln \left(\frac{1-H / s_1}{1-H / s_2}\right)\right]
\end{aligned}$$

```{r severity-manual}
SPG = 2*PG - H - ((mu_div_z)^2) * (a*H + b*L_at_H - (r/16) * log((1-(H/s1))/(1-(H/s2))))
print(paste0("The poverty severity is ", round(SPG*100,2), "%"))
```

Using `pipster`, we simply do:
```{r severity}
severity <- pipgd_pov_severity(welfare = pip_gd$L,
weight = pip_gd$P,
mean = mu,
povline = z,
lorenz = 'lq')
print((paste0("The poverty severity is ", round(severity$pov_severity*100,2), "%")))
```
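
All three measures belong to the Foster-Greer-Thorbecke (FGT) family, which for unit-level data is the mean of the normalized shortfall from the poverty line raised to a power $\alpha$. The sketch below applies that definition to a small hypothetical welfare vector. Note that `fgt()` is an illustrative helper, not a `pipster` function, and that it works on individual observations rather than on a Lorenz curve:

```r
# FGT index for unit-level data: weighted mean of ((z - y) / z)^alpha for the poor,
# with a contribution of 0 for everyone at or above the poverty line
fgt <- function(y, z, alpha, w = rep(1, length(y))) {
  poor    <- y < z
  gap     <- ifelse(poor, (z - y) / z, 0)
  contrib <- ifelse(poor, gap^alpha, 0)   # alpha = 0 simply counts the poor
  sum(w * contrib) / sum(w)
}

y <- c(50, 100, 150, 200)   # hypothetical welfare values
z <- 100                    # hypothetical poverty line

headcount <- fgt(y, z, alpha = 0)
pov_gap   <- fgt(y, z, alpha = 1)
severity  <- fgt(y, z, alpha = 2)
print(c(headcount = headcount, pov_gap = pov_gap, severity = severity))
```

Raising the shortfall to a higher power gives more weight to observations far below the line, which is why the severity index is the most sensitive of the three to the welfare of the poorest.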

## Case 3:
## Case 3: Inequality Analysis
