Skip to content

Latest commit

 

History

History
303 lines (241 loc) · 10 KB

proof_of_concept.org

File metadata and controls

303 lines (241 loc) · 10 KB

Proof of Concept

Preface

This file is meant to basically check the sanity and correctness of the data and code used to work out nutrition from household expenditure data.

There are a few modules you may need to install:

!pip install -r requirements.txt

All the data required for estimation and calculation can be found in a google spreadsheet. Choose one to explore!

COUNTRY = 'Nigeria'
URL = {'Uganda':{'Expenditures':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','Expenditures (2019-20)'),
                 'Prices':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','Prices'),
                 'HH Characteristics':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','HH Characteristics'),
                 'FCT':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','FCT'),
                 'RDI':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','RDI')},
       'Tanzania':'https://docs.google.com/spreadsheets/d/1Tknc2F8K6SaA7j0R7J_NE8yJBTsCPPw77_Bfc04MY40/',
       'ICRISAT':'https://docs.google.com/spreadsheets/d/13Ig5hZif-NSHtgkKRp_cEgKXk0lOsdUB2BAD6O_FnRo',
       'Nigeria':{'Expenditures':'https://docs.google.com/spreadsheets/d/1qzz6XGhPWLZLmwjrUY4W9k9U2PYukWblQKXWu0l37C4/',
                  'FCT':'https://docs.google.com/spreadsheets/d/1whE_EW5x-jxrsKvYWfefdBppzp_TZhPP61bdEN-FEJ4/',
                  'RDI':'https://docs.google.com/spreadsheets/d/1whE_EW5x-jxrsKvYWfefdBppzp_TZhPP61bdEN-FEJ4/',
                  'HH Characteristics':'https://docs.google.com/spreadsheets/d/1whE_EW5x-jxrsKvYWfefdBppzp_TZhPP61bdEN-FEJ4/',
                  'Prices':'https://docs.google.com/spreadsheets/d/1whE_EW5x-jxrsKvYWfefdBppzp_TZhPP61bdEN-FEJ4/'}
                  }
DAYS_PER_PERIOD = {'Uganda':7,'Tanzania':7,'ICRISAT':365.25,'Nigeria':7} # Number of days of expenditure recall

Estimation

You may already have access to an estimated demand system. If so, use it! Then you can skip down to “Plotting Food Demands”. Otherwise, we pull in data on expenditures, household characteristics, and price:

import cfe
import pandas as pd
import numpy as np
from eep153_tools.sheets import read_sheets

x = read_sheets(URL[COUNTRY],sheet='Expenditures')

if len(x.columns==5): # stored as a series
    x = x.set_index(['j','t','m','i']).squeeze()
else:
    x = x.set_index(['j','t','m'])
    x.columns.name = 'i'
    x = x.stack().dropna()
x.index.names = ['i','t','m','j']

x = x.replace(0,np.nan).dropna()

y = np.log(x)

z = read_sheets(URL[COUNTRY],sheet='HH Characteristics').set_index(['j','t','m'])
z.columns.name = 'k'
z.index.names=['i','t','m']

p = read_sheets(URL[COUNTRY],sheet='Prices').set_index(['t','m'])
p.columns.name = 'j'

Let’s take a look at the different periods that appear in the data. If you can’t estimate (the next step) because your kernel dies it may be that you should use a subset of the periods.

use_periods = p.index.levels[0].tolist()

# If you want to use just a subset of periods, redefine
# use_periods here; e.g.,
#
use_periods = ['2015Q3','2016Q1']
y = y.loc[y.index.get_level_values('t').isin(use_periods)]
z = z.loc[z.index.get_level_values('t').isin(use_periods)]
p = p.loc[p.index.get_level_values('t').isin(use_periods)]

use_periods

Next, we construct an object we can use in the estimation.

r = cfe.Regression(y=y,d=z)

Next, we estimate CFE demands given the prices and budgets of households in the data.

xhat = r.predicted_expenditures()

Estimation is kind of expensive, so you might want to save these results to use in your later code.

r.to_pickle("my %s.pickle" % COUNTRY)

Plotting Food Demands

Now we’re interested in predicting what quantities of different kinds of food would have been, if something (e.g., a price, budget, household characteristics) was different.

Ceteris Paribus

We begin by setting up some benchmarks for prices and budgets, so the things we don’t want to change we can hold fixed.

Reference Prices

Choose reference prices. Here we’ll choose a particular year, and average prices across markets. If you wanted to focus on particular market you’d do this differently.

# Reference prices chosen from a particular time; average across place.
# These are prices per kilogram:
my_t = use_periods[0]  # Choose from periods available in your dataset!

pbar = p.xs(my_t,level='t').mean()
pbar = pbar[r.beta.index] # Only use prices for goods we can estimate

my_j = 'Onions'  # Choose a reference good for analysis; should satisfy

assert my_j in pbar.index, f"The label {my_j} does not match a good for which demands have been estimated."

Budgets

Get food budget for all households, then find median budget:

import numpy as np

xhat = r.predicted_expenditures()

# Total food expenditures per household
xbar = xhat.groupby(['i','t','m']).sum()

# Reference budget
x0 = xbar.quantile(0.5)  # Household at 0.5 quantile is median

f"Median income is {x0} in local currency."

Changing one price, holding others fixed

Finally, define a function to change a single price in the vector $p$:

def my_prices(p0,p=pbar,j=my_j):
    """
    Change price of jth good to p0, holding other prices fixed.
    """
    p = p.copy()
    p.loc[j] = p0
    return p

Plot demand schedule

import matplotlib.pyplot as plt
%matplotlib inline

# Values for prices
ref_price = pbar[my_j]
P = np.linspace(ref_price/5,ref_price*5,50)

for x in [x0*s for s in [.25,.5,1.,2,4]]:
    plt.plot([r.demands(x,my_prices(p0))[my_j] for p0 in P],P)

plt.xlabel(my_j)
plt.ylabel(f'Price of {my_j}')

Plotting Nutrient Demands

The nutrient value of food consumed by the household is just the product of its diet and a food conversion table. So let’s write a function that describes that product:

from eep153_tools.sheets import read_sheets
import warnings

# Get FCT:
fct = read_sheets(URL[COUNTRY],
                  sheet='FCT').set_index('i')
fct.columns.name='n'


def nutrient_demand(x,p):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        c = r.demands(x,p)

    fct0,c0 = fct.align(c,axis=0,join='inner')
    N = fct0.T@c0

    N = N.loc[~N.index.duplicated()]

    return N

With this nutrient_demand function in hand, we can see how nutrient outcomes vary with budget, given prices:

import numpy as np
import pandas as pd

X = np.linspace(x0/5,x0*5,50)

# UseNutrients = ['Protein','Calories','Iron','Calcium']
UseNutrients = fct.columns.tolist()

plt.plot(X,[np.log(nutrient_demand(x,pbar))[UseNutrients] for x in X])
plt.legend(UseNutrients)
plt.xlabel('Budget')
plt.ylabel('log nutrient')

Nutritional requirements

Individuals have nutritional requirements established by nutrition scientists. Here we grab one such set of requirements:

rdi = read_sheets(URL[COUNTRY],
                  sheet='RDI').set_index('n').replace(np.nan,0)
rdi.columns.name = 'k'

rdi = rdi.replace('',0)
rdi

Nutritional Needs of Households

Our data on demand and nutrients is at the household level; we can’t directly compare household level nutrition with individual level requirements. What we can do is add up minimum individual requirements, and see whether household total exceed these. This isn’t a guarantee that all individuals have adequate nutrition (since the way food is allocated in the household might be quite unequal, or unrelated to individual requirements), but it is necessary if all individuals are to have adequate nutrition.

For the average household, the number of different kinds of people can be computed by averaging over households:

# Find average household characteristics for reference period & place
zbar = r.d.mean()

Now, the inner/dot/matrix product between zbar and the rda DataFrame of requirements will give us minimum requirements for the average household:

# This matrix product gives minimum nutrient requirements for average
# household in reference year & place
my_rdi,my_zbar = rdi.align(zbar.T,axis=1,join='inner')
hh_rdi = my_rdi@my_zbar.T

# But this is per *day*, while our data is per period:
hh_rdi = hh_rdi*DAYS_PER_PERIOD[COUNTRY]
hh_rdi

Nutritional Adequacy of Food Demands

Since we can trace out demands for nutrients as a function of $(x,p)$, and we’ve computed minimum nutritional requirements for the average household, we can normalize nutritional intake to check the adequacy of diet.

def nutrient_adequacy_ratio(x,p):
    return (nutrient_demand(x,p)/hh_rdi.T).squeeze()

In terms of normalized nutrients, any household with more than one unit of any given nutrient (or zero in logs) will be consuming a minimally adequate level of the nutrient; below this level there’s clearly nutritional inadequacy. For this reason the ratio of actual nutrients to required nutrients is termed the “nutrient adequacy ratio,” or NAR.

plt.plot(X,[np.log(nutrient_adequacy_ratio(x,pbar))[UseNutrients] for x in X])
plt.legend(UseNutrients)
plt.xlabel('Budget')
plt.ylabel('log nutrient adequacy ratio')
plt.axhline(0)

As before, we can also vary relative prices. Here we trace out nutritional adequacy varying the price of a single good:

poorer_x = x0/2

plt.plot([np.log(nutrient_adequacy_ratio(poorer_x,my_prices(p0,j=my_j)))[UseNutrients] for p0 in P],P)
plt.legend(UseNutrients)
plt.ylabel('Price')
plt.xlabel('log nutrient adequacy ratio')
plt.axvline(0)
plt.axhline(phat[my_j])