This file is meant to basically check the sanity and correctness of the data and code used to work out nutrition from household expenditure data.
There are a few modules you may need to install:
!pip install -r requirements.txt
All the data required for estimation and calculation can be found in a google spreadsheet. Choose one to explore!
COUNTRY = 'Nigeria'
URL = {'Uganda':{'Expenditures':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','Expenditures (2019-20)'),
'Prices':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','Prices'),
'HH Characteristics':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','HH Characteristics'),
'FCT':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','FCT'),
'RDI':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','RDI')},
'Tanzania':'https://docs.google.com/spreadsheets/d/1Tknc2F8K6SaA7j0R7J_NE8yJBTsCPPw77_Bfc04MY40/',
'ICRISAT':'https://docs.google.com/spreadsheets/d/13Ig5hZif-NSHtgkKRp_cEgKXk0lOsdUB2BAD6O_FnRo',
'Nigeria':{'Expenditures':'https://docs.google.com/spreadsheets/d/1qzz6XGhPWLZLmwjrUY4W9k9U2PYukWblQKXWu0l37C4/',
'FCT':'https://docs.google.com/spreadsheets/d/1whE_EW5x-jxrsKvYWfefdBppzp_TZhPP61bdEN-FEJ4/',
'RDI':'https://docs.google.com/spreadsheets/d/1whE_EW5x-jxrsKvYWfefdBppzp_TZhPP61bdEN-FEJ4/',
'HH Characteristics':'https://docs.google.com/spreadsheets/d/1whE_EW5x-jxrsKvYWfefdBppzp_TZhPP61bdEN-FEJ4/',
'Prices':'https://docs.google.com/spreadsheets/d/1whE_EW5x-jxrsKvYWfefdBppzp_TZhPP61bdEN-FEJ4/'}
}
DAYS_PER_PERIOD = {'Uganda':7,'Tanzania':7,'ICRISAT':365.25,'Nigeria':7} # Number of days of expenditure recall
You may already have access to an estimated demand system. If so, use it! Then you can skip down to “Plotting Food Demands”. Otherwise, we pull in data on expenditures, household characteristics, and price:
import cfe
import pandas as pd
import numpy as np
from eep153_tools.sheets import read_sheets
x = read_sheets(URL[COUNTRY],sheet='Expenditures')
if len(x.columns==5): # stored as a series
x = x.set_index(['j','t','m','i']).squeeze()
else:
x = x.set_index(['j','t','m'])
x.columns.name = 'i'
x = x.stack().dropna()
x.index.names = ['i','t','m','j']
x = x.replace(0,np.nan).dropna()
y = np.log(x)
z = read_sheets(URL[COUNTRY],sheet='HH Characteristics').set_index(['j','t','m'])
z.columns.name = 'k'
z.index.names=['i','t','m']
p = read_sheets(URL[COUNTRY],sheet='Prices').set_index(['t','m'])
p.columns.name = 'j'
Let’s take a look at the different periods that appear in the data. If you can’t estimate (the next step) because your kernel dies it may be that you should use a subset of the periods.
use_periods = p.index.levels[0].tolist()
# If you want to use just a subset of periods, redefine
# use_periods here; e.g.,
#
use_periods = ['2015Q3','2016Q1']
y = y.loc[y.index.get_level_values('t').isin(use_periods)]
z = z.loc[z.index.get_level_values('t').isin(use_periods)]
p = p.loc[p.index.get_level_values('t').isin(use_periods)]
use_periods
Next, we construct an object we can use in the estimation.
r = cfe.Regression(y=y,d=z)
Next, we estimate CFE demands given the prices and budgets of households in the data.
xhat = r.predicted_expenditures()
Estimation is kind of expensive, so you might want to save these results to use in your later code.
r.to_pickle("my %s.pickle" % COUNTRY)
Now we’re interested in predicting what quantities of different kinds of food would have been, if something (e.g., a price, budget, household characteristics) was different.
We begin by setting up some benchmarks for prices and budgets, so the things we don’t want to change we can hold fixed.
Choose reference prices. Here we’ll choose a particular year, and average prices across markets. If you wanted to focus on particular market you’d do this differently.
# Reference prices chosen from a particular time; average across place.
# These are prices per kilogram:
my_t = use_periods[0] # Choose from periods available in your dataset!
pbar = p.xs(my_t,level='t').mean()
pbar = pbar[r.beta.index] # Only use prices for goods we can estimate
my_j = 'Onions' # Choose a reference good for analysis; should satisfy
assert my_j in pbar.index, f"The label {my_j} does not match a good for which demands have been estimated."
Get food budget for all households, then find median budget:
import numpy as np
xhat = r.predicted_expenditures()
# Total food expenditures per household
xbar = xhat.groupby(['i','t','m']).sum()
# Reference budget
x0 = xbar.quantile(0.5) # Household at 0.5 quantile is median
f"Median income is {x0} in local currency."
Finally, define a function to change a single price in the vector
def my_prices(p0,p=pbar,j=my_j):
"""
Change price of jth good to p0, holding other prices fixed.
"""
p = p.copy()
p.loc[j] = p0
return p
import matplotlib.pyplot as plt
%matplotlib inline
# Values for prices
ref_price = pbar[my_j]
P = np.linspace(ref_price/5,ref_price*5,50)
for x in [x0*s for s in [.25,.5,1.,2,4]]:
plt.plot([r.demands(x,my_prices(p0))[my_j] for p0 in P],P)
plt.xlabel(my_j)
plt.ylabel(f'Price of {my_j}')
The nutrient value of food consumed by the household is just the product of its diet and a food conversion table. So let’s write a function that describes that product:
from eep153_tools.sheets import read_sheets
import warnings
# Get FCT:
fct = read_sheets(URL[COUNTRY],
sheet='FCT').set_index('i')
fct.columns.name='n'
def nutrient_demand(x,p):
with warnings.catch_warnings():
warnings.simplefilter("ignore")
c = r.demands(x,p)
fct0,c0 = fct.align(c,axis=0,join='inner')
N = fct0.T@c0
N = N.loc[~N.index.duplicated()]
return N
With this nutrient_demand
function in hand, we can see how nutrient
outcomes vary with budget, given prices:
import numpy as np
import pandas as pd
X = np.linspace(x0/5,x0*5,50)
# UseNutrients = ['Protein','Calories','Iron','Calcium']
UseNutrients = fct.columns.tolist()
plt.plot(X,[np.log(nutrient_demand(x,pbar))[UseNutrients] for x in X])
plt.legend(UseNutrients)
plt.xlabel('Budget')
plt.ylabel('log nutrient')
Individuals have nutritional requirements established by nutrition scientists. Here we grab one such set of requirements:
rdi = read_sheets(URL[COUNTRY],
sheet='RDI').set_index('n').replace(np.nan,0)
rdi.columns.name = 'k'
rdi = rdi.replace('',0)
rdi
Our data on demand and nutrients is at the household level; we can’t directly compare household level nutrition with individual level requirements. What we can do is add up minimum individual requirements, and see whether household total exceed these. This isn’t a guarantee that all individuals have adequate nutrition (since the way food is allocated in the household might be quite unequal, or unrelated to individual requirements), but it is necessary if all individuals are to have adequate nutrition.
For the average household, the number of different kinds of people can be computed by averaging over households:
# Find average household characteristics for reference period & place
zbar = r.d.mean()
Now, the inner/dot/matrix product between zbar
and the rda
DataFrame of requirements will give us minimum requirements for the
average household:
# This matrix product gives minimum nutrient requirements for average
# household in reference year & place
my_rdi,my_zbar = rdi.align(zbar.T,axis=1,join='inner')
hh_rdi = my_rdi@my_zbar.T
# But this is per *day*, while our data is per period:
hh_rdi = hh_rdi*DAYS_PER_PERIOD[COUNTRY]
hh_rdi
Since we can trace out demands for nutrients as a function of
def nutrient_adequacy_ratio(x,p):
return (nutrient_demand(x,p)/hh_rdi.T).squeeze()
In terms of normalized nutrients, any household with more than one unit of any given nutrient (or zero in logs) will be consuming a minimally adequate level of the nutrient; below this level there’s clearly nutritional inadequacy. For this reason the ratio of actual nutrients to required nutrients is termed the “nutrient adequacy ratio,” or NAR.
plt.plot(X,[np.log(nutrient_adequacy_ratio(x,pbar))[UseNutrients] for x in X])
plt.legend(UseNutrients)
plt.xlabel('Budget')
plt.ylabel('log nutrient adequacy ratio')
plt.axhline(0)
As before, we can also vary relative prices. Here we trace out nutritional adequacy varying the price of a single good:
poorer_x = x0/2
plt.plot([np.log(nutrient_adequacy_ratio(poorer_x,my_prices(p0,j=my_j)))[UseNutrients] for p0 in P],P)
plt.legend(UseNutrients)
plt.ylabel('Price')
plt.xlabel('log nutrient adequacy ratio')
plt.axvline(0)
plt.axhline(phat[my_j])