\documentclass[a4paper,10pt,notitlepage]{report}
\usepackage{geometry}
\geometry{verbose,tmargin=30mm,bmargin=25mm,lmargin=25mm,rmargin=25mm}
\usepackage[utf8]{inputenc}
\usepackage[sectionbib]{natbib}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{enumitem}
\usepackage{xcolor}
\usepackage{cancel}
\usepackage{mathtools}
\usepackage{graphicx} %% needed for \includegraphics in the figure below
\usepackage{caption}
\usepackage{subcaption}
\usepackage{float}
\PassOptionsToPackage{hyphens}{url}\usepackage{hyperref}
\hypersetup{colorlinks=true,citecolor=blue}


\newtheorem{thm}{Theorem}
\newtheorem{lemma}[thm]{Lemma}
\newtheorem{proposition}[thm]{Proposition}
\newtheorem{remark}[thm]{Remark}
\newtheorem{defn}[thm]{Definition}

%%%%%%%%%%%%%%%%%%%% Notation stuff
\newcommand{\pr}{\operatorname{Pr}} %% probability
\newcommand{\vr}{\operatorname{Var}} %% variance
\newcommand{\rs}{X_1, X_2, \ldots, X_n} %% random sample
\newcommand{\irs}{X_1, X_2, \ldots} %% infinite random sample
\newcommand{\rsd}{x_1, x_2, \ldots, x_n} %% random sample, realised
\newcommand{\bX}{\boldsymbol{X}} %% random sample, contracted form (bold)
\newcommand{\bx}{\boldsymbol{x}} %% random sample, realised, contracted form (bold)
\newcommand{\bT}{\boldsymbol{T}} %% statistic, vector form (bold)
\newcommand{\bt}{\boldsymbol{t}} %% statistic, realised, vector form (bold)
\newcommand{\emv}{\hat{\theta}} %% maximum likelihood estimator
\DeclarePairedDelimiter\ceil{\lceil}{\rceil}
\DeclarePairedDelimiter\floor{\lfloor}{\rfloor}

% Title Page
\title{Exam 2 (A2)}
\author{Class: Bayesian Statistics \\ Instructor: Luiz Max Carvalho}
\date{02/06/2021}

\begin{document}
\maketitle

\textbf{Hand-in deadline: 16/06/2021 at 23:59h, Bras\'{i}lia time.}

\begin{center}
\fbox{\fbox{\parbox{1.0\textwidth}{\textsf{
 \begin{itemize}
  \item Please read through the whole exam before starting to answer;
  \item State and prove all non-trivial mathematical results necessary to substantiate your arguments;
  \item Do not forget to add appropriate scholarly references~\textit{at the end} of the document;
  \item Mathematical expressions also receive punctuation;
  \item You can write your answer to a question as a point-by-point response or in ``essay'' form, your call;
  \item Please hand in a single, \textbf{typeset} (\LaTeX) PDF file as your final main document.
  Code appendices are welcome,~\textit{in addition} to the main PDF document;
  \item You may consult any sources, provided you cite \textbf{ALL} of them (books, papers, blog posts, videos);
  \item You may use symbolic algebra programs such as Sympy or Wolfram Alpha to help you get through the hairier calculations, provided you cite the tools you have used;
  \item The exam is worth 100 %$\min\left\{\text{your\:score}, 100\right\}$
  marks.
 \end{itemize}}
}}}
\end{center}
% \newpage
% \section*{Hints}
% \begin{itemize}
% \item a
% \item b
% \end{itemize}
%
\newpage

\section*{Background}

This exam covers applications, namely estimation, prior sensitivity and prediction.
You will need a working knowledge of basic computing tools, and familiarity with MCMC is highly valuable.
Chapter 6 in \cite{Robert2007} gives an overview of computational techniques for Bayesian statistics.
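As a refresher on the MCMC machinery mentioned above, a minimal random-walk Metropolis sampler for a generic one-dimensional target can be sketched as follows. This is purely illustrative (the standard normal target and all tuning values are placeholders, not tied to any exam question):

```python
import math
import random

def metropolis(log_target, x0, n_iter, step=1.0, seed=42):
    """Random-walk Metropolis with a symmetric Gaussian proposal.

    log_target: log-density of the target, up to an additive constant.
    Returns the list of n_iter (correlated) draws."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_iter):
        prop = x + rng.gauss(0.0, step)            # symmetric proposal
        log_alpha = log_target(prop) - log_target(x)
        if math.log(rng.random()) < log_alpha:     # accept with prob min(1, ratio)
            x = prop
        samples.append(x)
    return samples

# Illustrative target: standard normal, log-density -x^2/2 up to a constant.
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_iter=50_000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

With a reasonable step size the empirical mean and variance of the draws should approach 0 and 1, the moments of the target.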

\section*{Inferring population sizes -- theory}

Consider the model
\begin{equation*}
 x_i \sim \operatorname{Binomial}(N, \theta), \quad i = 1, \ldots, K,
\end{equation*}
independently, with \textbf{both} $N$ and $\theta$ unknown, and suppose one observes $\boldsymbol{x} = \{x_1, x_2, \ldots, x_K\}$.
Here, we will write $\xi = (N, \theta)$.

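For intuition about the sampling model, it is easy to simulate data sets of this form. The sketch below uses plain Python with illustrative values of $N$ and $\theta$ (these are, of course, the unknowns in the exam, not given quantities):

```python
import random

def simulate_counts(N, theta, K, seed=1):
    """Draw K iid Binomial(N, theta) observations by summing N Bernoulli(theta) trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < theta for _ in range(N)) for _ in range(K)]

# Illustrative values only.
x = simulate_counts(N=100, theta=0.2, K=5)
```

Each simulated count necessarily lies in $\{0, 1, \ldots, N\}$, which is why the indicator functions mentioned in item (b) matter: the likelihood forces $N \geq \max_i x_i$.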
\begin{enumerate}[label=\alph*)]
 \item (10 marks) Formulate a hierarchical prior ($\pi_1$) for $N$, i.e., elicit $F$ such that $N \mid \alpha \sim F(\alpha)$ and $\alpha \sim \Pi_A$.
 Justify your choice;
 \item (5 marks) Using the prior from the previous item, write out the full joint posterior kernel for all unknown quantities in the model, $p(\xi \mid \boldsymbol{x})$. \textit{Hint:} do not forget to include the appropriate indicator functions;
 \item (5 marks) Is your model identifiable?
 \item (5 marks) Exhibit the marginal posterior density for $N$, $p_1(N \mid \boldsymbol{x})$;
 \item (5 marks) Return to point (a) above and consider an alternative, uninformative prior structure for $\xi$, $\pi_2$.
 Then, derive $p_2(N \mid \boldsymbol{x})$;
 \item (10 marks) Formulate a third prior structure on $\xi$, $\pi_3$, that allows for closed-form marginalisation over the hyperparameters $\alpha$ -- see (a) -- and write out $p_3(N \mid \boldsymbol{x})$;
 \item (10 marks) Show whether each of the marginal posteriors considered is proper.
 Then, derive the posterior predictive distribution, $g_i(\tilde{x} \mid \boldsymbol{x})$, for each of the posteriors considered ($i = 1, 2, 3$);
 \item (5 marks) Consider the loss function
 \begin{equation}
 \label{eq:relative_loss}
 L(\delta(\boldsymbol{x}), N) = \left(\frac{\delta(\boldsymbol{x})-N}{N} \right)^2.
 \end{equation}
 Derive the Bayes estimator under this loss.
\end{enumerate}

\section*{Inferring population sizes -- practice}
Consider the problem of inferring the population sizes of major herbivores~\citep{Carroll1985}.
In the first case, one is interested in estimating the number of impala (\textit{Aepyceros melampus}) herds in the Kruger National Park, in northeastern South Africa.
An initial survey collected the following numbers of herds: $\boldsymbol{x}_{\text{impala}} = \{15, 20, 21, 23, 26\}$.
Another scientific question is the number of individual waterbuck (\textit{Kobus ellipsiprymnus}) in the same park.
The observed numbers of waterbuck in separate sightings were $\boldsymbol{x}_{\text{waterbuck}} = \{53, 57, 66, 67, 72\}$ and may be regarded (for simplicity) as independent and identically distributed.

\begin{figure}[H]
 \centering
 \begin{subfigure}[b]{0.45\textwidth}
  \centering
  \includegraphics[scale=0.75]{figures/impala.jpeg}
  \caption{Impala}
 \end{subfigure}
 \begin{subfigure}[b]{0.45\textwidth}
  \centering
  \includegraphics[scale=0.75]{figures/waterbuck.jpeg}
  \caption{Waterbuck}
 \end{subfigure}
 \caption{Two antelope species whose population sizes we want to estimate.}
 \label{fig:antelopes}
\end{figure}
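Before any model fitting, it helps to look at the sample moments of the two data sets. A quick sketch in plain Python, using the $K^{-1}$ (population) divisor for the sample variance as the exam defines it:

```python
def moments(xs):
    """Sample mean and variance, with the K^{-1} (population) divisor for s^2."""
    k = len(xs)
    xbar = sum(xs) / k
    s2 = sum((x - xbar) ** 2 for x in xs) / k
    return xbar, s2

impala = [15, 20, 21, 23, 26]
waterbuck = [53, 57, 66, 67, 72]
print(moments(impala))     # (21.0, 13.2)
print(moments(waterbuck))  # (63.0, 48.4)
```

Note that in both samples $\bar{x} > s^2$, i.e., the data are underdispersed relative to a Poisson, which is consistent with a Binomial sampling model.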


\begin{enumerate}[label=\alph*)]
\setcounter{enumi}{8}
 \item (20 marks) For each data set, sketch the marginal posterior distributions $p_1(N \mid \boldsymbol{x})$, $p_2(N \mid \boldsymbol{x})$ and $p_3(N \mid \boldsymbol{x})$.
 Moreover, under each posterior, provide (i) the Bayes estimator under quadratic loss and under the loss in (\ref{eq:relative_loss}) and (ii) a 95\% credible interval for $N$.
 Discuss the differences and similarities between these distributions and estimates: do the prior modelling choices substantially impact the final inferences? If so, how?
 \item (25 marks) Let $\bar{x} = K^{-1}\sum_{k=1}^K x_k$ and $s^2 = K^{-1}\sum_{k=1}^K (x_k-\bar{x})^2$.
 For this problem, a sample is said to be \textit{stable} if $\bar{x}/s^2 \geq (\sqrt{2} + 1)/\sqrt{2}$ and \textit{unstable} otherwise.
 Devise a simple method of moments estimator (MME) for $N$.
 Then, using a Monte Carlo simulation, compare the MME to the three Bayes estimators under the quadratic loss in (\ref{eq:relative_loss}) in terms of relative mean squared error.
 How do the Bayes estimators compare to the MME in terms of the stability of the generated samples?
 \textit{Hint}: You may want to follow the simulation setup of~\cite{Carroll1985}.
\end{enumerate}

\bibliographystyle{apalike}
\bibliography{a2}
\end{document}