Commit

Built site for gh-pages
Quarto GHA Workflow Runner committed Feb 13, 2025
1 parent 7655b6e commit f7395b4
Showing 36 changed files with 385 additions and 365 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
78206498
b22e29e9
26 changes: 13 additions & 13 deletions Chapters/Case_study_model_comparison.html
@@ -378,7 +378,7 @@ <h2 data-number="8.1" class="anchored" data-anchor-id="information-criteria-for-
<section id="the-data" class="level2" data-number="8.2">
<h2 data-number="8.2" class="anchored" data-anchor-id="the-data"><span class="header-section-number">8.2</span> The data</h2>
<p>The data used to fit the models are the results of all matches from 2022-2023 and the budget of each team (for the 2nd model only). Our data therefore consists of two tables: one with one row per match, containing the home and away teams and the goals scored by each; another with one row per team, containing the team and its budget.</p>
<div id="6fa8ce08" class="cell" data-execution_count="2">
<div id="74d0d830" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1"></a><span class="im">import</span> arviz.data.datasets <span class="im">as</span> azd</span>
<span id="cb1-2"><a href="#cb1-2"></a></span>
<span id="cb1-3"><a href="#cb1-3"></a>azd.REMOTE_DATASETS.update({</span>
@@ -467,7 +467,7 @@ <h2 data-number="8.6" class="anchored" data-anchor-id="variable-and-index-glossa
<li>Field. The field identifier. Two teams play in each game, one being the home team, the other the away one. We use <span class="math inline">\(f\)</span> as the index indicating the field, which can take only two values <span class="math inline">\(h\)</span> or <span class="math inline">\(a\)</span>.</li>
<li>Arbitrary index. For theoretical concepts, we use <span class="math inline">\(i\)</span> to indicate an arbitrary index.</li>
</ul>
<div id="b8eedbbc" class="cell" data-execution_count="3">
<div id="d676b0e6" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1"></a><span class="co"># load data</span></span>
<span id="cb2-2"><a href="#cb2-2"></a>base_idata <span class="op">=</span> az.load_arviz_data(<span class="st">"laliga_base"</span>)</span>
<span id="cb2-3"><a href="#cb2-3"></a>budget_idata <span class="op">=</span> az.load_arviz_data(<span class="st">"laliga_budget"</span>)</span>
@@ -487,7 +487,7 @@ <h2 data-number="8.7" class="anchored" data-anchor-id="information-criterion-cal
</ul>
<p>There are even more examples of predictive tasks where this particular model can be of use. However, it is important to keep in mind that this model predicts the number of goals scored. Its results can be used to estimate probabilities of victory and other derived quantities, but calculating the likelihood of these derived quantities may not be straightforward. And as you can see above, there isn’t <em>one</em> unique predictive task: it all depends on the specific question you’re interested in. As is often the case in statistics, the answer to these questions lies <em>outside</em> the model: <em>you</em> must tell the model what to do, not the other way around.</p>
<p>Even though we know that the predictive task is ambiguous, we will start by trying to calculate <code>az.loo</code> with <code>base_idata</code>, and then work through the examples above and a couple more to show how this kind of task can be performed with ArviZ. But before that, let’s see what ArviZ says when you naively ask it for the LOO of a multi-likelihood model:</p>
<div id="e385c1b5" class="cell" data-execution_count="4">
<div id="8843bb84" class="cell" data-execution_count="4">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1"></a>az.loo(base_idata)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
@@ -505,7 +505,7 @@ <h3 data-number="8.7.1" class="anchored" data-anchor-id="predicting-the-goals-sc
<p><span class="math display">\[ p(y_i|\theta) = p(y_{i,h}|\theta_{i,h}) = \text{Poiss}(y_{i,h}; \theta_{i,h}) \]</span></p>
<p>with <span class="math inline">\(i\)</span> being both the match indicator (<span class="math inline">\(m\)</span>, which varies with <span class="math inline">\(i\)</span>) and the field indicator (<span class="math inline">\(f\)</span>, here always fixed at <span class="math inline">\(h\)</span>). These are precisely the values stored in the <code>home_goals</code> variable of the <code>log_likelihood</code> group of <code>base_idata</code>.</p>
<p>We can tell ArviZ to use these values using the argument <code>var_name</code>.</p>
<div id="e54a7eeb" class="cell" data-execution_count="5">
<div id="20f039e7" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb5"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1"></a>az.loo(base_idata, var_name<span class="op">=</span><span class="st">"home_goals"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">
<pre><code>Computed from 8000 posterior samples and 380 observations log-likelihood matrix.
@@ -522,7 +522,7 @@ <h3 data-number="8.7.1" class="anchored" data-anchor-id="predicting-the-goals-sc
(1, Inf) (very bad) 0 0.0%</code></pre>
</div>
</div>
<div id="2297d7e7" class="cell" data-execution_count="6">
<div id="5c9cdee0" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb7"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1"></a>az.compare(model_dict, var_name<span class="op">=</span><span class="st">"home_goals"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<div>
@@ -588,7 +588,7 @@ <h3 data-number="8.7.1" class="anchored" data-anchor-id="predicting-the-goals-sc
</div>
</div>
</div>
<div id="64d7f7e1" class="cell" data-execution_count="7">
<div id="173bb4ad" class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb8"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1"></a>az.compare(model_dict, var_name<span class="op">=</span><span class="st">"away_goals"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<div>
@@ -666,7 +666,7 @@ <h3 data-number="8.7.2" class="anchored" data-anchor-id="predicting-the-outcome-
\]</span></p>
<p>with <span class="math inline">\(i\)</span> being equal to the match indicator <span class="math inline">\(m\)</span>. Therefore, we have <span class="math inline">\(M\)</span> observations, as in the previous example, but each observation now has two components.</p>
<p>We can calculate the product as a sum of logarithms and store the result in a new variable inside the <code>log_likelihood</code> group.</p>
<div id="c3937c5c" class="cell" data-execution_count="8">
<div id="7f234f43" class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb9"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1"></a><span class="kw">def</span> match_lik(idata):</span>
<span id="cb9-2"><a href="#cb9-2"></a> log_lik <span class="op">=</span> idata.log_likelihood</span>
<span id="cb9-3"><a href="#cb9-3"></a> log_lik[<span class="st">"matches"</span>] <span class="op">=</span> log_lik.home_goals <span class="op">+</span> log_lik.away_goals</span>
@@ -692,7 +692,7 @@ <h3 data-number="8.7.2" class="anchored" data-anchor-id="predicting-the-outcome-
(1, Inf) (very bad) 0 0.0%</code></pre>
</div>
</div>
<div id="fc3f4f26" class="cell" data-execution_count="9">
<div id="8fc5425c" class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb11"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1"></a>az.compare(model_dict, var_name<span class="op">=</span><span class="st">"matches"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="9">
<div>
@@ -772,7 +772,7 @@ <h3 data-number="8.7.3" class="anchored" data-anchor-id="predicting-the-goals-sc
<p><span class="math display">\[\big\{(1,h), (2,h), \dots, (M-1,h), (M,h), (1,a), (2,a), \dots, (M-1,a), (M,a)\big\}\]</span></p>
<p>Therefore, unlike in previous cases, we have <span class="math inline">\(2M\)</span> observations.</p>
<p>We can obtain the pointwise log likelihood corresponding to this case by concatenating the pointwise log likelihoods of <code>home_goals</code> and <code>away_goals</code>. Then, as in the previous case, we store the result in a new variable inside the <code>log_likelihood</code> group.</p>
<div id="5e258a5d" class="cell" data-execution_count="10">
<div id="493fefb6" class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb12"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1"></a><span class="kw">def</span> goals_lik(idata):</span>
<span id="cb12-2"><a href="#cb12-2"></a> log_lik <span class="op">=</span> idata.log_likelihood</span>
<span id="cb12-3"><a href="#cb12-3"></a> log_lik[<span class="st">"goals"</span>] <span class="op">=</span> xr.concat((log_lik.home_goals, log_lik.away_goals), <span class="st">"match"</span>).rename({<span class="st">"match"</span>: <span class="st">"goal"</span>})</span>
@@ -798,7 +798,7 @@ <h3 data-number="8.7.3" class="anchored" data-anchor-id="predicting-the-goals-sc
(1, Inf) (very bad) 0 0.0%</code></pre>
</div>
</div>
<div id="23bc38f3" class="cell" data-execution_count="11">
<div id="d6771688" class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb14"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1"></a>az.compare(model_dict, var_name<span class="op">=</span><span class="st">"goals"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="11">
<div>
@@ -872,7 +872,7 @@ <h3 data-number="8.7.4" class="anchored" data-anchor-id="predicting-team-level-p
<p>In this situation, we can describe the cross-validation as excluding one team at a time. Excluding a team means excluding every match that team played: the whole match, not only the goals the team scored. Here is the illustration:</p>
<p><img src="../img/cv_team.png" class="img-fluid"></p>
<p>In the first column, we are excluding “Levante U.D.”, which appears only once in the rows shown. In the second one, we are excluding “Athletic Club”, which appears twice. This continues following the order of appearance in the away team column.</p>
<div id="20021d3e" class="cell" data-execution_count="12">
<div id="3d59df53" class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb15"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb15-1"><a href="#cb15-1"></a><span class="kw">def</span> team_lik(idata):</span>
<span id="cb15-2"><a href="#cb15-2"></a> log_lik <span class="op">=</span> idata.log_likelihood</span>
<span id="cb15-3"><a href="#cb15-3"></a> const <span class="op">=</span> idata.constant_data</span>
@@ -889,7 +889,7 @@ <h3 data-number="8.7.4" class="anchored" data-anchor-id="predicting-team-level-p
<span id="cb15-14"><a href="#cb15-14"></a>budget_idata <span class="op">=</span> team_lik(budget_idata)</span>
<span id="cb15-15"><a href="#cb15-15"></a>nofield_idata <span class="op">=</span> team_lik(nofield_idata)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<div id="85621833" class="cell" data-execution_count="13">
<div id="5165aaeb" class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb16"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb16-1"><a href="#cb16-1"></a>az.loo(base_idata, var_name<span class="op">=</span><span class="st">"teams_match"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/arviz/stats/stats.py:792: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
@@ -913,7 +913,7 @@ <h3 data-number="8.7.4" class="anchored" data-anchor-id="predicting-team-level-p
</div>
</div>
<p>TODO: it would probably be best to run reloo for the three models for this case and include that on figshare too.</p>
<div id="7477da78" class="cell" data-execution_count="14">
<div id="1ff58726" class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb19"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb19-1"><a href="#cb19-1"></a>az.compare(model_dict, var_name<span class="op">=</span><span class="st">"teams_match"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/arviz/stats/stats.py:792: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.