Faster null fit #102

Draft · wants to merge 169 commits into base: ailurophilia-main
Conversation

ailurophilia (Collaborator)
Adds fit_null_repar, a couple of helper functions, and a test confirming that it finds the same null fit as the augmented Lagrangian approach.

Instead of using a constrained optimization method, fit_null_repar treats B_k_constr_j_constr as a function of the remaining elements of B_k_constr and solves the resulting (likelihood) optimization problem by coordinate descent. This is similar to, but not the same as, the coordinate descent algorithm used in full model fitting. The main differences are the following:

  • since B_k_constr_j_constr depends on every other element of B_k_constr, we loop through j such that j is neither the convenience constraint (j_ref) nor j_constr, and then (conditional on z) take a Fisher step in both j and j_constr
  • we use optim to adjust the overall position of each row of B (relative to B_j_ref, which is forced to equal zero) in between loops through j. That is, we fix B at its current value and solve an optimization in epsilon_1, ..., epsilon_p, where epsilon_k is added to every element of B[k,] except B[k,j_ref] (and B[k_constr,j_constr] if k = k_constr). For B[k_constr,], epsilon_k_constr is added to all elements except those in columns j_constr and j_ref; B_k_constr_j_constr is then updated based on the new values of all other elements of B[k_constr,]. This optimization is performed on the likelihood with z profiled out (since z is very informative about the overall location of B). This helps us avoid situations where coordinate descent is slow because we have to take many small steps in B[,c(j,j_constr)] instead of one big one.
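To make the reparametrization concrete, here is a minimal toy sketch (radEmu itself is written in R; this Python snippet substitutes a quadratic objective and a mean-zero constraint for the actual likelihood and constraint function, and all names in it are hypothetical):

```python
import numpy as np

# Hypothetical toy, not radEmu code: minimize ||b - target||^2 over one row
# b = B[k_constr, ], subject to mean(b) = 0 (standing in for the null
# constraint) and b[j_ref] = 0 (the convenience constraint). The constrained
# entry b[j_constr] is treated as a function of the remaining entries, so
# every coordinate step in j implicitly moves j_constr as well.

rng = np.random.default_rng(0)
J = 6
j_constr, j_ref = 2, J - 1            # j_ref fixed at the last column, as in the draft

target = rng.normal(size=J)
b = np.zeros(J)

def enforce_constraint(b):
    # mean(b) = 0  <=>  b[j_constr] = -(sum of all other entries)
    others = np.delete(np.arange(J), j_constr)
    b[j_constr] = -b[others].sum()

enforce_constraint(b)
for _ in range(200):
    for j in range(J):
        if j in (j_ref, j_constr):
            continue
        # Chain rule: d b[j_constr] / d b[j] = -1 under the mean-zero
        # constraint, so each step in j is a joint step in (j, j_constr).
        grad = 2.0 * (b[j] - target[j]) - 2.0 * (b[j_constr] - target[j_constr])
        b[j] -= 0.1 * grad
        enforce_constraint(b)

print(abs(b.mean()) < 1e-10 and b[j_ref] == 0.0)  # constraint holds by construction
```

Because the constraint is enforced by construction after every update, no Lagrange multiplier or penalty term is needed, which is the point of the reparametrization.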

In my experience so far, this outperforms the augmented Lagrangian approach in both stability (I have yet to see it fail) and scalability (I'm able to fit nulls with J = 1000 categories in 10 to 20 minutes, even when B[k_constr,j_constr] has a large value under the full model fit).

However,

  • as currently coded, this function assumes that the constraint function is permutation-invariant. This holds for any reasonable measure of central tendency (e.g., mean, median, pseudo-Huber center), but it does not hold when we want to constrain one element of B[k_constr,] to equal another under the null. I'm noting this for posterity, but I don't think it's a big problem: we can fit this type of null model via the full model coordinate descent algorithm by setting B[k_constr,j_constr] to zero at the beginning of optimization and then skipping its updates.
  • also as currently coded, fit_null_repar requires j_ref = J (because I didn't want to deal with indexing problems on a first pass -- this requirement can be removed easily enough) and also that j_ref != j_constr. The second requirement is fundamental to the approach: constraining the model to the null via reparametrization does not work if j_ref = j_constr. Practically this should not pose a large problem, since j_ref is arbitrary and we can always choose a j_ref not equal to j_constr, but it's worth noting that we need to enforce this.
  • I have written fit_null_repar as a replacement for / alternative to fit_null, and I have added a test comparing these two algorithms on a toy problem, but I have not integrated fit_null_repar into score_test, which will be the next step needed to incorporate this algorithm into radEmu.
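The workaround described in the first bullet above can be sketched as follows (again a hypothetical Python toy, with a separable quadratic objective standing in for the likelihood; the real implementation would live in R):

```python
import numpy as np

# Hypothetical toy, not radEmu code: run ordinary full-model coordinate
# descent, but initialize B[k_constr, j_constr] at zero and never update it.
# This fits a "B[k_constr, j_constr] = 0" null directly, without requiring
# the constraint function to be permutation-invariant.

rng = np.random.default_rng(1)
p, J = 3, 5
k_constr, j_constr, j_ref = 1, 2, J - 1
T = rng.normal(size=(p, J))           # toy target standing in for the data fit

B = np.zeros((p, J))
for _ in range(25):
    for k in range(p):
        for j in range(J):
            if j == j_ref or (k == k_constr and j == j_constr):
                continue              # skip reference column and frozen entry
            B[k, j] = T[k, j]         # exact coordinate minimizer of ||B - T||_F^2

print(B[k_constr, j_constr] == 0.0 and np.all(B[:, j_ref] == 0.0))
```

The frozen entry simply never enters the update loop, so the null constraint is maintained exactly at every iteration.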

Maria Valdez Cabrera and others added 30 commits February 24, 2024 17:14

  • … so that the pkgdown.yaml workflow actually uses the logo for the github.io page.
  • Create github.io webpage using pkgdown
  • …tor vector or the name of a variable in `data`, adding tests to make sure the argument works for all of these cases
  • …ata` (this code was confusing and not working in tests called by github actions; I think a better way to implement this, if desired, would be through the formula object)
  • remove old code defining unused variable `cluster_name`
  • update `emuFit()` so that it won't run into error with `penalize = FA…
  • …nder null hypotheses (if score tests are fit) - used for debugging
  • …d vignette are built. (update this again once PRs with additional vignettes are merged)

gthopkins and others added 29 commits October 27, 2024 16:20

  • …cal flag match_row_names to the parameters. Started test-that file for this row name matching in the data frame.
  • …sts worked with the added message that warns when rownames are missing.
  • …g process, but we may remove p-value check soon.
  • Updating local files with upstream files.
  • add plotting demo to radEmu vignettes
  • Make simulate_data() for internal use and apply in the testthat files
@adw96 adw96 marked this pull request as draft December 10, 2024 16:50