Description
First, thanks very much for this package. It gets me a lot closer to being able to do everything in Python! However, I've noticed an enormous speed difference between pyfixest and the Stata (+ Julia) command reghdfejl. Using exactly the same data and running the same regression with two sets of fixed effects, pyfixest takes 3 to 6 hours (depending on the tolerance I set), while reghdfejl takes about 3 minutes. That's quite a difference! The results are very similar in both cases.
I'm using pyfixest 0.30.2 with jax 0.7.2 and CUDA 13. pyfixest is definitely using the GPU (Nvidia 3090 Ti), which shows 100% utilization during the entire run, has something over 18 GB of RAM allocated, and heats up my office nicely. reghdfejl is a Stata routine that calls Julia to do the heavy processing, and again I've set it to use the GPU.
I've seen older discussions about the speed not being what it might be, but this difference seems quite extreme. Any thoughts on what might be going on? One possibly relevant thought: I have to increase the maximum number of iterations for the demeaning to finish, otherwise it gets to 100,000 and stops with an error. That seems like quite a lot of iterations.
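To make clear what I mean by demeaning iterations, here is a minimal NumPy sketch of alternating-projection demeaning with two sets of fixed effects. It is only an illustration under my own assumptions, not pyfixest's actual implementation: the function name `demean_two_way`, its `tol` and `max_iter` parameters, and the stopping rule (largest change between sweeps) are all hypothetical, and the fixed effects are assumed to be integer-coded from 0.

```python
import numpy as np


def demean_two_way(x, fe1, fe2, tol=1e-8, max_iter=100_000):
    """Sweep out two integer-coded fixed effects from x by alternating projections.

    Hypothetical helper for illustration only.
    Returns (residual, iterations_used, converged).
    """
    x = np.asarray(x, dtype=float).copy()
    n1 = int(fe1.max()) + 1
    n2 = int(fe2.max()) + 1
    # Group sizes; clamp to 1 so unused codes do not trigger a division by zero.
    c1 = np.maximum(np.bincount(fe1, minlength=n1), 1)
    c2 = np.maximum(np.bincount(fe2, minlength=n2), 1)

    for it in range(1, max_iter + 1):
        x_old = x.copy()
        # Subtract the group means of the first fixed effect ...
        x -= (np.bincount(fe1, weights=x, minlength=n1) / c1)[fe1]
        # ... then the group means of the second fixed effect.
        x -= (np.bincount(fe2, weights=x, minlength=n2) / c2)[fe2]
        # Stop once a full sweep changes no element by more than tol.
        if np.max(np.abs(x - x_old)) < tol:
            return x, it, True
    return x, max_iter, False


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2_000
    fe1 = rng.integers(0, 20, n)
    fe2 = rng.integers(0, 15, n)
    x = rng.normal(size=n) + 0.5 * fe1 + 0.3 * fe2
    resid, n_iter, converged = demean_two_way(x, fe1, fe2)
    print(f"converged={converged} after {n_iter} sweeps")
```

In a sketch like this, the number of sweeps depends on how the two sets of fixed effects overlap as well as on `tol`, which is why hitting a 100,000-iteration cap caught my attention.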
Thanks for any suggestions.