Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace dill to pickle #1194

Closed
wants to merge 1 commit into from
Closed

Replace dill to pickle #1194

wants to merge 1 commit into from

Conversation

StpMax
Copy link
Contributor

@StpMax StpMax commented Sep 26, 2023

I noticed that pickle works much faster than dill when dump model.
Results for model dump (regular home_rentals):

def x():
    with open(file_path, "wb") as fp:
        pickle.dump(self, fp, protocol=5)
timeit.timeit(x, number=100)
0.31827622700075153

def y():
    with open(file_path, "wb") as fp:
        dill.dump(self, fp)
timeit.timeit(y, number=100)
2.572783964998962

Model load is equal:

def x():
    with open(state_file, 'rb') as fp:
        predictor = pickle.load(fp)
timeit.timeit(x, number=100)
0.26572044099884806
        
def y():
    with open(state_file, 'rb') as fp:
        predictor = dill.load(fp)
timeit.timeit(y, number=100)
0.26366778900046484

What is the reasons to use dill? If there is no reason, then @paxcema can you please confirm results on your pc, and if it also faster, then may be change dill to pickle?

@StpMax StpMax added enhancement New feature or request discussion Further discussion is required labels Sep 26, 2023
@StpMax StpMax requested a review from paxcema September 26, 2023 13:53
@paxcema
Copy link
Contributor

paxcema commented Sep 26, 2023

Interesting benchmarks @StpMax, I think in general this is expected (source).

I do remember there was a reason we moved from pickle to dill at some point a couple years ago. I want to say it had to do with the confidence module, but I will double check. We now include this module within Lightwood, so there is a chance we can make pickle the new default. Alternatively, we can try the option recommended in the link above byref=True to benchmark against pickle as a middle ground.

@paxcema
Copy link
Contributor

paxcema commented Jan 22, 2024

@StpMax is this still relevant? If we really need the extra performance gains, we can look into merging this. Otherwise I would close it, as indeed some of the classes in our confidence module require dill (pickle breaks it).

@StpMax
Copy link
Contributor Author

StpMax commented Jan 23, 2024

@paxcema let close it for now

@StpMax StpMax closed this Jan 23, 2024
@hamishfagg hamishfagg deleted the model_load_test branch December 10, 2024 04:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Further discussion is required enhancement New feature or request
Projects
Status: closed
Development

Successfully merging this pull request may close these issues.

2 participants