Questions about Finetuning LSF tasks #31
Comments
I believe it uses the same train/val/test split as the LSF setting. However, it doesn't perform normalization based on train set statistics, which is used in the LSF setting, so there is a mismatch between the fine-tuning and evaluation. If you want to fine-tune in a multivariate fashion, then yes, process it as a multivariate dataset, and also remove the SampleDimension transformation.
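For concreteness, here is a minimal sketch of the normalization step being described, assuming the usual 12/4/4-month LSF split for ETTh1 and a plain pandas workflow rather than the repository's own data pipeline; the split boundary and file names are assumptions for illustration only.

```python
# Minimal sketch (not the uni2ts pipeline): z-normalize ETTh1 using statistics
# computed on the training split only, as done in the LSF setting.
# The 12/4/4-month split boundary below is the usual LSF convention and is an
# assumption here, not something taken from this repository.
import pandas as pd

df = pd.read_csv("ETTh1.csv", index_col="date", parse_dates=True)

train_end = 12 * 30 * 24              # 8640 hourly steps used for training
train = df.iloc[:train_end]

mean, std = train.mean(), train.std()
df_norm = (df - mean) / std            # apply train statistics to the whole series

df_norm.to_csv("ETTh1_normalized.csv") # then build the fine-tuning dataset from this file
```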
Thanks for your reply. Following your suggestions, I normalized the data for fine-tuning, built the dataset in 'wide_multivariate', and removed the SampleDimension transformation. However, when I ran the experiment, an error occurred. I think this error is caused by using a dataset built in 'wide_multivariate' mode. How should I handle this issue? Do I need to modify this?
Hi @zqiao11, sorry for the late response. Have you managed to resolve this issue? If not, could you provide more details?
Hi. I haven't resolved this issue, but I have tracked down the reason. This issue can happen when a flattened, patchified sequence exceeds the max_seq_len. For example, ETTh1 has 7 variates. If I use context_length=5000, prediction_length=96, and patch_size=64 (the same config used to reproduce the LSF results), then there are 81 patches for one variate, and after flattening the 7 variates there are 81 × 7 = 567 patches in total, which exceeds max_seq_len. The assertion error is raised by the sequence packing function, which is only used in training and not in forecasting. That is why one can evaluate the model with this configuration but cannot fine-tune with it. BTW, is it safe to modify this max_seq_len?
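To make the token count above concrete, here is a small back-of-the-envelope check (a sketch only; the exact patching arithmetic inside the library may differ slightly):

```python
import math

context_length, prediction_length, patch_size, num_variates = 5000, 96, 64, 7

# Patches are counted per segment and rounded up: 79 context + 2 prediction patches.
patches_per_variate = (math.ceil(context_length / patch_size)
                       + math.ceil(prediction_length / patch_size))  # 81
total_tokens = patches_per_variate * num_variates                    # 81 * 7 = 567

print(patches_per_variate, total_tokens)  # 567 flattened tokens exceed the max_seq_len limit
```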
FYI, you can reproduce this issue by running your example code for finetuning. Just build the ETTh1 dataset with 'wide_multivariate'.
@zqiao11 Have you resolved this issue? I'm experiencing the same situation as you.
@zqiao11 Hello, I also finetuned the model on ETTh1 and its performance decreased significantly. Have you solved this issue after normalizing ETTh1?
@wyhzunzun123123 Hi, I haven't solved this issue for ETTh1. Since the config for reproduction uses mode='M' for ETTh1, I think one may need to fine-tune it with a dataset built in the multivariate time series format. But I cannot handle the error caused by 'max_seq_len' and need to wait for the author's reply. You may consider fine-tuning the model with the ETTm1 dataset, which is evaluated in mode='S' (build the dataset in 'wide').
So sorry for the delayed response. For the max_seq_len: the idea is that we set a maximum number of tokens per sequence for packing during training, so an input that flattens to more tokens than this limit cannot be packed.
Regarding the difference in performance on ETTh1: if you want to evaluate on the LSF setting, you will have to perform normalization based on the train set statistics first.
Thanks. Can you briefly explain the role of SampleDimension?
It subsamples the variates given the specified maximum number of dimensions.
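In case it helps, here is an illustrative sketch of what such a dimension-subsampling transform does; it is not the actual SampleDimension implementation, and the max_dim parameter name is an assumption for illustration.

```python
import numpy as np

def sample_dimension(target: np.ndarray, max_dim: int, rng=None) -> np.ndarray:
    """Randomly keep at most `max_dim` variates from a (num_variates, time) array."""
    if rng is None:
        rng = np.random.default_rng()
    num_variates = target.shape[0]
    keep = min(max_dim, num_variates)
    idx = rng.choice(num_variates, size=keep, replace=False)
    return target[np.sort(idx)]

series = np.random.randn(7, 512)                   # e.g. the 7 ETTh1 variates
print(sample_dimension(series, max_dim=3).shape)   # (3, 512)
```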
@zqiao11 Hi, have you solved the max_seq_len issue?
I face a similar problem. I fine-tuned the pretrained model on ETTh1 but got worse results than the original (zero-shot) model. Has anybody solved this problem?
Hi, I’m also researching the issue of fine-tuning, and maybe you could try lowering the default learning rate to 1e-5 (the default 1e-3 might be too high). If you encounter any issues, feel free to discuss further.
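As a generic PyTorch sketch of that suggestion (this is not the uni2ts config system, and the exact config key for the learning rate in the finetune YAML may differ):

```python
import torch

model = torch.nn.Linear(64, 64)  # stand-in for the pretrained forecaster being fine-tuned
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # 1e-5 instead of the 1e-3 default
```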
Thanks! I will have a try. |
I decreased the learning rate to 1e-5; it performs better than fine-tuning with lr=1e-3 but still worse than the pretrained model. How about you? Did you solve the issue by decreasing the learning rate?
How do you set the hyperparameters (such as context_length, patch_size, variate_mode)? These params are important in my experiments.
Do we need to set context_length and patch_size in the finetune module? I just left them at the defaults.
Hi. I'm working on enhancing long sequence forecasting performance through finetuning. I have successfully replicated the zero-shot learning results shown in Table 22 and will use them as a baseline for comparison.
For a fair comparison, I need to do finetuning under the same train-val-test setup as the zero-shot experiments in Table 22. However, I am unsure if my approach is accurate. Below is a summary of my workflow to finetune Moirai-small on ETTh1 and evaluate it with a prediction length of 96:
conf/finetune/val_data/etth1.yaml
Despite following these steps, the finetuning results are underperforming compared to the zero-shot outcomes (MSE is 0.375 and MAE is 0.402 in the original results).
I have a few questions:
Since data.mode = M is used during testing, do I need to build the dataset with wide_multivariate for finetuning?
Thank you for your assistance.
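For reference, a minimal sketch of the metric computation behind this comparison; forecasts and targets are placeholder arrays here and would come from the fine-tuned or zero-shot model and the normalized ETTh1 test split.

```python
import numpy as np

def mse_mae(forecasts: np.ndarray, targets: np.ndarray):
    """Point-forecast MSE/MAE over all windows, horizons, and variates."""
    err = forecasts - targets
    return float(np.mean(err ** 2)), float(np.mean(np.abs(err)))

# Dummy shapes: (num_windows, prediction_length=96, num_variates=7)
forecasts = np.zeros((10, 96, 7))
targets = np.zeros((10, 96, 7))
print(mse_mae(forecasts, targets))
```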