Questions about Finetuning LSF tasks #31

Open
zqiao11 opened this issue Apr 22, 2024 · 19 comments
Labels
good first issue Good for newcomers

Comments

@zqiao11
Contributor

zqiao11 commented Apr 22, 2024

Hi. I'm working on enhancing long sequence forecasting performance through finetuning. I have successfully replicated the zero-shot learning results shown in Table 22 and will use them as a baseline for comparison.

For a fair comparison, I need to finetune under the same train-val-test setup as the zero-shot experiments in Table 22. However, I am unsure whether my approach is correct. Below is a summary of my workflow to finetune Moirai-small on ETTh1 and evaluate it with a prediction length of 96:

  1. Following the LSF setup, I built the training data with the same offset as here:
python -m uni2ts.data.builder.simple ETTh1 dataset/ETT-small/ETTh1.csv --offset 8640
  2. Accordingly, I revised conf/finetune/val_data/etth1.yaml as follows:
_target_: uni2ts.data.builder.ConcatDatasetBuilder
_args_:
  _target_: uni2ts.data.builder.simple.generate_eval_builders
  dataset: ETTh1_eval
  offset: 8640  # Same as _lsf_dataset.py
  eval_length: 2880  # Same as _lsf_dataset.py
  prediction_lengths: [96, 192, 336, 720]
  context_lengths: [1000, 2000, 3000, 4000, 5000]
  patch_sizes: [32, 64]
  3. Then I finetuned a Moirai model with the same command as in the example:
python -m cli.finetune \
  run_name=my_lsf_run \
  model=moirai_1.0_R_small \
  data=etth1 \
  val_data=etth1
  4. Finally, I changed the checkpoint path in the model's yaml and evaluated the finetuned model using the second approach in the example:
python -m cli.eval \
  run_name=my_lsf_run \
  model=moirai_1.0_R_small \
  model.patch_size=64 \
  model.context_length=5000 \
  data=lsf_test \
  data.dataset_name=ETTh1 \
  data.mode=M \
  data.prediction_length=96
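
As a sanity check on the numbers used in the steps above, here is a minimal sketch of where `offset: 8640` and `eval_length: 2880` come from, assuming the conventional ETT-hourly split of 12/4/4 months with a month counted as 30 days of hourly steps. The helper name `etth_split` is hypothetical and is not part of uni2ts.

```python
# Assumption: the LSF benchmark splits ETTh1 into 12 months of training,
# 4 months of validation and 4 months of test data, where one "month" is
# 30 days x 24 hourly steps. This is a convention, not read from uni2ts.
HOURS_PER_MONTH = 30 * 24


def etth_split(train_months=12, val_months=4, test_months=4):
    """Return (start, end) row indices for each split (hypothetical helper)."""
    train_end = train_months * HOURS_PER_MONTH
    val_end = train_end + val_months * HOURS_PER_MONTH
    test_end = val_end + test_months * HOURS_PER_MONTH
    return {
        "train": (0, train_end),
        "val": (train_end, val_end),
        "test": (val_end, test_end),
    }


split = etth_split()
offset = split["train"][1]                        # rows used for finetuning
eval_length = split["val"][1] - split["val"][0]   # rows used for validation
print(split, offset, eval_length)
```

Under this assumption, the training data ends at row 8640 and the validation window covers the next 2880 rows, which is consistent with the values in the YAML above.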

Despite following these steps, the finetuned model underperforms the zero-shot baseline (MSE is 0.375 and MAE is 0.402 in the original results).
[image: results screenshot]

I have a few questions:

  1. Is the workflow above correct? Does it use the same train-val-test split setup of the original experiments?
  2. Given data.mode = M during testing, do I need to build the dataset with wide_multivariate for finetuning?
  3. If the workflow is correct, do you have any suggestions to improve the finetuning performance?

Thank you for your assistance.

@gorold
Contributor

gorold commented Apr 24, 2024

I believe it uses the same train/val/test split as the LSF setting. However, it doesn't perform normalization based on train set statistics, which is used in the LSF setting, so there is a mismatch between the fine-tuning and evaluation. If you want to fine-tune in a multivariate fashion, then yes, process it as a multivariate dataset, and also remove the SampleDimension transformation.
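
To illustrate the normalization point, here is a minimal sketch of standardizing every variate with statistics computed on the training rows only, then applying the same statistics to all rows. The function names and the toy data are hypothetical, and this is not the uni2ts implementation; the use of the population standard deviation is also an assumption.

```python
from statistics import fmean, pstdev


def fit_train_stats(columns, train_end):
    """Per-variate (mean, std) computed on the first `train_end` steps only."""
    stats = {}
    for name, values in columns.items():
        train = values[:train_end]
        std = pstdev(train)
        # Guard against constant series to avoid division by zero.
        stats[name] = (fmean(train), std if std > 0 else 1.0)
    return stats


def apply_stats(columns, stats):
    """Standardize the full series (train, val and test) with train statistics."""
    return {
        name: [(v - stats[name][0]) / stats[name][1] for v in values]
        for name, values in columns.items()
    }


# Toy data: 4 "training" steps followed by 2 held-out steps.
columns = {"OT": [1.0, 2.0, 3.0, 4.0, 10.0, 12.0]}
stats = fit_train_stats(columns, train_end=4)
scaled = apply_stats(columns, stats)
print(stats, scaled)
```

For ETTh1 in the setup discussed here, `train_end` would be 8640. The key point is that the held-out rows never contribute to the mean or standard deviation.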

@zqiao11
Contributor Author

zqiao11 commented Apr 25, 2024

Thanks for your reply. Following your suggestions, I normalized the data for fine-tuning, built the data in 'wide_multivariate' and removed the SampleDimension transformation.

However, when I ran the experiment, an error occurred:

...
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/eee/qzz/uni2ts/venv/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/eee/qzz/uni2ts/venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/home/eee/qzz/uni2ts/src/uni2ts/data/loader.py", line 106, in __call__
    assert all(
AssertionError: Sample length must be less than or equal to max_length (512)

I think this error is caused by using a dataset built in 'wide_multivariate' mode. How should I handle this issue? Do I need to modify this max_length here, and if so, how should this value be calculated?

@gorold
Contributor

gorold commented May 7, 2024

Hi @zqiao11, sorry for the late response. Have you managed to resolve this issue? If not, could you provide more details?

@zqiao11
Contributor Author

zqiao11 commented May 7, 2024

Hi. I haven't resolved this issue, but I have tracked down the cause. It happens when a flattened patchified sequence exceeds Moirai's max_length=512. I think this could be common when processing data built in wide_multivariate.

For example, ETTh1 has 7 variates. If I use context_length=5000, prediction_length=96, and patch_size=64 (the same config used to reproduce the LSF results), there are 81 patches per variate. After flattening the 7 variates, there are 567 patches (equal to target.size(1)), exceeding the max_seq_length of 512.
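
The arithmetic above can be reproduced with a short sketch. It assumes that the context and the prediction window are each padded up to a whole number of patches, which is one way to arrive at the 81 patches per variate quoted here; the function names are hypothetical and the actual uni2ts padding logic may differ.

```python
import math


def patches_per_variate(context_length, prediction_length, patch_size):
    # Assumption: context and prediction are padded to whole patches separately.
    context_patches = math.ceil(context_length / patch_size)
    prediction_patches = math.ceil(prediction_length / patch_size)
    return context_patches + prediction_patches


def flattened_tokens(context_length, prediction_length, patch_size, num_variates):
    # Flattening a multivariate sample concatenates the patches of all variates.
    per_variate = patches_per_variate(context_length, prediction_length, patch_size)
    return per_variate * num_variates


per_variate = patches_per_variate(5000, 96, 64)
total = flattened_tokens(5000, 96, 64, num_variates=7)
print(per_variate, total, total > 512)  # 81 567 True
```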

The assertion error is raised by the sequence packing function, which is only used in training and not used in forecasting. So that is why one can evaluate the model with mode=M without error.

BTW, is it safe to modify this max_seq_length? Besides sequence packing, I noticed it is also used in the code related to self-attention.

@zqiao11
Contributor Author

zqiao11 commented May 7, 2024

FYI, you can reproduce this issue by running your finetuning example code. Just build the ETTh1 dataset with wide_multivariate, and set context_length=5000, prediction_length=96, and patch_size=64.

@thisthq

thisthq commented May 9, 2024

@zqiao11 Have you resolved this issue? I'm experiencing the same situation as you.

@wyhzunzun123123

@zqiao11 Hello, I also finetuned the model on ETTh1 and its performance decreased significantly. Did you solve this issue after normalizing ETTh1?

@zqiao11
Contributor Author

zqiao11 commented May 17, 2024

@wyhzunzun123123 Hi, I haven't solved this issue for ETTh1. Since the reproduction config uses mode='M' for ETTh1, I think one may need to finetune with a dataset built in the multivariate time series format. But I cannot get past the error caused by 'max_seq_len' and need to wait for the author's reply.

You may consider finetuning the model on the ETTm1 dataset, which is evaluated in mode='S' (build the dataset in 'wide').

@gorold
Contributor

gorold commented May 29, 2024

So sorry for the delayed response. For the max_seq_len issue, you can use one of the following options:

  1. increase the max_seq_len parameter
  2. use a shorter context length
  3. add in the SampleDimension feature with the max_dim parameter set appropriately.

The idea is that we set a maximum number of tokens, max_seq_len. The number of tokens in a sample is calculated as (context_len + prediction_len) / patch_size * dim, and it must not exceed max_seq_len.
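
A rough sketch of how the three options trade off, using the rule of thumb above with the patch count rounded up. The helper names are hypothetical and not part of uni2ts, and the library's exact token count may differ by a patch or two per variate (the thread reports 567 tokens for this configuration), so it seems prudent to leave some headroom.

```python
import math


def tokens(context_len, prediction_len, patch_size, dim):
    # Rule of thumb from the thread, with the patch count rounded up.
    return math.ceil((context_len + prediction_len) / patch_size) * dim


def largest_dim(context_len, prediction_len, patch_size, max_seq_len):
    # Option 3: how many variates fit if the context length is kept.
    per_variate = math.ceil((context_len + prediction_len) / patch_size)
    return max_seq_len // per_variate


def longest_context(prediction_len, patch_size, dim, max_seq_len):
    # Option 2: longest context that fits if all variates are kept.
    per_variate = max_seq_len // dim
    return per_variate * patch_size - prediction_len


needed = tokens(5000, 96, 64, dim=7)             # option 1: raise max_seq_len to at least this
max_dim = largest_dim(5000, 96, 64, 512)         # option 3: candidate max_dim
max_context = longest_context(96, 64, 7, 512)    # option 2: candidate context length
print(needed, max_dim, max_context)
```

For the ETTh1 configuration discussed in this thread, this estimate suggests raising max_seq_len above roughly 560, or keeping at most 6 variates per sample, or shortening the context to about 4500 steps.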

@gorold
Contributor

gorold commented May 29, 2024

Regarding the difference in performance on ETTh1: if you want to evaluate in the LSTF setting, you will first have to normalize the data using the train set statistics.

@zqiao11
Contributor Author

zqiao11 commented May 31, 2024

Thanks. Can you briefly explain the role of the SampleDimension feature? Does it sample as many dimensions/variates as possible from an MTS under a given max_seq_len limit?

@gorold
Contributor

gorold commented Jun 5, 2024

It subsamples the variates given the max_dim parameter. max_seq_len is not given to SampleDimension as a parameter.
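
As a toy illustration of what subsampling variates under a max_dim cap could look like, here is a minimal sketch. It is not the uni2ts SampleDimension implementation: the function name, the uniform random choice, and the choice to always keep exactly max_dim variates are all assumptions made for illustration.

```python
import random


def subsample_variates(series, max_dim, rng):
    """Keep at most `max_dim` variates, chosen at random, in original order.

    `series` is a list of variates, each a list of values (hypothetical layout).
    """
    if len(series) <= max_dim:
        return list(series)
    keep = sorted(rng.sample(range(len(series)), max_dim))
    return [series[i] for i in keep]


# Toy multivariate sample with 7 variates, like ETTh1.
rng = random.Random(0)
sample = [[float(v)] * 4 for v in range(7)]
reduced = subsample_variates(sample, max_dim=6, rng=rng)
print(len(sample), len(reduced))
```

With the 81 patches per variate quoted earlier in the thread, keeping 6 variates would give 486 tokens, which fits under a max_seq_len of 512.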

@DongChen06

@zqiao11 Hi, have you solved the max_seq_len issue? Do you have any experience with this error? Thank you so much!

@littlesun0727

I'm facing a similar problem. I finetuned the pretrained model on ETTh1 but got worse results than with the original pretrained model. Has anybody solved this problem?

@zhangzw16

> I'm facing a similar problem. I finetuned the pretrained model on ETTh1 but got worse results than with the original pretrained model. Has anybody solved this problem?

Hi, I’m also researching the issue of fine-tuning, and maybe you could try lowering the default learning rate to 1e-5 (the default 1e-3 might be too high). If you encounter any issues, feel free to discuss further.

@littlesun0727

> Hi, I’m also researching the issue of fine-tuning, and maybe you could try lowering the default learning rate to 1e-5 (the default 1e-3 might be too high). If you encounter any issues, feel free to discuss further.

Thanks! I will have a try.

@littlesun0727

> Hi, I’m also researching the issue of fine-tuning, and maybe you could try lowering the default learning rate to 1e-5 (the default 1e-3 might be too high). If you encounter any issues, feel free to discuss further.

I decreased the learning rate to 1e-5. It performs better than finetuning with lr=1e-3 but still performs worse than the pretrained model. How about you? Did you solve the issue by decreasing the learning rate?

@zhangzw16

> I decreased the learning rate to 1e-5. It performs better than finetuning with lr=1e-3 but still performs worse than the pretrained model. How about you? Did you solve the issue by decreasing the learning rate?

How do you set the hyperparameters (such as context_length, patch_size, variate_mode)? These parameters are important in my experiments.

@littlesun0727

> How do you set the hyperparameters (such as context_length, patch_size, variate_mode)? These parameters are important in my experiments.

Do I need to set context_len and patch_size in the finetune module? I just left them at the defaults.
