
Zero-shot performance is unexpectedly poor #162

Open
dawnvince opened this issue Dec 12, 2024 · 6 comments
Labels
bug Something isn't working

Comments


dawnvince commented Dec 12, 2024

Describe the bug
This is quite a meaningful work. Thanks for your hard work and contributions.

I tested the zero-shot ability of Moirai-MoE-small on a periodic time series, but the performance seems quite poor. Here are my code and the resulting plot. Is the issue due to a bug in my code, or to the model's handling of a time series that should be easy to predict? (I tried normalizing the lookback window, but it didn't help.)

To Reproduce

import torch
import tqdm
import matplotlib.pyplot as plt

# import path per the uni2ts README
from uni2ts.model.moirai_moe import MoiraiMoEForecast, MoiraiMoEModule

MODEL = "moirai-moe"  # model name: choose from {'moirai', 'moirai-moe'}
SIZE = "small"  # model size: choose from {'small', 'base', 'large'}
PDT = 336  # prediction length: any positive integer
CTX = 672  # context length: any positive integer
PSZ = 16  # patch size (moirai-moe uses a fixed patch size of 16)
BSZ = 8  # batch size: any positive integer


# Prepare pre-trained model by downloading model weights from huggingface hub
model = MoiraiMoEForecast(
    module=MoiraiMoEModule.from_pretrained(f"pretrained-models/Moirai-moe-{SIZE}"),
    prediction_length=PDT,
    context_length=CTX,
    patch_size=16,
    num_samples=50,
    target_dim=1,
    feat_dynamic_real_dim=0,
    past_feat_dynamic_real_dim=0,
)
    

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
test_loader = torch.utils.data.DataLoader(
    SyntheticDataset(CTX, PDT, dataset="Mini"), 
    batch_size=BSZ, 
    shuffle=True
)

model = model.to(device)
loop = tqdm.tqdm(enumerate(test_loader), total=len(test_loader), leave=True)

past_target = torch.empty(BSZ, CTX, 1)
past_observed_target = torch.ones_like(past_target, dtype=torch.bool).to(device)
past_is_pad = torch.zeros_like(past_target, dtype=torch.bool).squeeze(-1).to(device)

# seq: [batch_size, CTX], pred: [batch_size, PDT]
for idx, (seq, pred) in loop:
    seqs = seq.float().to(device)
    pred = pred.float().to(device)
    
    # mean, std = seqs.mean(dim=-1, keepdim=True), seqs.std(dim=-1, keepdim=True)
    # seqs = (seqs - mean) / std
    
    seqs = seqs.unsqueeze(-1)
    
    predictions = model(
        past_target=seqs,
        past_observed_target=past_observed_target,
        past_is_pad=past_is_pad,
    )
    
    predictions = torch.mean(predictions, dim=1)
    # predictions = predictions * std + mean
    
    print(predictions)
    print(pred)
    
    plt.figure()
    ... (plot figure)
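The commented-out normalization above should round-trip exactly when the same per-series statistics are reused for de-normalization. A minimal sketch in plain PyTorch, assuming a hypothetical batch of shape [BSZ, CTX]:

```python
import torch

seqs = torch.randn(8, 672)  # hypothetical batch: [batch, context]

# per-series (instance) normalization over the time axis
mean = seqs.mean(dim=-1, keepdim=True)
std = seqs.std(dim=-1, keepdim=True)
normed = (seqs - mean) / std

# ... run the model on `normed` here ...

# de-normalize outputs with the same statistics
restored = normed * std + mean
assert torch.allclose(restored, seqs, atol=1e-5)
```

The key point is that the de-normalization must use the statistics of the context window, since the ground-truth future is not available at inference time.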

Results:

Without normalization: [forecast plot omitted]

With normalization: [forecast plot omitted]

@dawnvince dawnvince added the bug Something isn't working label Dec 12, 2024
@chenghaoliu89 (Contributor) commented:

Hi @dawnvince, increase the context length to 5000, 8000, or more, and try patch size 16, 32, or 64 instead of auto. If you still have issues, you can share the data with me.


dawnvince commented Dec 13, 2024

Thanks for your timely reply. Following your suggestion, I stretched the CTX to a larger range by interpolating values and changed the patch size to 32/64. Here's my code:

...
# SCA is an upsampling scale factor defined earlier (not shown);
# F is torch.nn.functional. Note that past_observed_target and
# past_is_pad must be rebuilt for the new length CTX * SCA.
for idx, (seq, pred) in loop:
    seq = seq.float().to(device)
    pred = pred.float().to(device)
    
    seqs = seq.unsqueeze(1)
    seqs = F.interpolate(seqs, size=CTX * SCA, mode='linear', align_corners=False)
    seqs = seqs.squeeze(1)
    
    pred = pred.unsqueeze(1)
    pred = F.interpolate(pred, size=PDT * SCA, mode='linear', align_corners=False)
    pred = pred.squeeze(1)
    
    mean, std = seqs.mean(dim=-1, keepdim=True), seqs.std(dim=-1, keepdim=True)
    normed_seqs = (seqs - mean) / std
    
    normed_seqs = normed_seqs.unsqueeze(-1)
    
    predictions = model(
        past_target=normed_seqs,
        past_observed_target=past_observed_target,
        past_is_pad=past_is_pad,
    )
...
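The interpolation step can be checked in isolation. A small sketch, where SCA is a hypothetical scale factor and the shapes follow the snippet above (`F.interpolate` with `mode='linear'` expects a 3D input of shape [batch, channels, length]):

```python
import torch
import torch.nn.functional as F

SCA = 4                    # hypothetical upsampling factor
CTX = 672
seq = torch.randn(8, CTX)  # [batch, length]

# add a channel dim, upsample along time, drop the channel dim again
up = F.interpolate(seq.unsqueeze(1), size=CTX * SCA,
                   mode='linear', align_corners=False).squeeze(1)
assert up.shape == (8, CTX * SCA)
```

Any boolean masks (observed/pad indicators) passed to the model must be built for this new length, not the original CTX.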

However, the performance is still unsatisfactory:

[forecast plot omitted]

Here is the data for this sample: x.npy contains the context (CTX) data and gt.npy the ground-truth (PDT) data.

tsd.zip

Thanks again for your reply; I look forward to hearing from you soon!


liuxu77 commented Dec 13, 2024

Hi @dawnvince @chenghaoliu89, the patch size is fixed to 16, so changing the patch size will not influence the results. Could you try num_samples=100? Thanks.

Also, I noticed you use the mean to average the predictions here: torch.mean(predictions, dim=1). Could you use the median instead?
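The suggested switch from mean to median over the sample dimension makes the point forecast robust to outlier sample paths. A minimal sketch with a hypothetical predictions tensor of shape [batch, num_samples, horizon]:

```python
import torch

preds = torch.ones(2, 100, 5)   # [batch, num_samples, horizon]
preds[:, 0, :] = 1000.0         # one wildly off sample path

point_mean = preds.mean(dim=1)              # pulled toward the outlier
point_median = preds.median(dim=1).values   # ignores it

assert torch.allclose(point_median, torch.ones(2, 5))
assert (point_mean > 10).all()  # (99 * 1 + 1000) / 100 = 10.99
```

With heavy-tailed predictive distributions, a few extreme sample paths can dominate the mean, while the median stays close to the bulk of the samples.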

@dawnvince (Author) commented:

@liuxu77 Thanks for your reply. I changed the settings, but the results are almost the same as before. So there may still be some room for optimization, and I'm happy to discuss it at any time.

@liuxu77 liuxu77 reopened this Dec 16, 2024

liuxu77 commented Dec 16, 2024

Hi @dawnvince, thank you for the reply. The tsd.zip file contains only x.npy and I cannot find gt.npy; could you also share the gt.npy file? Thanks.

@dawnvince (Author) commented:

Oh, sorry for that. Here is the new file:
tsd.zip

@liuxu77 liuxu77 closed this as completed Jan 7, 2025
@liuxu77 liuxu77 reopened this Jan 7, 2025