Bug when trying to prepare custom dataset for finetuning #102

Closed
marcopeix opened this issue Aug 11, 2024 · 4 comments
@marcopeix

I've run into a bug that I can't fix when trying to prepare a dataset for finetuning.

Here's the code:

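from collections.abc import Generator
from pathlib import Path
from typing import Any

import pandas as pd
from datasets import Dataset, Features, Sequence, Value

# df is the Store==1 sales DataFrame, indexed by Date
def data_generator() -> Generator[dict[str, Any], None, None]: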
    yield {
        "target": df['Weekly_Sales'].to_numpy(),
        "start": df.index[0],
        "freq": pd.infer_freq(df.index),
        "item_id": "1",
    }

features = Features(
    dict(
        target=Sequence(Value("float32")),
        start=Value("date32"),
        freq=Value("string"),
        item_id=Value("string"),
    )
)

hf_dataset = Dataset.from_generator(data_generator, features=features)

hf_dataset.save_to_disk(Path("sales_dataset/"))

df = hf_dataset.to_pandas()

df.to_csv('sales_dataset/sales_data.csv', index=False)

Then, when I run python -m uni2ts.data.builder.simple sales_data sales_dataset/sales_data.csv --offset 40 --dataset_type long, I get the error:

IndexError: index 0 is out of bounds for axis 0 with size 0

I'm not sure why that happens, as my df is not empty, and the .csv is not empty either.

What am I missing?
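
One way to check what the CSV written by the snippet above actually contains is to read it back and look at its shape and column types. The sketch below does not use datasets or uni2ts; it builds a hypothetical one-row stand-in for hf_dataset.to_pandas() (one row per yielded record, with made-up values) and round-trips it through CSV with plain pandas. Whether this layout is related to the IndexError is not established in this thread.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for hf_dataset.to_pandas(): the generator yields one
# record, so the frame has one row and the whole series sits in a single cell.
one_row = pd.DataFrame(
    {
        "target": [np.array([1.0, 2.0, 3.0], dtype="float32")],
        "start": [pd.Timestamp("2010-02-05")],
        "freq": ["W-FRI"],
        "item_id": ["1"],
    }
)

# Write to an in-memory buffer instead of sales_dataset/sales_data.csv.
buffer = io.StringIO()
one_row.to_csv(buffer, index=False)
buffer.seek(0)

# Read the CSV back the way any downstream tool would see it.
round_trip = pd.read_csv(buffer)
print(round_trip.shape)   # number of rows and columns in the file
print(round_trip.dtypes)  # the array cell comes back as text, not numbers
```

Running the same read_csv call on the real sales_dataset/sales_data.csv would show whether the file has one row per time step or one row per series.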

@marcopeix added the bug label on Aug 11, 2024
@liu-jc
Contributor

liu-jc commented Aug 13, 2024

Hi @marcopeix,

Could you please provide a sample .csv you used? We can look more into it.

@marcopeix
Author

@liu-jc sure here's the CSV I'm using: https://raw.githubusercontent.com/marcopeix/FoundationModelsForTimeSeriesForecasting/main/data/walmart_sales_small.csv

I'm only using data for Store==1 (143 rows of data) and the first three columns only (Store, Date, Weekly_Sales). Prior to running the function, I set the index as the Date column.
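
The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used: the inline sample is a hypothetical stand-in for walmart_sales_small.csv, and the Holiday_Flag column, the values, and the ISO date format are assumptions.

```python
import io

import pandas as pd

# Hypothetical stand-in for walmart_sales_small.csv (assumed column names,
# made-up values); the real file has more rows and columns.
sample_csv = io.StringIO(
    "Store,Date,Weekly_Sales,Holiday_Flag\n"
    "1,2010-02-05,1643690.90,0\n"
    "1,2010-02-12,1641957.44,1\n"
    "2,2010-02-05,2136989.46,0\n"
)

raw = pd.read_csv(sample_csv, parse_dates=["Date"])

# Keep only Store 1 and the first three columns, then index by Date.
df = raw.loc[raw["Store"] == 1, ["Store", "Date", "Weekly_Sales"]].set_index("Date")

print(df)
```

With the real file, the same filter should leave the 143 rows mentioned above.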

@marcopeix
Author

@liu-jc, did you have time to take a look at this? It's blocking me in my progress! Thanks!

@chenghaoliu89
Contributor

Closed; this is the same issue as #122.
