Bug when preparing data for finetuning #122

Closed
marcopeix opened this issue Sep 11, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@marcopeix

I've run into a bug that I can't fix when trying to prepare a dataset for finetuning.

Here's the code:

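from collections.abc import Generator
from pathlib import Path
from typing import Any

import pandas as pd
from datasets import Dataset, Features, Sequence, Value

# df holds the Store == 1 rows, indexed by Date
def data_generator() -> Generator[dict[str, Any], None, None]:
    yield {
        "target": df["Weekly_Sales"].to_numpy(),
        "start": df.index[0],
        "freq": pd.infer_freq(df.index),
        "item_id": "1",
    }

features = Features(
    dict(
        target=Sequence(Value("float32")),
        start=Value("date32"),
        freq=Value("string"),
        item_id=Value("string"),
    )
)

hf_dataset = Dataset.from_generator(data_generator, features=features)

hf_dataset.save_to_disk(Path("sales_dataset/"))

# Rebinds df to the exported Hugging Face dataset
df = hf_dataset.to_pandas()

df.to_csv("sales_dataset/sales_data.csv", index=False)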

Then, when I run python -m uni2ts.data.builder.simple sales_data sales_dataset/sales_data.csv --offset 40 --dataset_type long, I get the error:

IndexError: index 0 is out of bounds for axis 0 with size 0

I'm not sure why that happens, as my df is not empty, and the .csv is not empty either.

Here's the CSV I'm using: https://raw.githubusercontent.com/marcopeix/FoundationModelsForTimeSeriesForecasting/main/data/walmart_sales_small.csv

I'm only using data for Store==1 (143 rows of data) and the first three columns only (Store, Date, Weekly_Sales). Prior to running the function, I set the index as the Date column.
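For reference, the preprocessing described above looks roughly like the sketch below. The sample rows, their values, and the fourth column name are made up for illustration and are not taken from the linked CSV; only the Store, Date and Weekly_Sales column names come from the description.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the real CSV (values are invented).
sample_csv = """Store,Date,Weekly_Sales,Extra
1,2010-02-05,1500000.0,0
1,2010-02-12,1600000.0,1
1,2010-02-19,1550000.0,0
1,2010-02-26,1400000.0,0
2,2010-02-05,2100000.0,0
"""

raw = pd.read_csv(io.StringIO(sample_csv), parse_dates=["Date"])

# Keep Store == 1 and the first three columns, then index by Date.
df = raw.loc[raw["Store"] == 1, raw.columns[:3]].set_index("Date")

# Weekly dates let pandas infer a frequency string for the "freq" field.
freq = pd.infer_freq(df.index)
print(df.shape, freq)
```

With the real file, the same steps should leave 143 rows for Store 1.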

What am I missing?

marcopeix added the bug label Sep 11, 2024
@gorold
Contributor

gorold commented Sep 30, 2024

I didn't look too deeply into this, but I'm guessing it's due to the format (column names) of your data frame. See this line:

item_df = df.query(f'item_id == "{item_id}"').drop("item_id", axis=1)
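One way a filter like this can come back empty, offered as an unconfirmed guess rather than a diagnosis: when a CSV is read with pandas' default type inference, an item_id of 1 is parsed as an integer, so a comparison against the string "1" matches nothing, and taking index[0] of the empty result raises an IndexError. The sketch below uses a made-up three-column CSV (the timestamp and target column names are assumptions, not the builder's documented layout) and a plain boolean mask in place of query.

```python
import io

import pandas as pd

# Hypothetical long-format CSV; column names other than item_id are assumptions.
csv_text = "timestamp,item_id,target\n2010-02-05,1,10.0\n2010-02-12,1,12.5\n"

# Default parsing infers item_id as an integer column.
df_int = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
empty = df_int[df_int["item_id"] == "1"]  # integer column vs. string: no matches

try:
    empty.index[0]
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(len(empty), error_message)

# Forcing item_id to be read as a string makes the same comparison match.
df_str = pd.read_csv(
    io.StringIO(csv_text), index_col=0, parse_dates=True, dtype={"item_id": str}
)
matched = df_str[df_str["item_id"] == "1"]
print(len(matched))
```

If this is what is happening, checking df.dtypes right after loading the CSV would show whether item_id came back as an integer.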

@chenghaoliu89
Contributor

Hi @marcopeix, have you solved this issue? If so, I will close it.


3 participants