Sweep: The export from pinecone fails due to some data type error #105

Open
actuallyabhi opened this issue Jul 19, 2024 · 3 comments
Labels
sweep Sweep your software chores

Comments

actuallyabhi commented Jul 19, 2024

Details

Fetching namespaces: 0% 0/1 [02:54<?, ?it/s]
Error: ("Could not convert '1719697028.0' with type str: tried to convert to double", 'Conversion failed for column created_at with type object')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf_cli.py", line 89, in main
    run_export(span)
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf_cli.py", line 149, in run_export
    export_obj = slug_to_export_func[args["vector_database"]](args)
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/pinecone_export.py", line 164, in export_vdb
    pinecone_export.get_data()
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/pinecone_export.py", line 481, in get_data
    index_meta = self.get_data_for_index(index_name)
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/pinecone_export.py", line 575, in get_data_for_index
    total_size += self.save_vectors_to_parquet(
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/vdb_export_cls.py", line 87, in save_vectors_to_parquet
    df.to_parquet(parquet_file)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py", line 2970, in to_parquet
    return to_parquet(
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parquet.py", line 483, in to_parquet
    impl.write(
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parquet.py", line 189, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 3874, in pyarrow.lib.Table.from_pandas
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py", line 624, in dataframe_to_arrays
    arrays[i] = maybe_fut.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py", line 598, in convert_column
    raise e
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py", line 592, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 340, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 86, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ("Could not convert '1719697028.0' with type str: tried to convert to double", 'Conversion failed for column created_at with type object')
Exporting fluidaigpt-dev: 0% 0/1 [02:56<?, ?it/s]
Final Step: Fetching vectors: 100% 14404/14404 [02:39<00:00, 90.24it/s]

Branch

No response

actuallyabhi added the sweep label Jul 19, 2024
Contributor

sweep-ai bot commented Jul 19, 2024

Sweeping

0%

💎 Sweep Pro: You have unlimited Sweep issues

Step 1: 🔎 Searching

I'm searching for relevant snippets in your repository. If this is your first time using Sweep, I'm indexing your repository, which will take a few minutes.


Tip

To recreate the pull request, edit the issue title or description.

greptile-apps bot commented Jul 19, 2024

To resolve the data type error during export from Pinecone, modify the save_vectors_to_parquet method in /src/vdf_io/export_vdf/vdb_export_cls.py to ensure the created_at column is converted to a double. Add the following code before calling df.to_parquet(parquet_file):

if 'created_at' in df.columns:
    df['created_at'] = df['created_at'].astype(float)

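This casts created_at to float64 so that pyarrow sees one consistent numeric type when writing the Parquet file. Note that astype(float) raises a ValueError if any value in the column cannot be parsed as a number.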
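
A slightly more tolerant variant of the cast above, shown only as a sketch: it assumes the column holds a mix of floats, numeric strings, and missing values, and that unparseable values may be stored as NaN. The helper name coerce_numeric_column and the sample DataFrame are hypothetical and are not part of vdf_io; pd.to_numeric with errors="coerce" is standard pandas.

```python
import pandas as pd


def coerce_numeric_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with `column` parsed as float64 (hypothetical helper)."""
    out = df.copy()
    if column in out.columns:
        # errors="coerce" turns values that cannot be parsed into NaN
        # instead of raising, so one bad record does not abort the export.
        out[column] = pd.to_numeric(out[column], errors="coerce").astype("float64")
    return out


# Hypothetical sample: a float, a numeric string, and a missing value,
# which pandas stores together as an object-dtype column.
sample = pd.DataFrame(
    {
        "id": ["a", "b", "c"],
        "created_at": [1719697028.0, "1719697028.0", None],
    }
)

fixed = coerce_numeric_column(sample, "created_at")
print(fixed["created_at"].dtype)
```

Whether silently turning bad values into NaN is acceptable depends on the data; if not, the stricter astype(float) call fails loudly instead.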

References

/.github/ISSUE_TEMPLATE
/src/vdf_io/import_vdf/pinecone_import.py
/src/vdf_io/export_vdf_cli.py
/archive/example-VDF_META.json
/src/vdf_io/import_vdf
/src/vdf_io/notebooks/test_filtering_pc_log.ipynb
/src/vdf_io/notebooks
/src/vdf_io/import_vdf_cli.py
/docs/export_vdf_pinecone_help.txt
/.github/ISSUE_TEMPLATE/sweep-template.yml
/.github/ISSUE_TEMPLATE/support-for-new-vector-db.md
/src/vdf_io/export_vdf/vertexai_vector_search_export.py
/.github
/src/vdf_io
/src/vdf_io/scripts/push_to_hub_vdf.py
/docs
/src
/src/vdf_io/export_vdf/pinecone_export.py
/archive
/src/vdf_io/notebooks/kdbai_end_to_end_vectorIO.ipynb
/README.md
/src/vdf_io/export_vdf/vdb_export_cls.py

About Greptile

This response provides a starting point for your research, not a precise solution.

Member

dhruv-anand-aintech commented Jul 19, 2024

What type is the created_at column in your original index?
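
One way to answer this, sketched under the assumption that the vector metadata is already available as a list of plain dicts: tally the Python type of the field across records. The function count_value_types and the sample records are hypothetical; fetching the metadata from the index is not shown.

```python
from collections import Counter


def count_value_types(records, field):
    """Count how often each Python type appears for `field` (hypothetical helper)."""
    counts = Counter()
    for metadata in records:
        if field in metadata:
            counts[type(metadata[field]).__name__] += 1
        else:
            # Track records that lack the field entirely.
            counts["<missing>"] += 1
    return dict(counts)


# Hypothetical metadata records; real ones would come from the index.
records = [
    {"created_at": 1719697028.0},
    {"created_at": "1719697028.0"},
    {"source": "upload"},
]

type_counts = count_value_types(records, "created_at")
print(type_counts)
```

More than one type in the result would be consistent with the object dtype and the string-to-double conversion failure reported in the traceback.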
