You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
https://github.com/simonw/sqlite-diffable can be used to dump the content of a sqlite db as text files and reloading them, this is useful to keep track of data/schemas using git.
I tried to use it for dumping and re-loading the llm logs db and faced some difficulties. Not sure if these are actual problems that need fixing so I am opening this to ask the questions and provide some context.
AFAIU this happens because the database uses the FTS module for full-text search and this means that, upon creation and population of the responses_fts table, 4 other tables are automatically created and populated (responses_fts_data, responses_fts_idx, responses_fts_docsize, responses_fts_config), this is explained here https://www.sqlite.org/fts5.html#fts5_data_structures
So I tired to using the suggested --replace option and even though it leads to successful completion of the sqlite-diffable load command, it then errors when trying to query the database (redacted content for brevity)
$ sqlite-diffable load logs_recovered.db dump/ --replace
$ sqlite-utils tables logs_recovered.db --columns --table --counts
Traceback (most recent call last):
File "/Users/giuli077/.local/bin/sqlite-utils", line 8, in <module>
sys.exit(cli())
^^^^^
...
File "/Users/giuli077/.local/pipx/venvs/sqlite-utils/lib/python3.11/site-packages/sqlite_utils/db.py", line 1315, in count_where
return self.db.execute(sql, where_args or []).fetchone()[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/giuli077/.local/pipx/venvs/sqlite-utils/lib/python3.11/site-packages/sqlite_utils/db.py", line 533, in execute
return self.conn.execute(sql, parameters)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.DatabaseError: vtable constructor failed: responses_fts
I haven't dug into this specific error but I assume something goes wrong because the values of the response_fts_* tables that are automatically created when responses_fts is first loaded are replaced by values of the contents of those tables from the dump and that leads to some inconsistency in that table data.
I then decided to exclude from the load the host FTS-specific tables and only load the rest
this works but I noticed there is some inconsistency with the number of rows in the responses_fts_data (8 rows) and responses_fts_idx (6 rows) which in the original db had more rows (21 and 19 respectively)
I am not an expert on how these tables work, I had a look at the changes in responses_fts_idx.ndjson (which are vaguely human-readable compared to those in responses_fts_data.ndjson), it seems there are a bunch of removed rows with very similar content "b''" and also some others
full-text search still seems to work on the re-loaded logs_recovered_partial.db so I am not sure this is an actual problem.
My question(s): is this behaviour ok? Would the missing data in responses_fts_data and responses_fts_idx cause any potential issue that is not immediately obvious to me? or can I happlily rely on sqlite doing the right thing when it recreates and repopulates the responses_fts_* tables upon re-loading of responses_fts?
Note: I do realise this question might not be closely related to the llm cli tool and it is quite likely just a clarification about FTS and using sqlite-diffable with databases that use it.
Because this is the use-case I encountered I decided to ask in this project anyway :).
It is perhaps an opportunity to consider explicitly pointing people at sqlite-diffable as a way to version-control the conversations they have via llm (maybe to add somewhere in the docs).
The text was updated successfully, but these errors were encountered:
https://github.com/simonw/sqlite-diffable can be used to dump the content of a sqlite db as text files and reloading them, this is useful to keep track of data/schemas using git.
I tried to use it for dumping and re-loading the
llm
logs db and faced some difficulties. Not sure if these are actual problems that need fixing so I am opening this to ask the questions and provide some context.when trying to reload it
AFAIU this happens because the database uses the FTS module for full-text search and this means that, upon creation and population of the
responses_fts
table, 4 other tables are automatically created and populated (responses_fts_data
,responses_fts_idx
,responses_fts_docsize
,responses_fts_config
), this is explained here https://www.sqlite.org/fts5.html#fts5_data_structuresSo I tired to using the suggested
--replace
option and even though it leads to successful completion of the sqlite-diffable load command, it then errors when trying to query the database (redacted content for brevity)I haven't dug into this specific error but I assume something goes wrong because the values of the response_fts_* tables that are automatically created when responses_fts is first loaded are replaced by values of the contents of those tables from the dump and that leads to some inconsistency in that table data.
I then decided to exclude from the load the host FTS-specific tables and only load the rest
this works but I noticed there is some inconsistency with the number of rows in the
responses_fts_data
(8 rows) andresponses_fts_idx
(6 rows) which in the original db had more rows (21 and 19 respectively)I am not an expert on how these tables work, I had a look at the changes in
responses_fts_idx.ndjson
(which are vaguely human-readable compared to those inresponses_fts_data.ndjson
), it seems there are a bunch of removed rows with very similar content"b''"
and also some othersfull-text search still seems to work on the re-loaded
logs_recovered_partial.db
so I am not sure this is an actual problem.My question(s): is this behaviour ok? Would the missing data in
responses_fts_data
andresponses_fts_idx
cause any potential issue that is not immediately obvious to me? or can I happlily rely on sqlite doing the right thing when it recreates and repopulates the responses_fts_* tables upon re-loading ofresponses_fts
?Note: I do realise this question might not be closely related to the
llm
cli tool and it is quite likely just a clarification about FTS and usingsqlite-diffable
with databases that use it.Because this is the use-case I encountered I decided to ask in this project anyway :).
It is perhaps an opportunity to consider explicitly pointing people at
sqlite-diffable
as a way to version-control the conversations they have viallm
(maybe to add somewhere in the docs).The text was updated successfully, but these errors were encountered: