Replies: 2 comments
Thanks @simolus3 for this important proposal and for the detailed clarification. Well, my use case is based on spatial data through PostGIS and the SpatiaLite extension in SQLite, which depends on geometries as a binary data type. And as you mentioned, the binary upload would be through a separate RPC endpoint.
In theory, now that we support custom domains on Postgres, it should be possible to use that. You can essentially create a custom domain that persists data as a
Motivation
Fundamentally, PowerSync evaluates and syncs data as JSON text. Because every part of the stack (from SQLite to the SDKs) has great JSON support, this is a reasonable decision that keeps the protocol simple.
However, it also means that we can only sync the following SQLite data types:
- `null`
- `integer`
- `real`
- `text`
Notably missing is support for binary data (the `BLOB` type in SQLite). For larger binary data such as images, our recommendation is to store it outside of the primary database, instead using references to attachment IDs to drive that sync process.
Especially for smaller binary data though, redesigning a database to use attachments is a significant change. Given that both SQLite and all source databases support binary data, the lack of support for it is a pain point for some users.
One workaround is to `hex()` data in sync stream definitions and to use raw tables / views to `unhex()` it on the client.
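As a rough sketch of that workaround (stream, table, and column names are made up, and `unhex()` requires SQLite 3.41+): the stream definition would select `hex(data) AS data_hex`, and a client-side view over the raw table decodes it again:

```typescript
import { PowerSyncDatabase } from '@powersync/web';

// Sketch only: assumes a raw table `items` whose `data_hex` column
// receives the hex text produced by hex() in the stream definition.
async function createDecodedView(db: PowerSyncDatabase): Promise<void> {
  await db.execute(`
    CREATE VIEW IF NOT EXISTS items_decoded AS
      SELECT id, unhex(data_hex) AS data FROM items
  `);
}
```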
Status
2025-09-08: Proposal created
Feedback wanted!
We see that support for binary data is a popular request. As this proposal shows, however, it requires changes to the entire PowerSync stack. And especially for Supabase, uploading binary data requires a custom procedure.
So, we're curious to hear from users interested in binary data:
a. If it's not small (say >1MB), why are attachment helpers not an option for you?
b. If it is small, why is serializing binary data with `hex` / `unhex` not an option?
Proposal
Handling binary data requires changes across the entire stack:
Bucket storage
To store evaluated rows with binary data in them, JSON is not a viable option. Given that we're already using BSON in a few places, that seems like a reasonable alternative.
Our checksum calculation currently uses strings, but is fundamentally based on SHA-256, which natively works on bytes. Changing that means the checksum for identical rows would also change, but since this only affects newly-processed rows, that may be fine (TODO: investigate).
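To illustrate the string-vs-bytes distinction (this is only the hashing step, not the actual checksum algorithm):

```typescript
import { createHash } from 'node:crypto';

// Today: the hash input is a string representation of the row.
function checksumOfJson(rowJson: string): string {
  return createHash('sha256').update(rowJson, 'utf8').digest('hex');
}

// With BSON storage, the hash input could be the encoded bytes, so
// binary values never need a lossy text form. Even for an identical
// row, this yields a different digest than the string-based version,
// which is why checksums for newly-processed rows would change.
function checksumOfBson(rowBson: Uint8Array): string {
  return createHash('sha256').update(rowBson).digest('hex');
}
```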
Protocol changes
Since not all transport protocols (or clients) support binary data, this would have to be opt-in. Clients could set a flag in the sync request indicating that they're ready to process binary data, and then the sync service could detect whether a compatible transport mode is used. If that is the case, we'd sync BSON documents potentially containing binary data.
If not, we'd have to decode the documents, strip out binary values and re-encode them as JSON for older clients. This would make the sync process slightly more expensive on the sync service side.
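A sketch of what that could look like on the service side (the `binary_data` flag and the helper here are hypothetical, not the actual protocol):

```typescript
// Hypothetical request shape: the client declares that it can handle
// BSON sync lines; the service additionally checks the transport.
interface SyncStreamRequest {
  binary_data?: boolean; // made-up flag name
  // ... existing request fields unchanged
}

// Stand-in for decoding the stored BSON document, dropping binary
// fields, and re-encoding the rest as JSON for older clients.
declare function stripBinaryAndEncodeJson(doc: Uint8Array): string;

function encodeSyncLine(storedBson: Uint8Array, req: SyncStreamRequest): Uint8Array | string {
  if (req.binary_data === true) {
    return storedBson; // forward the BSON document as-is
  }
  return stripBinaryAndEncodeJson(storedBson); // legacy, slightly more expensive path
}
```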
Another option could be to split evaluated rows into two objects: a JSON object containing non-binary data, and a BSON object for binary fields. For the majority of rows, which don't contain binary data, no additional processing would be necessary with that mode. Clients would have to merge these values together, but since this requires client-side changes either way it shouldn't be a problem.
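On the client, that merge could look roughly like this (shapes illustrative):

```typescript
// Illustrative shapes for the split representation.
interface EvaluatedRow {
  data: string; // JSON object with all non-binary columns
  binary: Uint8Array | null; // BSON document with binary columns, or null
}

// Stand-in for the small BSON decoder in the core extension.
declare function decodeBsonBinaryFields(doc: Uint8Array): Record<string, Uint8Array>;

// Merge the two parts back into one row.
function mergeRow(row: EvaluatedRow): Record<string, unknown> {
  const merged: Record<string, unknown> = JSON.parse(row.data);
  if (row.binary !== null) {
    Object.assign(merged, decodeBsonBinaryFields(row.binary));
  }
  return merged;
}
```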
Sync client changes
On clients, we would like to keep using standard JSON values where possible.
Again, we could use the trick of keeping the JSON subset intact, and only introduce a separate BSON object for binary values (setting it to `null` for rows without binary data).
The `ps_data` tables would have to be migrated to add the binary column, which would then allow copying it from oplog entries in the `sync_local` process.
To expose binary data from PowerSync-managed views, we'd need a custom SQL function parsing and extracting from BSON.
To support binary data for raw tables, the "extract from column" source would have to take the separate binary value into account.
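For the view part, a hypothetical extraction function could be used like this (the name `powersync_bson_extract` and the table layout are made up for illustration):

```typescript
// Sketch: how a PowerSync-managed view could surface a binary column,
// assuming a powersync_bson_extract(doc, field) SQL function provided
// by the core extension.
const viewSql = `
  CREATE VIEW items AS
    SELECT
      id,
      data ->> 'description' AS description,
      powersync_bson_extract(binary_data, 'thumbnail') AS thumbnail
    FROM ps_data__items
`;
```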
Another issue is what we'd do with mismatched client schemas. Today, a column defined as an `int` on the client can be interpreted from a synced text value because we insert a `CAST` operation parsing the value. If we split data into JSON and a binary object however, reading an `int` column would look up values from the JSON object. So if the actual value was a BLOB, the result would be `NULL`. Similarly, interpreting a value synced as text as a `BLOB` would return `NULL` because the value would not be present in the BSON column. This may be acceptable, but it's still a weird distinction we don't have for other types.
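Schematically, reusing the hypothetical `powersync_bson_extract` from above:

```typescript
// A column declared as `int` only consults the JSON object. If the
// synced value was actually a BLOB, it lives in the BSON document,
// so this expression yields NULL:
const intColumn = `CAST(data ->> 'count' AS INTEGER)`;

// A column declared as `blob` only consults the BSON document. If the
// value was synced as text, it lives in the JSON object, so this also
// yields NULL:
const blobColumn = `powersync_bson_extract(binary_data, 'count')`;
```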
Local writes
PowerSync detects local writes with triggers inserting into `ps_crud`. Depending on table options, we'd either insert the entire updated row or just the updated fields.
Supporting binary data requires multiple changes to that table:
- `ps_crud`: Similar to the other tables, this would need a BSON column for binary values in addition to the existing JSON column.
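A rough sketch of what an insert trigger writing to that column could look like (simplified; `powersync_bson_values` is a made-up helper bundling the binary columns of the changed row into one BSON document):

```typescript
// Simplified sketch, not the actual trigger PowerSync generates.
const insertTriggerSql = `
  CREATE TRIGGER items_insert AFTER INSERT ON items BEGIN
    INSERT INTO ps_crud (data, binary_data) VALUES (
      json_object(
        'op', 'PUT', 'type', 'items', 'id', NEW.id,
        'data', json_object('description', NEW.description)
      ),
      powersync_bson_values('thumbnail', NEW.thumbnail)
    );
  END
`;
```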
CRUD representation
While we have a small BSON implementation in the core extension (needed for binary sync lines), we'd like to avoid introducing BSON to all SDKs. So if binary values for local writes are represented as BSON, how do we load that into the SDKs?
We could introduce a helper that allows SDKs to interpret these values without a full BSON decoder, e.g.
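One possible shape, purely illustrative (`powersync_bson_entries` is a made-up table-valued function the core extension could expose):

```typescript
import { PowerSyncDatabase } from '@powersync/web';

// Sketch: the core extension exposes the binary entries of a BSON
// document as rows, so SDKs never need a BSON decoder of their own.
async function readBinaryValues(
  db: PowerSyncDatabase,
  bsonDoc: Uint8Array,
): Promise<Map<string, Uint8Array>> {
  const rows = await db.getAll<{ key: string; value: Uint8Array }>(
    'SELECT key, value FROM powersync_bson_entries(?)',
    [bsonDoc],
  );
  return new Map(rows.map((row) => [row.key, row.value]));
}
```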
Since SDKs expose binary values as a `Map<String, Object?>`, introducing a `Uint8List` value would not be breaking.
Finally, a popular backend connector (also offered as a builtin API in some SDKs) is postgrest with Supabase. Postgrest does not support uploading binary data as-is. For inserts, postgrest also doesn't allow specifying a SQL transformation for columns (so we can't do an `unhex($1)`).
So this would mean that support for binary data ends at the connector, with users being responsible for writing a binary-aware connector if they want to upload those values.
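For illustration, the upload side of such a binary-aware connector could hex-encode blob values and hand them to a custom RPC (`insert_item_with_blob` stands in for a user-defined Postgres function that decodes the hex back into `bytea` server-side):

```typescript
import { createClient } from '@supabase/supabase-js';

const supabase = createClient('https://your-project.supabase.co', 'public-anon-key');

// Runtime-agnostic hex encoding for blob values.
const toHex = (bytes: Uint8Array): string =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');

// Sketch of uploading one row through a hypothetical RPC.
async function uploadRow(row: Record<string, unknown>): Promise<void> {
  const args: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(row)) {
    args[key] = value instanceof Uint8Array ? toHex(value) : value;
  }
  const { error } = await supabase.rpc('insert_item_with_blob', args);
  if (error) {
    throw error;
  }
}
```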