[sql_server] Add mapping from SQL Server data types to Materialize data types #32040
Conversation
…ns into Materialize tables and columns
@martykulma and @DAlperin if either of y'all don't feel comfortable reviewing this, feel free to let me know! Nothing here is really Materialize specific, mostly just massaging data from one format to another.
This looks sane to me. One comment about something I don't understand. I assume this will be followed by subsequent PRs?
"real" => (ScalarType::Float32, SqlServerColumnDecodeType::F32),
"double" => (ScalarType::Float64, SqlServerColumnDecodeType::F64),
"char" | "nchar" | "varchar" | "nvarchar" | "text" | "sysname" => {
    // When the `max_length` is -1 SQL Server will not present us with the "before" value
What is the "before" value in this context?
Updated the comment, hopefully it makes more sense now!
Basically, for sources, when an UPDATE or DELETE occurs in the upstream DB we need to get the old value of the row as it existed before it was updated or deleted, so we can emit (old_val, -1) into the dataflow. In SQL Server these data types don't always include the old value, so for the moment we can't support them.
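To make the retraction requirement concrete, here is a minimal, hypothetical sketch (not Materialize's actual API; all names are illustrative) of why the "before" value is needed: an upstream UPDATE has to become a retraction of the old row plus an insertion of the new one.

```rust
// Illustrative types only; the real source uses Materialize's Row/Diff machinery.
#[derive(Debug, Clone, PartialEq)]
struct Row(Vec<String>);

enum Change {
    Insert(Row),
    Update { before: Row, after: Row },
    Delete(Row),
}

/// Turn an upstream CDC change into (row, diff) pairs for the dataflow.
/// Both UPDATE and DELETE need the "before" value to emit the -1 retraction,
/// which is why column types that can't report it are unsupported for now.
fn to_diffs(change: Change) -> Vec<(Row, i64)> {
    match change {
        Change::Insert(row) => vec![(row, 1)],
        Change::Update { before, after } => vec![(before, -1), (after, 1)],
        Change::Delete(row) => vec![(row, -1)],
    }
}
```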
TFTR!
src/sql-server-util/src/desc.rs
/// Maximum length (in bytes) of the column.
///
/// For `varchar(max)`, `nvarchar(max)`, `varbinary(max)`, or `xml` this will be `-1`. For
/// `text`, `ntext`, and `image` columns this will be 16.
pub max_length: i16,
Does raw.max_length == -1 imply unlimited length (e.g. ln 235 above) for all data types, or just for some? Does json also return -1?
It turns out json is a brand new experimental type, only supported in Azure SQL as of 2024, so docs on it are sparse.
Thank you for the comment though! I went back and double checked these values and realized parse_data_type(...) wasn't totally up to date! I also linked the SQL Server docs that call out what these values will be.
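As a quick sketch of the `max_length` semantics documented above (the helper name is invented for illustration, not from the PR): `-1` marks the `(max)` variants and `xml`, while `text`/`ntext`/`image` always report 16 even though their data is effectively unbounded.

```rust
// Hypothetical helper illustrating how SQL Server's reported `max_length`
// relates to unbounded storage, per the doc comment on `max_length` above.
fn is_unbounded(data_type: &str, max_length: i16) -> bool {
    match data_type {
        // varchar(max), nvarchar(max), varbinary(max), and xml report -1.
        "varchar" | "nvarchar" | "varbinary" | "xml" => max_length == -1,
        // text, ntext, and image always report 16 (the size of the internal
        // pointer), but the column data itself is unbounded LOB storage.
        "text" | "ntext" | "image" => true,
        // Fixed-size types report their actual byte width.
        _ => false,
    }
}
```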
src/sql-server-util/src/desc.rs
column: &'a ColumnType,
) -> Result<Datum<'a>, SqlServerError> {
    let maybe_datum = match (self, &column.scalar_type) {
        (SqlServerColumnDecodeType::Bool, ScalarType::Bool) => data
super nit: 😂 the tuples being backwards from parse_data_types may cause some discomfort for certain individuals
Haha totally fair! I flipped them around!
src/sql-server-util/src/desc.rs
(SqlServerColumnDecodeType::DateTime, ScalarType::Interval) => data
    .try_get(name)
    .context("interval")?
    .map(|val: chrono::DateTime<chrono::Utc>| {
        let ts = val.try_into().context("parse timestamptz")?;
        Ok::<_, SqlServerError>(Datum::TimestampTz(ts))
    })
    .transpose()?,
My understanding is that this decode is called on each row based on SqlServerTableDesc, which gets its column types from parse_data_type.
I wasn't seeing where ScalarType::Interval is set - is this supposed to be ScalarType::TimestampTz to match datetimeoffset?
Great catch! This was stale from when I initially thought datetimeoffset was an interval-like type; it's actually a timestamptz type, and you're seeing that I didn't complete the refactor 😂
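The fix discussed here can be sketched in miniature (simplified, invented names; the real code pairs decode variants with Materialize scalar types in a larger match): the `DateTime`-style decoder for `datetimeoffset` must pair with `TimestampTz`, so a stale `Interval` arm would never match real column metadata.

```rust
// Simplified stand-ins for the real enums in src/sql-server-util/src/desc.rs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ScalarType { TimestampTz, Interval }

#[derive(Debug, Clone, Copy, PartialEq)]
enum DecodeType { DateTimeOffset }

/// Which Materialize scalar type a decoder variant is allowed to produce.
/// `datetimeoffset` carries a UTC offset, so it decodes to a timestamptz,
/// not an interval.
fn compatible(decode: DecodeType, scalar: ScalarType) -> bool {
    matches!(
        (decode, scalar),
        (DecodeType::DateTimeOffset, ScalarType::TimestampTz)
    )
}
```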
* flip order of arguments so it's consistent in the entire file
* update comment to include a link to the docs
* fix stale reference to Interval
This PR adds types to describe tables and columns from MS SQL Server and how data will map into Materialize. On each new type I tried to document how it fits into the flow of a theoretical SQL Server source, but reviewers please feel free to push back if something isn't obvious. Also, the module comment in sql-server-util/src/desc.rs should (hopefully) add some helpful context. The specific mappings from SQL Server data types into Materialize are described here; this PR follows that spec.
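As a rough sense of the shape of that mapping, here is a minimal sketch (a hypothetical string-level helper, not the PR's `parse_data_type`, and covering only a few types that appear in this review):

```rust
// Hypothetical, partial mapping from SQL Server type names to the
// corresponding Materialize type names, based on the arms discussed in
// this PR ("real" -> Float32, the char family -> text, and the
// datetimeoffset -> timestamptz fix from the review below is reflected).
fn map_type(sql_server_type: &str) -> Option<&'static str> {
    Some(match sql_server_type {
        "bit" => "boolean",
        "real" => "real",
        "char" | "nchar" | "varchar" | "nvarchar" | "sysname" => "text",
        "datetimeoffset" => "timestamp with time zone",
        // Unsupported or unknown types yield no mapping.
        _ => return None,
    })
}
```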
For testing, this PR relies on some test helpers I added in MaterializeInc/tiberius#1.
Throughout the code you'll see a few comments like TODO(sql_serverX); the idea here is that X defines the milestone by which these need to be solved, roughly:
* TODO(sql_server1): Before private preview
* TODO(sql_server2): Before GA
* TODO(sql_server3): Backlog
Motivation
Progress towards https://github.com/MaterializeInc/database-issues/issues/8762
Checklist
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.