
[sql_server] Add mapping from SQL Server data types to Materialize data types #32040

Merged

merged 3 commits into MaterializeInc:main on Mar 28, 2025

Conversation

ParkMyCar (Member)

This PR adds types to describe tables and columns from MS SQL Server and how data will map into Materialize. On each new type I tried to document how it fits into the flow of a theoretical SQL Server source, but reviewers please feel free to push back if something isn't obvious. Also, the module comment in sql-server-util/src/desc.rs should (hopefully) add some helpful context.

The specific mappings from SQL Server data types into Materialize are described here; this PR follows that spec.
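To give a flavor of what such a mapping looks like, here is a simplified sketch in the spirit of this PR's `parse_data_type`. The function name and the string return values are illustrative only, not Materialize's actual API:

```rust
// Hypothetical, simplified sketch of a SQL Server -> Materialize type
// mapping; the names here are illustrative, not the PR's actual code.
fn map_sql_server_type(data_type: &str) -> Option<&'static str> {
    match data_type {
        "bit" => Some("boolean"),
        "tinyint" | "smallint" => Some("smallint"),
        "int" => Some("integer"),
        "bigint" => Some("bigint"),
        "real" => Some("real"),
        "char" | "nchar" | "varchar" | "nvarchar" | "sysname" => Some("text"),
        // Types without a supported mapping are rejected up front.
        _ => None,
    }
}
```

The real implementation also carries a decode strategy alongside each scalar type, as the snippets later in this thread show.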

For testing, this PR relies on some test helpers I added in MaterializeInc/tiberius#1.

Throughout the code you'll see a few comments like TODO(sql_serverX). The idea is that X identifies the milestone by which each item needs to be solved, roughly:

TODO(sql_server1): Before private preview
TODO(sql_server2): Before GA
TODO(sql_server3): Backlog

Motivation

Progress towards https://github.com/MaterializeInc/database-issues/issues/8762

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ParkMyCar (Member, Author)

@martykulma and @DAlperin if either of y'all don't feel comfortable reviewing this, feel free to let me know! Nothing here is really Materialize-specific; it's mostly just massaging data from one format to another.

@DAlperin (Contributor) left a comment

This looks sane to me. One comment about something I don't understand. I assume this will be followed by subsequent PRs?

"real" => (ScalarType::Float32, SqlServerColumnDecodeType::F32),
"double" => (ScalarType::Float64, SqlServerColumnDecodeType::F64),
"char" | "nchar" | "varchar" | "nvarchar" | "text" | "sysname" => {
// When the `max_length` is -1 SQL Server will not present us with the "before" value
Contributor:

What is the "before" value in this context?

Member (Author):

Updated the comment, hopefully it makes more sense now!

Basically, for sources, when an UPDATE or DELETE occurs in the upstream DB we need to get the old value of the row as it existed before it was updated or deleted, so we can emit (old_val, -1) into the dataflow. In SQL Server these data types don't always include the old value, so for the moment we can't support them.
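The (old_val, -1) convention described above can be sketched as differential updates. The types and function names below are hypothetical, for illustration; Materialize's real dataflow types differ:

```rust
// Sketch of update/delete handling as (row, diff) pairs. The `Diff`
// struct and these helpers are hypothetical, not Materialize's API.
#[derive(Debug, PartialEq)]
struct Diff<R> {
    row: R,
    diff: i64,
}

/// An upstream UPDATE becomes a retraction of the old row plus an
/// insertion of the new one.
fn update_to_diffs<R>(old: R, new: R) -> Vec<Diff<R>> {
    vec![Diff { row: old, diff: -1 }, Diff { row: new, diff: 1 }]
}

/// A DELETE becomes a retraction of the old row alone; this is why the
/// decoder needs the "before" value at all.
fn delete_to_diffs<R>(old: R) -> Vec<Diff<R>> {
    vec![Diff { row: old, diff: -1 }]
}
```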

@ParkMyCar (Member, Author)

TFTR!

Comment on lines 295 to 299
/// Maximum length (in bytes) of the column.
///
/// For `varchar(max)`, `nvarchar(max)`, `varbinary(max)`, or `xml` this will be `-1`. For
/// `text`, `ntext`, and `image` columns this will be 16.
pub max_length: i16,
Contributor:

Does raw.max_length == -1 imply unlimited length (e.g. ln 235 above) for all data types, or just for some?

Does json also return -1?

Member (Author):

It turns out json is a brand-new experimental type, only supported in Azure SQL as of 2024, so docs on it are sparse.

Thank you for the comment though! I went back and double-checked these values and realized parse_data_type(...) wasn't totally up to date. I also linked the SQL Server docs that call out what these values will be.

column: &'a ColumnType,
) -> Result<Datum<'a>, SqlServerError> {
let maybe_datum = match (self, &column.scalar_type) {
(SqlServerColumnDecodeType::Bool, ScalarType::Bool) => data
Contributor:

super nit: 😂 the tuples being backwards from parse_data_types may cause some discomfort for certain individuals

Member (Author):

Haha totally fair! I flipped them around!

Comment on lines 416 to 423
(SqlServerColumnDecodeType::DateTime, ScalarType::Interval) => data
.try_get(name)
.context("interval")?
.map(|val: chrono::DateTime<chrono::Utc>| {
let ts = val.try_into().context("parse timestamptz")?;
Ok::<_, SqlServerError>(Datum::TimestampTz(ts))
})
.transpose()?,
Contributor:

My understanding is that this decode is called on each row based on SqlServerTableDesc, which gets its column types from parse_data_type.

I wasn't seeing where ScalarType::Interval is set - is this supposed to be ScalarType::TimestampTz to match datetimeoffset?

Member (Author):

Great catch! This was stale from when I initially thought datetimeoffset was an interval-like type; it's actually a timestamptz type, and you're seeing that I didn't complete the refactor 😂

* flip order of arguments so it's consistent in the entire file
* update comment to include a link to the docs
* fix stale reference to Interval
@ParkMyCar ParkMyCar enabled auto-merge (squash) March 28, 2025 19:59
@ParkMyCar ParkMyCar merged commit d38c022 into MaterializeInc:main Mar 28, 2025
82 checks passed