Resources that support parameters (warehouse, database, schema, table, task) require at least two metadata queries to fetch, typically a `SHOW` command and a `SHOW PARAMETERS IN` command. For example, in Titan, `Schema` supports the `max_data_extension_time_in_days` parameter.
For most fetch methods, Titan uses the broadest possible `SHOW` command (e.g. `SHOW TABLES IN ACCOUNT`) and caches the result. This minimizes queries to Snowflake. In the best case, only a single `SHOW` command is needed for all resources of the same type in the blueprint.
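To make the caching behavior concrete, here is a minimal sketch. `ShowCache`, `fake_execute`, and `fetch_table` are hypothetical names invented for illustration, not Titan's actual internals, and the canned rows stand in for a real Snowflake cursor.

```python
class ShowCache:
    """Hypothetical cache that runs each broad SHOW command at most once."""

    def __init__(self, execute):
        self._execute = execute  # callable: SQL string -> list of row dicts
        self._results = {}
        self.query_count = 0

    def show(self, sql):
        # Only hit the backend the first time a given command is seen.
        if sql not in self._results:
            self.query_count += 1
            self._results[sql] = self._execute(sql)
        return self._results[sql]


def fake_execute(sql):
    # Stand-in for a Snowflake cursor; returns canned rows (assumption).
    return [
        {"name": "ORDERS", "schema_name": "PUBLIC"},
        {"name": "USERS", "schema_name": "PUBLIC"},
    ]


def fetch_table(cache, name):
    # Every table lookup is served from the same account-wide result.
    rows = cache.show("SHOW TABLES IN ACCOUNT")
    return next((row for row in rows if row["name"] == name), None)


cache = ShowCache(fake_execute)
orders = fetch_table(cache, "ORDERS")
users = fetch_table(cache, "USERS")
```

In this sketch both lookups share one `SHOW TABLES IN ACCOUNT` call, so `cache.query_count` stays at 1 no matter how many tables are fetched.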
However, there is no way to retrieve resource parameters in bulk, so a `SHOW PARAMETERS` command must be run for every single resource that supports parameters. The number of metadata queries therefore grows with the number of such resources, which makes fetching unnecessarily slow.
My proposal is to handle parameters similarly to tags: accept parameters as named kwargs just like today, but instead of storing them in the `_data` member, make a call to `set_parameters(...)`, which will selectively create a new resource type: `ResourceParameters`.
When a user specifies parameters for a resource, Titan will track and fetch those parameters separately. Critically: if parameters aren't specified by a user, Titan will skip fetching them, saving time.
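A rough sketch of how this could look follows. Only `set_parameters`, `ResourceParameters`, `_data`, and `max_data_extension_time_in_days` come from the proposal; the `Schema` stand-in, `PARAMETER_NAMES`, and `fetch_queries` are assumptions made up for illustration and do not reflect Titan's real classes.

```python
from dataclasses import dataclass


@dataclass
class ResourceParameters:
    """Hypothetical shape of the proposed ResourceParameters resource."""
    resource_name: str
    params: dict


class Schema:
    # Kwargs this sketch treats as parameters rather than regular fields.
    PARAMETER_NAMES = {"max_data_extension_time_in_days"}

    def __init__(self, name, **kwargs):
        # Split parameter kwargs away from ordinary fields.
        params = {k: kwargs.pop(k) for k in list(kwargs) if k in self.PARAMETER_NAMES}
        self._data = {"name": name, **kwargs}
        self.parameters = None
        self.set_parameters(params)

    def set_parameters(self, params):
        # Create the companion resource only if the user set something.
        specified = {k: v for k, v in params.items() if v is not None}
        if specified:
            self.parameters = ResourceParameters(self._data["name"], specified)


def fetch_queries(schema):
    # Metadata queries needed to fetch this schema (illustrative only).
    queries = ["SHOW SCHEMAS IN ACCOUNT"]
    if schema.parameters is not None:
        queries.append(f"SHOW PARAMETERS IN SCHEMA {schema._data['name']}")
    return queries


plain = Schema("RAW", comment="landing zone")
tuned = Schema("ANALYTICS", max_data_extension_time_in_days=7)
```

Here `plain` needs only the shared `SHOW` query, while `tuned` also needs its own `SHOW PARAMETERS IN SCHEMA` query because the user asked for a parameter to be managed.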
This design has trade-offs. During `apply`, when a resource with parameters is created for the first time, Titan will run two commands (`CREATE` and `ALTER`) instead of just one. To me, this trade-off is acceptable. The proposed design will significantly reduce the queries required for `plan` and `export`, and it has no performance impact on `apply` if parameter fields aren't used.
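The `apply` trade-off can be sketched as follows. `apply_commands` is a hypothetical helper, and the SQL strings are simplified illustrations rather than the statements Titan would actually generate.

```python
def apply_commands(resource_type, name, parameters=None):
    """Hypothetical helper listing the commands run when a resource is first created."""
    commands = [f"CREATE {resource_type} {name}"]
    if parameters:
        # Parameters would be applied in a second statement after creation.
        assignments = " ".join(f"{key.upper()} = {value}" for key, value in parameters.items())
        commands.append(f"ALTER {resource_type} {name} SET {assignments}")
    return commands


without_params = apply_commands("SCHEMA", "RAW")
with_params = apply_commands("SCHEMA", "ANALYTICS", {"max_data_extension_time_in_days": 7})
```

A resource without parameters still costs a single `CREATE`, while one with parameters costs a `CREATE` plus an `ALTER`.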
There are issues with drift that also need to be considered.