Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify Inserts of Default Values/Dictionaries #107

Open
genzgd opened this issue Feb 1, 2023 · 6 comments
Open

Simplify Inserts of Default Values/Dictionaries #107

genzgd opened this issue Feb 1, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@genzgd
Copy link
Collaborator

genzgd commented Feb 1, 2023

Currently the Client insert method requires the caller to create a complete, populated dataset, including any default values. There should be mechanism where missing/default values are automatically supplied for None values when inserting into a non-nullable columns.

This is closely related to an interface which allows building an insert of Map/Dictionaries using the map/dict keys to determine column names. Missing keys would use the ClickHouse default value.

Determining "non-standard" default values from the "Describe Table" query to ClickHouse is somewhat challenging, as the default expression is a string that ClickHouse Connect would have to evaluate.

@genzgd genzgd added the enhancement New feature or request label Feb 1, 2023
@PHameete
Copy link

Hi @genzgd are there any plans to implement this behavior into clickhouse-connect soon?

And if not: are you open to accept a PR for this?

@genzgd
Copy link
Collaborator Author

genzgd commented Dec 27, 2023

This is not currently a priority, so a PR in this area would be welcome.

Just a warning, one of the reasons that it's not a priority is because it's a tricky problem. When inserting data using Native format, there's no way to specify that the inserted value should be converted to the ClickHouse default value (although this functionality was implemented for RowBinary using a new format). So I don't see any easy way to make it work for calculated default values like now() or column_1 * column_2.

One possible solution would be to do such inserts using a JSON or RowBinaryWithDefaults format, but that would be significantly slower and require a fair amount of additional code.

Because of the complexity of the problem, I would recommend outlining your planned approach here for review before writing a lot of code.

@PHameete
Copy link

Got it - Thanks for elaborating on that. That does sound more complex than I was hoping it'd be ;-)

It also does sound like RowBinaryWithDefaults would be the way to go, to ensure it would work with the calculated values you describe.

Why would it be significantly slower - if I understand correctly there's an extra byte added to each field?

@genzgd
Copy link
Collaborator Author

genzgd commented Dec 28, 2023

This library originally started with the RowBinary format -- if you look at very old (pre 0.3.0) versions, you'll see the original code. However, it turns out to be very slow to call a separate conversion function on each value when building (or reading) the data for the binary row. The ClickHouse Native format is columnar, so hundreds or thousands of values in the same format can be converted using a loop in one function without the need for (expensive) Python function call overhead. Such functions can also be translated to C and optimized. The performance difference is at least 10x based on that early code.

So adding the extra byte is not really a big issue -- the problem is essentially you would you have to rewrite all of the currently optimized serialization code just to get a much slower result. The real answer is probably to push harder for support for a NativeWithDefaults format in ClickHouse server that would add an additional "defaults" column similar to the current "nullable" column, but that's going to be a non-trivial amount of C++ work.

@PHameete
Copy link

I see - Thanks for the explanation. Going back and losing that 10x performance improvement indeed seems very undesirable. So it makes sense to not pick this up until NativeWithDefaults is supported. I can ask for support for that. There's some good case to be made to have it :]

@mshustov
Copy link
Member

depdens on ClickHouse/ClickHouse#58662

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants