Custom date logic to extend effective range of date type #39

Merged
merged 16 commits into from
Oct 22, 2024


ADBond
Owner

@ADBond ADBond commented Oct 22, 2024

Native date types in ClickHouse don't support dates before 1900. This is fine for most use cases, but occasionally one may want to link data involving earlier dates, for instance historical census data.

This PR adds custom logic to handle such dates. Specifically:

  • A custom SQL UDF, days_since_epoch, that converts a string (in standard ISO 8601 format) into a signed integer representing the number of days since 1970-01-01 (negative values meaning days before this date)
  • An extension of Splink's ColumnExpression that adds a transform for parsing a date into an integer of this type
  • Versions of cll.AbsoluteDateDifferenceLevel, cl.AbsoluteDateDifferenceAtThresholds, and cl.DateOfBirthComparison that all use this type under the hood, either with a pre-made integer column or by doing the parsing on the fly

Dates are treated as being in the proleptic Gregorian calendar.
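
To illustrate the conversion described above, here is a minimal Python sketch of a days-since-epoch calculation in the proleptic Gregorian calendar. It is not the SQL UDF from this PR: the function bodies, the helper name absolute_day_difference, and the assumption that input is always a well-formed YYYY-MM-DD string are all illustrative only. The arithmetic is the well-known "days from civil" approach, which needs only integer operations and so does not rely on any native date type.

```python
from datetime import date


def days_since_epoch(iso_date: str) -> int:
    """Signed number of days from 1970-01-01 to iso_date (YYYY-MM-DD).

    Illustrative sketch only: assumes a well-formed, non-negative year
    and performs no validation. Uses the proleptic Gregorian calendar.
    """
    year, month, day = (int(part) for part in iso_date.split("-"))
    # Treat March as the first month so the leap day falls at year end.
    if month <= 2:
        year -= 1
    era = year // 400                      # 400-year Gregorian cycle
    year_of_era = year - era * 400         # in [0, 399]
    shifted_month = month - 3 if month > 2 else month + 9
    day_of_year = (153 * shifted_month + 2) // 5 + day - 1
    day_of_era = (
        year_of_era * 365 + year_of_era // 4 - year_of_era // 100 + day_of_year
    )
    # 146097 days per 400-year era; 719468 shifts the origin to 1970-01-01.
    return era * 146097 + day_of_era - 719468


def absolute_day_difference(left: str, right: str) -> int:
    """Hypothetical helper: absolute difference in days between two dates."""
    return abs(days_since_epoch(left) - days_since_epoch(right))


if __name__ == "__main__":
    print(days_since_epoch("1970-01-01"))  # 0
    print(days_since_epoch("1969-12-31"))  # -1
    print(days_since_epoch("1900-01-01"))  # -25567
    # Cross-check a pre-1900 date against Python's own proleptic calendar.
    expected = (date(1850, 6, 15) - date(1970, 1, 1)).days
    print(days_since_epoch("1850-06-15") == expected)
```

Once dates are plain signed integers, an absolute date difference is just an integer subtraction, which is why the comparison levels can work on either a pre-computed integer column or a value parsed on the fly.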

@ADBond ADBond force-pushed the feature/custom-date-logic branch from 7b22cfe to 9108102 Compare October 22, 2024 19:24
@ADBond ADBond merged commit 8dc5042 into main Oct 22, 2024
15 checks passed
@ADBond ADBond deleted the feature/custom-date-logic branch October 22, 2024 19:43