Conversation

@dvasilas
Collaborator

Adds methods to commit/get an offset in ClickHouse for a given bucket and raft session.

An offset is a timestamp, derived from the insertedAt column: the maximum insertedAt value in the processed log batch.

This will be used to commit progress after writing a log object to S3. The log discovery query uses offsets to filter out logs that have already been processed; a sketch of the interface follows.
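For readers, a minimal sketch of what such a commit/get pair could look like, assuming a hypothetical `log_offsets` table keyed by `(bucket, raft_session_id)` and a `database/sql` handle backed by a ClickHouse driver. All names and the schema are illustrative, not the PR's actual code:

```go
package logcourier

import (
	"context"
	"database/sql"
	"time"
)

// OffsetStore wraps a database/sql handle backed by a ClickHouse driver.
type OffsetStore struct {
	db *sql.DB
}

// CommitOffset records the max insertedAt of the batch just written to S3
// for the given bucket and raft session.
func (s *OffsetStore) CommitOffset(ctx context.Context, bucket string, raftSession int, offset time.Time) error {
	// Hypothetical table; a ReplacingMergeTree (or similar) engine would
	// keep only the latest row per (bucket, raft_session_id).
	_, err := s.db.ExecContext(ctx,
		`INSERT INTO log_offsets (bucket, raft_session_id, committed_at) VALUES (?, ?, ?)`,
		bucket, raftSession, offset)
	return err
}

// GetOffset returns the last committed offset; the zero time means
// "nothing committed yet", so discovery starts from the beginning.
func (s *OffsetStore) GetOffset(ctx context.Context, bucket string, raftSession int) (time.Time, error) {
	var offset time.Time
	err := s.db.QueryRowContext(ctx,
		`SELECT max(committed_at) FROM log_offsets WHERE bucket = ? AND raft_session_id = ?`,
		bucket, raftSession).Scan(&offset)
	return offset, err
}
```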


@fredmnl left a comment


Approved, provided we remove that last function. I think you meant to remove it (and I believe ClickHouse provides read-after-write consistency as long as you use the same replica, doesn't it?)

return nil
}

// WaitForOffset polls ClickHouse until the offset appears or times out.


The function is unused. Remove it?

If you want to keep it, I have some comments.

Collaborator Author


It's used in two tests, no? 🤔

I added it because I was not sure whether read-after-write consistency is guaranteed with the pattern NULL table -> materialized view -> storage table.

But it seems that the materialized view is invoked synchronously on insert, so we don't need to wait for updates to appear.

I will remove the function.
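(For context, the helper being removed is essentially a poll loop. A minimal sketch of the pattern, reusing the hypothetical OffsetStore from the sketch above; the poll interval and timeout are illustrative, not the PR's actual values:)

```go
// WaitForOffset polls until GetOffset reports at least want, or the
// timeout elapses. Errors from GetOffset are retried until the deadline.
func WaitForOffset(ctx context.Context, s *OffsetStore, bucket string, raftSession int, want time.Time, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()
	for {
		got, err := s.GetOffset(ctx, bucket, raftSession)
		if err == nil && !got.Before(want) {
			return nil // offset is visible
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // timed out before the offset appeared
		case <-ticker.C:
		}
	}
}
```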


Oooohhh, makes sense. I searched for the function name in the PR but didn't find it; maybe I had collapsed the test file. I think it's better to remove it and add it back later if the tests end up flaky.

ClickHouse operations are synchronous (excluding replication) and polling is unnecessary.
@codecov

codecov bot commented Nov 3, 2025

Codecov Report

❌ Patch coverage is 91.83673% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.15%. Comparing base (35a57b4) to head (f6cbeff).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/logcourier/offset.go | 87.09% | 2 Missing and 2 partials ⚠️ |
Additional details and impacted files

| Files with missing lines | Coverage Δ |
|---|---|
| pkg/logcourier/batchfinder.go | 88.67% <100.00%> (ø) |
| pkg/logcourier/logfetch.go | 89.28% <100.00%> (ø) |
| pkg/testutil/clickhouse.go | 90.62% <100.00%> (+2.35%) ⬆️ |
| pkg/logcourier/offset.go | 87.09% <87.09%> (ø) |
```diff
@@                  Coverage Diff                   @@
##           improvement/LOGC-6      #11      +/-   ##
======================================================
+ Coverage               85.42%   86.15%   +0.73%
======================================================
  Files                      11       12       +1
  Lines                     501      513      +12
======================================================
+ Hits                      428      442      +14
+ Misses                     53       50       -3
- Partials                   20       21       +1
```
| Flag | Coverage Δ |
|---|---|
| unit | 86.15% <91.83%> (+0.73%) ⬆️ |

Flags with carried forward coverage won't be shown.


Comment on lines +18 to +20
// 2. Offset semantics:
// - Offset is max(inserted_at) from processed log records
// - On restart, consumer processes logs with inserted_at > last committed offset


If multiple operations have the same inserted_at timestamp, could it then happen, in case of a failure, that we only handle the first operation and skip the others?

Collaborator Author


Currently, I think no. We treat all operations in each batch as a single unit, so there are no partial failures.

But there is a case in which multiple operations with the same inserted_at could become an issue:
For https://scality.atlassian.net/browse/LOGC-10, I am going to start limiting the size of log batches, which means that operations sharing an inserted_at might end up in different batches. After processing a batch, we would then skip the operations with that same inserted_at that were not included in it.
The fix, I think, is to add req_id to the offsets table; I will do it in LOGC-10. A sketch of the idea is below.
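(A sketch of the composite-offset idea, under the same assumptions as the earlier sketches; the raft_logs table, the payload column, and req_id as part of the offset are hypothetical, pending LOGC-10:)

```go
// fetchUnprocessed sketches a discovery query with a composite offset.
// ClickHouse compares tuples lexicographically, so rows with an equal
// inserted_at but a larger req_id are still picked up after a restart.
func fetchUnprocessed(ctx context.Context, db *sql.DB, bucket string, raftSession int, lastInsertedAt time.Time, lastReqID string) (*sql.Rows, error) {
	return db.QueryContext(ctx,
		`SELECT inserted_at, req_id, payload
		   FROM raft_logs
		  WHERE bucket = ? AND raft_session_id = ?
		    AND (inserted_at, req_id) > (?, ?)
		  ORDER BY inserted_at, req_id`,
		bucket, raftSession, lastInsertedAt, lastReqID)
}
```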

