Conversation

@dvasilas
Collaborator

Adds methods to commit/get an offset in ClickHouse for a given bucket and raft session.

An offset is a timestamp, derived from the insertedAt column: the maximum insertedAt value in the processed log batch.

This will be used to commit progress after writing a log object to S3. The log discovery query uses offsets to filter out logs that have already been processed; a sketch of the interface follows.
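For readers, a minimal sketch of what such a commit/get pair could look like, assuming a hypothetical `log_offsets` table keyed by `(bucket, raft_session_id)` and a `database/sql` handle backed by a ClickHouse driver. All names and the schema are illustrative, not the PR's actual code:

```go
package logcourier

import (
	"context"
	"database/sql"
	"time"
)

// OffsetStore wraps a database/sql handle backed by a ClickHouse driver.
type OffsetStore struct {
	db *sql.DB
}

// CommitOffset records the max insertedAt of the batch just written to S3
// for the given bucket and raft session.
func (s *OffsetStore) CommitOffset(ctx context.Context, bucket string, raftSession int, offset time.Time) error {
	// Hypothetical table; a ReplacingMergeTree (or similar) engine would
	// keep only the latest row per (bucket, raft_session_id).
	_, err := s.db.ExecContext(ctx,
		`INSERT INTO log_offsets (bucket, raft_session_id, committed_at) VALUES (?, ?, ?)`,
		bucket, raftSession, offset)
	return err
}

// GetOffset returns the last committed offset; the zero time means
// "nothing committed yet", so discovery starts from the beginning.
func (s *OffsetStore) GetOffset(ctx context.Context, bucket string, raftSession int) (time.Time, error) {
	var offset time.Time
	err := s.db.QueryRowContext(ctx,
		`SELECT max(committed_at) FROM log_offsets WHERE bucket = ? AND raft_session_id = ?`,
		bucket, raftSession).Scan(&offset)
	return offset, err
}
```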


@fredmnl left a comment


Approved, provided we remove that last function. I think you meant to remove it (and I believe ClickHouse provides read-after-write consistency as long as you use the same replica, doesn't it?)

return nil
}

// WaitForOffset polls ClickHouse until the offset appears or times out.


The function is unused. Remove it?

If you want to keep it, I have some comments.

Collaborator Author


It's used in two tests, no? 🤔

I added it because I was not sure whether read-after-write consistency is guaranteed with the pattern NULL table -> materialized view -> storage table.

But it seems that the materialized view is invoked synchronously on insert, so we don't need to wait for updates to appear.

I will remove the function.
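(For context, the helper being removed is essentially a poll loop. A minimal sketch of the pattern, reusing the hypothetical OffsetStore from the sketch above; the poll interval and timeout are illustrative, not the PR's actual values:)

```go
// WaitForOffset polls until GetOffset reports at least want, or the
// timeout elapses. Errors from GetOffset are retried until the deadline.
func WaitForOffset(ctx context.Context, s *OffsetStore, bucket string, raftSession int, want time.Time, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()
	for {
		got, err := s.GetOffset(ctx, bucket, raftSession)
		if err == nil && !got.Before(want) {
			return nil // offset is visible
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // timed out before the offset appeared
		case <-ticker.C:
		}
	}
}
```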


Oooohhh, makes sense. I searched for the function name in the PR but didn't find it; maybe I had collapsed the test file. I think it's better to remove it and add it back later if the tests end up flaky.

ClickHouse operations are synchronous (excluding replication) and polling is unnecessary.
@codecov

codecov bot commented Nov 3, 2025

Codecov Report

❌ Patch coverage is 91.83673% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.15%. Comparing base (35a57b4) to head (f6cbeff).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/logcourier/offset.go | 87.09% | 2 Missing and 2 partials ⚠️ |
Additional details and impacted files

| Files with missing lines | Coverage Δ |
|---|---|
| pkg/logcourier/batchfinder.go | 88.67% <100.00%> (ø) |
| pkg/logcourier/logfetch.go | 89.28% <100.00%> (ø) |
| pkg/testutil/clickhouse.go | 90.62% <100.00%> (+2.35%) ⬆️ |
| pkg/logcourier/offset.go | 87.09% <87.09%> (ø) |
```diff
@@                  Coverage Diff                   @@
##           improvement/LOGC-6      #11      +/-   ##
======================================================
+ Coverage               85.42%   86.15%   +0.73%
======================================================
  Files                      11       12       +1
  Lines                     501      513      +12
======================================================
+ Hits                      428      442      +14
+ Misses                     53       50       -3
- Partials                   20       21       +1
```
| Flag | Coverage Δ |
|---|---|
| unit | 86.15% <91.83%> (+0.73%) ⬆️ |

Flags with carried forward coverage won't be shown.


Comment on lines +18 to +20
// 2. Offset semantics:
// - Offset is max(inserted_at) from processed log records
// - On restart, consumer processes logs with inserted_at > last committed offset


If multiple operations have the same inserted_at timestamp, could it then happen, in case of a failure, that we only handle the first operation and skip the others?

Collaborator Author


Currently, I think no. We treat all operations in each batch as a single unit, so there are no partial failures.

But there is a case in which multiple operations with the same inserted_at could become an issue:
For https://scality.atlassian.net/browse/LOGC-10, I am going to start limiting the size of log batches, which means that operations sharing an inserted_at might end up in different batches. After processing a batch, we would then skip the operations with that same inserted_at that were not included in it.
The fix, I think, is to add req_id to the offsets table; I will do it in LOGC-10. A sketch of the idea is below.
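(A sketch of the composite-offset idea, under the same assumptions as the earlier sketches; the raft_logs table, the payload column, and req_id as part of the offset are hypothetical, pending LOGC-10:)

```go
// fetchUnprocessed sketches a discovery query with a composite offset.
// ClickHouse compares tuples lexicographically, so rows with an equal
// inserted_at but a larger req_id are still picked up after a restart.
func fetchUnprocessed(ctx context.Context, db *sql.DB, bucket string, raftSession int, lastInsertedAt time.Time, lastReqID string) (*sql.Rows, error) {
	return db.QueryContext(ctx,
		`SELECT inserted_at, req_id, payload
		   FROM raft_logs
		  WHERE bucket = ? AND raft_session_id = ?
		    AND (inserted_at, req_id) > (?, ?)
		  ORDER BY inserted_at, req_id`,
		bucket, raftSession, lastInsertedAt, lastReqID)
}
```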

