Skip to content

MCR-3695 MCRMetadataManager: per-object locking for write methods#2929

Open
sebhofmann wants to merge 3 commits intomainfrom
issues/MCR-3695-MCRMetadataManager_per_object_locking
Open

MCR-3695 MCRMetadataManager: per-object locking for write methods#2929
sebhofmann wants to merge 3 commits intomainfrom
issues/MCR-3695-MCRMetadataManager_per_object_locking

Conversation

@sebhofmann
Copy link
Copy Markdown
Member

Link to jira

Summary

Introduces a per-MCRObjectID lock primitive on MCRMetadataManager so that all write methods (update/create/delete × MCRObject/MCRDerivate) are atomic, and exposes the lock as an opt-in MCRMetadataManager.lock(MCRObjectID) API for callers that need to protect their own read-modify-write spans across multiple manager calls.

The optimistic checkModificationDates check was a TOCTOU because it lived outside any lock; it now sits inside the locked critical section and is finally reliable.

MCRPIService.register() and MCRPIService.updateFlag() use the new caller-side lock(id) to fix the PI flag race that motivated this work — concurrent flag mutations no longer silently lose entries.

Design highlights

  • MCRMetadataManager.lock(MCRObjectID) returns an AutoCloseable MCRObjectLock (try-with-resources).
  • Backed by ConcurrentHashMap<MCRObjectID, LockEntry> + reference counting; entries dropped at refcount 0 (no map growth).
  • ReentrantLock per entry — EventHandlers can re-enter update() on the same id without deadlock.
  • tryLock with configurable timeout; throws MCRPersistenceException on timeout — caller can retry, surface, or abort.

Configuration

Property Default Meaning
MCR.Metadata.Manager.LockTimeoutSeconds 60 tryLock timeout before MCRPersistenceException

Scope

  • Single-JVM only. Multi-node clustering would require distributed locking; out of scope.
  • No public API breakage. Existing callers gain serialisation transparently. New failure modes for concurrent writers: lock timeout or now-reliable stale modifyDate — both MCRPersistenceException, which the API already declared.

Pull Request Checklist (Author)

Ticket & Documentation

  • The issue in the ticket is clearly described and the solution is documented.
  • Design decisions (if any) are explained.
  • The ticket references the correct source and target branches.
  • The fixed-version is correctly set in the ticket and matches the PR's target branch (main).

Feature & Improvement Specific Checks

  • Instructions on how to test or use the feature are included or linked (e.g. to documentation).
  • For UI changes: before & after screenshots are attached. (N/A — server-side change)
  • New features or migrations are documented. (Javadoc on lock() + property in mycore.properties)
  • Does this change affect existing applications, data, or configurations?
    • Yes: Is a migration required? If yes, describe it. (No migration; transparent for existing callers.)
    • Breaking change is marked in the commit message. (No breaking change.)

Bugfix-Specific Checks

(N/A — improvement, not bugfix)

Testing

  • I have tested the changes locally (CI=true mvn clean install green).
  • The feature behaves as described in the ticket.
  • Were existing tests modified? (No — only new tests added.)

MCR Conventions & Metadata

  • MCR naming conventions are followed.
  • If the public API has changed:
    • Old API is deprecated or a migration is documented. (N/A — only additions, no removal/deprecation.)
    • If not, no action needed. (API addition only.)
  • Java license headers are added where necessary.
  • Javadoc is written for non-self-explanatory classes/methods (Clean Code).
  • All configuration options are documented in Javadoc and mycore.properties.
  • No default properties are hardcoded — all set via mycore.properties.

Multi-Repo Considerations

  • Is an equivalent PR in MIR required?
    • If yes, is it already created? (No MIR change required — purely internal serialisation.)

Test plan

  • New tests:
    • MCRMetadataManagerLockTest — 5 unit tests on the lock primitive: reentrancy, cross-id parallelism, timeout, refcount cleanup, mutual exclusion.
    • MCRMetadataManagerConcurrentUpdateTest — 2 deterministic integration tests against a real metadata store. Both fail without the auto-lock:
      • updateBlocksWhileExternalLockHeld — caller holds lock(id) on a separate thread; main-thread update(id) must block until release.
      • forcedRaceProducesExactlyOneWinnerCyclicBarrier forces both threads to retrieve at the same modifyDate snapshot before either persists; with the lock, one wins and the other is rejected as stale; without the lock, lost update.
  • Verified: CI=true mvn clean install green on this branch.

Introduce a per-MCRObjectID lock primitive on MCRMetadataManager:

- Public lock(MCRObjectID) returns an AutoCloseable MCRObjectLock for
  use with try-with-resources.
- Backed by ConcurrentHashMap<MCRObjectID, LockEntry> with reference
  counting; entries dropped at refcount 0.
- ReentrantLock per entry — EventHandlers can re-enter update() on the
  same id without deadlock.
- tryLock with configurable timeout (MCR.Metadata.Manager.LockTimeoutSeconds,
  default 60 s); throws MCRPersistenceException on timeout.
- All six write methods (update/create/delete x MCRObject/MCRDerivate)
  acquire the lock automatically; checkModificationDates now sits inside
  the locked critical section, so the optimistic-lock check is reliable.
- MCRPIService.register() and MCRPIService.updateFlag() wrap their
  retrieve-mutate-update spans in caller-side lock(id) to fix the PI
  flag race that motivated this work.
AutoCloseable strongly recommends idempotent close. A second close()
on the same handle previously corrupted refcount/unlock state. Guard
with a closed flag (handle is single-thread by contract).
@kkrebs kkrebs requested a review from toKrause May 5, 2026 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant