Fix EssPower thread-safety race condition (#3500) #3602
rishabhvaish wants to merge 1 commit into OpenEMS:develop
The Solver thread calls Data.getConstraintsWithoutDisabledInverters(), which reads the esss list and calls Coefficients.of(). Neither method is synchronized. When the OSGi thread concurrently calls Data.addEss() → updateInverters() → Coefficients.initialize(), the initialize() call clears the coefficient list before rebuilding it. The Solver thread can see the new ESS (via CopyOnWriteArrayList) but find no coefficient for it, causing 'Coefficient was not found' errors.

This leads to the ESS operating at 0 W and requires manual intervention to recover. A production site reported 40+ hours of downtime from this race condition.

Fix:
- Synchronize Data read methods (getConstraintsWithoutDisabledInverters, etc.)
- Synchronize Coefficients.of()
- Change Coefficients.initialize() to build-then-swap instead of clear-then-rebuild

Fixes OpenEMS#3500

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Rishabh Vaish <[email protected]>
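The interleaving described in the commit message can be replayed deterministically in a single thread. The sketch below is a simplified stand-in, not the OpenEMS code: `RaceSketch`, its `readerHook` parameter, and the use of plain strings as coefficients are all assumptions made for illustration. The hook runs at the point where the Solver thread could be scheduled between `clear()` and the rebuild.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical, simplified stand-in for the coefficient registry. */
final class RaceSketch {
    private final List<String> coefficients = new ArrayList<>();

    /** Clear-then-rebuild. The hook marks where a reader thread could run. */
    void initialize(List<String> essIds, Runnable readerHook) {
        this.coefficients.clear();
        readerHook.run(); // window: the coefficient list is empty here
        this.coefficients.addAll(essIds);
    }

    /** Unsynchronized lookup, mirroring the pre-fix behaviour described above. */
    String of(String essId) {
        if (!this.coefficients.contains(essId)) {
            throw new IllegalStateException("Coefficient was not found: " + essId);
        }
        return essId;
    }

    /** Returns the error message seen by a reader inside the window, or null. */
    static String demo() {
        List<String> esss = new CopyOnWriteArrayList<>(List.of("ess0"));
        RaceSketch c = new RaceSketch();
        c.initialize(esss, () -> { });

        esss.add("ess1"); // the new ESS is already visible to readers
        String[] seen = new String[1];
        c.initialize(esss, () -> {
            try {
                c.of("ess1"); // reader runs between clear() and addAll()
            } catch (IllegalStateException e) {
                seen[0] = e.getMessage();
            }
        });
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this replay the lookup fails even though `ess1` is present in `esss`, which matches the symptom reported in #3500. Whether this is the only path to that error in production is not established by the sketch.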
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff (develop vs. #3602)

| Metric | develop | #3602 | +/- |
|---|---|---|---|
| Coverage | 58.60% | 58.51% | -0.08% |
| Complexity | 105 | 104 | -1 |
| Files | 3091 | 3095 | +4 |
| Lines | 134005 | 134207 | +202 |
| Branches | 9882 | 9870 | -12 |
| Hits | 78516 | 78516 | |
| Misses | 52590 | 52772 | +182 |
| Partials | 2899 | 2919 | +20 |
The synchronization is unnecessary:

The "atomic swap" is not actually atomic:

    this.coefficients.clear(); // list is empty here
    this.coefficients.addAll(newCoefficients); // list is filled here

The same window exists between clear() and addAll(). It only "works" because of() is now also synchronized - making the temporary list entirely redundant.

The real issue from #3500 is something else entirely. Adding synchronized to the Data getter methods introduces unnecessary lock contention on a hot path - getConstraintsWithoutDisabledInverters() runs every cycle (~1x/second) and performs
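For contrast with the `clear()` + `addAll()` sequence criticised above, here is a minimal sketch of a swap that is a single write: a fresh immutable map is built off to the side and published through one `volatile` reference. This is neither what the PR implements nor necessarily what the reviewer has in mind; `SnapshotCoefficients` and the integer "coefficient index" are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lock-free variant: readers always see a complete snapshot. */
final class SnapshotCoefficients {
    // Readers dereference this once; writers replace it in a single assignment.
    private volatile Map<String, Integer> byEssId = Map.of();

    void initialize(List<String> essIds) {
        Map<String, Integer> next = new HashMap<>();
        for (int i = 0; i < essIds.size(); i++) {
            next.put(essIds.get(i), i);
        }
        // One reference write: there is no empty intermediate state to observe.
        this.byEssId = Map.copyOf(next);
    }

    int of(String essId) {
        Integer index = this.byEssId.get(essId);
        if (index == null) {
            throw new IllegalStateException("Coefficient was not found: " + essId);
        }
        return index;
    }

    public static void main(String[] args) {
        SnapshotCoefficients c = new SnapshotCoefficients();
        c.initialize(List.of("ess0", "ess1"));
        System.out.println(c.of("ess1"));
    }
}
```

A snapshot like this removes the empty window inside the coefficient structure itself, but it does not by itself order the update against the separate `esss` list: a reader could still see a newly added ESS before the new snapshot is published.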
Summary
- Data.getConstraints* and Coefficients.of() are not synchronized, while Coefficients.initialize() clears-then-rebuilds (non-atomic)

Root cause
- Data.addEss() → updateInverters() → Coefficients.initialize() → clears coefficient list
- getConstraintsWithoutDisabledInverters() → sees new ESS in CopyOnWriteArrayList → calls Coefficients.of(essId) → coefficient not found (list was cleared)

Changes
Coefficients.java (io.openems.edge.ess.api)
- of() → synchronized: Prevents reading coefficients while initialize() is rebuilding them
- initialize() → build-then-swap: Coefficients are built in a temporary ArrayList first, then clear() + addAll() happen at the end while holding the monitor lock. No reader (via of()) can observe the empty intermediate state.

Data.java (io.openems.edge.ess.core)
- getConstraintsForAllInverters() → synchronized
- getConstraintsForInverters() → synchronized
- getConstraintsWithoutDisabledInverters() → synchronized

These methods read esss, inverters, coefficients, and symmetricMode, all of which are mutated by addEss() / removeEss() / updateInverters() (which are already synchronized on the same Data instance). Without synchronization, the Solver thread can observe partially-updated state.

Test plan

Fixes #3500
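The Coefficients change described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's diff: `LockedCoefficients`, the string-based coefficients, and the `stress` helper are hypothetical, and the sketch assumes both `initialize()` and `of()` lock the same monitor, as the description implies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of synchronized lookup plus build-then-swap. */
final class LockedCoefficients {
    private final List<String> coefficients = new ArrayList<>();

    synchronized void initialize(List<String> essIds) {
        // Build the new content in a temporary list first.
        List<String> fresh = new ArrayList<>(essIds);
        // The swap runs under the monitor, so of() cannot run in between.
        this.coefficients.clear();
        this.coefficients.addAll(fresh);
    }

    synchronized int of(String essId) {
        int index = this.coefficients.indexOf(essId);
        if (index < 0) {
            throw new IllegalStateException("Coefficient was not found: " + essId);
        }
        return index;
    }

    /** Re-initializes while a reader looks up "ess0"; returns failed lookups. */
    static int stress(int rounds) {
        LockedCoefficients c = new LockedCoefficients();
        c.initialize(List.of("ess0"));
        AtomicInteger misses = new AtomicInteger();
        Thread reader = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                try {
                    c.of("ess0");
                } catch (IllegalStateException e) {
                    misses.incrementAndGet();
                }
            }
        });
        reader.start();
        for (int i = 0; i < rounds; i++) {
            c.initialize(List.of("ess0", "ess1"));
        }
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return misses.get();
    }

    public static void main(String[] args) {
        System.out.println(stress(10_000));
    }
}
```

Because every lookup and every rebuild hold the same lock, a reader in this sketch never observes the list between `clear()` and `addAll()`. The cost is that lookups serialize against each other as well as against `initialize()`.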