-
-
Notifications
You must be signed in to change notification settings - Fork 319
fix: Partially fix long running import #3300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: Partially fix long running import #3300
Conversation
…s were being saved, even those that should've been ignored. Furthermore, this was done inside a loop while iterating over languages.
… haven't yet processed transaltions at that point, so there's nothing to populate
WalkthroughIntroduced an intermediate Changes
Sequence Diagram(s)sequenceDiagram
participant UI as Import Processor
participant Proc as CoreImportFilesProcessor
participant Cache as importedTranslations
participant IDM as ImportDataManager
participant Store as storedTranslations
UI->>Proc: processFiles(saveData flag)
Proc->>Proc: processLanguages() -- defer preselection, mark non-ignored
Proc->>Proc: processTranslations() -- phase 1
Proc->>Proc: getOrCreateKey() for each entry
Proc->>Cache: populate importedTranslations[language][key] += translations
note right of Cache `#DFF2E1`: Intermediate grouped storage
Proc->>Proc: processTranslations() -- phase 2
loop per key/lang
Proc->>Cache: read translations for collisions
alt no collision / ok
Proc->>IDM: addToStoredTranslations(...) or queue
else collision
Proc->>Proc: resolve collision
end
end
Proc->>IDM: populateStoredTranslationsFrom(importedTranslations)
IDM->>Store: merge into storedTranslations (existing + new)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25–30 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
ecc59e4 to
cee7266
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt(4 hunks)backend/data/src/main/kotlin/io/tolgee/service/dataImport/ImportDataManager.kt(0 hunks)
💤 Files with no reviewable changes (1)
- backend/data/src/main/kotlin/io/tolgee/service/dataImport/ImportDataManager.kt
🧰 Additional context used
🧬 Code graph analysis (1)
backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (1)
backend/data/src/main/kotlin/io/tolgee/service/dataImport/processors/FileProcessorContext.kt (1)
getOrCreateKey(177-179)
| translationsByKeys.forEach { (key, translations) -> | ||
| translations.forEach { translation -> | ||
| importedTranslations.putIfAbsent(translation.language, mutableMapOf()) | ||
| importedTranslations.getValue(translation.language).putIfAbsent(key, mutableListOf()) | ||
| importedTranslations.getValue(translation.language).getValue(key).add(translation) | ||
| } | ||
| keyEntity.shouldBeImported = shouldImportKey(keyEntity.name) | ||
| } | ||
|
|
||
| translationsByKeys.forEach { (key, translations) -> | ||
| translations.forEach { translation -> | ||
| translation.key = key | ||
| processTranslation(translation) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collision detection regressed: every first translation is now rejected
By adding each ImportTranslation into importedTranslations before processTranslation runs, checkForInFileCollisions now sees the current translation in storedTranslations. Because the list is non-empty, isCollision stays true, we return early at Line 367, and the very first occurrence of every key/language pair never reaches addToStoredTranslations. Imports will therefore drop all translations on the first encounter. Please exclude the current object from the collision set (or defer adding it until after collision checks) so that only previously processed entries are considered. One possible fix:
- val storedTranslations = importedTranslations[newTranslation.language]
- ?.get(newTranslation.key)
- ?: emptyList()
+ val storedTranslations = importedTranslations[newTranslation.language]
+ ?.get(newTranslation.key)
+ ?.filter { it !== newTranslation }
+ ?: emptyList()and reuse storedTranslations when computing isCollision.
Also applies to: 385-388
🤖 Prompt for AI Agents
In
backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt
around lines 335-347 (and similarly 385-388), the code adds each
ImportTranslation into importedTranslations before calling processTranslation,
which causes checkForInFileCollisions to see the current translation as already
stored and incorrectly treat it as a collision; change the flow so the current
translation is not present in the set checked for collisions — either remove the
putInto-importedTranslations step before processTranslation and instead add the
translation into importedTranslations/storedTranslations after
processTranslation succeeds, or adjust the collision-check to exclude the exact
current object (e.g., compute isCollision using storedTranslations that do not
include the current translation) so only previously processed entries are
considered.
… them when checking for conflicts instead of using ImportDataManager, which queries the database
cee7266 to
f7e962d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (1)
329-349: Critical collision bug: translations reject themselves.The three-phase approach has a critical flaw:
Phase 1 (lines 336-342): ALL translations are added to
importedTranslationsupfront.Phase 2 (lines 344-349): Each translation is processed, including collision checks.
Problem: When
checkForInFileCollisionsruns for the first translation of a key/language pair (line 386), it queriesimportedTranslations[language][key]and finds the translation itself (added in phase 1). The checkif (storedTranslations.isNotEmpty())at line 388 becomes true,isCollisionis set to true (line 389), and we return early at line 369 without adding the translation tostoredTranslations. Result: every first occurrence of a key/language pair is incorrectly rejected as a collision.Apply this fix to exclude the current translation from the collision set:
private fun checkForInFileCollisions( newTranslation: ImportTranslation, ): Pair<Boolean, MutableList<Pair<FileIssueType, Map<FileIssueParamType, String>>>> { var isCollision = false val issues = mutableListOf<Pair<FileIssueType, Map<FileIssueParamType, String>>>() - val storedTranslations = importedTranslations[newTranslation.language]?.get(newTranslation.key) ?: emptyList() + val storedTranslations = importedTranslations[newTranslation.language] + ?.get(newTranslation.key) + ?.filter { it !== newTranslation } + ?: emptyList() if (storedTranslations.isNotEmpty()) {This ensures only previously processed entries are checked for collisions, not the current translation itself.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
backend/app/src/test/resources/import/bigZipOfJsons.zipis excluded by!**/*.zip
📒 Files selected for processing (3)
backend/app/src/test/kotlin/io/tolgee/api/v2/controllers/v2ImportController/V2ImportControllerAddFilesTest.kt(2 hunks)backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt(4 hunks)backend/data/src/main/kotlin/io/tolgee/service/dataImport/ImportDataManager.kt(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/src/test/kotlin/io/tolgee/api/v2/controllers/v2ImportController/V2ImportControllerAddFilesTest.kt (1)
backend/app/src/test/kotlin/io/tolgee/api/v2/controllers/v2ImportController/SingleStepImportControllerTest.kt (1)
performImport(460-466)
backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (1)
backend/data/src/main/kotlin/io/tolgee/service/dataImport/processors/FileProcessorContext.kt (1)
getOrCreateKey(177-179)
🔇 Additional comments (5)
backend/app/src/test/kotlin/io/tolgee/api/v2/controllers/v2ImportController/V2ImportControllerAddFilesTest.kt (1)
31-32: LGTM: Larger test resource aligns with performance objectives.Adding a bigger test file supports the PR's goal of profiling and fixing import performance bottlenecks.
backend/data/src/main/kotlin/io/tolgee/service/dataImport/ImportDataManager.kt (1)
163-176: LGTM: Clean bulk-merge implementation.The function correctly merges translations from an intermediate map into stored translations, using a Set to deduplicate entries before converting back to a MutableList.
backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (3)
68-69: LGTM: Intermediate storage decouples collision detection from persistence.Introducing
importedTranslationsas a staging area before bulk-merging intostoredTranslationsaligns with the PR's performance goals by enabling batch operations.
235-254: Refactored language processing improves clarity.Separating the ignored-language filtering (lines 236-241) from persistence (lines 243-245) and preselection (lines 247-254) makes the flow easier to follow and ensures ignored languages are excluded from storage.
73-73: Remove this review comment; line number references are incorrect.The review comment claims a collision-detection bug exists at lines 329-349, but those lines contain standard translation processing in
processTranslations(). Collision detection logic actually exists at lines 388-402 (checkingMULTIPLE_VALUES_FOR_KEY_AND_LANGUAGE), not where referenced.The
populateStoredTranslationsFrom()call at line 73 is a straightforward merge operation that transfers intermediate translations to stored translations. Collision handling occurs separately during the actual import process through handlers inImportDataManager, not as a result of this merge call.Likely an incorrect or invalid review comment.
| performImport(projectId = base.project.id, listOf(Pair("bigZipOfJsons.zip", bigZipOfJsons))) | ||
| // .andAssertThatJson { | ||
| // node("result._embedded.languages").isArray.hasSize(3) | ||
| // } | ||
| // validateSavedJsonImportData(base.project, base.userAccount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented assertions hide test failures and provide false confidence.
All validation logic is commented out, so the test exercises the import but verifies nothing. This typically signals that the test is failing and assertions were disabled rather than fixed. Given the collision-detection bug in CoreImportFilesProcessor.kt (see my comments there), this test likely fails because translations are incorrectly rejected during import.
Restore the assertions and fix the underlying import bug:
- performImport(projectId = base.project.id, listOf(Pair("bigZipOfJsons.zip", bigZipOfJsons)))
-// .andAssertThatJson {
-// node("result._embedded.languages").isArray.hasSize(3)
-// }
-// validateSavedJsonImportData(base.project, base.userAccount)
+ performImport(projectId = base.project.id, listOf(Pair("bigZipOfJsons.zip", bigZipOfJsons)))
+ .andAssertThatJson {
+ node("result._embedded.languages").isArray.hasSize(3)
+ }
+ validateSavedJsonImportData(base.project, base.userAccount)
Partially fix #3175
Summary by CodeRabbit
Refactor
Tests