fix: Partially fix long running import #3300

gabrielshanahan · 2025-11-06T14:22:45Z

Partially fix #3175

Summary by CodeRabbit

Refactor
- Reworked import processing to collect and merge translations in a staged, bulk-aware way for more reliable and consistent imports.
Tests
- Updated import tests to use a larger sample archive and adjusted validations to reflect the new import behavior.


coderabbitai · 2025-11-06T14:23:01Z

Walkthrough

Introduced an intermediate importedTranslations map to accumulate translations per language/key during import; reorganized translation processing into pre-build and commit phases; added populateStoredTranslationsFrom to ImportDataManager to merge the intermediate map into stored translations; test updated to use a larger zip resource.
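Here is a minimal sketch of how an intermediate map like this can be accumulated per language and key. The classes are hypothetical stand-ins for Tolgee's import entities. `getOrPut` is used as a compact alternative to the `putIfAbsent`/`getValue` chain in the PR and is not taken from the actual code:

```kotlin
// Hypothetical, simplified stand-ins for ImportLanguage, ImportKey and ImportTranslation.
class Language(val tag: String)
class Key(val name: String)
class Translation(val language: Language, val text: String)

// Staging map: language -> key -> translations, filled before anything is persisted.
val importedTranslations = mutableMapOf<Language, MutableMap<Key, MutableList<Translation>>>()

fun stage(translationsByKeys: Map<Key, List<Translation>>) {
    translationsByKeys.forEach { (key, translations) ->
        translations.forEach { translation ->
            // getOrPut creates the inner map/list on first use and returns the existing one afterwards.
            importedTranslations
                .getOrPut(translation.language) { mutableMapOf() }
                .getOrPut(key) { mutableListOf() }
                .add(translation)
        }
    }
}
```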

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core Import Processing Refactor (`backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt`) | Added private `importedTranslations` map; `processFiles` now calls `populateStoredTranslationsFrom(importedTranslations)` after processing; `processLanguages` defers preselection and stores only non-ignored languages; `processTranslations` now two-phased (ensure/getOrCreateKey → populate `importedTranslations` grouped by language/key → iterate translations and set `translation.key` → save stored translations if `saveData`); `processTranslation` signature simplified (no key parameter) and collision checks read from `importedTranslations`. |
| Import Data Manager API (`backend/data/src/main/kotlin/io/tolgee/service/dataImport/ImportDataManager.kt`) | Added `populateStoredTranslationsFrom(importedTranslations: MutableMap<ImportLanguage, MutableMap<ImportKey, MutableList<ImportTranslation>>>)` to merge/import the intermediate map into `storedTranslations` by language/key, combining existing and new translations. |
| Tests: import resource change (`backend/app/src/test/kotlin/io/tolgee/api/v2/controllers/v2ImportController/V2ImportControllerAddFilesTest.kt`) | Added `lateinit var bigZipOfJsons: Resource` and switched a test to use `classpath:import/bigZipOfJsons.zip`; some assertions / validation lines related to the previous zip were commented out. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI as Import Processor
    participant Proc as CoreImportFilesProcessor
    participant Cache as importedTranslations
    participant IDM as ImportDataManager
    participant Store as storedTranslations

    UI->>Proc: processFiles(saveData flag)
    Proc->>Proc: processLanguages() -- defer preselection, mark non-ignored
    Proc->>Proc: processTranslations() -- phase 1
    Proc->>Proc: getOrCreateKey() for each entry
    Proc->>Cache: populate importedTranslations[language][key] += translations
    note right of Cache: Intermediate grouped storage
    Proc->>Proc: processTranslations() -- phase 2
    loop per key/lang
      Proc->>Cache: read translations for collisions
      alt no collision / ok
        Proc->>IDM: addToStoredTranslations(...) or queue
      else collision
        Proc->>Proc: resolve collision
      end
    end
    Proc->>IDM: populateStoredTranslationsFrom(importedTranslations)
    IDM->>Store: merge into storedTranslations (existing + new)
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25–30 minutes

Pay special attention to collision-detection changes (now reading from importedTranslations) and correctness of merge semantics in populateStoredTranslationsFrom.
Verify language preselection timing and that ignored languages are correctly excluded.
Review the test change to ensure the commented-out assertions are intentional and coverage remains adequate.

Possibly related PRs

fix: import with meta without creating new keys (#3000) #3113 — modifies CoreImportFilesProcessor import flow and ImportDataManager key/translation handling (closely related changes).

Suggested reviewers

stepan662
JanCizmar

Poem

🐇 In burrows of bytes I softly hop,
I gather translations in one neat crop.
Keys pre-made, collisions chased away,
Then I bundle and store them at end of day.
— a happy import rabbit 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | The PR addresses the performance bottleneck objective [#3175] through refactoring import processing logic to reduce N+1 queries via intermediate translation storage, but does not implement namespace preselection. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the import performance optimization objective; test modifications use a larger dataset (bigZipOfJsons) to validate performance improvements, which is in-scope. |
| Title check | ✅ Passed | The title 'fix: Partially fix long running import' directly aligns with the PR's main objective to address the long-running import performance issue referenced in issue #3175. |


coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d08d3c and cee7266.

📒 Files selected for processing (2)

backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (4 hunks)
backend/data/src/main/kotlin/io/tolgee/service/dataImport/ImportDataManager.kt (0 hunks)

💤 Files with no reviewable changes (1)

backend/data/src/main/kotlin/io/tolgee/service/dataImport/ImportDataManager.kt

🧰 Additional context used

🧬 Code graph analysis (1)

backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (1)

backend/data/src/main/kotlin/io/tolgee/service/dataImport/processors/FileProcessorContext.kt (1)

getOrCreateKey (177-179)

coderabbitai · 2025-11-06T14:27:46Z

backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt

```diff
+    translationsByKeys.forEach { (key, translations) ->
+      translations.forEach { translation ->
+        importedTranslations.putIfAbsent(translation.language, mutableMapOf())
+        importedTranslations.getValue(translation.language).putIfAbsent(key, mutableListOf())
+        importedTranslations.getValue(translation.language).getValue(key).add(translation)
      }
-      keyEntity.shouldBeImported = shouldImportKey(keyEntity.name)
    }
+
+    translationsByKeys.forEach { (key, translations) ->
+      translations.forEach { translation ->
+        translation.key = key
+        processTranslation(translation)
+      }
```


⚠️ Potential issue | 🔴 Critical

Collision detection regressed: every first translation is now rejected

By adding each `ImportTranslation` to `importedTranslations` before `processTranslation` runs, `checkForInFileCollisions` now finds the current translation among the stored translations. Because that list is non-empty, `isCollision` is set to true, the method returns early at line 367, and the first occurrence of every key/language pair never reaches `addToStoredTranslations`. Imports will therefore drop every translation on its first encounter. Please exclude the current object from the collision set (or defer adding it until after the collision checks) so that only previously processed entries are considered. One possible fix:

```diff
-      val storedTranslations = importedTranslations[newTranslation.language]
-        ?.get(newTranslation.key)
-        ?: emptyList()
+      val storedTranslations = importedTranslations[newTranslation.language]
+        ?.get(newTranslation.key)
+        ?.filter { it !== newTranslation }
+        ?: emptyList()
```

Then reuse `storedTranslations` when computing `isCollision`.

Also applies to: 385-388
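Here is a small, self-contained illustration of the problem and of the identity-based filter. It assumes a simplified `Translation` class and plain string language/key tags in place of the real entities, and both functions are hypothetical:

```kotlin
// Hypothetical simplification: language and key are plain strings here.
class Translation(val language: String, val key: String, val text: String)

// language -> key -> translations staged before any collision check runs
typealias Staged = Map<String, Map<String, List<Translation>>>

// Regressed behaviour: the translation being checked is already staged,
// so the list is never empty and the translation "collides" with itself.
fun isCollisionBuggy(staged: Staged, t: Translation): Boolean =
    staged[t.language]?.get(t.key).orEmpty().isNotEmpty()

// Suggested behaviour: drop the exact object being checked (referential !==)
// before deciding whether anything else occupies the same language/key slot.
fun isCollisionFixed(staged: Staged, t: Translation): Boolean =
    staged[t.language]?.get(t.key).orEmpty()
        .filter { it !== t }
        .isNotEmpty()
```

Because the whole map is staged up front in this sketch, both members of a duplicate pair are flagged, since each one sees the other. A check-then-add ordering would flag only the later entry.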

🤖 Prompt for AI Agents

In backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt around lines 335-347 (and similarly 385-388), the code adds each ImportTranslation to importedTranslations before calling processTranslation, so checkForInFileCollisions sees the current translation as already stored and incorrectly treats it as a collision. Change the flow so that the current translation is not part of the set checked for collisions: either stop adding translations to importedTranslations before processTranslation and add each one to importedTranslations/storedTranslations only after processTranslation succeeds, or make the collision check exclude the exact current object (e.g., compute isCollision from storedTranslations without the current translation), so that only previously processed entries are considered.
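The prompt's other option is to check first and stage afterwards. That approach could look roughly like this. `CheckThenStage` and its members are hypothetical and only model the ordering, not Tolgee's real issue reporting:

```kotlin
// Hypothetical simplification: language and key are plain strings here.
class Translation(val language: String, val key: String, val text: String)

class CheckThenStage {
    // language -> key -> translations accepted so far
    private val staged = mutableMapOf<String, MutableMap<String, MutableList<Translation>>>()
    val collisions = mutableListOf<Translation>()

    fun process(t: Translation) {
        val bucket = staged
            .getOrPut(t.language) { mutableMapOf() }
            .getOrPut(t.key) { mutableListOf() }
        if (bucket.isNotEmpty()) {
            // The bucket only holds previously processed entries, so this is a genuine duplicate.
            collisions.add(t)
            return
        }
        // Stage only after the check has passed.
        bucket.add(t)
    }

    fun accepted(language: String, key: String): List<Translation> =
        staged[language]?.get(key).orEmpty()
}
```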


coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (1)
329-349: Critical collision bug: translations reject themselves.

The three-phase approach has a critical flaw:

Phase 1 (lines 336-342): ALL translations are added to importedTranslations upfront.

Phase 2 (lines 344-349): Each translation is processed, including collision checks.

Problem: When checkForInFileCollisions runs for the first translation of a key/language pair (line 386), it queries importedTranslations[language][key] and finds the translation itself (added in phase 1). The check if (storedTranslations.isNotEmpty()) at line 388 becomes true, isCollision is set to true (line 389), and we return early at line 369 without adding the translation to storedTranslations. Result: every first occurrence of a key/language pair is incorrectly rejected as a collision.

Apply this fix to exclude the current translation from the collision set:
```diff
 private fun checkForInFileCollisions(
     newTranslation: ImportTranslation,
   ): Pair<Boolean, MutableList<Pair<FileIssueType, Map<FileIssueParamType, String>>>> {
     var isCollision = false
     val issues =
       mutableListOf<Pair<FileIssueType, Map<FileIssueParamType, String>>>()
-    val storedTranslations = importedTranslations[newTranslation.language]?.get(newTranslation.key) ?: emptyList()
+    val storedTranslations = importedTranslations[newTranslation.language]
+      ?.get(newTranslation.key)
+      ?.filter { it !== newTranslation }
+      ?: emptyList()

     if (storedTranslations.isNotEmpty()) {
```
This ensures only previously processed entries are checked for collisions, not the current translation itself.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cee7266 and f7e962d.

⛔ Files ignored due to path filters (1)

backend/app/src/test/resources/import/bigZipOfJsons.zip is excluded by !**/*.zip

📒 Files selected for processing (3)

backend/app/src/test/kotlin/io/tolgee/api/v2/controllers/v2ImportController/V2ImportControllerAddFilesTest.kt (2 hunks)
backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (4 hunks)
backend/data/src/main/kotlin/io/tolgee/service/dataImport/ImportDataManager.kt (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

backend/app/src/test/kotlin/io/tolgee/api/v2/controllers/v2ImportController/V2ImportControllerAddFilesTest.kt (1)

backend/app/src/test/kotlin/io/tolgee/api/v2/controllers/v2ImportController/SingleStepImportControllerTest.kt (1)

performImport (460-466)

backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (1)

backend/data/src/main/kotlin/io/tolgee/service/dataImport/processors/FileProcessorContext.kt (1)

getOrCreateKey (177-179)

🔇 Additional comments (5)

backend/app/src/test/kotlin/io/tolgee/api/v2/controllers/v2ImportController/V2ImportControllerAddFilesTest.kt (1)

31-32: LGTM: Larger test resource aligns with performance objectives.

Adding a bigger test file supports the PR's goal of profiling and fixing import performance bottlenecks.

backend/data/src/main/kotlin/io/tolgee/service/dataImport/ImportDataManager.kt (1)

163-176: LGTM: Clean bulk-merge implementation.

The function correctly merges translations from an intermediate map into stored translations, using a Set to deduplicate entries before converting back to a MutableList.
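Assuming the import entities keep default, identity-based `equals`/`hashCode` (an assumption, not confirmed here), the described merge behaves like this hypothetical free function. A `LinkedHashSet` drops duplicates while keeping first-seen order:

```kotlin
// Hypothetical stand-ins; default equals/hashCode means identity-based deduplication.
class Language(val tag: String)
class Key(val name: String)
class Translation(val text: String)

// Hypothetical free-function version of the merge: existing + staged, deduplicated per key.
fun mergeStagedTranslations(
    stored: MutableMap<Language, MutableMap<Key, MutableList<Translation>>>,
    imported: Map<Language, Map<Key, List<Translation>>>,
) {
    imported.forEach { (language, byKey) ->
        val storedByKey = stored.getOrPut(language) { mutableMapOf() }
        byKey.forEach { (key, translations) ->
            val existing = storedByKey[key].orEmpty()
            // Set removes duplicates; converting back yields the MutableList the map expects.
            storedByKey[key] = LinkedHashSet(existing + translations).toMutableList()
        }
    }
}
```

If the real entities override `equals` (for example, by id), distinct unsaved translations could compare equal, so the merge semantics are worth checking against the entity definitions.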

backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (3)

68-69: LGTM: Intermediate storage decouples collision detection from persistence.

Introducing importedTranslations as a staging area before bulk-merging into storedTranslations aligns with the PR's performance goals by enabling batch operations.

235-254: Refactored language processing improves clarity.

Separating the ignored-language filtering (lines 236-241) from persistence (lines 243-245) and preselection (lines 247-254) makes the flow easier to follow and ensures ignored languages are excluded from storage.
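This sketch shows the filter-then-save-once ordering described here and in the `saveLanguages` fix commit. `LanguageCandidate`, its `ignored` flag, and `storeNonIgnoredLanguages` are hypothetical, and the `save` callback stands in for the bulk save call:

```kotlin
// Hypothetical stand-in for an import language plus its "should be ignored" decision.
class LanguageCandidate(val name: String, val ignored: Boolean)

fun storeNonIgnoredLanguages(
    languages: List<LanguageCandidate>,
    save: (Collection<LanguageCandidate>) -> Unit,
): List<LanguageCandidate> {
    // Filter first, so ignored languages never reach storage.
    val toStore = languages.filterNot { it.ignored }
    // One bulk call, made once instead of inside the per-language loop.
    save(toStore)
    // Preselection (not modelled here) would then run on toStore only.
    return toStore
}
```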

73-73: Remove this review comment; line number references are incorrect.

The review comment claims a collision-detection bug exists at lines 329-349, but those lines contain standard translation processing in processTranslations(). Collision detection logic actually exists at lines 388-402 (checking MULTIPLE_VALUES_FOR_KEY_AND_LANGUAGE), not where referenced.

The populateStoredTranslationsFrom() call at line 73 is a straightforward merge operation that transfers intermediate translations to stored translations. Collision handling occurs separately during the actual import process through handlers in ImportDataManager, not as a result of this merge call.

Likely an incorrect or invalid review comment.

coderabbitai · 2025-11-06T15:06:29Z

...est/kotlin/io/tolgee/api/v2/controllers/v2ImportController/V2ImportControllerAddFilesTest.kt

```diff
+    performImport(projectId = base.project.id, listOf(Pair("bigZipOfJsons.zip", bigZipOfJsons)))
+//      .andAssertThatJson {
+//        node("result._embedded.languages").isArray.hasSize(3)
+//      }
+//    validateSavedJsonImportData(base.project, base.userAccount)
```


⚠️ Potential issue | 🔴 Critical

Commented-out assertions hide test failures and provide false confidence.

All validation logic is commented out, so the test exercises the import but verifies nothing. This typically signals that the test is failing and assertions were disabled rather than fixed. Given the collision-detection bug in CoreImportFilesProcessor.kt (see my comments there), this test likely fails because translations are incorrectly rejected during import.

Restore the assertions and fix the underlying import bug:

```diff
-    performImport(projectId = base.project.id, listOf(Pair("bigZipOfJsons.zip", bigZipOfJsons)))
-//      .andAssertThatJson {
-//        node("result._embedded.languages").isArray.hasSize(3)
-//      }
-//    validateSavedJsonImportData(base.project, base.userAccount)
+    performImport(projectId = base.project.id, listOf(Pair("bigZipOfJsons.zip", bigZipOfJsons)))
+      .andAssertThatJson {
+        node("result._embedded.languages").isArray.hasSize(3)
+      }
+    validateSavedJsonImportData(base.project, base.userAccount)
```

gabrielshanahan added 4 commits November 6, 2025 15:21

fix: Fix incorrect call to importService.saveLanguages - all languages were being saved, even those that should've been ignored. Furthermore, this was done inside a loop while iterating over languages.

1cb7261

fix: Remove call to importDataManager.populateStoredTranslations - we haven't yet processed translations at that point, so there's nothing to populate

03370ba

refactor: remove unnecessary loop qualifier

a3276b4

refactor: Make CoreImportFilesProcessor.processTranslations less tangled

9e5e03a

gabrielshanahan force-pushed the gabrielshanahan/partially-fix-long-running-import branch from ecc59e4 to cee7266 on November 6, 2025 14:24

coderabbitai bot reviewed Nov 6, 2025


gabrielshanahan added 2 commits November 6, 2025 16:01

perf: Make import quicker by caching imported translations, and using them when checking for conflicts instead of using ImportDataManager, which queries the database

877b299

to be removed: test on big enough json

f7e962d

gabrielshanahan force-pushed the gabrielshanahan/partially-fix-long-running-import branch from cee7266 to f7e962d on November 6, 2025 15:01

coderabbitai bot reviewed Nov 6, 2025


JanCizmar changed the title ~~Partially fix long running import~~ fix: Partially fix long running import Nov 11, 2025

Anty0 requested review from Anty0 and removed request for Anty0 November 25, 2025 13:43
