Skip to content

Conversation

@gabrielshanahan
Copy link

@gabrielshanahan gabrielshanahan commented Nov 6, 2025

Partially fix #3175

Summary by CodeRabbit

  • Refactor

    • Reworked import processing to collect and merge translations in a staged, bulk-aware way for more reliable and consistent imports.
  • Tests

    • Updated import tests to use a larger sample archive and adjusted validations to reflect the new import behavior.

…s were being saved, even those that should've been ignored. Furthermore, this was done inside a loop while iterating over languages.
… haven't yet processed transaltions at that point, so there's nothing to populate
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 6, 2025

Walkthrough

Introduced an intermediate importedTranslations map to accumulate translations per language/key during import; reorganized translation processing into pre-build and commit phases; added populateStoredTranslationsFrom to ImportDataManager to merge the intermediate map into stored translations; test updated to use a larger zip resource.

Changes

Cohort / File(s) Summary
Core Import Processing Refactor
backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt
Added private importedTranslations map; processFiles now calls populateStoredTranslationsFrom(importedTranslations) after processing; processLanguages defers preselection and stores only non-ignored languages; processTranslations now two-phased (ensure/getOrCreateKey → populate importedTranslations grouped by language/key → iterate translations and set translation.key → save stored translations if saveData); processTranslation signature simplified (no key parameter) and collision checks read from importedTranslations.
Import Data Manager API
backend/data/src/main/kotlin/io/tolgee/service/dataImport/ImportDataManager.kt
Added populateStoredTranslationsFrom(importedTranslations: MutableMap<ImportLanguage, MutableMap<ImportKey, MutableList<ImportTranslation>>>) to merge/import the intermediate map into storedTranslations by language/key, combining existing and new translations.
Tests — import resource change
backend/app/src/test/kotlin/io/tolgee/api/v2/controllers/v2ImportController/V2ImportControllerAddFilesTest.kt
Added lateinit var bigZipOfJsons: Resource and switched a test to use classpath:import/bigZipOfJsons.zip; some assertions / validation lines related to the previous zip were commented out.

Sequence Diagram(s)

sequenceDiagram
    participant UI as Import Processor
    participant Proc as CoreImportFilesProcessor
    participant Cache as importedTranslations
    participant IDM as ImportDataManager
    participant Store as storedTranslations

    UI->>Proc: processFiles(saveData flag)
    Proc->>Proc: processLanguages() -- defer preselection, mark non-ignored
    Proc->>Proc: processTranslations() -- phase 1
    Proc->>Proc: getOrCreateKey() for each entry
    Proc->>Cache: populate importedTranslations[language][key] += translations
    note right of Cache `#DFF2E1`: Intermediate grouped storage
    Proc->>Proc: processTranslations() -- phase 2
    loop per key/lang
      Proc->>Cache: read translations for collisions
      alt no collision / ok
        Proc->>IDM: addToStoredTranslations(...) or queue
      else collision
        Proc->>Proc: resolve collision
      end
    end
    Proc->>IDM: populateStoredTranslationsFrom(importedTranslations)
    IDM->>Store: merge into storedTranslations (existing + new)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25–30 minutes

  • Pay special attention to collision-detection changes (now reading from importedTranslations) and correctness of merge semantics in populateStoredTranslationsFrom.
  • Verify language preselection timing and that ignored languages are correctly excluded.
  • Review test change to ensure the commented assertions are intentional and coverage remains adequate.

Possibly related PRs

Suggested reviewers

  • stepan662
  • JanCizmar

Poem

🐇 In burrows of bytes I softly hop,
I gather translations in one neat crop.
Keys pre-made, collisions chased away,
Then I bundle and store them at end of day.
— a happy import rabbit 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed The PR addresses the performance bottleneck objective [#3175] through refactoring import processing logic to reduce N+1 queries via intermediate translation storage, but does not implement namespace preselection.
Out of Scope Changes check ✅ Passed All changes directly support the import performance optimization objective; test modifications use a larger dataset (bigZipOfJsons) to validate performance improvements, which is in-scope.
Title check ✅ Passed The title 'fix: Partially fix long running import' directly aligns with the PR's main objective to address the long-running import performance issue referenced in issue #3175.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gabrielshanahan gabrielshanahan force-pushed the gabrielshanahan/partially-fix-long-running-import branch from ecc59e4 to cee7266 Compare November 6, 2025 14:24
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d08d3c and cee7266.

📒 Files selected for processing (2)
  • backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (4 hunks)
  • backend/data/src/main/kotlin/io/tolgee/service/dataImport/ImportDataManager.kt (0 hunks)
💤 Files with no reviewable changes (1)
  • backend/data/src/main/kotlin/io/tolgee/service/dataImport/ImportDataManager.kt
🧰 Additional context used
🧬 Code graph analysis (1)
backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (1)
backend/data/src/main/kotlin/io/tolgee/service/dataImport/processors/FileProcessorContext.kt (1)
  • getOrCreateKey (177-179)

Comment on lines +335 to +348
translationsByKeys.forEach { (key, translations) ->
translations.forEach { translation ->
importedTranslations.putIfAbsent(translation.language, mutableMapOf())
importedTranslations.getValue(translation.language).putIfAbsent(key, mutableListOf())
importedTranslations.getValue(translation.language).getValue(key).add(translation)
}
keyEntity.shouldBeImported = shouldImportKey(keyEntity.name)
}

translationsByKeys.forEach { (key, translations) ->
translations.forEach { translation ->
translation.key = key
processTranslation(translation)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Collision detection regressed: every first translation is now rejected

By adding each ImportTranslation into importedTranslations before processTranslation runs, checkForInFileCollisions now sees the current translation in storedTranslations. Because the list is non-empty, isCollision stays true, we return early at Line 367, and the very first occurrence of every key/language pair never reaches addToStoredTranslations. Imports will therefore drop all translations on the first encounter. Please exclude the current object from the collision set (or defer adding it until after collision checks) so that only previously processed entries are considered. One possible fix:

-    val storedTranslations = importedTranslations[newTranslation.language]
-      ?.get(newTranslation.key)
-      ?: emptyList()
+    val storedTranslations = importedTranslations[newTranslation.language]
+      ?.get(newTranslation.key)
+      ?.filter { it !== newTranslation }
+      ?: emptyList()

and reuse storedTranslations when computing isCollision.

Also applies to: 385-388

🤖 Prompt for AI Agents
In
backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt
around lines 335-347 (and similarly 385-388), the code adds each
ImportTranslation into importedTranslations before calling processTranslation,
which causes checkForInFileCollisions to see the current translation as already
stored and incorrectly treat it as a collision; change the flow so the current
translation is not present in the set checked for collisions — either remove the
putInto-importedTranslations step before processTranslation and instead add the
translation into importedTranslations/storedTranslations after
processTranslation succeeds, or adjust the collision-check to exclude the exact
current object (e.g., compute isCollision using storedTranslations that do not
include the current translation) so only previously processed entries are
considered.

… them when checking for conflicts instead of using ImportDataManager, which queries the database
@gabrielshanahan gabrielshanahan force-pushed the gabrielshanahan/partially-fix-long-running-import branch from cee7266 to f7e962d Compare November 6, 2025 15:01
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (1)

329-349: Critical collision bug: translations reject themselves.

The three-phase approach has a critical flaw:

Phase 1 (lines 336-342): ALL translations are added to importedTranslations upfront.

Phase 2 (lines 344-349): Each translation is processed, including collision checks.

Problem: When checkForInFileCollisions runs for the first translation of a key/language pair (line 386), it queries importedTranslations[language][key] and finds the translation itself (added in phase 1). The check if (storedTranslations.isNotEmpty()) at line 388 becomes true, isCollision is set to true (line 389), and we return early at line 369 without adding the translation to storedTranslations. Result: every first occurrence of a key/language pair is incorrectly rejected as a collision.

Apply this fix to exclude the current translation from the collision set:

 private fun checkForInFileCollisions(
     newTranslation: ImportTranslation,
   ): Pair<Boolean, MutableList<Pair<FileIssueType, Map<FileIssueParamType, String>>>> {
     var isCollision = false
     val issues =
       mutableListOf<Pair<FileIssueType, Map<FileIssueParamType, String>>>()
-    val storedTranslations = importedTranslations[newTranslation.language]?.get(newTranslation.key) ?: emptyList()
+    val storedTranslations = importedTranslations[newTranslation.language]
+      ?.get(newTranslation.key)
+      ?.filter { it !== newTranslation }
+      ?: emptyList()

     if (storedTranslations.isNotEmpty()) {

This ensures only previously processed entries are checked for collisions, not the current translation itself.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cee7266 and f7e962d.

⛔ Files ignored due to path filters (1)
  • backend/app/src/test/resources/import/bigZipOfJsons.zip is excluded by !**/*.zip
📒 Files selected for processing (3)
  • backend/app/src/test/kotlin/io/tolgee/api/v2/controllers/v2ImportController/V2ImportControllerAddFilesTest.kt (2 hunks)
  • backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (4 hunks)
  • backend/data/src/main/kotlin/io/tolgee/service/dataImport/ImportDataManager.kt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/src/test/kotlin/io/tolgee/api/v2/controllers/v2ImportController/V2ImportControllerAddFilesTest.kt (1)
backend/app/src/test/kotlin/io/tolgee/api/v2/controllers/v2ImportController/SingleStepImportControllerTest.kt (1)
  • performImport (460-466)
backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (1)
backend/data/src/main/kotlin/io/tolgee/service/dataImport/processors/FileProcessorContext.kt (1)
  • getOrCreateKey (177-179)
🔇 Additional comments (5)
backend/app/src/test/kotlin/io/tolgee/api/v2/controllers/v2ImportController/V2ImportControllerAddFilesTest.kt (1)

31-32: LGTM: Larger test resource aligns with performance objectives.

Adding a bigger test file supports the PR's goal of profiling and fixing import performance bottlenecks.

backend/data/src/main/kotlin/io/tolgee/service/dataImport/ImportDataManager.kt (1)

163-176: LGTM: Clean bulk-merge implementation.

The function correctly merges translations from an intermediate map into stored translations, using a Set to deduplicate entries before converting back to a MutableList.

backend/data/src/main/kotlin/io/tolgee/service/dataImport/CoreImportFilesProcessor.kt (3)

68-69: LGTM: Intermediate storage decouples collision detection from persistence.

Introducing importedTranslations as a staging area before bulk-merging into storedTranslations aligns with the PR's performance goals by enabling batch operations.


235-254: Refactored language processing improves clarity.

Separating the ignored-language filtering (lines 236-241) from persistence (lines 243-245) and preselection (lines 247-254) makes the flow easier to follow and ensures ignored languages are excluded from storage.


73-73: Remove this review comment; line number references are incorrect.

The review comment claims a collision-detection bug exists at lines 329-349, but those lines contain standard translation processing in processTranslations(). Collision detection logic actually exists at lines 388-402 (checking MULTIPLE_VALUES_FOR_KEY_AND_LANGUAGE), not where referenced.

The populateStoredTranslationsFrom() call at line 73 is a straightforward merge operation that transfers intermediate translations to stored translations. Collision handling occurs separately during the actual import process through handlers in ImportDataManager, not as a result of this merge call.

Likely an incorrect or invalid review comment.

Comment on lines +198 to +202
performImport(projectId = base.project.id, listOf(Pair("bigZipOfJsons.zip", bigZipOfJsons)))
// .andAssertThatJson {
// node("result._embedded.languages").isArray.hasSize(3)
// }
// validateSavedJsonImportData(base.project, base.userAccount)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Commented assertions hide test failures and provide false confidence.

All validation logic is commented out, so the test exercises the import but verifies nothing. This typically signals that the test is failing and assertions were disabled rather than fixed. Given the collision-detection bug in CoreImportFilesProcessor.kt (see my comments there), this test likely fails because translations are incorrectly rejected during import.

Restore the assertions and fix the underlying import bug:

-    performImport(projectId = base.project.id, listOf(Pair("bigZipOfJsons.zip", bigZipOfJsons)))
-//      .andAssertThatJson {
-//        node("result._embedded.languages").isArray.hasSize(3)
-//      }
-//    validateSavedJsonImportData(base.project, base.userAccount)
+    performImport(projectId = base.project.id, listOf(Pair("bigZipOfJsons.zip", bigZipOfJsons)))
+      .andAssertThatJson {
+        node("result._embedded.languages").isArray.hasSize(3)
+      }
+    validateSavedJsonImportData(base.project, base.userAccount)

@JanCizmar JanCizmar changed the title Partially fix long running import fix: Partially fix long running import Nov 11, 2025
@Anty0 Anty0 requested review from Anty0 and removed request for Anty0 November 25, 2025 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve importing with namespaces

1 participant