Refactor subsequences 2025 #6511

rgraber · 2025-12-01T14:10:46Z

🗒️ Checklist

run linter locally
update developer docs (API, README, inline, etc.), if any
for user-facing doc changes create a Zulip thread at #Support Docs Updates, if any
draft PR with a title <type>(<scope>)<!>: <title> DEV-1234
assign yourself, tag PR: at least Front end and/or Back end or workflow
fill in the template below and delete template comments
review thyself: read the diff and repro the preview as written
open PR & confirm that CI passes & request reviewers, if needed
delete this section before merging

📣 Summary

TODO

📖 Description

TODO

👷 Description for instance maintainers

TODO

💭 Notes

TODO

👀 Preview steps

ℹ️ have an account and a project
do this
do that
🔴 [on main] notice that this isn't anywhere
🟢 [on PR] notice that this is here
do that another thing
🟢 notice that this changed like that

action data within `_data` attribute for each version

Two tests are failing, as they already were on 0a92a24:

- `FAILED test_models.py::SubmissionSupplementTestCase::test_retrieve_data_from_migrated_data - KeyError: '_version'`
- `FAILED test_models.py::SubmissionSupplementTestCase::test_retrieve_data_with_stale_questions - AssertionError: assert {'group_name/question_name': {'manual_translation': {'en': {'_versions': [{'_uuid': '22b04ce8-61c2-4383-836f-5d5f0ad73645', 'value': 'berserk',...`

### Notes

Unit tests only. Skips tests that we eventually want to implement but don't have implementations for yet. DRF failures are unrelated to this PR.

The only substantive difference is in how we add supplements to duplicated submissions. The old `update_submission_extras` method has been removed, so instead we just create a `SubmissionSupplement` object with the correct data.

---------

Co-authored-by: John N. Milner <[email protected]>

…DEV-1229 (#6492)

### 💭 Notes

Add new QuestionAdvancedAction model and associated CRU endpoints. This PR does not involve actually using the models, though it does include audit logs for when users hit those endpoints. QuestionAdvancedAction logs cannot be deleted.

Also includes the automatic migration of the `advanced_features` dict into corresponding QuestionAdvancedAction objects. For now it does not change the `advanced_features` dict since we are still using it, but eventually it will be updated to signal that the data in it has already been migrated and that we should use the associated QuestionAdvancedAction models instead.

The OpenAPI errors are pre-existing and will be dealt with at the branch level sometime before merging the full project branch.

### 👀 Preview steps

1. ℹ️ have an account and a project with an audio question
2. POST the following data to `/api/v2/assets/<asset_uid>/advanced-features/`:
   ```
   {
     "action": "manual_transcription",
     "question_xpath": <audio question xpath>,
     "params": [{"language": "en"}]
   }
   ```
3. Navigate to `/api/v2/assets/<asset_uid>/advanced-features` in a browser
4. 🟢 There should be one advanced feature in the list
5. Note the uuid of the action you just created
6. PATCH `/api/v2/assets/<asset_uid>/advanced-features/<action_uuid>/` with `{"params": [{"language": "es"}]}`
7. Reload `/api/v2/assets/<asset_uid>/advanced-features`
8. 🟢 [on PR] notice that the params for the action now include both English and Spanish
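The last preview step implies that a PATCH adds the new language next to the existing one instead of replacing it. A minimal sketch of that merge follows; `merge_params` is a hypothetical helper written for illustration, not a function from this PR, and keying the merge on `language` is an assumption.

```python
def merge_params(existing, incoming):
    """Return `existing` plus any `incoming` entries whose language is new.

    Hypothetical helper: assumes every entry is a dict with a 'language' key.
    """
    merged = list(existing)
    seen = {entry['language'] for entry in existing}
    for entry in incoming:
        if entry['language'] not in seen:
            merged.append(entry)
            seen.add(entry['language'])
    return merged


# Mirrors the preview: English was configured first, then Spanish is PATCHed.
params = merge_params([{'language': 'en'}], [{'language': 'es'}])
```

Under these assumptions, re-sending a language that is already configured leaves the list unchanged.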

…6523)

### 📣 Summary

Add supplemental NLP columns to data table.

### 📖 Description

This is just for adding the columns to the data table. They may not be populated correctly. If an NLP action is enabled, there will be a column for it, even if there are presently no responses.

### 💭 Notes

Using `analysis_form_json` to avoid having to make changes on the frontend, even though it's not a very descriptive name.

Does not add QA questions; those will come later.

Removed some of the response fields from the old `analysis_form_json` since they don't seem to be used. It's possible that in QA or when dealing with exports they will turn out to be used, but we can always add them back.

### 👀 Preview steps

1. ℹ️ have an account and a project with an audio question with at least one response
2. In a python shell, enable all NLP actions by running
   ```
   # {uid} and {xpath} are placeholders for the asset uid and the question xpath
   asset = Asset.objects.get(uid={uid})
   for action in [
       Action.MANUAL_TRANSLATION,
       Action.MANUAL_TRANSCRIPTION,
       Action.AUTOMATIC_GOOGLE_TRANSLATION,
       Action.AUTOMATIC_GOOGLE_TRANSCRIPTION,
   ]:
       language = 'en' if 'transcription' in action else 'es'
       QuestionAdvancedFeature.objects.create(
           question_xpath={xpath},
           action=action,
           params=[{'language': language}],
           asset=asset,
       )
   ```
   Note: this will enable English manual/automatic transcripts and Spanish automatic/manual translations
3. Navigate to the data table
4. 🟢 [on PR] Notice there are columns for English transcript and Spanish translation for the relevant question

### 📣 Summary

Ensure transcriptions and translations are displayed in the data table.

### 📖 Description

Only accepted transcriptions/translations will be displayed.

### 💭 Notes

There were several issues preventing transcriptions and translations from showing up in the data table:

1. `retrieve_data` was not being called with `for_output=True`
2. The method signature and the implementations of `transform_data_for_output` did not correctly reflect how the method was being called by `SubmissionSupplement.retrieve_data`
3. Even when `for_output` was set, `SubmissionSupplement.retrieve_data` did not output the data in the format expected by the frontend

This PR addresses all of these issues. It uses the `_advanced_features` field to determine the activated features because QuestionAdvancedFeatures are not fully implemented yet.

Deferred for later:

- Fixing drf
- Using QuestionAdvancedFeatures instead of `_advanced_features`
- Handling Qual actions

### 👀 Preview steps

Going through the preview is a little annoying because columns are determined the new way (using QuestionAdvancedFeatures) but the data in the rows is determined the old way (using `Asset._advanced_features`). Also, data can only be added by PATCH and not through the UI.

1. ℹ️ have an account and NLP set up
2. Create a new project with an audio question
3. Add a submission with an audio response
4. In a django shell, enable NLP actions the new way by running
   ```
   asset = Asset.objects.get(uid=<uid>)
   for action in [
       Action.MANUAL_TRANSLATION,
       Action.MANUAL_TRANSCRIPTION,
       Action.AUTOMATIC_GOOGLE_TRANSLATION,
       Action.AUTOMATIC_GOOGLE_TRANSCRIPTION,
   ]:
       language = 'en' if 'transcription' in action else 'es'
       QuestionAdvancedFeature.objects.create(
           question_xpath=<xpath>,
           action=action,
           params=[{'language': language}],
           asset=asset,
       )
   ```
5. Enable NLP actions the old way by running
   ```
   asset = Asset.objects.get(uid=<uid>)
   asset.advanced_features = {
       '_version': '20250820',
       '_actionConfigs': {
           <xpath>: {
               'manual_transcription': [{'language': 'en'}],
               'manual_translation': [{'language': 'es'}],
               'automatic_google_transcription': [{'language': 'en'}],
               'automatic_google_translation': [{'language': 'es'}]
           }
       }
   }
   asset.save()
   ```
6. Using curl and your authorization token, PATCH the following JSONs to `http://kf.kobo.local/api/v2/assets/<asset_uid>/data/<sub_uuid>/submission-supplement`. In between each PATCH refresh the data table.
7. manual transcription: `'{"_version":"20250820", "<xpath>": {"manual_transcription": {"language":"en", "value":"Hello"}}}'`
8. 🟢 [on PR] The transcription column should contain "Hello"
9. automatic transcription: `'{"_version":"20250820", "<xpath>": {"automatic_google_transcription": {"language":"en"}}}'`
10. 🟢 [on PR] The transcription column should contain "Hello"
11. accepting the automatic transcription: `'{"_version":"20250820", "<xpath>": {"automatic_google_transcription": {"language":"en", "accepted": true}}}'`
12. 🟢 [on PR] The transcription column should contain the automatically generated transcription
13. automatic translation: `'{"_version":"20250820", "<xpath>": {"automatic_google_translation": {"language":"es"}}}'`
14. 🟢 [on PR] The translation column should be empty
15. accepting the automatic translation: `'{"_version":"20250820", "<xpath>": {"automatic_google_translation": {"language":"es", "accepted": true}}}'`
16. 🟢 [on PR] The translation column should contain the automatic translation
17. manual_translation: `'{"_version":"20250820", "<xpath>": {"manual_translation": {"language":"es", "value":"Hola"}}}'`
18. 🟢 [on PR] The translation column should contain "Hola"
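The "only accepted transcriptions/translations are displayed" rule, which is why the column still shows "Hello" after an unaccepted automatic transcription, can be sketched roughly as below. This is an illustration rather than the PR's code: `latest_accepted_value` is a hypothetical name, and the sketch assumes versions are ordered newest first, that accepted versions carry a `_dateAccepted` key, and that the value lives under `_data`.

```python
def latest_accepted_value(versions):
    """Return the value of the newest accepted version, or None if none exists.

    Assumptions of this sketch: `versions` is ordered newest first, an accepted
    version has a truthy '_dateAccepted', and the value is nested in '_data'.
    """
    for version in versions:
        if version.get('_dateAccepted'):
            return version.get('_data', {}).get('value')
    return None


# Newest first: an unaccepted automatic result sits on top of an accepted manual one.
versions = [
    {'_data': {'language': 'en', 'value': 'auto result'}},
    {'_data': {'language': 'en', 'value': 'Hello'}, '_dateAccepted': '2025-12-01T00:00:00Z'},
]
shown = latest_accepted_value(versions)
```

Here `shown` is the manual value, because the automatic version has not been accepted yet.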

### 💭 Notes

Use the new QuestionAdvancedFeature model for revising/retrieving data instead of the `asset.advanced_features` dict.

### 👀 Preview steps

1. ℹ️ have an account
2. Create a new project with an audio question
3. Add a submission
4. Enable transcriptions by running
   ```
   curl -X POST -H 'Authorization: Token <your token>' http://kf.kobo.local/api/v2/assets/<asset_uid>/advanced-features/ --json '{"question_xpath": "<audio_question_xpath>", "action": "manual_transcription", "params": [{"language": "en"}]}'
   ```
5. Add an English transcription by running
   ```
   curl -X PATCH -H 'Authorization: Token <your token>' http://kf.kobo.local/api/v2/assets/<asset_uid>/data/<submission_uuid>/supplement/ --json '{"_version":"20250820", "<audio_question_xpath>": {"manual_transcription": {"language":"en", "value": "hello"}}}'
   ```
6. Navigate to the data table
7. 🟢 [on PR] The transcript for the submission should show up in the table

…_for_output` for `QualAction` (#6504)

### 📣 Summary

Add implementation of `get_output_fields()` and `transform_data_for_output()` in `QualAction`.

### 📖 Description

This update enables qualitative analysis results to appear correctly in exports or the table view. The new logic:

- Defines the output fields for each qualitative question (including labels, types, and choices).
- Converts stored qualitative results into export-ready values, including expanding choice UUIDs into readable label objects.
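As a rough illustration of "expanding choice UUIDs into readable label objects", the sketch below maps stored UUIDs to their labels. The function name `expand_choices` and the `uuid`/`labels` keys are assumptions made for this example; the actual `QualAction` data shapes may differ.

```python
def expand_choices(selected_uuids, choices):
    """Replace stored choice UUIDs with objects carrying their labels.

    Hypothetical shapes: each choice is {'uuid': str, 'labels': dict}.
    UUIDs with no matching choice are skipped in this sketch.
    """
    labels_by_uuid = {choice['uuid']: choice['labels'] for choice in choices}
    return [
        {'uuid': uuid, 'labels': labels_by_uuid[uuid]}
        for uuid in selected_uuids
        if uuid in labels_by_uuid
    ]


choices = [
    {'uuid': 'c1', 'labels': {'_default': 'Urgent'}},
    {'uuid': 'c2', 'labels': {'_default': 'Routine'}},
]
expanded = expand_choices(['c2'], choices)
```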

### 💭 Notes

Migrate `asset.advanced_features` to `asset.advanced_features_set` when someone hits an advanced-features endpoint or saves an existing asset.

Notable decisions:

- We will use `known_cols` only to determine which questions had NLP actions performed. If any question had a transcript or a translation in any language, we will enable all 4 NLP actions (manual transcript, automatic transcript, manual translation, automatic translation) for it, using the languages in `advanced_features` as params.
- Removed the `set_version` method because it was causing circular imports that were quite difficult to fix and it didn't seem worth it.

Note that the data table will not load properly because this PR only migrates `advanced_features` and not `SubmissionSupplements`.

### 👀 Preview steps

1. ℹ️ have an account
2. [on main] Create a new project with an audio question and at least one submission
3. [on main] Add at least one transcription, one translation, and a QA question
4. Switch to the PR branch
5. Navigate to `/api/v2/assets/<uid>/advanced-features`
6. 🟢 [on PR] notice there are configured advanced features for all NLP actions (manual/automatic transcription/translation) and qual for the relevant audio question

…ect params DEV-1441 (#6548)

### 📣 Summary

Validate `params` before creating new advanced features.

### 👀 Preview steps

1. ℹ️ have an account and a project
2. `curl -X POST -H 'Authorization: Token <your token>' http://kf.kobo.local/api/v2/assets/<asset_uid>/advanced-features --json '{"question_xpath": <xpath>, "action": "manual_transcription", "params": [{"something":"bad"}]}'`
3. 🔴 [on refactor-subsequences-2025] request 500s
4. 🟢 [on PR] request 400s
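The kind of check that turns the 500 into a 400 might look like the sketch below. It is not the PR's implementation: `ParamsValidationError`, `validate_params`, and the rule that each entry holds only a string `language` are assumptions for illustration. In a Django REST framework serializer, raising `serializers.ValidationError` from a validation hook is what produces the 400 response.

```python
ALLOWED_KEYS = {'language'}  # assumption: the only key a params entry may carry


class ParamsValidationError(ValueError):
    """Hypothetical error type; a view would translate it into a 400."""


def validate_params(params):
    """Check that params is a list of {'language': <str>} dicts."""
    if not isinstance(params, list):
        raise ParamsValidationError('params must be a list')
    for index, entry in enumerate(params):
        if not isinstance(entry, dict):
            raise ParamsValidationError(f'params[{index}] must be an object')
        unexpected = set(entry) - ALLOWED_KEYS
        if unexpected:
            raise ParamsValidationError(
                f'params[{index}] has unexpected keys: {sorted(unexpected)}'
            )
        if not isinstance(entry.get('language'), str):
            raise ParamsValidationError(f'params[{index}] needs a string "language"')
    return params


# The payload from the preview step is rejected before any object is created.
try:
    validate_params([{'something': 'bad'}])
    rejected = False
except ParamsValidationError:
    rejected = True
```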

…ental DEV-934 (#6422)

### 💭 Notes

Fill out the method for converting old SubmissionExtra content dicts to the new format expected by SubmissionSupplemental for translations and transcripts.

This code makes numerous assumptions to fill in information that is not present in the old structure but required in the new:

1. If `old[xpath]['transcript']['value'] == old[xpath]['googlets']['value']` and the language codes are the same, we assume the most recent transcript was automatically generated
2. If, for any revision in `old[xpath]['transcript']['revisions']`, `revision['value']` is the same as `old[xpath]['googlets']['value']` and the language codes match, we assume that revision was automatically generated. If multiple match, we assume they were all automatically generated. This should be pretty rare but it's possible
3. 1-2 also apply to translations
4. `old[xpath]['transcript']['dateModified']` will be assumed to be the creation date of the most recent revision (i.e. whatever is in `old[xpath]['transcript']['value']`). The same goes for translations
5. All uuids are newly generated
6. All old transcriptions/translations have status=complete with a `_dateAccepted` of now() (whenever the code is running)
7. To determine the dependency of any old translation, whether automated or manual:
   * If we know the source language, look for the most recent transcript in that language that was created before the translation.
     * If there is none, take the most recent transcription in that language
     * If there are no transcriptions in the source language, take the most recent transcript
   * If we don't know the source language, take the most recent transcript that was created before the translation
     * If there is none, take the most recent transcript
8. We can ignore any badly formatted revisions/transcripts/translations
9. Most recent revisions will be first in the version array
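Assumption 7 is the most involved rule in the list above, so here is a small sketch of how the dependency lookup could work. `pick_dependency` and the `language`/`created` keys are hypothetical names chosen for the example, and "created before" is read as strictly earlier; the migration code in the PR may structure this differently.

```python
from datetime import datetime


def pick_dependency(transcripts, translation_created, source_language=None):
    """Choose the transcript an old translation is assumed to depend on.

    Hypothetical shapes: each transcript is {'language': str, 'created': datetime}.
    """
    if not transcripts:
        return None

    def most_recent(items):
        return max(items, key=lambda item: item['created'])

    candidates = transcripts
    if source_language is not None:
        same_language = [t for t in transcripts if t['language'] == source_language]
        if not same_language:
            # No transcript in the source language: take the most recent overall.
            return most_recent(transcripts)
        candidates = same_language

    earlier = [t for t in candidates if t['created'] < translation_created]
    # Prefer transcripts created before the translation, else fall back.
    return most_recent(earlier or candidates)


en_old = {'language': 'en', 'created': datetime(2024, 1, 1)}
fr_mid = {'language': 'fr', 'created': datetime(2024, 2, 1)}
en_new = {'language': 'en', 'created': datetime(2024, 3, 1)}
transcripts = [en_old, fr_mid, en_new]

chosen = pick_dependency(transcripts, datetime(2024, 2, 15), source_language='en')
```

With a known source language of `en` and a translation dated between the two English transcripts, the older English transcript is chosen because it is the most recent one created before the translation.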

…` structure DEV-1443 (#6549)

### 📣 Summary

Restore background and NLP processing by reading values from the new `_data` field.

### 📖 Description

This fix updates the background processing logic to support the new data structure, where value, language, and status (when present) are now nested under a `_data` dictionary. Some automated NLP actions were broken because they were still looking for these fields at the top level, where they can no longer exist.

### 👀 Preview steps

1. ℹ️ have an account and a project with an audio question
2. submit an audio file (longer than 2 minutes)
3. use the shell to process an automatic action (look at Linear for snippet)
4. 🔴 [on main] notice that the background process never starts, and if the user sends acceptance, the external service is still called
5. 🟢 [on PR] notice that everything works as expected

jnm and others added 30 commits April 9, 2024 11:22

Add WIP edits to subsequences README

b8eb530

Make NumberDoubler action class work

3a6d582

send number_doubler to formpack in super hacky way

23d3b22

Merge branch 'fix-subsequences-readme' into wip-refactor-subsequences

a73c969

yay

fba697e

wip

678e8ec

Merge remote-tracking branch 'origin/main' into subsequences-work-april-2024

658c49f

Merge remote-tracking branch 'origin/main' into subsequences-work-april-2024

18d41b5

Make unit tests pass again after merging main

6f1a982

Add grievances to README.md

91ad964

Merge remote-tracking branch 'origin/main' into august-2025

c2fd97c

Start drafting new README based on what we want and with less tiptoeing around what's already there

24a07b4

Begin rewriting manual transcription action

3421691

Continue rewriting manual transcription action

5cd5896

Create fresh subsequences directory, and move previous work to `subsequences__old`

4b85d2f

Remove unused load_params()

5091c64

Add preliminary manual transcription tests

81d4d6c

Move new work to subsequence__new instead, restore previous Django app

7d53d2a

Update revise_field to support new structure

249abad

typo

4bfdc04

More manual transcription tests, tweaks to revise_field

dc067cc

Merge remote-tracking branch 'jnm/august-2025' into august-2025

4c37d0e

wip

aaa17f1

Use data schema to build result schema

c03ee57

more

36e864b

even more

e82b0ec

Make result schema more dynamic

9f51715

Merge branch 'august-2025' of github.com:jnm/kpi-private into august-2025

65d8bcc

# Conflicts:
# kobo/apps/subsequences__new/actions/manual_transcription.py

Comment out timezone detection in "utc_datetime_to_simplified_iso8601"

5602de6

Move result_schema to base class

ac0ae75

noliveleger and others added 30 commits September 26, 2025 13:39

Save errors when Google Timeout is reached

8a78a77

Refactor dependencies system

b7134ed

Persist action dependency

d70fc40

Test Celery is triggered when task is in progress

0f29748

Fix dependency field on error

1beb46f

Update README

4d6abfc

Update README

b42f23d

WIP: migrate advanced_features and submission supplements

f550303

Reactivate limits

0a92a24

refactor(subsequences): rename "automated" to "automatic" (#6446)

9aec680

### 💭 Notes

Rename "automated_google_\*" actions to "automatIC_google_\*". Also renames a few other methods for clarity. The unit tests that are failing are failing on the base branch as well.

Correct name of patched method

7023d27

fix(ci): pin pip<25.3 to restore compatibility with pip-tools 7.x (#6435)

c0ac023

### 📣 Summary

Fixes a CI installation issue caused by an incompatibility between `pip` 25.3 and `pip-tools` 7.x.

Merge remote-tracking branch 'origin/main' into refactor-subsequences-2025

0d9e892

fix merge artifacts

038fa91

fix(subsequences): return schema if not migrated

7299c28

Merge branch 'main' into refactor-subsequences-2025

ff2323d

refactor(subsequences): update advanced features to use `UniqueConstraint` DEV-1432 (#6534)

cea1bd6

### 💭 Notes

Will need to re-apply the 0005_questionadvancedfeature migration with the new changes.

Merge branch 'main' into refactor-subsequences-2025

a1839d5

fix: do not create asset version where migrating advanced features schema

3857c71


6 participants
