nuclia · sunbit · Feb 19, 2025 · Feb 7, 2025
diff --git a/.github/workflows/run-e2e-nuclia-prod.yml b/.github/workflows/run-e2e-nuclia-prod.yml
@@ -42,7 +42,7 @@ jobs:
           TEST_AWS_US_EAST_2_1_NUCLIA_NUA: ${{ secrets.TEST_AWS_US_EAST_2_1_NUCLIA_NUA }}
           TEST_GMAIL_APP_PASSWORD: ${{ secrets.TEST_GMAIL_APP_PASSWORD }}
         run: |
-          TEST_ENV=prod pytest -sxv nuclia_e2e/nuclia_e2e/tests \
+          TEST_ENV=prod pytest -sxvr nuclia_e2e/nuclia_e2e/tests \
             --durations=0 --junitxml=nuclia-${{ matrix.shard_index }}.xml \
             --shard-id=${{ matrix.shard_index }} --num-shards=3
 

diff --git a/.github/workflows/run-e2e-nuclia-stage.yml b/.github/workflows/run-e2e-nuclia-stage.yml
@@ -43,7 +43,7 @@ jobs:
           STAGE_GLOBAL_RECAPTCHA: ${{ secrets.STAGE_GLOBAL_RECAPTCHA }}
           TEST_GMAIL_APP_PASSWORD: ${{ secrets.TEST_GMAIL_APP_PASSWORD }}
         run: |
-          TEST_ENV=stage pytest -sxv nuclia_e2e/nuclia_e2e/tests \
+          TEST_ENV=stage pytest -sxvr nuclia_e2e/nuclia_e2e/tests \
             --durations=0 --junitxml=nuclia-${{ matrix.shard_index }}.xml \
             --shard-id=${{ matrix.shard_index }} --num-shards=3
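Both workflows split the suite across three CI jobs with `--shard-id`/`--num-shards`. The sketch below shows one common way such a split can be computed: hash each test's node ID and take it modulo the shard count. The function names are hypothetical, and we assume rather than know that the sharding plugin used here partitions tests this way.

```python
import hashlib


def shard_of(nodeid: str, num_shards: int) -> int:
    """Map a pytest node ID to a shard index in range(num_shards)."""
    # A stable hash (unlike Python's per-process salted hash()) keeps the
    # assignment identical across the separate CI matrix jobs.
    digest = hashlib.sha256(nodeid.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def select_for_shard(nodeids: list[str], shard_id: int, num_shards: int) -> list[str]:
    """Keep only the tests that the job with this shard_id should run."""
    return [n for n in nodeids if shard_of(n, num_shards) == shard_id]
```

Each matrix job (`shard_index` 0, 1, 2) would call `select_for_shard(all_ids, shard_index, 3)`. Because the hash is deterministic, the three jobs together run every test exactly once.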
 

diff --git a/nuclia_e2e/Makefile b/nuclia_e2e/Makefile
@@ -7,6 +7,7 @@ lint:
 	mypy .
 
 lint-autofix:
+	ruff format .
 	ruff check . --fix
 	mypy .
 

diff --git a/nuclia_e2e/README.md b/nuclia_e2e/README.md
@@ -110,6 +110,28 @@ The Nuclia SDK is implemented as a singleton in terms of how it handles the conf
 
 --
 
+## Testing locally
+
+These e2e tests are meant to run against real environments (stage, prod). To run them locally, you need to configure your machine to target one of those environments.
+
+For stage, you'll first need the following env vars:
+``` shell
+export TEST_GMAIL_APP_PASSWORD="..."
+export STAGE_PERMAMENT_ACCOUNT_OWNER_PAT_TOKEN="..."
+export STAGE_ROOT_PAT_TOKEN="..."
+export STAGE_GLOBAL_RECAPTCHA="..."
+export TEST_EUROPE1_STASHIFY_NUA="..."
+```
+
+Then run the tests with `TEST_ENV=stage`:
+``` shell
+TEST_ENV=stage pytest nuclia_e2e/tests <...>
+```
+
+If you don't have these environment variables, ask the repo owners for them :)
+
+--
+
 ## TODO
 - **Minimum Endpoint Coverage**: Confirm that all documented public endpoints appear in at least one E2E test.
 - **Feature Checklist** : Maintain an up-to-date list of features, noting whether they have E2E coverage and why.
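Before running the stage suite locally, it can help to fail fast when one of the variables from the "Testing locally" section is missing. A minimal, hypothetical pre-flight check (not part of the repository) could look like this; the names are copied from the README, including the `PERMAMENT` spelling, which we assume the tests read verbatim.

```python
import os
from collections.abc import Iterable, Mapping

# Names copied verbatim from the "Testing locally" section of the README.
STAGE_ENV_VARS = (
    "TEST_GMAIL_APP_PASSWORD",
    "STAGE_PERMAMENT_ACCOUNT_OWNER_PAT_TOKEN",
    "STAGE_ROOT_PAT_TOKEN",
    "STAGE_GLOBAL_RECAPTCHA",
    "TEST_EUROPE1_STASHIFY_NUA",
)


def missing_env_vars(
    names: Iterable[str], environ: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the variables that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]
```

If `missing_env_vars(STAGE_ENV_VARS)` returns an empty list, every variable the README lists for stage is set.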
diff --git a/nuclia_e2e/nuclia_e2e/assets/plaintext_manual_split.txt b/nuclia_e2e/nuclia_e2e/assets/plaintext_manual_split.txt
@@ -0,0 +1,25 @@
+This is a very small paragraph that goes alone
+
+This is intended to be a decently sized paragraph that should also be by itself. Isn't it great? Now users can split their own documents manually! :D
+
+This one as well
+
+Let's now do a very long one to see if our backup splitting works. Here follows some random text. Lorsque j'avais six ans j'ai vu, une fois, une magnifique image, dans un livre sur la Forêt Vierge qui s'appelait"Histoires Vécues". Ça représentait un serpent boa qui avalait un fauve. Voilà la copie du dessin.
+On disait dans le livre :"Les serpents boas avalent leur proie tout entière, sans la mâcher. Ensuite ils ne peuvent plus bouger et ils dorment pendant les six mois de leur digestion."
+J'ai alors beaucoup réfléchi sur les aventures de la jungle et, à mon tour, j'ai réussi, avec un crayon de couleur, à tracer mon premier dessin. Mon dessin numéro i. Il était comme ça :
+J'ai montré mon chef-d'oeuvre aux grandes personnes et je leur ai demandé si mon dessin leur faisait peur.
+Elles m'ont répondu :"Pourquoi un chapeau ferait-il peur ?"
+Mon dessin ne représentait pas un chapeau. Il représentait un serpent boa qui digérait un éléphant. J'ai alors dessiné l'intérieur du serpent boa, afin que les grandes personnes puissent comprendre. Elles ont toujours besoin d'explications. Mon dessin numéro 2 était comme ça:
+Les grandes personnes m'ont conseillé de laisser de côté les dessins de serpents boas ouverts ou fermés, et de m'intéresser plutôt à la géographie, à l'histoire, au calcul et à la grammaire. C'est ainsi que j'ai abandonné, à l'âge de six ans, une magnifique carrière de peintre. J'avais été découragé par l'insuccès de mon dessin numéro i et de mon dessin numéro 2. Les grandes personnes ne comprennent jamais rien toutes seules, et c'est fatigant, pour les enfants, de toujours et toujours leur donner des explications.
+J'ai donc dû choisir un autre métier et j'ai appris à piloter des avions. J'ai volé un peu partout dans le monde. Et la géographie, c'est exact, m'a beaucoup servi. Je savais reconnaître, du premier coup d'oeil la Chine de l'Arizona. C'est très utile, si l'on est égaré pendant la nuit.
+J'ai ainsi eu, au cours de ma vie, des tas de contacts avec des tas de gens sérieux. J'ai beaucoup vécu chez les grandes personnes. Je les ai vues de très près. Ça n'a pas trop amélioré mon opinion.
+Quand j'en rencontrais une qui me paraissait un peu lucide, je faisais l'expérience sur elle de mon dessin numéro i que j'ai toujours conservé. Je voulais savoir si elle était vraiment compréhensive. Mais toujours elle me répondait :"C'e§t un chapeau."Alors je ne lui parlais ni de serpents boas, ni de forêts vierges, ni d'étoiles. Je me mettais à sa portée. Je lui parlais de bridge, de golf, de politique et de cravates. Et la grande personne était bien contente de connaître un homme aussi raisonnable.
+On disait dans le livre :"Les serpents boas avalent leur proie tout entière, sans la mâcher. Ensuite ils ne peuvent plus bouger et ils dorment pendant les six mois de leur digestion."
+J'ai alors beaucoup réfléchi sur les aventures de la jungle et, à mon tour, j'ai réussi, avec un crayon de couleur, à tracer mon premier dessin. Mon dessin numéro i. Il était comme ça :
+J'ai montré mon chef-d'oeuvre aux grandes personnes et je leur ai demandé si mon dessin leur faisait peur.
+Elles m'ont répondu :"Pourquoi un chapeau ferait-il peur ?"
+Mon dessin ne représentait pas un chapeau. Il représentait un serpent boa qui digérait un éléphant. J'ai alors dessiné l'intérieur du serpent boa, afin que les grandes personnes puissent comprendre. Elles ont toujours besoin d'explications. Mon dessin numéro 2 était comme ça:
+Les grandes personnes m'ont conseillé de laisser de côté les dessins de serpents boas ouverts ou fermés, et de m'intéresser plutôt à la géographie, à l'histoire, au calcul et à la grammaire. C'est ainsi que j'ai abandonné, à l'âge de six ans, une magnifique carrière de peintre. J'avais été découragé par l'insuccès de mon dessin numéro i et de mon dessin numéro 2. Les grandes personnes ne comprennent jamais rien toutes seules, et c'est fatigant, pour les enfants, de toujours et toujours leur donner des explications.
+J'ai donc dû choisir un autre métier et j'ai appris à piloter des avions. J'ai volé un peu partout dans le monde. Et la géographie, c'est exact, m'a beaucoup servi. Je savais reconnaître, du premier coup d'oeil la Chine de l'Arizona. C'est très utile, si l'on est égaré pendant la nuit.
+J'ai ainsi eu, au cours de ma vie, des tas de contacts avec des tas de gens sérieux. J'ai beaucoup vécu chez les grandes personnes. Je les ai vues de très près. Ça n'a pas trop amélioré mon opinion.
+Quand j'en rencontrais une qui me paraissait un peu lucide, je faisais l'expérience sur elle de mon dessin numéro i que j'ai toujours conservé. Je voulais savoir si elle était vraiment compréhensive. Mais toujours elle me répondait :"C'e§t un chapeau."Alors je ne lui parlais ni de serpents boas, ni de forêts vierges, ni d'étoiles. Je me mettais à sa portée. Je lui parlais de bridge, de golf, de politique et de cravates. Et la grande personne était bien contente de connaître un homme aussi raisonnable.
diff --git a/nuclia_e2e/nuclia_e2e/assets/test_slides.pptx b/nuclia_e2e/nuclia_e2e/assets/test_slides.pptx
diff --git a/nuclia_e2e/nuclia_e2e/tests/nua/test_da_tasks.py b/nuclia_e2e/nuclia_e2e/tests/nua/test_da_tasks.py
@@ -355,7 +355,7 @@ def validate_labeler_output_text_block(msg: BrokerMessage):
                     )
                 )
             ],
-            llm=LLMConfig(model="chatgpt-azure-4o-mini"),
+            llm=LLMConfig(model="gemini-1-5-flash"),
         ),
         validate_output=validate_llm_graph_output,
     ),
@@ -522,6 +522,8 @@ async def tmp_nua_key(
         yield nua_key
     finally:
         await delete_nua_key(client=pat_client, account_id=account_id, nua_client_id=nua_client_id)
+        await nua_client.stream_client.aclose()
+        await nua_client.client.aclose()
 
 
 @pytest.mark.asyncio_cooperative
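The `tmp_nua_key` fixture now closes both HTTP clients of the temporary NUA client in its `finally` block. The sketch below illustrates that cleanup placement with hypothetical stand-in classes (`FakeHTTPClient`, `FakeNuaClient`) instead of the real SDK client, modelled as an `asynccontextmanager` rather than a pytest fixture: the `finally` block closes both clients whether the code using them succeeds or raises.

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager


class FakeHTTPClient:
    """Hypothetical stand-in for an async HTTP client exposing aclose()."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


class FakeNuaClient:
    """Hypothetical client holding two HTTP clients, like nua_client above."""

    def __init__(self) -> None:
        self.client = FakeHTTPClient()
        self.stream_client = FakeHTTPClient()


@asynccontextmanager
async def tmp_client() -> AsyncIterator[FakeNuaClient]:
    nua_client = FakeNuaClient()
    try:
        yield nua_client
    finally:
        # Runs on success and on error, so no connection pool is left open.
        await nua_client.stream_client.aclose()
        await nua_client.client.aclose()


async def use_client(raise_error: bool = False) -> FakeNuaClient:
    holder: list[FakeNuaClient] = []
    try:
        async with tmp_client() as c:
            holder.append(c)
            if raise_error:
                raise RuntimeError("simulated test failure")
    except RuntimeError:
        pass
    return holder[0]
```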

diff --git a/nuclia_e2e/nuclia_e2e/tests/nua/test_llm_rag.py b/nuclia_e2e/nuclia_e2e/tests/nua/test_llm_rag.py
@@ -10,7 +10,7 @@
 # - asyncio loop overload
 # - Transient error
 # For any of these reasons, it makes sense not to retry immediately
-@pytest.mark.flaky(reruns=2, reruns_delay=10)
+@pytest.mark.flaky(reruns=4, reruns_delay=10)
 @pytest.mark.asyncio_cooperative
 @pytest.mark.parametrize("model", ALL_LLMS)
 async def test_llm_rag(nua_client: AsyncNuaClient, model):
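Raising `reruns` from 2 to 4 means a failing case can now run up to five times in total, with `reruns_delay=10` seconds between attempts (up to 4 × 10 = 40 s of extra waiting per failing case). The helper below is a hypothetical plain-asyncio sketch of those semantics, not the rerun plugin's implementation, and it does not model `asyncio_cooperative`.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def run_with_reruns(
    test: Callable[[], Awaitable[None]],
    reruns: int = 4,
    reruns_delay: float = 10.0,
) -> int:
    """Run `test`, rerunning it after a failure; return the attempts used.

    At most 1 + reruns attempts are made, sleeping reruns_delay seconds
    between them. If every attempt fails, the last exception propagates.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            await test()
            return attempts
        except Exception:
            if attempts > reruns:
                raise
            # Wait before retrying so a transient overload has time to clear.
            await asyncio.sleep(reruns_delay)
```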

diff --git a/nuclia_e2e/nuclia_e2e/tests/nua/test_predict.py b/nuclia_e2e/nuclia_e2e/tests/nua/test_predict.py
@@ -54,7 +54,7 @@ async def test_predict_tokens(nua_client: AsyncNuaClient):
 # - asyncio loop overload
 # - Transient error
 # For any of these reasons, it makes sense not to retry immediately
-@pytest.mark.flaky(reruns=2, reruns_delay=10)
+@pytest.mark.flaky(reruns=4, reruns_delay=10)
 @pytest.mark.asyncio_cooperative
 @pytest.mark.parametrize("model", NON_REASONING_LLMS)
 async def test_predict_rephrase(nua_client: AsyncNuaClient, model):

diff --git a/nuclia_e2e/nuclia_e2e/tests/nua/test_processor.py b/nuclia_e2e/nuclia_e2e/tests/nua/test_processor.py
@@ -39,3 +39,21 @@ async def test_vude_1(nua_client: AsyncNuaClient):
 async def test_activity(nua_client: AsyncNuaClient):
     nc = AsyncNucliaProcessing()
     await nc.status(nc=nua_client)
+
+
+@pytest.mark.asyncio_cooperative
+async def test_pptx(nua_client: AsyncNuaClient):
+    path = get_asset_file_path("test_slides.pptx")
+    nc = AsyncNucliaProcessing()
+    payload = await nc.process_file(path, kbid="kbid", timeout=300, nc=nua_client)
+    assert payload
+    assert "This is a test ppt" in payload.extracted_text[0].body.text
+
+
+@pytest.mark.asyncio_cooperative
+async def test_manual_split(nua_client: AsyncNuaClient):
+    nc = AsyncNucliaProcessing()
+    path = get_asset_file_path("plaintext_manual_split.txt")
+    payload = await nc.process_file(path, kbid="kbid", timeout=300, nc=nua_client)
+    assert payload
+    assert len(payload.field_metadata[0].metadata.metadata.paragraphs) == 11
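`test_manual_split` expects the fixture's blank-line-separated paragraphs plus extra pieces cut from its very long final paragraph (11 in total). The function below is a hypothetical sketch of that idea: split on blank lines, then fall back to packing sentences into chunks below an assumed `max_chars`. The processing pipeline's real rules and thresholds are not shown in this diff, so this sketch is not expected to reproduce the count of 11.

```python
import re


def split_paragraphs(text: str, max_chars: int = 1000) -> list[str]:
    """Split on blank lines, then break oversized paragraphs at sentence ends.

    max_chars and the fallback strategy are assumptions for illustration.
    A single sentence longer than max_chars is kept whole.
    """
    paragraphs: list[str] = []
    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if not block:
            continue
        if len(block) <= max_chars:
            paragraphs.append(block)
            continue
        # Backup splitting: greedily pack sentences into chunks <= max_chars.
        chunk = ""
        for sentence in re.split(r"(?<=[.!?:])\s+", block):
            candidate = f"{chunk} {sentence}".strip()
            if chunk and len(candidate) > max_chars:
                paragraphs.append(chunk)
                chunk = sentence
            else:
                chunk = candidate
        if chunk:
            paragraphs.append(chunk)
    return paragraphs
```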
diff --git a/nuclia_e2e/nuclia_e2e/tests/test_kb_features.py b/nuclia_e2e/nuclia_e2e/tests/test_kb_features.py
@@ -1,3 +1,4 @@
+from collections.abc import Awaitable
 from collections.abc import Callable
 from datetime import datetime
 from datetime import timedelta
@@ -24,6 +25,7 @@
 from nuclia_models.worker.proto import Operation
 from nuclia_models.worker.tasks import ApplyOptions
 from nuclia_models.worker.tasks import DataAugmentation
+from nuclia_models.worker.tasks import SemanticModelMigrationParams
 from nuclia_models.worker.tasks import TaskName
 from nucliadb_models.metadata import ResourceProcessingStatus
 from nucliadb_sdk.v2.exceptions import ClientError
@@ -76,27 +78,21 @@ async def condition() -> tuple[bool, Any]:
 
         return condition
 
-    success, _ = (
-        await wait_for(resource_is_processed(rid), logger=logger),
-        "File was not processed in time, PROCESSED status not found in resource",
-    )
+    success, _ = await wait_for(resource_is_processed(rid), logger=logger)
+    assert success, "File was not processed in time, PROCESSED status not found in resource"
 
     # Wait for the resource to be indexed by searching for content that only
     # the paragraph we're looking for contains
     def resource_is_indexed(rid):
         @wraps(resource_is_indexed)
         async def condition() -> tuple[bool, Any]:
-            result = await kb.search.find(ndb=ndb, features=["keyword"], query="Michiko")
+            result = await kb.search.find(ndb=ndb, features=["keyword"], reranker="noop", query="Michiko")
             return len(result.resources) > 0, None
 
         return condition
 
-    success, _ = (
-        await wait_for(resource_is_indexed(rid), logger=logger, max_wait=120, interval=1),
-        "File was not indexed in time, not enough paragraphs found on resource",
-    )
-
-    assert success
+    success, _ = await wait_for(resource_is_indexed(rid), logger=logger, max_wait=120, interval=1)
+    assert success, "File was not indexed in time, not enough paragraphs found on resource"
 
 
 async def run_test_import_kb(regional_api_config, ndb: AsyncNucliaDBClient, logger: Logger):
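Several hunks in this file replace `success, _ = (await wait_for(...), "message")` with plain unpacking followed by `assert success, "message"`. The old form built a 2-tuple of the `(bool, Any)` result and the message, so `success` received the whole result tuple. A non-empty tuple is always truthy, so `assert success` could never fail. This minimal sketch replaces the awaited call with a plain value to show the difference:

```python
from typing import Any


def old_pattern(result: tuple[bool, Any]) -> object:
    # Unpacks the 2-tuple (result, message): `success` is the whole result tuple.
    success, _ = (result, "File was not processed in time")
    return success


def new_pattern(result: tuple[bool, Any]) -> bool:
    # Unpacks the result itself: `success` is the boolean flag.
    success, _ = result
    return success


timed_out = (False, None)  # what a condition that never passed would report
print(bool(old_pattern(timed_out)))  # True: a non-empty tuple is truthy
print(new_pattern(timed_out))  # False
```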
@@ -119,11 +115,10 @@ async def condition() -> tuple[bool, Any]:
 
         return condition
 
-    success, _ = (
-        await wait_for(resources_are_imported(["disney", "hp", "vaccines"]), max_wait=120, logger=logger),
-        "Expected imported resources not found",
+    success, _ = await wait_for(
+        resources_are_imported(["disney", "hp", "vaccines"]), max_wait=120, logger=logger
     )
-    assert success
+    assert success, "Expected imported resources not found"
 
 
 async def run_test_create_da_labeller(regional_api_config, ndb: AsyncNucliaDBClient, logger: Logger):
@@ -219,11 +214,79 @@ async def condition() -> tuple[bool, Any]:
 
         return condition
 
-    success, _ = (
-        await wait_for(resources_are_labelled(expected_resource_labels), logger=logger),
-        "Expected computed labels not found in resources",
+    success, _ = await wait_for(resources_are_labelled(expected_resource_labels), logger=logger)
+    assert success, "Expected computed labels not found in resources"
+
+
+async def run_test_start_embedding_model_migration_task(ndb: AsyncNucliaDBClient) -> str:
+    kbid = ndb.kbid
+
+    # XXX: this is a really naive way to select a self-hosted model for a KB
+    # without listing learning_config. At some point, we should implement
+    # something smarter.
+    #
+    # These model names are coupled with the learning_models library, and a
+    # change there could break this test
+    vectorsets = await ndb.ndb.list_vector_sets(kbid=kbid)
+    assert len(vectorsets.vectorsets) == 1
+    current_model = vectorsets.vectorsets[0].id
+    new_model = "multilingual-2024-05-06" if current_model == "en-2024-04-24" else "en-2024-04-24"
+
+    # we first need to add the new embedding model in the KB
+    created = await ndb.ndb.add_vector_set(kbid=kbid, vectorset_id=new_model)
+    vectorset_id = created.id
+
+    # now we can add a migration task that will reprocess all KB data with the
+    # new model and store it in nucliadb
+    kb = AsyncNucliaKB()
+    task = await kb.task.start(
+        ndb=ndb,
+        task_name=TaskName.SEMANTIC_MODEL_MIGRATOR,
+        apply=ApplyOptions.ALL,
+        parameters=SemanticModelMigrationParams(
+            semantic_model_id=vectorset_id,
+        ),
     )
-    assert success
+
+    return task.id
+
+
+async def run_test_check_embedding_model_migration(ndb: AsyncNucliaDBClient, task_id: str, logger: Logger):
+    def new_embedding_model_available() -> Callable[[], Awaitable[tuple[bool, bool | None]]]:
+        @wraps(new_embedding_model_available)
+        async def condition() -> tuple[bool, bool | None]:
+            search_returned_results = False
+            kb = AsyncNucliaKB()
+
+            task = await kb.task.get(ndb=ndb, task_id=task_id)
+            if not task.request.completed:
+                # wait until the task has finished before checking whether it worked
+                return (False, search_returned_results)
+
+            new_model = task.request.parameters.semantic_model_id
+
+            # once finished, let's try a fast /find and validate there are
+            # results with the new semantic model
+            result = await kb.search.find(
+                ndb=ndb,
+                rephrase=False,
+                reranker="noop",
+                features=["semantic"],
+                vectorset=new_model,
+                query=TEST_CHOCO_QUESTION,
+            )
+            search_returned_results = bool(result.resources)
+            return (True, search_returned_results)
+
+        return condition
+
+    success, search_returned_results = await wait_for(
+        new_embedding_model_available(), max_wait=120, logger=logger
+    )
+    assert success is True, "embedding migration task did not finish on time"
+    assert (
+        search_returned_results is True
+    ), "expected to be able to search with the new embedding model but nucliadb didn't return resources"
 
 
 @backoff.on_exception(backoff.constant, (AssertionError, ClientError), max_tries=5, interval=5)
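The migration check splits its condition into two phases: poll until the task reports completion, then run one semantic `/find` against the new vectorset. Below is a hypothetical sketch of that shape with the KB task and search calls injected as plain coroutines; `FakeTask`, `make_condition`, `get_task` and `search` are illustrative names, not SDK APIs. We assume `wait_for` stops polling once the first element is `True`.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class FakeTask:
    """Hypothetical stand-in for the task status returned by kb.task.get()."""

    completed: bool
    semantic_model_id: str


def make_condition(
    get_task: Callable[[], Awaitable[FakeTask]],
    search: Callable[[str], Awaitable[list[str]]],
) -> Callable[[], Awaitable[tuple[bool, bool | None]]]:
    async def condition() -> tuple[bool, bool | None]:
        task = await get_task()
        if not task.completed:
            # Phase 1: keep polling until the migration task has finished.
            return False, False
        # Phase 2: the task is done; report whether searching with the new
        # vectorset returns any resources.
        resources = await search(task.semantic_model_id)
        return True, bool(resources)

    return condition
```

Returning `(True, False)` instead of `(False, ...)` when the search comes back empty lets the test tell a migration that never finished apart from one that finished without producing searchable vectors.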
@@ -405,18 +468,29 @@ def logger(msg):
     # Create a labeller configuration, with the goal of testing two things:
     # - labelling of existing resources (the ones imported)
     # - labelling of new resources (will be created later)
-    await run_test_create_da_labeller(regional_api_config, async_ndb, logger)
+    #
+    # Add a new embedding model to the KB and start a task to compute all data
+    # with the new embedding model. This will test the data flow between
+    # nucliadb and learning to stream all KB data to reprocess with the new
+    # embedding model and ingest/index in nucliadb again.
+    # We do this before the upload to validate both the migration itself
+    # and ingestion into a KB with multiple vectorsets
+    (_, embedding_migration_task_id) = await asyncio.gather(
+        run_test_create_da_labeller(regional_api_config, async_ndb, logger),
+        run_test_start_embedding_model_migration_task(async_ndb),
+    )
 
     # Upload a new resource and validate that it is correctly processed and stored in nuclia
     # Also check that its indexes are available by checking the number of extracted paragraphs
     await run_test_upload_and_process(regional_api_config, async_ndb, logger)
 
     # Wait for both labeller task results to be consolidated in nucliadb while we also run semantic search
     # These /find and /ask requests are crafted so they trigger all the existing calls to predict features
-    # We wait until find succeeds to run the ask tests to maximize the changes that all indexes will be
+    # We wait until find succeeds to run the ask tests to maximize the chances that all indexes will be
     # available and so minimize LLM costs from retries
     await asyncio.gather(
         run_test_check_da_labeller_output(regional_api_config, async_ndb, logger),
+        run_test_check_embedding_model_migration(async_ndb, embedding_migration_task_id, logger),
         run_test_find(regional_api_config, async_ndb, logger),
     )
     await run_test_ask(regional_api_config, async_ndb, logger)
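`asyncio.gather` returns results in the order its awaitables were passed, regardless of which finishes first. That is what lets the test unpack `(_, embedding_migration_task_id)` from the labeller and migration coroutines. A small sketch with hypothetical coroutines:

```python
import asyncio


async def create_labeller() -> None:
    # Finishes last, but its result still comes first in gather's output.
    await asyncio.sleep(0.02)


async def start_migration() -> str:
    await asyncio.sleep(0.01)
    return "task-123"  # hypothetical task id


async def main() -> str:
    (_, task_id) = await asyncio.gather(create_labeller(), start_migration())
    return task_id


print(asyncio.run(main()))  # task-123
```

With the default `return_exceptions=False`, the first exception raised by either coroutine propagates out of `gather`, and the other awaitable is not cancelled.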

diff --git a/nuclia_e2e/nuclia_e2e/utils.py b/nuclia_e2e/nuclia_e2e/utils.py
@@ -22,7 +22,7 @@ def get_asset_file_path(file: str):
 
 
 async def wait_for(
-    condition: Callable[[], Awaitable],
+    condition: Callable[[], Awaitable[tuple[bool, Any]]],
     max_wait: int = 60,
     interval: int = 5,
     logger: Logger = print,
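The diff only tightens `wait_for`'s `condition` type to `Callable[[], Awaitable[tuple[bool, Any]]]`; its body is not shown. The following is a hypothetical sketch of a poller with that contract, assuming it retries every `interval` seconds until the condition reports success or `max_wait` elapses, then returns the last `(bool, Any)` pair. `Logger` is approximated here as `Callable[[str], Any]`. The `@wraps(...)` decorators on the test conditions presumably give each one a readable `__name__` for log lines like these.

```python
from __future__ import annotations

import asyncio
import time
from collections.abc import Awaitable, Callable
from typing import Any


async def wait_for_sketch(
    condition: Callable[[], Awaitable[tuple[bool, Any]]],
    max_wait: float = 60,
    interval: float = 5,
    logger: Callable[[str], Any] = print,
) -> tuple[bool, Any]:
    """Poll `condition` until it reports success or `max_wait` seconds pass."""
    deadline = time.monotonic() + max_wait
    name = getattr(condition, "__name__", "condition")
    while True:
        success, data = await condition()
        if success:
            return True, data
        if time.monotonic() >= deadline:
            logger(f"{name} not met after {max_wait}s")
            return False, data
        logger(f"{name} not met yet, retrying in {interval}s")
        await asyncio.sleep(interval)
```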