Commit 6520d3a

Fix tests (#322)
* fix(tests): Use architecture-specific streams in coordinator tests

  Fix 7 failing tests in test_self_contained_coordinator_memtier.py by:
  - Adding messages to arch-specific streams instead of the base stream
  - Fixing consumer group creation parameters (arch and id)
  - Updating assertions to check arch-specific streams

  This aligns the tests with the arch-specific stream routing implemented in the
  coordinator, which reads from streams like:
  - oss:api:gh/redis/redis/builds:amd64 (for amd64)
  - oss:api:gh/redis/redis/builds:arm64 (for arm64)

  Fixes:
  - test_self_contained_coordinator_dockerhub_preload
  - test_self_contained_coordinator_dockerhub
  - test_self_contained_coordinator_dockerhub_iothreads
  - test_self_contained_coordinator_dockerhub_valkey
  - test_dockerhub_via_cli
  - test_dockerhub_via_cli_airgap
  - test_self_contained_coordinator_duplicated_ts

* style: Format test file with black

  Apply black formatting to test_self_contained_coordinator_memtier.py to comply
  with CI code style checks.

* style: Format runner.py with black

  Apply black formatting to runner.py for the FLUSHALL changes from PR #320 to
  comply with CI code style checks.

* fix(runner): Skip validation for untested operation types

  Improve metric validation to skip operation-specific metrics when those
  operations are not in the tested-commands list. For example, a SET-only test
  (--ratio 1:0) will now skip validation of the Gets.Ops/sec metric, which would
  legitimately be 0.

  This fixes CI test failures where SET-only tests were failing validation
  because Gets.Ops/sec was 0 (below the 10 QPS threshold).

  The validation now:
  1. Checks benchmark_config for the 'tested-commands' list
  2. Skips metrics for operation types (gets, sets, hgets, etc.) that are not in
     the tested-commands list
  3. Still validates metrics for operations that are actually being tested

* fix(tests): Add git_version to benchmark stream requests

  Add the git_version parameter when calling generate_benchmark_stream_request
  in tests. This ensures version information is propagated through the stream to
  the coordinator, allowing by.version keys to be created in Redis. Without
  this, the tests expected by.version keys to exist, but they were never created
  because artifact_version was None in the data export logic.

  Fixes test assertions that check for:
  - ci.benchmarks.redis/.../by.version/{version}/benchmark_end/.../memory_maxmemory

  Tests fixed:
  - test_self_contained_coordinator_dockerhub_preload
  - test_self_contained_coordinator_dockerhub
  - test_self_contained_coordinator_dockerhub_iothreads
  - test_self_contained_coordinator_dockerhub_valkey
  - test_self_contained_coordinator_duplicated_ts

* fix(cli): Extract git_version from Docker image tag

  - Add regex-based version extraction from image names like 'redis:7.4.0' or
    'valkey/valkey:7.2.6-bookworm'
  - Pass the extracted git_version to generate_benchmark_stream_request()
  - Enables by.version Redis TimeSeries keys to be created for CLI-triggered tests
  - Fixes test_dockerhub_via_cli and test_dockerhub_via_cli_airgap assertion failures

* style: Format cli.py with black

  Apply black formatting to comply with CI code style checks.

* debug: Add logging to diagnose artifact_version issue

* debug: Add print statements to track artifact_version value

* style: Format runner.py with black

* fix: Include running_platform in by.version key paths

  The export_redis_metrics function creates keys with the format:
  {prefix}/{test}/{by_variant}/benchmark_end/{running_platform}/{setup}/{metric}
  but the tests were expecting keys without running_platform:
  {prefix}/{test}/{by_variant}/benchmark_end/{setup}/{metric}

  This fixes all 6 failing tests by adding running_platform to the expected key
  format. Also removes the debug print statements that helped diagnose the issue.

* fix: Add JSON module support and correct test metadata

  This commit fixes stats validation errors:
  1. Add JSON module commands to commands.json:
     - JSON.GET: Return the value at path in JSON serialized form
     - JSON.SET: Sets or updates the JSON value at a path
  2. Add a 'json' group to groups.json with the JSON commands
  3. Fix test-suite metadata issues:
     - memtier_benchmark-playbook-rate-limiting-lua-100k-sessions.yml:
       Remove 'bitmap' from tested-groups (only scripting is tested via EVAL)
     - memtier_benchmark-playbook-realtime-analytics-membership.yml:
       Add 'sunion' to tested-commands (SUNION was used but not listed)
     - memtier_benchmark-playbook-realtime-analytics-membership-pipeline-10.yml:
       Add 'sunion' to tested-commands

  These changes resolve all stats validation errors in CI.

* fix: Add missing quote in test case memtier_benchmark-1Mkeys-string-set-with-ex-100B-pipeline-10.yml
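For context on the stream routing that the first test fix targets, here is a minimal, hypothetical sketch (redis-py, against a local Redis) of a producer adding a build request to one of the architecture-specific streams named in the message above, and a consumer group reading it back. The group/consumer names and the payload fields are placeholders, not the project's actual ones.

import redis

conn = redis.StrictRedis(decode_responses=False)  # assumes a local Redis on 6379

# Arch-specific stream for amd64 builds, as described in the commit message.
stream = "oss:api:gh/redis/redis/builds:amd64"
group = "coordinator-group"  # placeholder consumer-group name

# Producer side: tests now add entries here rather than to the base
# "oss:api:gh/redis/redis/builds" stream.
conn.xadd(stream, {b"git_version": b"7.4.0"})

# Consumer side: create the group if it does not exist, then read new entries.
try:
    conn.xgroup_create(stream, group, mkstream=True)
except redis.exceptions.ResponseError as e:
    if "BUSYGROUP" not in str(e):
        raise
print(conn.xreadgroup(group, "consumer-1", {stream: ">"}, count=1))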
1 parent 8a23ac1 commit 6520d3a

9 files changed: +19863 −19751 lines

commands.json

Lines changed: 19689 additions & 19675 deletions
Large diffs are not rendered by default.

groups.json

Lines changed: 39 additions & 35 deletions
@@ -1,70 +1,74 @@
 {
     "bitmap": {
-        "display": "Bitmap",
-        "description": "Operations on the Bitmap data type"
+        "description": "Operations on the Bitmap data type",
+        "display": "Bitmap"
     },
     "cluster": {
-        "display": "Cluster",
-        "description": "Redis Cluster management"
+        "description": "Redis Cluster management",
+        "display": "Cluster"
     },
     "connection": {
-        "display": "Connection",
-        "description": "Client connections management"
+        "description": "Client connections management",
+        "display": "Connection"
     },
     "generic": {
-        "display": "Generic",
-        "description": "Generic commands"
+        "description": "Generic commands",
+        "display": "Generic"
     },
     "geo": {
-        "display": "Geospatial indices",
-        "description": "Operations on the Geospatial Index data type"
+        "description": "Operations on the Geospatial Index data type",
+        "display": "Geospatial indices"
     },
     "hash": {
-        "display": "Hash",
-        "description": "Operations on the Hash data type"
+        "description": "Operations on the Hash data type",
+        "display": "Hash"
     },
     "hyperloglog": {
-        "display": "HyperLogLog",
-        "description": "Operations on the HyperLogLog data type"
+        "description": "Operations on the HyperLogLog data type",
+        "display": "HyperLogLog"
     },
+    "json": [
+        "JSON.GET",
+        "JSON.SET"
+    ],
     "list": {
-        "display": "List",
-        "description": "Operations on the List data type"
+        "description": "Operations on the List data type",
+        "display": "List"
     },
     "pubsub": {
-        "display": "Pub/Sub",
-        "description": "Pub/Sub commands"
+        "description": "Pub/Sub commands",
+        "display": "Pub/Sub"
     },
     "scripting": {
-        "display": "Scripting and Functions",
-        "description": "Redis server-side scripting and functions"
+        "description": "Redis server-side scripting and functions",
+        "display": "Scripting and Functions"
     },
     "sentinel": {
-        "display": "Sentinel",
-        "description": "Redis Sentinel commands"
+        "description": "Redis Sentinel commands",
+        "display": "Sentinel"
     },
     "server": {
-        "display": "Server",
-        "description": "Server management commands"
+        "description": "Server management commands",
+        "display": "Server"
     },
     "set": {
-        "display": "Set",
-        "description": "Operations on the Set data type"
+        "description": "Operations on the Set data type",
+        "display": "Set"
     },
     "sorted-set": {
-        "display": "Sorted Set",
-        "description": "Operations on the Sorted Set data type"
+        "description": "Operations on the Sorted Set data type",
+        "display": "Sorted Set"
     },
     "stream": {
-        "display": "Stream",
-        "description": "Operations on the Stream data type"
+        "description": "Operations on the Stream data type",
+        "display": "Stream"
     },
     "string": {
-        "display": "String",
-        "description": "Operations on the String data type"
+        "description": "Operations on the String data type",
+        "display": "String"
     },
     "transactions": {
-        "display": "Transactions",
-        "description": "Redis Transaction management"
+        "description": "Redis Transaction management",
+        "display": "Transactions"
     }
-}
+}
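A quick way to sanity-check the reordered file is to load it and print each entry; note that, as committed, the new "json" entry is a plain list of command names while every other group is an object with "description" and "display" fields. A minimal sketch:

import json

with open("groups.json") as fp:
    groups = json.load(fp)

for name, value in sorted(groups.items()):
    if isinstance(value, dict):
        print(f"{name}: {value['display']} - {value['description']}")
    else:
        # The "json" group added here is a list of commands, not an object.
        print(f"{name}: {value}")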

redis_benchmarks_specification/__cli__/cli.py

Lines changed: 17 additions & 1 deletion
@@ -80,6 +80,22 @@ def trigger_tests_dockerhub_cli_command_logic(args, project_name, project_versio
         decode_responses=False,
     )
     conn.ping()
+
+    # Extract version from Docker image tag if possible
+    # e.g., "redis:7.4.0" -> "7.4.0"
+    # e.g., "valkey/valkey:7.2.6-bookworm" -> "7.2.6"
+    git_version = None
+    if ":" in args.run_image:
+        tag = args.run_image.split(":")[-1]
+        # Try to extract version number from tag
+        # Common patterns: "7.4.0", "7.2.6-bookworm", "latest"
+        import re
+
+        version_match = re.match(r"^(\d+\.\d+\.\d+)", tag)
+        if version_match:
+            git_version = version_match.group(1)
+            logging.info(f"Extracted git_version '{git_version}' from image tag")
+
     testDetails = {}
     build_stream_fields, result = generate_benchmark_stream_request(
         args.id,
@@ -96,7 +112,7 @@ def trigger_tests_dockerhub_cli_command_logic(args, project_name, project_versio
         None,
         None,
         None,
-        None,
+        git_version,  # Pass extracted version
         None,
         None,
         None,
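As a stand-alone check of what the new extraction yields, the same regex can be run over representative image names; the first two tags come from the comments in the diff, and "latest" shows the no-match case where git_version stays None:

import re

for image in ["redis:7.4.0", "valkey/valkey:7.2.6-bookworm", "redis:latest"]:
    git_version = None
    if ":" in image:
        tag = image.split(":")[-1]
        match = re.match(r"^(\d+\.\d+\.\d+)", tag)
        if match:
            git_version = match.group(1)
    print(image, "->", git_version)

# redis:7.4.0 -> 7.4.0
# valkey/valkey:7.2.6-bookworm -> 7.2.6
# redis:latest -> None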

redis_benchmarks_specification/__runner__/runner.py

Lines changed: 48 additions & 6 deletions
@@ -175,13 +175,20 @@ def validate_benchmark_metrics(
     Args:
         results_dict: Dictionary containing benchmark results
         test_name: Name of the test being validated
-        benchmark_config: Benchmark configuration (unused, for compatibility)
+        benchmark_config: Benchmark configuration (optional, contains tested-commands)
         default_metrics: Default metrics configuration (unused, for compatibility)

     Returns:
         tuple: (is_valid, error_message)
     """
     try:
+        # Get tested commands from config if available
+        tested_commands = []
+        if benchmark_config and "tested-commands" in benchmark_config:
+            tested_commands = [
+                cmd.lower() for cmd in benchmark_config["tested-commands"]
+            ]
+
         # Define validation rules
         throughput_patterns = [
             "ops/sec",
@@ -219,6 +226,29 @@ def check_nested_dict(data, path=""):
                ):
                    return

+                # Skip operation-specific metrics for operations not being tested
+                # For example, skip Gets.Ops/sec if only SET commands are tested
+                if tested_commands:
+                    skip_metric = False
+                    operation_types = [
+                        "gets",
+                        "sets",
+                        "hgets",
+                        "hsets",
+                        "lpush",
+                        "rpush",
+                        "sadd",
+                    ]
+                    for op_type in operation_types:
+                        if (
+                            op_type in metric_path_lower
+                            and op_type not in tested_commands
+                        ):
+                            skip_metric = True
+                            break
+                    if skip_metric:
+                        return
+
                # Check throughput metrics
                for pattern in throughput_patterns:
                    if pattern in metric_path_lower:
@@ -2680,13 +2710,20 @@ def delete_temporary_files(
            if not success:
                logging.error(f"Memtier benchmark failed: {stderr}")
                # Clean up database after failure (timeout or error)
-                if args.flushall_on_every_test_end or args.flushall_on_every_test_start:
-                    logging.warning("Benchmark failed - cleaning up database with FLUSHALL")
+                if (
+                    args.flushall_on_every_test_end
+                    or args.flushall_on_every_test_start
+                ):
+                    logging.warning(
+                        "Benchmark failed - cleaning up database with FLUSHALL"
+                    )
                    try:
                        for r in redis_conns:
                            r.flushall()
                    except Exception as e:
-                        logging.error(f"FLUSHALL failed after benchmark failure: {e}")
+                        logging.error(
+                            f"FLUSHALL failed after benchmark failure: {e}"
+                        )
                # Continue with the test but log the failure
                client_container_stdout = f"ERROR: {stderr}"

@@ -3033,8 +3070,13 @@ def delete_temporary_files(
                test_result = False

                # Clean up database after exception to prevent contamination of next test
-                if args.flushall_on_every_test_end or args.flushall_on_every_test_start:
-                    logging.warning("Exception caught - cleaning up database with FLUSHALL")
+                if (
+                    args.flushall_on_every_test_end
+                    or args.flushall_on_every_test_start
+                ):
+                    logging.warning(
+                        "Exception caught - cleaning up database with FLUSHALL"
+                    )
                    try:
                        for r in redis_conns:
                            r.flushall()

redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-string-set-with-ex-100B-pipeline-10.yml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ dbconfig:
 tool: memtier_benchmark
 arguments: '"--data-size" "100" "--ratio" "1:0" "--key-pattern" "P:P" "-c" "50"
   "-t" "2" "--hide-histogram" "--key-minimum" "1" "--key-maximum" "1000000" -n
-  allkeys" --pipeline 50'
+  "allkeys" --pipeline 50'
 resources:
   requests:
     memory: 1g
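The unbalanced quote matters because the arguments string has to be tokenised before being handed to memtier_benchmark; whether the runner uses shlex for this is an assumption here, but the sketch shows why the pre-fix string cannot be split cleanly:

import shlex

broken = '"--key-maximum" "1000000" -n allkeys" --pipeline 50'
fixed = '"--key-maximum" "1000000" -n "allkeys" --pipeline 50'

print(shlex.split(fixed))  # ['--key-maximum', '1000000', '-n', 'allkeys', '--pipeline', '50']
try:
    shlex.split(broken)
except ValueError as err:
    print("broken arguments:", err)  # "No closing quotation"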

redis_benchmarks_specification/test-suites/memtier_benchmark-playbook-rate-limiting-lua-100k-sessions.yml

Lines changed: 0 additions & 1 deletion
@@ -34,7 +34,6 @@ tested-commands:
 - bitcount
 - eval
 tested-groups:
-- bitmap
 - scripting

 redis-topologies:

redis_benchmarks_specification/test-suites/memtier_benchmark-playbook-realtime-analytics-membership-pipeline-10.yml

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ dbconfig:
 tested-commands:
 - smembers
 - sdiff
+- sunion
 redis-topologies:
 - oss-standalone
 build-variants:

redis_benchmarks_specification/test-suites/memtier_benchmark-playbook-realtime-analytics-membership.yml

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ dbconfig:
 tested-commands:
 - smembers
 - sdiff
+- sunion
 redis-topologies:
 - oss-standalone
 build-variants:
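Metadata slips like the ones fixed above (a stale tested-group, a missing tested-command) are easy to catch locally before CI. A possible sketch, assuming PyYAML is installed and the script runs from the repository root, that flags any tested-group not defined in groups.json; a similar pass could check tested-commands against commands.json:

import glob
import json

import yaml  # PyYAML, assumed available

with open("groups.json") as fp:
    known_groups = set(json.load(fp))

for path in sorted(glob.glob("redis_benchmarks_specification/test-suites/*.yml")):
    with open(path) as fp:
        suite = yaml.safe_load(fp)
    for group in suite.get("tested-groups", []):
        if group not in known_groups:
            print(f"{path}: unknown tested-group '{group}'")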
