Skip to content

Conversation

@Kashyab19
Copy link
Owner

@Kashyab19 Kashyab19 commented Dec 25, 2025

  1. Migrated to uv package manager
  2. Added ffmpeg + faster-whisper (a model to detect audio)
  3. Dockerized the worker and orchestration as well

Summary by cubic

Add audio transcription and search, and dockerize the API, orchestrator, and worker with a shared /app/assets volume. Migrated to uv and Python 3.12; added a Postgres-backed video library with URL-hash dedupe and per-user linking.

  • New Features

    • Chunk-level audio transcription with faster-whisper; text embeddings via BGE small; stored in Qdrant as audio_transcripts.
    • Unified search that rank-fuses audio and visual results (RRF) and deduplicates nearby hits; returns timestamps and scores.
    • New endpoints: POST /video to queue jobs, POST /embed for image embeddings, GET /search for results.
  • Refactors

    • Dockerized services (api, orchestrator, worker) and wired Redpanda, Redis, Qdrant, and Postgres via docker-compose; shared /app/assets for videos.
    • Manager uses absolute paths, 30s chunks, and Redis counters; creates Kafka topics when missing.
    • Worker calls API for CLIP embeddings, ensures Qdrant collections, and saves scene cuts (structural detector with heartbeat) and audio segments.
    • Server runs as a gateway (CPU CLIP + FastEmbed), Kafka ingestion on video_ingestion, and combined search across collections.
    • Switched to uv and Python 3.12; Dockerfile installs ffmpeg and sets uvicorn as default; added pyproject.toml and uv.lock.

Written for commit 5acf1b2. Summary will update automatically on new commits.

1. Migrated to uv package manager
2. Added ffmpeg + faster-whisper (a model to detect audio)
3. Dockerized the worker and orchestration as well
Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 16 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="app/audio.py">

<violation number="1" location="app/audio.py:17">
P1: If model loading fails, `WHISPER` and `EMBEDDER` will be undefined, causing `NameError` when `transcribe_chunk()` is called. Consider either re-raising the exception to fail fast, or setting the variables to `None` and checking before use.</violation>
</file>

<file name="app/server.py">

<violation number="1" location="app/server.py:53">
P1: Missing `qdrant = None` in exception handler. If Qdrant connection fails, `qdrant` will be undefined, causing a `NameError` when endpoints try to use it. This differs from the Kafka producer pattern above which correctly sets `producer = None` on failure.</violation>
</file>

<file name="Dockerfile">

<violation number="1" location="Dockerfile:13">
P2: Pin the `uv` image to a specific version instead of using `:latest` for reproducible and secure builds.</violation>

<violation number="2" location="Dockerfile:23">
P1: Builds are not reproducible without a lock file. Consider keeping `uv.lock` committed and using `uv sync --frozen` to ensure consistent dependency versions across builds.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Ask questions if you need clarification on any suggestion

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR

# Use 'tiny' or 'base' for speed during debugging
WHISPER = WhisperModel("base", device="cpu", compute_type="int8")
EMBEDDER = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
except Exception as e:
Copy link

@cubic-dev-ai cubic-dev-ai bot Dec 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: If model loading fails, WHISPER and EMBEDDER will be undefined, causing NameError when transcribe_chunk() is called. Consider either re-raising the exception to fail fast, or setting the variables to None and checking before use.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/audio.py, line 17:

<comment>If model loading fails, `WHISPER` and `EMBEDDER` will be undefined, causing `NameError` when `transcribe_chunk()` is called. Consider either re-raising the exception to fail fast, or setting the variables to `None` and checking before use.</comment>

<file context>
@@ -0,0 +1,80 @@
+    # Use &#39;tiny&#39; or &#39;base&#39; for speed during debugging
+    WHISPER = WhisperModel(&quot;base&quot;, device=&quot;cpu&quot;, compute_type=&quot;int8&quot;)
+    EMBEDDER = TextEmbedding(model_name=&quot;BAAI/bge-small-en-v1.5&quot;)
+except Exception as e:
+    logger.critical(f&quot;Failed to load Audio models: {e}&quot;)
+
</file context>
Fix with Cubic

qdrant = QdrantClient(host=QDRANT_HOST, port=6333)
logger.info(f"Qdrant connected at {QDRANT_HOST}")
except Exception as e:
logger.critical(f"Qdrant Failed: {e}")
Copy link

@cubic-dev-ai cubic-dev-ai bot Dec 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Missing qdrant = None in exception handler. If Qdrant connection fails, qdrant will be undefined, causing a NameError when endpoints try to use it. This differs from the Kafka producer pattern above which correctly sets producer = None on failure.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/server.py, line 53:

<comment>Missing `qdrant = None` in exception handler. If Qdrant connection fails, `qdrant` will be undefined, causing a `NameError` when endpoints try to use it. This differs from the Kafka producer pattern above which correctly sets `producer = None` on failure.</comment>

<file context>
@@ -2,307 +2,184 @@
+    qdrant = QdrantClient(host=QDRANT_HOST, port=6333)
+    logger.info(f&quot;Qdrant connected at {QDRANT_HOST}&quot;)
+except Exception as e:
+    logger.critical(f&quot;Qdrant Failed: {e}&quot;)
+
+redis_client = redis.Redis(host=REDIS_HOST, port=6379, decode_responses=True)
</file context>
Suggested change
logger.critical(f"Qdrant Failed: {e}")
logger.critical(f"Qdrant Failed: {e}")
qdrant = None
Fix with Cubic

# Copy requirements first to leverage Docker caching
COPY requirements.txt .
# 2. Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
Copy link

@cubic-dev-ai cubic-dev-ai bot Dec 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Pin the uv image to a specific version instead of using :latest for reproducible and secure builds.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At Dockerfile, line 13:

<comment>Pin the `uv` image to a specific version instead of using `:latest` for reproducible and secure builds.</comment>

<file context>
@@ -1,25 +1,32 @@
-# Copy requirements first to leverage Docker caching
-COPY requirements.txt .
+# 2. Install uv
+COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
+
+# 3. Set up the application
</file context>
Suggested change
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
COPY --from=ghcr.io/astral-sh/uv:0.5.11 /uv /bin/uv
Fix with Cubic

COPY pyproject.toml ./

# Run sync WITHOUT --frozen (this resolves dependencies fresh)
RUN uv sync
Copy link

@cubic-dev-ai cubic-dev-ai bot Dec 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Builds are not reproducible without a lock file. Consider keeping uv.lock committed and using uv sync --frozen to ensure consistent dependency versions across builds.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At Dockerfile, line 23:

<comment>Builds are not reproducible without a lock file. Consider keeping `uv.lock` committed and using `uv sync --frozen` to ensure consistent dependency versions across builds.</comment>

<file context>
@@ -1,25 +1,32 @@
+COPY pyproject.toml ./
+
+# Run sync WITHOUT --frozen (this resolves dependencies fresh)
+RUN uv sync
 
-# Install Python dependencies
</file context>
Fix with Cubic

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 12 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="app/manager.py">

<violation number="1" location="app/manager.py:94">
P0: The Redis key `job:{job_id}:pending for the user {user_id}` contains spaces and appears to be a typo where descriptive text was accidentally included. This key format doesn&#39;t match the key used in `minion.py` (`job:{job_id}:pending`), which will cause job completion tracking to break entirely - chunks will never be marked as processed.</violation>
</file>

<file name="app/minion.py">

<violation number="1" location="app/minion.py:58">
P2: Exception is silently swallowed without logging. This will make debugging embedding API failures very difficult in production. Consider logging the error before returning None.</violation>

<violation number="2" location="app/minion.py:58">
P1: Silently ignoring Redis exceptions with `pass` hides critical job completion tracking failures. If Redis is down or the key doesn&#39;t exist, you&#39;ll have no visibility into why jobs appear incomplete.</violation>
</file>

<file name="app/database.py">

<violation number="1" location="app/database.py:66">
P2: Exception is caught and printed but not re-raised. This silently swallows database errors, preventing callers from knowing if the operation succeeded. Consider either re-raising the exception or using proper logging with error propagation.</violation>
</file>

Reply to cubic to teach it or ask questions. Tag @cubic-dev-ai to re-run a review.

Comment on lines 58 to 59
except Exception:
return None
Copy link

@cubic-dev-ai cubic-dev-ai bot Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Exception is silently swallowed without logging. This will make debugging embedding API failures very difficult in production. Consider logging the error before returning None.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/minion.py, line 58:

<comment>Exception is silently swallowed without logging. This will make debugging embedding API failures very difficult in production. Consider logging the error before returning None.</comment>

<file context>
@@ -48,78 +44,65 @@
-    except Exception as e:
-        logger.error(f&quot;Embedding API Error: {e}&quot;)
+        return res.json()[&quot;vector&quot;] if res.status_code == 200 else None
+    except Exception:
         return None
 
</file context>
Suggested change
except Exception:
return None
except Exception as e:
logger.error(f"Embedding API Error: {e}")
return None
Fix with Cubic

video.status = status
db.commit()
except Exception as e:
print(f"DB Error: {e}")
Copy link

@cubic-dev-ai cubic-dev-ai bot Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Exception is caught and printed but not re-raised. This silently swallows database errors, preventing callers from knowing if the operation succeeded. Consider either re-raising the exception or using proper logging with error propagation.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/database.py, line 66:

<comment>Exception is caught and printed but not re-raised. This silently swallows database errors, preventing callers from knowing if the operation succeeded. Consider either re-raising the exception or using proper logging with error propagation.</comment>

<file context>
@@ -1,60 +1,108 @@
+            video.status = status
+            db.commit()
+    except Exception as e:
+        print(f&quot;DB Error: {e}&quot;)
+    finally:
+        db.close()
</file context>
Suggested change
print(f"DB Error: {e}")
print(f"DB Error: {e}")
raise
Fix with Cubic

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 14 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="app/utils/scene_detect.py">

<violation number="1" location="app/utils/scene_detect.py:2">
P2: Typo in deprecation notice: &#39;StructualSceneDetector&#39; should be &#39;StructuralSceneDetector&#39; (missing &#39;r&#39; in &#39;Structural&#39;).</violation>
</file>

<file name="app/worker.py">

<violation number="1" location="app/worker.py:131">
P1: Job will never complete if any chunk fails. When `process_chunk` raises an exception, the reduce step that decrements `job:{job_id}:pending` is skipped. The error handler in `start()` commits the message but doesn&#39;t decrement the counter, so jobs with failed chunks hang indefinitely. Consider moving the reduce step to a `finally` block, or decrementing the counter in the error handling path.</violation>
</file>

<file name="app/server.py">

<violation number="1" location="app/server.py:116">
P1: Exposing raw exception details in HTTP responses is a security risk. The full exception message could leak sensitive database information (table/column names, SQL errors, connection strings). The error is already logged for debugging - return a generic message to clients instead.</violation>
</file>

Reply to cubic to teach it or ask questions. Tag @cubic-dev-ai to re-run a review.

@@ -1,3 +1,7 @@
"""
Deprecated: Use StructualSceneDetector instead.
Copy link

@cubic-dev-ai cubic-dev-ai bot Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Typo in deprecation notice: 'StructualSceneDetector' should be 'StructuralSceneDetector' (missing 'r' in 'Structural').

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/utils/scene_detect.py, line 2:

<comment>Typo in deprecation notice: &#39;StructualSceneDetector&#39; should be &#39;StructuralSceneDetector&#39; (missing &#39;r&#39; in &#39;Structural&#39;).</comment>

<file context>
@@ -1,3 +1,7 @@
+&quot;&quot;&quot;
+Deprecated: Use StructualSceneDetector instead.
+&quot;&quot;&quot;
+
</file context>
Suggested change
Deprecated: Use StructualSceneDetector instead.
Deprecated: Use StructuralSceneDetector instead.
Fix with Cubic


except Exception as e:
logger.error(f"Visual processing failed: {e}")
raise e # Re-raise to signal failure to the thread wrapper
Copy link

@cubic-dev-ai cubic-dev-ai bot Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Job will never complete if any chunk fails. When process_chunk raises an exception, the reduce step that decrements job:{job_id}:pending is skipped. The error handler in start() commits the message but doesn't decrement the counter, so jobs with failed chunks hang indefinitely. Consider moving the reduce step to a finally block, or decrementing the counter in the error handling path.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/worker.py, line 131:

<comment>Job will never complete if any chunk fails. When `process_chunk` raises an exception, the reduce step that decrements `job:{job_id}:pending` is skipped. The error handler in `start()` commits the message but doesn&#39;t decrement the counter, so jobs with failed chunks hang indefinitely. Consider moving the reduce step to a `finally` block, or decrementing the counter in the error handling path.</comment>

<file context>
@@ -131,8 +128,9 @@ def process_chunk(task):
 
     except Exception as e:
         logger.error(f&quot;Visual processing failed: {e}&quot;)
+        raise e  # Re-raise to signal failure to the thread wrapper
 
-    # 3. AUDIO PROCESSING (Standard)
</file context>
Fix with Cubic

return {"error": str(e)}
db.rollback()
logger.error(f"Database error: {e}")
raise HTTPException(status_code=500, detail=f"Database error: {e}")
Copy link

@cubic-dev-ai cubic-dev-ai bot Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Exposing raw exception details in HTTP responses is a security risk. The full exception message could leak sensitive database information (table/column names, SQL errors, connection strings). The error is already logged for debugging - return a generic message to clients instead.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/server.py, line 116:

<comment>Exposing raw exception details in HTTP responses is a security risk. The full exception message could leak sensitive database information (table/column names, SQL errors, connection strings). The error is already logged for debugging - return a generic message to clients instead.</comment>

<file context>
@@ -95,8 +98,24 @@ async def queue_video(
+    except Exception as e:
+        db.rollback()
+        logger.error(f&quot;Database error: {e}&quot;)
+        raise HTTPException(status_code=500, detail=f&quot;Database error: {e}&quot;)
+    finally:
+        db.close()
</file context>
Suggested change
raise HTTPException(status_code=500, detail=f"Database error: {e}")
raise HTTPException(status_code=500, detail="Failed to process video request")
Fix with Cubic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants