Walkthrough
This update introduces a comprehensive database integration for a fact-checking system using Supabase and PGVector. It adds modules for schema management, database operations, and testing, and enhances the main pipeline to cache and reuse fact-checking results by querying for similar claims. Dependency and configuration files are updated accordingly.
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant VerifactManager
participant DatabaseManager
participant OpenAI
participant Supabase
User->>VerifactManager: Submit query
VerifactManager->>DatabaseManager: find_similar_claims(query)
DatabaseManager->>OpenAI: generate_embedding(query)
DatabaseManager->>Supabase: vector similarity search
DatabaseManager-->>VerifactManager: Return similar claims (with verdicts if any)
alt Similar claim with verdict found
VerifactManager-->>User: Return cached verdict and evidence
else No similar claim or no verdict
VerifactManager->>OpenAI: Detect claims, gather evidence, generate verdict
VerifactManager->>DatabaseManager: store_claim, store_evidence, store_verdict
DatabaseManager->>Supabase: Insert data
VerifactManager-->>User: Return new verdict and evidence
end
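The cache-or-compute branch in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the actual VerifactManager code: the function name fact_check_with_cache, the dict-shaped claims, and the three injected callables are assumptions made so the sketch runs without Supabase or OpenAI.

```python
from typing import Any, Callable, Iterable


def fact_check_with_cache(
    query: str,
    find_similar_claims: Callable[[str], Iterable[dict]],
    run_pipeline: Callable[[str], Any],
    store_result: Callable[[str, Any], None],
) -> dict:
    """Return a cached verdict when a similar claim has one, else compute and store.

    All three callables are hypothetical stand-ins for the database lookup,
    the agent pipeline, and the database write shown in the diagram.
    """
    for claim in find_similar_claims(query):
        # Only reuse a similar claim if a verdict was actually stored for it.
        if claim.get("verdict") is not None:
            return {"verdict": claim["verdict"], "cached": True}

    # No similar claim, or none with a verdict: run the full pipeline.
    verdict = run_pipeline(query)
    store_result(query, verdict)
    return {"verdict": verdict, "cached": False}
```

Passing the lookup and storage steps in as callables keeps the branch logic testable on its own; the real pipeline may wire these together differently.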
Actionable comments posted: 4
🧹 Nitpick comments (6)
src/tests/test_database.py (1)
11-13: Consider using a more robust path resolution approach.
The current path manipulation works, but it could be made more maintainable and reliable:
-# Add the project root to Python path
-project_root = Path(__file__).parent.parent.parent
-sys.path.insert(0, str(project_root))
+# Add the project root to Python path
+project_root = Path(__file__).resolve().parent.parent.parent
+if str(project_root) not in sys.path:
+    sys.path.insert(0, str(project_root))
This approach uses resolve() for absolute path resolution and checks whether the path is already in sys.path before adding it.
src/utils/db_schema.py (2)
60-60: Consider making the embedding dimension configurable.
The embedding dimension is hardcoded to 1536. Consider defining this as a constant or making it configurable to support different embedding models in the future.
Add a constant at the module level:
EMBEDDING_DIMENSION = 1536
Then use it:
-test_embedding = [0.1] * 1536 # Create a dummy embedding
+test_embedding = [0.1] * EMBEDDING_DIMENSION # Create a dummy embedding
118-119: Remove unnecessary blank line.
 ).execute()
-
src/utils/db.py (3)
115-116: Check openai_client instead of openai_api_key for consistency.
Since you set self.openai_client = None when the API key is missing, check the client instead of the key for consistency.
-if not self.openai_api_key:
+if not self.openai_client:
119-122: Make the embedding model configurable.
The embedding model is hardcoded. Consider making it configurable to support different models or future upgrades.
Add a configuration parameter to the class:
class DatabaseManager:
    def __init__(self, embedding_model: str = "text-embedding-3-small"):
        # ... existing code ...
        self.embedding_model = embedding_model
Then use it:
-model="text-embedding-3-small",
+model=self.embedding_model,
309-309: Add newline at end of file.
 db_manager = DatabaseManager()
+
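Taken together, the two db.py suggestions about the client check and the configurable model might look roughly like the sketch below. EmbeddingService is a hypothetical, trimmed-down stand-in for the embedding part of DatabaseManager, and the synchronous generate_embedding is an assumption (the real method may be async); the client is injected so the example runs without an API key.

```python
from typing import Any, Optional

DEFAULT_EMBEDDING_MODEL = "text-embedding-3-small"


class EmbeddingService:
    """Hypothetical stand-in for the embedding logic inside DatabaseManager."""

    def __init__(
        self,
        openai_client: Optional[Any] = None,
        embedding_model: str = DEFAULT_EMBEDDING_MODEL,
    ):
        # The client is None when no API key was configured.
        self.openai_client = openai_client
        self.embedding_model = embedding_model

    def generate_embedding(self, text: str) -> list:
        # Check the client itself rather than the API key, as suggested.
        if not self.openai_client:
            raise RuntimeError("OpenAI client is not configured")
        response = self.openai_client.embeddings.create(
            model=self.embedding_model,  # configurable instead of hardcoded
            input=text,
        )
        return response.data[0].embedding
```

Injecting the client also makes it easy to substitute a stub in tests, which is why the sketch takes it as a constructor argument rather than building it from environment variables.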
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
- .env-example (1 hunks)
- .gitignore (1 hunks)
- pyproject.toml (1 hunks)
- src/main.py (1 hunks)
- src/tests/test_database.py (1 hunks)
- src/utils/db.py (1 hunks)
- src/utils/db_schema.py (1 hunks)
- src/verifact_manager.py (6 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/main.py (1)
src/utils/logging_utils/logging_config.py (1)
setup_logging(4-38)
src/tests/test_database.py (2)
src/verifact_agents/evidence_hunter.py (1)
Evidence(6-12)
src/verifact_agents/verdict_writer.py (1)
Verdict(9-18)
🔇 Additional comments (14)
pyproject.toml (1)
44-44: LGTM! Supabase dependency addition is correct.
The supabase>=2.0.0 dependency is properly added to support the new database integration features. The version constraint is appropriate for a new integration.
.gitignore (2)
184-185: LGTM! Development artifacts properly ignored.
Adding .cursor and mcp.jason files to the ignore list is appropriate for development artifacts that shouldn't be tracked in version control.
190-190: LGTM! Test file with real data properly ignored.
Adding src/utils/test_real_data.py to the ignore list is appropriate, as test files containing real data should not be committed to version control for security and privacy reasons.
.env-example (1)
82-84: LGTM! Minor cleanup of trailing comments.
The removal of trailing comments from the logging configuration lines is a reasonable cleanup that maintains clarity without affecting functionality.
src/main.py (1)
5-5: LGTM! Import path update aligns with module restructuring.
The import path change from utils.logging.logging_config to utils.logging_utils.logging_config correctly reflects the module reorganization shown in the relevant code snippets.
src/tests/test_database.py (7)
21-40: LGTM! Comprehensive embedding test with good validation.
The embedding test function validates the embedding generation process with proper error handling and informative output, including dimension checks and sample values.
55-68: LGTM! Proper claim storage test with good validation.
The claim storage test creates a realistic test claim with appropriate scores and validates the storage operation correctly.
70-86: LGTM! Evidence storage test covers contradictory evidence properly.
The evidence storage test includes a realistic contradictory evidence example with proper stance classification, which is crucial for testing the full fact-checking pipeline.
88-102: LGTM! Verdict storage test includes all required fields.
The verdict storage test properly includes all required fields from the Verdict model, including the claim field, verdict classification, confidence score, explanation, and sources.
104-120: LGTM! Similarity search test covers edge cases well.
The similarity search test uses a semantically similar but differently phrased claim ("The Earth is not round" vs "The Earth is flat") and handles the case where no similar claims are found in a new database, which is good defensive programming.
124-126: LGTM! Proper exception handling with informative error messages.
The global exception handling provides clear error messages while maintaining the test's exit code behavior for CI/CD integration.
128-130: LGTM! Proper async execution with exit codes.
The main execution block correctly uses asyncio.run() and provides appropriate exit codes for success/failure scenarios, which is essential for automated testing environments.
src/utils/db_schema.py (1)
114-117: Ensure the exec_sql RPC function is defined in your Supabase database.
The call to supabase.rpc('exec_sql', …) relies on a custom PL/pgSQL function that isn’t provided by default. Without it, this RPC invocation will fail at runtime.
Action items:
- Verify you have a migration (e.g. under supabase/migrations/…) that defines the exec_sql function.
- If it’s not present, add a SQL migration such as:
create or replace function exec_sql(sql text)
returns void
language plpgsql
as $$
begin
  execute sql;
end;
$$;
src/utils/db.py (1)
293-293: Confirm Supabase Python client ordering syntax.
Please verify that .order("created_at", desc=True) is a supported signature in the supabase-py client (SyncRequestBuilder from postgrest). If the client instead requires chaining a .desc() call, update the query accordingly:
- ).order("created_at", desc=True).limit(limit).execute()
+ ).order("created_at").desc().limit(limit).execute()
Actionable comments posted: 16
🔭 Outside diff range comments (1)
src/verifact_manager.py (1)
43-206: Consider refactoring the run method to reduce complexity.
The run method has very high cyclomatic complexity (54) and exceeds the line limit (124 lines). Consider breaking it into smaller, focused methods for better maintainability.
+    async def _check_similar_claims(self, claims, progress_callback, progress_msg):
+        """Check for similar claims in database and return processed claims."""
+        # Lines 75-126 logic here
+
+    async def _process_new_claims(self, new_claims, processed_claims, progress_callback, progress_msg):
+        """Process new claims through evidence gathering and verdict generation."""
+        # Lines 136-187 logic here
+
+    async def _store_results(self, processed_claims):
+        """Store new results in database."""
+        # Lines 189-206 logic here
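As a rough picture of what the split could look like once the helpers are filled in, here is a runnable outline. It is a simplified sketch, not the proposed implementation: PipelineSketch, the dict used as a stand-in for the database, and the reduced helper signatures (no progress_callback or progress_msg) are all assumptions made to keep the example short.

```python
import asyncio


class PipelineSketch:
    """Hypothetical outline of run() split into three focused helpers."""

    def __init__(self, cache, fact_check):
        self.cache = cache            # dict: claim text -> verdict (stand-in for the DB)
        self.fact_check = fact_check  # async callable: claim text -> verdict

    async def _check_similar_claims(self, claims):
        """Separate claims with a stored verdict from claims that need processing."""
        cached = {c: self.cache[c] for c in claims if c in self.cache}
        new_claims = [c for c in claims if c not in self.cache]
        return cached, new_claims

    async def _process_new_claims(self, new_claims):
        """Run the (stubbed) evidence and verdict steps for each new claim."""
        verdicts = await asyncio.gather(*(self.fact_check(c) for c in new_claims))
        return dict(zip(new_claims, verdicts))

    async def _store_results(self, results):
        """Persist newly computed verdicts."""
        self.cache.update(results)

    async def run(self, claims):
        # run() now only orchestrates; each branch lives in its own helper.
        cached, new_claims = await self._check_similar_claims(claims)
        fresh = await self._process_new_claims(new_claims)
        await self._store_results(fresh)
        return {**cached, **fresh}
```

With the branching moved into helpers, run() reads as a three-step sequence, which is the main lever for bringing down the reported cyclomatic complexity.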
♻️ Duplicate comments (4)
src/utils/db_schema.py (1)
152-160: Lazy initialization pattern implemented correctly.
The lazy initialization pattern is properly implemented, addressing the previous review concern about eager initialization.
src/utils/db.py (2)
226-228: Hash function usage is acceptable and cache initialization is handled.
The use of MD5 for cache keys is acceptable here since it's not used for security purposes, only for creating consistent cache keys. The cache initialization is properly handled with hasattr checks.
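One way to quiet both static-analysis warnings on this spot (weak MD5, and _cache accessed before definition) is sketched below. SimilarityCache and its key format are hypothetical; the only library detail relied on is that hashlib.md5 accepts usedforsecurity=False on Python 3.9 and later.

```python
import hashlib


class SimilarityCache:
    """Hypothetical cache wrapper; the real DatabaseManager attributes may differ."""

    def __init__(self):
        # Defining the cache here removes the need for hasattr() checks later.
        self._cache = {}

    @staticmethod
    def make_key(query, threshold, limit):
        raw = f"{query}|{threshold}|{limit}".encode("utf-8")
        # MD5 is used only to derive a compact, stable key, not for security.
        return hashlib.md5(raw, usedforsecurity=False).hexdigest()

    def get_or_compute(self, query, threshold, limit, compute):
        key = self.make_key(query, threshold, limit)
        if key not in self._cache:
            self._cache[key] = compute()
        return self._cache[key]
```

Including the threshold and limit in the key is a design choice made for the sketch, so that the same query with different search parameters does not return a stale result.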
242-244: Embedding parsing is handled correctly by the model validator.
The DBClaim model's parse_embedding validator properly handles string-to-list conversion, so the manual parsing mentioned in past comments is not needed here.
src/verifact_manager.py (1)
35-36: Similarity threshold is now configurable.
The similarity threshold has been made configurable in the ManagerConfig class, addressing the previous review concern.
🧹 Nitpick comments (5)
src/utils/db_schema.py (3)
5-5: Remove unused import.
The Path import from pathlib is not used in this file.
-from pathlib import Path
6-6: Remove unused import.
The Client import from supabase is not used in this file.
-from supabase import create_client, Client
+from supabase import create_client
89-122: SQL construction is safe but consider improving readability.
While the f-string usage here is safe since EMBEDDING_DIMENSION is a constant, the long SQL string could be improved for readability. The static analysis warning about SQL injection is a false positive in this case.
+    # SQL function definition - safe since EMBEDDING_DIMENSION is a constant
     function_sql = f"""
     CREATE OR REPLACE FUNCTION match_claims_with_verdicts(
         query_embedding vector({EMBEDDING_DIMENSION}),
src/utils/db.py (1)
8-8: Remove unused import.
The Client import from supabase is not used in this file.
-from supabase import create_client, Client
+from supabase import create_client
src/verifact_manager.py (1)
18-18: Remove unused import.
The SimilarClaimResult import is not used in this file.
-from src.utils.db import db_manager, SimilarClaimResult
+from src.utils.db import db_manager
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- app.py (1 hunks)
- src/tests/test_database.py (1 hunks)
- src/utils/db.py (1 hunks)
- src/utils/db_schema.py (1 hunks)
- src/verifact_manager.py (5 hunks)
✅ Files skipped from review due to trivial changes (1)
- app.py
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/utils/db_schema.py (2)
src/tests/test_database.py (1)
test_embedding(22-40)
src/verifact_manager.py (1)
run(43-206)
src/verifact_manager.py (6)
src/utils/db.py (6)
SimilarClaimResult(66-70)
find_similar_claims(214-262)
get_claim_with_evidence_and_verdict(264-286)
store_claim(133-164)
store_evidence(166-190)
store_verdict(192-212)
app.py (1)
progress_callback(13-16)
src/verifact_agents/claim_detector.py (1)
Claim(8-12)
src/verifact_agents/verdict_writer.py (1)
Verdict(9-18)
src/verifact_agents/evidence_hunter.py (1)
Evidence(6-12)
src/utils/logging_utils/logging_config.py (1)
setup_logging(4-38)
src/utils/db.py (3)
src/utils/db_schema.py (2)
get_schema_manager(155-160)
verify_schema_exists(33-84)
src/verifact_agents/evidence_hunter.py (1)
Evidence(6-12)
src/verifact_agents/verdict_writer.py (1)
Verdict(9-18)
🪛 GitHub Check: Codacy Static Code Analysis
src/utils/db_schema.py
[warning] 5-5: src/utils/db_schema.py#L5
Unused Path imported from pathlib
[warning] 6-6: src/utils/db_schema.py#L6
Unused Client imported from supabase
[warning] 86-86: src/utils/db_schema.py#L86
Method create_vector_similarity_function has 62 lines of code (limit is 50)
[warning] 89-89: src/utils/db_schema.py#L89
Possible SQL injection vector through string-based query construction.
src/verifact_manager.py
[warning] 18-18: src/verifact_manager.py#L18
Unused SimilarClaimResult imported from src.utils.db
[warning] 43-43: src/verifact_manager.py#L43
Method run has 124 lines of code (limit is 50)
[warning] 43-43: src/verifact_manager.py#L43
Method run has a cyclomatic complexity of 54 (limit is 8)
src/tests/test_database.py
[warning] 42-42: src/tests/test_database.py#L42
Method test_database_operations has 68 lines of code (limit is 50)
[warning] 42-42: src/tests/test_database.py#L42
Method test_database_operations has a cyclomatic complexity of 9 (limit is 8)
src/utils/db.py
[warning] 8-8: src/utils/db.py#L8
Unused Client imported from supabase
[warning] 226-226: src/utils/db.py#L226
Detected MD5 hash algorithm which is considered insecure.
[warning] 226-226: src/utils/db.py#L226
Use of weak MD5 hash for security. Consider usedforsecurity=False
[warning] 227-227: src/utils/db.py#L227
Access to member '_cache' before its definition line 255
[warning] 228-228: src/utils/db.py#L228
Access to member '_cache' before its definition line 255
🪛 GitHub Actions: CI
src/utils/db_schema.py
[error] 15-15: Ruff D101: Missing docstring in public class 'DatabaseSchemaManager'.
[error] 16-16: Ruff D107: Missing docstring in '__init__' method.
[warning] 107-107: Ruff W291: Trailing whitespace detected.
src/verifact_manager.py
[error] 40-40: Ruff D101: Missing docstring in public class 'VerifactManager'.
[error] 41-41: Ruff D107: Missing docstring in '__init__' method.
[warning] 140-140: Ruff B007: Loop control variable 'verdict' not used within loop body; consider renaming to '_verdict'.
[error] 290-290: Ruff D103: Missing docstring in public function 'test_manager'.
src/tests/test_database.py
[error] 2-4: Ruff D205: 1 blank line required between summary line and description.
[error] 15-15: Ruff E402: Module level import not at top of file for 'from dotenv import load_dotenv'.
[error] 17-17: Ruff E402: Module level import not at top of file for 'from src.utils.db import db_manager'.
[error] 18-18: Ruff E402: Module level import not at top of file for 'from src.verifact_agents.claim_detector import Claim'.
[error] 19-19: Ruff E402: Module level import not at top of file for 'from src.verifact_agents.evidence_hunter import Evidence'.
[error] 20-20: Ruff E402: Module level import not at top of file for 'from src.verifact_agents.verdict_writer import Verdict'.
src/utils/db.py
[error] 23-23: Ruff D101: Missing docstring in public class 'DBClaim'.
[error] 49-49: Ruff D101: Missing docstring in public class 'DBEvidence'.
[error] 59-59: Ruff D101: Missing docstring in public class 'DBVerdict'.
[error] 69-69: Ruff D415: First line should end with a period, question mark, or exclamation point.
[error] 74-74: Ruff D101: Missing docstring in public class 'DatabaseManager'.
[error] 75-75: Ruff D107: Missing docstring in '__init__' method.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (2)
src/tests/test_database.py (1)
90-96: Ignore the verification request: the test matches the Verdict model signature.
The Verdict class in src/verifact_agents/verdict_writer.py defines exactly these fields:
- claim: str
- verdict: Literal[…]
- confidence: float
- explanation: str
- sources: list[str]
The test instantiates all required fields correctly, so no change is needed.
Likely an incorrect or invalid review comment.
src/verifact_manager.py (1)
78-81: Database integration implementation looks correct.
The similarity search integration properly uses the configurable threshold and handles the database lookup workflow correctly.
src/tests/test_database.py
Outdated
# Add the project root to Python path
project_root = Path(__file__).resolve().parent.parent.parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

from dotenv import load_dotenv
from src.utils.db import db_manager
from src.verifact_agents.claim_detector import Claim
from src.verifact_agents.evidence_hunter import Evidence
from src.verifact_agents.verdict_writer import Verdict
Move imports to the top of the file.
Module-level imports should be at the top to comply with PEP 8 standards.
-# Add the project root to Python path
-project_root = Path(__file__).resolve().parent.parent.parent
-if str(project_root) not in sys.path:
-    sys.path.insert(0, str(project_root))
-
-from dotenv import load_dotenv
-from src.utils.db import db_manager
-from src.verifact_agents.claim_detector import Claim
-from src.verifact_agents.evidence_hunter import Evidence
-from src.verifact_agents.verdict_writer import Verdict
+from dotenv import load_dotenv
+from src.utils.db import db_manager
+from src.verifact_agents.claim_detector import Claim
+from src.verifact_agents.evidence_hunter import Evidence
+from src.verifact_agents.verdict_writer import Verdict
+
+# Add the project root to Python path
+project_root = Path(__file__).resolve().parent.parent.parent
+if str(project_root) not in sys.path:
+    sys.path.insert(0, str(project_root))
🧰 Tools
🪛 GitHub Actions: CI
[error] 15-15: Ruff E402: Module level import not at top of file for 'from dotenv import load_dotenv'.
[error] 17-17: Ruff E402: Module level import not at top of file for 'from src.utils.db import db_manager'.
[error] 18-18: Ruff E402: Module level import not at top of file for 'from src.verifact_agents.claim_detector import Claim'.
[error] 19-19: Ruff E402: Module level import not at top of file for 'from src.verifact_agents.evidence_hunter import Evidence'.
[error] 20-20: Ruff E402: Module level import not at top of file for 'from src.verifact_agents.verdict_writer import Verdict'.
🤖 Prompt for AI Agents
In src/tests/test_database.py around lines 11 to 20, the imports are placed
after some code that modifies sys.path. To comply with PEP 8 standards, move all
module-level import statements to the very top of the file, before any other
code, including the sys.path modification. You can keep the sys.path
modification after the imports if necessary, but ideally, adjust the code so
that imports are at the top.
src/utils/db_schema.py
Outdated
AS $$
BEGIN
RETURN QUERY
SELECT
Remove trailing whitespace.
- SELECT
+ SELECT
🧰 Tools
🪛 GitHub Actions: CI
[warning] 107-107: Ruff W291: Trailing whitespace detected.
🤖 Prompt for AI Agents
In src/utils/db_schema.py at line 107, there is trailing whitespace after the
SELECT statement. Remove any spaces or tabs at the end of this line to clean up
the code.
    embedding_model: str = Field("text-embedding-3-small", description="OpenAI embedding model to use")


class VerifactManager:
    def __init__(self, config: ManagerConfig = None):
Add method docstring.
The __init__ method is missing a docstring.
-    def __init__(self, config: ManagerConfig = None):
+    def __init__(self, config: ManagerConfig = None):
+        """Initialize the manager with optional configuration."""
🤖 Prompt for AI Agents
In src/verifact_manager.py at line 39, the __init__ method lacks a docstring.
Add a concise docstring immediately below the method definition that describes
the purpose of the constructor and the role of the config parameter, specifying
that it initializes the instance with an optional ManagerConfig object.
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/utils/db_schema.py (1)
110-110: Fix trailing whitespace.
There's trailing whitespace after the SELECT statement that needs to be removed.
- SELECT
+ SELECT
src/verifact_manager.py (1)
40-42: Add missing docstring for __init__ method.
The __init__ method is missing a docstring as indicated by the pipeline failure.
-    def __init__(self, config: ManagerConfig = None):
+    def __init__(self, config: ManagerConfig = None):
+        """Initialize the manager with optional configuration."""
🧹 Nitpick comments (1)
src/verifact_manager.py (1)
102-114: Consider caching evidence retrieval.
The code retrieves evidence for similar claims from the database on every request. Since evidence is already stored and shouldn't change, consider if this additional database call is necessary or if the evidence could be included in the similarity search results.
If evidence is frequently needed with similar claims, consider modifying the find_similar_claims method to optionally include evidence in the response to reduce database calls.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/tests/test_database.py (1 hunks)
- src/utils/db.py (1 hunks)
- src/utils/db_schema.py (1 hunks)
- src/verifact_manager.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/tests/test_database.py
- src/utils/db.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/utils/db_schema.py (1)
src/verifact_manager.py (1)
run(44-207)
🪛 GitHub Actions: CI
src/utils/db_schema.py
[warning] 110-110: Ruff W291: Trailing whitespace detected. Remove trailing whitespace.
src/verifact_manager.py
[error] 42-42: Ruff D107: Missing docstring in '__init__' method.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (6)
src/utils/db_schema.py (3)
87-124: LGTM: Well-structured SQL function generation.
The vector similarity function SQL is properly constructed using string formatting with a constant dimension value, which is safe from SQL injection. The function correctly implements cosine distance similarity search with configurable thresholds.
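For readers less familiar with cosine-distance search, the same idea can be written in plain Python. This is only an illustration of the technique, not a translation of match_claims_with_verdicts: the parameter names match_threshold and match_count, the row shape, and the inclusive comparison against the threshold are assumptions.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two vectors, i.e. 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def match_claims(query_embedding, rows, match_threshold, match_count):
    """Return up to match_count (similarity, text) pairs at or above the threshold."""
    scored = []
    for row in rows:
        # Similarity is recovered from the distance as 1 - distance.
        similarity = 1.0 - cosine_distance(query_embedding, row["embedding"])
        if similarity >= match_threshold:
            scored.append((similarity, row["text"]))
    # Most similar first, then truncate to the requested number of matches.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:match_count]
```

In the database the distance computation and ordering would be done by PGVector rather than in application code; the sketch only makes the threshold and limit semantics concrete.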
34-85: Excellent comprehensive schema verification.
The verification logic thoroughly checks all required database components:
- Tests table existence through actual queries rather than metadata checks
- Validates the vector similarity function with a realistic test call
- Provides clear logging and error messages
- Handles missing components gracefully by attempting automatic creation
169-174: Good implementation of lazy initialization pattern.
This addresses the previous review comment about avoiding module-level initialization that could fail if environment variables aren't set. The pattern ensures the DatabaseSchemaManager is only created when first accessed.
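The lazy initialization pattern praised here generally takes the following shape. The class body is a trimmed, hypothetical stand-in, and the environment variable names SUPABASE_URL and SUPABASE_KEY are assumptions; only the get_schema_manager name comes from the review.

```python
import os
from typing import Optional


class DatabaseSchemaManager:
    """Trimmed, hypothetical stand-in for the real schema manager."""

    def __init__(self):
        # Assumed variable names; the real module may read different ones.
        url = os.environ.get("SUPABASE_URL")
        key = os.environ.get("SUPABASE_KEY")
        if not url or not key:
            raise RuntimeError("Supabase credentials are not configured")
        self.url = url
        self.key = key


# Nothing is constructed at import time, so importing never fails on missing env vars.
_schema_manager: Optional[DatabaseSchemaManager] = None


def get_schema_manager() -> DatabaseSchemaManager:
    """Create the manager on first access and reuse it afterwards."""
    global _schema_manager
    if _schema_manager is None:
        _schema_manager = DatabaseSchemaManager()
    return _schema_manager
```

The trade-off is that a configuration error surfaces on first use rather than at import, which is usually preferable for modules that are imported by tests and tooling.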
src/verifact_manager.py (3)
77-127: Excellent database integration with similarity search.
The implementation efficiently checks for similar claims before processing new ones. Key strengths:
- Uses configurable similarity threshold from config
- Properly converts database models to agent models for consistency
- Includes comprehensive logging and progress updates
- Handles the case where all claims are found in database
190-207: Robust error handling for database storage.
The implementation correctly handles database storage failures without breaking the pipeline - this is the right approach since the core fact-checking functionality should work even if database storage fails. The logging appropriately distinguishes between storage failures and successful completion.
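The behaviour described, where a storage failure is logged but does not abort the pipeline, can be sketched as follows. The helper name store_results_safely, the (claim, evidence, verdict) tuple shape, and the injected store_one callable are assumptions for illustration.

```python
import logging

logger = logging.getLogger("verifact.sketch")


def store_results_safely(results, store_one):
    """Try to store each result; log failures instead of raising.

    Returns the number of results that were stored successfully, so the
    caller can distinguish full success from partial storage.
    """
    stored = 0
    for claim, evidence, verdict in results:
        try:
            store_one(claim, evidence, verdict)
            stored += 1
        except Exception:
            # A storage failure must not break fact-checking itself.
            logger.exception("Failed to store result for claim %r", claim)
    return stored
```

Returning a count rather than a boolean is a choice made for the sketch; it lets the caller log "stored 2 of 3 results" without inspecting exceptions.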
139-149: In-place list updates in processed_claims are safe.
After searching the codebase, the only for … enumerate(processed_claims) loops occur in src/verifact_manager.py, and both only replace existing tuple elements (no insertions or deletions):
- Step 2 (gathering evidence): lines 139–149 update processed_claims[idx] with new evidence
- Step 3 (generating verdicts): immediately after, a similar loop updates processed_claims[idx] with the verdict
Since neither loop changes the list’s length, this pattern is safe. No other in-place modifications were found.
You can optionally add a brief comment above each loop to clarify intent for future maintainers, but no code changes are required.
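A small, self-contained version of the two loops shows why replacing elements by index during enumeration is safe. The (claim, evidence, verdict) tuple shape and the placeholder values are assumptions; the underscore-prefixed loop variables follow the renaming that Ruff B007 suggests for unused names.

```python
def gather_then_judge(claims):
    """Fill in evidence, then verdicts, by replacing tuples at the same index."""
    processed_claims = [(claim, None, None) for claim in claims]

    # Step 2: attach evidence. The list length never changes, only element idx.
    for idx, (claim, _evidence, _verdict) in enumerate(processed_claims):
        processed_claims[idx] = (claim, [f"evidence for {claim}"], None)

    # Step 3: attach a verdict, again replacing in place.
    for idx, (claim, evidence, _verdict) in enumerate(processed_claims):
        processed_claims[idx] = (claim, evidence, "unverified")

    return processed_claims
```

The pattern would only become unsafe if a loop inserted into or deleted from the list while iterating, which neither step does.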
Description
This PR implements proper storage of claims, evidence, and verdicts in Supabase with PGVector for semantic search capabilities. This enables persistence of fact-checking results and future lookup of similar claims, addressing the need for a robust database layer in the VeriFact system.
Type of change
Changes Made
Core Database Implementation (src/utils/db.py)
- DBClaim, DBEvidence, DBVerdict, SimilarClaimResult with proper Pydantic validation
- store_claim() - Stores claims with OpenAI embeddings
- store_evidence() - Stores evidence with stance validation
- store_verdict() - Stores verdicts with confidence scores
- find_similar_claims() - Vector similarity search with caching
- get_claim_with_evidence_and_verdict() - Complete data retrieval
Database Schema Management (src/utils/db_schema.py)
- match_claims_with_verdicts function
Testing Implementation (src/tests/test_database.py)
Dependencies (pyproject.toml, uv.lock)
- supabase>=2.0.0 dependency
Acceptance Criteria Met
✅ Claims, evidence, and verdicts are properly stored in Supabase
✅ Vector embeddings are generated and stored for semantic search
- text-embedding-3-small integration
✅ Similar claim lookup functionality works effectively
✅ Database connections are properly managed
✅ Error handling for database operations is robust
✅ Performance is optimized for vector similarity searches
Files Changed
New Files
- src/utils/db.py - Main database operations manager
- src/utils/db_schema.py - Database schema management
- src/tests/test_database.py - Comprehensive database tests
Modified Files
- pyproject.toml - Added supabase dependency
- uv.lock - Updated dependency lock file
Manual Setup Required
Supabase Database Setup:
Environment Variables:
Database Verification:
Testing
python src/utils/db_schema.py
python src/tests/test_database.py
Checklist
Agent Changes (if applicable)
Limitations and Future Improvements
- exec_sql function in Supabase (one-time setup)
This implementation provides a solid foundation for database operations in the VeriFact system with proper error handling, performance optimization, and comprehensive testing.
Summary by CodeRabbit
New Features
Improvements
Chores
- .gitignore to exclude additional files.