Pull request #25: Issue #6 Enhance Verdict Writer
Status: Open

lizaj99 wants to merge 11 commits into vibing-ai:main from lizaj99:feature/verdict-writer-enhancement
Changes from all commits (11 commits)
All commits authored by lizaj99:

- c9222d0 Restore test_agents.py
- e30869b Restore test_verdict_writer.py with comprehensive test cases
- d646caf Update .env-example to remove sensitive information and provide clear…
- a43caab Restore conftest.py with test configuration and fixtures
- 82c640f Implement VerdictWriter agent with comprehensive verdict generation
- f798aec Add warning filter to suppress starlette multipart deprecation warning
- c22999b Implement VerdictWriter with custom agent class and enhanced verdict …
- baed9ed Add pytest warning filter to suppress starlette multipart deprecation…
- 8834095 Fix Codacy issues: Remove unused imports and add test for agent imports
- cbe8edc Fix confidence score calculation for mixed evidence and update test c…
- dcda9e4 Update test_agents.py with proper imports and test cases
conftest.py (new file, +44 lines; filename inferred from commit a43caab):

```python
import pytest
import warnings
import sys
from pathlib import Path
from typing import Generator, Dict, Any

# Add the project root directory to the Python path
project_root = str(Path(__file__).parent.parent.parent)
sys.path.insert(0, project_root)

from src.verifact_agents.verdict_writer import VerdictWriter

# Suppress the starlette multipart deprecation warning
warnings.filterwarnings("ignore", category=PendingDeprecationWarning, module="starlette.formparsers")


@pytest.fixture(scope="session")
def test_evidence() -> Dict[str, Any]:
    """Provide sample evidence for testing."""
    return {
        "content": "This is a sample evidence content for testing purposes.",
        "source": "Test Source",
        "relevance": 0.9,
        "stance": "supporting"
    }


@pytest.fixture(scope="session")
def test_claim() -> str:
    """Provide a sample claim for testing."""
    return "This is a test claim that needs verification."


@pytest.fixture(scope="session")
def verdict_writer() -> Generator[VerdictWriter, None, None]:
    """Create a VerdictWriter instance for testing."""
    writer = VerdictWriter()
    yield writer


@pytest.fixture(autouse=True)
def setup_test_environment():
    """Set up test environment variables."""
    import os
    os.environ["ENVIRONMENT"] = "test"
    os.environ["LOG_LEVEL"] = "DEBUG"
    yield
    # Cleanup after tests if needed
```
test_verdict_writer.py (new file, +138 lines; filename inferred from commit e30869b):

```python
import pytest
from src.verifact_agents.verdict_writer import VerdictWriter, Verdict


@pytest.mark.asyncio
async def test_verdict_writer_returns_valid_verdict():
    writer = VerdictWriter()
    claim = "Bananas are a good source of potassium."
    evidence = [
        {
            "content": "Bananas contain around 422 mg of potassium per medium fruit, making them a good dietary source.",
            "source": "Healthline",
            "relevance": 0.9,
            "stance": "supporting"
        },
        {
            "content": "Potatoes and beans contain more potassium than bananas, but bananas are still a decent source.",
            "source": "Harvard School of Public Health",
            "relevance": 0.8,
            "stance": "neutral"
        }
    ]

    verdict = await writer.run(claim=claim, evidence=evidence, detail_level="standard")

    assert isinstance(verdict, Verdict)
    assert verdict.claim == claim
    assert verdict.verdict in ["true", "false", "partially true", "unverifiable"]
    assert 0.0 <= verdict.confidence <= 1.0
    assert isinstance(verdict.explanation, str) and len(verdict.explanation.strip()) > 0
    assert isinstance(verdict.sources, list) and all(isinstance(s, str) for s in verdict.sources)
    assert len(verdict.sources) > 0


@pytest.mark.asyncio
async def test_verdict_writer_brief_detail():
    writer = VerdictWriter()
    claim = "Water freezes at 0 degrees Celsius."
    evidence = [
        {
            "content": "Under standard atmospheric conditions, water freezes at 0°C.",
            "source": "Britannica",
            "relevance": 0.95,
            "stance": "supporting"
        }
    ]

    verdict = await writer.run(claim=claim, evidence=evidence, detail_level="brief")

    assert isinstance(verdict.explanation, str)
    assert len(verdict.explanation.split()) <= 30  # brief explanation is usually short


@pytest.mark.asyncio
async def test_verdict_writer_detailed_includes_sources():
    writer = VerdictWriter()
    claim = "The Earth revolves around the Sun."
    evidence = [
        {
            "content": "Astronomical evidence and observations confirm the heliocentric model.",
            "source": "NASA",
            "relevance": 0.99,
            "stance": "supporting"
        }
    ]

    verdict = await writer.run(claim=claim, evidence=evidence, detail_level="detailed")

    assert any(not source.lower().startswith("http") for source in verdict.sources)
    assert "nasa" in " ".join(verdict.sources).lower()
    assert "sun" in verdict.explanation.lower() or "earth" in verdict.explanation.lower()


@pytest.mark.asyncio
async def test_confidence_score_considers_evidence_relevance():
    writer = VerdictWriter()
    claim = "Cats are better pets than dogs."
    evidence = [
        {"content": "Cats require less maintenance.", "source": "PetGuide", "relevance": 0.9, "stance": "supporting"},
        {"content": "Dogs are more loyal and emotionally supportive.", "source": "DogWorld", "relevance": 0.85, "stance": "contradicting"},
    ]
    verdict = await writer.run(claim=claim, evidence=evidence, detail_level="standard")
    assert 0.4 <= verdict.confidence <= 0.6  # Mixed evidence should yield moderate confidence


@pytest.mark.asyncio
async def test_explanation_maintains_political_neutrality():
    writer = VerdictWriter()
    claim = "Voter ID laws reduce election fraud."
    evidence = [
        {"content": "Some studies suggest voter ID laws deter fraud.", "source": "Heritage Foundation", "relevance": 0.8, "stance": "supporting"},
        {"content": "Other studies show minimal fraud cases regardless of ID laws.", "source": "Brennan Center", "relevance": 0.9, "stance": "contradicting"},
    ]
    verdict = await writer.run(claim=claim, evidence=evidence, detail_level="detailed")
    explanation = verdict.explanation.lower()
    assert "republican" not in explanation
    assert "democrat" not in explanation
    assert "bias" not in explanation


@pytest.mark.asyncio
async def test_alternative_perspectives_are_included():
    writer = VerdictWriter()
    claim = "Electric cars are better for the environment."
    evidence = [
        {"content": "EVs emit less CO2 over their lifetime.", "source": "EPA", "relevance": 0.9, "stance": "supporting"},
        {"content": "Battery production has environmental impacts.", "source": "Nature", "relevance": 0.8, "stance": "contradicting"},
    ]
    verdict = await writer.run(claim=claim, evidence=evidence, detail_level="detailed")
    explanation = verdict.explanation.lower()
    assert "battery" in explanation or "production" in explanation
    assert any(term in explanation for term in ["emissions", "co2", "co₂", "co 2"])


@pytest.mark.asyncio
async def test_explanation_detail_levels_vary():
    writer = VerdictWriter()
    claim = "Electric cars are better for the environment than gas cars."
    evidence = [
        {
            "content": "Electric vehicles produce fewer greenhouse gas emissions over their lifetime.",
            "source": "EPA",
            "relevance": 0.95,
            "stance": "supporting"
        },
        {
            "content": "Battery production for electric vehicles involves mining and emissions.",
            "source": "Nature",
            "relevance": 0.8,
            "stance": "contradicting"
        }
    ]

    brief = await writer.run(claim=claim, evidence=evidence, detail_level="brief")
    standard = await writer.run(claim=claim, evidence=evidence, detail_level="standard")
    detailed = await writer.run(claim=claim, evidence=evidence, detail_level="detailed")

    # Ensure increasing richness in explanation
    assert len(brief.explanation.split()) < len(standard.explanation.split()) < len(detailed.explanation.split())

    # Optional: Check sources and content presence in detailed output
    assert len(detailed.sources) >= 2
    explanation = detailed.explanation.lower()
    assert "battery" in explanation
    assert "emissions" in explanation or "co2" in explanation
```
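The tests above rely on `@pytest.mark.asyncio`, which requires the pytest-asyncio plugin to be installed and enabled, and conftest.py suppresses the starlette multipart warning at import time. A minimal configuration sketch that covers both (the filename and option values are assumptions, not taken from this PR; the repository may configure this elsewhere):

```ini
# pytest.ini (sketch)
[pytest]
asyncio_mode = strict
filterwarnings =
    ignore::PendingDeprecationWarning:starlette.formparsers
```

With `asyncio_mode = strict`, only tests explicitly marked with `@pytest.mark.asyncio` are run on an event loop, which matches how these tests are written.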
Review comment (🛠️ Refactor suggestion):

Restore env vars after tests to avoid leaking state. The autouse fixture mutates global env without cleaning up, which can bite when running tests in parallel or in interactive sessions.
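One way to address that review comment is to snapshot the affected variables before mutating them and restore them on teardown. A minimal sketch: the `patched_env` helper name is my own invention, and only the two variables set in this PR's fixture are handled.

```python
import os
from contextlib import contextmanager

import pytest


@contextmanager
def patched_env(**overrides):
    """Temporarily set environment variables, restoring prior state on exit."""
    # Snapshot current values (None means the key was unset)
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        # Restore prior values, removing keys that did not exist before
        for key, value in saved.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value


@pytest.fixture(autouse=True)
def setup_test_environment():
    """Set up test environment variables without leaking state."""
    with patched_env(ENVIRONMENT="test", LOG_LEVEL="DEBUG"):
        yield
```

pytest's built-in `monkeypatch.setenv` offers the same guarantee per-test; the context-manager form is shown here because the fixture is session-independent and the helper is reusable outside pytest.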