This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
MapSee-AI is a Python-based SNS content data extraction pipeline that processes Instagram and YouTube content to extract place/location information. It's a FastAPI service that receives URLs, downloads media content, performs speech-to-text (STT), and uses LLM (Gemini) to extract structured place data.
# Install dependencies (Python 3.13+)
uv sync
# Run the development server
uv run uvicorn src.main:app --host 0.0.0.0 --port 8001 --reload
# Alternative: run directly
uv run python -m src.main- ffmpeg/ffprobe: Required for audio/video processing
- yt-dlp: Used for downloading Instagram/YouTube content
/api/extract-placesreceivescontentId+snsUrl- Request returns immediately (async processing)
- Background task runs the extraction pipeline
- Results sent to backend via callback URL
URL → sns_router → get_audio → get_transcription (STT) → get_video_narration → get_llm_response → callback
↓
Platform detection (YouTube/Instagram)
Content type detection (video/image)
Download media via yt-dlp
src/apis/: FastAPI routers
place_router.py: Main API endpoint for place extraction
src/services/: Business logic
workflow.py: Main extraction pipeline orchestrationcontent_router.py: Routes to appropriate downloader based on platform/content typebackground_tasks.py: Async task execution and callback handlingsmb_service.py: SMB file server integration
src/services/modules/: Processing modules
llm.py: Gemini API integration for place extractionstt.py: Faster-Whisper speech-to-text
src/services/preprocess/: Media preprocessing
sns.py: Instagram/YouTube content download (yt-dlp)audio.py: FFmpeg audio extractionvideo.py: Video frame extraction (OCR currently disabled)
src/models/: Pydantic schemas
ExtractionState: TypedDict that flows through the pipeline, accumulating data at each stage
src/core/: Configuration and utilities
config.py: Settings from .env (API keys, SMB config, etc.)exceptions.py: CustomError class for pipeline errors
The pipeline uses ExtractionState (TypedDict) as a mutable state object that gets passed through each processing stage. Each stage updates specific fields:
contentStream/imageStream: Downloaded mediacaptionText: Post caption/descriptionaudioStream: Extracted audiotranscriptionText: STT outputocrText: Video text (currently disabled)result: Final extracted places
Required environment variables in .env:
GOOGLE_API_KEY: Gemini API keyAI_SERVER_API_KEY: API key for this serviceYOUTUBE_API_KEY: YouTube Data API keyINSTAGRAM_POST_DOC_ID,INSTAGRAM_APP_ID: Instagram API configBACKEND_CALLBACK_URL,BACKEND_API_KEY: Callback endpoint configSMB_*: SMB file server settings (optional)
- OCR functionality is currently disabled (noted with comments throughout)
- The service uses in-memory BytesIO streams for media processing
- Faster-Whisper runs on CPU with int8 quantization by default
- LLM responses are validated against Pydantic schemas using
response_json_schema