A multimodal AI agent for weather station fleet management, built with Strands Agents SDK and tested with Scenario. This project demonstrates multimodal tool calling (satellite imagery + text), knowledge base retrieval, and end-to-end evaluation with LangWatch.
The InField Agent assists field technicians and agronomists managing Davis Instruments weather stations. It showcases:
- Multimodal tool calling — satellite image analysis with NDVI estimation via vision models
- Knowledge base retrieval — calibration procedures grounded in documentation
- Fleet monitoring — station inventory, battery health, reporting gaps
- Evaluation with LangWatch — experiments in Jupyter notebooks with inline satellite images
- Simulation testing — multi-turn conversation testing with Scenario
Knowledge Base 📚
- Calibration procedures for Davis Instruments Vantage Pro2
- Temperature, humidity, wind direction, barometric pressure
- Keyword search with weighted scoring (title 3x, category 2x, content 1x)
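
The weighted scoring above can be sketched roughly as follows. This is a minimal illustration, not the project's `search.py`: the `Article` shape, the `score` and `search` names, and the substring matching rule are all assumptions. Only the weights (title 3x, category 2x, content 1x) come from the list above.

```python
from dataclasses import dataclass

# Field weights from the list above: title 3x, category 2x, content 1x.
WEIGHTS = {"title": 3, "category": 2, "content": 1}


@dataclass
class Article:  # hypothetical document shape, for illustration only
    title: str
    category: str
    content: str


def score(article: Article, query: str) -> int:
    """Add a field's weight once for every query keyword found in that field."""
    keywords = query.lower().split()
    total = 0
    for field, weight in WEIGHTS.items():
        text = getattr(article, field).lower()
        total += weight * sum(1 for kw in keywords if kw in text)
    return total


def search(articles: list[Article], query: str, limit: int = 3) -> list[Article]:
    """Return the best-scoring articles, dropping those with no match."""
    scored = [(score(a, query), a) for a in articles]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored[:limit]]
```

With this scheme a keyword that appears in a title outranks the same keyword appearing only in body text, which keeps procedure articles ahead of passing mentions.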
Station Status 📊
- Fleet inventory from Excel data
- Battery health monitoring (flags voltage < 3.0V)
- Stale station detection (no data in 90+ days)
- Filtering by country, region, company
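
The battery and staleness rules above could be expressed along these lines. This is a sketch under assumptions: the project has a `StationRecord` dataclass, but the field names used here are hypothetical, and treating exactly 90 days as stale is one reading of "90+ days".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

LOW_BATTERY_VOLTS = 3.0  # threshold from the list above
STALE_AFTER_DAYS = 90    # threshold from the list above


@dataclass
class StationRecord:  # field names are assumptions, not the project's model
    station_id: str
    country: str
    battery_volts: float
    last_report: date


def has_low_battery(station: StationRecord) -> bool:
    """Flag stations whose battery voltage is below 3.0 V."""
    return station.battery_volts < LOW_BATTERY_VOLTS


def is_stale(station: StationRecord, today: date) -> bool:
    """Flag stations that have not reported for 90 days or more."""
    return today - station.last_report >= timedelta(days=STALE_AFTER_DAYS)


def filter_stations(
    stations: list[StationRecord], country: Optional[str] = None
) -> list[StationRecord]:
    """Keep stations matching the country code; None means no filter."""
    if country is None:
        return list(stations)
    return [s for s in stations if s.country.upper() == country.upper()]
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the staleness check deterministic in tests.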
Satellite Imagery 🛰️
- NDVI estimation from satellite images using OpenAI Vision
- Vegetation coverage percentage and land type classification
- Confidence levels for analysis results
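
One way to keep vision-model output trustworthy is to validate it before the agent reports it. The sketch below assumes the model is asked to reply with a JSON object; the key names, the `NdviResult` type, and the three confidence labels are hypothetical and not taken from the project's tool.

```python
import json
from dataclasses import dataclass

ALLOWED_CONFIDENCE = {"low", "medium", "high"}  # assumed labels


@dataclass
class NdviResult:  # hypothetical result shape
    ndvi: float
    vegetation_coverage_pct: float
    land_types: list[str]
    confidence: str


def parse_ndvi_response(raw: str) -> NdviResult:
    """Parse a JSON reply and reject values outside their physical ranges."""
    data = json.loads(raw)
    result = NdviResult(
        ndvi=float(data["ndvi"]),
        vegetation_coverage_pct=float(data["vegetation_coverage_pct"]),
        land_types=[str(t) for t in data["land_types"]],
        confidence=str(data["confidence"]).lower(),
    )
    if not -1.0 <= result.ndvi <= 1.0:
        raise ValueError(f"NDVI out of range: {result.ndvi}")
    if not 0.0 <= result.vegetation_coverage_pct <= 100.0:
        raise ValueError(f"coverage out of range: {result.vegetation_coverage_pct}")
    if result.confidence not in ALLOWED_CONFIDENCE:
        raise ValueError(f"unknown confidence level: {result.confidence}")
    return result
```

NDVI is defined on the interval from -1.0 to 1.0, so a value outside that range signals a malformed model reply rather than an unusual field.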
- Python 3.10+
- OpenAI API key
- LangWatch API key
- uv package manager
Clone the project:

    git clone https://github.com/langwatch/satellite-agent.git
    cd satellite-agent

Install dependencies:

    uv venv && uv pip install -e .

Set up environment variables:

    cp .env.example .env
    # Edit .env and add your keys

    OPENAI_API_KEY=your-openai-api-key
    LANGWATCH_API_KEY=your-langwatch-api-key

    uv run python main.py

    === InField Agent (Strands) ===
    Type 'quit' to exit.

You: How do I calibrate the barometric pressure on my Vantage Pro2?
Agent: To calibrate the barometric pressure on your Vantage Pro2:
1. Obtain a known reference pressure...
2. Enter the calibration offset through the console setup menu...
You: Which stations have low battery?
Agent: The following stations have battery voltage below 3.0V:
- Station 25_101 (NL, 2.8V)
- Station 25_205 (DE, 2.6V)
You: Analyze satellite image 01 for NDVI.
Agent: Based on the satellite image analysis:
- NDVI estimate: 0.65
- Vegetation coverage: 72%
- Dominant land types: cropland, grassland
Multi-turn conversation tests using Scenario:

    uv run pytest tests/ -m agent_test -v

Tests include:

| Test | What it validates |
|---|---|
| `test_basic_ndvi_analysis` | NDVI estimation with coverage and land types |
| `test_vegetation_health_inquiry` | Broad vegetation health assessment |
| `test_multi_turn_vegetation_comparison` | Comparing two satellite images across turns |
| `test_ndvi_coverage_estimation` | Detailed coverage data and land classification |
| `test_customer_follow_up_on_ndvi_meaning` | Follow-up grounded in tool results |
| `test_invalid_image_handling` | Graceful handling of non-existent images |
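
Graceful handling of a non-existent image might look like the sketch below. It assumes images are stored as zero-padded PNG files (`01.png` to `11.png`, as in `data/satellite/`); the `resolve_image` and `describe_image` names and the wording of the error message are hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_image(image_id: str, image_dir: Path) -> Optional[Path]:
    """Map an id such as '1' or '01' to <image_dir>/01.png, or None if absent."""
    cleaned = image_id.strip()
    if not cleaned.isdecimal():
        return None
    candidate = image_dir / f"{int(cleaned):02d}.png"
    return candidate if candidate.is_file() else None


def describe_image(image_id: str, image_dir: Path) -> str:
    """Return a message the agent can relay instead of raising an exception."""
    path = resolve_image(image_id, image_dir)
    if path is None:
        return f"Satellite image '{image_id}' was not found."
    return f"Found satellite image {path.name}."
```

Returning a plain message rather than raising lets the model explain the problem to the user, which is the behavior a test such as `test_invalid_image_handling` would look for.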

    import pytest
    import scenario

    # InFieldAgent is the project's Scenario agent adapter; its import is omitted here.


    @pytest.mark.agent_test
    @pytest.mark.asyncio
    async def test_basic_ndvi_analysis():
        result = await scenario.run(
            name="basic NDVI analysis",
            description="A farmer asks the agent to analyze satellite image 01 for NDVI estimation.",
            agents=[
                InFieldAgent(),
                scenario.UserSimulatorAgent(),
                scenario.JudgeAgent(),
            ],
            script=[
                # One scripted user turn, one agent reply, then the judge's verdict.
                scenario.user("Can you analyze satellite image 01 and tell me the NDVI?"),
                scenario.agent(),
                scenario.judge(criteria=[
                    "Agent provides an NDVI estimate (a number between -1.0 and 1.0)",
                    "Agent mentions vegetation coverage percentage",
                    "Agent describes the dominant land types visible in the image",
                ]),
            ],
        )
        assert result.success

Run the multimodal evaluation notebook:

    uv run jupyter notebook evaluation.ipynb

Evaluates all three capabilities with:

- `ragas/answer_relevancy`: is the answer relevant to the question?
- `langevals/llm_answer_match`: does the output match the expected output?

Satellite images render inline in the LangWatch UI as markdown images.
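
How the notebook builds those markdown images is not described here. One possible approach, assuming the renderer accepts base64 data URIs, is sketched below; a hosted image URL would use the same markdown syntax. The `to_markdown_image` helper is hypothetical.

```python
import base64


def to_markdown_image(png_bytes: bytes, alt_text: str) -> str:
    """Embed PNG bytes in a markdown image tag as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"![{alt_text}](data:image/png;base64,{encoded})"
```

The resulting string can be stored in a dataset column or evaluation input like any other text field.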

    satellite-agent/
    ├── agent/
    │   ├── agent.py                     # Agent factory
    │   ├── prompts.py                   # System prompts
    │   └── tools/
    │       ├── knowledge_base/          # Davis Instruments docs (6 articles)
    │       │   ├── documents.py         # Embedded knowledge documents
    │       │   ├── search.py            # Weighted keyword search
    │       │   └── tool.py              # @tool-decorated search function
    │       ├── satellite/               # NDVI analysis via OpenAI Vision
    │       │   └── tool.py              # @tool-decorated image analysis
    │       └── station_data/            # Fleet inventory management
    │           ├── loader.py            # Excel data loader
    │           ├── models.py            # StationRecord dataclass
    │           ├── search.py            # Station filtering & battery status
    │           └── tool.py              # @tool-decorated station search
    ├── data/
    │   ├── satellite/                   # 11 satellite images (01–11.png)
    │   └── station_inventory.xlsx       # Station fleet data
    ├── tests/
    │   └── test_satellite_scenarios.py  # Scenario-based agent simulations
    ├── evaluation.ipynb                 # Multimodal evaluation notebook
    ├── main.py                          # CLI entry point
    ├── pyproject.toml
    └── .env.example

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | Your OpenAI API key (required) |
| `LANGWATCH_API_KEY` | Your LangWatch API key (required for tracing/evals) |
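
A startup check for these variables could be sketched as follows. The `missing_keys` helper is hypothetical and not part of the project. It treats both keys as required, although the table suggests the LangWatch key only matters for tracing and evaluations.

```python
import os
from typing import Mapping, Optional

REQUIRED_KEYS = ("OPENAI_API_KEY", "LANGWATCH_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```

Failing early with the names of the missing variables is usually easier to debug than an authentication error raised mid-conversation.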
The agent uses gpt-5-mini by default. Change the model in agent/agent.py.
- Strands Agents SDK — model-driven AI agent framework by AWS
- OpenAI — LLM provider (including Vision for satellite analysis)
- LangWatch — tracing, evaluations, and monitoring
- LangWatch Scenario — simulation-based agent testing