InField Agent — Weather Station Advisory System

A multimodal AI agent for weather station fleet management, built with Strands Agents SDK and tested with Scenario. This project demonstrates multimodal tool calling (satellite imagery + text), knowledge base retrieval, and end-to-end evaluation with LangWatch.

🎯 Project Overview

The InField Agent assists field technicians and agronomists managing Davis Instruments weather stations. It showcases:

Multimodal tool calling — satellite image analysis with NDVI estimation via vision models
Knowledge base retrieval — calibration procedures grounded in documentation
Fleet monitoring — station inventory, battery health, reporting gaps
Evaluation with LangWatch — experiments in Jupyter notebooks with inline satellite images
Simulation testing — multi-turn conversation testing with Scenario

🏗️ Architecture

Three Capabilities

Knowledge Base 📚
- Calibration procedures for Davis Instruments Vantage Pro2
- Temperature, humidity, wind direction, barometric pressure
- Keyword search with weighted scoring (title 3x, category 2x, content 1x)
Station Status 📊
- Fleet inventory from Excel data
- Battery health monitoring (flags voltage < 3.0V)
- Stale station detection (no data in 90+ days)
- Filtering by country, region, company
Satellite Imagery 🛰️
- NDVI estimation from satellite images using OpenAI Vision
- Vegetation coverage percentage and land type classification
- Confidence levels for analysis results

📋 Requirements

Python 3.10+
OpenAI API key
LangWatch API key
uv package manager

🛠️ Installation

Clone the project:

git clone https://github.com/langwatch/satellite-agent.git
cd satellite-agent

Install dependencies:
```
uv venv && uv pip install -e .
```

Set up environment variables:

cp .env.example .env
# Edit .env and add your keys

OPENAI_API_KEY=your-openai-api-key
LANGWATCH_API_KEY=your-langwatch-api-key

🎮 Usage

Run the Agent

uv run python main.py

=== InField Agent (Strands) ===
Type 'quit' to exit.

You: How do I calibrate the barometric pressure on my Vantage Pro2?

Agent: To calibrate the barometric pressure on your Vantage Pro2:
       1. Obtain a known reference pressure...
       2. Enter the calibration offset through the console setup menu...

You: Which stations have low battery?

Agent: The following stations have battery voltage below 3.0V:
       - Station 25_101 (NL, 2.8V)
       - Station 25_205 (DE, 2.6V)

You: Analyze satellite image 01 for NDVI.

Agent: Based on the satellite image analysis:
       - NDVI estimate: 0.65
       - Vegetation coverage: 72%
       - Dominant land types: cropland, grassland

🧪 Testing

Scenario Simulations

Multi-turn conversation tests using Scenario:

uv run pytest tests/ -m agent_test -v

Tests include:

Test	What it validates
`test_basic_ndvi_analysis`	NDVI estimation with coverage and land types
`test_vegetation_health_inquiry`	Broad vegetation health assessment
`test_multi_turn_vegetation_comparison`	Comparing two satellite images across turns
`test_ndvi_coverage_estimation`	Detailed coverage data and land classification
`test_customer_follow_up_on_ndvi_meaning`	Follow-up grounded in tool results
`test_invalid_image_handling`	Graceful handling of non-existent images

@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_basic_ndvi_analysis():
    result = await scenario.run(
        name="basic NDVI analysis",
        description="A farmer asks the agent to analyze satellite image 01 for NDVI estimation.",
        agents=[
            InFieldAgent(),
            scenario.UserSimulatorAgent(),
            scenario.JudgeAgent(),
        ],
        script=[
            scenario.user("Can you analyze satellite image 01 and tell me the NDVI?"),
            scenario.agent(),
            scenario.judge(criteria=[
                "Agent provides an NDVI estimate (a number between -1.0 and 1.0)",
                "Agent mentions vegetation coverage percentage",
                "Agent describes the dominant land types visible in the image",
            ]),
        ],
    )
    assert result.success

LangWatch Evaluations

Run the multimodal evaluation notebook:

uv run jupyter notebook evaluation.ipynb

Evaluates all three capabilities with:

ragas/answer_relevancy — is the answer relevant to the question?
langevals/llm_answer_match — does the output match the expected output?

Satellite images render inline in the LangWatch UI as markdown images.

📁 Project Structure

satellite-agent/
├── agent/
│   ├── agent.py                     # Agent factory
│   ├── prompts.py                   # System prompts
│   └── tools/
│       ├── knowledge_base/          # Davis Instruments docs (6 articles)
│       │   ├── documents.py         # Embedded knowledge documents
│       │   ├── search.py            # Weighted keyword search
│       │   └── tool.py              # @tool-decorated search function
│       ├── satellite/               # NDVI analysis via OpenAI Vision
│       │   └── tool.py              # @tool-decorated image analysis
│       └── station_data/            # Fleet inventory management
│           ├── loader.py            # Excel data loader
│           ├── models.py            # StationRecord dataclass
│           ├── search.py            # Station filtering & battery status
│           └── tool.py              # @tool-decorated station search
├── data/
│   ├── satellite/                   # 11 satellite images (01–11.png)
│   └── station_inventory.xlsx       # Station fleet data
├── tests/
│   └── test_satellite_scenarios.py  # Scenario-based agent simulations
├── evaluation.ipynb                 # Multimodal evaluation notebook
├── main.py                          # CLI entry point
├── pyproject.toml
└── .env.example

🔧 Configuration

Variable	Description
`OPENAI_API_KEY`	Your OpenAI API key (required)
`LANGWATCH_API_KEY`	Your LangWatch API key (required for tracing/evals)

The agent uses gpt-5-mini by default. Change the model in agent/agent.py.

🤝 Built With

Strands Agents SDK — model-driven AI agent framework by AWS
OpenAI — LLM provider (including Vision for satellite analysis)
LangWatch — tracing, evaluations, and monitoring
LangWatch Scenario — simulation-based agent testing

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
agent		agent
data		data
tests		tests
.env.example		.env.example
.gitignore		.gitignore
DEPLOYMENT.md		DEPLOYMENT.md
README.md		README.md
evaluation.ipynb		evaluation.ipynb
main.py		main.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
skills-lock.json		skills-lock.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

InField Agent — Weather Station Advisory System

🎯 Project Overview

🏗️ Architecture

Three Capabilities

📋 Requirements

🛠️ Installation

🎮 Usage

Run the Agent

🧪 Testing

Scenario Simulations

LangWatch Evaluations

📁 Project Structure

🔧 Configuration

🤝 Built With

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

InField Agent — Weather Station Advisory System

🎯 Project Overview

🏗️ Architecture

Three Capabilities

📋 Requirements

🛠️ Installation

🎮 Usage

Run the Agent

🧪 Testing

Scenario Simulations

LangWatch Evaluations

📁 Project Structure

🔧 Configuration

🤝 Built With

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages