Collaborator

@j-atkins j-atkins commented Nov 7, 2025

Overview

This PR introduces substantial refactoring, new features, and documentation updates to VirtualShip. Overall, the changes aim to unify configuration, centralise instrument logic and overhaul data ingestion.

Note

Update: this PR now incorporates changes intended for a v0.3 release (title also updated); it was previously slated to be a v1 release. We have since decided to delay the v1 release until the public API is fully stable. The roadmap to v1 is outlined in the Issue tracker, in the issues marked with a "v1-dev" label.

Major Changes

  1. Configuration unification

    • schedule.yaml and ship_config.yaml are now merged into a single expedition.yaml. This simplifies the configuration and streamlines the setup.
  2. Instrument logic refactor

    • Instrument logic is centralised and consolidated in the instruments/ directory, using a base class and subclass structure.
      • To elaborate, there is now a base Instrument class in base.py which handles universal instrument simulation logic, and then each instrument has a sub-class (e.g. CTDInstrument in ctd.py) which handles instrument-specific logic. This leads to a lot less repetition across the codebase.
    • The expedition/ directory is now largely empty, but simulate_schedule.py remains.
      • I intend to refactor this in a future PR (see the To Dos / Nice-to-Haves section), which should mean that the expedition/ directory can be removed.
    • Logic from the old do_expedition.py has been moved to _run.py in the CLI.
  3. Data ingestion overhaul

    • Data is now ingested directly via copernicusmarine data 'streaming'. This step takes place in the virtualship run step, and the previous fetch logic is fully removed.
    • virtualship run also now supports linking to pre-downloaded data via a -from-data CLI argument. This allows users to point to local or remote data stores (e.g., SURF) instead of downloading on-the-fly, if preferred.
      • Note, however, this comes with some relatively strict requirements on data structure and naming conventions.
      • These are thoroughly documented in pre_download_data.md and new tests are introduced to check that the code implementation doesn't drift from the expectations laid out in the documentation.
      • Side note: I wonder whether returning to the conversation about intake (Use intake for data fetching #190) could help here to ease some of the rigidity...? @VeckoTheGecko thoughts?
  4. Performance considerations

    • FieldSet ingestion currently loads each variable as its own FieldSet and combines them by adding each to a base FieldSet, rather than building all variables in one go. This is due to complications arising from certain variables (BGC in particular) coming from different products and temporal resolutions.
      • So, an open question is whether we can be cleverer with this in future to improve performance...?
    • Local, pre-downloaded data access is generally faster given sufficient resources, but performance varies by instrument. Notably, the underway instruments (ADCP, underwater ST) are quite slow when using local data and faster when streaming. That only the underway instruments are affected suggests it may be related to them being the only two ScipyParticle-based instruments, so this can probably be fixed in a future PR and/or when Parcels v4 is integrated.
    • This is all to say that proper, machine-specification-based performance testing is to follow! I will report back on this soon.
  5. Tests

    • Tests updated to reflect code changes and make first attempts to improve overall coverage and reliability.
      • I think we can continue improving this in future PRs to generate a more robust test suite.
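
As a rough illustration of the base/subclass structure described in item 2, the sketch below shows how shared logic can live in a base class while each instrument overrides only the instrument-specific step. The names `load_input_data`, `simulate`, `run` and `ToyCTDInstrument` are hypothetical and not taken from the VirtualShip codebase.

```python
import abc


class Instrument(abc.ABC):
    """Base class holding logic shared by all instruments (illustrative only)."""

    def load_input_data(self) -> dict:
        # Hypothetical stand-in for the shared data-loading step.
        return {"temperature": [10.0, 9.5, 9.0]}

    @abc.abstractmethod
    def simulate(self, data: dict) -> list:
        """Instrument-specific simulation, implemented by each subclass."""

    def run(self) -> list:
        # Shared workflow: load the data once, then delegate to the subclass.
        return self.simulate(self.load_input_data())


class ToyCTDInstrument(Instrument):
    """Hypothetical subclass; the real CTDInstrument lives in ctd.py."""

    def simulate(self, data: dict) -> list:
        # Toy behaviour: return the temperature profile unchanged.
        return list(data["temperature"])


profile = ToyCTDInstrument().run()
```

Because `simulate` is abstract, the base class cannot be instantiated directly, which is one way to make sure every new instrument supplies its own simulation step.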

Other Considerations

  • Removing space_time_region is not feasible currently due to RAM limitations. I think this can be revisited at the stage of integration with Parcels v4.
  • Quickstart and primary README need further updates.

To Dos / Nice-to-Haves

  • Improve Copernicus credentials logic for collaborative environments.
  • Add ARGO_BGC instrument.
  • Support multiple drifter deployments per waypoint.
  • Refactor simulate_schedule.py to base/subclass structure; remove expedition/ dir.
  • Expand documentation. I am particularly interested in making an instrument 'recipe book' which runs through how to add new instruments to VirtualShip.
    • The refactoring changes in this PR already lend themselves to a standard routine for adding instruments. That is, each instrument file (e.g. ctd.py) is already set up so that it consists of an instrument dataclass, a particle class, kernels, and an Instrument sub-class.
  • Add support for Python 3.13/3.14.
  • Pixi integration.
  • Documentation contributions.
    • @iuryt has expressed interest in helping with this. @reint-fischer is this something you would also like to contribute to?
    • Things like the instrument 'recipe book' and generally ensuring that the documentation is suitable for users are priorities from my perspective!

Please note...!

  • XBT simulations are currently broken (test failures present). I will fix this shortly, but I wanted to get everyone reviewing the main changes first. I think this is just a small bug in the XBT-specific logic, not the core framework.
  • I have put warnings and/or TODOs in the Quickstart and README about pending updates. I will make these updates soon.

Reviewers...

@erikvansebille @VeckoTheGecko @ammedd

  • Please focus on the high-level logic changes, e.g. the unified config YAML, refactored and centralised instrument logic, and new data ingestion workflows.
  • Performance feedback and documentation suggestions are welcome but lower priority.
  • I will get the XBT fixed as soon as possible, but wanted to prioritise review of the main changes first.

j-atkins and others added 30 commits October 21, 2025 13:38
Consolidates/unifies the old dual ship_config.yaml and schedule.yaml config files into one expedition.yaml file, in line with  v1 dev objectives.
Fix unnecessarily divergent branches.
@VeckoTheGecko
Collaborator

A bit of a general comment, and tangential to this PR (I know that this PR doesn't really touch the configs - maybe for future)

This is an ERD diagram for the Expedition file - in particular I'm interested in the instrument configs

In the /design-doc.md we didn't really discuss the responsibility of the instrument configs - what role do they play in the software (i.e., is it possible to have multiple different configurations for the same instrument?)

Perhaps in future we can define instruments to be used in the expedition, and then refer to specific instrument IDs in the waypoints (i.e., instead of InstrumentType we can actually identify specific instruments).

open in Mermaid editor (much easier to see)

---
title: Entity Relationship Diagram for the Expedition Pydantic model
---
erDiagram
	direction TB
	Expedition {
		Schedule schedule  ""  
		InstrumentsConfig instruments_config  ""  
		ShipConfig ship_config  ""  
	}
	Schedule {
		list[Waypoint] waypoints  ""  
		Optional[SpaceTimeRegion] space_time_region  ""  
	}
	InstrumentsConfig {
		ArgoFloatConfig_optional argo_float_config  ""  
		ADCPConfig_optional adcp_config  ""  
		CTDConfig_optional ctd_config  ""  
		CTD_BGCConfig_optional ctd_bgc_config  ""  
		ShipUnderwaterSTConfig_optional ship_underwater_st_config  ""  
		DrifterConfig_optional drifter_config  ""  
		XBTConfig_optional xbt_config  ""  
	}
	ShipConfig {
		float ship_speed_knots  ""  
	}
	Waypoint {
		Location location  ""  
		datetime_optional time  ""  
		InstrumentType_optional instrument  ""  
	}
	SpaceTimeRegion {
		SpatialRange spatial_range  ""  
		TimeRange time_range  ""  
	}
	Location {
		float latitude  ""  
		float longitude  ""  
	}
	InstrumentType {
	}
	SpatialRange {
		float minimum_longitude  ""  
		float maximum_longitude  ""  
		float minimum_latitude  ""  
		float maximum_latitude  ""  
		float_optional minimum_depth  ""  
		float_optional maximum_depth  ""  
	}
	TimeRange {
		datetime_optional start_time  ""  
		datetime_optional end_time  ""  
	}
	ArgoFloatConfig {
		float min_depth_meter  ""  
		float max_depth_meter  ""  
		float drift_depth_meter  ""  
		float vertical_speed_meter_per_second  ""  
		float cycle_days  ""  
		float drift_days  ""  
	}
	ADCPConfig {
		float max_depth_meter  ""  
		int num_bins  ""  
		timedelta period  ""  
	}
	CTDConfig {
		timedelta stationkeeping_time  ""  
		float min_depth_meter  ""  
		float max_depth_meter  ""  
	}
	CTD_BGCConfig {
		timedelta stationkeeping_time  ""  
		float min_depth_meter  ""  
		float max_depth_meter  ""  
	}
	ShipUnderwaterSTConfig {
		timedelta period  ""  
	}
	DrifterConfig {
		float depth_meter  ""  
		timedelta lifetime  ""  
	}
	XBTConfig {
		float min_depth_meter  ""  
		float max_depth_meter  ""  
		float fall_speed_meter_per_second  ""  
		float deceleration_coefficient  ""  
	}

	Expedition||--||Schedule:"contains"
	Expedition||--||InstrumentsConfig:"contains"
	Expedition||--||ShipConfig:"contains"
	Schedule||--|{Waypoint:"contains"
	Schedule||--o|SpaceTimeRegion:"optional"
	Waypoint||--||Location:"has"
	Waypoint||--o|InstrumentType:"optional (single or list)"
	SpaceTimeRegion||--||SpatialRange:"contains"
	SpaceTimeRegion||--||TimeRange:"contains"
	InstrumentsConfig||--o|ArgoFloatConfig:"optional"
	InstrumentsConfig||--o|ADCPConfig:"optional"
	InstrumentsConfig||--o|CTDConfig:"optional"
	InstrumentsConfig||--o|CTD_BGCConfig:"optional"
	InstrumentsConfig||--o|ShipUnderwaterSTConfig:"optional"
	InstrumentsConfig||--o|DrifterConfig:"optional"
	InstrumentsConfig||--o|XBTConfig:"optional"
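
The idea raised above, of waypoints referring to specific instrument IDs rather than to an InstrumentType, might look roughly like the sketch below. It uses plain dataclasses instead of the Pydantic models, keeps only a subset of the CTDConfig fields from the diagram, and the names `instrument_ids`, `instruments` and `configs_at` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CTDConfig:
    # Subset of the fields of the CTDConfig entity in the diagram.
    min_depth_meter: float
    max_depth_meter: float


@dataclass
class Waypoint:
    latitude: float
    longitude: float
    # Hypothetical: IDs of configured instruments instead of an InstrumentType.
    instrument_ids: list[str] = field(default_factory=list)


@dataclass
class Expedition:
    # Hypothetical: instruments keyed by a user-chosen ID, so the same
    # instrument type can appear with several different configurations.
    instruments: dict[str, CTDConfig]
    waypoints: list[Waypoint]

    def configs_at(self, index: int) -> list[CTDConfig]:
        """Resolve the instrument IDs of one waypoint to their configs."""
        return [self.instruments[i] for i in self.waypoints[index].instrument_ids]


expedition = Expedition(
    instruments={
        "ctd_shallow": CTDConfig(min_depth_meter=-11.0, max_depth_meter=-200.0),
        "ctd_deep": CTDConfig(min_depth_meter=-11.0, max_depth_meter=-2000.0),
    },
    waypoints=[
        Waypoint(0.0, 0.0, ["ctd_shallow"]),
        Waypoint(0.5, 0.5, ["ctd_deep"]),
    ],
)
```

With this layout, two waypoints can use the same kind of instrument with different settings, which is the case the question about "multiple different configurations for the same instrument" is asking about.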


@j-atkins
Collaborator Author

Yes, I think this could also relate to something @iuryt, @ammedd and I were discussing offline: whether we could combine e.g. CTD and CTD_BGC, with the config (i.e. which variables are switched on or off) dependent on user inputs. Food for thought, and indeed potentially something for a future PR!

Collaborator

@VeckoTheGecko VeckoTheGecko left a comment

Great effort and a great step in the right direction - I've left some suggestions

def __init__(
    self,
    name: str,
    expedition: "Expedition",
Collaborator

All that is accessed in the Instrument class is self.expedition.schedule.space_time_region - passing in the whole expedition is a bit overkill.

Can we just pass in space_time_region instead?

Collaborator Author

Some instruments rely on other parts of the expedition object. For example, ADCPInstrument takes:

MAX_DEPTH = self.expedition.instruments_config.adcp_config.max_depth_meter
...
NUM_BINS = self.expedition.instruments_config.adcp_config.num_bins

So, for now, I favour leaving it as it is (even if overkill for the most part). That being said, I think underway instruments (including ADCPs) can be overhauled (new Issue to follow). At which point, I think some of this can be tidied up and then we can update to just using space_time_region.

@@ -0,0 +1,284 @@
import abc
Collaborator

@VeckoTheGecko VeckoTheGecko Nov 14, 2025

Suggested change
- import abc
+ from __future__ import annotations
+ import abc

Then you don't have to quote "Expedition" in the type annotation.


def __init__(
    self,
    name: str,
Collaborator

name is not needed. Where you call self.name you can instead do self.__class__.__name__
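
A minimal sketch of this suggestion, using stand-in classes rather than the real ones: the name is derived from the concrete class, so it no longer needs to be passed to the constructor. Exposing it as a `name` property is an assumption made here for illustration.

```python
class Instrument:
    """Minimal stand-in for the base class (illustrative only)."""

    @property
    def name(self) -> str:
        # Derive the name from the concrete class instead of storing it.
        return self.__class__.__name__


class CTDInstrument(Instrument):
    """Stand-in subclass with no extra behaviour."""


ctd_name = CTDInstrument().name
```

One thing to check before adopting this is whether any code relies on the previously passed name string matching something other than the class name.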

"depth": "depth",
} # same dimensions for all instruments
self.add_bathymetry = add_bathymetry
self.allow_time_extrapolation = allow_time_extrapolation
Collaborator

I don't think that this is actually used anywhere in the class?

Collaborator Author

@j-atkins j-atkins Nov 19, 2025

This is supposed to be fed through to _generate_fieldset() - will add back in!

Comment on lines 47 to 48
self.directory = directory
self.filenames = filenames
Collaborator

I don't think directory and filenames are used in this class

Comment on lines +208 to +216
class ArgoFloatConfig(pydantic.BaseModel):
    """Configuration for argos floats."""

    min_depth_meter: float = pydantic.Field(le=0.0)
    max_depth_meter: float = pydantic.Field(le=0.0)
    drift_depth_meter: float = pydantic.Field(le=0.0)
    vertical_speed_meter_per_second: float = pydantic.Field(lt=0.0)
    cycle_days: float = pydantic.Field(gt=0.0)
    drift_days: float = pydantic.Field(gt=0.0)
Collaborator

All of the Pydantic models are in the models folder, which makes it clear what is configuration and what is the rest of the code. Although the instrument config is not with the instrument logic, it is with all the other config-related code, which makes it easy to see the structure of the overall config. If we move configuration code into the rest of the codebase, we lose this separation and significantly increase our chance of circular imports.

    spec = self.buffer_spec if spec_type == "buffer" else self.limit_spec
    return spec.get(key) if spec and spec.get(key) is not None else default

def _find_files_in_timerange(
Collaborator

Can we make this a standalone function and not a method? (no dependency on self)
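
For illustration, a helper with no dependency on self could live at module level, as in the sketch below. The signature is hypothetical (the real arguments of `_find_files_in_timerange` are not shown in this thread), and the mapping from filename to date is assumed to have been built elsewhere.

```python
from datetime import date


def find_files_in_timerange(
    files: dict[str, date], start: date, end: date
) -> list[str]:
    """Return the filenames whose date lies within [start, end], sorted by name.

    Hypothetical signature: the filename-to-date mapping is assumed to be
    prepared by the caller (e.g. parsed from the filenames).
    """
    return sorted(name for name, day in files.items() if start <= day <= end)


files = {
    "a_2023_01_01.nc": date(2023, 1, 1),
    "a_2023_01_02.nc": date(2023, 1, 2),
    "a_2023_01_05.nc": date(2023, 1, 5),
}
selected = find_files_in_timerange(files, date(2023, 1, 2), date(2023, 1, 5))
```

A module-level function like this can be unit-tested without constructing an Instrument at all.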

from virtualship.models import Expedition


class Instrument(abc.ABC):
Collaborator

I think in future we can further refactor out the dataset component (which I think would make it easier to work with intake etc)


By default, VirtualShip will automatically 'stream' data from the Copernicus Marine Service via the [copernicusmarine toolbox](https://github.com/mercator-ocean/copernicus-marine-toolbox?tab=readme-ov-file). However, for users who wish to manage data locally, it is possible to pre-download the required datasets and feed them into VirtualShip simulations.

<!-- TODO: quickstart guide needs full update! -->
Collaborator

Updating the Quickstart guide is a priority after this PR gets merged, as we have students who will be working with VirtualShip on November 27th again


class ConfigError(RuntimeError):
class InstrumentsConfigError(RuntimeError):
"""An error in the config."""
Collaborator

Does "the config" refer to expedition.yaml?


def simulate(self, measurements, out_path) -> None:
    """Simulate ADCP measurements."""
    MAX_DEPTH = self.expedition.instruments_config.adcp_config.max_depth_meter
Collaborator

For realism I would limit this to a max of 1000 and the max from the config

Collaborator Author

Do you mean max should be hard coded to 1000 or determined from the config? Or did you mean max and min?

Collaborator

Please determine it from the config. But if students set it way too high, let's have a limit on that. To my understanding there is a hard limit from technology and the amount of scattering material in the ocean. In my memory it was 1000m but I now find many sources that say 1300m (like https://oceanexplorer.noaa.gov/technology/acoust-doppler/) or even 1600m.

Collaborator Author

Understood - I'll add in a warning and a revert to default if max depth is too deep.

Collaborator Author

Update: I've left it as users being able to use their prescribed max depth even if it exceeds the authentic limit, in case it is intentional and there is a non-education/non-authentic sea-expedition related reason. The warning message will be clear in stating that it's inauthentic though and that performance may suffer.
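
The behaviour described in this update (warn, but keep the user's value) could be sketched as follows. This is not the code merged in the PR: the function name, the constant and its value of -1000 m are assumptions for illustration, and depths are assumed to be negative downwards, as in the ArgoFloatConfig fields shown earlier.

```python
import warnings

# Assumed limit for illustration; the thread mentions 1000 m, 1300 m and 1600 m.
AUTHENTIC_MAX_DEPTH_METER = -1000.0


def check_adcp_max_depth(max_depth_meter: float) -> float:
    """Warn if the configured depth exceeds the assumed limit, but keep it."""
    # Depths are assumed negative downwards, so "deeper" means "smaller".
    if max_depth_meter < AUTHENTIC_MAX_DEPTH_METER:
        warnings.warn(
            f"ADCP max depth {max_depth_meter} m is deeper than the assumed "
            f"realistic limit of {AUTHENTIC_MAX_DEPTH_METER} m; the simulation "
            "will be inauthentic and performance may suffer.",
            UserWarning,
            stacklevel=2,
        )
    # The user's value is returned unchanged, in case it is intentional.
    return max_depth_meter
```

Using a warning rather than an exception keeps deliberate, non-authentic setups runnable while still flagging them to students.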

Adjust t_min to the first day of the month based on schedule start date.
@j-atkins j-atkins changed the title V1 dev v0.3 dev Nov 20, 2025
@j-atkins j-atkins merged commit 3bc4324 into main Nov 21, 2025
10 of 11 checks passed
@j-atkins j-atkins deleted the v1-dev branch November 21, 2025 09:10
@j-atkins j-atkins mentioned this pull request Nov 21, 2025
1 task