Support Single Monthly Values in Scalar and Generic Classes #42

Open
QSparks wants to merge 13 commits into i38-generic-var-templates from i39-single-val-per-month

@QSparks (Contributor) commented Oct 30, 2024

Summary

This pull request introduces new wrapper functions to support single monthly values for both scalar and vector climate data. These functions create ClimdexGenericScalar and ClimdexGenericVector objects with a constraint that enforces only one value per month. Additionally, they automatically set the max.missing.days parameter to +Inf, enabling more flexible data handling.

Key Enhancements

New climdexSingleMonthlyScalar.raw / .csv and climdexSingleMonthlyVector.raw / .csv Functions:

  • Added functions to handle scalar or vector climate data with a single value per month constraint.
  • Automatically sets max.missing.days to +Inf.

Data Validation:

  • Single-value-per-month constraint: checks that the input data and dates vectors contain no more than one value per month.
  • Ensures that data and dates are neither entirely NA nor empty vectors.
  • Validates that each date corresponds to the 1st day of each month and raises errors if there is more than one value per month or if the dates are inconsistent.
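The date checks listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: the function name `check_single_month_dates_sketch` is hypothetical, it assumes base R `Date` input (the package may use a different date class), and every error message except the "1st day of each month" one is an assumption.

```r
# Hypothetical sketch of the single-value-per-month date checks.
# Assumes `dates` is a base R Date vector; not the package's implementation.
check_single_month_dates_sketch <- function(dates) {
  valid_dates <- dates[!is.na(dates)]
  if (length(valid_dates) == 0) {
    stop("Dates must not be empty or entirely NA.")  # assumed message
  }

  unique_months <- unique(format(valid_dates, "%Y-%m"))
  day_of_month <- as.integer(format(valid_dates, "%d"))

  # One value per month: as many distinct year-months as there are dates.
  if (length(unique_months) != length(valid_dates)) {
    stop("Data must have at most one value per month.")  # assumed message
  }

  # Every date must fall on the 1st day of its month.
  if (!all(day_of_month == 1)) {
    stop("Data must be on the 1st day of each month.")
  }

  invisible(TRUE)
}

# Five consecutive months pass; a multiple of 12 is not required.
check_single_month_dates_sketch(
  seq(as.Date("2020-01-01"), by = "month", length.out = 5)
)
```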

Index Calculations:

  • Added warnings for cases where single-value-per-month data might impact index calculations (e.g., exact dates or monthly statistics).

Test Cases:

New test cases ensure that:

  • Single-value-per-month constraints are enforced correctly.
  • Errors are raised for invalid or inconsistent data (e.g., NA values, empty vectors, multiple values per month).
  • Index calculations produce accurate results with single-value-per-month data.

@QSparks QSparks self-assigned this Oct 30, 2024
@QSparks QSparks requested a review from rod-glover October 30, 2024 21:22
@QSparks QSparks marked this pull request as ready for review October 30, 2024 21:22
@rod-glover left a comment

Overall excellent, nice clean code.

A few minor suggested changes, which you can adopt at your discretion, except I do recommend DRYing up the monthly data date-checking code, which is copy-pasted.

unique_months <- unique(format(valid_dates, "%Y-%m"))
day_of_month <- as.integer(format(valid_dates, "%d"))

# Check that the length of unique months matches the number of dates, ensuring only one value per month

Do we need to require 12 months? Or is it legitimate to have fewer than that?

@QSparks (Contributor, Author) replied:

There is no requirement for inputs to have multiples of 12 months. I’ve added a test to reflect that.

northern.hemisphere = northern.hemisphere,
calendar = calendar
)
return(obj)

Nice and clean.

Q: Would it be cleaner to simplify to

return(climdexGenericScalar.raw(
  ...
))

and not use obj?

(This would apply in several places in this codebase.)

@QSparks (Contributor, Author) replied:

Done.

}
if (length(secondary) == 0) {
  stop("Secondary must not be an empty vector.")
}

Could you instead call check.generic.argument.validity on secondary? (Might need to make its stop messages more generic -- or parametrize the name of the data used in them.) That would repeat some checks but it might end up being simpler to be sure we are validating everything completely.

@QSparks (Contributor, Author) replied:

I’ve moved the secondary and format checks into check.generic.argument.validity with a flag for when we pass the extra secondary and format vector parameters. I’ve moved the data & dates check to an internal validate_data_dates function that we call with scalar or primary and secondary vector data.
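For illustration, the reviewer's idea of parametrizing the data name in the stop messages might look something like the sketch below. The helper `check_vector_validity_sketch` and its `name` argument are hypothetical; the real `check.generic.argument.validity` signature is not shown in this conversation, and the "entirely NA" message is an assumption.

```r
# Hypothetical helper: one set of checks, with the data name passed in
# so the same messages serve primary, secondary, or scalar data.
check_vector_validity_sketch <- function(x, name = "Data") {
  # Test emptiness first: all(is.na(x)) is TRUE for a zero-length vector.
  if (length(x) == 0) {
    stop(sprintf("%s must not be an empty vector.", name))
  }
  if (all(is.na(x))) {
    stop(sprintf("%s must not be entirely NA.", name))  # assumed message
  }
  invisible(TRUE)
}

# Passes silently for a partially missing vector.
check_vector_validity_sketch(c(10, NA, 30), name = "Secondary")
```

Called with `name = "Secondary"` on an empty vector, this reproduces the message from the snippet under review.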

# Check that all dates correspond to the 1st day of each month
if (!all(day_of_month == 1)) {
  stop("Data must be on the 1st day of each month.")
}

The date checking is an exact copy of code in another function. Suggest we DRY that up in a check function called in each place.

@QSparks (Contributor, Author) replied:

Moved to a util function check.single.month.dates.

dates = dates,
northern.hemisphere = TRUE,
calendar = "gregorian"
)

Do we need to check basic things about the output of climdexSingleMonthlyScalar.raw, such as

  • data out is the same as scalar_data
  • dates out same as dates in
  • etc.

Especially as we are relying on it to check the result of climdexSingleMonthlyScalar.csv

@QSparks (Contributor, Author) replied:

I've introduced a new validation function that applies to both scalar and vector, raw and CSV construction tests, to check basic items. Please note that the output is infilled with NA values and missing dates, and some filtering is necessary to align the output with the original input set.
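A rough sketch of the kind of filtering described here, using plain vectors as stand-ins for the object's data and dates: the daily infill below is only an assumption about what the output looks like, and none of the variable names come from the package or its tests.

```r
# Original monthly input (a missing value is allowed in the data).
in_dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))
in_data <- c(1.5, NA, 3.5)

# Stand-in for the infilled output: a full daily date range with NA
# everywhere except on the input dates. This layout is an assumption.
out_dates <- seq(min(in_dates), max(in_dates), by = "day")
out_data <- rep(NA_real_, length(out_dates))
out_data[match(in_dates, out_dates)] <- in_data

# Filter on dates rather than on non-NA data, so that legitimately
# missing input values are kept in the comparison.
keep <- out_dates %in% in_dates
filtered_data <- out_data[keep]
filtered_dates <- out_dates[keep]

stopifnot(identical(filtered_data, in_data))
stopifnot(all(filtered_dates == in_dates))
```

Filtering by date instead of by `!is.na(out_data)` is a design choice in this sketch: it keeps an input NA aligned with its date instead of silently dropping it.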

format = "polar",
northern.hemisphere = TRUE,
calendar = "gregorian"
)

Same question re. basic things about output of climdexSingleMonthlyVector.ra

checkTrue(
  !inherits(result, "try-error"),
  "Function raised an error despite valid monthly data."
)

Not sure I understand: Why wouldn't an NA value make it throw an error?

@QSparks (Contributor, Author) replied:

It's possible that climate data may be incomplete for certain dates, for example, due to sensor issues. While we cannot accept data with NA dates, we do accept missing data for all our input classes.
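The distinction drawn here (missing data values are acceptable, missing dates are not) could be expressed along these lines. The function `validate_data_dates_sketch` is a hypothetical stand-in, not the internal `validate_data_dates` mentioned earlier, and all messages and the length check are assumptions.

```r
# Hypothetical sketch: NA is tolerated in data, but never in dates.
validate_data_dates_sketch <- function(data, dates) {
  if (length(data) == 0 || length(dates) == 0) {
    stop("Data and dates must not be empty vectors.")
  }
  if (length(data) != length(dates)) {
    stop("Data and dates must have the same length.")
  }
  if (anyNA(dates)) {
    stop("Dates must not contain NA values.")
  }
  # Some missing values are fine (e.g. sensor gaps); all-missing is not.
  if (all(is.na(data))) {
    stop("Data must not be entirely NA.")
  }
  invisible(TRUE)
}

dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))

# A gap in the data does not raise an error.
result <- try(validate_data_dates_sketch(c(1.2, NA, 3.4), dates), silent = TRUE)
stopifnot(!inherits(result, "try-error"))
```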

)

checkEquals(length(scalar_obj@data[!is.na(scalar_obj@data)]), n_months, "Large dataset not handled correctly")
}

Without myself checking for every possible error message, this looks thorough and like it covers all those error conditions. Well done.
