Support Single Monthly Values in Scalar and Generic Classes#42
Support Single Monthly Values in Scalar and Generic Classes#42QSparks wants to merge 13 commits intoi38-generic-var-templatesfrom
Conversation
…nto i39-single-val-per-month
rod-glover
left a comment
There was a problem hiding this comment.
Overall excellent, nice clean code.
A few minor suggested changes, which you can adopt at your discretion, except I do recommend DRYing up the monthly data date-checking code, which is copy-pasted.
R/climdexGenericScalar.R
Outdated
| unique_months <- unique(format(valid_dates, "%Y-%m")) | ||
| day_of_month <- as.integer(format(valid_dates, "%d")) | ||
|
|
||
| # Check that the length of unique months matches the number of dates, ensuring only one value per month |
There was a problem hiding this comment.
Do we need to require 12 months? Or is it legitimate to have fewer than that?
There was a problem hiding this comment.
There is no requirement for inputs to have multiples of 12 months. I’ve added a test to reflect that.
R/climdexGenericScalar.R
Outdated
| northern.hemisphere = northern.hemisphere, | ||
| calendar = calendar | ||
| ) | ||
| return(obj) |
There was a problem hiding this comment.
Nice and clean.
Q: Would it be cleaner to simplify to
return climdexGenericScalar.raw(
...
);
and not use obj?
(This would apply in several places in this codebase.)
R/climdexGenericVector.R
Outdated
| } | ||
| if (length(secondary) == 0) { | ||
| stop("Secondary must not be an empty vector.") | ||
| } |
There was a problem hiding this comment.
Could you instead call check.generic.argument.validity on secondary? (Might need to make its stop messages more generic -- or parametrize the name of the data used in them.) That would repeat some checks but it might end up being simpler to be sure we are validating everything completely.
There was a problem hiding this comment.
I’ve moved the secondary and format checks into check.generic.argument.validity with a flag for when we pass the extra secondary and format vector parameters. I’ve moved the data & dates check to an internal validate_data_dates function that we call with scalar or primary and secondary vector data.
R/climdexGenericVector.R
Outdated
| # Check that all dates correspond to the 1st day of each month | ||
| if (!all(day_of_month == 1)) { | ||
| stop("Data must be on the 1st day of each month.") | ||
| } |
There was a problem hiding this comment.
The date checking is an exact copy of code in another function. Suggest we DRY that up in a check function called in each place.
There was a problem hiding this comment.
Moved to a util function check.single.month.dates.
| dates = dates, | ||
| northern.hemisphere = TRUE, | ||
| calendar = "gregorian" | ||
| ) |
There was a problem hiding this comment.
Do we need to check basic things about the output of climdexSingleMonthlyScalar.raw, such as
- data out is the same as scalar_data
- dates out same as dates in
- etc.
Especially as we are relying on it to check the result of climdexSingleMonthlyScalar.csv
There was a problem hiding this comment.
I've introduced a new validation function that applies to both scalar and vector, raw and CSV construction tests, to check basic items. Please note that the output is infilled with NA values and missing dates, and some filtering is necessary to align the output with the original input set.
| format = "polar", | ||
| northern.hemisphere = TRUE, | ||
| calendar = "gregorian" | ||
| ) |
There was a problem hiding this comment.
Same question re. basic things about output of climdexSingleMonthlyVector.ra
| checkTrue( | ||
| !inherits(result, "try-error"), | ||
| "Function raised an error despite valid monthly data." | ||
| ) |
There was a problem hiding this comment.
Not sure I understand: Why wouldn't an NA value make it throw an error?
There was a problem hiding this comment.
It's possible that climate data may be incomplete for certain dates, for example, due to sensor issues. While we cannot accept data with NA dates, we do accept missing data for all our input classes.
| ) | ||
|
|
||
| checkEquals(length(scalar_obj@data[!is.na(scalar_obj@data)]), n_months, "Large dataset not handled correctly") | ||
| } |
There was a problem hiding this comment.
Without myself checking for every possible error message, this looks thorough and like it covers all those error conditions. Well done.
Summary
This pull request introduces new wrapper functions to support single monthly values for both scalar and vector climate data. These functions create
ClimdexGenericScalarandClimdexGenericVectorobjects with a constraint that enforces only one value per month. Additionally, they automatically set themax.missing.daysparameter to+Inf, enabling more flexible data handling.Key Enhancements
New
climdexSingleMonthlyScalar.raw/.csvandclimdexSingleMonthlyVector.raw/.csvFunctions:max.missing.daysto+Inf.Data Validation:
dataanddatesvectors.dataanddatesare neither entirelyNAnor empty vectors.Index Calculations:
Test Cases:
New test cases ensure that:
NAvalues, empty vectors, multiple values per month).