Skip to content

Conversation

@harshagr70
Copy link
Contributor

@harshagr70 harshagr70 commented Aug 21, 2025

This PR introduces an optional flat-file bypass for get.trait.data(), allowing trait and prior data to be loaded directly from a CSV specified via <file_path> in <pfts> within settings.xml. When a file path is provided, the function reads PFTs, priors, and trait names from the flat file instead of querying BETYdb, enabling DB-independent runs (useful for offline workflows, testing, and CI). If no file path is specified, the original database workflow remains unchanged.

fixes a part of #3602

@github-actions github-actions bot added the Base label Aug 21, 2025
Copy link
Member

@dlebauer dlebauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this important step in the decoupling process!

A few requests

  • Could you please update the docs
  • Add a test that uses the file option

Questions:

  • Is there a reason not to have an argument to specify the location of the file? It seems having an input_file argument to would make the option more clear and straightforward to use.
  • Will this require updating query_pfts and query_priors so that they can also take a file?

@infotroph
Copy link
Member

infotroph commented Aug 26, 2025

Is there a reason not to have an argument to specify the location of the file?

I'm not opposed and I agree it'd make this easier to call directly, but am comfortable leaving it for a separate PR (and thus also punting on changes in query_pfts and query_priors) because the existing trait workflow already assumes each step modifies a settings object.

+1 to all the other suggestions

@harshagr70
Copy link
Contributor Author

Yes , it would be good if we keep that to next separate PR , or i can just add a commit in this PR to make it pass as an argument .
What you suggest now @infotroph ?

@harshagr70
Copy link
Contributor Author

@dlebauer , we don't need to specifically change the functions : query_pfts and query_priors , as we completely bypass these functions when using the flatfile , just filter data and pass the data frame to the downstream function get.trait.data.pft

Yes i will update the docs and write a test as well !

I have added the support to pass the file_path as in argument directly meanwhile keeping the logic of passing the path in the settings intact as well .

@dlebauer
Copy link
Member

@harshagr70 re: "we completely bypass these functions [query_pfts and query_priors] when using the flatfile" - perhaps out of scope for this PR, but where will the [pft,variable,prior] table used in the meta-analysis come from?

if(is.null(file_path)){
file.path <- pfts$file_path
}
use_flatfile <- !is.null(file_path) && file.exists(file_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest throwing an error if file_path is specified but the file doesn't exist, rather than revert to DB mode (which would be surprising to the user).

PEcAn.logger::logger.info("Using flat file for trait data instead of database")

# Load flat file as data.frame
trait_data_flat <- read.csv(pfts$file_path, stringsAsFactors = FALSE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
trait_data_flat <- read.csv(pfts$file_path, stringsAsFactors = FALSE)
trait_data_flat <- read.csv(file_path, stringsAsFactors = FALSE)

@infotroph
Copy link
Member

where will the [pft,variable,prior] table used in the meta-analysis come from?

@dlebauer It will be a CSV file specified by the user as settings$pft$file_path; I propose we try that out a bit and refine if that turns out to be a hassle.

@harshagr70 when we discussed this live we realized it could use a line in the documentation -- can you add that, please?

Comment on lines +46 to +53
#check for flatfile path if present use it
file_path <- input_file
if(is.null(file_path)){
file.path <- pfts$file_path
}
use_flatfile <- !is.null(file_path) && file.exists(file_path)

if (use_flatfile) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible implementation of the error-if-not-exist pattern (If you like this, need to also add @importFrom rlang %||% in the Roxygen block above):

Suggested change
#check for flatfile path if present use it
file_path <- input_file
if(is.null(file_path)){
file.path <- pfts$file_path
}
use_flatfile <- !is.null(file_path) && file.exists(file_path)
if (use_flatfile) {
#check for flatfile path, if present use it
file_path <- input_file %||% pfts$file_path
if (!is.null(file_path) {
if (!file.exists(file_path) {
PEcAn.logger::logger.error("trait data file not found at specified path", sQuote(file_path))
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants