FunQ
is an R package that provides methodologies for functional data
analysis from a quantile regression perspective. It currently implements
the Functional Quantile Principal Component Analysis (FQPCA) methodology
explained in detail in this
paper, which extends
traditional Functional Principal Component Analysis (FPCA) to the
quantile regression framework. This allows for a richer understanding of
functional data by capturing the full curve and time-specific
probability distributions underlying individual measurements, which is
especially useful when data distributions are skewed, vary across
subjects, or contain outliers.
You can install the development version of FunQ from GitHub with:
# install.packages("devtools")
devtools::install_github("alvaromc317/FunQ")
`FunQ` provides functions for implementing the Functional Quantile
Principal Component Analysis methodology. Key features include:

- Flexible Input Formats: accepts functional data as an $(I\times T)$ matrix, a `tf` functional vector, or a dataframe containing a column with a `tf` functional vector (using the `data` and `colname` parameters); see the sketch after this list.
- Handling Missing Data: capable of dealing with irregular time grids and missing data.
- Smoothness Control: allows control over the smoothness of the results using two approaches:
  - Spline Degrees of Freedom: using the `splines.df` parameter to set the degrees of freedom in the spline basis reconstruction.
  - Second Derivative Penalty: an experimental method that applies a second derivative penalty on the spline coefficients, enabled by setting `penalized = TRUE` and specifying `lambda.ridge`. Requires setting `splines.method = "conquer"`.
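Below is a minimal sketch of these input options. The objects `Y` (an $(I\times T)$ matrix) and `df` (a dataframe with a `tf` column named `curve`) are hypothetical placeholders:

library(FunQ)

# Matrix input: rows are subjects, columns are time points
m1 = fqpca(data = Y, npc = 2, quantile.value = 0.5)

# tf functional vector input
m2 = fqpca(data = tf::tfd(Y), npc = 2, quantile.value = 0.5)

# Dataframe input: point to the functional column via `colname`
m3 = fqpca(data = df, colname = "curve", npc = 2, quantile.value = 0.5)

# Smoothness controlled through the spline degrees of freedom
m4 = fqpca(data = Y, npc = 2, quantile.value = 0.5, splines.df = 10)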
Additionally, the package includes functions for cross-validation of
parameters such as `splines.df` (`cross_validation_df`) and
`lambda.ridge` (`cross_validation_lambda`), using the quantile error as
the reference prediction error metric (`quantile_error` function).
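For reference, the quantile error is the standard check (pinball) loss averaged over the observed points; a sketch of the metric at quantile level $\tau$, assuming this is what `quantile_error` computes, is

$$\text{error}_\tau = \frac{1}{N}\sum_{i,t}\rho_\tau\left(y_{it}-\hat{y}_{it}\right), \qquad \rho_\tau(u)=u\left(\tau-\mathbb{1}\{u<0\}\right),$$

where the sum runs over the $N$ non-missing entries of the data matrix.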
Let’s start by loading the necessary libraries:
# We use the tidy structure of the tidyfun package to deal with the functional data
devtools::install_github("tidyfun/tidyfun")
library(FunQ)
library(fda)
library(tidyverse)
library(tidyfun)
We use the `tidyfun` package for handling functional data structures.
We generate a synthetic dataset with 200 observations taken every 10 minutes over a day (144 time points). The data follows the formula:
$$x_i(t) = \sin(t) + \sin\left(\tfrac{t}{2}\right) + c_{it} + \varepsilon_i$$

where $c_{it}\sim N(0, 0.4)$ is point-wise Gaussian noise and $\varepsilon_i\sim\chi^2(3)$ is a subject-specific shift.
set.seed(5)
n = 200  # number of curves
t = 144  # number of observations per curve
time.points = seq(0, 2*pi, length.out=t)
# Smooth mean structure shared by all curves
Y = matrix(rep(sin(time.points) + sin(0.5*time.points), n), byrow=TRUE, nrow=n)
# Add point-wise Gaussian noise and a subject-specific chi-squared shift
Y = Y + matrix(rnorm(n*t, 0, 0.4), nrow=n) + rchisq(n, 3)
Y[1:50,] %>% tf::tfd() %>% plot(alpha=0.2)
The above plot visualizes a subset of the data generated this way. To simulate missing data and test the method’s robustness, we introduce 50% missing observations:
Y[sample(n*t, as.integer(0.50*n*t))] = NA
We split the data into training and testing sets:
Y.train = Y[1:150,]
Y.test = Y[151:n,]
We then apply the `fqpca` function to decompose the data in terms of the
median (`quantile.value = 0.5`), providing a robust alternative to
mean-based predictions:
results = fqpca(data=Y.train, npc=2, quantile.value=0.5)
loadings = results$loadings
scores = results$scores
# Recover Y.train based on decomposition
Y.train.estimated = fitted(results, pve = 0.95)
For new observations, we can compute scores using the existing loadings:
scores.test = predict(results, newdata=Y.test)
Y.test.estimated = scores.test %*% t(loadings)
Plotting the computed loadings:
plot(results, pve=0.95)
Calculating the quantile error between the reconstructed and true data:
quantile_error(Y=Y.train, Y.pred=Y.train.estimated, quantile.value=0.5)
#> [1] 0.1597332
We perform cross-validation to optimize the `splines.df` parameter:
splines.df.grid = c(5, 10, 15, 20)
cv_result = cross_validation_df(Y, splines.df.grid=splines.df.grid, n.folds=3, verbose.cv=F)
cv_result$error.matrix
#> Fold 1 Fold 2 Fold 3
#> [1,] 0.1623378 0.1685013 0.1676664
#> [2,] 0.1630681 0.1683739 0.1680962
#> [3,] 0.1644931 0.1689397 0.1683112
#> [4,] 0.1641285 0.1694750 0.1683882
The dimensions of the error matrix are
`(length(splines.df.grid), n.folds)`. We can determine the optimal
degrees of freedom by taking the mean of each row and picking the
minimum:
optimal_df = which.min(rowMeans(cv_result$error.matrix))
paste0('Optimal number of degrees of freedom: ', splines.df.grid[optimal_df])
#> [1] "Optimal number of degrees of freedom: 5"
In this example, we demonstrate how to apply the second derivative penalty approach by setting `penalized = TRUE` and `lambda.ridge = 0.001`. This method is experimental but has shown promising results.
# Apply fqpca with penalization
results_penalized <- fqpca(data = Y.train, npc = 2, quantile.value = 0.5, penalized = TRUE, lambda.ridge = 0.001, splines.method = "conquer")
# Reconstruct the training data
Y.train.estimated_penalized <- fitted(results_penalized, pve = 0.95)
The package also includes a function to perform cross-validation on the hyperparameter controlling the strength of the second derivative penalty on the spline coefficients:
cv_result = cross_validation_lambda(Y, lambda.grid=c(0, 1e-5, 1e-3), n.folds=3, verbose.cv=F)
cv_result$error.matrix
#> Fold 1 Fold 2 Fold 3
#> [1,] 0.1686230 0.1659071 0.1675024
#> [2,] 0.1696160 0.1681106 0.1691590
#> [3,] 0.1696826 0.1679575 0.1695439
The dimensions of the error matrix are `(length(lambda.grid), n.folds)`.
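As with `splines.df`, the optimal penalty can then be selected by averaging the error across folds; a minimal sketch mirroring the degrees of freedom example above:

lambda.grid = c(0, 1e-5, 1e-3)
optimal_lambda = lambda.grid[which.min(rowMeans(cv_result$error.matrix))]
paste0('Optimal lambda: ', optimal_lambda)
#> [1] "Optimal lambda: 0"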
Let’s explore the use of `FunQ` with the well-known Canadian Weather
dataset. `FunQ` can handle:

- Data matrices
- `tf` vectors from the `tidyfun` package
- `data.frames` containing `tf` vectors
matrix.data = t(fda::CanadianWeather$dailyAv[,,1])
data = tibble(temperature = tf::tfd(matrix.data, arg = 1:365),
province = CanadianWeather$province)
head(data)
#> # A tibble: 6 × 2
#> temperature province
#> <tfd_reg> <chr>
#> 1 [1]: (1,-4);(2,-3);(3,-3); ... Newfoundland
#> 2 [2]: (1,-4);(2,-4);(3,-5); ... Nova Scotia
#> 3 [3]: (1,-4);(2,-4);(3,-5); ... Nova Scotia
#> 4 [4]: (1,-1);(2,-2);(3,-2); ... Nova Scotia
#> 5 [5]: (1,-6);(2,-6);(3,-7); ... Ontario
#> 6 [6]: (1,-8);(2,-8);(3,-9); ... New Brunswick
Plotting the temperature data:
data %>%
ggplot(aes(y=temperature, color=province)) +
geom_spaghetti() +
theme_light()
We perform cross-validation to find the optimal `splines.df` value:
splines.df.grid = c(5, 10, 15, 20)
cv_result = cross_validation_df(data=data, colname='temperature', splines.df.grid=splines.df.grid, n.folds=3, verbose.cv=F)
optimal_df = splines.df.grid[which.min(rowMeans(cv_result$error.matrix))]
paste0('Optimal number of degrees of freedom: ', optimal_df)
#> [1] "Optimal number of degrees of freedom: 20"
Using the optimal `splines.df`, we fit the `fqpca` model and examine the
explained variance:
results = fqpca(data=data$temperature, npc=10, quantile.value=0.5, splines.df=optimal_df, seed=5)
cumsum(results$pve)
#> [1] 0.8869913 0.9723314 0.9918136 0.9970323 0.9981916 0.9990270 0.9995228
#> [8] 0.9997607 0.9999038 1.0000000
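The number of components needed to reach a given share of explained variability can be read directly from this cumulative `pve` vector, for example:

# Smallest number of components whose cumulative pve reaches 99%
which(cumsum(results$pve) >= 0.99)[1]
#> [1] 3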
With 3 components, we achieve over 99% of explained variability. Plotting the components:
plot(results, pve = 0.99)
One of the advantages of `FunQ` is its ability to recover the
distribution of the data at different quantile levels. We can compute the
10%, 50%, and 90% quantile curves:
m01 = fqpca(data=data$temperature, npc=10, quantile.value=0.1, splines.df=15, seed=5)
m05 = fqpca(data=data$temperature, npc=10, quantile.value=0.5, splines.df=15, seed=5)
m09 = fqpca(data=data$temperature, npc=10, quantile.value=0.9, splines.df=15, seed=5)
Y01 = fitted(m01, pve = 0.99)
Y05 = fitted(m05, pve = 0.99)
Y09 = fitted(m09, pve = 0.99)
Now, given an observation, we can visualize its quantile curves along with the raw data.
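A minimal sketch of such a plot in base R, assuming the fitted matrices are $(I\times T)$ and share the daily grid of `matrix.data` (the observation index and colors are arbitrary):

obs = 1  # pick one observation (weather station)
plot(1:365, matrix.data[obs, ], type = "l", col = "grey70",
     xlab = "Day of the year", ylab = "Temperature",
     main = rownames(matrix.data)[obs])
lines(1:365, Y01[obs, ], col = "blue")            # 10% quantile curve
lines(1:365, Y05[obs, ], col = "black", lwd = 2)  # median curve
lines(1:365, Y09[obs, ], col = "red")             # 90% quantile curve
legend("topright", legend = c("raw data", "q = 0.1", "q = 0.5", "q = 0.9"),
       col = c("grey70", "blue", "black", "red"), lty = 1)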