57 changes: 57 additions & 0 deletions docs/source/discovery_methods/LaHiCaSI/LaHiCaSI.rst
@@ -0,0 +1,57 @@
LaHiCaSI (Latent Hierarchical Causal Structure Learning)
==========================================================

Introduction
------------

LaHiCaSI is a causal discovery method that focuses on learning hierarchical causal structures in the presence of latent variables. It operates in two main phases: first locating latent variables by identifying causal clusters, and then inferring the causal structure among these latent variables.

Usage
-----

.. code-block:: python

from cdmir.discovery.LaHiCaSl.LaHiCaSl import Latent_Hierarchical_Causal_Structure_Learning
import pandas as pd
import numpy as np

# Load or generate your dataset
# Example: Generate random data with 10 variables and 1000 samples
data = pd.DataFrame(np.random.randn(1000, 10), columns=[f'X{i}' for i in range(10)])

# Set significance level
alpha = 0.05

# Run LaHiCaSI algorithm
Latent_Hierarchical_Causal_Structure_Learning(data, alpha)

Parameters
----------

- **data**: Dataset of observed variables, typically a pandas DataFrame or numpy array.
- **alpha**: Statistical significance threshold (default: 0.05), used to determine the significance of causal relationships during the learning process.

Returns
-------

The function prints the resulting causal structure as an adjacency matrix, along with intermediate results produced during the two-phase learning process.

Algorithm Overview
------------------

LaHiCaSI consists of two main phases:

1. **Phase I: Locate latent variables**
- **Stage I-S1**: Identify global causal clusters using `IdentifyGlobalCausalClusters`
- **Stage I-S2**: Determine latent variables by merging clusters using `Determine_Latent_Variables`
- **Stage I-S3**: Update active data and cluster information using `UpdateActiveData`

2. **Phase II: Infer causal structure among latent variables**
- Use `LocallyInferCausalStructure` to learn the causal relationships between the identified latent variables

The algorithm iteratively identifies clusters of variables that share common latent causes, updates the data representation to include these latent variables, and then infers the causal structure among them.
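
To make the two-phase procedure concrete, the sketch below runs the algorithm on synthetic data with a known hierarchical structure. The data generator is illustrative only (it is not part of the library), and it uses non-Gaussian noise, since GIN-style identification of latent structure typically assumes non-Gaussianity.

.. code-block:: python

import numpy as np
import pandas as pd
from cdmir.discovery.LaHiCaSl.LaHiCaSl import Latent_Hierarchical_Causal_Structure_Learning

rng = np.random.default_rng(0)
n = 2000
noise = lambda: rng.uniform(-1, 1, size=n)  # non-Gaussian noise

# Two latent variables with a causal edge L0 -> L1,
# each with three observed children (X0..X2 and X3..X5)
L0 = noise()
L1 = 0.9 * L0 + 0.5 * noise()
cols = {}
for i, latent in enumerate((L0, L1)):
    for j in range(3):
        cols[f'X{3 * i + j}'] = rng.uniform(0.5, 2.0) * latent + 0.5 * noise()
data = pd.DataFrame(cols)

# Phase I should cluster {X0, X1, X2} and {X3, X4, X5} under separate
# latents; Phase II should then orient the edge between the two latents.
Latent_Hierarchical_Causal_Structure_Learning(data, 0.05)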

References
----------

[1] Xie F, Huang B, Chen Z, et al. Generalized independent noise condition for estimating causal structure with latent variables[J]. Journal of Machine Learning Research, 2024, 25(191): 1-61.
7 changes: 7 additions & 0 deletions docs/source/discovery_methods/LaHiCaSI/index.rst
@@ -0,0 +1,7 @@
LaHiCaSI (Latent Hierarchical Causal Structure Learning)
==========================================================

.. toctree::
:maxdepth: 2

LaHiCaSI
5 changes: 5 additions & 0 deletions docs/source/discovery_methods/constraint/index.rst
@@ -9,3 +9,8 @@ Constraint-based
:maxdepth: 2

PBSCM_PGF/pbscm_pgf

.. toctree::
:maxdepth: 2

pc/pc
55 changes: 55 additions & 0 deletions docs/source/discovery_methods/constraint/pc/pc.rst
@@ -0,0 +1,55 @@
PC (Peter-Clark Algorithm)
==========================

Introduction
------------

PC is a constraint-based causal discovery algorithm that infers causal relationships between variables from observational data. It starts with a complete undirected graph and iteratively removes edges based on conditional independence tests, then applies a set of rules to orient edges, resulting in a Partially Directed Acyclic Graph (PDAG) that represents causal relationships.

Usage
-----

.. code-block:: python

from cdmir.discovery.constraint.pc import PC
from cdmir.utils.independence import ConditionalIndependentTest

# Initialize PC algorithm with default parameters
pc = PC(alpha=0.05, verbose=False)

# Fit the model to data; pass a concrete implementation of the
# ConditionalIndependentTest interface as the independence test
pc.fit(data, var_names, ConditionalIndependentTest)

# Access results
causal_graph = pc.causal_graph
skeleton = pc.skeleton
sep_set = pc.sep_set

Parameters
----------

PC Class Parameters:

- alpha: Significance level for independence tests (default: 0.05)
- adjacency_search_method: Function for adjacency search phase (default: adjacency_search)
- verbose: Whether to print algorithm progress (default: False)

fit() Method Parameters:

- data: Input dataset containing variable observations
- var_names: List of variable names corresponding to the columns in data
- indep_cls: Conditional independence test class implementing the ConditionalIndependentTest interface
- args: Positional arguments passed to the independence test constructor
- kwargs: Keyword arguments passed to the independence test constructor

Returns
-------

- causal_graph: Partially Directed Acyclic Graph (PDAG) representing inferred causal relationships
- skeleton: Undirected graph representing the skeleton of causal relationships
- sep_set: Separation sets for node pairs, stored as a dictionary where keys are node pairs and values are sets of separating nodes
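
As an end-to-end illustration, the sketch below runs PC on synthetic linear-Gaussian data. It assumes a Fisher-Z style test class named ``FisherZ`` is exported from ``cdmir.utils.independence``; substitute whichever concrete ConditionalIndependentTest implementation your installed version provides.

.. code-block:: python

import numpy as np
import pandas as pd
from cdmir.discovery.constraint.pc import PC
from cdmir.utils.independence import FisherZ  # assumed export; use any concrete test class

# Chain X0 -> X1 -> X2: PC should remove the X0 - X2 edge given {X1}
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = 0.8 * x0 + rng.normal(size=2000)
x2 = 0.8 * x1 + rng.normal(size=2000)
data = pd.DataFrame({'X0': x0, 'X1': x1, 'X2': x2})

pc = PC(alpha=0.05, verbose=False)
pc.fit(data, list(data.columns), FisherZ)

print(pc.causal_graph)  # a chain has no v-structure, so its edges stay undirected
print(pc.sep_set)       # expect X1 in the separating set for the pair (X0, X2)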

References
----------

[1] Spirtes, P., Glymour, C. N., Scheines, R., & Heckerman, D. (2000). Causation, prediction, and search. MIT press.
95 changes: 93 additions & 2 deletions docs/source/discovery_methods/functional_based/OLC/olc.rst
@@ -1,2 +1,93 @@
OLC (One-Component Latent Confounder Detection)
====================================================

Introduction
------------

OLC is a functional-based causal discovery method that detects latent confounders using higher-order cumulants. Based on the paper "Causal Discovery with Latent Confounders Based on Higher-Order Cumulants", this algorithm identifies causal relationships and latent confounders by leveraging the properties of higher-order cumulants and conditional independence tests.

Usage
-----

.. code-block:: python

import numpy as np
from cdmir.discovery.funtional_based.one_component.olc import olc

# Generate or load data
# Example: 1000 samples, 5 variables
data = np.random.randn(1000, 5)

# Set significance thresholds
alpha = 0.05 # Primary significance level
beta = 0.01 # Secondary significance level for more stringent tests

# Run OLC algorithm
adjmat, coef = olc(data, alpha=alpha, beta=beta, verbose=False)

# Print results
print("Adjacency Matrix:")
print(adjmat)
print("\nCoefficient Matrix:")
print(coef)

Parameters
----------

- **data**: Input data matrix of shape (n_samples, n_variables), where rows represent samples and columns represent variables.
- **alpha**: Significance threshold for initial edge orientation tests (default: 0.05).
- **beta**: Significance threshold for more stringent tests involving higher-order cumulants (default: 0.01).
- **verbose**: If True, prints detailed information during the algorithm execution (default: False).

Returns
-------

- **adjmat**: Adjacency matrix of the discovered causal graph. The matrix has shape (n_variables + n_latents, n_variables + n_latents), where:
- 0: No edge
- 1: Directed edge
- 2: Undirected edge (ambiguous direction)
- Latent variables are indexed from n_variables onwards.

- **coef**: Coefficient matrix of the discovered causal relationships. It has the same shape as adjmat and contains the estimated coefficients for each directed edge.
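
The snippet below is a small pure-Python helper (not part of the library API) for turning these two matrices into a readable edge list. It assumes ``adjmat[i, j] == 1`` encodes the directed edge i -> j; check the implementation if you rely on this orientation convention.

.. code-block:: python

def edges_from_olc(adjmat, coef, n_observed):
    """Readable edge list from OLC output (illustrative helper).
    Indices >= n_observed refer to detected latent confounders."""
    name = lambda i: f'X{i}' if i < n_observed else f'L{i - n_observed}'
    edges = []
    for i in range(adjmat.shape[0]):
        for j in range(adjmat.shape[1]):
            if adjmat[i, j] == 1:  # assumed convention: i -> j
                edges.append(f'{name(i)} -> {name(j)} (coef={coef[i, j]:.3f})')
            elif adjmat[i, j] == 2 and i < j:  # undirected; report each pair once
                edges.append(f'{name(i)} -- {name(j)}')
    return edges

# e.g. edges_from_olc(adjmat, coef, n_observed=data.shape[1])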

Algorithm Overview
------------------

OLC follows a structured approach to causal discovery with latent confounder detection:

1. **Initialization**
- Create an undirected graph (UDG) with all possible edges
- Create an empty directed graph (CG) for causal relationships
- Initialize KCI (Kernel-based Conditional Independence) test for independence testing

2. **Edge Orientation Phase**
- Test edge orientations using linear regression and KCI tests
- Remove edges and orient them in the directed graph based on significance tests
- Normalize residuals and update data

3. **Clique Detection and Latent Confounder Detection**
- Identify cliques in the undirected graph
- Use surrogate regression to handle complex relationships
- Apply higher-order cumulant (4th order) analysis to detect latent confounders
- Update the adjacency matrix with detected latent confounders

4. **Refinement**
- Iteratively refine the graph structure using conditional independence tests
- Update surrogate variables and exogenous variables
- Adjust edge orientations based on cumulant-based tests
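
Step 2 rests on a standard asymmetry: with non-Gaussian noise, regressing the effect on the cause yields a residual independent of the regressor, while the reverse regression does not. The sketch below illustrates this with a crude squared-correlation proxy for independence; the actual implementation uses KCI tests instead.

.. code-block:: python

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=5000)      # non-Gaussian cause
y = x + rng.uniform(-1, 1, size=5000)  # true direction: x -> y

def residual(target, regressor):
    b = np.cov(target, regressor)[0, 1] / np.var(regressor)
    return target - b * regressor

# The residual is independent of the regressor only in the causal direction.
print(np.corrcoef(residual(y, x) ** 2, x ** 2)[0, 1])  # ~ 0
print(np.corrcoef(residual(x, y) ** 2, y ** 2)[0, 1])  # clearly nonzero (~ -0.4)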

Key Techniques
--------------

- **Higher-Order Cumulants**: Uses 4th order cumulants to detect latent confounders that cannot be identified using traditional covariance-based methods.

- **KCI Tests**: Employs Kernel-based Conditional Independence tests for robust independence testing between variables and residuals.

- **Surrogate Regression**: Implements surrogate regression to handle complex causal relationships involving multiple variables.

- **Fisher's Combination Test**: Combines multiple p-values to enhance statistical power.
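
To ground the cumulant machinery, here is a minimal numpy sketch of the fourth-order joint cumulant of zero-mean variables, computed with the standard formula cum(x1, x2, x3, x4) = E[x1 x2 x3 x4] - E[x1 x2] E[x3 x4] - E[x1 x3] E[x2 x4] - E[x1 x4] E[x2 x3]. It vanishes on Gaussian data, which is why fourth-order statistics carry information about latent confounders that covariances cannot.

.. code-block:: python

import numpy as np

def cum4(x1, x2, x3, x4):
    """Fourth-order joint cumulant of zero-mean samples."""
    m = lambda *xs: np.mean(np.prod(xs, axis=0))
    return (m(x1, x2, x3, x4)
            - m(x1, x2) * m(x3, x4)
            - m(x1, x3) * m(x2, x4)
            - m(x1, x4) * m(x2, x3))

rng = np.random.default_rng(0)
g = rng.normal(size=100_000)          # Gaussian: higher-order cumulants vanish
u = rng.uniform(-1, 1, size=100_000)  # non-Gaussian: negative excess kurtosis
print(cum4(g, g, g, g))  # ~ 0
print(cum4(u, u, u, u))  # ~ -2/15, clearly nonzero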

References
----------

.. [1] Cai R, Huang Z, Chen W, et al. Causal discovery with latent confounders based on higher-order cumulants[C]//International conference on machine learning. PMLR, 2023: 3380-3407.
1 change: 1 addition & 0 deletions docs/source/discovery_methods/index.rst
@@ -9,3 +9,4 @@ In this section, we introduce discovery methods implemented in CDMIR.
Constraint-based methods <constraint/index>
Functional-based methods <functional_based/index>
Tensor-Rank methods <tensor_rank/index>
LaHiCaSI <LaHiCaSI/index>