Skip to content

[Feature] <title>[Feature] Implement Bayesian (MCMC) parameter estimation for the Standard Tofts Model #76

@Aditya200247

Description

@Aditya200247

Feature description

Background & Motivation

Hello OSIPI team! I am highly interested in contributing to osipy, particularly regarding the blood-brain barrier module and robust parameter reconstruction initiatives. As I am preparing a proposal for GSoC 2026, I have been reviewing the current optimization methods in the repository.

Currently, the parameter estimation (e.g., for $K^{trans}$ and $v_e$) relies primarily on deterministic least-squares fitting. While standard, these methods can be highly sensitive to noise in DCE-MRI data and do not naturally provide uncertainty quantification. I would like to propose—and volunteer to implement—a Bayesian framework to address this.

Proposed Solution

I propose adding a probabilistic parameter estimation module using Markov Chain Monte Carlo (MCMC) sampling. This will allow users to define physiological priors and obtain full posterior distributions for permeability parameters, improving the robustness of the fits.

Proposed Implementation Workflow

If given the green light, I plan to execute this in four distinct phases:

**Phase 1: Architecture & Dependency Alignment **

  • Open a discussion in the draft PR regarding the preferred probabilistic backend. My initial thought is to use emcee as it is lightweight and widely used in scientific Python, or PyMC if a more comprehensive framework is desired.
  • Review the OSIPI CAPLEX lexicon to ensure all new variable names (e.g., posteriors, priors) match the official nomenclature.

**Phase 2: Core Mathematical Implementation **

  • Implement a Bayesian log-likelihood wrapper around the existing Standard Tofts Model forward equation:
    $$C_t(t) = K^{trans} \int_0^t C_p(\tau) e^{-k_{ep}(t-\tau)} d\tau$$
  • Build a function to accept user-defined prior distributions for $K^{trans}$ and $v_e$ (e.g., uniform or truncated normal bounds based on known human physiology).
  • Integrate the MCMC sampler to return both the Maximum A Posteriori (MAP) estimates and the raw sampler chains.

**Phase 3: Validation & Testing **

  • Generate synthetic noisy tissue concentration curves using osipy's forward models.
  • Write unit tests in the tests/ directory to ensure the MCMC sampler correctly recovers the ground-truth parameters within an acceptable margin of error.
  • Ensure all new code passes pre-commit hooks (black, flake8) and maintains the project's >95% code coverage requirement via check_coverage.py.

**Phase 4: Documentation & Tutorial **

  • Add comprehensive docstrings to the new classes/functions.
  • Create a tutorial_bayesian_tofts.ipynb notebook in the examples folder. This notebook will visually compare a standard least-squares fit against the Bayesian fit on noisy data, plotting the posterior corner plots to visualize parameter uncertainty.

I am looking forward to any feedback on this approach. Please let me know if I should adjust the scope or if there is a specific branch I should target for this work!

Even if I start early it will take me at least 5-6 do finish if things go well

Describe the solution

**Phase 1: Architecture & Dependency Alignment **

  • Open a discussion in the draft PR regarding the preferred probabilistic backend. My initial thought is to use emcee as it is lightweight and widely used in scientific Python, or PyMC if a more comprehensive framework is desired.
  • Review the OSIPI CAPLEX lexicon to ensure all new variable names (e.g., posteriors, priors) match the official nomenclature.

**Phase 2: Core Mathematical Implementation **

  • Implement a Bayesian log-likelihood wrapper around the existing Standard Tofts Model forward equation:
    $$C_t(t) = K^{trans} \int_0^t C_p(\tau) e^{-k_{ep}(t-\tau)} d\tau$$
  • Build a function to accept user-defined prior distributions for $K^{trans}$ and $v_e$ (e.g., uniform or truncated normal bounds based on known human physiology).
  • Integrate the MCMC sampler to return both the Maximum A Posteriori (MAP) estimates and the raw sampler chains.

**Phase 3: Validation & Testing **

  • Generate synthetic noisy tissue concentration curves using osipy's forward models.
  • Write unit tests in the tests/ directory to ensure the MCMC sampler correctly recovers the ground-truth parameters within an acceptable margin of error.
  • Ensure all new code passes pre-commit hooks (black, flake8) and maintains the project's >95% code coverage requirement via check_coverage.py.

**Phase 4: Documentation & Tutorial **

  • Add comprehensive docstrings to the new classes/functions.
  • Create a tutorial_bayesian_tofts.ipynb notebook in the examples folder. This notebook will visually compare a standard least-squares fit against the Bayesian fit on noisy data, plotting the posterior corner plots to visualize parameter uncertainty.

Describe alternatives

Instead of full MCMC sampling, we can add a regularization framework to the existing optimization pipeline. By incorporating prior physiological distributions as penalty terms in the objective function, we can perform Maximum A Posteriori (MAP) estimation. This stabilizes the fitting of $K^{trans}$ and $v_e$ against noise without sacrificing computational speed.

The workflow for this: -

**Phase 1: Architecture **

  • Review the current osipy fitting infrastructure.
  • Design an extensible Prior class (or module) that allows users to define expected physiological means and variances for parameters (e.g., Gaussian or Log-Normal priors).
  • Ensure the approach relies strictly on existing dependencies (numpy and scipy), avoiding the need for external probabilistic libraries.

**Phase 2: Core Mathematical Implementation **

  • Modify the standard Sum of Squared Errors (SSE) objective function to include a regularization term based on the priors. The new objective function to minimize will look like:
    $$\text{Cost} = \sum_{t} \left( C_{data}(t) - C_{model}(t, \theta) \right)^2 - \lambda \sum_{k} \log(P(\theta_k))$$
    (Where $\theta$ represents the Tofts parameters, $P(\theta)$ is the prior probability, and $\lambda$ is a weighting factor).
  • Integrate this custom cost function with scipy.optimize.minimize (using bounds-supported methods like L-BFGS-B or trust-constr).

**Phase 3: Validation & Testing **

  • Generate synthetic noisy tissue concentration curves using the Tofts forward model.
  • Write unit tests in tests/ comparing the standard unpenalized fit vs. the MAP fit at high noise levels, proving that the MAP estimator stays closer to ground-truth values.
  • Ensure >95% code coverage and verify pre-commit hooks pass.

**Phase 4: Documentation & Tutorial **

  • Update docstrings to CAPLEX standards.
  • Create a tutorial_MAP_tofts.ipynb notebook demonstrating how users can apply physiological priors to stabilize their fits on noisy data.

This approach will take me a little longer than the other one but it will work well too if given the permission to work on this

Additional context

No response

Are you working on this?

Yes

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions