Solves a batch of stiff ODE systems on the GPU using the RODAS4 method.
from nvqsp import sparse
result = sparse.solve(
A0,
A1_csr,
A2_csr,
y0,
times,
doses=None,
opts=None,
lib_path=None,
)Parameters:
| Parameter | Type | Description |
|---|---|---|
A0 |
ndarray (neq,) or (batch, neq) |
Zeroth-order coefficients. Shape (neq,) is broadcast to all patients. Shape (batch, neq) provides per-patient values. |
A1_csr |
tuple (rowptr, col, val) |
First-order coefficient matrix in CSR format. rowptr: int32 (neq+1,). col: int32 (A1_nnz,). val: float64 (A1_nnz,) or (batch, A1_nnz). |
A2_csr |
tuple (rowptr, col1, col2, val) |
Second-order bilinear coefficient tensor in CSR format. rowptr: int32 (neq+1,). col1, col2: int32 (A2_nnz,). val: float64 (A2_nnz,) or (batch, A2_nnz). |
y0 |
ndarray (neq,) or (batch, neq) |
Initial conditions. Shape (neq,) is broadcast to all patients. |
times |
ndarray (n_times,) |
Output time points. Must be strictly increasing and positive. |
doses |
list of (float, float) or None |
Dosing events as (time, amount) pairs. Dose is added to state index 0. None or empty list means no dosing. |
opts |
SparseOptions or None |
Solver options. None uses library defaults. |
lib_path |
str, Path, or None |
Override path to libsparse_rodas4.so. None uses the bundled library. |
Returns: SolveResult
| Field | Type | Description |
|---|---|---|
y |
ndarray (batch, n_times, neq) |
Solution trajectories. y[b, t, i] is state i of patient b at time times[t]. |
steps |
int |
Total ODE steps taken across all patients. |
Raises:
| Exception | Condition |
|---|---|
FileNotFoundError |
Library .so not found at any search location. |
ValueError |
A2_nnz == 0 (at least one A2 entry is required). |
Notes:
- Batch size is inferred from
A0ory0if they are 2-D. If both are 1-D, batch size defaults to 1. - CSR value arrays accept three shapes:
(nnz,)broadcasts to all patients,(batch, nnz)provides per-patient values,(batch * nnz,)is treated as pre-flattened per-patient values. - The library is loaded once and cached for subsequent calls.
Dataclass controlling solver behaviour. All fields have sensible defaults.
from nvqsp.options import SparseOptions
opts = SparseOptions(
rtol = 1e-6,
atol = 1e-12,
fac_min = 0.2,
fac_max = 10.0,
fac_safe = 0.9,
fac_rej = 0.1,
max_steps = 20000,
linsol_rtol = 1e-8,
linsol_atol = 1e-12,
linsol_max_iters = 30,
)Fields:
| Field | Type | Default | Description |
|---|---|---|---|
rtol |
float | 1e-6 | Relative tolerance for the ODE solution. Controls the accepted error relative to the solution magnitude. |
atol |
float | 1e-12 | Absolute tolerance for the ODE solution. Dominates for state values near zero. |
fac_min |
float | 0.2 | Minimum step-size growth/shrink factor. Prevents the step size from changing by more than 5x in one step. |
fac_max |
float | 10.0 | Maximum step-size growth factor. Limits aggressive step-size increases after easy steps. |
fac_safe |
float | 0.9 | Safety factor applied to the optimal step-size estimate. Values < 1.0 make the controller conservative. |
fac_rej |
float | 0.1 | Factor applied after a rejected step. On rejection, h_new = h * max(fac_rej, computed_factor). |
max_steps |
int | 20000 | Maximum ODE steps per integration interval (between dose events). If exceeded, the solver returns the last accepted state. |
linsol_rtol |
float | 1e-8 | Relative tolerance for the BiCGSTAB iterative linear solver used in each RODAS4 stage. |
linsol_atol |
float | 1e-12 | Absolute tolerance for the BiCGSTAB solver. |
linsol_max_iters |
int | 30 | Maximum BiCGSTAB iterations per linear solve. |
Guidance on tuning:
- For most QSP/PBPK models, the defaults work well.
- If the solver takes too many steps, try relaxing
rtol(e.g.1e-4). - If solutions show oscillation or instability, tighten
linsol_rtolor increaselinsol_max_iters. - For very stiff systems (stiffness ratio > 10^8), increasing
max_stepsto 50,000 may be needed. - Passing
opts=Nonetosparse.solve()uses the same defaults listed above.
When lib_path is not specified, the Python wrapper searches for the shared
library in this order:
- Bundled:
<nvqsp_package>/lib/libsparse_rodas4.so(inside the wheel) - Environment variable:
$QSP_ENGINE_LIB_SPARSE - System search:
ctypes.util.find_library("sparse_rodas4")
Doses are specified as a list of (time, amount) tuples. Each dose adds
amount to state index 0 (the primary dosing compartment).
doses = [
(0.0, 100.0), # 100 mg at t=0
(24.0, 50.0), # 50 mg at t=24 h
(48.0, 50.0), # 50 mg at t=48 h
]Conventions:
| Behaviour | Detail |
|---|---|
| Dose at t=0 | Applied to y0 before integration starts (within tolerance 1e-8). |
| Mid-run doses | Solver integrates to the dose time, applies the dose, then continues. |
| Output at dose time | Recorded post-dose (after the dose is applied). |
| Dose target | Always state index 0. Reorder the state vector if your dosing compartment differs. |
| Dose schedule | Both times and amounts are shared across all patients in the batch. Per-patient PK variation is supported through A0/A1/A2 coefficient values, not through dosing. |
| No doses | Pass doses=None or doses=[]. |
All three coefficient arrays support independent values per patient. The sparsity structure (row pointers and column indices) is shared — only the values differ.
| Array | Shared shape | Per-patient shape |
|---|---|---|
A0 |
(neq,) |
(batch, neq) |
A1_csr val |
(A1_nnz,) |
(batch, A1_nnz) |
A2_csr val |
(A2_nnz,) |
(batch, A2_nnz) |
Example: per-patient clearance variation
import numpy as np
from nvqsp import sparse
batch_size = 1000
A1_nnz = len(A1_col)
# Lognormal clearance distribution
cl = np.random.lognormal(mean=np.log(0.5), sigma=0.2, size=batch_size)
# Build per-patient A1 values
A1_val_batch = np.tile(A1_val_base, (batch_size, 1)) # (batch, A1_nnz)
A1_val_batch[:, cl_index] = -cl # vary clearance
result = sparse.solve(
A0=A0,
A1_csr=(A1_rowptr, A1_col, A1_val_batch),
A2_csr=(A2_rowptr, A2_col1, A2_col2, A2_val),
y0=y0,
times=times,
)
# result.y[i] is the trajectory for patient iPer-patient variation produces bit-exact results compared to singleton runs.
#include "sparse_rodas4.h"void sparse_rodas4_solve(
int batch_size,
int neq,
const double* h_A0, /* [batch_size * neq] */
int A1_nnz,
const int* h_A1_rowptr, /* [neq + 1] */
const int* h_A1_col, /* [A1_nnz] */
const double* h_A1_val, /* [batch_size * A1_nnz] */
int A2_nnz,
const int* h_A2_rowptr, /* [neq + 1] */
const int* h_A2_col1, /* [A2_nnz] */
const int* h_A2_col2, /* [A2_nnz] */
const double* h_A2_val, /* [batch_size * A2_nnz] */
const double* h_times, /* [n_times] */
int n_times,
const double* h_y0, /* [batch_size * neq] */
const double* h_dose_times, /* [n_doses] */
const double* h_dose_amounts, /* [n_doses] */
int n_doses,
const SparseRodas4Options* opts, /* NULL for defaults */
double* h_y_out, /* [batch_size * n_times * neq] */
int* h_total_steps /* out, may be NULL */
);Parameters:
| Parameter | Description |
|---|---|
batch_size |
Number of patients to solve in parallel. |
neq |
Number of ODE states per patient. |
h_A0 |
Zeroth-order coefficients. Row-major: patient b starts at b * neq. |
A1_nnz |
Number of nonzero entries in the A1 matrix. |
h_A1_rowptr |
CSR row pointers for A1. Shared across all patients. |
h_A1_col |
CSR column indices for A1. Shared across all patients. |
h_A1_val |
CSR values for A1. Patient b starts at b * A1_nnz. |
A2_nnz |
Number of nonzero entries in the A2 tensor. Must be >= 1. |
h_A2_rowptr |
CSR row pointers for A2. Shared across all patients. |
h_A2_col1 |
First column indices for A2. A2[i, col1[p], col2[p]] is the p-th entry in row i. |
h_A2_col2 |
Second column indices for A2. |
h_A2_val |
Values for A2. Patient b starts at b * A2_nnz. |
h_times |
Strictly increasing output time points. |
n_times |
Number of output time points. |
h_y0 |
Initial conditions. Patient b starts at b * neq. |
h_dose_times |
Dose event times. NULL if n_doses is 0. |
h_dose_amounts |
Dose amounts. NULL if n_doses is 0. |
n_doses |
Number of dose events. 0 for no dosing. |
opts |
Solver options. Pass NULL to use library defaults. |
h_y_out |
Output buffer. Patient b, time t, state i is at b * n_times * neq + t * neq + i. |
h_total_steps |
Receives the total ODE steps taken. May be NULL. |
#include "sparse_rodas4_options.h"
typedef struct SparseRodas4Options {
double rtol; /* 1e-6 — relative ODE tolerance */
double atol; /* 1e-12 — absolute ODE tolerance */
double facMin; /* 0.2 — minimum step-size factor */
double facMax; /* 10.0 — maximum step-size factor */
double facSafe; /* 0.9 — safety factor */
double facRej; /* 0.1 — factor after step rejection */
int max_steps; /* 20000 — max ODE steps per interval */
double linsol_rtol; /* 1e-8 — BiCGSTAB relative tolerance */
double linsol_atol; /* 1e-12 — BiCGSTAB absolute tolerance */
int linsol_max_iters;/* 30 — BiCGSTAB iteration limit */
} SparseRodas4Options;The struct is 80 bytes (includes internal padding after max_steps). Pass NULL
instead of an options pointer to use library defaults.
All arrays are row-major (C order), contiguous in memory.
| Array | Layout | Stride |
|---|---|---|
h_A0 |
[patient_0 | patient_1 | ...] |
neq |
h_A1_val |
[patient_0_vals | patient_1_vals | ...] |
A1_nnz |
h_A2_val |
[patient_0_vals | patient_1_vals | ...] |
A2_nnz |
h_y0 |
[patient_0_ic | patient_1_ic | ...] |
neq |
h_y_out |
[patient_0_traj | patient_1_traj | ...] |
n_times * neq |
Within each patient's trajectory, the inner layout is:
[time_0: state_0..state_N] [time_1: state_0..state_N] ...
#include <stdio.h>
#include <math.h>
#include "sparse_rodas4.h"
int main() {
int neq = 1, batch = 1, n_times = 3, n_doses = 1;
double A0[] = {0.0};
int A1_rowptr[] = {0, 1};
int A1_col[] = {0};
double A1_val[] = {-0.1};
int A2_rowptr[] = {0, 1};
int A2_col1[] = {0};
int A2_col2[] = {0};
double A2_val[] = {1e-30};
double times[] = {1.0, 10.0, 24.0};
double y0[] = {0.0};
double dose_times[] = {0.0};
double dose_amounts[] = {100.0};
double y_out[3];
int steps;
sparse_rodas4_solve(
batch, neq, A0,
1, A1_rowptr, A1_col, A1_val,
1, A2_rowptr, A2_col1, A2_col2, A2_val,
times, n_times, y0,
dose_times, dose_amounts, n_doses,
NULL, /* use defaults */
y_out, &steps
);
for (int t = 0; t < n_times; t++)
printf("y(%.0f) = %.4f (expected %.4f)\n",
times[t], y_out[t], 100.0 * exp(-0.1 * times[t]));
printf("Steps: %d\n", steps);
return 0;
}Compile:
gcc -o test_api test_api.c \
-I/usr/include/nvqsp \
-L/usr/lib/nvqsp -lsparse_rodas4 \
-Wl,-rpath,/usr/lib/nvqsp -lmCUDA errors call exit(1) and print a diagnostic message to stderr. GPU
out-of-memory will terminate the process. Validate batch sizes against available
GPU memory before calling the solver.
The solver automatically selects between two GPU kernel backends at runtime:
| Backend | When selected | Strategy |
|---|---|---|
| Single-kernel | neq fits in shared memory |
One persistent CUDA block per patient. All RODAS4 stages execute in a single kernel launch. Lowest overhead. |
| Multi-kernel | neq exceeds shared memory capacity |
Host-driven loop. Each RODAS4 stage is a separate kernel launch. Uses global memory. |
Backend selection is automatic and transparent. Both backends produce identical results.