2 changes: 2 additions & 0 deletions .gitignore
@@ -12,6 +12,7 @@ outputs/
extra_info/
eval/
extra_info/
plan/

debug_**

@@ -24,6 +25,7 @@ thirdparty/kaolin/

threedgrt_tracer/.ninja_log
threedgrt_tracer/include/3dgrt/kernels/slang/*.cuh*
threedgut_tracer/include/threedgutSlang.cuh
*.egg-info
.idea

23 changes: 23 additions & 0 deletions TODO.md
@@ -0,0 +1,23 @@
# TODO

## 3DGRT: half-precision particle features

`conf.render.particle_feature_half` is compiled into the 3DGRT kernel via `-DPARTICLE_FEATURE_HALF`,
but the Python-side cast is missing. In `threedgrt_tracer/tracer.py`, the result of
`gaussians.get_features()` must be cast with `.half()` before being passed to `_Autograd.apply`
when the flag is set, matching what 3DGUT already does.

See the `TODO` comment in `threedgrt_tracer/tracer.py`.

## 3DGRT: NHT support in CUDA path (`gaussianParticles.cuh`)

The NHT feature transform (`FEATURE_TRANSFORM_TYPE=1`) is implemented for the Slang path
(`gaussianParticles.slang`) but not yet in the CUDA path (`gaussianParticles.cuh`).
Full NHT support in 3DGRT requires extending `gaussianParticles.cuh` with the NHT
interpolation and activation logic currently only present in the Slang kernel.

## 3DGUT: refactor `evalBackwardNoKBuffer` to share path with k-buffer backward

`evalBackwardNoKBuffer` (`gutKBufferRenderer.cuh`) duplicates logic from the k-buffer backward
path. The two should be unified into a shared implementation to reduce code duplication and
ensure future fixes apply to both.
144 changes: 144 additions & 0 deletions TODO_half_3dgrt.md
@@ -0,0 +1,144 @@
# 3DGRT half-precision feature support

Goal: make `conf.render.particle_feature_half=true` and `conf.render.feature_output_half=true`
work end-to-end in `threedgrt_tracer`, matching the behavior already implemented in
`threedgut_tracer`. Gradient buffers remain fp32 on both paths.

Semantics (mirroring 3dgut):
- `particle_feature_half=true`: storage for `particleRadiance` (per-particle feature buffer)
is fp16. Slang entry points already expect `feat_elem_t*` (`__half*` when the macro is set).
Gradient `particleRadianceGrad` stays fp32.
- `feature_output_half=true`: storage for the per-ray integrated feature buffer (`rayRadiance`)
is fp16. Gradient `rayRadianceGrad` stays fp32. Tracer `.forward()` casts fp16 back to fp32
before returning, mirroring 3dgut.
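
To make the storage semantics concrete, here is a minimal sketch that emulates "store as fp16, return as fp32" using only the Python standard library. The helper `fp16_round_trip` and the sample values are hypothetical and not part of the tracer; the sketch only illustrates the rounding a half-precision buffer introduces before the value is cast back.

```python
import struct


def fp16_round_trip(value: float) -> float:
    """Store a value as IEEE 754 binary16 and read it back as a Python float."""
    # struct's "e" format is half precision; packing rounds to the nearest fp16 value.
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Hypothetical per-ray feature values; real buffers hold many channels per ray.
ray_features_fp32 = [0.1, 0.5, 0.7312, 3.25]
ray_features_fp16 = [fp16_round_trip(v) for v in ray_features_fp32]

for ref, stored in zip(ray_features_fp32, ray_features_fp16):
    rel_err = abs(stored - ref) / abs(ref)
    # fp16 stores 10 mantissa bits, so normal-range values round with
    # relative error of at most 2**-11.
    print(f"{ref:.6f} -> {stored:.6f} (relative error {rel_err:.2e})")
```

Values that are exactly representable in fp16 (such as 0.5 or 3.25) survive unchanged; the others pick up a relative error of at most about 4.9e-4, which is the scale the test tolerances later in this plan need to absorb.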

## Scope

Files to touch (by layer):

- C++ pipeline type layer
  - `threedgrt_tracer/include/3dgrt/pipelineParameters.h`
    Introduce `TFeatureDensityElem` (output/ray feature) and `TParticleFeatureElem` (particle
    storage) typedefs, guarded on the two macros. Change `particleRadiance` from `const float*`
    to `const TParticleFeatureElem*`, and `rayRadiance` from
    `PackedTensorAccessor32<float, 4>` to `PackedTensorAccessor32<TFeatureDensityElem, 4>`.
    `particleRadianceGrad` and `rayRadianceGrad` stay fp32.
- OptiX raygen kernels
  - `threedgrt_tracer/src/kernels/cuda/referenceSlangOptix.cu`
  - `threedgrt_tracer/src/kernels/cuda/referenceSlangBwdOptix.cu`
    1. Replace `const_cast<float*>(params.particleRadiance)` with
       `const_cast<TParticleFeatureElem*>(params.particleRadiance)`.
    2. FWD write to `rayRadiance`: wrap with `__float2half` when `FEATURE_OUTPUT_HALF`.
    3. BWD read from `rayRadiance`: wrap with `__half2float` when `FEATURE_OUTPUT_HALF`.
       `rayRadianceGrad` reads stay fp32.
- Host launcher
  - `threedgrt_tracer/src/optixTracer.cpp`
    1. `trace()`: allocate `rayRad` with `torch::kHalf` when `FEATURE_OUTPUT_HALF=1`,
       build `packed_accessor32<TFeatureDensityElem, 4>(rayRad)`, and
       `getPtr<const TParticleFeatureElem>(particleRadiance)`.
    2. `traceBwd()`: same dtype for the forward `rayRad` input; the grad tensors remain fp32.
- Python tracer
  - `threedgrt_tracer/tracer.py`
    1. Cast `gaussians.get_features()` to `.half()` when `conf.render.particle_feature_half`.
    2. Keep returning `ray_features.float()` to the caller; the `ray_features` saved in `ctx`
       may be fp16 when `feature_output_half=true` (the code already saves the raw output,
       consistent with 3dgut).

No changes required in Slang `.slang` or generated `.cuh`: the generalization already landed
and compiles correctly once `SLANG_CUDA_ENABLE_HALF=1` is set (done).

## Task breakdown

Each task is independently reviewable and testable (run `validate.py` for the relevant flag
combinations after each).

### T1 — Introduce typedefs in `pipelineParameters.h`
- Add `TFeatureDensityElem` and `TParticleFeatureElem` (guarded by `FEATURE_OUTPUT_HALF` and
`PARTICLE_FEATURE_HALF`), include `cuda_fp16.h` when either is set.
- Change `particleRadiance` to `const TParticleFeatureElem*` and `rayRadiance` accessor to
`PackedTensorAccessor32<TFeatureDensityElem, 4>`.
- No functional change when both macros are 0 (typedefs resolve to `float`).
- Test: build with both flags false (current default) → no-op rebuild; CI NeRF-Synthetic 3dgrt
smoke test still passes.

### T2 — Update OptiX kernels for fp16 reads/writes
- Apply the `__float2half` / `__half2float` wrappers in `referenceSlangOptix.cu` and
`referenceSlangBwdOptix.cu` under `FEATURE_OUTPUT_HALF`.
- Update `const_cast` sites to `TParticleFeatureElem*`.
- Test: build with both flags false → identical numerical output to baseline (no wrappers
compiled in).

### T3 — Host buffer allocation and accessor typing
- `optixTracer.cpp`: select dtype `kHalf` vs `kFloat32` for `rayRad`; use
`packed_accessor32<TFeatureDensityElem, 4>(rayRad)`.
- `getPtr<const TParticleFeatureElem>(particleRadiance)` for the particle buffer.
- Test: with flags false → unchanged; runtime assert that the tensor dtype matches the
typedef via `TORCH_CHECK(rayRad.scalar_type() == ...)` in debug builds.

### T4 — Python cast for `particle_feature_half`
- `tracer.py`: mirror 3dgut's conditional `.half()` cast on `gaussians.get_features()`.
- Test: flags false → unchanged.
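
A minimal sketch of the conditional cast, assuming the features object exposes a `.half()` method as a `torch.Tensor` does. `FakeTensor` and `prepare_particle_features` are hypothetical names used only so the snippet runs without PyTorch; in the tracer the input would be `gaussians.get_features()` and the flag `conf.render.particle_feature_half`.

```python
class FakeTensor:
    """Stand-in for a torch.Tensor that only tracks its dtype (illustration only)."""

    def __init__(self, dtype="float32"):
        self.dtype = dtype

    def half(self):
        # Mirrors torch.Tensor.half(): returns a new object, leaves self untouched.
        return FakeTensor("float16")


def prepare_particle_features(features, particle_feature_half):
    """Return features in the storage dtype the kernel was compiled for."""
    return features.half() if particle_feature_half else features


features = FakeTensor()
print(prepare_particle_features(features, False).dtype)  # float32
print(prepare_particle_features(features, True).dtype)   # float16
```

Keeping the cast in one helper means the flag-off path returns the original object untouched, which is what the "flags false → unchanged" test relies on.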

### T5 — End-to-end validation with flags enabled
- Run `validate.py` with `render.particle_feature_half=true render.feature_output_half=true`
using an existing NHT config (e.g. `nerf_synthetic_3dgrt_mcmc_nht.yaml`).
- Compare PSNR after N iterations against the fp32 baseline — expected within 0.1 dB.
- Gradients: single backward pass on a fixed seed; check that
`particleRadianceGrad` and `rayRadianceGrad` are finite and within tolerance of the
fp32 reference.
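
The PSNR comparison can be sketched as follows. This is an illustration under stated assumptions, not the logic in `validate.py`: pixel values are assumed to lie in [0, 1], "within 0.1 dB" is read as an absolute difference, and the helper names and sample values are hypothetical.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized sequences."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def psnr_within(psnr_test, psnr_ref, tol_db=0.1):
    """True when the two PSNR values differ by at most tol_db."""
    return abs(psnr_test - psnr_ref) <= tol_db


# Hypothetical pixel values standing in for rendered images.
target = [0.2, 0.4, 0.6, 0.8]
pred_fp32 = [0.21, 0.39, 0.61, 0.79]
pred_fp16 = [0.2101, 0.3899, 0.6101, 0.7899]

ref_db = psnr(pred_fp32, target)
half_db = psnr(pred_fp16, target)
print(f"fp32: {ref_db:.3f} dB, fp16: {half_db:.3f} dB, ok={psnr_within(half_db, ref_db)}")
```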

### T6 — Rename `*Radiance*` → `*Features*` in 3dgrt
Naming cleanup to align with the post-SH NHT feature abstraction. The legacy `Radiance`
suffix comes from the SH-only era; the buffers now carry arbitrary per-particle / per-ray
features. Purely mechanical rename, no behavioral change. Runs AFTER T1–T5 land so we are
not also chasing name drift during the fp16 functional work.

Rename mapping (all scopes):
- `PipelineParameters::particleRadiance` → `particleFeatures`
- `PipelineParameters::rayRadiance` → `rayFeatures`
- `PipelineBackwardParameters::particleRadianceGrad` → `particleFeaturesGrad`
- `PipelineBackwardParameters::rayRadianceGrad` → `rayFeaturesGrad`
- `OptixTracer::trace(..., torch::Tensor particleRadiance, ...)` arg → `particleFeatures`
- `OptixTracer::traceBwd(..., torch::Tensor particleRadiance, rayRad, rayRadGrd, ...)` args
→ `particleFeatures`, `rayFeat`, `rayFeatGrd` (local tensors + Python side kwargs).
- `particleRadianceGrad` local in `optixTracer.cpp::traceBwd` → `particleFeaturesGrad`.
- Python: `tracer.py` local variables `ray_features` / `ray_features_grd` are already
feature-named; cross-check that the pybind11 binding signature in `bindings.cpp` uses
the new C++ arg names.

Out of scope for T6 (per the resolved decisions below):
- `particleRadianceSphDegree` C++ field and `conf.render.particle_radiance_sph_degree` YAML.
- `shRadiativeParticles.slang` filename and internal `shRadiance*` identifiers (SH path).
- Any `*Radiance*` identifiers that only exist on the SH-specific code path.
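
One way to apply the mapping mechanically while leaving the out-of-scope names alone is a whole-identifier regex substitution. The sketch below is hypothetical tooling, not something in the repository; the mapping entries come from the list above and the sample source line is invented for illustration.

```python
import re

# Rename mapping taken from the list above (C++ identifiers only).
RENAMES = {
    "particleRadianceGrad": "particleFeaturesGrad",
    "rayRadianceGrad": "rayFeaturesGrad",
    "particleRadiance": "particleFeatures",
    "rayRadiance": "rayFeatures",
    "rayRadGrd": "rayFeatGrd",
    "rayRad": "rayFeat",
}

# \b on both sides matches whole identifiers only, so out-of-scope names such as
# particleRadianceSphDegree or shRadiance* are left untouched.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def rename_identifiers(source: str) -> str:
    """Apply the T6 rename mapping to a piece of source text."""
    return PATTERN.sub(lambda m: RENAMES[m.group(1)], source)


# Invented sample line mixing in-scope and out-of-scope identifiers.
sample = "f(params.particleRadiance, params.particleRadianceSphDegree, rayRadGrd);"
print(rename_identifiers(sample))
```

Because matching is anchored on word boundaries, the order of entries in `RENAMES` does not matter, and rerunning the substitution on already renamed text is a no-op.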

Test:
- Build + full `validate.py` run with fp32 flags (both false) → identical numerical
output to pre-T6 baseline (bit-identical expected since only identifier renames).
- Build + `validate.py` with fp16 flags (both true) → identical output to T5 result.

## Tests to write up-front

- `tests/test_3dgrt_half_flags.py` (new, small)
- Parametrize over `(particle_feature_half, feature_output_half) ∈ {(F,F),(T,F),(F,T),(T,T)}`.
- Forward only, single frame, fixed scene; compare `pred_features.float()` to the (F,F)
baseline with `atol=5e-3, rtol=1e-2`.
- Forward + backward; compare `mog_sph.grad` to the (F,F) baseline at the same tolerance.
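
A minimal sketch of the flag sweep and tolerance rule, written without pytest or PyTorch so it runs standalone. `fake_render` is a hypothetical stand-in for the tracer forward pass, and its perturbation is invented; only the flag combinations and the `atol`/`rtol` values come from the plan above.

```python
import itertools

ATOL, RTOL = 5e-3, 1e-2  # tolerances quoted in the plan


def flag_combinations():
    """All (particle_feature_half, feature_output_half) pairs, baseline first."""
    return list(itertools.product([False, True], repeat=2))


def within_tolerance(test, ref, atol=ATOL, rtol=RTOL):
    """Elementwise |test - ref| <= atol + rtol * |ref| (the allclose rule)."""
    return all(abs(t - r) <= atol + rtol * abs(r) for t, r in zip(test, ref))


def fake_render(particle_feature_half, feature_output_half):
    """Stand-in for a forward pass; a real test would call the tracer here."""
    baseline = [0.25, 0.5, 0.75]
    # Hypothetical perturbation standing in for fp16 rounding error.
    eps = 1e-3 * (particle_feature_half + feature_output_half)
    return [v + eps for v in baseline]


reference = fake_render(False, False)
for flags in flag_combinations():
    print(flags, within_tolerance(fake_render(*flags), reference))
```

In the real test file, the loop would typically become a `pytest.mark.parametrize` over the same four pairs, with the (F,F) result computed once as a fixture.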

## Decisions (resolved with user)

1. T5 validation ownership: user runs validation; the plan only needs to keep the hooks in
place (no tolerance tuning required from the implementer).
2. Gradient buffers stay fp32 end-to-end (no half-grad path).
3. T6 rename scope is restricted to identifiers naming buffers that can carry NHT features
(i.e. the per-particle feature storage and per-ray integrated feature output, plus their
fp32 gradients). Scalars and SH-specific paths are NOT renamed:
- keep `particleRadianceSphDegree` (C++ field) and `conf.render.particle_radiance_sph_degree`
(YAML) — scalar, shared with the SH path
- keep `shRadiativeParticles.slang` filename and its internal `shRadiance*` identifiers —
SH-only code path.
4. T6 runs AFTER T1–T5.

## Non-goals

- No changes to the CUDA fallback path (`gaussianParticles.cuh`); per the existing TODO, that is
a separate workstream.
- No changes to `threedgrt_playground`.
73 changes: 73 additions & 0 deletions TODO_nht_cuda.md
@@ -0,0 +1,73 @@
# Handwritten CUDA port of `featuresIntegrateBwdToLocalGrad` (NHT path)

## Status
- [x] **T1** — Tetrahedron constants (`tetraV0`, `tetraN0..N3`) placed in
`nht_detail` namespace at the top of `shRadiativeGaussianParticles.cuh`.
Values derived from Slang's vertex ordering; verified via script
(w_k == 1 at v_k, 0 at other vertices).
- [x] **T2** — Method body replaced, gated by `#if NHT_FEATURES_BWD_LOCAL_GRAD_CUDA`.
Default = `1` (native CUDA). Flip to `0` in
`threedgut_tracer/include/3dgut/kernels/cuda/models/shRadiativeGaussianParticles.cuh`
to restore the Slang-autodiff path (kept unchanged in the `#else` branch).
- [ ] **T3** — Rebuild, run `validate.py` (or one training step) with the macro
at 1 vs 0. Compare:
- feature gradient buffer L2 (primary parity check)
- density / position gradients (sanity; should be identical since we don't
touch those paths)
- `renderBackward` ms in nsys.
- [ ] **T4** — If parity holds, keep default = 1. Otherwise flip to 0 and iterate.
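
The vertex check mentioned under T1 can be sketched as below. The tetrahedron coordinates are hypothetical (the real `tetraV0` / `tetraN0..N3` values live in the `nht_detail` namespace and are not reproduced here), and the helper names are invented. The sketch builds scaled face normals `N_k` so that `w_k(p) = dot(N_k, p - anchor_k)` and confirms `w_k == 1` at `v_k` and `0` at the other vertices.

```python
def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


# Hypothetical regular tetrahedron; NOT the constants used in the kernel.
VERTS = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]


def face_normals(verts):
    """Return (N_k, anchor_k) pairs with w_k(p) = dot(N_k, p - anchor_k)."""
    out = []
    for k in range(4):
        a, b, c = [verts[j] for j in range(4) if j != k]  # face opposite vertex k
        n = cross(sub(b, a), sub(c, a))
        scale = dot(n, sub(verts[k], a))  # normalizes so that w_k(v_k) == 1
        out.append(([x / scale for x in n], a))
    return out


def barycentric(p, normals):
    return [dot(n, sub(p, anchor)) for n, anchor in normals]


NORMALS = face_normals(VERTS)
for k, v in enumerate(VERTS):
    w = barycentric(v, NORMALS)
    assert all(abs(w[j] - (1.0 if j == k else 0.0)) < 1e-12 for j in range(4))
print("vertex check passed")
```

Since each `w_k` is affine in `p`, `N_k` is also its gradient with respect to `p`, which is why the barycentric backward step can accumulate `canonicalIntersectionGrad` directly from the precomputed normals.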

## What the handwritten CUDA does (semantics to match Slang exactly)

Replicates the sequence inside Slang's `particleFeaturesIntegrateBwdToBuffer`
called with `exclusiveGradient=true` and the shifted `featureLocalGrad` buffer:

1. Early-out when `alpha <= 0`.
2. Recover pre-hit accumulator:
`acc_prev[i] = (integratedFeatures[i] - features[i]*alpha) / (1-alpha)`.
3. VJP of back-to-front `y_i = (1-alpha)*acc_prev_i + alpha*f_i` against
incoming `dy = integratedFeaturesGrad`:
- `dFeatures[i] = alpha * dy_i`
- `alphaGrad += sum_i (features[i] - acc_prev[i]) * dy_i`
- `integratedFeaturesGrad[i] = (1-alpha) * dy_i` (new accumulator grad)
4. Barycentric weights `w[0..3]` from `canonicalIntersection` (Cramer form
matching Slang, precomputed `N_k` face normals).
5. Load 4 vertex feature blocks × `InterpPointFeatureDim` once
(`__half2float` when `PARTICLE_FEATURE_HALF=1`).
6. Activation backward → `dBase[InterpPointFeatureDim]` (with `freq = 2^f`):
| Activation | Forward | Backward |
|---|---|---|
| None (0) | `out = base` | `dBase = dFeatures` |
| Siren (1) | `sin(base * freq)` | `dBase += cos(base*freq) * freq * dOut` |
| Sincos (2) | `sin(base*freq) + cos(base*freq)` | `dBase += (cos(base*freq) - sin(base*freq)) * freq * dOut` |
| Relu (3) | `max(0, base)` | `dBase = (features[i] > 0) ? dFeatures[i] : 0` |
7. Barycentric backward:
- `featureLocalGrad[k*IPFD + i] += w[k] * dBase[i]` (matches Slang's `+=` with exclusiveGradient=true)
- `canonicalIntersectionGrad += sum_k (sum_i vert[k][i] * dBase[i]) * N_k`
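
Steps 1 to 3 above can be sketched in plain Python and checked against a finite difference. This is an illustration of the math only, not the CUDA kernel: the function names and sample values are hypothetical, the behavior of the early-out (zero feature and alpha gradients, accumulator gradient passed through) is an assumption, and `alpha < 1` is assumed so the recovery in step 2 is well defined.

```python
def blend_forward(acc_prev, features, alpha):
    """y_i = (1 - alpha) * acc_prev_i + alpha * f_i."""
    return [(1.0 - alpha) * a + alpha * f for a, f in zip(acc_prev, features)]


def blend_backward(integrated, features, alpha, dy):
    """Steps 1-3: returns (d_features, d_alpha, d_acc_prev)."""
    if alpha <= 0.0:
        # Step 1 (assumed semantics): the hit contributes nothing.
        return [0.0] * len(features), 0.0, list(dy)
    # Step 2: recover the accumulator value from before this hit.
    acc_prev = [(y - f * alpha) / (1.0 - alpha) for y, f in zip(integrated, features)]
    # Step 3: vector-Jacobian product of the blend.
    d_features = [alpha * g for g in dy]
    d_alpha = sum((f - a) * g for f, a, g in zip(features, acc_prev, dy))
    d_acc_prev = [(1.0 - alpha) * g for g in dy]
    return d_features, d_alpha, d_acc_prev


# Hypothetical values for a 3-channel feature.
acc_prev = [0.2, -0.4, 0.9]
features = [1.0, 0.5, -0.3]
alpha = 0.35
dy = [0.7, -1.1, 0.4]

integrated = blend_forward(acc_prev, features, alpha)
d_features, d_alpha, d_acc_prev = blend_backward(integrated, features, alpha, dy)


def loss(a):
    """Scalar whose gradient w.r.t. the blend output is dy."""
    return sum(g * y for g, y in zip(dy, blend_forward(acc_prev, features, a)))


h = 1e-6
fd_alpha = (loss(alpha + h) - loss(alpha - h)) / (2.0 * h)
print(f"analytic d_alpha={d_alpha:.6f}, finite difference={fd_alpha:.6f}")
```

The same finite-difference pattern extends to `d_features` and `d_acc_prev`, and is a cheap way to catch the sign swaps that the T3 gradient diff is meant to cover.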

## Guardrails
- `static_assert(FeatureTransformType == 1)` — NHT-only.
- `static_assert(FEATURE_INTERPOLATION_TYPE == 0)` — barycentric only.
- `static_assert(FEATURE_INTERPOLATION_SUPPORT == 1)` — tetrahedra only.
- `static_assert` on `RAY_FEATURE_DIM` / `INTERP_POINT_FEATURE_DIM` / activation consistency.
- `static_assert(4 * IPFD == ParticleFeatureDim)` — buffer layout.

Any unsupported config fails at compile time — fallback is to flip the macro to 0.

## Confidence

- **Forward parity** (interpolation + integration, current config `activation=relu`):
high (see comparison with `neural-harmonic-textures/Interpolation.cuh` — same
tetrahedron geometry, different indexing; same integration math).
- **Backward numerical parity**: medium-high. The Relu path is trivial. The
(1-α)/α lerp VJP + barycentric VJP is standard. Main risk is a sign or
vertex-index swap — covered by T3 gradient diff.
- **Perf win**: medium-high. Expected 3–5× on this single kernel.
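
The activation backward rules from step 6 can be spot-checked against central finite differences before running the full T3 comparison. The sketch below is hypothetical and scalar-only; it assumes a single frequency `freq` per channel (taken as `2^f`) and the summed sincos form described for the Slang path.

```python
import math


def activation_forward(kind, base, freq=1.0):
    if kind == "none":
        return base
    if kind == "siren":
        return math.sin(base * freq)
    if kind == "sincos":
        # Summed into one channel, as described for the Slang path.
        return math.sin(base * freq) + math.cos(base * freq)
    if kind == "relu":
        return max(0.0, base)
    raise ValueError(kind)


def activation_backward(kind, base, d_out, freq=1.0):
    if kind == "none":
        return d_out
    if kind == "siren":
        return math.cos(base * freq) * freq * d_out
    if kind == "sincos":
        return (math.cos(base * freq) - math.sin(base * freq)) * freq * d_out
    if kind == "relu":
        # Equivalent to testing the forward output: max(0, base) > 0 iff base > 0.
        return d_out if base > 0.0 else 0.0
    raise ValueError(kind)


def finite_difference(kind, base, freq, h=1e-6):
    hi = activation_forward(kind, base + h, freq)
    lo = activation_forward(kind, base - h, freq)
    return (hi - lo) / (2.0 * h)


base, freq, d_out = 0.3, 4.0, 1.0  # hypothetical sample point, freq = 2**2
for kind in ("none", "siren", "sincos", "relu"):
    analytic = activation_backward(kind, base, d_out, freq)
    numeric = finite_difference(kind, base, freq)
    print(f"{kind:7s} analytic={analytic:+.6f} numeric={numeric:+.6f}")
```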

## Open reference points

- Slang source: `threedgut_tracer/include/3dgut/kernels/slang/models/neuralHarmonicFeaturesParticle.slang`
- External CUDA ref: `/nv/dev/neural-harmonic-textures/gsplat/gsplat/cuda/csrc/RasterizeToPixelsFromWorldNHT3DGSBwd.cu`
(sincos activation; do NOT copy the activation bwd verbatim — see
"Caveats" in the forward-parity discussion: Slang's sincos sums into one
channel, ref's keeps them separate).
23 changes: 23 additions & 0 deletions configs/apps/colmap_3dgrt_mcmc_nht.yaml
@@ -0,0 +1,23 @@
# @package _global_
# NHT (Neural Harmonic Textures) variant for colmap datasets with 3DGRT and MCMC

defaults:
  - /base_mcmc
  - /dataset: colmap
  - /initialization: colmap
  - /render: 3dgrt
  - _self_

model:
  feature_type: "nht"

render:
  pipeline_type: referenceSlang
  backward_pipeline_type: referenceSlangBwd
  particle_kernel_max_alpha: 0.999

loss:
  use_opacity: true
  lambda_opacity: 0.02
  use_scale: true
  lambda_scale: 0.005
21 changes: 21 additions & 0 deletions configs/apps/colmap_3dgut_mcmc_nht.yaml
@@ -0,0 +1,21 @@
# @package _global_
# NHT (Neural Harmonic Textures) variant for colmap datasets with 3DGUT and MCMC

defaults:
  - /base_mcmc
  - /dataset: colmap
  - /initialization: colmap
  - /render: 3dgut
  - _self_

model:
  feature_type: "nht"

render:
  particle_kernel_max_alpha: 0.999

loss:
  use_opacity: true
  lambda_opacity: 0.02
  use_scale: true
  lambda_scale: 0.005
23 changes: 23 additions & 0 deletions configs/apps/nerf_synthetic_3dgrt_mcmc_nht.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# @package _global_
# NHT (Neural Harmonic Textures) variant for nerf_synthetic with 3DGRT and MCMC

defaults:
  - /base_mcmc
  - /dataset: nerf
  - /initialization: random
  - /render: 3dgrt
  - _self_

model:
  feature_type: "nht"

render:
  pipeline_type: referenceSlang
  backward_pipeline_type: referenceSlangBwd
  particle_kernel_max_alpha: 0.999

loss:
  use_opacity: true
  lambda_opacity: 0.02
  use_scale: true
  lambda_scale: 0.005
21 changes: 21 additions & 0 deletions configs/apps/nerf_synthetic_3dgut_mcmc_nht.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# @package _global_
# NHT (Neural Harmonic Textures) variant for nerf_synthetic with 3DGUT and MCMC

defaults:
  - /base_mcmc
  - /dataset: nerf
  - /initialization: random
  - /render: 3dgut
  - _self_

model:
  feature_type: "nht"

render:
  particle_kernel_max_alpha: 0.999

loss:
  use_opacity: true
  lambda_opacity: 0.02
  use_scale: true
  lambda_scale: 0.005