
FuzzX PTX

This section is human-written.

This directory contains the FuzzX PTX fuzzer. It looks for correctness bugs in NVIDIA's PTX toolchain.

Currently it fuzzes ptxas.

The process of looking for a bug is:

1. Generate a random PTX program.
2. Compile it with -O0 and -O2.
3. Run both programs.
4. Compare their outputs.

So long as the random program is "legal" (meaning, mostly, it doesn't have undefined behavior) the output from the two programs should be identical. If they are not the same, that indicates a likely miscompile.
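The loop above can be sketched in a few lines of Rust. This is only an illustration, not the code in `crates/fuzzx-diff`: the `Verdict` type, the `differential_check` function, and the `compile_and_run` callback are hypothetical names, and the callback stands in for the real "assemble with `ptxas`, launch on the GPU, read back the output buffer" step.

```rust
/// Result of comparing one program at two optimization levels.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Both builds produced byte-identical output.
    Match,
    /// The outputs differ: a likely miscompile if the program is legal.
    Divergence { baseline: Vec<u8>, optimized: Vec<u8> },
    /// At least one build or run failed, so there is nothing to compare.
    Skipped,
}

/// Runs `program` at `-O0` and at `opt_level`, then compares the outputs.
/// `compile_and_run(program, level)` is an assumed callback that returns the
/// kernel's output bytes, or `None` if compilation or execution failed.
fn differential_check<F>(program: &str, opt_level: &str, compile_and_run: F) -> Verdict
where
    F: Fn(&str, &str) -> Option<Vec<u8>>,
{
    let baseline = compile_and_run(program, "-O0");
    let optimized = compile_and_run(program, opt_level);
    match (baseline, optimized) {
        (Some(b), Some(o)) if b == o => Verdict::Match,
        (Some(b), Some(o)) => Verdict::Divergence { baseline: b, optimized: o },
        _ => Verdict::Skipped,
    }
}
```

Keeping the compile-and-run step behind a callback means the comparison logic can be exercised without a GPU, for example by passing a closure that returns different bytes for different optimization levels.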

Most of the complexity in the fuzzer is around generating random programs. Obviously we can't generate truly arbitrary programs; they might have UB or infinite loops. Perhaps less obvious is that, after we've found one bug, we need to generate programs that avoid that bug, otherwise we'll just keep finding it over and over. So we have many flags that let you disable particular known-buggy idioms.

Fuzzers like libFuzzer and AFL++ allow you to do "directed" fuzzing, where you observe the branches taken by the binary under test and steer fuzzing towards "interesting" inputs. It is possible to do directed fuzzing on black-box binaries like ptxas using e.g. AFL++'s QEMU mode. But we don't currently do this, because we've found that undirected fuzzing is sufficient (for now).

All of the code here is AI-written, using ChatGPT 5.5 and Opus 4.7. I haven't read it at all. Fuzzing is inherently messy, and anyway the goal here is to find bugs, not to build a beautiful fuzzer.

After finding a miscompile, you'll want to:

1. come up with a minimal testcase,
2. root-cause the bug,
3. write a reproducer to share with the vendor, and
4. add a flag to the fuzzer so it avoids finding the same bug again.

I also use AI for this. Eventually it writes a reproducer into the known-miscompiles directory.

I've had good luck using /goal to get the AI to run the fuzzer, wait for a bug to appear, process it as above, and then restart the fuzzer. The biggest issue seems to be that it's slow at minimizing testcases.

Everything below this line is AI-written slop. Good luck!

Requirements

Component	Notes
Rust	Uses the toolchain in `rust-toolchain.toml`.
CUDA driver + `libcuda`	Required for fuzzing, verification, and reduction.
CUDA Toolkit `ptxas`	Set `PTXAS=/path/to/ptxas` for reproducible runs.
CUDA Toolkit `nvcc`	Required for the standalone CUDA inline-PTX reproducers.
NVIDIA GPU matching `TARGET_ARCH`	`fuzzx-execgen` currently defaults to `sm_103`.

Layout

Path	Purpose
`crates/fuzzx-execgen`	PTX kernel generator for differential testing.
`crates/fuzzx-exec`	`ptxas` compiler wrapper plus CUDA launch/diff helpers.
`crates/fuzzx-diff`	Differential fuzzer plus show/verify/reduce helpers.
`known-miscompiles/`	Reduced or standalone reproducers for confirmed findings.
`scripts/check-gen.sh`	Generator acceptance-rate smoke test against `ptxas`.

`ptxas` Bugs Found

Except where otherwise noted, these have been tested on sm_103 (i.e. B300). The primary table lists reproducers still observed on the latest tested ptxas.

Version	Description
13.2.78	m001-seed-050f: Uniform loop-latch optimization mishandles divergent loop-header entry.
13.2.78	m002-structured-lop3: `selp` / `lop3` / `xor` fold computes the wrong truth-table result.
13.2.78	m003-no-lop3-max-chain: `sub.u32` plus `max.s32` chain fold incorrectly includes the pre-subtract value.
13.2.78	m051-sat-sub-add-fold: `sub.sat.s32` followed by adding back the subtrahend folds as if saturation cannot occur.
13.2.78	m004-mulhi-loop-tripcount: Loop removal drops two `mul.hi.s32` accumulator updates.
13.2.78	m005-prmt-ifconvert-mask: If-converted `prmt.b32` mask fold drops a source operand.
13.2.78	m006-ifconvert-not-xor: If-converted `not.b32` plus `xor.b32` fold uses the wrong truth table.
13.2.78	m007-signed-unsigned-ifconvert: Nested if-conversion conflates signed and unsigned predicates.
13.2.78	m008-funnel-shift-loop-unroll: Loop unroll rewrites a loop-carried `shf.r.wrap.b32` recurrence incorrectly.
13.2.78	m009-neg-loop-after-counted-loop: Loop deletion stores a pre-`neg.s32` value after counted-loop simplification.
13.2.78	m010-shr-s32-range-fold: Range fold treats `shr.s32` as if it were unsigned before an unsigned compare.
13.2.78	m011-bfind-after-empty-loop: Empty-loop simplification folds a `bfind.u32`-derived value incorrectly.
13.2.78	m012-empty-loop-intmax-sub: Counted empty-loop fold miscomputes an `INT_MAX` subtraction sequence.
13.2.78	m048-intmax-popc-sub-mask-fold: Likely related to m012; structured branch context misfolds a `popc`-derived `INT_MAX` subtract before an `and` mask.
13.2.78	m013-set-true-cmp-one: `set.eq` materialization is folded as a predicate instead of `0xffffffff`.
13.2.78	m047-selp-ge-zero-branch-fold: `selp` materialization of `0xffffffff` feeding an unsigned `>= 0` branch fold skips an always-taken arm.
13.2.78	m014-vsub4-divergent-branch: `vsub4.u32.u32.u32` constant fold uses the wrong byte-lane intermediate.
13.2.78	m015-abs-loop-bmsk-fold: Loop deletion uses the pre-`abs.s32` live-out value in a `bmsk` expression.
13.2.78	m016-slct-s32-immediate-fold: `slct.s32.s32` immediate fold selects the wrong arm for a positive value.
13.2.78	m017-addc-shift-carry-fold: `add.cc.u32` / `addc.u32` fold injects an incorrect carry-in.
13.2.78	m029-addc-mul-carry-fold: Likely same root cause as m017; `addc.u32` fold injects an incorrect carry-in after multiply-derived operands.
13.2.78	m018-subc-cnot-shift-borrow-fold: `sub.cc.u32` / `subc.u32` fold injects an incorrect borrow-in after `cnot`.
13.2.78	m027-subc-shr-mul-borrow-fold: Likely same root cause as m018; `subc.u32` fold uses the wrong borrow source after shift and multiply.
13.2.78	m082-subc-shared-byte-borrow-fold: Likely same root cause as m018; `subc.u32` borrow-chain fold is wrong after a shared signed-byte roundtrip and scalar 16-bit producer.
13.2.78	m019-structured-loop-uniform-counter: Structured loop counters are promoted to uniform state and lose per-lane values.
13.2.78	m020-mixed-minmax-signedness-fold: Mixed signed/unsigned `min` / `max` fold drops the runtime input.
13.2.78	m021-cnot-funnel-add: `shf.r.wrap.b32` plus add fold loses part of the shifted value.
13.2.78	m081-cnot-shf-left-add: Likely related to m021; `cnot.b32` feeding `shf.l.wrap.b32` plus add flips the shifted contribution under optimized ptxas.
13.2.78	m022-neg-funnel-left-add: `neg.s32` plus `shf.l.wrap.b32` fold produces a sign-extension-shaped error.
13.2.78	m023-mul-wide-hi-ice: Optimized compile crashes on a `mul.wide` low-half feeding signed high multiply.
13.2.78	m024-prmt-cvt-u16-fold: `prmt.b32` plus `cvt.u16` fold drops the permuted source contribution.
13.2.78	m055-prmt-reg-control-eq-fold: Register-control `prmt.b32` feeding an equality fold selects the wrong arm.
13.2.78	m066-prmt-sign-byte-and-fold: Sign-control `prmt.b32` feeding a low-byte `and.b32` fold drops the sign-filled byte.
13.2.78	m057-s16-unary-intmin-fold: `abs.s16` / `neg.s16` of `INT16_MIN` feeding `cvt.s32.s16` is treated as a positive value.
13.2.78	m058-scalar16-min-cvt-fold: Scalar `min.{u16,s16}` through `.b16` scratch registers folds a following equality predicate incorrectly.
13.2.78	m060-scalar16-sub-intmin-fold: Scalar `sub.s16 0, INT16_MIN` feeding `cvt.s32.s16` is zero-extended by optimized ptxas.
13.2.78	m061-f32-div-pred-neg-cvt-fold: `div.approx.ftz.f32` feeding `cvt.rzi.s32.f32` and a skipped predicated `neg.f32` loses the fallback move at `-O0`.
13.2.78	m062-packed-max-u16x2-liveout-fold: `max.u16x2` in a live-range-heavy block corrupts later global roundtrip live-outs under optimized ptxas.
13.2.78	m063-f32-approx-div-cvt-boundary: `div.approx.ftz.f32` can legally straddle an integer boundary before `cvt.rzi.s32.f32`, producing an exact-output oracle mismatch.
13.2.78	m064-scalar16-max-cvt-predicate-fold: Scalar `max.u16` through `.b16` scratch registers folds a following zero predicate incorrectly.
13.2.78	m065-red-global-min-loop-fold: Loop-carried value feeding a per-thread `red.global.min.u32` roundtrip uses an earlier store value under optimized ptxas.
13.2.78	m067-atom-global-dec-loop-fold: Loop-carried state around a per-thread `atom.global.dec.u32` roundtrip changes a final predicate-controlled live-out under optimized ptxas.
13.2.78	m068-atom-global-xor-loop-fold: Loop-carried value feeding a per-thread `atom.global.xor.b32` roundtrip uses the original input word under optimized ptxas.
13.2.78	m069-wide-subc-loop-borrow-fold: Loop-carried state around a predicated `sub.cc.u64` / `subc.u64` pair drops the borrow/value feeding the final live-out.
13.2.78	m070-lop3-loop-liveout-fold: Loop-carried `lop3.b32` live-out is replaced with the original input-size value under optimized ptxas.
13.2.78	m071-f32-neg-loop-liveout-fold: Loop-carried `neg.f32` / `cvt.rzi.s32.f32` live-out is replaced with the original input word under optimized ptxas.
13.2.78	m072-const-u16-highbits-fold: `ld.const.u16` into a reused 32-bit register preserves stale high bits under optimized ptxas.
13.2.78	m073-cached-global-load-loop-entry: Likely related to m001; divergent loop-header entry drops a loop-body cached narrow global load under optimized ptxas.
13.2.78	m074-mad-hi-carry-loop-tripcount: Likely related to m004; loop optimization drops `mad.hi.cc.s32` high-multiply carry-chain updates.
13.2.78	m075-wide-bfi-loop-liveout-fold: Loop-carried `bfi.b64` live-out is folded to shifted or sign-shaped values under optimized ptxas.
13.2.78	m076-predicated-mad-loop-liveout-fold: Loop-body predicated `mad.lo.u32` update is dropped, leaving `%tid.x` instead of `%lanemask_gt` under optimized ptxas.
13.2.78	m077-global-store-loop-liveout-fold: Loop-carried value reset after a per-thread global store is ignored, so the final `mad.lo.s32` uses stale pre-store state.
13.2.78	m078-mul-lo-loop-recurrence-fold: Low-multiply loop recurrence is skipped for some lanes, leaving `input
13.2.78	m079-predicated-packed-add-high-half: Branch-local packed add leaves the low half correct but fills the high half with a lane value under optimized ptxas.
13.2.78	m080-ldu-signed-branch-fold: Signed branch control around predicated `ldu.global.u32` leaves original per-lane inputs instead of the uniform load value under optimized ptxas.
13.2.78	m059-scalar16-pred-mulwide-fold: Scalar `max.s16` feeding a predicate-guarded `mul.wide.u16` is optimized as if the multiply did not execute.
13.2.78	m025-shl-xor-square-lowbits: Fold loses the fact that a value is shifted left before testing low bits.
13.2.78	m026-shr-abs-ult-fold: Fold reasons about `0 - abs(n)` as signed or non-wrapping before unsigned compare.
13.2.78	m028-shf-r-wrap-sub-fold: `shf.r.wrap.b32` output is folded to zero before a final subtract.
13.2.78	m030-not-clz-predicate-fold: Guarded path fold drops or misapplies `not.b32` before `clz.b32`.
13.2.78	m031-guarded-sub-sub-fold: Guarded `x - (0x80000000 - x)` fold drops the `2*x` contribution.
13.2.78	m032-cnot-neg-ugt-fold: `cnot` / `neg` chain feeding an unsigned-greater-than predicate folds to the wrong arm.
13.2.78	m046-cnot-underflow-ugt-fold: Likely same root cause as m032; `cnot` feeding wrapped subtraction before an unsigned comparison selects the wrong arm.
13.2.78	m033-not-xor-branch-fold: Branch-specialized `not` / `xor` path folds the wrong value into the store.
13.2.78	m035-xor-not-predicate-fold: Likely same root cause as m033; `xor.b32` by `0xffffffff` feeding a predicate selects the wrong arm.
13.2.78	m034-bfind-zero-branch-fold: Branch fold treats `bfind.u32 0` as `0` instead of `0xffffffff`.
13.2.78	m036-mulhi-control-fold: Control-flow fold around `mul.hi.s32` uses an incorrect folded constant.
13.2.78	m037-bmsk-clz-bfi-fold: `bmsk` / `clz` / `bfi` / `mad.lo` value-chain fold sets an extra output bit.
13.2.78	m038-structured-empty-else-fold: Always-false structured branch with an empty else arm folds as if the untaken then arm executed.
13.2.78	m039-else-redefinition-fold: Branch fold drops the executed else-path redefinition of a value initialized before the branch.
13.2.78	m040-mulwide-neg-shr-fold: `mul.wide` low word feeding wrapped negation and logical shift loses the shifted high-bit contribution.
13.2.78	m049-wide-or-shift-mask-fold: Likely related to m040; `or.b64` low word feeding a shift/add mask fold computes the wrong mask.
13.2.78	m041-or-shifted-square-fold: `or.b32` after a square known to have zero low 32 bits folds with a missing output bit.
13.2.78	m044-mul-lo-square-fold: Likely same root cause as m041; square of a shifted `mul.lo` value folds to `0x80000000` instead of zero.
13.2.78	m042-vsub4-else-ifconvert-fold: If-converted else arm using `vsub4` computes the wrong value for the one lane that takes it.
13.2.78	m043-shr-sub-branch-fold: Branch-sensitive unsigned shift after wrapped subtraction loses the shifted high bit.
13.2.78	m050-reg-shl-mask-fold: Masked register-count `shl.b32` chains fold to the wrong shifted value.
13.2.78	m052-bfe-reg-pos-fold: Register-position `bfe.s32` with an out-of-range start folds to the wrong sign-filled value.
13.2.78	m053-bfi-reg-len-fold: Likely related to m052; register-length `bfi.b32` preserves high base bits that should be overwritten.
13.2.78	m045-brev-branch-fold: Branch-join fold around `brev.b32` computes `0x8000001d` instead of `0x8000001f`.
13.2.78	m083-orphan-param-ld: ptxas segfaults at every optimization level on an 11-line kernel that declares a local `.param` and reads it with `ld.param` without ever using the `.param` as a `call` argument or return value. Also reproduces on 13.0.88.
13.2.78	m084-multi-island-o-opt-crash: ptxas optimizer segfaults at `-O1` and above (clean at `-O0`) on a 66-line kernel whose body simultaneously contains `cvt.pack.sat.u8`, `bar.red.popc.u32`, `shfl.sync.up.b32`, `elect.sync`, `redux.sync.max.u32`, `createpolicy.fractional.L2` + cache-hint loads, bf16/tf32 conversions, and `sub.rn.f16x2`; removing any single category makes the crash disappear. Also reproduces on 13.0.88.
13.2.78	m085-cond-skip-or-imm-neg1: 19-line repro — `mov.b32 %r1, 0x3f800000;` followed by a kernel-param-dependent `setp` and `@%p bra done;` over `or.b32 %r1, -1, %r0;` causes `-O3` to skip the OR and keep the `0x3f800000` initialiser. Specific to the `0x3f800000` (1.0f) bit pattern — tested 11 other constants, none trigger. Also reproduces on 13.0.88.
13.2.78	m086-predicated-addc-carry-crash: 16-line kernel with a predicated `add.cc.u32` / `addc.u32` carry chain triggers a `C7907` internal compiler error at every opt level above `-O0`; the same input segfaults ptxas 13.0.88 at `-O3`.
13.2.78	m087-redux-xor-guarded-f64-cvt: Same shape as m085 — `setp.ne.u32 %p, %r_redux, 0` guarding a single-arm `bra` over an `f64`-to-`s32` cvt that writes `%r3` causes `-O3` to skip the cvt and leave `%r3` at whatever an upstream predicated `mov.b64 {%r3, %r9}, %rd7;` left it. The differing reduce predicate is computed from a `redux.sync.xor.b32` reduction. Also reproduces on 13.0.88.

Bugs Found in 13.0.88 and Fixed in 13.2.78

Found In	Fixed In	Description
13.0.88	13.2.78	m054-packed-add-cvt-fold: `add.s16x2` feeding `cvt.u16` and another packed add drops the first packed-add contribution.
13.0.88	13.2.78	m056-packed-add-cvt-s16-fold: Likely same root cause as m054; `add.u16x2` feeding `cvt.s16` drops the packed-add contribution.

Running

Build the tools:

cargo build --release -p fuzzx-diff

Run a differential sweep:

target/release/fuzzx-diff \
  --ptxas /usr/local/cuda/bin/ptxas \
  --max-iters 100000

Divergences are saved under DIV_OUT_DIR (default: divergences/) as directories containing seed.bin, program.ptx, input.bin, output_o0.*, output_o3.*, and summary.txt.
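As a rough illustration of the bundle layout described above, the following sketch reports which expected entries are absent from a list of file names. The function name `missing_bundle_entries` is hypothetical, and the assumption that every bundle must contain all six entries comes only from the sentence above, not from the tool's source.

```rust
/// Returns the expected bundle entries that are absent from `names`.
/// Exact file names are matched literally; the two output files are matched
/// by prefix because their extension is not specified here.
/// (Hypothetical helper; not part of the fuzzx-diff tools.)
fn missing_bundle_entries(names: &[&str]) -> Vec<&'static str> {
    const EXACT: [&str; 4] = ["seed.bin", "program.ptx", "input.bin", "summary.txt"];
    const PREFIXES: [&str; 2] = ["output_o0.", "output_o3."];

    let mut missing = Vec::new();
    for want in EXACT {
        if !names.iter().any(|n| *n == want) {
            missing.push(want);
        }
    }
    for prefix in PREFIXES {
        if !names.iter().any(|n| n.starts_with(prefix)) {
            missing.push(prefix);
        }
    }
    missing
}
```

A check like this could run before handing a bundle to the follow-up tools, so that a partially written directory is noticed early.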

Useful follow-up commands:

target/release/fuzzx-diff-show divergences/div-...
target/release/fuzzx-diff-verify divergences/div-...
target/release/fuzzx-diff-reduce divergences/div-...
target/release/fuzzx-diff-test divergences/div-.../program.ptx divergences/div-.../input.bin
target/release/fuzzx-diff-inspect-outputs divergences/div-.../program.ptx divergences/div-.../input.bin

Check how often generated PTX assembles:

PTXAS=/usr/local/cuda/bin/ptxas scripts/check-gen.sh 200

Configuration

fuzzx-diff accepts kebab-case CLI flags for the run-control and generator settings below; target/release/fuzzx-diff --help lists the full set. The same settings can still be supplied as environment variables, which is useful for long-running scripted sweeps. Boolean environment variables accept 1, true, yes, or on for true, and 0, false, no, or off for false.
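The boolean spellings listed above could be parsed along these lines. This is a minimal sketch rather than the fuzzer's actual parser: the names `parse_bool_setting` and `bool_env` are hypothetical, and trimming whitespace and ignoring case are assumptions that the text above does not confirm.

```rust
/// Parses one of the accepted boolean spellings.
/// Assumption: surrounding whitespace and letter case are ignored.
fn parse_bool_setting(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "1" | "true" | "yes" | "on" => Some(true),
        "0" | "false" | "no" | "off" => Some(false),
        _ => None,
    }
}

/// Reads a boolean environment variable, falling back to `default` when the
/// variable is unset and reporting an error for unrecognized values.
fn bool_env(name: &str, default: bool) -> Result<bool, String> {
    match std::env::var(name) {
        Ok(raw) => parse_bool_setting(&raw)
            .ok_or_else(|| format!("{name}: unrecognized boolean value {raw:?}")),
        Err(_) => Ok(default),
    }
}
```

Rejecting unknown spellings instead of silently treating them as false makes a typo such as `DIV_DISABLE_LOP3=ture` visible at startup, which matters for long scripted sweeps.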

Shared

Variable	Default	Meaning
`PTXAS`	`/usr/local/cuda/bin/ptxas`, then `$HOME/bin/ptxas`, then `ptxas`	Target `ptxas` binary. Set this explicitly for reproducible runs.
`TMPDIR`	Caller value; some tools use `/dev/shm` when unset and available.	Temporary directory for PTX/cubin files.
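The `PTXAS` lookup order in the table can be sketched as follows. The function `resolve_ptxas` and its `exists` callback are hypothetical; in particular, deciding between candidates by checking whether the file exists is an assumption, since the table only gives the order.

```rust
use std::path::{Path, PathBuf};

/// Picks a `ptxas` binary: an explicit `PTXAS` value wins, then
/// `/usr/local/cuda/bin/ptxas`, then `$HOME/bin/ptxas`, then a bare `ptxas`
/// left for `PATH` lookup. `exists` is injected so the logic can be tested
/// without touching the filesystem (assumption: candidates are chosen by
/// existence).
fn resolve_ptxas(
    env_ptxas: Option<&str>,
    home: Option<&str>,
    exists: impl Fn(&Path) -> bool,
) -> PathBuf {
    if let Some(explicit) = env_ptxas.filter(|p| !p.is_empty()) {
        return PathBuf::from(explicit);
    }
    let system = PathBuf::from("/usr/local/cuda/bin/ptxas");
    if exists(&system) {
        return system;
    }
    if let Some(home) = home {
        let user = Path::new(home).join("bin").join("ptxas");
        if exists(&user) {
            return user;
        }
    }
    PathBuf::from("ptxas")
}
```

In real use the arguments would come from `std::env::var("PTXAS")`, `std::env::var("HOME")`, and `Path::exists`. Because the fallback chain depends on what is installed on the machine, setting `PTXAS` explicitly remains the reproducible option.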

Run Control

Variable	Default	Meaning
`DIV_OUT_DIR`	`divergences`	Directory for saved divergence bundles.
`DIV_STARTING_SEED`	nanoseconds since epoch	First seed in the deterministic seed stream.
`DIV_MAX_ITERS`	unlimited	Stop after this many generated candidates.
`DIV_PRINT_EVERY_SECS`	`5`	Progress-report interval.
`DIV_PROGRAM_BYTES`	`4096`	Bytes derived from each seed and consumed by the generator.
`DIV_GPUS`	all visible CUDA devices	Comma-separated CUDA device ordinals, for example `0,1,2`.
`DIV_WORKERS_PER_GPU`	`16`	Worker threads per selected GPU.
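To make the relationship between a seed and `DIV_PROGRAM_BYTES` concrete, here is one way a seed could be expanded into a fixed-size byte buffer for the generator. The table does not say how the fuzzer derives these bytes, so the SplitMix64-style mixer and the names `splitmix64` and `program_bytes` are assumptions chosen only for illustration.

```rust
/// One step of a SplitMix64-style generator.
/// (Assumed expansion function; the fuzzer's real derivation may differ.)
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Deterministically expands `seed` into exactly `len` bytes, so the same
/// seed always yields the same generator input.
fn program_bytes(seed: u64, len: usize) -> Vec<u8> {
    let mut state = seed;
    let mut out = Vec::with_capacity(len + 8);
    while out.len() < len {
        out.extend_from_slice(&splitmix64(&mut state).to_le_bytes());
    }
    out.truncate(len);
    out
}
```

Whatever the real derivation is, the useful property is the one shown here: a saved seed plus the same `DIV_PROGRAM_BYTES` value reproduces the same byte stream, and therefore the same generated program.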

Generator Shape

Variable	Default	Meaning
`DIV_STRUCTURED_CONTROL_FLOW`	`false`	Use structured single-entry if/loop generation instead of arbitrary CFG generation.
`DIV_MIN_BLOCKS` / `DIV_MAX_BLOCKS`	`1` / `10`	Block-count bounds.
`DIV_MIN_INSTS_PER_BLOCK` / `DIV_MAX_INSTS_PER_BLOCK`	`1` / `6`	Instruction-count bounds per block.
`DIV_WORKING_REGS`	`8`	Number of working `u32` registers.
`DIV_MAX_LOOP_ITERS`	`16`	Maximum generated loop-trip count.
`DIV_MAX_IMMEDIATE`	`32`	Maximum ordinary immediate value.
`DIV_MAX_STRUCTURED_DEPTH`	`3`	Maximum nesting depth for structured control flow.

Generator Feature Toggles

All variables in this table default to false; setting one to true suppresses that feature.

Variable	Suppresses
`DIV_DISABLE_STRUCTURED_LOOPS`	Counted-loop shapes in structured mode.
`DIV_DISABLE_ARBITRARY_LOOPS`	Backedge loop terminators in arbitrary CFG mode.
`DIV_DISABLE_LOP3`	`lop3.b32`.
`DIV_DISABLE_PREDICATED_LOP3`	Predicated `lop3.b32` instructions.
`DIV_DISABLE_MINMAX`	`min.u32`, `max.u32`, `min.s32`, `max.s32`.
`DIV_DISABLE_SELP`	Direct integer `selp.{b32,u32,s32}` instructions.
`DIV_DISABLE_TYPED_SELP`	Direct integer `selp.u32` and `selp.s32` instructions.
`DIV_DISABLE_SUB`	Random `sub.u32` ALU instructions.
`DIV_DISABLE_MUL_LO`	`mul.lo.u32` and `mad.lo.u32`.
`DIV_DISABLE_SIGNED_LO_ALU`	Signed low-ALU spellings, including saturating add/sub.
`DIV_DISABLE_SAT_ARITH`	`add.sat.s32` and `sub.sat.s32`.
`DIV_DISABLE_PACKED_ADD`	`add.u16x2` and `add.s16x2`.
`DIV_DISABLE_SIGNED_PACKED_ADD`	`add.s16x2` only.
`DIV_DISABLE_PREDICATED_PACKED_ADD`	Predicated `add.u16x2` and `add.s16x2` instructions.
`DIV_DISABLE_PACKED_MINMAX`	`min/max.{u16x2,s16x2}`.
`DIV_DISABLE_SIGNED_PACKED_MINMAX`	`min/max.s16x2` only.
`DIV_DISABLE_PREDICATED_PACKED_MINMAX`	Predicated `min/max.{u16x2,s16x2}` instructions.
`DIV_DISABLE_SCALAR_16BIT`	Scalar 16-bit ALU through `.b16` scratch registers.
`DIV_DISABLE_SIGNED_SCALAR_16BIT`	Signed scalar 16-bit ALU while retaining unsigned `u16` ops.
`DIV_DISABLE_SCALAR_16BIT_MIN`	Scalar 16-bit `min.{u16,s16}` and `max.{u16,s16}` while retaining other scalar 16-bit arithmetic instructions.
`DIV_DISABLE_SCALAR_16BIT_SIGNED_UNARY`	`abs.s16` and `neg.s16` while retaining other scalar 16-bit ALU instructions.
`DIV_DISABLE_SCALAR_16BIT_BITWISE`	`and.b16`, `or.b16`, `xor.b16`, and `not.b16`.
`DIV_DISABLE_SCALAR_16BIT_SHIFTS`	`shl.b16`, `shr.u16`, and `shr.s16` with immediate counts in `0..15`.
`DIV_DISABLE_SCALAR_16BIT_COMPARE`	Scalar 16-bit `setp` and `set` through `.b16` scratch registers.
`DIV_DISABLE_SCALAR_16BIT_SELP`	Scalar 16-bit `selp.u16` and `selp.s16`; also disabled by `DIV_DISABLE_SCALAR_16BIT_COMPARE`.
`DIV_DISABLE_PREDICATED_SCALAR_16BIT`	Predicated scalar 16-bit ALU instructions.
`DIV_DISABLE_MULHI`	`mul.hi.u32` and `mul.hi.s32`.
`DIV_DISABLE_SIGNED_MULHI`	`mul.hi.s32` only.
`DIV_DISABLE_MAD_HI`	`mad.hi.u32` and `mad.hi.s32`.
`DIV_DISABLE_SIGNED_MAD_HI`	`mad.hi.s32` only.
`DIV_DISABLE_BITWISE_BINOPS`	`and.b32`, `or.b32`, `xor.b32`.
`DIV_DISABLE_OR`	`or.b32` while retaining `and.b32` and `xor.b32`.
`DIV_DISABLE_XOR`	`xor.b32` while retaining `and.b32` and `or.b32`.
`DIV_DISABLE_PRMT`	`prmt.b32`.
`DIV_DISABLE_PREDICATED_PRMT`	Predicated `prmt.b32` instructions.
`DIV_DISABLE_REG_PRMT`	Register-control `prmt.b32` instructions.
`DIV_DISABLE_PREDICATED_REG_PRMT`	Predicated register-control `prmt.b32` instructions.
`DIV_DISABLE_PRMT_MODES`	`prmt.b32` mode variants such as `.f4e`, `.ecl`, and `.rc16`.
`DIV_DISABLE_NOT`	`not.b32` and xor-by-`0xffffffff` forms.
`DIV_DISABLE_CLZ`	`clz.b32`.
`DIV_DISABLE_BREV`	`brev.b32`.
`DIV_DISABLE_CNOT`	`cnot.b32`.
`DIV_DISABLE_POPC`	`popc.b32`.
`DIV_DISABLE_ABS`	`abs.s32`.
`DIV_DISABLE_SPECIAL_REGS`	Deterministic special-register reads and constants such as `%laneid`, `%nwarpid`, `WARP_SZ`, and `%lanemask_*`.
`DIV_DISABLE_PREDICATED_SPECIAL_REGS`	Predicated deterministic special-register reads.
`DIV_DISABLE_GLOBAL_LOADS`	Bounded read-only `ld.global.{u8,s8,u16,s16,u32,u64,s64}` loads from the input buffer.
`DIV_DISABLE_UNIFORM_GLOBAL_LOADS`	Uniform-address scalar/vector `ldu.global` loads from the input buffer.
`DIV_DISABLE_GLOBAL_STORE_ROUNDTRIPS`	Per-thread `st.global.{u8,u16,u32,u64}` plus `ld.global.{u8,s8,u16,s16,u32,u64,s64}` roundtrips through the output buffer.
`DIV_DISABLE_GLOBAL_ATOMICS`	Per-thread `atom.global.{add,exch,cas,inc,dec,min,max,and,or,xor}` roundtrips through the output buffer.
`DIV_DISABLE_GLOBAL_ATOMIC_DEC`	`atom.global.dec.u32` roundtrips only; other global atomics remain enabled.
`DIV_DISABLE_GLOBAL_ATOMIC_XOR`	`atom.global.xor.b32` roundtrips only; other global atomics remain enabled.
`DIV_DISABLE_PREDICATED_GLOBAL_ATOMICS`	Predicated per-thread global atomic roundtrips.
`DIV_DISABLE_GLOBAL_REDUCTIONS`	Per-thread `red.global.{add,inc,dec,min,max,and,or,xor}` roundtrips through the output buffer.
`DIV_DISABLE_PREDICATED_GLOBAL_REDUCTIONS`	Predicated per-thread global reduction roundtrips.
`DIV_DISABLE_GENERIC_MEMORY`	Generic-address `ld`/`st` roundtrips through the output buffer, plus `isspacep.global`.
`DIV_DISABLE_PREDICATED_GENERIC_MEMORY`	Predicated generic-address `ld`/`st` roundtrips.
`DIV_DISABLE_CONST_MEMORY`	Bounded read-only `ld.const.{u8,s8,u16,s16,u32,u64,s64}` loads from a module-scope constant buffer.
`DIV_DISABLE_LOCAL_MEMORY`	Bounded private local-memory store/load roundtrips, including 64-bit forms.
`DIV_DISABLE_SHARED_MEMORY`	Race-free per-thread shared-memory store/load roundtrips, including 64-bit forms.
`DIV_DISABLE_SHARED_ATOMICS`	Per-thread `atom.shared.{add,exch,cas,inc,dec,min,max,and,or,xor}` roundtrips through a private shared-memory slot.
`DIV_DISABLE_PREDICATED_SHARED_ATOMICS`	Predicated per-thread shared atomic roundtrips.
`DIV_DISABLE_SHARED_REDUCTIONS`	Per-thread `red.shared.{add,inc,dec,min,max,and,or,xor}` roundtrips through a private shared-memory slot.
`DIV_DISABLE_PREDICATED_SHARED_REDUCTIONS`	Predicated per-thread shared reduction roundtrips.
`DIV_DISABLE_PREDICATED_MEMORY`	Predicated forms of bounded scalar/vector memory loads and store/load roundtrips.
`DIV_DISABLE_VECTOR_MEMORY`	Aligned `v2`/`v4` u32 and `v2.u64` memory loads and store/load roundtrips.
`DIV_DISABLE_WIDE_MEMORY`	Scalar 64-bit and `v2.u64` memory loads and store/load roundtrips.
`DIV_DISABLE_MEMORY_CACHE_OPS`	Scalar/vector global-memory cache-policy variants such as `ld.global.ca`, `ld.global.nc`, and `st.global.wt`.
`DIV_DISABLE_VOLATILE_MEMORY`	Volatile scalar/vector global and shared memory loads and store/load roundtrips.
`DIV_DISABLE_BIT_MEMORY`	Scalar/vector memory load/store type suffixes such as `.b8`, `.b16`, `.b32`, and `.b64`.
`DIV_DISABLE_MEMORY_FENCES`	Uniform `membar.{cta,gl,sys}` and `fence.{acq_rel,sc}.{cta,gpu,sys}` memory-ordering instructions.
`DIV_DISABLE_WARP_BARRIERS`	Full-warp `bar.warp.sync` emitted before generated divergent control flow.
`DIV_DISABLE_WARP_COLLECTIVES`	Full-mask `activemask`, `vote.sync`, `match.sync`, `elect.sync`, `shfl.sync`, and `redux.sync` emitted before generated divergent control flow, including randomized uniform-prefix dataflow islands.
`DIV_DISABLE_CTA_BARRIERS`	Full-CTA `bar.sync` and `barrier.sync` emitted before generated divergent control flow.
`DIV_DISABLE_CTA_BARRIER_REDUCTIONS`	Full-CTA `bar.red` and `barrier.red` reductions emitted before generated divergent control flow.
`DIV_DISABLE_PREFETCH`	Bounded `prefetch.global.*` and `prefetchu.L1` cache hints against the input buffer.
`DIV_DISABLE_CACHE_POLICY_HELPERS`	Deterministic prologue coverage for `createpolicy`, `applypriority`, cache-hint `ld.global`, `ld.global.nc`, `st.global`, and, when enabled, cache-hint global atomic/reduction forms.
`DIV_DISABLE_HELPER_CALLS`	Deterministic calls to a generated leaf `.func` in the entry prologue, plus randomized helper-call islands that re-invoke the available helper variants with `call.uni`/`call` spellings.
`DIV_DISABLE_RICH_HELPER_CALLS`	Deterministic prologue coverage for multi-argument, multi-return, mixed-width return, 64-bit/f32/f64 return, nested acyclic, explicit `.param` ABI, mixed register/`.param`, and predicate/select helper `.func` calls, plus randomized helper-call islands covering the same variants.
`DIV_DISABLE_F32_ARITH`	Sanitized `add/sub/mul/div/fma/copysign/min/max.f32` arithmetic, including approximate f32 division, f32 `.sat` arithmetic, and `.ftz` min/max.
`DIV_DISABLE_F32_ROUNDING`	Sanitized non-default rounding and `.ftz` f32 add/sub/mul/div/fma arithmetic.
`DIV_DISABLE_F32_UNARY`	Sanitized `abs/neg.f32`, including `.ftz` forms.
`DIV_DISABLE_F32_CVT`	Sanitized signed/unsigned 32/64-bit f32/int, saturating f32-to-int, f64-to-f32, and `.ftz` conversion chains.
`DIV_DISABLE_F32_SPECIAL_MATH`	Sanitized rounded and `.ftz` f32 `sqrt`/`rcp` plus approx `rcp`, `rsqrt`, `ex2`, `lg2`, `sin`, and `cos`.
`DIV_DISABLE_F32_COMPARE`	Sanitized ordered/unordered `set.{cmp}.u32.f32` and `setp.{cmp}.f32` comparisons, including `.ftz` forms, plus `testp.*.f32` classification.
`DIV_DISABLE_F32_SELP`	Sanitized `setp.*.f32`, including `.ftz` forms, feeding `selp.f32`.
`DIV_DISABLE_F16_ARITH`	Deterministic prologue coverage and randomized in-body coverage for scalar `.f16` and packed `.f16x2` add/sub/mul/fma/min/max/abs/neg, including predicated forms.
`DIV_DISABLE_F16_COMPARE`	Deterministic prologue and randomized in-body coverage for scalar `.f16` and packed `.f16x2` ordered/unordered `set`, `setp`, `.and/.or/.xor` setp forms, and predicate-fed `selp` forms.
`DIV_DISABLE_F16_CVT`	Deterministic prologue coverage and randomized in-body coverage for `.f16` conversion chains through f32, f64, integer, and packed `.f16x2` forms, including predicated forms.
`DIV_DISABLE_BF16_TF32_CVT`	Deterministic prologue coverage and randomized in-body coverage for `.bf16`, `.bf16x2`, and `.tf32` conversion forms, including predicated forms.
`DIV_DISABLE_F64_ARITH`	Sanitized `add/sub/mul/div/fma/copysign/min/max.f64` arithmetic.
`DIV_DISABLE_F64_ROUNDING`	Sanitized `.rz/.rm/.rp` f64 add/sub/mul/div/fma arithmetic.
`DIV_DISABLE_F64_UNARY`	Sanitized `abs.f64` and `neg.f64`.
`DIV_DISABLE_F64_CVT`	Sanitized signed/unsigned 32/64-bit f64/int, saturating f64-to-int, and f32-to-f64 conversion chains.
`DIV_DISABLE_F64_SPECIAL_MATH`	Sanitized rounded f64 `sqrt` and `rcp`.
`DIV_DISABLE_F64_COMPARE`	Sanitized ordered/unordered `set.{cmp}.u32.f64` and `setp.{cmp}.f64` comparisons plus `testp.*.f64` classification.
`DIV_DISABLE_F64_SELP`	Sanitized `setp.*.f64` feeding `selp.f64`.
`DIV_DISABLE_SIGNED_CMP`	Signed predicate comparisons.
`DIV_DISABLE_SIGNED_DIVREM`	`div.s32` and `rem.s32`.
`DIV_DISABLE_REG_DIVREM`	Register-divisor `div.u32` and `rem.u32` with sanitized divisors.
`DIV_DISABLE_PREDICATED_REG_DIVREM`	Predicated register-divisor `div.u32` and `rem.u32`.
`DIV_DISABLE_PREDICATED_DIVREM`	Predicated `div` and `rem` instructions.
`DIV_DISABLE_BRANCH_TABLES`	Bounded prologue and randomized in-body `brx.idx` branch tables with four rejoining targets.
`DIV_DISABLE_FUNNEL`	`shf.{l,r}.{wrap,clamp}.b32`.
`DIV_DISABLE_REG_FUNNEL`	Register-count `shf.{l,r}.{wrap,clamp}.b32`.
`DIV_DISABLE_PREDICATED_FUNNEL`	Predicated `shf.{l,r}.{wrap,clamp}.b32`.
`DIV_DISABLE_FUNNEL_CLAMP`	`shf.l.clamp.b32` and `shf.r.clamp.b32`.
`DIV_DISABLE_NEG`	`neg.s32`.
`DIV_DISABLE_SHL`	`shl.b32`.
`DIV_DISABLE_SHR`	`shr.u32`.
`DIV_DISABLE_SIGNED_SHR`	`shr.s32`.
`DIV_DISABLE_REG_SHIFTS`	Masked register-count shifts.
`DIV_DISABLE_PREDICATED_SHIFTS`	Predicated immediate-count shifts.
`DIV_DISABLE_PREDICATED_REG_SHIFTS`	Predicated masked register-count shifts.
`DIV_DISABLE_BFIND`	`bfind` and `bfind.shiftamt` instructions.
`DIV_DISABLE_SIGNED_BFIND`	`bfind.s32` and `bfind.shiftamt.s32`.
`DIV_DISABLE_WIDE_BFIND`	64-bit-source `bfind` and `bfind.shiftamt` instructions.
`DIV_DISABLE_SIGNED_WIDE_BFIND`	`bfind.s64` and `bfind.shiftamt.s64`.
`DIV_DISABLE_PREDICATED_BFIND`	Predicated `bfind` and `bfind.shiftamt` instructions.
`DIV_DISABLE_PREDICATED_WIDE_BFIND`	Predicated 64-bit-source `bfind` and `bfind.shiftamt` instructions.
`DIV_DISABLE_FNS`	`fns.b32`.
`DIV_DISABLE_REG_FNS`	`fns.b32` with a sanitized register base or offset operand.
`DIV_DISABLE_PREDICATED_FNS`	Predicated `fns.b32` instructions.
`DIV_DISABLE_PREDICATED_REG_FNS`	Predicated `fns.b32` instructions with a sanitized register base or offset operand.
`DIV_DISABLE_BFI`	`bfi.b32`.
`DIV_DISABLE_BFE`	`bfe.{u32,s32}`.
`DIV_DISABLE_BMSK`	`bmsk.{clamp,wrap}.b32`.
`DIV_DISABLE_BMSK_WRAP`	`bmsk.wrap.b32`.
`DIV_DISABLE_PREDICATED_BITFIELD`	Predicated `bfe`, `bfi`, and `bmsk` instructions.
`DIV_DISABLE_REG_BITFIELD`	Register pos/len operands for `bfe`, `bfi`, and `bmsk`.
`DIV_DISABLE_PREDICATED_REG_BITFIELD`	Predicated `bfe`, `bfi`, and `bmsk` instructions with register pos/len operands.
`DIV_DISABLE_WIDE_BFE`	64-bit scratch-register `bfe.{u64,s64}` instructions.
`DIV_DISABLE_SIGNED_WIDE_BFE`	64-bit scratch-register `bfe.s64` instructions.
`DIV_DISABLE_WIDE_BFI`	64-bit scratch-register `bfi.b64` instructions.
`DIV_DISABLE_PREDICATED_WIDE_BITFIELD`	Predicated 64-bit scratch-register `bfe` and `bfi` instructions.
`DIV_DISABLE_REG_WIDE_BITFIELD`	Sanitized register pos/len operands for 64-bit scratch-register `bfe` and `bfi`.
`DIV_DISABLE_PREDICATED_REG_WIDE_BITFIELD`	Predicated 64-bit scratch-register `bfe` and `bfi` instructions with register pos/len operands.
`DIV_DISABLE_MAD24`	`mad24.lo.u32` and `mad24.hi.u32`.
`DIV_DISABLE_MUL24`	`mul24.{lo,hi}.{u32,s32}`.
`DIV_DISABLE_PREDICATED_24BIT`	Predicated `mad24` and `mul24` instructions.
`DIV_DISABLE_SUBWORD_WIDE`	16-bit-source `mul.wide` and `mad.wide` through `.b16` scratch registers.
`DIV_DISABLE_SIGNED_SUBWORD_WIDE`	Signed 16-bit-source `mul.wide.s16` and `mad.wide.s16`.
`DIV_DISABLE_PREDICATED_SUBWORD_WIDE`	Predicated 16-bit-source `mul.wide` and `mad.wide` instructions.
`DIV_DISABLE_MUL_WIDE`	`mul.wide.{u32,s32}`.
`DIV_DISABLE_PREDICATED_MUL_WIDE`	Predicated `mul.wide.{u32,s32}` instructions.
`DIV_DISABLE_MAD_WIDE`	`mad.wide.{u32,s32}`.
`DIV_DISABLE_SIGNED_MAD_WIDE`	`mad.wide.s32`.
`DIV_DISABLE_PREDICATED_MAD_WIDE`	Predicated `mad.wide.{u32,s32}` instructions.
`DIV_DISABLE_WIDE_HIGH_RESULT`	High-half extraction from `mul.wide` and `mad.wide` results.
`DIV_DISABLE_WIDE_INT`	64-bit scratch-register ALU generation.
`DIV_DISABLE_WIDE_MINMAX`	64-bit scratch-register `min/max.{u64,s64}` instructions.
`DIV_DISABLE_WIDE_MULHI`	64-bit scratch-register `mul.hi.{u64,s64}` instructions.
`DIV_DISABLE_PREDICATED_WIDE_INT`	Predicated 64-bit scratch-register ALU generation.
`DIV_DISABLE_WIDE_MAD64`	64-bit operand `mad.{lo,hi}.{u64,s64}` instructions.
`DIV_DISABLE_SIGNED_WIDE_MAD64`	64-bit operand `mad.{lo,hi}.s64` instructions.
`DIV_DISABLE_PREDICATED_WIDE_MAD64`	Predicated 64-bit operand `mad` instructions.
`DIV_DISABLE_WIDE_SET`	64-bit scratch-register `set.<cmp>.u32.{u64,s64}` materialization.
`DIV_DISABLE_PREDICATED_WIDE_SET`	Predicated 64-bit scratch-register `set` materialization.
`DIV_DISABLE_WIDE_SETP`	64-bit scratch-register `setp`-fed guarded ALU instructions.
`DIV_DISABLE_WIDE_SETP_BOOL`	64-bit scratch-register `setp.<cmp>.{and,or,xor}` predicate-combiner instructions.
`DIV_DISABLE_WIDE_SELP`	64-bit scratch-register select `selp.{b64,u64,s64}` instructions.
`DIV_DISABLE_WIDE_UNARY`	64-bit scratch-register `not`, `cnot`, `popc`, `clz`, `brev`, `neg`, and `abs` instructions.
`DIV_DISABLE_SIGNED_WIDE_UNARY`	`neg.s64` and `abs.s64`.
`DIV_DISABLE_PREDICATED_WIDE_UNARY`	Predicated 64-bit scratch-register unary instructions.
`DIV_DISABLE_WIDE_SHIFTS`	64-bit scratch-register shifts.
`DIV_DISABLE_WIDE_REG_SHIFTS`	Masked register-count 64-bit scratch-register shifts.
`DIV_DISABLE_PREDICATED_WIDE_SHIFTS`	Predicated 64-bit scratch-register shifts.
`DIV_DISABLE_PREDICATED_WIDE_REG_SHIFTS`	Predicated masked register-count 64-bit scratch-register shifts.
`DIV_DISABLE_WIDE_DIVREM`	64-bit scratch-register `div/rem.{u64,s64}` instructions.
`DIV_DISABLE_SIGNED_WIDE_DIVREM`	64-bit scratch-register `div/rem.s64` instructions.
`DIV_DISABLE_REG_WIDE_DIVREM`	Register-divisor 64-bit scratch-register `div/rem.{u64,s64}` instructions with sanitized divisors.
`DIV_DISABLE_PREDICATED_REG_WIDE_DIVREM`	Predicated register-divisor 64-bit scratch-register `div/rem` instructions.
`DIV_DISABLE_PREDICATED_WIDE_DIVREM`	Predicated 64-bit scratch-register `div/rem` instructions.
`DIV_DISABLE_WIDE_ADDC`	64-bit scratch-register `add.cc.u64` / `addc.u64` pairs.
`DIV_DISABLE_WIDE_SUBC`	64-bit scratch-register `sub.cc.u64` / `subc.u64` pairs.
`DIV_DISABLE_PREDICATED_WIDE_CARRY`	Predicated 64-bit scratch-register carry pairs.
`DIV_DISABLE_WIDE_CARRY_CHAIN`	Three-instruction 64-bit scratch-register carry chains using `addc.cc.u64` or `subc.cc.u64`.
`DIV_DISABLE_PREDICATED_WIDE_CARRY_CHAIN`	Predicated three-instruction 64-bit scratch-register carry chains.
`DIV_DISABLE_ADDC`	`add.cc.u32` / `addc.u32` pairs.
`DIV_DISABLE_SUBC`	`sub.cc.u32` / `subc.u32` pairs.
`DIV_DISABLE_PREDICATED_CARRY`	Predicated `add.cc` / `addc` and `sub.cc` / `subc` pairs.
`DIV_DISABLE_CARRY_CHAIN`	Three-instruction `add/sub.cc` plus `addc/subc.cc` carry chains.
`DIV_DISABLE_PREDICATED_CARRY_CHAIN`	Predicated three-instruction `add/sub` carry chains.
`DIV_DISABLE_I32_BOUNDARY_IMMS`	Immediate `0x7fffffff` / `0x80000000` generation.
`DIV_DISABLE_DP4A`	`dp4a.{u32,s32}.{u32,s32}`.
`DIV_DISABLE_DP2A`	`dp2a.{lo,hi}.{u32,s32}.{u32,s32}`.
`DIV_DISABLE_NEGATED_PREDICATES`	Negated `@!%p` instruction predicates.
`DIV_DISABLE_PREDICATED_ALU`	Predicated integer ALU and floating-point arithmetic instructions.
`DIV_DISABLE_PREDICATED_UNARY`	Predicated integer unary, floating-point unary, and floating-point special-math instructions.
`DIV_DISABLE_CVT`	Direct base `cvt.{u32,s32}.{u8,u16,s8,s16}` instructions; narrow and wide round-trips have separate flags.
`DIV_DISABLE_PREDICATED_CVT`	Predicated integer and floating-point `cvt` instructions.
`DIV_DISABLE_NARROW_CVT`	Narrow `cvt` round-trips through 8/16-bit destination types.
`DIV_DISABLE_SIGNED_NARROW_CVT`	Signed narrow `cvt` round-trips.
`DIV_DISABLE_PREDICATED_NARROW_CVT`	Predicated narrow `cvt` round-trips.
`DIV_DISABLE_WIDE_CVT`	64-bit-source `cvt` round-trips.
`DIV_DISABLE_SIGNED_WIDE_CVT`	Signed 64-bit-source `cvt` round-trips.
`DIV_DISABLE_PREDICATED_WIDE_CVT`	Predicated 64-bit-source `cvt` round-trips.
`DIV_DISABLE_CVT_PACK`	Deterministic prologue and randomized in-body coverage for saturating `cvt.pack` integer packing, including predicated forms.
`DIV_DISABLE_SZEXT`	`szext.{wrap,clamp}.{u32,s32}`.
`DIV_DISABLE_SIGNED_SZEXT`	`szext.*.s32`.
`DIV_DISABLE_PREDICATED_SZEXT`	Predicated `szext` instructions.
`DIV_DISABLE_SETP_BOOL`	Integer and floating `setp.<cmp>.{and,or,xor}` predicate-combiner instructions.
`DIV_DISABLE_SETP_DUAL`	`setp.<cmp>` instructions with dual `%p|%q` predicate destinations.
`DIV_DISABLE_PRED_LOGIC`	`and.pred`, `or.pred`, `xor.pred`, and `not.pred`.
`DIV_DISABLE_PREDICATED_MAD`	Predicated `mad.lo.{u32,s32}` instructions.
`DIV_DISABLE_PREDICATED_MAD_HI`	Predicated `mad.hi.{u32,s32}` instructions.
`DIV_DISABLE_MAD_CARRY`	Three-instruction `mad.cc` / `madc.cc` / `madc` carry chains.
`DIV_DISABLE_SIGNED_MAD_CARRY`	Signed `mad.cc` / `madc.cc` / `madc` carry chains.
`DIV_DISABLE_PREDICATED_MAD_CARRY`	Predicated `mad.cc` / `madc.cc` / `madc` carry chains.
`DIV_DISABLE_PREDICATED_SET`	Predicated integer and floating-point `set`, `setp`, and `testp` instructions.
`DIV_DISABLE_PREDICATED_SELP`	Instruction-predicated `selp.{b32,u32,s32}`, `selp.f32`, and `selp.f64` instructions.
`DIV_DISABLE_SAD`	`sad.{u32,s32}`.
`DIV_DISABLE_SLCT`	`slct.{u32,s32,b32}.s32`.
`DIV_DISABLE_PREDICATED_SAD`	Predicated `sad.{u32,s32}` instructions.
`DIV_DISABLE_PREDICATED_SLCT`	Predicated `slct` instructions.
`DIV_DISABLE_PREDICATED_DP`	Predicated `dp4a` and `dp2a` instructions.
`DIV_DISABLE_PREDICATED_VIDEO`	Predicated video instructions.
`DIV_DISABLE_SET`	`set.<cmp>.u32.{u32,s32}`.
`DIV_DISABLE_S32_SLCT`	`slct.s32.*`.
`DIV_DISABLE_F32_SLCT`	`slct` forms with `f32` result or selector.
`DIV_DISABLE_WIDE_SLCT`	`slct` forms with `u64`, `s64`, or `b64` result.
`DIV_DISABLE_F64_SLCT`	`slct.f64.*`.
`DIV_DISABLE_VIDEO`	PTX video instructions.
`DIV_DISABLE_SIGNED_VIDEO`	Signed/mixed `s32` PTX video type suffixes.
`DIV_DISABLE_VIDEO_SAT`	PTX video `.sat` forms.
`DIV_DISABLE_VSUB4`	`vsub4.*` video forms.
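
Several toggles in this table overlap: for example, `DIV_DISABLE_BMSK` covers both `bmsk` modes while `DIV_DISABLE_BMSK_WRAP` covers only `bmsk.wrap.b32`. The sketch below shows one plausible reading of how overlapping flags combine, namely that a candidate instruction is emitted only if none of the flags covering it is set. The `BmskCandidate` type and the `covering_flags` and `may_emit` helpers are hypothetical names used for illustration, and the "any covering flag disables the candidate" rule is an assumption, not the fuzzer's confirmed logic.

```rust
use std::collections::HashSet;

/// Hypothetical description of one `bmsk` instruction the generator might emit.
struct BmskCandidate {
    wrap: bool,         // `bmsk.wrap.b32` rather than `bmsk.clamp.b32`
    predicated: bool,   // guarded by an instruction predicate
    reg_operands: bool, // pos/len operands come from registers
}

/// Flags from the table that describe this candidate.
fn covering_flags(c: &BmskCandidate) -> Vec<&'static str> {
    let mut flags = vec!["DIV_DISABLE_BMSK"];
    if c.wrap {
        flags.push("DIV_DISABLE_BMSK_WRAP");
    }
    if c.predicated {
        flags.push("DIV_DISABLE_PREDICATED_BITFIELD");
    }
    if c.reg_operands {
        flags.push("DIV_DISABLE_REG_BITFIELD");
    }
    if c.predicated && c.reg_operands {
        flags.push("DIV_DISABLE_PREDICATED_REG_BITFIELD");
    }
    flags
}

/// Assumption: a candidate is allowed only if no covering flag is disabled.
fn may_emit(c: &BmskCandidate, disabled: &HashSet<&str>) -> bool {
    covering_flags(c).iter().all(|flag| !disabled.contains(flag))
}

fn main() {
    let disabled = HashSet::from(["DIV_DISABLE_BMSK_WRAP"]);
    let clamp = BmskCandidate { wrap: false, predicated: false, reg_operands: false };
    let wrap = BmskCandidate { wrap: true, predicated: false, reg_operands: false };
    println!("clamp allowed: {}", may_emit(&clamp, &disabled));
    println!("wrap allowed: {}", may_emit(&wrap, &disabled));
}
```

Under this reading, a narrow flag such as `DIV_DISABLE_BMSK_WRAP` lets you avoid one known-buggy idiom while keeping the rest of the instruction family under test.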

Reduction And Sweeping

Variable	Default	Meaning
`REDUCE_GPUS`	`DIV_GPUS`, then all visible devices	CUDA devices used by `fuzzx-diff-reduce`.
`REDUCE_WORKERS_PER_GPU`	`DIV_WORKERS_PER_GPU`, then a host-core-based default capped at `16`	Reducer worker count per GPU.
`REDUCE_NO_PROGRESS_SECS`	`120`	Seconds the reducer waits without a completed candidate before timing out.
`DIV_HANG_SECS`	`4`	Seconds without progress before `fuzzx-diff-sweep` reports a hang.
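
The "Default" column describes a fallback chain: the reducer-specific variable wins, then the shared `DIV_*` variable, then a computed default. A minimal sketch of that chain for `REDUCE_WORKERS_PER_GPU` follows. The helper names `first_set` and `reduce_workers_per_gpu` are hypothetical, and the host-core formula (cores divided by GPU count) is an assumption; the table only states that the default is host-core based and capped at `16`.

```rust
use std::env;

/// Hypothetical helper: the first non-empty value among `names`, in order.
fn first_set<F>(lookup: F, names: &[&str]) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    names
        .iter()
        .filter_map(|&name| lookup(name))
        .find(|value| !value.trim().is_empty())
}

/// Workers per GPU: `REDUCE_WORKERS_PER_GPU`, then `DIV_WORKERS_PER_GPU`,
/// then a host-core-based default capped at 16.
fn reduce_workers_per_gpu<F>(lookup: F, host_cores: usize, gpu_count: usize) -> usize
where
    F: Fn(&str) -> Option<String>,
{
    first_set(lookup, &["REDUCE_WORKERS_PER_GPU", "DIV_WORKERS_PER_GPU"])
        .and_then(|value| value.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        // Assumption: split host cores evenly across GPUs, then cap at 16.
        .unwrap_or_else(|| (host_cores / gpu_count.max(1)).clamp(1, 16))
}

fn main() {
    let env_lookup = |name: &str| env::var(name).ok();
    let cores = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let workers = reduce_workers_per_gpu(env_lookup, cores, 1);
    println!("reducer workers per GPU: {workers}");
}
```

Passing the lookup as a closure keeps the resolution logic testable without touching the process environment.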

License

FuzzX is licensed under the Apache License, Version 2.0. See LICENSE.
