feat(backend): Add MaskPattern-based TSCATTER instruction and system tests #125

Closed
Crystal-wzy wants to merge 1 commit into hw-native-sys:main from Crystal-wzy:feature

Crystal-wzy (Collaborator) commented May 13, 2026

Add a new TSCATTER overload that accepts a MaskPattern template parameter
(P0101, P1010, P0001, P0010, P0100, P1000, P1111), mirroring the existing
TGATHER MaskPattern interface. The implementation scatters source elements
into destination positions selected by the mask, zeroing non-selected slots.
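
The scatter semantics described above can be sketched as a small host-side reference function. This is a minimal illustration, not the code from `TScatter.hpp`: the `MaskPattern` enum values, the helper names `IsSelected` and `MaskedScatterRef`, and the bit-ordering convention (rightmost digit of the pattern name controls position 0 of each group of four elements) are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the MaskPattern enum. The 4-bit encodings are an
// assumption for illustration, not the values used in the PTO headers.
enum class MaskPattern : std::uint8_t {
    P0001 = 0b0001,
    P0010 = 0b0010,
    P0100 = 0b0100,
    P1000 = 0b1000,
    P0101 = 0b0101,
    P1010 = 0b1010,
    P1111 = 0b1111,
};

// Assumed convention: the rightmost digit of the pattern name controls
// position 0 of each group of four destination elements.
constexpr bool IsSelected(MaskPattern pattern, std::size_t dstIndex) {
    return ((static_cast<std::uint8_t>(pattern) >> (dstIndex % 4)) & 1u) != 0;
}

// Reference masked scatter: consume source elements in order, write each one
// to the next selected destination slot, and leave every other slot at zero.
template <MaskPattern pattern, typename T>
std::vector<T> MaskedScatterRef(const std::vector<T>& src, std::size_t dstSize) {
    std::vector<T> dst(dstSize, T{0});
    std::size_t srcPos = 0;
    for (std::size_t i = 0; i < dstSize && srcPos < src.size(); ++i) {
        if (IsSelected(pattern, i)) {
            dst[i] = src[srcPos++];
        }
    }
    return dst;
}
```

Under these assumptions, `MaskedScatterRef<MaskPattern::P0101>({1, 2, 3, 4}, 8)` yields `{1, 0, 2, 0, 3, 0, 4, 0}`, and `P1111` copies the source through unchanged.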

Summary

  • Add TSCATTER<maskPattern>(dst, src, events...) template in common,
    costmodel, and cpu backends
  • Implement TScatter and TSCATTER_IMPL in TScatter.hpp with
    static assertions for 16/32-bit element width, Vec tile type, and
    row-major layout
  • Add tscatter_common.h shared header for masked test infrastructure
  • Add gen_data.py golden-data generator covering all 7 mask patterns
    for float32, float16, uint16, int16, uint32, and int32 types
  • Add parametrized GTest cases in main.cpp and kernel implementations
    in tscatter_kernel.cpp
  • Fix inconsistent WaitEvents &... events spacing to WaitEvents &...events
    across costmodel pto_instr.hpp
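
The compile-time checks listed in the summary (16/32-bit element width, Vec tile type, row-major layout) could look roughly like the sketch below. The tile descriptor `TileSketch`, the enums `TileType` and `Layout`, and both helper functions are hypothetical names introduced for illustration; the actual PTO tile types and the assertions in `TScatter.hpp` are not shown on this page and may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tile descriptors, used only to make the example self-contained.
enum class TileType { Vec, Mat };
enum class Layout { RowMajor, ColMajor };

template <typename T, TileType kType, Layout kLayout>
struct TileSketch {
    using DType = T;
    static constexpr TileType type = kType;
    static constexpr Layout layout = kLayout;
};

// Predicate form: true when the tile meets all three constraints.
template <typename Tile>
constexpr bool IsMaskedScatterTile() {
    return (sizeof(typename Tile::DType) == 2 || sizeof(typename Tile::DType) == 4) &&
           Tile::type == TileType::Vec && Tile::layout == Layout::RowMajor;
}

// Assertion form: instantiating this with an unsupported tile fails to compile
// and reports which constraint was violated.
template <typename Tile>
constexpr bool CheckMaskedScatterTile() {
    static_assert(sizeof(typename Tile::DType) == 2 || sizeof(typename Tile::DType) == 4,
                  "masked TSCATTER sketch: element width must be 16 or 32 bits");
    static_assert(Tile::type == TileType::Vec,
                  "masked TSCATTER sketch: tile type must be Vec");
    static_assert(Tile::layout == Layout::RowMajor,
                  "masked TSCATTER sketch: tile layout must be row-major");
    return true;
}
```

Keeping the checks as `static_assert`s means an unsupported tile is rejected at compile time with a readable message rather than producing wrong results at run time.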

Testing

  • All masked TSCATTER parametrized test cases pass
  • Existing TSCATTER index-based test unaffected
  • Pre-commit hooks pass

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the TSCATTER instruction across the common, cost model, and CPU implementations, along with comprehensive unit tests for masked scatter operations. The review feedback identifies a significant bug in the test logic: vector sizing and type handling for half-precision data are incorrect. It also points out potential memory-access safety issues in the CPU implementation's indexing logic and a redundant pointer assignment in the test kernel.

Comment thread tests/cpu/st/testcase/tscatter/main.cpp Outdated
Comment thread include/pto/cpu/TGather.hpp Outdated
Comment thread tests/cpu/st/testcase/tscatter/tscatter_kernel.cpp
@Crystal-wzy Crystal-wzy force-pushed the feature branch 3 times, most recently from b572c0e to 760ecc9 Compare May 13, 2026 08:23
@Crystal-wzy Crystal-wzy changed the title from "feat(backend): Add TSCATTER instruction with MaskPattern support" to "feat(backend): Add MaskPattern-based TSCATTER instruction and system tests" May 13, 2026