Test platform: D1 Nezha
From the C906 user manual:
https://occ-oss-prod.oss-cn-hangzhou.aliyuncs.com/resource//1685946574371/%E7%8E%84%E9%93%81C906R2S1%E7%94%A8%E6%88%B7%E6%89%8B%E5%86%8C%EF%BC%88occ%EF%BC%89.pdf
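| Instruction | Description | Execution latency (LMUL=1) |
| --- | --- | --- |
| VSADD.VV | Vector integer signed saturating add | 3 |
| VAADD.VV | Vector integer averaging add | 3 |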
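Since the manual lists the same latency for both instructions, we infer that vsaddu and vaaddu take about the same time.
To compare performance before and after the optimization, we rewrote the RVV 1.0 intrinsic code for RVV 0.7.1 and tested it on the hardware currently available (D1 Nezha).
Because the D1 does not support vaaddu (Vector Single-Width Averaging Add), we replaced it with vsaddu (Vector Single-Width Saturating Add).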
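For readers unfamiliar with the two instructions, here is a minimal scalar sketch of what one 8-bit lane of each computes. The helper names `sat_addu8` and `avg_addu8` are hypothetical and not part of this PR, and the averaging model assumes the round-to-nearest-up rounding mode (`vxrm = rnu`). The two operations return different values, so the substitution presumably only makes sense for timing, not for checking output correctness.

```c
#include <stdint.h>

/* Scalar model of one 8-bit lane of vsaddu (hypothetical helper):
 * unsigned add, clamped to UINT8_MAX instead of wrapping around. */
static uint8_t sat_addu8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}

/* Scalar model of one 8-bit lane of vaaddu (hypothetical helper):
 * (a + b) / 2 computed without overflow.
 * Assumption: rounding mode is round-to-nearest-up (vxrm = rnu). */
static uint8_t avg_addu8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)((sum + 1u) >> 1);
}
```

For example, `sat_addu8(200, 100)` yields 255, while `avg_addu8(200, 100)` yields 150.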
Toolchain and compiler flags
Reference:
https://github.com/sipeed/TinyMaix/blob/9487854be2ce329b89427d4a5b14ce6136cb15cf/src/arch_rv64v.h#L24
Push the test program to the development board, then record the time taken by 10,000,000 loop iterations.
test.c is as follows: