In the histogram kernel, we found an issue whose cause we have not yet identified.
The program we used is shown below (the red blocks mark the two key ops where the issue occurs):


The NAH operation in the red block is a routing operation that corresponds to the CTRL_MOV in the previous assembly code. It is expected to take only one cycle, but in the RTL trace it appears to stall for one extra cycle and complete in the second cycle instead. Since I am still learning how to read the RTL logs, I have attached a CSV file with the key operations for further inspection:
tile_1_0_chain_126_134_key_only.csv
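To make the "expected one cycle, observed two" check easier to repeat, here is a minimal sketch of how such a CSV could be scanned for operations that take longer than expected. The column names `op`, `issue_cycle`, and `done_cycle`, the helper `find_slow_ops`, and the sample rows are all hypothetical and made up for illustration. They are not taken from the attached file and would need to be adapted to its real layout.

```python
import csv
import io


def find_slow_ops(csv_text, expected_latency=1):
    """Return (op, issue_cycle, latency) for every op slower than expected.

    Assumes hypothetical columns: op, issue_cycle, done_cycle.
    Latency counts both the issue cycle and the completion cycle.
    """
    slow = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        issue = int(row["issue_cycle"])
        done = int(row["done_cycle"])
        latency = done - issue + 1
        if latency > expected_latency:
            slow.append((row["op"], issue, latency))
    return slow


# Invented sample data, NOT the contents of the attached CSV.
SAMPLE = """op,issue_cycle,done_cycle
NAH,10,11
ADD,12,12
"""

print(find_slow_ops(SAMPLE))  # [('NAH', 10, 2)]
```

Running the same check on both the RTL trace and the simulator log (once each is converted to this shape) would show which ops differ in latency between the two.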
Here is a time-series visualization of the steady-state cycle shown in this screenshot. Red blocks indicate stalls; blue and green blocks indicate cycles in which the operation completed.

Here is the corresponding visualization for the simulator:
I am also attaching the logs for both:
histogram.json.log (Simulator log)
trace_histogram_4x4_Mesh.txt (RTL log)