[NR UE] PDSCH Rx: Replace per-slot heap allocation of memory buffers with one-time allocation at init and other memory optimizations #234
Move six PDSCH RX buffers (rxdataF, rxdataF_comp, dl_ch_estimates, dl_ch_mag/magb/magr, rho_dl) from per-slot heap allocations (allocCast*) and a large stack VLA to single worst-case allocations in PHY_VARS_NR_UE, initialized at UE init and freed at teardown. This eliminates 6 malloc/free pairs per PDSCH slot.
Signed-off-by: Rupanjali <rupanjali.srivastava@openairinterface.org>
…moving the layer demapping function into the symbol loop, immediately after the LLR function, thus removing the symbol loop within the demapping function
Signed-off-by: Rupanjali <rupanjali.srivastava@openairinterface.org>
Signed-off-by: Rupanjali <rupanjali.srivastava@openairinterface.org>
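For readers following the first commit, here is a minimal sketch of the pattern it describes: allocate worst-case buffers once at init, reuse a prefix of them in every slot, and free them at teardown. It is not the code in this MR. The type `pdsch_rx_buffers_t`, the functions `pdsch_rx_buffers_init`, `pdsch_rx_buffers_free` and `pdsch_rx_slot_samples`, the `MAX_*` limits, and the use of `int32_t` as a stand-in sample type are all hypothetical, and only two of the buffers are shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical worst-case dimensions (assumption, not taken from the MR). */
enum { MAX_RX_ANT = 4, MAX_LAYERS = 4, MAX_SYMBOLS = 14, MAX_RE_PER_SYMBOL = 273 * 12 };

/* Hypothetical container; int32_t stands in for a packed 16-bit I/Q sample. */
typedef struct {
  int32_t *rxdataF;          /* flattened [rx antenna][symbol][RE] */
  int32_t *dl_ch_estimates;  /* flattened [layer][rx antenna][symbol][RE] */
  size_t rxdataF_len;        /* capacity in samples */
  size_t dl_ch_estimates_len;
} pdsch_rx_buffers_t;

/* Teardown: release everything and reset the struct so a double free is harmless. */
void pdsch_rx_buffers_free(pdsch_rx_buffers_t *b)
{
  free(b->rxdataF);
  free(b->dl_ch_estimates);
  memset(b, 0, sizeof(*b));
}

/* Init: one allocation per buffer, sized for the worst case. Returns 0 on success. */
int pdsch_rx_buffers_init(pdsch_rx_buffers_t *b)
{
  memset(b, 0, sizeof(*b));
  b->rxdataF_len = (size_t)MAX_RX_ANT * MAX_SYMBOLS * MAX_RE_PER_SYMBOL;
  b->dl_ch_estimates_len = (size_t)MAX_LAYERS * b->rxdataF_len;
  b->rxdataF = calloc(b->rxdataF_len, sizeof(*b->rxdataF));
  b->dl_ch_estimates = calloc(b->dl_ch_estimates_len, sizeof(*b->dl_ch_estimates));
  if (b->rxdataF == NULL || b->dl_ch_estimates == NULL) {
    pdsch_rx_buffers_free(b);
    return -1;
  }
  return 0;
}

/* Per-slot use: no malloc/free here, only a bounds check and a reset of the
 * part of the buffer this slot needs. Returns the sample count used, or -1
 * if the slot dimensions exceed the worst case. */
int pdsch_rx_slot_samples(pdsch_rx_buffers_t *b, int nb_rx, int nb_symbols, int nb_re)
{
  if (nb_rx < 1 || nb_rx > MAX_RX_ANT || nb_symbols < 1 || nb_symbols > MAX_SYMBOLS
      || nb_re < 1 || nb_re > MAX_RE_PER_SYMBOL)
    return -1;
  size_t used = (size_t)nb_rx * nb_symbols * nb_re;
  assert(used <= b->rxdataF_len);
  memset(b->rxdataF, 0, used * sizeof(*b->rxdataF));
  return (int)used;
}
```

The trade-off in this sketch is memory for time: the worst-case footprint is held for the lifetime of the UE, and a single instance is only safe if one slot is processed at a time.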
CI Build: #468 | Not performing CI due to the absence of one of the following mandatory labels:
CI Build: #472 | Failed on the following stages:
I tested this manually (with address sanitizer)
See also
The fundamental flaw in this MR is that it assumes only one slot is decoded at a time. This is not true: each of the UE DL actors (4 by default) can run in parallel, and with this change they will write to the same memory in parallel. In the current design, you would have to allocate one such array per DL actor. In general, I think the actor idea is flawed and the DL decoding should execute on the threadpool (it is faster and scales better). Regardless, if you want to continue with your idea, and whether the threading model in the UE changes or stays as it is now, you need to pass a "context" structure to the main DL function that picks out the correct pre-allocated structure. Right now you would pick based on the actor index, but in the future, if the DL decoding executes on the threadpool, it may be something else. You can see the DL Actor usage if you search for
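A rough sketch of the "context" idea from this comment, assuming one pre-allocated buffer set per DL actor. Everything here is hypothetical and for illustration only: `ue_dl_pool_t`, `dl_rx_context_t`, `dl_rx_buffers_t`, `dl_rx_get_buffers`, and the `BUF_SAMPLES` size are not OAI names, and only one buffer is shown. The count of 4 actors follows the default mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 4 matches the default actor count mentioned in the review; BUF_SAMPLES is a
 * hypothetical worst-case size chosen only for this sketch. */
enum { NUM_DL_ACTORS = 4, BUF_SAMPLES = 1024 };

/* Hypothetical buffer set owned by exactly one DL actor. */
typedef struct {
  int32_t *rxdataF_comp;
} dl_rx_buffers_t;

/* Hypothetical pool living in the UE structure: one buffer set per actor. */
typedef struct {
  dl_rx_buffers_t per_actor[NUM_DL_ACTORS];
} ue_dl_pool_t;

/* Hypothetical context handed to the main DL function. Today it carries the
 * actor index; with a threadpool it could carry a different key. */
typedef struct {
  int actor_index;
} dl_rx_context_t;

void ue_dl_pool_free(ue_dl_pool_t *pool)
{
  for (int i = 0; i < NUM_DL_ACTORS; i++)
    free(pool->per_actor[i].rxdataF_comp);
  memset(pool, 0, sizeof(*pool));
}

/* Allocate every actor's buffers once at init. Returns 0 on success. */
int ue_dl_pool_init(ue_dl_pool_t *pool)
{
  memset(pool, 0, sizeof(*pool));
  for (int i = 0; i < NUM_DL_ACTORS; i++) {
    pool->per_actor[i].rxdataF_comp = calloc(BUF_SAMPLES, sizeof(int32_t));
    if (pool->per_actor[i].rxdataF_comp == NULL) {
      ue_dl_pool_free(pool);
      return -1;
    }
  }
  return 0;
}

/* The only place that maps a context to memory, so changing the threading
 * model later means changing this lookup and nothing else. */
dl_rx_buffers_t *dl_rx_get_buffers(ue_dl_pool_t *pool, const dl_rx_context_t *ctx)
{
  assert(ctx->actor_index >= 0 && ctx->actor_index < NUM_DL_ACTORS);
  return &pool->per_actor[ctx->actor_index];
}
```

Because each actor only ever touches the buffer set selected by its own context, two slots decoded in parallel no longer share memory, at the cost of `NUM_DL_ACTORS` times the worst-case footprint.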
@bpodrygajlo Thanks for the detailed feedback and for explaining the design context. This provides valuable insight. I will go through the code in more detail to better understand the current implementation and evaluate how memory pre-allocation can be structured in the context of the DL actor/thread pool architecture.
Performs memory optimization to speed up OAI UE.