We developed a multi-stage pipelined HLS accelerator whose pipeline stages are connected with hls::stream objects. The entire accelerator is then wrapped up and attached to the ARM processor in a Zynq device (Ultra96). It works correctly, but the performance is lower than we expected. We want to find the bottleneck among the pipeline stages, but we cannot find any tools for this. I have used hardware emulation (HW-emu) in SDAccel to debug pipeline bottlenecks, and it was really helpful. So I am wondering whether similar tools are available for Zynq systems in SDSoC. Any suggestions would be appreciated.