09-18-2020 03:37 AM
I noticed something strange. I have a Zynq PS with the APU cores clocked at 1 GHz. In software I have a loop that calls a number of functions, most of them nearly empty so far, which I expect to take microseconds at most.
The loop is synchronized to an interrupt so it starts every 10 ms.
There is a free-running counter for timestamping purposes incremented every ms.
For test purposes, some of the functions in the loop print some data with xil_printf, including the time counter above.
Because those functions are almost empty, I would expect them to run in microseconds, but in practice each of them seems to take 3 ms, which is basically the time needed to run the xil_printf call.
I also have a series of print-outs with printf (not xil_), and for those the counter (timestamp) is the same to the ms, so it looks like printf is faster than xil_printf.
I know (and not much more than that, tbh) that xil_printf is a light version of printf. It looks like it is slower as well. Is this something Xilinx can confirm?
09-18-2020 05:31 AM - edited 09-18-2020 05:32 AM
Update: after replacing those xil_printf calls with printf, they are just as slow, taking 3 precious ms each.
The difference in what they print is not much: the slow ones have '%s' and print text (8-12 chars) from a char * table.
Can printing text with %s take milliseconds? At 1 GHz, 1 ms is a million clock cycles. Does it really take three million cycles to do that?
09-22-2020 07:11 AM
UART output is usually pretty slow, for example 115200 baud.
Depending on how much is printed, this will take some time.
UART IPs have buffers, but if the buffer is full, the CPU core has to wait until buffer space is available.
So you might not be measuring the time to process the printf: because the UART buffers are full, you might be measuring the transmission time.
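As a rough check of that explanation, here is a small sketch of the arithmetic. It assumes 115200 baud with 8N1 framing, i.e. 10 bits on the wire per character (start + 8 data + stop). Both values are assumptions, since the actual UART settings are not given in this thread, and the function name is made up for illustration.

```c
#include <stdint.h>

/* Microseconds needed to shift n_chars characters out of a UART.
 * bits_per_char counts every bit on the wire for one character:
 * start + data + parity + stop (10 for 8N1).
 * The result is truncated to whole microseconds. */
uint64_t uart_tx_time_us(uint32_t n_chars, uint32_t baud, uint32_t bits_per_char)
{
    return ((uint64_t)n_chars * bits_per_char * 1000000u) / baud;
}
```

With these assumptions one character takes about 87 us, so a line of roughly 35 characters needs about 3 ms on the wire. That is in the range reported above once the UART buffer can no longer absorb the output.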
09-22-2020 07:38 AM
Yes, I found that they are actually equally slow because both are blocking functions: execution doesn't move on to the next instruction until the Tx buffer is empty. I learnt that the alternatives are either to create a non-blocking print pseudo-function that writes to a buffer which some interrupt processes in spare time, or to use the multicore architecture and have "another core for that chore". I prefer the second, but it will take some time to implement...
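For the first alternative, a minimal sketch of the buffer idea could look like the following. Everything here is hypothetical and not a Xilinx API: `log_write`, `log_drain`, the buffer size, and the `try_send` hook, which stands in for "put one byte into the UART Tx FIFO if there is room". `demo_try_send` is only a host-side stand-in so the code can be tried without hardware. The sketch assumes a single producer (the main loop) and a single consumer (an interrupt handler or idle task) on the same core.

```c
#include <stddef.h>
#include <stdint.h>

#define LOG_BUF_SIZE 1024u  /* power of two, so index wrap is a cheap mask */

static char log_buf[LOG_BUF_SIZE];
static volatile uint32_t log_head;  /* advanced only by the producer */
static volatile uint32_t log_tail;  /* advanced only by the consumer */

/* Queue up to len bytes without waiting. Returns the number of bytes
 * accepted; whatever does not fit is dropped instead of stalling the loop. */
size_t log_write(const char *data, size_t len)
{
    size_t n = 0;
    while (n < len) {
        uint32_t head = log_head;
        if (head - log_tail == LOG_BUF_SIZE)
            break;                                  /* buffer full */
        log_buf[head & (LOG_BUF_SIZE - 1u)] = data[n++];
        log_head = head + 1u;
    }
    return n;
}

/* Hand queued bytes to the UART for as long as try_send accepts them.
 * try_send (hypothetical hook) returns nonzero if the byte was taken,
 * zero if the Tx FIFO is currently full. Returns the number of bytes sent. */
size_t log_drain(int (*try_send)(char c))
{
    size_t n = 0;
    while (log_tail != log_head) {
        if (!try_send(log_buf[log_tail & (LOG_BUF_SIZE - 1u)]))
            break;                                  /* try again later */
        log_tail = log_tail + 1u;
        n++;
    }
    return n;
}

/* Host-side stand-in for the UART: accepts bytes until SINK_SIZE is reached. */
#define SINK_SIZE 16u
static char sink[SINK_SIZE];
static size_t sink_len;

int demo_try_send(char c)
{
    if (sink_len == SINK_SIZE)
        return 0;
    sink[sink_len++] = c;
    return 1;
}
```

The loop would format its message into a local buffer (for example with snprintf) and pass it to `log_write`, which returns immediately, while `log_drain` is called from a Tx interrupt or whenever the CPU is idle. For the "another core" variant the same structure could serve as the shared queue, but that would additionally need memory barriers and attention to cache coherency, which this sketch leaves out.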