Before deploying my design on the target platform, I ran post-synthesis functional simulation and confirmed that the design synthesised properly. I also confirmed that it met the timing requirements. However, when I deploy the design, some of the runs (about 1 in 4) fail. The same design works with a smaller number of clients.
Some background about my kernel: I interface 16 clients to the 16 AXI ports of stack 0. Each client reads from a memory location, adds one, and writes the result back to the same location, repeating this sequentially 256 times. In a failing run, one of the clients does not send any packets. If I attach an ILA to that particular port, the corresponding client no longer fails; some other client fails instead.
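To make the expected behaviour concrete, here is a minimal Python sketch of what the clients are meant to do and how a silent client shows up in the result. It is a behavioural model only, not my RTL. It assumes each client owns one memory word and performs all 256 read-modify-write iterations on that word. The names `run_clients` and `silent_clients`, and the choice of client 5 as the failing one, are made up for illustration.

```python
NUM_CLIENTS = 16   # one client per AXI port
ITERATIONS = 256   # read-modify-write iterations per client


def run_clients(memory, active=None):
    """Model every active client doing read, add one, write back, ITERATIONS times.

    Assumption: client i works on memory[i] only.
    """
    if active is None:
        active = range(NUM_CLIENTS)
    for client in active:
        for _ in range(ITERATIONS):
            value = memory[client]        # read
            memory[client] = value + 1    # add one and write back
    return memory


def silent_clients(before, after):
    """Return the clients whose memory word did not change at all."""
    return [c for c in range(NUM_CLIENTS) if after[c] == before[c]]


before = [0] * NUM_CLIENTS

# A passing run: every word ends up at its initial value plus 256.
good = run_clients(list(before))

# A failing run, modelled by leaving out one hypothetical client (5).
bad = run_clients(list(before), active=[c for c in range(NUM_CLIENTS) if c != 5])

print(silent_clients(before, good))
print(silent_clients(before, bad))
```

In a failing run on hardware, the word belonging to the silent client would keep its initial value, while the other 15 would each have increased by 256.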
This behaviour usually points to timing violations or a problem with a clock domain crossing (CDC). However, my standalone design meets the timing requirements (300 MHz), and the addition of the ILA brings the frequency down as well. There are no clock domain crossings in my design either. There are no latches and no critical warnings.
This bizarre set of circumstances stumps me. Please help!