12-21-2018 04:50 PM - edited 12-21-2018 05:09 PM
I'm using Xilinx ISE Design Suite 14.7 for timing analysis, with the constraints shown below.
NET "Clk" TNM_NET = "Clk";
TIMESPEC "TS_Clk" = PERIOD "Clk" 40 ns HIGH 50%;
OFFSET = IN 15 ns VALID 17 ns BEFORE Clk;
OFFSET = OUT 8 ns AFTER Clk;
After PAR, the timing analysis reported errors against the constraints above. Below are some example errors and the timing details for slack no. 1.
I would appreciate help and guidance, based on your experience, on how to fix the timing problem.
Thank you very much.
12-22-2018 06:01 AM
632 levels of logic is too many (a lot too many) LUTs to have between a pair of registers. Each LUT and associated routing will add some delay to the data path. You need to add more registers to your design. Break up the long combinatorial paths with some pipeline registers.
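As a rough illustration of the pipelining described above, here is a minimal Verilog sketch. The module name, the widths, and the OFFSET parameter are all made up for the example and are not taken from the original design. It splits a multiply followed by an add across two clock cycles, so each register-to-register path contains only one of the two operations:

```verilog
// Illustrative only: names, widths, and OFFSET are assumptions.
// Computes y = a*b + OFFSET with a pipeline register between the
// multiplier and the adder.
module mul_add_pipelined #(
    parameter W      = 16,
    parameter OFFSET = 5
) (
    input  wire         clk,
    input  wire [W-1:0] a,
    input  wire [W-1:0] b,
    output reg  [2*W:0] y        // one extra bit for the adder carry
);
    reg [2*W-1:0] prod_q;        // pipeline register: holds a*b

    always @(posedge clk) begin
        prod_q <= a * b;             // stage 1: multiply only
        y      <= prod_q + OFFSET;   // stage 2: add only, uses last cycle's product
    end
endmodule
```

The result appears one clock later than it would with a single output register, so whatever consumes y has to account for that extra cycle of latency.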
12-24-2018 11:49 AM
You haven't given us a lot to go on, but..
I am going to make some assumptions (forgive me if I am wrong).
I am going to bet that you are a software designer who has little or no expertise in hardware design. You noticed that Verilog/VHDL have syntax that looks similar to a software language and attempted to implement something using that syntax, and this is the result. If so, this is your problem.
Software is used to describe an algorithm - a sequence of steps to execute in order to accomplish some task.
RTL (Register Transfer Language) implemented in an HDL (Hardware Description Language - i.e. VHDL/Verilog/SystemVerilog) is used to describe an architecture - a hardware structure that can be used to accomplish a task. Even though the syntax of HDLs is similar to that of regular software languages, they are really very different.
So, if you take a description of an algorithm and wrap it in an "always @(posedge clock)" in Verilog, or a clocked process in VHDL, and then synthesize it, you get this - a huge mass of combinatorial logic that cannot run at any reasonable speed. Almost certainly this algorithm you described has some looping structure that operates iteratively generating TONS and TONS of logic.
This is not how we design in hardware. In hardware we come up with an architecture - this architecture breaks a task into operations that are done on different clocks. This is the essence of RTL - you describe the registers and the transfers between the registers using the language.
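To make the contrast concrete, here is a hedged sketch of the same task, summing eight bytes, written both ways. Module names, port names, and sizes are hypothetical. sum_unrolled is the "algorithm wrapped in a clocked block" style: synthesis unrolls the loop into adders that all sit between one pair of registers. sum_iterative describes an architecture instead: one adder, one addition per clock, and a counter that sequences the work.

```verilog
// Illustrative only: all names and sizes below are assumptions.

// Software style: the for loop is unrolled into N adders, all of
// which must settle within a single clock period.
module sum_unrolled #(
    parameter N  = 8,
    parameter W  = 8,
    parameter SW = W + 3           // sum width, enough for N = 8
) (
    input  wire           clk,
    input  wire [N*W-1:0] data,    // N values packed into one vector
    output reg  [SW-1:0]  sum
);
    integer      i;
    reg [SW-1:0] acc;              // temporary, not a register of its own

    always @(posedge clk) begin
        acc = 0;
        for (i = 0; i < N; i = i + 1)
            acc = acc + data[i*W +: W];
        sum <= acc;
    end
endmodule

// RTL style: one adder, reused on every clock. Takes about N clocks
// per result, but the path between registers is a mux and one adder.
module sum_iterative #(
    parameter N  = 8,
    parameter W  = 8,
    parameter SW = W + 3
) (
    input  wire           clk,
    input  wire           rst,     // synchronous reset
    input  wire           start,   // pulse high for one clock to begin
    input  wire [N*W-1:0] data,    // must be held stable while busy
    output reg  [SW-1:0]  sum,
    output reg            done     // high for one clock when sum is valid
);
    reg [3:0]    idx;              // wide enough to count to N-1 for N = 8
    reg [SW-1:0] acc;
    reg          busy;

    always @(posedge clk) begin
        done <= 1'b0;              // default: done is a one-clock pulse
        if (rst) begin
            busy <= 1'b0;
        end else if (!busy) begin
            if (start) begin
                busy <= 1'b1;
                idx  <= 0;
                acc  <= 0;
            end
        end else begin
            acc <= acc + data[idx*W +: W];
            idx <= idx + 1'b1;
            if (idx == N-1) begin
                sum  <= acc + data[idx*W +: W];
                done <= 1'b1;
                busy <= 1'b0;
            end
        end
    end
endmodule
```

Both modules produce the same sum. The difference is where the work is placed in time: the first does everything in one long combinatorial path, the second spreads it over several short clock cycles.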
So what you are seeing is not a "timing" problem, it is a problem in understanding what hardware is and how it works. If you want to use an FPGA to accomplish something useful, you will need to learn how to architect hardware and how to use HDL to describe that architecture.
As an alternative path, Xilinx now has Vivado HLS (High Level Synthesis) - this is a level of abstraction where you do describe an algorithm (in C) and let Vivado HLS explore different architectures and ultimately implement one of them. This flow does allow you to work at the algorithm (rather than RTL) level. However, even at this level of abstraction, it is important to understand what you are trying to do - using HLS without understanding basic hardware architectural concepts is not likely to lead to positive results...
Again, I know I made a lot of assumptions here (based on very little information) - forgive me if I am wrong. And if I am right, I hope this post helps point you in the right direction - RTL is not software!
12-25-2018 04:37 PM
I'm doing a systolic array design.
A systolic array consists of an array of PEs (Processing Elements).
My design consists of 114 PEs, which utilize 9070 slices.
If I put a register in the longest path, the behavioral simulation gives incorrect results (the algorithm's computation is wrong).
However, I've updated the timing constraints as shown below:
NET "Clk" TNM_NET = "Clk";
TIMESPEC "TS_Clk" = PERIOD "Clk" 400 ns HIGH 50%;
OFFSET = IN 360 ns VALID 380 ns BEFORE Clk;
OFFSET = OUT 220 ns AFTER Clk;
There were no errors, but the maximum frequency is 2.825 MHz.
Is the design slow?
01-02-2019 04:58 PM
Simply inserting a register on the longest path is not sufficient. If you have multiple paths, you need to add registers to all of them so that partial results are available on the same clock edge where they need to be combined. This can be done in RTL using VHDL or Verilog, or you can try HLS as suggested by @avrumw. You can also try a handshaking mechanism where each PE runs and signals when it has a result. This works well with AXI streaming interfaces. Without adding some pipeline registers, the design will not run at a high clock rate.
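The alignment requirement can be sketched for a single, hypothetical multiply-accumulate PE. The port names, the widths, and the one-bit valid flag are assumptions for illustration, not the poster's actual PE. A register is added after the multiplier, so partial_in and valid_in each get a matching register. All three then reach the adder stage on the same clock edge:

```verilog
// Illustrative only: a made-up PE computing
//   partial_out = partial_in + x * w
// with one pipeline register after the multiplier.
module pe_pipelined #(
    parameter W  = 8,              // operand width (assumed)
    parameter AW = 24              // accumulator width (assumed)
) (
    input  wire          clk,
    input  wire          rst,      // synchronous reset for the valid flags
    input  wire          valid_in, // high when x, w, partial_in are meaningful
    input  wire [W-1:0]  x,
    input  wire [W-1:0]  w,
    input  wire [AW-1:0] partial_in,
    output reg           valid_out,
    output reg  [AW-1:0] partial_out
);
    reg [2*W-1:0] prod_q;          // stage 1: registered product
    reg [AW-1:0]  partial_q;       // matching delay for partial_in
    reg           valid_q;         // matching delay for valid_in

    always @(posedge clk) begin
        // Stage 1: multiply, and delay the other inputs by the same amount.
        prod_q    <= x * w;
        partial_q <= partial_in;

        // Stage 2: combine values that entered the PE on the same clock.
        // The sum wraps if it exceeds AW bits.
        partial_out <= partial_q + prod_q;

        // The valid flag travels through the same two stages as the data.
        if (rst) begin
            valid_q   <= 1'b0;
            valid_out <= 1'b0;
        end else begin
            valid_q   <= valid_in;
            valid_out <= valid_q;
        end
    end
endmodule
```

If partial_q were left out, the adder would combine this cycle's partial_in with the previous cycle's product, which is the kind of mismatch that makes a simulation go wrong after registering only one path. The valid_out flag is a simple form of the "signal when it has a result" idea; an AXI4-Stream interface would also need a ready signal for back-pressure.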