Observer maxdz8
Registered: 01-08-2018

Wait for (dsp) combinatorial to stabilize


I am doing a behavioural simulation.

I have observed glitches in my DSP output. I'm implementing a 64-bit add, routing the carry through the cascade (in the future I think I'll route data instead).
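To make "routing the carry through the cascade" concrete, here is a minimal Python model of a 64-bit add split across two adder slices. It assumes a 48-bit low slice (the width of the DSP48 ALU) and a 16-bit high slice that receives the low slice's carry-out. The split point and the name `split_add64` are illustrative assumptions, not taken from the original design.

```python
LOW_BITS = 48                   # assumed width of the low slice (DSP48 ALU width)
HIGH_BITS = 64 - LOW_BITS       # remaining upper bits handled by the second slice
MASK_LOW = (1 << LOW_BITS) - 1
MASK_HIGH = (1 << HIGH_BITS) - 1


def split_add64(a: int, b: int) -> int:
    """Add two unsigned 64-bit values as two slices linked by a carry."""
    # First slice: add the low 48 bits and extract the carry-out.
    low_sum = (a & MASK_LOW) + (b & MASK_LOW)
    carry = low_sum >> LOW_BITS  # 0 or 1: what the cascade would forward
    # Second slice: add the upper bits plus the cascaded carry-in.
    high_sum = ((a >> LOW_BITS) + (b >> LOW_BITS) + carry) & MASK_HIGH
    return (high_sum << LOW_BITS) | (low_sum & MASK_LOW)


if __name__ == "__main__":
    a, b = 0x0000_FFFF_FFFF_FFFF, 0x0000_0000_0000_0001
    print(hex(split_add64(a, b)))  # 0x1000000000000: the carry crossed the 48-bit boundary
```

The result is exact modulo 2^64; the interesting part for timing is that the high slice cannot produce its final value until the carry from the low slice has arrived.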

 

[Screenshot: behavioural simulation waveform of the DSP output, showing glitches]

See those squares? Here is a detail.

[Screenshot: close-up of the glitches]

There are more glitches too, but I'll spare you those. I'm not even sure where they come from.

What concerns me more is that if I sample at posedge, I'll almost certainly read garbage. My first gut reaction was to just wait an extra clock at the end and be done with it.

 

However, I need to build 4 other stages on top of this, and the delay will add up. Who says one clock will be sufficient to reach a valid state?

 

I figured I needed to write a timing check, so I re-read UG903 and UG945, but I have the impression they are more concerned with I/O pins. There are some references to I/O from a module. Then I figured I could re-read SystemVerilog arc statements; maybe those would be more appropriate. Or maybe the wizard will suggest something as soon as I synthesize. In general, I see combinatorial delay being an issue and I cannot quite understand it.

 

Please give pointers. I am confused.

Accepted Solution
Historian
Registered: 01-23-2009

For the first part, ignore the glitches - they are nothing.

 

The second part is more complicated. Yes, when properly designed, it is possible to get DSP48 cells running at upwards of 500MHz. BUT, that is assuming some very specific conditions - specifically that all the internal pipeline stages are enabled - there are 3 of them (input registers [A, B, C, D], mid-point registers [M], and final product registers [P]). Furthermore, to make complex operations that use multiple DSP48 cells, there are dedicated routing channels between adjacent DSPs that enable certain common cascade paths - to get 500+MHz with multiple DSPs, you need to design an architecture that can use these paths...

 

If you are planning to go through a DSP combinatorially (which is what it seems you are describing), each DSP cell will take FAR more than 2ns (the clock period at 500MHz). The cell itself will probably take upwards of 4-5ns. Furthermore, unless you can use the dedicated cascade paths, you will pay significant routing penalties going from cell to cell.

 

All told, if you are looking at doing non-pipelined, cascaded multiplication operations at 100MHz, you probably won't be able to get more than 2 (or maybe 3) operations in one clock cycle (and maybe not even that)...
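The arithmetic behind this estimate can be sketched as a quick budget calculation. The numbers below are assumptions chosen to sit inside the rough figures above (4.0ns through an unregistered DSP, 0.5ns of clock-to-out plus setup, and either a short dedicated cascade hop of 0.2ns or a general-routing hop of 1.5ns); they are not datasheet values, and real numbers only come from static timing analysis of the placed and routed design.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def max_combinational_stages(freq_mhz: float, cell_ns: float,
                             route_ns: float, overhead_ns: float) -> int:
    """Number of unregistered cells, each followed by one routing hop,
    that fit between a launching and a capturing register in one period.

    overhead_ns lumps together clock-to-out of the launching register and
    setup of the capturing register.
    """
    budget = period_ns(freq_mhz) - overhead_ns
    per_stage = cell_ns + route_ns
    return max(0, int(budget // per_stage))


if __name__ == "__main__":
    print(period_ns(500))  # 2.0: the period a fully registered DSP48 must meet
    print(period_ns(100))  # 10.0
    # Assumed figures (see lead-in): dedicated cascade hop vs general routing.
    print(max_combinational_stages(100, 4.0, 0.2, 0.5))  # 2
    print(max_combinational_stages(100, 4.0, 1.5, 0.5))  # 1
```

With these assumed figures, two unregistered DSPs fit at 100MHz only if the hop between them is a dedicated cascade path; with general routing, only one fits.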

 

All this underscores the need to do this "properly". The DSP cells (as well as the FPGA in general) only give you really good performance if you architect your design to take advantage of the device architecture. One of the most important parts of this is proper pipelining - specifically when it comes to "big block" cells, like the DSP48 and block RAMs...
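A back-of-the-envelope throughput comparison shows why pipelining pays off even though it adds cycles of latency. The sketch assumes a datapath that accepts one input per clock in both cases: an unregistered version that closes timing at 100MHz with a 1-cycle latency, and a fully registered version with an assumed 3-cycle latency (for example A/B, M and P registers) that closes at an assumed 400MHz. Both frequencies are illustrative assumptions, not measured results.

```python
def total_time_ns(n_results: int, latency_cycles: int, freq_mhz: float) -> float:
    """Time to stream n_results through a datapath that accepts one input per clock.

    The first result appears latency_cycles clocks after its input; every later
    result follows one clock after the previous one.
    """
    if n_results < 1:
        return 0.0
    cycles = latency_cycles + (n_results - 1)
    return cycles * 1000.0 / freq_mhz


if __name__ == "__main__":
    # Assumed: unregistered at 100 MHz (1-cycle latency) vs
    # registered at 400 MHz (3-cycle latency).
    for n in (1, 1000):
        print(n, total_time_ns(n, 1, 100.0), total_time_ns(n, 3, 400.0))
```

With these assumed numbers, the registered version needs three times as many cycles of latency, but it finishes a stream of 1000 results in 2505ns instead of 10000ns.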

 

Avrum

Scholar drjohnsmith
Registered: 07-09-2009

Let's see the circuit/design.

Where is the clock?

Is this a post- or pre-synthesis simulation?

What would you predict the output of a combinational circuit to look like?

 

 

Historian
Registered: 01-23-2009

You need to spend some time learning about how synchronous digital design works.

 

The whole idea behind synchronous design is that "everything updates on the same edge of the same clock". Just at/after the edge of the clock, the entire system is in transition - transitions will propagate forward from all synchronous elements, combining and potentially recombining through the combinatorial network. However, these transitions were all started by the update of the synchronous elements, and since these change only once per clock edge, the transitions will eventually die down and the combinatorial network will reach a steady state (as long as there are no combinatorial loops, which are not allowed in synchronous design).

 

The trick in proper synchronous design is to ensure that the combinatorial system reaches a steady state before the next rising edge of the clock (actually one setup time before). This is the role of static timing analysis. It analyzes the propagation of transitions from static timing startpoints (clocked elements) to static timing endpoints (also clocked elements) to make sure that:

  - all propagations are complete before the next clock edge (setup check)

  - no propagation from a startpoint can affect an endpoint on the same clock edge (hold check)

    - this can only happen in the presence of endpoints that have a positive hold requirement and/or in designs that have clock skew

       - (all devices have some clock skew)
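The two checks can be written down for a single register-to-register path. This is a simplified textbook model, not how Vivado computes or reports paths: all delays are assumed lumped values in nanoseconds, skew is defined as capture-clock arrival minus launch-clock arrival, and a negative slack means a violation. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    period: float      # clock period (ns)
    clk_to_q: float    # launching register clock-to-output delay (ns)
    comb_max: float    # slowest combinational + routing delay (ns)
    comb_min: float    # fastest combinational + routing delay (ns)
    setup: float       # capturing register setup requirement (ns)
    hold: float        # capturing register hold requirement (ns)
    skew: float = 0.0  # capture clock arrival minus launch clock arrival (ns)


def setup_slack(p: Path) -> float:
    """Data must settle one setup time before the NEXT capture edge."""
    arrival = p.clk_to_q + p.comb_max
    required = p.period + p.skew - p.setup
    return required - arrival


def hold_slack(p: Path) -> float:
    """New data must not disturb the value captured on the SAME edge."""
    arrival = p.clk_to_q + p.comb_min
    required = p.skew + p.hold
    return arrival - required


if __name__ == "__main__":
    path = Path(period=10.0, clk_to_q=0.5, comb_max=8.0, comb_min=1.0,
                setup=0.5, hold=0.25)
    print(setup_slack(path), hold_slack(path))  # 1.0 1.25
```

As the list says, hold slack can only go negative when the capturing register has a positive hold requirement and/or the capture clock arrives late (positive skew), since clock-to-out plus the fastest path is always positive.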

 

Synthesis, placement and routing are all timing aware. With proper constraints, they not only check that these are true, but optimize the design in order to make these true (if possible).

 

So "glitches" are normal - they are not an indication of any problem.

 

The only thing that is confusing is that you are seeing them in behavioral simulations. In behavioral simulations there is no modelling of timing through combinatorial cells. As a result, this "propagation" through the combinatorial network is instantaneous and looks like it all happens at the rising edge of the clock. If you do post-synthesis or post-place and route timing simulations, you will see these transitions throughout the clock period - if timing is tight, they will continue until "right up before the next clock edge"; again, this is all normal and correct - as long as static timing analysis says there are no setup or hold violations, your design is fine.
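A small transport-delay model shows why such transitions are harmless as long as the result settles before the capturing edge. It models only the upper slice of a split 64-bit add (assumed 48-bit low slice and 16-bit high slice, as in a carry cascade): the high slice first reacts to its new operands with the stale carry, then again once the new carry arrives from the low slice. The 4.0ns per-slice delays, the 10ns period and all function names are assumptions for illustration.

```python
LOW_BITS, HIGH_BITS = 48, 16
MASK_LOW = (1 << LOW_BITS) - 1
MASK_HIGH = (1 << HIGH_BITS) - 1


def carry_out(a: int, b: int) -> int:
    """Carry produced by the low slice."""
    return ((a & MASK_LOW) + (b & MASK_LOW)) >> LOW_BITS


def high_slice(a: int, b: int, carry: int) -> int:
    """Upper 16 bits of the sum, given whatever carry the high slice currently sees."""
    return ((a >> LOW_BITS) + (b >> LOW_BITS) + carry) & MASK_HIGH


def high_waveform(old, new, d_low, d_high):
    """(time_ns, value) events on the high-slice output after inputs switch at t=0.

    Transport-delay model: the output at time t reflects the slice's inputs at
    t - d_high, and the carry into the high slice changes at t = d_low.
    """
    (a0, b0), (a1, b1) = old, new
    c0, c1 = carry_out(a0, b0), carry_out(a1, b1)
    return [(0.0, high_slice(a0, b0, c0)),              # previous, settled value
            (d_high, high_slice(a1, b1, c0)),           # new operands, stale carry
            (d_low + d_high, high_slice(a1, b1, c1))]   # new carry has arrived


def sample(events, t):
    """Value seen by a register sampling at time t (last event at or before t)."""
    value = events[0][1]
    for when, v in events:
        if when <= t:
            value = v
    return value


if __name__ == "__main__":
    old = (0, 0)
    new = ((1 << LOW_BITS) | MASK_LOW, 1)  # low half overflows: carry goes 0 -> 1
    wave = high_waveform(old, new, d_low=4.0, d_high=4.0)
    print(wave)               # [(0.0, 0), (4.0, 1), (8.0, 2)]: the 1 is a glitch
    print(sample(wave, 6.0))  # 1: sampled mid-period, wrong
    print(sample(wave, 10.0)) # 2: sampled at the next edge, correct
```

The intermediate value is exactly the kind of "square" a timing simulation shows; it only matters if it is still present one setup time before the capturing edge, which is what the setup check rules out.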

 

If you are seeing some transitions in behavioral simulations, then some cells are introducing some intra-clock timing delays. In the past, the models for some Xilinx cells have had small propagation delays (usually 100ps), or these delays may be coming from elsewhere (like, for example, your testbench). So while it is unusual to see these transitions in behavioral simulations, they are not a problem...

 

Avrum

Observer maxdz8
Registered: 01-08-2018

Let me check if I understand you correctly.

 

What you are saying is that I am thinking about it the wrong way around. There's an implicit check that everything settles before the next active edge, so if there is a problem I'll be warned at a later stage. This cannot be relaxed.

 

That's fine, but I am going to write logic around this module. If the outer logic is 'wait a clock', I'll need to rewrite it to wait two clocks if I add a P register to meet timing. That seems like a lot of work to do without even a rule of thumb.
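One common way to avoid rewriting the surrounding logic every time a register is added is to carry a "valid" flag alongside the data through a shift register whose length equals the datapath latency, so the outer logic waits for the flag instead of counting clocks. The Python model below is a cycle-level sketch of that idea; `ValidPipe` and its `latency` parameter are hypothetical names, and in HDL the equivalent would be a parameterized shift register placed next to the DSP.

```python
from collections import deque


class ValidPipe:
    """Cycle-level model of a valid flag delayed by a fixed pipeline latency."""

    def __init__(self, latency: int):
        if latency < 1:
            raise ValueError("latency must be at least one clock")
        # One entry per register stage; all stages start empty (not valid).
        self.stages = deque([False] * latency, maxlen=latency)

    def clock(self, valid_in: bool) -> bool:
        """Advance one clock; return the valid_in passed `latency` calls earlier."""
        valid_out = self.stages[-1]
        self.stages.appendleft(valid_in)  # maxlen drops the oldest (rightmost) entry
        return valid_out


if __name__ == "__main__":
    pipe = ValidPipe(latency=3)
    print([pipe.clock(v) for v in [True, False, False, False, False]])
    # [False, False, False, True, False]
```

Adding a P register then means changing a single latency parameter (for example from 3 to 4) rather than editing every piece of logic that waits on the result.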

 

So my rule of thumb was this: I read that DSPs can hit over 500MHz and I target 100MHz, so I'd expect to be able to traverse a few of them before timing gets tight. So yes, I find those quirks super odd, but they made me think about the bigger picture and about things I am assuming which might well be false. I know there is some correlation between delay and congestion, but is this truly so chaotic that we cannot even make a ballpark estimate? I'd rather write something which makes at least half-sense.

 

In the above pictures, the posedge of clk is on the gray lines. I attach a minimal example to reproduce the quirks. Not that they are the real problem, after all.

Scholar drjohnsmith
Registered: 07-09-2009

 

 

Just to state: this is not the problem you are seeing here, but it might be worth keeping in mind later.

Regarding your comment that the DSP can meet 500 MHz: that is true, but it assumes you are using the registers in the DSP. If you don't, it is much slower.

 

 

 

Observer maxdz8
Registered: 01-08-2018

I guess I'm asking for pie in the sky, then. I will adjust to more realistic estimates (thank you for sharing) and then hit implementation with something hopefully more sane.
