foremans
Observer
1,594 Views
Registered: ‎05-09-2018

Efficient use of DSPs for systolic FIRs

I'm designing a systolic FIR in HLS, but Vivado doesn't seem to want to use the DSP ACIN/ACOUT cascade for the sample shift register.

 

What I expect for an N-tap FIR with constant coefficients <18b:

* Samples are <28b

* Coefficients are <18b and constant

* N DSPs (1 per tap)

* First & last DSPs with latency 4 (A1, A2, D, AD, P regs; no M)

* All other DSPs with latency 5 (A1, A2, D, AD, M, P regs)

* Samples feed into A of the first DSP and propagate through A1, A2, and ACOUT to the next DSP

* For symmetric cases, the 2nd sample feeds into D from an external SRL

* Coefficients in B
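
The register structure in the list above can be sketched as a plain C++ behavioral model. This is only an illustration for checking data flow: `SystolicFir` and `impulse_response` are hypothetical names, the AD/M registers and the D-port pre-adder are left out, and nothing here implies that Vivado HLS will map the arrays onto the ACIN/ACOUT cascade.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Behavioral model only (hypothetical names). Each tap has two sample
// registers (A1, A2) and one accumulator register (P), as in the list above.
template <std::size_t N>
class SystolicFir {
public:
    explicit SystolicFir(const std::array<int32_t, N>& coeff) : coeff_(coeff) {}

    // Advance one clock: shift in sample x and return the last tap's P register.
    int64_t step(int32_t x) {
        // Walk from the last tap to the first so that every read sees the
        // previous-cycle value, the way real registers would.
        for (std::size_t k = N; k-- > 0;) {
            const int64_t p_in = (k > 0) ? p_[k - 1] : 0;    // models PCIN
            p_[k] = p_in + int64_t(coeff_[k]) * a2_[k];      // P register
            a2_[k] = a1_[k];                                 // A2 register
            a1_[k] = (k > 0) ? a2_[k - 1] : x;               // A1, fed by A or ACIN
        }
        return p_[N - 1];
    }

private:
    std::array<int32_t, N> coeff_;
    std::array<int32_t, N> a1_{};
    std::array<int32_t, N> a2_{};
    std::array<int64_t, N> p_{};
};

// Feed a unit impulse followed by zeros and collect `cycles` outputs.
template <std::size_t N>
std::vector<int64_t> impulse_response(const std::array<int32_t, N>& coeff,
                                      std::size_t cycles) {
    SystolicFir<N> fir(coeff);
    std::vector<int64_t> out;
    for (std::size_t t = 0; t < cycles; ++t) {
        out.push_back(fir.step(t == 0 ? 1 : 0));
    }
    return out;
}
```

Because the samples advance two registers per tap while the partial sums advance one, the impulse response comes out as the coefficients in order after a fixed fill delay. That is the property the cascade is supposed to provide in hardware.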

 

This should result in a latency of N*5-2 (two end DSPs at 4 cycles each plus N-2 middle DSPs at 5 cycles each) and II=1, using N DSPs and a fairly small number of LUTs/FFs.
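
As a quick check of that arithmetic, here is a minimal sketch that adds up the per-DSP latencies listed above. `expected_latency` is a hypothetical helper, not part of any Xilinx library, and it assumes at least two taps so that the first and last DSPs are distinct.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: total latency implied by the per-DSP figures in the post.
inline std::size_t expected_latency(std::size_t taps) {
    assert(taps >= 2);                  // assumes distinct first and last DSPs
    const std::size_t end_latency = 4;  // first and last DSP: no M register
    const std::size_t mid_latency = 5;  // all other DSPs: M register enabled
    return 2 * end_latency + (taps - 2) * mid_latency;  // equals 5 * taps - 2
}
```

For the 64-tap case this gives 2*4 + 62*5 = 318 cycles.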

 

For a 64-tap symmetric filter, I see the correct number of DSPs but nearly 50k FFs instead of the ~1k that I expect.
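
For reference, the symmetric folding that the D-port pre-adder is meant to perform can be checked in plain C++. This is a sketch under assumptions: `fir_direct` and `fir_folded` are hypothetical names, `window[k]` is taken to hold x[n-k], the window is at least as long as the coefficient list, and the coefficients are symmetric (c[k] == c[N-1-k]). It says nothing about how HLS will map the arithmetic.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Direct form: one multiply per tap. window[k] holds x[n-k].
int64_t fir_direct(const std::vector<int32_t>& coeff,
                   const std::vector<int32_t>& window) {
    int64_t acc = 0;
    for (std::size_t k = 0; k < coeff.size(); ++k) {
        acc += int64_t(coeff[k]) * window[k];
    }
    return acc;
}

// Folded form: add the two samples that share a coefficient first
// (the pre-adder step), then multiply once. Assumes symmetric coefficients.
int64_t fir_folded(const std::vector<int32_t>& coeff,
                   const std::vector<int32_t>& window) {
    const std::size_t n = coeff.size();
    int64_t acc = 0;
    for (std::size_t k = 0; k < n / 2; ++k) {
        const int64_t pre_add = int64_t(window[k]) + window[n - 1 - k];
        acc += int64_t(coeff[k]) * pre_add;
    }
    if (n % 2 == 1) {
        acc += int64_t(coeff[n / 2]) * window[n / 2];  // unpaired centre tap
    }
    return acc;
}
```

Both functions return the same value whenever the coefficients are symmetric, which makes the folded form a convenient reference when checking a pre-adder implementation.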

 

I found a few files that seem to be related:

  <VIVADO>/2018.2/common/technology/xilinx/common/dsp48e2.json

  <VIVADO>/2018.2/common/technology/autopilot/etc/dsp48e2_builtins.h

  <VIVADO>/2018.2/include/dsp48e2_builtins.h

 

These seem to define the various functions the DSP can synthesize to, but none include using any of the carry paths. I tried modifying these files to include the 3 new functions (fir_first, fir_mid, and fir_last) and calling _ssdm_op_DSP directly, but I get the following error:

 

ERROR: [TECH 200-102] Failed to evaluate 'dsp48_macro_latency_lookup DSP_Macro dsp': key "ACIN" not known in dictionary in platform 'DefaultPlatform'.

 

This makes some sense, given that none of the other builtins use the ACIN port, but it did exist in the dsp48e2.json file.

 

DefaultPlatform seems to be defined by the set of files in <VIVADO>/2018.2/common/config/CoreList.xml, which includes DSP_Macro but doesn't provide any clues beyond that.

 

 

Has anyone else attempted to dig this far?

DefaultPlatform does seem to be data-driven. Any advice on where to/how to add the missing keys?

 

7 Replies
drjohnsmith
Teacher
1,577 Views
Registered: ‎07-09-2009

Can't open the file on my phone,

 

but are you using the FIR Compiler?

 

https://www.xilinx.com/products/intellectual-property/fir_compiler.html

 

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>
foremans
Observer
1,571 Views
Registered: ‎05-09-2018

I tried that first, but the FIR compiler can't do fully-pipelined (II=1) super-sample-rate (4x and 8x samples per clock) FIRs from HLS, which is ultimately what I need to make.

 

I'm just starting with a simpler II=1, 1 sample/clk case

drjohnsmith
Teacher
1,532 Views
Registered: ‎07-09-2009

You say the FIR Compiler can't do what you want?

 

I wonder why it can't?

   

Is there something about the interleaving / pipelining that means it can't route as you expect?

    Have you drawn out how you would expect the DSP blocks to be implemented and connected?

 

If you can provide that, we can see if it is actually possible.

   

foremans
Observer
1,527 Views
Registered: ‎05-09-2018

Yes, in fact the diagram is on page 66 of the DSP48E2 user guide (on mobile right now, so sorry for no direct link), and the RTL FIR Compiler can generate the correct DSPs.

The pipeline stages before and after are both HLS, so it would be preferable to have the FIR in HLS as well.
drjohnsmith
Teacher
1,511 Views
Registered: ‎07-09-2009

Sorry, I can't see anything in the diagram you refer to about super sample rate, i.e. 4 or 8 samples per clock.

 

Edit: just had a thought,

   are the tools being clever?

 

in that they are routing the design to meet your timing constraints and stopping.

      so if the device has space, this is what they have ended up with.

 

This is from a while back,

https://www.xilinx.com/support/answers/60913.html

 

it might be a way of forcing the DSP usage.

foremans
Observer
1,501 Views
Registered: ‎05-09-2018

It's actually failing to meet timing as is.
And I'm not doing super sample rate yet, just exactly what that diagram depicts.
drjohnsmith
Teacher
1,468 Views
Registered: ‎07-09-2009

If you know exactly what you want,

 

instantiate the DSPs as you want them,

 

if you want to encourage Vivado to use DSPs, use the last link,

 

if you want to meet timing / layout, then let the tools run.

   The tools work harder as they work longer,

         so it's probably just that the tools meet your requirements doing things simply and stop,

 

You see this a lot in these big chips, 

     one can add lots to a design, and the usage seems not to change, 

 

Personally I'd either not worry, and just make note,

    or probably, instantiate and get exactly what I want ,

        and keep this for future use.

 
