Registered: 01-31-2018

DSP48E2 implementation issues



I am using the DSP48E2 (instantiated from the Language Templates) to replace logic that was previously implemented in LUTs/FFs.

The DSP48E2 version of the logic works in simulation (behavioral and post-synthesis timing). However, when running with the rest of the system on the real FPGA, it does not produce correct results.


When I move just the DSP part into a test framework consisting of only the DSP logic and a MicroBlaze core connected to it, it works flawlessly on the real FPGA for any generated test data.


Now for the surprising bit.

During development involving real-FPGA testing I generally lower the clock frequency to reduce implementation/layout times. I tried enabling the DSP replacement logic again (thinking that perhaps I had something else wrong earlier), and it worked flawlessly and generated correct results. Then, when I raised the frequency back to the intended value, it failed again. The intended frequency is 300 MHz; I had lowered it to 150 MHz.

Timing is met with margin in both the 150 MHz and the 300 MHz case. But I was surprised that introducing the DSP cores into the chain did not seem to make it harder for the implementation/layout step to meet timing. This contradicts my experience when using the DSP via (* use_dsp = "yes" *) for multiply/add operations.

So now for my question. 

Do I need to manually tell the tool something about timing through the DSP48E2 block when I instantiate the DSP48E2 arithmetic primitive directly in Verilog (copied from the Language Templates)? And if so, where do I find information on this?

Or are there any other tips that could help me figure out what is wrong, and why it stops working when I increase the clock frequency (even though timing is still met and no timing exceptions are given in the constraints file)?

I am targeting a KU5P (KCU116 dev board).


Side note: my testing with the DSP logic connected to the MicroBlaze core was done at even lower frequencies.




-- Robin

Registered: 01-31-2018

Additional info: while debugging this further, I tried to get the tool to give me timing information on any data path through the DSP48E2 block, but I am failing to get much at all out of the implemented design...


If I try to get a timing report from the implemented design on any of the paths through the DSP48E2 block (A:B and C in, out through P), I get "Timing results are empty" back.

If I try to trace through it by taking one of the outputs and using "Expand Cone", it fails to trace through the DSP48E2 block and just stops at the DSP_OUTPUT_INST instance (but no register stages are enabled in the DSP48E2 block, so it should have traced through it).

If I select one of the flip-flops driving into the DSP48E2, the timing stops at the DSP48E2 instance (it should go through it, since no register is enabled). The timing is of course great, as it seems to allot zero time to the DSP.

Similarly, if I ask for a timing report from a source flip-flop to the end destination (this path goes through the DSP48 block), the block does not show up in the timing:

report_timing -from [get_pins {hwp/l2v3p/spA/Spunge_Multiple_A/inst/genblk1[0].spdv2/lsp/blake2b/extra_out_reg_reg[509]/C}] -to [get_pins {hwp/l2v3p/spA/Sponge_A_outbuff_fifo/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.mem/gbm.gbmg.gbmgb.ngecc.bmg/inst_blk_mem_gen/gnbram.gnativebmg.native_blk_mem_gen/valid.cstr/ramloop[7].ram.r/prim_noinit.ram/DEVICE_8SERIES.NO_BMM_INFO.SDP.WIDE_PRIM36_NO_ECC.ram/DINBDIN[5]}] -delay_type min_max -max_paths 1000 -sort_by group -input_pins -routable_nets -name timing_3


Data Path					
Delay Type	Incr (ns)	Path (ns)	Location	Netlist Resource(s)	Partition
FDRE (Prop_FFF_SLICEL_C_Q)	(r) 0.079	3.289	Site: SLICE_X68Y179	hwp/l2v3p/spA/Spunge_Multiple_A/inst/genblk1[0].spdv2/lsp/blake2b/extra_out_reg_reg[509]/Q	
net (fo=5, routed)	0.478	3.767		hwp/l2v3p/spA/Spunge_Multiple_A/inst/genblk1[0].spdv2/lsp/blake2b/extra_out_reg_reg_n_0_[509]	
			Site: SLICE_X62Y164	hwp/l2v3p/spA/Spunge_Multiple_A/inst/genblk1[0].spdv2/lsp/blake2b/out_S_stream[509]_INST_0_i_1/I0	
LUT2 (Prop_F5LUT_SLICEM_I0_O)	(r) 0.074	3.841	Site: SLICE_X62Y164	hwp/l2v3p/spA/Spunge_Multiple_A/inst/genblk1[0].spdv2/lsp/blake2b/out_S_stream[509]_INST_0_i_1/O	
net (fo=1, routed)	0.281	4.122		hwp/l2v3p/spA/Spunge_Multiple_A/inst/genblk1[0].spdv2/lsp/blake2b/p_7_out[509]	
			Site: SLICE_X62Y171	hwp/l2v3p/spA/Spunge_Multiple_A/inst/genblk1[0].spdv2/lsp/blake2b/out_S_stream[509]_INST_0/I1	
LUT6 (Prop_G6LUT_SLICEM_I1_O)	(r) 0.150	4.272	Site: SLICE_X62Y171	hwp/l2v3p/spA/Spunge_Multiple_A/inst/genblk1[0].spdv2/lsp/blake2b/out_S_stream[509]_INST_0/O	
net (fo=1, routed)	0.292	4.564		hwp/l2v3p/spA/Sponge_A_outbuff_fifo/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.mem/gbm.gbmg.gbmgb.ngecc.bmg/inst_blk_mem_gen/gnbram.gnativebmg.native_blk_mem_gen/valid.cstr/ramloop[7].ram.r/prim_noinit.ram/din[41]	
RAMB36E2			Site: RAMB36_X5Y33	hwp/l2v3p/spA/Sponge_A_outbuff_fifo/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.mem/gbm.gbmg.gbmgb.ngecc.bmg/inst_blk_mem_gen/gnbram.gnativebmg.native_blk_mem_gen/valid.cstr/ramloop[7].ram.r/prim_noinit.ram/DEVICE_8SERIES.NO_BMM_INFO.SDP.WIDE_PRIM36_NO_ECC.ram/DINBDIN[5]	
Arrival Time		4.564			


So I understand why the tool had such an easy time meeting timing, and why it worked when I clocked it really low.
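A rough back-of-the-envelope reading of the report excerpt above illustrates the point. Taking the clock arrival at the source flip-flop as its Q path value minus the clock-to-Q increment, the data path that the tool does count (no DSP delay included) is about 1.354 ns. Ignoring setup time, clock skew and uncertainty, none of which appear in the excerpt, the headroom left for the DSP delay that the report does not count would be roughly:

```latex
\begin{aligned}
t_{\text{clk}} &= 3.289 - 0.079 = 3.210\ \text{ns} \\
d_{\text{path}} &= 4.564 - 3.210 = 1.354\ \text{ns} \\
T_{300} &= \frac{1}{300\ \text{MHz}} \approx 3.333\ \text{ns}, \qquad
T_{150} = \frac{1}{150\ \text{MHz}} \approx 6.667\ \text{ns} \\
T_{300} - d_{\text{path}} &\approx 1.979\ \text{ns}, \qquad
T_{150} - d_{\text{path}} \approx 5.313\ \text{ns}
\end{aligned}
```

So an uncounted DSP combinational delay of a few nanoseconds could plausibly fit at 150 MHz yet exceed the budget at 300 MHz. The real DSP delay is not visible in this report, so this is only an illustration of the reasoning, not a measurement.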


So, any idea what is going on here? Why does all timing on data paths through the DSP48E2 blocks just ignore the DSP block? Do I need to manually add some timing information to the DSP48E2 instance declaration copied from the Language Templates?

I cannot find any documentation indicating that this is the case.


Any ideas would be appreciated.


Thanks in advance

-- Robin

Registered: 08-16-2018

@mojs @vkanchan @yashp 


The design should work fine in both cases. Is it possible for you to check it on another board as well?
Also, can you please provide the timing reports so we can investigate the issue further?


Meher Krishna Patel, PhD
Senior Product Application Engineer, Xilinx

Registered: 01-16-2013


Please try Meher's suggestion to narrow this down further from the DSP use-case point of view.

Regarding timing: the DSP block is a valid timing startpoint and endpoint. The assumption that the DSP block will be treated purely as a combinational "through" element is incorrect.

I can see from your last post that you are not able to get timing results for any path that starts or ends at the DSP. Can you please open the implemented design, generate a utilization report, and check whether the DSP is inferred? If so, search for that DSP block and trace back to the source register associated with it (treating the DSP as the endpoint). Then run the command below:

report_timing -from [get_cells <start reg name>] -to [get_cells <dsp name>] -delay_type min_max -max_paths 10 -name test -file test_timing.rpt

If valid results are generated, please share test_timing.rpt with us.
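The lookup described above (find the DSP, trace back to its source registers, time the paths into it) can be sketched in the Vivado Tcl console with the implemented design open. This is a sketch under assumptions: it assumes the cell's REF_NAME is DSP48E2 and that the pin property REF_PIN_NAME can be used to skip the clock pin; picking index 0 is arbitrary.

```tcl
# Sketch for the Vivado Tcl console (implemented design open).
# Assumption: the instantiated primitive reports REF_NAME == DSP48E2.
set dsps [get_cells -hierarchical -filter {REF_NAME == DSP48E2}]
puts "DSP48E2 cells found: [llength $dsps]"

# Pick one DSP to examine (index 0 is an arbitrary choice).
set dsp [lindex $dsps 0]

# Input pins of that DSP, excluding its clock pin.
set dsp_in [get_pins -of_objects $dsp -filter {DIRECTION == IN && REF_PIN_NAME != CLK}]

# Sequential cells (timing startpoints) that feed those inputs.
set src_cells [all_fanin -startpoints_only -only_cells -flat $dsp_in]

# Time the paths from those startpoints into the DSP.
report_timing -from $src_cells -to $dsp -delay_type min_max \
    -max_paths 10 -name test -file test_timing.rpt
```

If the DSP has all of its internal registers disabled, it may contain no timing endpoints of its own, so an empty or unconstrained result from this report is worth comparing against a report that passes through the DSP to the downstream register.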



Registered: 01-31-2018

Hi Yash, 

I can get timing going into the DSP block, but the DSP block has all internal registers turned off (all .xxxREG parameters set to 0 in the macro), so this confuses me a bit. The tool also reports that the slack and requirement into the DSP are "infinite"...

report_timing -to [get_cells design_1_i/dsp_32bmux_testing_0/inst/mmm/DSP48E2_inst ] -delay_type min_max -max_paths 1000 -name test -file d:\\test_timing.rpt

The tool also does not find any paths leading out of the DSP (switching -to to -from in the command above).

However, a signal path going through the DSP shows no DSP timing, as I pointed out above.

If I try to get timing to the downstream FDRE (the D input pin), only the reset (R pin) path shows up in the timing, not the output from the DSP.

report_timing -to [get_cells {design_1_i/iomodule_0/U0/IOModule_Core_I1/GPI_I1/Using_GPI.GPI_In_reg[0]}] -delay_type min_max -max_paths 1000 -name test

Meanwhile, the attached picture (schematic section) clearly shows that the D pin is connected to the DSP.
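One way to separate the D-pin path from the reset path is to target the D pin directly, and to require the reported path to pass through the DSP output pins. A sketch, assuming the instance names quoted in this thread are the real ones in the implemented design; the braces keep Tcl from treating "[0]" as command substitution.

```tcl
# Instance names are the ones quoted in this thread; adjust to your design.
set dsp   [get_cells design_1_i/dsp_32bmux_testing_0/inst/mmm/DSP48E2_inst]
set dst_d [get_pins {design_1_i/iomodule_0/U0/IOModule_Core_I1/GPI_I1/Using_GPI.GPI_In_reg[0]/D}]

# Target the D pin only, so the reset (R) path does not show up.
report_timing -to $dst_d -delay_type min_max -max_paths 10 -name to_d

# Require the path to go through one of the DSP output pins.
report_timing -through [get_pins -of_objects $dsp -filter {DIRECTION == OUT}] \
    -to $dst_d -delay_type min_max -max_paths 10 -name thru_dsp
```

If these come back empty or show an unconstrained path with infinite slack, the path through the DSP is most likely not being timed against any clock at all, which would match what the reports above suggest.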



(Side note: your forum does not allow .rpt files as attachments.)


Registered: 05-31-2017

Hi @mojs,

I have gone through your attached timing report, and I observe that the destination paths are not associated with any clocks. It seems that there are no clocks for the DSP, which is why the Vivado timing engine shows the slack as infinite.
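To follow up on the missing-clock observation, a few standard Vivado Tcl commands show which clocks are defined and whether the registers around the DSP are clocked. A sketch; the instance names are the ones quoted earlier in this thread and should be adjusted to your own design.

```tcl
# Report timing-constraint problems, such as endpoints without a clock
# and unconstrained internal endpoints.
check_timing -verbose -file check_timing.rpt

# Clocks defined in the design and the clock networks they drive.
report_clocks
report_clock_networks

# Which clock (if any) reaches the capture register and the DSP CLK pin?
get_clocks -of_objects [get_pins {design_1_i/iomodule_0/U0/IOModule_Core_I1/GPI_I1/Using_GPI.GPI_In_reg[0]/C}]
get_clocks -of_objects [get_pins design_1_i/dsp_32bmux_testing_0/inst/mmm/DSP48E2_inst/CLK]
```

With all DSP registers disabled, the DSP CLK pin carrying no clock may not matter by itself; what decides whether the path through the DSP is timed is whether the launching and capturing registers on either side are clocked by constrained clocks.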
