hsuh6
Visitor
625 Views
Registered: ‎04-28-2020

Do DSP slices on the ZCU102 have an initial delay during power-on reset?


Hi, 

 

While working on a DSP design that uses the DSP48E2 primitive, I ran into a problem: the DSP module ignores the first few inputs.

Is this normal? My first guess is that the DSP is still in a long reset sequence while it ignores my inputs.

 

This design is intended to perform a MACC operation: it multiplies two inputs and accumulates the products over multiple cycles.

DSP question.PNG

As you can see above, I am giving two inputs, "ax" and "ay". So the result will be

rslt_a = ax * ay.

The inputs are:

ax = 0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666.

ay = 0x7f (held constant for every cycle)

However, the first output I get is "0x32cc9a", which is calculated from the inputs "0x6666" and "0x7f".

Why are the first 5 inputs ignored? Can someone help me with this?

 

Thank you

Best,

Hsuh


Accepted Solutions
derekh
Xilinx Employee
492 Views
Registered: ‎08-06-2018

Hi @hsuh6 

A more detailed description of the DSP48E2 can be found in UG579. Please use Xilinx DocNav to find the documents relevant to your tool release.

Note that the DSP slice uses several pipeline buffers to achieve maximum performance. You can find the locations of these buffers in Figure 1-1 in UG579.
Looking at your simulation, it seems that you are using an asynchronous reset whose synchronized release coincides with your first sample. I would wait a clock cycle after releasing reset before enabling the data.
You should also expect the output to be delayed because of the pipeline buffers.
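
To make the two effects above concrete, here is a minimal Python sketch of a multiplier followed by a chain of pipeline registers. It is a toy cycle-level model, not the DSP48E2 itself: the function name `pipelined_multiply`, the stage count of 3, and the rule that samples presented during reset are simply dropped are all assumptions chosen for illustration.

```python
from collections import deque

def pipelined_multiply(a_samples, b_samples, stages=3, reset_cycles=0):
    """Toy model: a multiplier followed by `stages` pipeline registers.

    Returns the value seen at the output on each cycle (None = no valid
    data yet). Inputs presented during the first `reset_cycles` cycles
    are dropped, mimicking registers that are still held in reset.
    """
    # pipe[0] is the newest register, pipe[-1] drives the output.
    pipe = deque([None] * stages, maxlen=stages)
    outputs = []
    for cycle, (a, b) in enumerate(zip(a_samples, b_samples)):
        outputs.append(pipe[-1])  # output reflects an older input
        accepted = cycle >= reset_cycles
        # appendleft on a full deque discards the oldest (rightmost) entry
        pipe.appendleft(a * b if accepted else None)
    return outputs

ax = [0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666]
ay = [0x7F] * len(ax)

no_reset = pipelined_multiply(ax, ay, stages=3)
one_cycle_reset = pipelined_multiply(ax, ay, stages=3, reset_cycles=1)
print(no_reset)
print(one_cycle_reset)
```

In this model the first product appears `stages` cycles after its inputs, and a sample that arrives while reset is still active never appears at all, which is why delaying the data by a cycle after reset release helps.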

As markg@prosensing.com mentions, instantiating the primitives by hand is very cumbersome and is generally done only if you need direct access to advanced features. Easier options are to use the macros or to let the synthesis tools infer the DSP slices.
To give the synthesis tools a nudge in the right direction, register the signals in your HDL according to the locations of the pipeline buffers, using the same bit widths (see Figure 2-12 in UG579).

UG579, Chapter 3 describes the various design entry options for the DSP48E2.

Derek
SAE DSP and AI Engine, Xilinx Sweden/EMEA

601 Views
Registered: ‎01-22-2015

@hsuh6 

Your simulation outputs are showing other strange behavior too.  Note that the outputs can be described as:

x7F * x6666 = x32CC9A

+ x7F * x1111 = x3B4409

+ x7F * x6666 = x6E10A3

+ x7F * x1111 = x768812

+ x7F * x6666 = xA954AC
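
The running totals listed above can be checked with a few lines of Python. This is only an arithmetic check of the pattern described here (products of x7F with x6666 and x1111 alternating, accumulated in a single sum); it says nothing about how the DSP48E2 was actually configured, and the variable names are made up for the example.

```python
from itertools import accumulate, cycle, islice

AY = 0x7F
# The five 'ax' values implied by the list above: x6666 and x1111 alternating.
ax_seen = list(islice(cycle([0x6666, 0x1111]), 5))

# Running sum of the products, one entry per output sample.
running = list(accumulate(a * AY for a in ax_seen))
print([hex(v) for v in running])
# ['0x32cc9a', '0x3b4409', '0x6e10a3', '0x768812', '0xa954ac']
```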

I assume you are using the DSP48E2 primitive described on page 251 of UG974 (v2020.1). This primitive is complicated and difficult to set up correctly.

You might find that the DSP48 Macro IP, described in PG148, is an easier way to set up and use the DSP48E2 for a MACC operation.

Cheers,
Mark

hsuh6
Visitor
272 Views
Registered: ‎04-28-2020

Thank you for your reply, markg.

You were right. I hadn't understood how the internal registers in the DSP are configured.

Below is the configuration that I have. The other registers (the A/B/D input registers) are omitted for simplicity.

hsuh6_1-1610131542064.png

I have two pipeline registers in the loop feedback.

So the first output is the result of 0x6666 * 0x7F = 0x32CC9A,

and the second output belongs to a separate accumulation that also starts from "0".

Therefore, the second output is the result of 0x7F * 0x7777.

 

After that, the two sets of accumulated results come out alternately:

set 1: 0x7F * 0x6666 + 0x7F * 0x7777 + 0x7F * 0x7777 ...

set 2: 0x7F * 0x7777 + 0x7F * 0x7777 + 0x7F * 0x7777 ... and so on.

 

I hadn't been paying close attention to my outputs, but you saved me there. Thanks!

 

hsuh6
Visitor
267 Views
Registered: ‎04-28-2020

Thank you for your reply, derekh.

 

Unfortunately, I had to use the DSP48E2 primitive because my application requires parameterized instantiation of the computing elements.

But your comment was helpful. I delayed my inputs by extra cycles, and now I am getting the right output, with the accumulation starting from 0x7F * 0x1111.

 

To anyone who visits this page later:

I don't think using the DSP48E2 is cumbersome, but you do need to read the UG579 manual carefully.

Also, Figure 1-1 and Figure 2-1 helped me a lot in understanding what's inside the DSP module.

 

On top of that, you need to pay attention to how many registers you are using, because the number of pipeline registers placed in the feedback loop determines the number of interleaved accumulation results.

In my case, for example, I had two pipeline registers in the loop, and that's why I was getting two sets of accumulation results.
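
The effect of the loop registers can be sketched with a small Python model. It assumes an idealized accumulator in which each output equals the current product plus the output from `loop_regs` cycles earlier, with the loop registers starting at 0; the function name `accumulate_with_loop_regs` is made up for this example, and the model ignores the input and multiplier pipeline stages of the real DSP48E2.

```python
def accumulate_with_loop_regs(products, loop_regs=2):
    """Idealized accumulator: out[n] = products[n] + out[n - loop_regs].

    `loop_regs` is the number of registers in the feedback loop (>= 1).
    The loop registers are assumed to hold 0 initially.
    """
    out = []
    for n, p in enumerate(products):
        fed_back = out[n - loop_regs] if n >= loop_regs else 0
        out.append(p + fed_back)
    return out

AY = 0x7F
products = [ax * AY for ax in (0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666)]

one_reg = accumulate_with_loop_regs(products, loop_regs=1)   # ordinary MACC
two_regs = accumulate_with_loop_regs(products, loop_regs=2)  # two interleaved sums

print([hex(v) for v in one_reg])
print([hex(v) for v in two_regs])
```

With one register in the loop, every output is the running sum of all products so far. With two, the even-numbered and odd-numbered samples accumulate independently, so the output stream alternates between two partial sums that would still have to be added together to get the full result.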
