11-01-2020 01:49 PM
Hi,
While working on a DSP design with the DSP48E2 primitive, I ran into a problem: the DSP module ignores the first few inputs.
Is this normal? My first guess is that the DSP is still in a long reset sequence while it is ignoring my inputs.
This DSP design is intended to do a MACC operation: multiply two inputs and accumulate the products over multiple cycles.
As you can see above, I am giving two inputs, "ax" and "ay". So the result will be
rslt_a = ax * ay.
The inputs are:
ax = 0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666.
ay = 0x7f (held constant for all cycles)
However, the first output I get is "0x32cc9a", which is calculated from the inputs "0x6666" and "0x7f".
Why are the first 5 inputs ignored? Can someone help me with this?
Thank you
Best,
Hsuh
11-12-2020 04:33 AM
Hi @hsuh6
A more detailed description of the DSP48E2 can be found in UG579. Please use Xilinx DocNav to find the documents relevant to your tool release.
Note that the DSP slices use several pipeline registers to achieve maximum performance. You can find the locations of these registers in Figure 1-1 in UG579.
Looking at your design simulation, it seems that you are using an asynchronous reset that has its synchronized release on your first sample. I would wait a clock cycle before enabling the data.
You should also expect the output to be delayed because of the pipeline registers.
As markg@prosensing.com mentioned, instantiating the primitives by hand is very cumbersome and is generally done only if you need direct access to advanced features. Easier options are to either use the macros or let the synthesis tools infer them.
To give the synthesis tools a nudge in the right direction, register the signals in your HDL according to the locations of the pipeline registers, using the same bit widths (see Figure 2-12 in UG579).
UG579, Chapter 3 describes various options of design entry for DSP48E2.
11-01-2020 05:58 PM
Your simulation outputs are showing other strange behavior too. Note that the outputs can be described as:
x7F * x6666 = x32CC9A
+ x7F * x1111 = x3B4409
+ x7F * x6666 = x6E10A3
+ x7F * x1111 = x768812
+ x7F * x6666 = xA954AC
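The running sums above can be checked with a few lines of Python. This is only a sketch: the operand order is inferred from the products in the list, and the 48-bit mask assumes the width of the DSP48E2 P output.

```python
AY = 0x7F
# Operand sequence inferred from the products listed above (an assumption).
ax_seq = [0x6666, 0x1111, 0x6666, 0x1111, 0x6666]

P_MASK = (1 << 48) - 1  # the DSP48E2 P output is 48 bits wide

acc = 0
outputs = []
for ax in ax_seq:
    acc = (acc + AY * ax) & P_MASK  # multiply, then accumulate
    outputs.append(acc)

print([f"x{v:06X}" for v in outputs])
# ['x32CC9A', 'x3B4409', 'x6E10A3', 'x768812', 'xA954AC']
```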
I assume you are using the DSP48E2 primitive described on page 251 of UG974 (v2020.1). This primitive is complicated and difficult to set up correctly.
You might find that the DSP48 Macro IP, described in PG148, is an easier way to set up and use the DSP48E2 for a MACC operation.
Cheers,
Mark
01-08-2021 10:50 AM
Thank you for your reply, markg.
You were right. Actually, I didn't understand how the internal registers in the DSP are configured.
Below is the setting that I have. Other registers (the input A/B/D registers) are omitted for simplicity.
I have two pipeline registers in the feedback loop.
So the first output is the result of 0x6666 * 0x7F = 0x32CC9A,
and the second output belongs to a separate accumulation that also starts from "0".
Therefore, the second output is the result of the calculation 0x7F * 0x7777.
After that, two sets of accumulated results come out alternately:
set 1: 0x7F * 0x6666 + 0x7F * 0x7777 + 0x7F * 0x7777 ...
set 2: 0x7F * 0x7777 + 0x7F * 0x7777 + 0x7F * 0x7777 ... and so on.
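The two sets described above can be reproduced with a small behavioral model. This is a sketch under assumptions, not a model of the actual DSP48E2 configuration: `interleaved_macc` and `loop_regs` are hypothetical names, and the model assumes only that the accumulator feedback path contains `loop_regs` registers that all reset to zero.

```python
def interleaved_macc(products, loop_regs):
    """Accumulate `products` through a feedback loop with `loop_regs` registers."""
    history = [0] * loop_regs          # register contents after reset
    outputs = []
    for p in products:
        total = p + history[0]         # the oldest value is the one fed back
        history = history[1:] + [total]
        outputs.append(total)
    return outputs

AY = 0x7F
products = [AY * 0x6666, AY * 0x7777, AY * 0x7777, AY * 0x7777]
out = interleaved_macc(products, loop_regs=2)

# out[0] and out[2] belong to set 1, out[1] and out[3] to set 2.
print([hex(v) for v in out])
```

With `loop_regs=1` the same function behaves as an ordinary accumulator, which is the behavior the original design was aiming for.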
I didn't really care about my outputs but you saved me there. Thanks!
01-08-2021 11:01 AM
Thank you for your reply, derekh.
Unfortunately, I had to use the DSP48E2 primitive because my application requires parameterized instantiation of the computing elements.
But your comment was helpful. I delayed my inputs with extra cycles and now I am getting the right output which starts accumulation from 0x7F * 0x1111.
To anyone who visits this page later:
I think using the DSP48E2 is not cumbersome, but one needs to read the UG579 manual carefully.
Also, Figure 1-1 and Figure 2-1 helped me a lot to understand what's inside the DSP module.
On top of that, pay attention to how many registers you are using, because the number of pipeline registers placed in the feedback loop determines the number of interleaved accumulation results.
In my case, for example, I had two pipeline registers, and that's why I was getting two sets of accumulation results.
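If the interleaved results are kept rather than avoided, the output stream can be split back into its separate accumulations. A minimal sketch, assuming output k belongs to set k modulo the number of loop registers; `split_lanes` is a hypothetical helper, and the sample stream is made up for illustration (products 1 to 6 through a loop with two registers).

```python
def split_lanes(outputs, loop_regs):
    """Group an interleaved output stream into one list per accumulation set."""
    return [outputs[i::loop_regs] for i in range(loop_regs)]

# Illustrative stream: products 1..6 accumulated through two loop registers.
stream = [1, 2, 4, 6, 9, 12]
lanes = split_lanes(stream, 2)
print(lanes)   # [[1, 4, 9], [2, 6, 12]]

# Adding the last value of each set recovers the full sum of all products.
total = sum(lane[-1] for lane in lanes)
print(total)   # 21
```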