11-11-2020 01:05 AM - edited 11-11-2020 01:16 AM
Hi,
I have a piece of code that contains a few two-dimensional arrays, something like this:
reg [W1-1:0] R1 [0:D1-1];
reg [W2-1:0] R2 [0:D2-1];
reg [W3-1:0] R3 [0:D3-1];
reg [W4-1:0] R4 [0:D4-1];
reg [W5-1:0] R5 [0:D5-1];
In some section of the code, I have used multi-level addressing for these registers, both as source and as destination, something like the following:
R1[R2[R3[R4[R4[index]]]]] <= R5[R4[R3[R2[R1[index]]]]];
This works just fine in simulation, and I know that in theory it should be synthesizable with no problem. But I am worried that it may lead to some bad inferences during synthesis and cause bugs which are difficult to find. Of course I can change the code and decrease the number of levels in accessing these registers, but that would negatively affect the readability of the code.
Also, I am sure that the array widths and depths match; that is, I am certain that I never access an out-of-bounds address. My question is only about the concept of multi-level addressing, both as source and as destination.
Do you think I can safely leave this as it is, or should I change it?
Thanks in advance
11-15-2020 03:14 PM
What is your concern?
The code you wrote is legal synthesizable code. It simulates and doesn't generate any errors in Synthesis, so there is nothing wrong with it.
De-referencing an array defined as
reg [n-1:0] my_array [m-1:0];
using
out = my_array[index];
results in an n-bit wide, m-entry multiplexer with each cell of the array as an input and the index as the MUX control; this is a combinatorial structure. As with all combinatorial structures, these can be cascaded - each level of your nested dereferencing infers another set of MUXes, with the result of the inner one as the index.
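As a rough illustration of the cascading described above, here is a two-level read written with a named intermediate wire. It should describe the same logic as the nested form R2[R1[index]]. The module name, port names, and array sizes are made up for this sketch and are not taken from the original design; it assumes every value stored in R1 is a valid index into R2.

```verilog
// Hypothetical sketch: two-level read R2[R1[index]] split into explicit stages.
// All names and sizes are illustrative assumptions.
module nested_read_sketch (
    input  wire       clk,
    input  wire       we1,      // write enable for R1
    input  wire       we2,      // write enable for R2
    input  wire [2:0] waddr,    // shared write address
    input  wire [2:0] wdata1,   // data written to R1 (later used as an R2 index)
    input  wire [7:0] wdata2,   // data written to R2
    input  wire [2:0] index,    // first-level read index
    output wire [7:0] out
);
    reg [2:0] R1 [0:7];   // 8 entries, 3 bits each
    reg [7:0] R2 [0:7];   // 8 entries, 8 bits each

    // Simple write ports so the arrays have defined contents.
    always @(posedge clk) begin
        if (we1) R1[waddr] <= wdata1;
        if (we2) R2[waddr] <= wdata2;
    end

    // First-level MUX: 8-to-1, 3 bits wide, selected by index.
    wire [2:0] addr2 = R1[index];

    // Second-level MUX: 8-to-1, 8 bits wide, selected by the first MUX's output.
    assign out = R2[addr2];   // same logic as R2[R1[index]]
endmodule
```

Each additional nesting level adds one more MUX stage of this kind in series.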
On the left side (the assignment), the inner dereferencings are the same thing - they will result in MUXes. The final one will become a "one-of-m" enable for the corresponding bank of flops.
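The write side can be sketched the same way. The example below spells out what a statement like R1[R2[index]] <= data amounts to: the inner dereference is a MUX that produces the write address, and the outer one is a decode that enables exactly one entry. Module, port, and signal names are hypothetical, and the sizes are chosen only to keep the example small.

```verilog
// Hypothetical sketch: R1[R2[index]] <= data, with the decode written out.
// All names and sizes are illustrative assumptions.
module nested_write_sketch (
    input  wire       clk,
    input  wire       we2,        // write enable for R2
    input  wire [2:0] waddr2,     // write address for R2
    input  wire [2:0] wdata2,     // data written to R2 (later used as an R1 index)
    input  wire [2:0] index,      // selects which R2 entry supplies the R1 address
    input  wire [7:0] data,       // value stored into R1
    input  wire [2:0] peek_addr,  // read address so R1 has an observable output
    output wire [7:0] peek
);
    reg [7:0] R1 [0:7];
    reg [2:0] R2 [0:7];

    always @(posedge clk)
        if (we2) R2[waddr2] <= wdata2;

    // Inner dereference: an 8-to-1 MUX producing the write address.
    wire [2:0] waddr1 = R2[index];

    // Outer dereference: waddr1 is decoded into a one-of-8 enable,
    // so only the selected entry's flops load data on this clock edge.
    integer i;
    always @(posedge clk)
        for (i = 0; i < 8; i = i + 1)
            if (waddr1 == i)
                R1[i] <= data;    // equivalent to R1[waddr1] <= data

    assign peek = R1[peek_addr];
endmodule
```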
Depending on how these are used elsewhere in your design, the tools may be able to map these into RAMs - after all, logically, a RAM is just a big multiplexer (as long as there are only one [single port] or two [dual port] dereferences per clock). However, since you are doing these dereferences combinatorially, they cannot be mapped to block RAMs (which are synchronous) - at best they will be mapped to distributed SelectRAMs, and at worst to lots and lots of individual flip-flops.
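The synchronous versus combinatorial distinction above usually shows up in how the read is coded. The two modules below are a minimal sketch of the two common styles; the names and sizes are made up, and whether a given tool actually maps either one to distributed RAM, block RAM, or flip-flops depends on the tool, the device, and the array size.

```verilog
// Hypothetical sketch of two read styles. Names and sizes are assumptions;
// the actual mapping is up to the synthesis tool.

// Combinatorial (asynchronous) read: the style used by nested dereferencing.
// Not a fit for a synchronous block RAM; a candidate for distributed RAM.
module async_read_sketch (
    input  wire       clk,
    input  wire       we,
    input  wire [5:0] addr,
    input  wire [7:0] din,
    output wire [7:0] dout
);
    reg [7:0] mem [0:63];

    always @(posedge clk)
        if (we) mem[addr] <= din;

    assign dout = mem[addr];      // read data is valid in the same cycle
endmodule

// Registered (synchronous) read: the read result is captured on the clock edge,
// which is the behavior a synchronous block RAM provides.
module sync_read_sketch (
    input  wire       clk,
    input  wire       we,
    input  wire [5:0] addr,
    input  wire [7:0] din,
    output reg  [7:0] dout
);
    reg [7:0] mem [0:63];

    always @(posedge clk) begin
        if (we) mem[addr] <= din;
        dout <= mem[addr];        // read data is valid one cycle later
    end
endmodule
```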
Logically and structurally there is nothing wrong with this logic.
But... depending on the widths and depths of these vectors, this can generate huge combinatorial networks, which will also be very slow - the combinatorial propagation path through them could be very long. This will greatly limit the clock speed of the system into which you are integrating this network - in other words, this won't work with a fast clock (or, if the arrays are large, even a moderate clock).
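If the path does turn out to be too slow, one possible way to shorten it is to register the intermediate index at each level. This is only a sketch under assumptions and not a drop-in replacement: it changes the behavior, since the result appears three clocks after index is presented, and it matches R3[R2[R1[index]]] only if the arrays are not written while a lookup is in flight. All names and sizes are hypothetical.

```verilog
// Hypothetical sketch: three-level lookup with one register per level.
// Adds latency, so it is NOT cycle-equivalent to the nested expression.
module pipelined_read_sketch (
    input  wire       clk,
    input  wire       we,      // write enable
    input  wire [1:0] wsel,    // which array to write: 0 = R1, 1 = R2, 2 = R3
    input  wire [2:0] waddr,
    input  wire [7:0] wdata,
    input  wire [2:0] index,
    output reg  [7:0] out
);
    reg [2:0] R1 [0:7];
    reg [2:0] R2 [0:7];
    reg [7:0] R3 [0:7];

    // Single shared write port, just to give the arrays contents.
    always @(posedge clk)
        if (we)
            case (wsel)
                2'd0: R1[waddr] <= wdata[2:0];
                2'd1: R2[waddr] <= wdata[2:0];
                2'd2: R3[waddr] <= wdata;
                default: ;     // wsel == 3: no write
            endcase

    reg [2:0] a1, a2;          // registered intermediate indices

    always @(posedge clk) begin
        a1  <= R1[index];      // stage 1: one MUX level per clock
        a2  <= R2[a1];         // stage 2: uses the index captured last cycle
        out <= R3[a2];         // stage 3
    end
endmodule
```

With this structure each clock period only has to cover one MUX level instead of the whole chain, at the cost of latency and the extra care needed around writes.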
Avrum
11-15-2020 05:29 AM
Any comments???