topic Re: ap_fixed details in High-Level Synthesis (HLS)
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/805721#M10687
<P>I just found this old thread and wanted to contribute a solution to it which I didn't have at the time:</P>
<P> </P>
<P>The traits file in %XILINX/include/utils/x_hls_traits.h has the solution to this issue. It defines traits on pairs of types, which can be used like this:</P>
<P> </P>
<P>typedef ap_fixed<...> foo;</P>
<P>typedef ap_fixed<...> bar;</P>
<P> </P>
<P>typedef typename hls::x_traits<foo, bar>::MULT_T fooXbar;</P>
<P> </P>
<P>This is how multiplication and other operators can widen their result types based on the sizes of their input types.</P>
Fri, 03 Nov 2017 22:35:29 GMT
muzaffer
2017-11-03T22:35:29Z
ap_fixed details
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/689498#M5854
<P>I have a few questions about the ap_fixed datatype that I can't find answers to.</P>
<P> </P>
<P>- Can ap_fixed(M,0) ever be negative? Is it an invalid but allowed datatype, or is the sign bit explicit or ignored?</P>
<P>- Is ap_fixed(N,1) the correct type to cover the range [-1, 1)?</P>
<P>- What ap_fixed(W,I) datatype is the lossless PRODUCT of any ap_fixed(a,b)*ap_fixed(c,d)?</P>
<P>- What ap_fixed(W,I) datatype is the lossless SUM of any ap_fixed(a,b)+ap_fixed(c,d)?</P>
<P> </P>
Sat, 26 Mar 2016 23:27:03 GMT
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/689498#M5854
cyviz
2016-03-26T23:27:03Z
Re: ap_fixed details
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691078#M5922
1) Not sure, probably invalid.<BR />2) Yes.<BR />3) ap_fixed<a+c, b+d><BR />4) ap_fixed<max(a,c)+1, max(b,d)+1>
Tue, 05 Apr 2016 06:12:27 GMT
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691078#M5922
muzaffer
2016-04-05T06:12:27Z
Re: ap_fixed details
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691086#M5923
<P>Thanks, I think I came to the same conclusion. I wish there were a simpler way to define 3) and 4), though. It gets a bit messy to define a bunch of lossless sums and products of some initial datatypes. It would be nice to be able to typedef a sum or product of two other types.</P>
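<P>A minimal sketch of what such a derived typedef could look like in plain C++. It assumes the sum keeps the larger fractional width as well as the larger integer width plus one carry bit; a width of max(a,c)+1 alone does not guarantee that when the two fractional widths differ. Fmt, max_of, Product and Sum are hypothetical helper names, not part of the HLS headers; Fmt only records the two numbers that would be passed to ap_fixed<W, I>.</P>

```cpp
// Hypothetical stand-in for a signed fixed-point format:
// W total bits, I integer bits (sign bit included), W - I fractional bits.
template <int W, int I>
struct Fmt {
    static constexpr int width = W;
    static constexpr int int_bits = I;
    static constexpr int frac_bits = W - I;
};

constexpr int max_of(int a, int b) { return a > b ? a : b; }

// Lossless product: total widths add and integer widths add.
template <typename A, typename B>
struct Product {
    using type = Fmt<A::width + B::width, A::int_bits + B::int_bits>;
};

// Lossless sum: larger integer part plus one carry bit,
// together with the larger fractional part.
template <typename A, typename B>
struct Sum {
    static constexpr int int_bits = max_of(A::int_bits, B::int_bits) + 1;
    static constexpr int frac_bits = max_of(A::frac_bits, B::frac_bits);
    using type = Fmt<int_bits + frac_bits, int_bits>;
};

// Example formats: <5,3> has 2 fractional bits, <5,2> has 3.
using foo = Fmt<5, 3>;
using bar = Fmt<5, 2>;
using foo_plus_bar = Sum<foo, bar>::type;
using foo_times_bar = Product<foo, bar>::type;

static_assert(foo_plus_bar::width == 7 && foo_plus_bar::int_bits == 4, "sum format");
static_assert(foo_times_bar::width == 10 && foo_times_bar::int_bits == 5, "product format");
```

<P>With the HLS headers available, the two numbers could then be forwarded to the real type, for example ap_fixed<foo_plus_bar::width, foo_plus_bar::int_bits>.</P>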
<P> </P>
<P>As for 1), if you sign-extend, I suppose you get some kind of split range. The tools do seem to accept zero integer bits.</P>
<P> </P>
Tue, 05 Apr 2016 06:47:51 GMT
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691086#M5923
cyviz
2016-04-05T06:47:51Z
Re: ap_fixed details
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691140#M5927
<P>It'd be sort of nice to have a whole page in HLS that lists all the variables in the project, which variables each one depends upon, and allows you to enter an equation to define its length. That way (a) it's nice and neat (the equations don't clutter up the code), and (b) if you change something it's easy to see how it'll affect everything else.</P>
<P> </P>
<P>Matlab's FPGA tools have a very limited version of this (in that there's a page listing all the variables where you can set widths for each one). Perhaps Vivado HLS could do better?</P>
Tue, 05 Apr 2016 10:44:44 GMT
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691140#M5927
u4223374
2016-04-05T10:44:44Z
Re: ap_fixed details
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691141#M5928
<P>Another thought. If you start with two ap_fixed types and do all the math in a ridiculously large ap_fixed<4096,2048> type, will the tools just optimize the unused bits away (preferably before they propagate into the top hierarchy)?</P>
<P> </P>
Tue, 05 Apr 2016 10:49:37 GMT
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691141#M5928
cyviz
2016-04-05T10:49:37Z
Re: ap_fixed details
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691166#M5930
<P>I'm pretty sure that HLS will not. HLS only does really straightforward optimisations, so in "for (int i = 0; i < 10; i++)" it'll recognise that actually "i" can just be an unsigned 4-bit value, not a signed 32-bit one.</P>
<P> </P>
<P>Vivado itself (non-HLS) will trim them out during synthesis and implementation, if it can logically guarantee the validity of that approach - but this still leaves you with a huge module right up to that point.</P>
<P> </P>
<P>There are, of course, some areas where the tools just can't follow the logic through. If you have a 3-element vector (x,y,z) and you multiply that by L = 1/sqrt(x^2 + y^2 + z^2) then the vector is normalised. Mathematically, it's easy to show that the resulting components must be no more than 1. However, what HLS (and Vivado) sees is that you're multiplying a 32-bit (for example, maybe 16.16 fixed-point) vector by a 32-bit value (16.16 fixed-point) and therefore the result should be 64-bit (32.32 fixed-point). It can't figure out that when x, y, and/or z are large, L is always small - and vice versa - which actually ensures that 1.32 fixed-point output would work perfectly.</P>
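<P>The bound itself is easy to check numerically. The sketch below uses doubles rather than fixed-point types, and the names Vec3, normalise and max_abs are made up for illustration; it assumes the input is not the zero vector.</P>

```cpp
#include <cmath>

struct Vec3 {
    double x, y, z;
};

// Scale v by L = 1/sqrt(x^2 + y^2 + z^2). Assumes v is not the zero vector.
inline Vec3 normalise(Vec3 v) {
    const double L = 1.0 / std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return Vec3{v.x * L, v.y * L, v.z * L};
}

// Largest component magnitude of v.
inline double max_abs(Vec3 v) {
    return std::fmax(std::fabs(v.x), std::fmax(std::fabs(v.y), std::fabs(v.z)));
}
```

<P>One detail worth keeping in mind: an axis-aligned input gives a component of exactly +1 or -1, and a signed format with a single integer bit covers [-1, 1), so +1 would need saturation or one more integer bit.</P>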
<P> </P>
<P> </P>
Tue, 05 Apr 2016 13:16:25 GMT
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691166#M5930
u4223374
2016-04-05T13:16:25Z
Re: ap_fixed details
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691178#M5934
<P>Your example is worth a closer look. I guess the type of x limits your range more than you may be aware of at first sight.</P>
<P>The expression x^2+y^2+z^2 can overflow earlier than expected unless you cast x, y and z to a wider type that can hold the sum.</P>
<P> </P>
<P>Even sqrt has precision rules, but yes, the tools will not know this function at compile time, so they won't know the result type. For sum and product, however, the tools will know.</P>
<P> </P>
Tue, 05 Apr 2016 14:00:59 GMT
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691178#M5934
cyviz
2016-04-05T14:00:59Z
Re: ap_fixed details
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691279#M5939
<P>Yes, you'd want to have the sum (x^2 + y^2 + z^2) as at least a 66-bit (34.32) value in this case, which then gets cut down quite a lot when you do the square root.</P>
<P> </P>
<P>It's a problem that will always occur when you have inter-dependent data. For example, when you're summing edges in an image, it's not possible to have every single location giving a large positive edge. You can get a single large positive edge in the X axis by having pixels [0, 255] (i.e. the edge value is 255). However, you can't have two of those in a row because that would imply that the image looks like [0, 255, 510] (and 510 is not a valid 8-bit value). In fact, using a very simple X-axis edge detector (edge[y][x] = image[y][x] - image[y][x-1]), the sum of edges in each line always fits in a 9-bit signed value (it is just the last element minus the first element).</P>
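<P>The telescoping is easy to see in code. A small sketch, with sum_of_edges as a hypothetical helper operating on one image line stored as 8-bit pixels:</P>

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum of simple X-axis edges along one line: edge[x] = line[x] - line[x-1].
// Every intermediate pixel is added once and subtracted once, so the result
// equals line.back() - line.front() and always lies in [-255, 255].
inline int sum_of_edges(const std::vector<std::uint8_t>& line) {
    int sum = 0;
    for (std::size_t x = 1; x < line.size(); ++x) {
        sum += static_cast<int>(line[x]) - static_cast<int>(line[x - 1]);
    }
    return sum;
}
```

<P>For the line [0, 255, 0, 255, 17] the individual edges are 255, -255, 255 and -238, and their sum is 17, which is the last pixel minus the first.</P>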
<P> </P>
<P> </P>
<P>Even sum and product can be a little bit challenging. Take the following example:</P>
<P> </P>
<PRE>ap_uint<8> image[640*480];
int accumulator = 0;
for (int i = 0; i < 640*480; i++) {
    accumulator += image[i];
}</PRE>
<P><BR />A very simple reading of this says that after the first loop iteration the accumulator will be 8-bit. After the second loop iteration it'll be (8-bit + 8-bit => 9-bit). After the third loop iteration it'll be (9-bit + 8-bit => 10-bit). After the fourth loop iteration it'll be (10-bit + 8-bit => 11-bit). And so on, until after the 307200th iteration it'll be 307207-bit. Of course, a more advanced analysis would correctly recognise that log2(640*480) < 19, so the sum needs at most 8 + 19 = 27 bits and a 27-bit accumulator would do nicely. This comes down to how good HLS is at (a) figuring out loop tripcounts, and (b) interpreting what the user is doing. <BR /><BR />With regard to (a), this would imply a significant change to the functionality of the loop_tripcount pragma. Currently, if you put incorrect values in here, it just means that the latency analysis is wrong (the design will work fine but it'll run for a different time to what you expected). If HLS decides bit-widths based on that pragma, then putting incorrect values in would produce an unworkable design as internal variables would overflow.</P>
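<P>The 27-bit figure can be computed at compile time. A minimal sketch, assuming unsigned elements; ceil_log2 and accumulator_width are hypothetical helper names:</P>

```cpp
// Smallest b such that 2^b >= n (returns 0 for n <= 1).
constexpr int ceil_log2(unsigned long long n) {
    return n <= 1 ? 0 : 1 + ceil_log2((n + 1) / 2);
}

// Width that can hold the sum of `count` unsigned values of `element_width`
// bits each: count * (2^w - 1) < 2^(w + ceil_log2(count)).
constexpr int accumulator_width(int element_width, unsigned long long count) {
    return element_width + ceil_log2(count);
}

// 640*480 = 307200 pixels of 8 bits each fit in 8 + 19 = 27 bits.
static_assert(ceil_log2(640ULL * 480ULL) == 19, "307200 <= 2^19");
static_assert(accumulator_width(8, 640ULL * 480ULL) == 27, "27-bit accumulator");
```

<P>The same rule gives the 66-bit figure mentioned earlier for x^2 + y^2 + z^2: three 64-bit squares need 64 + ceil_log2(3) = 66 bits.</P>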
<P> </P>
Wed, 06 Apr 2016 00:49:51 GMT
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691279#M5939
u4223374
2016-04-06T00:49:51Z
Re: ap_fixed details
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691350#M5948
<P>Yes, I see your points regarding the loops. I would not use the int type like that in HLS. I would define the loop variable "i" as ap_uint<19> (unsigned, since a signed 19-bit value stops short of 307200), and if the tools allowed it, the accumulator would be typedef'd like this:</P>
<P> </P>
<P> typedef accumulator (i'type)*(image*'type)</P>
<P> </P>
<P>But for now, I would have to do it manually:</P>
<P> </P>
<P> typedef ap_uint<19+8> accumulator;</P>
<P> </P>
<P> </P>
Wed, 06 Apr 2016 07:56:50 GMT
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691350#M5948
cyviz
2016-04-06T07:56:50Z
Re: ap_fixed details
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691383#M5949
<P>HLS does seem to actually handle ints as loop variables correctly (so it cuts that one down to 19-bit).</P>
<P> </P>
<P>For the other stuff, you can do all the definitions in the preprocessor.</P>
<P> </P>
<PRE>#include <ap_int.h>

#define IMAGE_WIDTH 640
#define IMAGE_HEIGHT 480
#define NUM_PIXELS (IMAGE_WIDTH * IMAGE_HEIGHT)
#define PIXEL_WIDTH 8
#define NUM_PIXELS_LOG2 LOG2(NUM_PIXELS) // There are a few ways of doing LOG2 in the preprocessor; it is not defined here and must round up (19 for 307200).
#define IMAGE_SUM_WIDTH (PIXEL_WIDTH + NUM_PIXELS_LOG2)
#define PIXEL_EDGE_WIDTH (PIXEL_WIDTH + 1) // Edges can be positive or negative so this needs to have a sign bit added
#define IMAGE_SQUARE_SUM_WIDTH (PIXEL_WIDTH * 2 + NUM_PIXELS_LOG2)
typedef ap_uint<PIXEL_WIDTH> pixel_t;
typedef ap_uint<IMAGE_SUM_WIDTH> accumulator_t;
typedef ap_uint<IMAGE_SQUARE_SUM_WIDTH> square_accumulator_t;
typedef ap_uint<NUM_PIXELS_LOG2> image_index_t;
void test(pixel_t image[NUM_PIXELS]) {
    accumulator_t accumulator = 0;               // sum of all pixels
    square_accumulator_t square_accumulator = 0; // sum of all squared pixels
    for (image_index_t i = 0; i < NUM_PIXELS; i++) {
        pixel_t pixel = image[i];
        accumulator += pixel;
        square_accumulator += pixel*pixel;
    }
}</PRE>
Wed, 06 Apr 2016 10:58:17 GMT
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/691383#M5949
u4223374
2016-04-06T10:58:17Z
Re: ap_fixed details
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/820511#M11368
<P><LI-USER uid="33373"></LI-USER></P>
<P> </P>
<P>Regarding your earlier answer to Q4 (lossless sum): it appears to be wrong in my tests with ap_fixed<5,3> holding 3.25 and ap_fixed<5,2> holding 1.125.</P>
<P> </P>
<P>It should be corrected to ap_fixed<max(a-b,c-d)+max(b,d)+1, max(b,d)+1>, so that the larger of the two fractional widths (a-b and c-d) is kept. Thanks</P>
Tue, 09 Jan 2018 14:06:54 GMT
https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/ap-fixed-details/m-p/820511#M11368
nanson
2018-01-09T14:06:54Z