Voyager
Registered: 04-10-2012

Who is using all my slice LUTs?!?


I had a design that was using approximately 10k of my 46.5k slice LUTs (21% utilization).  I made some code changes (which I didn't think were very extensive), and my synthesis time increased from ~5 minutes to ~30 minutes. It completes successfully, but now I am using roughly 1.9M (as in million) LUTs, for a utilization of 4076%!!!!!

 

Obviously this isn't going to work, but I am not sure how to run down where the problem is.  Is there a way to see what is causing this MASSIVE increase in LUT usage?

 

TIA!

Scholar
Registered: 02-27-2008

g,

 

Look at the synthesis report.  Are they LUT6, LUT5, LUT4 (etc.), which implies they are logic?  Or are they LUTRAM or SRL, which implies they are memories?

This might help you realize what you did (like a typo in a line of RTL that is creating a REALLY BIG shift register, or memory).

 

 

Austin Lesea
Principal Engineer
Xilinx San Jose

Voyager
Registered: 04-10-2012

Ah, thank you.  That was sort of what I was hoping for (the reports and logs get so long, I still am unsure what to find in some of them)!

 

According to the synthesis report:

Slice Logic Utilization: 
 Number of Slice Registers:           18007  out of  93120    19%  
 Number of Slice LUTs:                1898027  out of  46560   4076% (*) 
    Number used as Logic:             191109  out of  46560   410% (*) 
    Number used as Memory:            1706918  out of  16720   10208% (*) 
       Number used as RAM:            1704640
       Number used as SRL:             2278

Slice Logic Distribution: 
 Number of LUT Flip Flop pairs used:  1905735
   Number with an unused Flip Flop:   1887728  out of  1905735    99%  
   Number with an unused LUT:          7708  out of  1905735     0%  
   Number of fully used LUT-FF pairs: 10299  out of  1905735     0%  
   Number of unique control sets:      1044

If I am reading that right, it seems like it is mostly the fault of it getting turned into RAM, but I am basically overusing everything.  Is that the way you would read it?

If that is the case, it doesn't narrow things down for me (or, I assume, for you). Is there a second place I could look?

Scholar
Registered: 02-27-2008

Yes, yes, and Yes,

 

The verbose synthesis report (you posted the summary) lists exactly what the resources are.


And, I say again, it thinks you want a lot of RAM, and SRL, and perhaps to build those structures, it is also using a lot of logic LUTs...

 

Dual-port asynchronous memories?  FIFOs built from cores (not from BRAM FIFOs)?

 

Perhaps we should have first asked:  what are you doing?  Is this a MicroBlaze/EDK project?  A System Generator DSP project?  A 'plain old logic' project?

 

I like that:  'plain old logic' -- POL...I think I will use that ...

 

 

Austin Lesea
Principal Engineer
Xilinx San Jose
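
To illustrate the "asynchronous memories" question above: block RAM reads are clocked, so a memory that is read combinationally can only be built from LUT-based distributed RAM. The following is a minimal sketch of the two read styles, not code from the thread; the module names, port names, and default parameter values are all hypothetical.

```verilog
// Hypothetical example modules; all names and sizes are illustrative.

// Asynchronous (combinational) read: block RAM cannot do this,
// so synthesis has to use LUT-based distributed RAM.
module ram_async_read #(
   parameter ADDR_WIDTH = 6,
   parameter DATA_WIDTH = 13
) (
   input  wire                  clk,
   input  wire                  we,
   input  wire [ADDR_WIDTH-1:0] waddr,
   input  wire [ADDR_WIDTH-1:0] raddr,
   input  wire [DATA_WIDTH-1:0] din,
   output wire [DATA_WIDTH-1:0] dout
);
   reg [DATA_WIDTH-1:0] mem [0:(2**ADDR_WIDTH)-1];

   always @(posedge clk)
      if (we) mem[waddr] <= din;

   // Read data follows raddr immediately, with no clock edge involved.
   assign dout = mem[raddr];
endmodule

// Synchronous (registered) read: same storage, but the read is clocked,
// which is the form a block RAM can implement.
module ram_sync_read #(
   parameter ADDR_WIDTH = 6,
   parameter DATA_WIDTH = 13
) (
   input  wire                  clk,
   input  wire                  we,
   input  wire [ADDR_WIDTH-1:0] waddr,
   input  wire [ADDR_WIDTH-1:0] raddr,
   input  wire [DATA_WIDTH-1:0] din,
   output reg  [DATA_WIDTH-1:0] dout
);
   reg [DATA_WIDTH-1:0] mem [0:(2**ADDR_WIDTH)-1];

   always @(posedge clk) begin
      if (we) mem[waddr] <= din;
      dout <= mem[raddr];      // read data appears one clock later
   end
endmodule
```

At small depths the asynchronous form is harmless. At a 16-bit address width it is the kind of description that can consume LUTs on the scale shown in the report above.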

Voyager
Registered: 04-10-2012

OK, I can see what you're saying.

I guess I would call it a POL (better than POS).  There is a MicroBlaze project included, but that didn't change from the earlier stuff to now, so I am ignoring it.

 

I do have a piece of two-port RAM in there.  The module is based on the code from here

 

The data width is 13 bits, and the address width is 16 bits (as I type this, that does seem like it could be a problem).  Is that the sort of thing you were thinking of?

Scholar (accepted solution)
Registered: 02-27-2008

851,968 bits....


That's a whole lot of bits to try to squeeze into something other than a BRAM....

It's even pretty large for a BRAM

 

2^16 X 13 memory (in bits) or 65K by 13

 

 

Austin Lesea
Principal Engineer
Xilinx San Jose
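
The 851,968 figure can be reproduced with a few constants. The sketch below is a simulation-only check with a hypothetical module name. It assumes a 6-input LUT can store 64 bits when used as RAM, which is an assumption about the device and not something stated in the thread.

```verilog
// Hypothetical simulation-only module that prints the memory size.
module ram_bits_check;
   localparam integer ADDR_WIDTH   = 16;                      // address width from the thread
   localparam integer DATA_WIDTH   = 13;                      // data width from the thread
   localparam integer DEPTH        = 2**ADDR_WIDTH;           // 65,536 words
   localparam integer TOTAL_BITS   = DEPTH * DATA_WIDTH;      // 851,968 bits
   localparam integer BITS_PER_LUT = 64;                      // assumption: LUT6 used as 64x1 RAM
   // Ceiling division: LUTs needed for storage alone, ignoring
   // address decoding, output muxing, and any second port.
   localparam integer MIN_LUTS =
      (TOTAL_BITS + BITS_PER_LUT - 1) / BITS_PER_LUT;         // 13,312 LUTs

   initial begin
      $display("depth      = %0d words", DEPTH);
      $display("total bits = %0d", TOTAL_BITS);
      $display("min LUTs   = %0d", MIN_LUTS);
      $finish;
   end
endmodule
```

Under that assumption, storage alone would take 13,312 of the 16,720 memory-capable LUTs listed in the report, before any decoding or multiplexing logic is counted.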

Voyager
Registered: 04-10-2012

It never even dawned on me, but I do think that that was the cause (actually I know it was).

 

I just implemented a 2-port block memory core and it synthesized fine (and much faster again), and things seem to be reasonable again:

Slice Logic Utilization: 
 Number of Slice Registers:           17238  out of  93120    18%  
 Number of Slice LUTs:                18847  out of  46560    40%  
    Number used as Logic:             15865  out of  46560    34%  
    Number used as Memory:             2982  out of  16720    17%  
       Number used as RAM:              704
       Number used as SRL:             2278

Slice Logic Distribution: 
 Number of LUT Flip Flop pairs used:  26542
   Number with an unused Flip Flop:    9304  out of  26542    35%  
   Number with an unused LUT:          7695  out of  26542    28%  
   Number of fully used LUT-FF pairs:  9543  out of  26542    35%  
   Number of unique control sets:      1019

I don't know if I have the RAM set up right (the next thing to test), but I at least know what the problem was (which was the purpose of the post).  So thanks a ton (you will notice that my LUT usage actually went down now)!

Guide
Registered: 01-23-2009

The RAM description you provided is decidedly odd - I am not certain exactly what you are trying to do... If you are trying to do a "conventional" dual-port SRAM, I don't see why this coding is necessary. It's particularly puzzling since you have tristates in it...

 

The ISE ProjectNavigator has coding examples for RAMs - use the language template (the little light-bulb in the GUI) - Look under

<language> -> Synthesis Constructs -> Coding Examples -> RAM -> BlockRAM -> Dual Port.

 

The way you have written it, you are (sort of) trying to multiplex writes from both ports into one port of the dual-port RAM, then do reads on the other (or something like that) - this is clearly messing up the synthesis tool and it is trying to implement it in distributed RAM, rather than block RAM.

 

Stick to the language templates to make sure you describe a structure that works. Since the BRAM36 cell actually has two full read/write ports, describing the RAM in inferrable RTL isn't complex. As in the templates, all you need is the very simple syntax that describes two ports, each with synchronous read or write functionality.

 

Avrum
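
As a rough idea of what such a description looks like, here is a minimal sketch of a memory with two synchronous read/write ports. It is not the text of the ISE language template; the module name, port names, and parameters are hypothetical, and whether a given tool version maps it to block RAM should be confirmed in the synthesis report.

```verilog
// Hypothetical two-port RAM sketch; names and defaults are illustrative.
module dual_port_ram #(
   parameter ADDR_WIDTH = 10,
   parameter DATA_WIDTH = 13
) (
   input  wire                  clk,
   // Port A
   input  wire                  we_a,
   input  wire [ADDR_WIDTH-1:0] addr_a,
   input  wire [DATA_WIDTH-1:0] din_a,
   output reg  [DATA_WIDTH-1:0] dout_a,
   // Port B
   input  wire                  we_b,
   input  wire [ADDR_WIDTH-1:0] addr_b,
   input  wire [DATA_WIDTH-1:0] din_b,
   output reg  [DATA_WIDTH-1:0] dout_b
);
   // One shared storage array, depth set by the address width.
   reg [DATA_WIDTH-1:0] mem [0:(2**ADDR_WIDTH)-1];

   // Port A: synchronous write, synchronous read of the old contents.
   always @(posedge clk) begin
      if (we_a) mem[addr_a] <= din_a;
      dout_a <= mem[addr_a];
   end

   // Port B: same structure as port A, sharing the storage array.
   always @(posedge clk) begin
      if (we_b) mem[addr_b] <= din_b;
      dout_b <= mem[addr_b];
   end
endmodule
```

There are no tristates and no multiplexing between the ports here. Each port is one clocked block, and both reads are registered.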

Voyager
Registered: 04-10-2012

I didn't realize that there were coding examples in there, thanks for pointing those out.

 

I took the "1 Clock, 1 Write Port, 1 Read Port" example and implemented it in my code, but it failed.  Synthesis reports:

Specific Feature Utilization:
 Number of Block RAM/FIFO:             1717  out of    156   1100% (*) 
    Number using Block RAM only:       1717
 Number of BUFG/BUFGCTRLs:               17  out of     32    53%  
 Number of DSP48E1s:                    143  out of    288    49%  

 I used the Block flag (* RAM_STYLE="BLOCK" *), so I can try something else, but I can't imagine that it will help much.  Any thoughts on that?

Guide
Registered: 01-23-2009

Why don't you post the inference of the RAM? You are telling us that you are trying to infer a 65536x13 RAM, which should end up implemented in 32 instances of a 2048x16 RAM. However, the tools seem to be trying to implement 1717 RAMs  - this is over 50x the size of the RAM you are supposed to be inferring...

It's not that it hasn't figured out that it is a block RAM - it has (as indicated by the utilization) - it just thinks it's WAY bigger than what you are telling us you are trying to infer.

 

Are you sure you don't somewhere have a 2^(2^16), rather than just a 2^16 (or something like that)?

 

Avrum
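
The comparison above works out as follows. This is a simulation-only sketch with a hypothetical module name; the 2048-word block depth and the count of 1717 are taken from the posts above.

```verilog
// Hypothetical simulation-only module comparing expected and reported BRAM counts.
module bram_count_check;
   localparam integer DEPTH          = 2**16;                 // 65,536 words intended
   localparam integer BRAM_DEPTH     = 2048;                  // words per block in a 2048x16 arrangement
   localparam integer EXPECTED_BRAMS = DEPTH / BRAM_DEPTH;    // 32 blocks
   localparam integer REPORTED_BRAMS = 1717;                  // from the synthesis report

   initial begin
      $display("expected BRAMs      = %0d", EXPECTED_BRAMS);
      // Integer division, so the fractional part is discarded.
      $display("reported / expected = %0d", REPORTED_BRAMS / EXPECTED_BRAMS);
      $finish;
   end
endmodule
```

The ratio comes out at 53, matching the "over 50x" estimate.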

Voyager
Registered: 04-10-2012

I needed to finish chasing some other bugs I found, but when I came back to this, it synthesised and PARed fine.  I have no idea why, but it seems like maybe the tools are doing something different now (the only reason I can think of for that section no longer having any issues).....  I didn't even make any changes to the code, I just uncommented the block and ran through the tools again.

 

Here is the code I used:

//SAMPLE_SIZE_IN = 12
//DELAY_BUS_WIDTH = 16
//inSig and delayedSig are 12 bits each
//inOverflow and delayedInOverflow are 1 bit each

   (* RAM_STYLE="BLOCK" *)
   reg [SAMPLE_SIZE_IN:0] RamReg [(2**(DELAY_BUS_WIDTH/2))-1:0];
   reg [SAMPLE_SIZE_IN:0] data_out;   

   assign {delayedInOverflow, delayedSig} = data_out;

   always @(posedge clock) 
   begin
      RamReg[add_in] <= {inOverflow,inSig};
      data_out <= RamReg[add_out];
   end

And here is what the synthesis report is saying today (as opposed to what it was complaining about on Friday):

Specific Feature Utilization:
 Number of Block RAM/FIFO:               70  out of    156    44%  
    Number using Block RAM only:         70
 Number of BUFG/BUFGCTRLs:               16  out of     32    50%  
 Number of DSP48E1s:                    143  out of    288    49%  
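
It may be worth evaluating the depth expression in the declaration posted above. Assuming DELAY_BUS_WIDTH = 16 and SAMPLE_SIZE_IN = 12, as the posted comments state, `2**(DELAY_BUS_WIDTH/2)` comes to 256 words and not 65,536. The thread does not say whether that depth is intended. The sketch below (hypothetical module name, simulation only) just prints both values for comparison.

```verilog
// Hypothetical simulation-only module; parameter values come from the posted comments.
module depth_check;
   localparam integer SAMPLE_SIZE_IN  = 12;
   localparam integer DELAY_BUS_WIDTH = 16;
   localparam integer WORD_WIDTH      = SAMPLE_SIZE_IN + 1;        // [SAMPLE_SIZE_IN:0] is 13 bits
   localparam integer DEPTH_AS_POSTED = 2**(DELAY_BUS_WIDTH/2);    // 2**8  = 256 words
   localparam integer DEPTH_FULL_BUS  = 2**DELAY_BUS_WIDTH;        // 2**16 = 65,536 words

   initial begin
      $display("as posted : %0d words, %0d bits",
               DEPTH_AS_POSTED, DEPTH_AS_POSTED * WORD_WIDTH);
      $display("full bus  : %0d words, %0d bits",
               DEPTH_FULL_BUS, DEPTH_FULL_BUS * WORD_WIDTH);
      $finish;
   end
endmodule
```

As posted, the array holds 3,328 bits, against 851,968 bits for a memory addressed by the full 16-bit bus.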

 
