Registered: 06-03-2019

Configuration Time for the Kintex UltraScale and Accessing the Configuration Flash


Hi,

I am currently working on a new board design that will use a Kintex UltraScale (the xcku040-ffva1156 or the xcku060), and I have a few questions about how long it takes to initialize and configure the FPGA, as well as about accessing the flash afterwards.

First, once all the power supplies are up, how long does it take for the FPGA to be fully configured? Page 19 of UG570 says: “The configuration time includes the initialization time plus the configuration time.” Further down the page, Equation 1-1 states “ConfigurationTime = (BitstreamSize)/(ConfigurationRate*DatabusWidth)”. Does this equation give the total configuration time, including initialization, or only the configuration portion that excludes initialization but is also called the configuration time? Using Equation 1-1 with our current plan (asynchronous parallel flash with a 16-bit bus), I calculated that configuration should take about 1.4 seconds for the xcku040 and double that for the xcku060. Basically, I want to make sure the FPGA will be operational in less than 5 seconds, so I want to know whether there are other factors I need to take into account.

Using calc_config_time for the xcku040, the typical time without por_ramp is about 1405.6 ms, with a maximum of 2158.43 ms. Setting por_ramp to its maximum of 50 ms adds roughly 100 ms. Going by page 20 of UG570, is it safe to assume that the maximum delay from power-on until the FPGA is usable will be roughly 2257.93 ms (with por_ramp set to 50 ms)?
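
A minimal Python sketch of Equation 1-1 as quoted above, assuming a 128 Mb (134,217,728-bit) bitstream, a 6 MHz clock, and a x16 bus. It reproduces the ~1.4 s figure, but it covers only the time to clock the bitstream in, not power ramp, POR, or clock tolerance. The function name `config_time_s` is illustrative.

```python
# Equation 1-1 (UG570), as quoted in the post:
#   ConfigurationTime = BitstreamSize / (ConfigurationRate * DatabusWidth)
# Sketch only: power ramp, POR, and CCLK tolerance are not included.

def config_time_s(bitstream_bits: int, rate_hz: float, bus_width: int) -> float:
    """Seconds needed to clock `bitstream_bits` over a `bus_width`-bit bus."""
    return bitstream_bits / (rate_hz * bus_width)

# 128 Mb, the same value later passed to calc_config_time as -bitstream_size
BITSTREAM_BITS = 128 * 2**20

t_bpi = config_time_s(BITSTREAM_BITS, 6e6, 16)   # async BPI x16 at 6 MHz
print(f"BPI x16 @ 6 MHz: {t_bpi:.2f} s")
```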

Second, this is mostly to confirm what I found: is SPI x4 configuration now faster than BPI x16 configuration? I went through the list of supported BPI flash for the Kintex UltraScale in UG908, and all of the flash devices that support synchronous read are now obsolete or have reached end of life. According to page 26 of XAPP1220, 6 MHz is the typical frequency for asynchronous flash. Using 128 Mb for the bitstream size, 6 MHz for the configuration rate, and 16 for the data bus width gives a configuration time of about 1.4 seconds, as stated above. If instead we use the 33 MHz recommended for SPI flash on page 17 of XAPP1233 and a bus width of x4, the configuration time comes out to about 1.02 seconds. Using calc_config_time for the SPI connection, the worst-case time is about 1671.31 ms (with por_ramp set to 50 ms), which is still less than the 2257.93 ms worst case for the parallel flash connection, and the SPI flash could support a frequency higher than 33 MHz, making configuration even faster. I could not find any new synchronous flash being sold, so it seems that if I want the fastest configuration, I should now go with SPI flash. Is there any other supported synchronous flash that is not obsolete, or any other supported x16 flash that can safely run at a frequency high enough to be faster than the SPI flash?
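
The comparison above comes down to throughput, rate × width. A minimal sketch, assuming the same 128 Mb bitstream and the 6 MHz (XAPP1220) and 33 MHz (XAPP1233) figures quoted in the post; it also computes the BPI x16 clock at which parallel flash would merely break even with SPI x4 at 33 MHz. This ignores tolerance and POR, so it will not match calc_config_time exactly.

```python
def config_time_s(bitstream_bits, rate_hz, bus_width):
    # Equation 1-1: bits divided by bits-per-second on the config bus
    return bitstream_bits / (rate_hz * bus_width)

BITSTREAM_BITS = 128 * 2**20

t_bpi_x16 = config_time_s(BITSTREAM_BITS, 6e6, 16)   # async BPI, typical 6 MHz
t_spi_x4 = config_time_s(BITSTREAM_BITS, 33e6, 4)    # SPI x4 at 33 MHz

# SPI x4 @ 33 MHz moves 132 Mb/s; a x16 bus needs a clock above this
# frequency to move more bits per second than that.
break_even_bpi_hz = 33e6 * 4 / 16

print(f"BPI x16: {t_bpi_x16:.2f} s, SPI x4: {t_spi_x4:.2f} s")
print(f"BPI x16 must exceed {break_even_bpi_hz / 1e6:.2f} MHz to be faster")
```

Under these assumptions an asynchronous x16 part would need to run faster than about 8.25 MHz just to match SPI x4 at 33 MHz.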

Lastly, we also want to use the configuration flash to store other data used by the system after configuration (we are looking at a flash of at least 1 Gb), and based on this forum post, the recommended method seems to be the STARTUPE3 primitive. Like that user, I had the idea of also connecting the five signals that go to bank 0 during configuration to other I/O pins (in addition to the bank 0 pins) for use after configuration, but the answer in that post basically says: we have not tested this, so it may or may not work; to make sure it works, use the STARTUP primitive. This will be my first time using the STARTUP primitive. Normally we have two flash devices on the board, but we are trying to shrink the design, so I want to make sure I know how to use it. Because it is not an IP, I assume from UG974 that after including the lines

library UNISIM;
use UNISIM.vcomponents.all;

I can then instantiate it the same way as MMCMs and BUFGs, without declaring a component first? (That is, I do not need to write component STARTUPE3 port ( …); end component;.) I will then need to include the necessary registers in the design to synchronize it to the design clock, based on page 123 of UG570. Because it does not support input or output delays, I will need to work through the equations referenced on the same page to make sure the timing works. Is there anything else I need to know or do when using the STARTUPE3 primitive to make sure I access the flash properly? Or can I safely connect the four data pins and the chip-select pin from bank 0 to another bank as well, so that I can skip the STARTUPE3 primitive when accessing the flash after configuration?
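
For the timing check mentioned above, a generic round-trip budget is one way to start. A minimal sketch, assuming a read where the flash changes its data output on the falling SCK edge and the FPGA captures it on the next rising edge (half a period later). This is a generic budget, not the equations on page 123 of UG570, and every delay value below, including the STARTUPE3 path delays, is a placeholder to replace with datasheet and board numbers. `ReadPath` and `max_sck_hz` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class ReadPath:
    """Round-trip read delays in ns (all values are placeholders)."""
    fpga_clk_out: float   # capture/launch register -> STARTUPE3 -> CCLK pin
    board_clk: float      # PCB trace, FPGA -> flash clock input
    flash_tco: float      # flash clock-to-output valid (e.g. tCLQV)
    board_data: float     # PCB trace, flash data -> FPGA
    fpga_data_in: float   # data pin -> STARTUPE3 -> capture register
    fpga_setup: float     # capture register setup time

    def round_trip_ns(self) -> float:
        return (self.fpga_clk_out + self.board_clk + self.flash_tco
                + self.board_data + self.fpga_data_in + self.fpga_setup)

def max_sck_hz(path: ReadPath, cycles_to_capture: float = 0.5) -> float:
    """Highest SCK frequency whose capture window covers the round trip.

    0.5: capture half a period after the launching edge; 1.0 if the
    design captures a full period later instead.
    """
    return cycles_to_capture / (path.round_trip_ns() * 1e-9)

# Placeholder numbers only; they are not taken from any datasheet.
example = ReadPath(fpga_clk_out=8.0, board_clk=0.5, flash_tco=6.0,
                   board_data=0.5, fpga_data_in=4.0, fpga_setup=1.0)
print(f"half-cycle capture: {max_sck_hz(example) / 1e6:.1f} MHz")
print(f"full-cycle capture: {max_sck_hz(example, 1.0) / 1e6:.1f} MHz")
```

Capturing a full period later doubles the budget at the cost of one extra clock of latency, which is one common way to tolerate the slow path through STARTUPE3.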

Currently, I am using Vivado 2020.1, but I will use whichever version of Vivado is current when it comes time to write the HDL.

Thanks for the help,

Steven

Registered: 01-22-2015

@sdombrowski9101 

Lots of good analysis and questions!   Let’s first make sure you are using and understanding the calc_config_time tool correctly.

The general sequence of events for Master SPI Configuration Mode of UltraScale devices is:

  1. Power ON
  2. Power Supply Ramp Time: If you have 5 supplies (VCCINT, VCCINT_IO/VCCBRAM, VCCAUX/VCCAUX_IO, VCCO) ramping up sequentially with a 40 ms ramp time each, the total ramp-up time is 5*40 = 200 ms.
  3. Power-On Reset (POR): < 75 ms (from Table 89 of DS892 (v1.18))
  4. Clock Bitstream into FPGA:  Time = (BitstreamSize)/(ConfigurationRate*DatabusWidth)
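
The four steps above can be added up as a rough power-on-to-configured budget. A minimal sketch, using the example figures from the list (five supplies ramping sequentially at 40 ms each, POR upper bound of 75 ms) and assuming SPIx4 at 33 MHz with a 128 Mb bitstream; the supply numbers are illustrative, not a specific board's sequencer settings, and CCLK tolerance is ignored.

```python
# Example figures from the sequence above (illustrative, not measured)
N_SUPPLIES = 5
RAMP_MS_EACH = 40.0
POR_MAX_MS = 75.0   # upper bound quoted from DS892 (v1.18), Table 89

def bitstream_ms(bits, rate_hz, width):
    # Step 4: Equation 1-1, converted to milliseconds
    return 1e3 * bits / (rate_hz * width)

def power_on_to_done_ms(bits, rate_hz, width):
    ramp = N_SUPPLIES * RAMP_MS_EACH          # step 2, sequential ramps
    return ramp + POR_MAX_MS + bitstream_ms(bits, rate_hz, width)

total = power_on_to_done_ms(128 * 2**20, 33e6, 4)
print(f"{total:.1f} ms")
```

Step 4 is about 1017 ms of the roughly 1292 ms total here, which is why the configuration rate matters most.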


Typically, the time from step 4 dominates the total configuration-time budget. Therefore, an especially important parameter is ConfigurationRate, the frequency of the configuration clock (typically CCLK, but it could be EMCCLK). After opening the implemented design for your Vivado project, you can list the available values for ConfigurationRate by typing the following command in the Tcl Console:

list_property_value BITSTREAM.CONFIG.CONFIGRATE [current_design]


Then use the calculation on pages 56-57 of UG570 (v1.13) to find the maximum allowed value of ConfigRate (= ConfigurationRate) for your FPGA and flash device. The parameter values needed for this calculation come from the datasheets for your FPGA and your flash device.

Once you have the maximum allowed value of ConfigRate, choose an available value XXX.X that does not exceed it, and tell Vivado that value by typing the following command in the Tcl Console:

set_property BITSTREAM.CONFIG.CONFIGRATE XXX.X [get_designs impl_1]


You should also tell Vivado the type of SPI interface (YYYYY = SPIx1 or SPIx4) by typing the following command in the Tcl Console:

set_property config_mode YYYYY [current_design]


Finally, as you have been doing, use the Tcl command calc_config_time to calculate the time it takes to clock the bitstream into the FPGA. The tool gives max and min times, which correspond to the min and max values of ConfigRate that arise from FMCCKTOL (typically +/-35% for UltraScale; see Table 89 in DS892 (v1.18)).
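
The tolerance effect described above can be sketched directly: a CCLK running 35% fast gives the minimum time, and one running 35% slow gives the maximum. A minimal sketch, assuming a nominal 33 MHz SPIx4 clock and a 128 Mb bitstream; it models only the bitstream-clocking term, so it will not reproduce calc_config_time's output exactly (the tool's numbers also include terms such as -por_ramp).

```python
F_MCCKTOL = 0.35   # +/-35% CCLK tolerance for UltraScale, per DS892 Table 89

def bitstream_ms(bits, rate_hz, width):
    return 1e3 * bits / (rate_hz * width)

def time_bounds_ms(bits, nominal_hz, width, tol=F_MCCKTOL):
    """(min, nominal, max) clocking time for a CCLK that may run +/- tol."""
    fastest = bitstream_ms(bits, nominal_hz * (1 + tol), width)  # min time
    nominal = bitstream_ms(bits, nominal_hz, width)
    slowest = bitstream_ms(bits, nominal_hz * (1 - tol), width)  # max time
    return fastest, nominal, slowest

lo, nom, hi = time_bounds_ms(128 * 2**20, 33e6, 4)
print(f"min {lo:.1f} ms, nominal {nom:.1f} ms, max {hi:.1f} ms")
```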

Does all that make sense?

Registered: 06-03-2019

markg@prosensing.com 

That all makes sense. I try to do as much research as I can before asking a question and to provide as much detail as possible when I do.

As of now, I am only concerned with step 4; I figured it would account for most of the delay (I have been looking at a power-supply sequencer for bringing up the supplies). I have been using the Edit Device Properties dialog to make the necessary changes for calc_config_time, after quickly realizing that although clk_freq is listed as a parameter, setting it through calc_config_time does nothing. To make sure there is no difference between using the dialog and setting the properties with Tcl commands, I used set_property to set CONFIGRATE to 33 and config_mode to SPIx4:

set_property BITSTREAM.CONFIG.CONFIGRATE 33 [current_design]

set_property config_mode SPIx4 [current_design]

calc_config_time -max -por_ramp 50 -bitstream_size 134217728

1671.31

This is the same answer I got through the dialog (I prefer graphical interfaces when I can use them). I also set the bitstream size to 128 Mb because that is listed as the minimum configuration flash memory size for the KU040 on page 21 of UG570 (the KU040 is likely the FPGA we will use on the board, with a KU060 board in house for development and debugging). Omitting the bitstream_size parameter gives a time of 1599.49 ms, not significantly less than the estimate with the full 128 Mb.

Currently, I do not have an SPI flash in mind, since we would prefer parallel flash, but from the list of supported flash I picked the MT25QL01G to get the following data. DS892 gives T_SPIDCC (setup/hold) as 3.0/0 ns minimum for D[03:00], and +/-35% for F_MCCKTOL. The MT25QL01G datasheet gives a clock-low-to-output-valid time of 6 ns for loads up to 30 pF and 5 ns for 10 pF. Rewriting Equation 2-1 in UG570 as ConfigRate <= 1/((T_SPIDCC + T_SPITCO)*(1 + F_MCCKTOL)) and substituting the values (T_SPIDCC = 3 ns, T_SPITCO = 6 ns, F_MCCKTOL = 35%) gives a maximum ConfigRate of 82.30 MHz. That is above the hypothetical 33 MHz I have been using, so in theory configuration could take even less time. Using list_property_value for CONFIGRATE, I get:

list_property_value BITSTREAM.CONFIG.CONFIGRATE [current_design]

3 6 9 12 22 33 40 50 57 69 82 87 90 110

So 33 MHz seems like a safe frequency to use for comparison with the asynchronous flash (for the FPGA I currently have, an xcku040-ffva1156-2-i).
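
The calculation above, plus picking a setting from Vivado's list, can be sketched as follows. This assumes the rewritten Equation 2-1 exactly as stated in the post, the DS892 and MT25QL01G values quoted there (6 ns at 30 pF), and the CONFIGRATE list that list_property_value reported for this design; `best_available` is an illustrative helper.

```python
T_SPIDCC_NS = 3.0   # FPGA data setup time from DS892, as quoted
T_SPITCO_NS = 6.0   # MT25QL01G clock-low to output valid, 30 pF load
F_MCCKTOL = 0.35    # CCLK tolerance

# Values reported by list_property_value BITSTREAM.CONFIG.CONFIGRATE here
AVAILABLE_MHZ = [3, 6, 9, 12, 22, 33, 40, 50, 57, 69, 82, 87, 90, 110]

def max_config_rate_mhz(t_setup_ns, t_co_ns, tol):
    # ConfigRate <= 1 / ((T_SPIDCC + T_SPITCO) * (1 + F_MCCKTOL));
    # 1/ns is GHz, so 1e3/ns gives MHz
    return 1e3 / ((t_setup_ns + t_co_ns) * (1 + tol))

def best_available(limit_mhz, choices=AVAILABLE_MHZ):
    """Largest listed CONFIGRATE setting that does not exceed the limit."""
    return max(c for c in choices if c <= limit_mhz)

limit = max_config_rate_mhz(T_SPIDCC_NS, T_SPITCO_NS, F_MCCKTOL)
print(f"limit {limit:.2f} MHz -> highest setting {best_available(limit)} MHz")
```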

A little background: the board set we currently use takes at least 5 seconds to be ready, but I think it is probably over 10 seconds, and my coworker thinks 20 seconds (I do not have a board with me that I can power up and time, but it takes a while and we are tired of the long delay). The previous engineer often did things that were good enough in his opinion, such as bringing up all the FPGA power supplies at the same time or running a USB 3.0 connection at 2.0 speeds. The flash on the current board seems to be capable of SPIx4, but given how long it takes to configure, I assume it is running only as SPIx1 or at a low configuration rate (the HDL for that board is an ISE 14.7 project, and I am not really interested in digging through it to find all the settings used). Another board set, whose HDL does use Vivado, has an estimated maximum configuration time of 6697.57 ms, and it comes up faster than the board we are replacing. Anyway, while reading through the Kintex UltraScale datasheets, designing this new board, and getting the earlier estimate of 1.02 seconds for the configuration time with the values from my previous post (SPIx4 at 33 MHz), we wanted to know how this estimate compared with that of the previous board, and what the actual time for initialization and configuration is, excluding power-supply ramp-up. We then decided to shrink the design by replacing the separate configuration flash and user-data flash with a single parallel flash with a 16-bit bus, storing both the configuration and the user data and, hopefully, speeding up configuration.
That is when I went through the list of supported flash and found that all of the synchronous flash devices are now obsolete. Hence the question: is SPI configuration now faster than BPI configuration, because all asynchronous flash devices have read cycle times and chip-enable-to-output delays around 100 ns, which severely limits the maximum frequency (as I said, the typical frequency stated in XAPP1220 is 6 MHz)? I have not calculated what frequency we could reach with our current design, but it will most likely be 3 MHz or 6 MHz (I am currently considering the S29GLxxx family), which would make the configuration time longer than with the SPI flash. We would prefer a parallel flash because its interface is more straightforward for accessing and programming the user data (yes, I know about the SPI IP and the EMC IP; I have not looked into them fully yet to see whether they offer everything we want). While writing this question, I discovered calc_config_time (I had glanced over it when first reading the datasheet), so I wanted to use it, assuming it would be one of the first things I was pointed to. Because we are going to have only one flash, I also want to be familiar with how to access it: do we need to use STARTUPE3, as the post I linked suggests, or can I connect the five signals that go to bank 0 to another bank as well, so that we do not need STARTUPE3?
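
The ~100 ns access-time argument above can be turned into a rough upper bound. A minimal sketch under a simplifying assumption: each 16-bit word needs one asynchronous access, so one CCLK period must cover the ~100 ns access time even when CCLK runs 35% fast; FPGA setup and board delays are ignored, so this is not UG570's BPI timing equation. It also assumes the CONFIGRATE list reported earlier for the SPIx4 design, which may differ in BPI mode.

```python
T_ACC_NS = 100.0    # ~100 ns async read / CE-to-output time, as stated
F_MCCKTOL = 0.35    # CCLK tolerance
AVAILABLE_MHZ = [3, 6, 9, 12, 22, 33, 40, 50, 57, 69, 82, 87, 90, 110]

def async_bpi_limit_mhz(t_acc_ns, tol=F_MCCKTOL):
    # Simplified bound: CCLK period at its fastest must still cover t_acc.
    # Ignores FPGA setup time and PCB delays, which lower the limit further.
    return 1e3 / (t_acc_ns * (1 + tol))

limit = async_bpi_limit_mhz(T_ACC_NS)
setting = max(c for c in AVAILABLE_MHZ if c <= limit)
print(f"limit {limit:.2f} MHz -> setting {setting} MHz")
```

Even this optimistic bound lands at the 6 MHz setting, consistent with the XAPP1220 typical value and below the ~8.25 MHz needed to match SPI x4 at 33 MHz.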

It sounds safe to assume that, whichever method we use, the new board will configure faster than the current one, unless there is another delay of more than a second that I need to take into account and that has not been mentioned.

I am still curious whether SPI is now faster than BPI simply because only asynchronous flash is both supported and in active production, or whether there is a flash I missed when looking through everything, or a newly supported flash that is not yet listed in UG908. My coworker and I are a little baffled that the synchronous flash is obsolete.

Also, any help with using STARTUPE3, or with routing the board to avoid STARTUPE3 altogether, is still appreciated. I do not want to get the board assuming I know how to communicate with the flash, only to spend a significant amount of time figuring out how to talk to it (though I am sure there will still be some of that). There are only a few pages of documentation on using STARTUPE3.

Thanks,

Steven

Registered: 01-22-2015

Steven,

“I am still curious about the question of is SPI now faster than BPI because only asynchronous flash being supported and in active production, or is there a flash that I missed when looking through everything/a new flash that is supported that is not listed in UG908?”

I am not familiar with using asynchronous flash, nor do I know why the synchronous parts now seem to be obsolete. However, it seems that you can easily achieve your goal of a configuration time under 5 seconds using SPIx1 or SPIx4 NOR flash, so there may be no need to research asynchronous flash further.

“Also, any help with using STARTUPE3 or routing the board to avoid using STARTUPE3 altogether is still appreciated.”

I think you have been unnecessarily frightened of the STARTUPE3 primitive. If you are going to read and write the flash using a slow (~10 MHz) clock, then STARTUPE3 should not pose problems. See “Timing Considerations for Flash Connections” on page 123 of UG570 before attempting to use STARTUPE3 with a fast clock.

Here are some thoughts on communicating with the configuration flash from your HDL:

  1. First, look at the schematic for your configuration flash interface. For our discussion, we'll use SPIx1, which is described by Figure 2-2 in UG570 (v1.13). Note the FPGA pins D00_MOSI, D01_DIN, FCS_B, and CCLK that connect to the flash IC.
  2. Next, go to Table 1-9 in UG570 and find these FPGA pins. You will find that they are all “Dedicated” (as opposed to Multi-Function) pins, so we cannot access them directly from our HDL as we can with Multi-Function pins. However, we can access them by instantiating the STARTUPE3 primitive in our design (see page 642 of UG974 (v2019.2)) and then talking to the STARTUPE3 as if it were the flash.
    [Image: STARTUPE3 excerpt from UG570]
  3. Xilinx has IP that can help you read and write flash (see Xilinx documents PG153 and XAPP1282). However, I have seen a few posts on this forum from people struggling to use this IP when STARTUPE3 is involved, so search the forum for those posts before committing to it.
  4. You can also write your own HDL to read and write the flash. The following posts may help you do this.

https://forums.xilinx.com/t5/FPGA-Configuration/Qspi-flash-memory/m-p/1059383#M15432

https://forums.xilinx.com/t5/FPGA-Configuration/Qspi-Flash-memory/m-p/1061208#M15558

Registered: 06-03-2019

Thanks for the aid.

I somewhat answered my own question about configuration speed when I discovered calc_config_time, but I still wanted to post on the off chance that someone knew of a flash I had somehow missed that would be faster than the SPI option.

I was also hoping a Xilinx employee would respond or see this, though I know they would just say the only supported flash devices are those listed in UG908. A revision note in UG908 and/or in the configuration documents marking obsolete flash would be appreciated. It is a little annoying when page 2 of XAPP1233 says that if you want shorter configuration times you should consider master BPI (x16), so you look into it, only to discover that all of the parallel flash that would configure faster than master SPI is obsolete. But that is the nature of technology; things go obsolete. I would have expected the slow asynchronous flash to go obsolete instead of the faster synchronous flash, but that is out of Xilinx's control. Hopefully, the next generation of FPGAs will support some of the newer flash technologies available.

The STARTUPE3 primitive did frighten me to some extent, especially when I had to worry about the flash signals being spread across bank 0 and bank 65. But my coworker agreed that SPI flash makes more sense (smaller package, fewer pins, and, strangely, faster than parallel), so now all of the flash signals are contained in bank 0 and will all need to go through the STARTUPE3 primitive. I am more familiar with writing HDL than with board design (this is my first board), so accounting for all of the board delays and how components will be accessed is new to me. I am much more accustomed to debugging HDL and working on it until it works properly than to creating a physical board where, if something is wrong, a whole new set of boards needs to be made (unless it is a simple matter of replacing a component).

-Steven

Registered: 01-22-2015

Steven,

“I somewhat answered my own question about the configuration speed when I discovered calc_config_time”

The calc_config_time tool and the maximum CCLK speed you calculated apply to configuring the FPGA from SPI flash. After the FPGA is configured, you can talk to the SPI flash through the STARTUPE3 primitive. However, you will then need to use a clock much slower than the configuration CCLK. Xilinx documentation is not clear on how fast you can clock SPI flash through STARTUPE3, but with trial and error you should be able to find a speed that works. FYI, in a 7-Series FPGA we talk to SPI flash through STARTUPE2 using a 15 MHz clock.
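
Since the usable clock through STARTUPE3 is found by trial and error, it can help to know which SCK frequencies a design clock can actually produce. A minimal sketch, assuming a hypothetical 100 MHz design clock and an SCK generated by a counter that toggles every N/2 design-clock cycles (so N must be even for a 50% duty cycle); `sck_divider` is an illustrative helper, not a Xilinx function.

```python
import math

def sck_divider(f_sys_hz, f_target_hz):
    """Smallest even divider N such that f_sys / N <= f_target.

    An even N lets a counter toggle SCK every N/2 system clocks,
    giving a 50% duty cycle.
    """
    n = math.ceil(f_sys_hz / f_target_hz)
    return n + (n % 2)

F_SYS_HZ = 100e6   # hypothetical design clock
for target in (25e6, 15e6, 10e6, 5e6):
    n = sck_divider(F_SYS_HZ, target)
    print(f"target {target / 1e6:g} MHz -> divide by {n}, "
          f"SCK {F_SYS_HZ / n / 1e6:.2f} MHz")
```

With a 100 MHz clock, a 15 MHz target rounds down to 12.5 MHz, so stepping through the achievable divider values is a simple way to structure the trial-and-error search.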

Mark
