thutt
Newbie
17,668 Views
Registered: 06-07-2009

Design fails to meet timing constraints -- need to be pointed in right direction

I've got a problem, and I'm stuck.  I'm hoping someone who reads these
postings will be able to point me in the right direction so I can get
back on track.  Here's a little about my project:

   -- Spartan 3E
   -- Using WebPack 9.1i
   -- Using 1 DCM, 50 MHz in, 50, 100 & 200 MHz out.
   -- Using a block RAM configured as 32x512
   -- Using the serial port clocked at 50 MHz from DCM

I'm trying to get a simple DDR controller working.  The DDR device's
commands are clocked at 100MHz, and the DDR part of the design (the
burst accesses) is performed using the 200MHz clock.

On the rising edge of the 200 MHz clock, I write many of the signals
to the block RAM; it's sort of a little logic analyzer.

After a single write & read, I transmit the data contained in the RAM
across the serial port at 230Kbps to a custom program running on my
laptop.  This custom program decodes the data and dumps the values of
the signals for each clock cycle.

The problem is that this design doesn't meet timing constraints, and
ends up being assigned a 'score' over 10,000 from 'par'.

I've used 'timingan' (Timing Analyzer) and 'trce' (Trace), and do know
what signals are not meeting the timing requirements, but I don't know
how to approach this problem.

I know I need to input some kind of constraint, but I don't know what
kind, nor where to start.  The Xilinx documentation doesn't have an
overview or tutorial about constraints (in general) or timing
constraints (in particular), and the 'samples' they have are not
concrete.

I can easily provide more details if desired.

Can someone point me in the right direction?

Do I need timing constraints or area constraints, or something else?
If I need timing constraints over area constraints, how is this
determined?  How do I determine *which* of the multitude of
constraints to use?
11 Replies
jprovidenza
Voyager
17,657 Views
Registered: 08-30-2007

If you know which signals have timing violations, you need to decide why they
are having timing problems.  It could be that your design simply has too many
levels of logic and it is hopeless to meet timing.  It could be that your
constraints are wrong.

Why don't you post some details from the timing report?

John Providenza

reedbement
Newbie
17,642 Views
Registered: 06-08-2009

I also found the constraints documentation a bit thin. You might take a look at this tutorial to get started with the basics: Basic Timing Constraints Tutorial

Full documentation: Timing Constraints User Guide

-Reed

thutt
Newbie
17,628 Views
Registered: 06-07-2009

> If you know which signals have timing violations, you need to decide why they
> are having timing problems.  It could be your design simply has too many levels of
> logic and it is hopeless to meet timing.

How does one know if there are too many levels, or if it's hopeless?
It's not clear to me how this is determined.

> It could be your constraints are wrong.

Could be.  Quite likely, in fact.  The only constraints I have are ones that
physically constrain the I/O pins to particular locations, and one that
specifies the period of the 50 MHz system clock.

The Xilinx tools take care of making period constraints for the clocks
which are output from the DCM.

> Why don't you post some details from the timing report?

Ok, I can do that.  Hope this isn't too long, and I hope the forum software doesn't mangle the formatting.

 

 

 From the log file of 'ngdbuild':

    Checking timing specifications ...
       CLK0: TS_S3E_DCM_fast_clocks_clk_100_200_CLK0_BUF=PERIOD
    S3E_DCM_fast_clocks_clk_100_200_CLK0_BUF TS_CLK_50MHZ*1 HIGH 50%
       CLK2X: TS_S3E_DCM_fast_clocks_clk_100_200_CLK2X_BUF=PERIOD
    S3E_DCM_fast_clocks_clk_100_200_CLK2X_BUF TS_CLK_50MHZ/2 HIGH 50%
       CLK2X180: TS_S3E_DCM_fast_clocks_clk_100_200_CLK2X180_BUF=PERIOD
    S3E_DCM_fast_clocks_clk_100_200_CLK2X180_BUF TS_CLK_50MHZ/2 PHASE + 5 nS HIGH
    50%
       CLKFX: TS_S3E_DCM_fast_clocks_clk_100_200_CLKFX_BUF=PERIOD
    S3E_DCM_fast_clocks_clk_100_200_CLKFX_BUF TS_CLK_50MHZ/4 HIGH 50%
    WARNING:XdmHelpers:662 - Period specification
       "TS_S3E_DCM_fast_clocks_clk_100_200_CLK2X_BUF" references the TNM group
       "S3E_DCM_fast_clocks_clk_100_200_CLK2X_BUF", which contains both pads and
       synchronous elements. The timing analyzer will ignore the pads for this
       specification. You might want to use a qualifier (e.g. "FFS") on the TNM
       property to remove the pads from this group.
    WARNING:XdmHelpers:662 - Period specification
       "TS_S3E_DCM_fast_clocks_clk_100_200_CLK2X180_BUF" references the TNM group
       "S3E_DCM_fast_clocks_clk_100_200_CLK2X180_BUF", which contains both pads and
       synchronous elements. The timing analyzer will ignore the pads for this
       specification. You might want to use a qualifier (e.g. "FFS") on the TNM
       property to remove the pads from this group.
    Checking expanded design ...

Below is from the 'twr' file; the only constraint I created is for the
'clk_50mhz' signal (the second to last one in this list).  All others
were created by the tools:

   Number of Timing Constraints that were not applied: 2

   Asterisk (*) preceding a constraint indicates it was not met.
      This may be due to a setup or hold violation.

   ------------------------------------------------------------------------------------------------------
     Constraint                                |  Check  | Worst Case |  Best Case | Timing |   Timing
                                               |         |    Slack   | Achievable | Errors |    Score
   ------------------------------------------------------------------------------------------------------
   * TS_S3E_DCM_fast_clocks_clk_100_200_CLKFX_ | SETUP   |    -0.996ns|     5.996ns|      37|       14427
     BUF = PERIOD TIMEGRP         "S3E_DCM_fas | HOLD    |     0.599ns|            |       0|           0
     t_clocks_clk_100_200_CLKFX_BUF" TS_CLK_50 |         |            |            |        |
     MHZ / 4 HIGH 50%                          |         |            |            |        |
   ------------------------------------------------------------------------------------------------------
     TS_S3E_DCM_fast_clocks_clk_100_200_CLK0_B | SETUP   |     0.499ns|    18.004ns|       0|           0
     UF = PERIOD TIMEGRP         "S3E_DCM_fast | HOLD    |     0.801ns|            |       0|           0
     _clocks_clk_100_200_CLK0_BUF" TS_CLK_50MH |         |            |            |        |
     Z HIGH 50%                                |         |            |            |        |
   ------------------------------------------------------------------------------------------------------
     TS_S3E_DCM_fast_clocks_clk_100_200_CLK2X_ | SETUP   |     3.728ns|     6.272ns|       0|           0
     BUF = PERIOD TIMEGRP         "S3E_DCM_fas | HOLD    |     1.071ns|            |       0|           0
     t_clocks_clk_100_200_CLK2X_BUF" TS_CLK_50 |         |            |            |        |
     MHZ / 2 HIGH 50%                          |         |            |            |        |
   ------------------------------------------------------------------------------------------------------
     TS_CLK_50MHZ = PERIOD TIMEGRP "clk_50mhz" | N/A     |         N/A|         N/A|     N/A|         N/A
      20 ns HIGH 40%                           |         |            |            |        |
   ------------------------------------------------------------------------------------------------------
     TS_S3E_DCM_fast_clocks_clk_100_200_CLK2X1 | N/A     |         N/A|         N/A|     N/A|         N/A
     80_BUF = PERIOD TIMEGRP         "S3E_DCM_ |         |            |            |        |
     fast_clocks_clk_100_200_CLK2X180_BUF" TS_ |         |            |            |        |
     CLK_50MHZ / 2 PHASE         5 ns HIGH 50% |         |            |            |        |
   ------------------------------------------------------------------------------------------------------

From the 'drc' output:

    WARNING:PhysDesignRules:367 - The signal <rs232_dce_rxd_IBUF> is incomplete. The
       signal does not drive any load pins in the design.
    INFO:PhysDesignRules:772 - To achieve optimal frequency synthesis performance
       with the CLKFX and CLKFX180 outputs of the DCM comp
       S3E_DCM_fast_clocks.clk_100_200/DCM_SP_INST/S3E_DCM_fast_clocks.clk_100_200/D
       CM_SP_INST, consult the device Interactive Data Sheet.
    DRC detected 0 errors and 1 warnings.

The first warning is expected.  I have no idea what the second message
means, and apparently neither does Xilinx, because it's not
documented.  If you know how to make it go away, please tell me.

Finally, below is the output from the Timing Analyzer Tool
(timingan).  The 'ddr/initializer' path is theoretically connected
to the block ram (logic_analyzer/Mram_data.A), but the path is not
active.  In other words, no data is stored into the block ram while
the DDR initialization is occurring; this path needs to only run at
100 MHz, and it needs to be providing output to the 'sd_cas' (and
related signals) at 100 MHz.

    Timing Improvement Wizard
    Data Path: ddr/initializer/index_2 to logic_analyzer/Mram_data.A
      Delay type         Delay(ns)  Logical Resource(s)
      ----------------------------  -------------------
      Tcko                  0.514   ddr/initializer/index_2
      net (fanout=23)    e  0.444   ddr/initializer/index<2>
      Tilo                  0.660   ddr/initializer/watchdog_rp/plus_one_watchdog_timer.plus_one_watchdog_timer/Mcount_cycles_xor<1>121
      net (fanout=4)     e  0.940   ddr/initializer/N9
      Tilo                  0.660   ddr/cmd_1_mux000014
      net (fanout=1)     e  0.381   ddr/cmd_1_mux0000_map6
      Tilo                  0.660   ddr/cmd_1_mux000040_SW0
      net (fanout=1)     e  0.381   N1432
      Tilo                  0.660   ddr/cmd_1_mux000040
      net (fanout=2)     e  0.389   sd_cas_OBUF
      Tbdck                 0.227   logic_analyzer/Mram_data.A
      ----------------------------  ---------------------------
      Total                 5.916ns (3.381ns logic, 2.535ns route)
                                    (57.2% logic, 42.8% route)

  --------------------------------------------------------------------------------
  Slack:                  -0.879ns (requirement - (data path - clock path skew + uncertainty))
    Source:               ddr/initializer/index_4 (FF)
    Destination:          logic_analyzer/Mram_data.A (RAM)
    Requirement:          5.000ns
    Data Path Delay:      5.879ns (Levels of Logic = 4)
    Clock Path Skew:      0.000ns
    Source Clock:         sd_ck_p_OBUF rising at 0.000ns
    Destination Clock:    clk_burst rising at 5.000ns
    Clock Uncertainty:    0.000ns
    Timing Improvement Wizard
    Data Path: ddr/initializer/index_4 to logic_analyzer/Mram_data.A
      Delay type         Delay(ns)  Logical Resource(s)
      ----------------------------  -------------------
      Tcko                  0.514   ddr/initializer/index_4
      net (fanout=18)    e  0.407   ddr/initializer/index<4>
      Tilo                  0.660   ddr/initializer/watchdog_rp/plus_one_watchdog_timer.plus_one_watchdog_timer/Mcount_cycles_xor<1>121
      net (fanout=4)     e  0.940   ddr/initializer/N9
      Tilo                  0.660   ddr/cmd_1_mux000014
      net (fanout=1)     e  0.381   ddr/cmd_1_mux0000_map6
      Tilo                  0.660   ddr/cmd_1_mux000040_SW0
      net (fanout=1)     e  0.381   N1432
      Tilo                  0.660   ddr/cmd_1_mux000040
      net (fanout=2)     e  0.389   sd_cas_OBUF
      Tbdck                 0.227   logic_analyzer/Mram_data.A
      ----------------------------  ---------------------------
      Total                 5.879ns (3.381ns logic, 2.498ns route)
                                    (57.5% logic, 42.5% route)

  --------------------------------------------------------------------------------
  Slack:                  -0.870ns (requirement - (data path - clock path skew + uncertainty))
    Source:               ddr/initializer/index_3 (FF)
    Destination:          logic_analyzer/Mram_data.A (RAM)
    Requirement:          5.000ns
    Data Path Delay:      5.870ns (Levels of Logic = 4)
    Clock Path Skew:      0.000ns
    Source Clock:         sd_ck_p_OBUF rising at 0.000ns
    Destination Clock:    clk_burst rising at 5.000ns
    Clock Uncertainty:    0.000ns
    Timing Improvement Wizard
    Data Path: ddr/initializer/index_3 to logic_analyzer/Mram_data.A
      Delay type         Delay(ns)  Logical Resource(s)
      ----------------------------  -------------------
      Tcko                  0.567   ddr/initializer/index_3
      net (fanout=20)    e  0.345   ddr/initializer/index<3>
      Tilo                  0.660   ddr/initializer/watchdog_rp/plus_one_watchdog_timer.plus_one_watchdog_timer/Mcount_cycles_xor<1>121
      net (fanout=4)     e  0.940   ddr/initializer/N9
      Tilo                  0.660   ddr/cmd_1_mux000014
      net (fanout=1)     e  0.381   ddr/cmd_1_mux0000_map6
      Tilo                  0.660   ddr/cmd_1_mux000040_SW0
      net (fanout=1)     e  0.381   N1432
      Tilo                  0.660   ddr/cmd_1_mux000040
      net (fanout=2)     e  0.389   sd_cas_OBUF
      Tbdck                 0.227   logic_analyzer/Mram_data.A
      ----------------------------  ---------------------------
      Total                 5.870ns (3.434ns logic, 2.436ns route)
                                    (58.5% logic, 41.5% route)

  --------------------------------------------------------------------------------

  1 constraint not met.


The initialization is a 21-step process for which the steps are
controlled via 'index' (which is implemented as an integer counter by
XST).  As far as I can tell, the bits of 'index' are not propagating
fast enough to meet the '5.000ns' (200 MHz, the speed of the block
ram) requirement.  Can I ignore this path?
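
For what it's worth, the slack numbers in the report above can be reproduced by hand from the formula the tool prints next to each slack value.  The sketch below is plain Python, not anything from the Xilinx tools; 'slack_ns' is a made-up helper name, and the 5.916 ns entry assumes the first path (whose header was not pasted) has the same 5.000 ns requirement as the other two.

```python
def slack_ns(requirement, data_path, clock_skew=0.0, uncertainty=0.0):
    """Setup slack as defined in the timing report:
    requirement - (data path - clock path skew + uncertainty)."""
    return requirement - (data_path - clock_skew + uncertainty)


# Data path delays (ns) copied from the three paths in the report above.
# The requirement is 5.000 ns: source clock edge at 0 ns, destination
# clock edge at 5 ns (assumed for index_2, whose header is missing).
data_path_delays = {
    "ddr/initializer/index_2": 5.916,
    "ddr/initializer/index_4": 5.879,
    "ddr/initializer/index_3": 5.870,
}

for source, delay in data_path_delays.items():
    print(f"{source}: slack = {slack_ns(5.000, delay):+.3f} ns")
```

A negative result means the data arrives after the destination clock edge; the magnitude is how much delay would have to be removed from the path (or how much the requirement would have to be relaxed) for it to pass.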

Another issue that I've seen from the output of XST is this:

   Timing Summary:
   ---------------
   Speed Grade: -5

      Minimum period: 5.067ns (Maximum Frequency: 197.367MHz)
      Minimum input arrival time before clock: 2.945ns
      Maximum output required time after clock: 9.149ns
      Maximum combinational path delay: 6.065ns

   <snip>

   =========================================================================
   Timing constraint: Default OFFSET OUT AFTER for Clock 'S3E_DCM_fast_clocks.clk_100_200/CLK2X_BUF'
     Total number of paths / destination ports: 139 / 28
   -------------------------------------------------------------------------
   Offset:              9.149ns (Levels of Logic = 5)
     Source:            ddr/initializer/index_1 (FF)
     Destination:       sd_we (PAD)
     Source Clock:      S3E_DCM_fast_clocks.clk_100_200/CLK2X_BUF rising

     Data Path: ddr/initializer/index_1 to sd_we
                                   Gate     Net
       Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
       ----------------------------------------  ------------
        FDE:C->Q             23   0.514   1.091  ddr/initializer/index_1 (ddr/initializer/index_1)
        LUT3_D:I1->O          4   0.612   0.651  ddr/initializer/index_and0000231 (ddr/initializer/N14)
        LUT4:I0->O            1   0.612   0.509  ddr/cmd_0_mux00009 (ddr/cmd_0_mux0000_map5)
        LUT4:I0->O            1   0.612   0.387  ddr/cmd_0_mux000018 (ddr/cmd_0_mux0000_map7)
        LUT3:I2->O            2   0.612   0.380  ddr/cmd_0_mux000024 (sd_we_OBUF)
        OBUF:I->O                 3.169          sd_we_OBUF (sd_we)
       ----------------------------------------
       Total                      9.149ns (6.131ns logic, 3.018ns route)
                                          (67.0% logic, 33.0% route)

This tells me that the 'sd_we' output signal is arriving at 9.149 ns
(109.301 MHz).  Is that a correct assessment?

This path needs to run at 100 MHz to meet DDR SDRAM requirements.

How can I tell if this 109 MHz is fast enough, all other things
considered?  Is it fast enough?
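
The 109 MHz figure is just the reciprocal of the 9.149 ns delay.  A small Python sketch of that arithmetic follows; the helper names are hypothetical.  Note the assumption baked into it: it treats a clock-to-pad offset as if it were a clock period, so the result is only a rough indicator.  Whether the signal is really "fast enough" depends on the setup time the DDR part needs at its pin, which this calculation does not know about.

```python
def ns_to_mhz(period_ns):
    """Frequency in MHz whose period is period_ns nanoseconds."""
    return 1000.0 / period_ns


def mhz_to_ns(freq_mhz):
    """Period in nanoseconds of a clock running at freq_mhz MHz."""
    return 1000.0 / freq_mhz


offset_out_ns = 9.149           # clock-to-pad delay for sd_we, from XST
command_clock_mhz = 100.0       # DDR command clock from the original post

equivalent_mhz = ns_to_mhz(offset_out_ns)
margin_ns = mhz_to_ns(command_clock_mhz) - offset_out_ns

print(f"9.149 ns corresponds to about {equivalent_mhz:.1f} MHz")
print(f"time left in a 100 MHz cycle after the pad switches: {margin_ns:.3f} ns")
```

The second number is the part that matters: whatever is left of the 10 ns cycle has to cover board delay plus the setup time of the DDR device.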

As a bit of background, please consider that I'm an uneducated person.
I'm a software person by trade (compilers, linkers and other low level
things like virtualization engines -- nearly 30 years of programming
avocationally and vocationally).  But, hardware is relatively new to
me; I am trying to learn something about how to make hardware, so I
may use the wrong terminology and ask what seem like stupid questions.

If you point me in the right direction, I'm more than happy to go off
and experiment to find the answer; I'm not asking for a direct
solution to my problem -- just a guiding hand from people who've
experienced this.

thutt

thutt
Newbie
17,629 Views
Registered: 06-07-2009

Thanks for the pointer to the constraints tutorial on Scribd.  I had briefly looked at it before, but was unable to get it to print correctly.

If you actually download the PDF, it will print correctly from a PDF viewer.  (I wonder how much spam I'm going to get
for signing up to that... in fact, you don't need to sign in again after you sign up, so feel free to make up an email address
if you just want to download something!)

If it doesn't help, I'll most likely be back with more questions!

jprovidenza
Voyager
17,608 Views
Registered: 08-30-2007

Thutt -

Here are a couple of comments.

a) If you're doing a DDR controller, you typically need more timing constraints to guarantee that
the signal timing to the IO pins is correct.  You need to look at the setup/hold times required
by the DDR parts to create your OFFSET OUT constraints and look at the DDR clk->Q delays
to set your OFFSET IN parameters.

b) I noticed that the signal SD_WE goes through some logic, then goes to the IO pad.  This hurts
its IO timing.  If you drive SD_WE directly from a flip-flop, it can be put in an IOB and you'll
get much better IO timing.  You can do the same trick on inputs to get much better input timing.

c) Side comment: how are you creating the clock to the DDR RAM?  I assume from inside the FPGA.
If so, see comment (b).  You want to make sure you're using the DDR flops in the IOB to forward
the clock from FPGA internal to IO pad.  You can find app-notes on why this is the best/cleanest
way to forward a clock from FPGA internal to outside world.

d) For your timing error, try adding a timing constraint to make the P&R tool work harder on that
path. Try something like TIMESPEC = from FFS(ddr/initializer/index*) 4.9 ns;  to see if it makes a
difference.   This *should* speed up the path from ddr/initializer/index* for all signals that they
go to.  If you truly *know* that your BRAM only needs to run at 100 MHz, you could use a constraint
similar to  TIMESPEC = from FFS(ddr/initializer/index*) to RAMS(logic_analyzer*) 9 ns;

e) How many levels of logic are OK?  Depends (of course) on clock speed and part speed.  If you
look at the timing report, you can see that each level of logic is about 1 nsec.  So, 4 levels of logic
and a driver makes it harder to achieve timing.  Possible, but harder.  7 would probably be fatal.
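
To put rough numbers on (e): the sketch below estimates path delay as a function of logic levels, using the Tcko, Tilo and Tbdck values and the net delays from the index_4 path in the timing report posted earlier.  It is plain Python with made-up helper names, and it assumes every level costs one LUT delay plus one average net delay, which real placement and routing will not honor exactly.  The numbers only apply to that part, speed grade and destination (a block RAM input).

```python
# Delays in ns, taken from the index_4 path in the posted timing report.
TCKO = 0.514             # source flip-flop clock-to-out
TILO = 0.660             # one LUT (one level of logic)
TSETUP = 0.227           # Tbdck, setup at the block RAM input
NET_AVG = 2.498 / 5      # total routing delay spread over the 5 nets


def estimated_delay_ns(levels):
    """Rough register-to-register delay for a path with `levels` LUTs.
    A path with N LUTs has N + 1 nets (FF->LUT, LUT->LUT..., LUT->dest)."""
    return TCKO + TSETUP + (levels + 1) * NET_AVG + levels * TILO


def max_levels(period_ns):
    """Largest number of LUT levels whose estimate still fits the period."""
    levels = 0
    while estimated_delay_ns(levels + 1) <= period_ns:
        levels += 1
    return levels


for period in (10.0, 5.0, 2.5):
    print(f"{period:4.1f} ns period: about {max_levels(period)} level(s) of logic")
```

With these figures each extra level costs a bit over 1.1 ns, so the estimate for four levels lands on the 5.879 ns the report shows, just past a 5 ns requirement.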

 

If you are rolling your own DDR controller, you should look at OpenCores.org to see how theirs
works.  I've never used their controller, but you may get some good ideas from it on tricks
for clock forwarding, what needs to be constrained, etc.

Hope this helps!

John Providenza

thutt
Newbie
17,579 Views
Registered: 06-07-2009

> Here are a couple of comments.




> a) If you're doing a DDR controller, you typically need more timing
> constraints to guarantee that the signal timing to the IO pins is
> correct.
>
> You need to look at the setup/hold times required by the DDR parts
> to create your OFFSET OUT constraints and look at the DDR clk->Q
> delays to set your OFFSET IN parameters.

Yes, I have been aware of this, but there are two reasons why I have
not done anything in this realm so far:

1. I didn't know *how* to specify the constraints because the
Xilinx documentation gives no concrete examples -- just syntax
examples.

2. As long as the rest of the design didn't meet timing, there
didn't seem much use in trying to figure out how to define
constraints so the input & output would be perfect.

> b) I noticed that the signal SD_WE goes through some logic, then goes
> to the IO pad. This hurts its IO timing.

> If you drive the SD_WE directly from a flip-flop, it can be put
> in an IOB and you'll get much better IO timing. You can do the
> same trick on inputs to get much better input timing.

Thank you! This suggestion hit the spot. I've moved the assignment
under control of a clock (to induce the creation of a FF), and my
score went from somewhere between 15,000 and 20,000 down to 5,000.

The biggest complaint of the timing analyzer is that the code that
performs a burst write is too slow. The signal needs to be 2.5 ns,
but the best it can do is 2.7 ns.

I'm investigating that, and maybe I'll come up with another way to
write it altogether.

I'd like to experiment with making an array with Gray Code index
values, but Xilinx's implementation of enumerations in XST is deficient
enough that I don't believe it is possible.

> c) side-comment.. how are you creating the clock to the DDR ram? I
> assume from in the FPGA. If so, see comment (b). You want to make
> sure you're using the DDR flops in the IOB to forward the clock
> from FPGA internal to IO pad. You can find app-notes on why this
> is the best/cleanest way to forward a clock from FPGA internal to
> outside world.

The clock to the DDR SDRAM is created from one of the DCMs on the S3E
board. I take the 50 MHz board clock, and produce three clocks out:
CLK2X, CLK2X180, and CLKFX. CLKFX is 200MHz.

I will look for the mentioned app notes; is there anything special I
need to do to accomplish your suggestion? In an ideal world, the app
notes will guide the way... but I find Xilinx docs often raise more
questions than they answer.

> d) For your timing error, try adding a timing constraint to make the
> P&R tool work harder on that path. Try something like TIMESPEC =
> from FFS(ddr/initializer/index*) 4.9 ns; to see if it makes a

This hint was enough to get me on the right track for understanding a
small part of timing constraints. FWIW, the actual syntax I needed
was this:

TIMESPEC ts_write_burst_command =
FROM FFS(ddr/writer/write_burst_command/index*) TO FFS 2.4 ns;

This was what I tried to use on the current timing hotspot.
It didn't work; the best the tools could do was 2.7 ns.

> difference. This *should* speed up the path from
> ddr/initializer/index* for all signals that they go to. If you
> truly *know* that you BRAM only needs to run at 100 MHz, you could
> use a constraint similar to TIMESPEC = from
> FFS(ddr/initializer/index*) to RAMS(logic_analyzer*) 9 ns;

After I caused the assignment for the DDR pins to go through a FF,
these particular errors went away (as mentioned above). I'm going to
try my hand at figuring out how to address the current hot-spot of
timing issues. If I can't make headway, I'll be back.

> e) How many levels of logic are OK? Depends (of course) on clock
> speed and part speed. If you look at the timing report, you can
> see that each level of logic is about 1 nsec. So, 4 levels of
> logic and a driver makes it harder to achieve timing. Possible,
> but harder. 7 would probably be fatal.

Thanks! That was very informative. This confirms what I thought! I
originally started with a software notion of 'divide & conquer'. Take
small pieces and implement them. Build up. This technique doesn't
work so well in hardware because the traces get too long. As a
compiler guy, I tend to look at it from the perspective that the tools
are not doing enough optimization yet. Maybe someday....

> If you are rolling your own DDR controller, you should look at
> OpenCores.org to see how theirs works. You may get some good ideas
> from there. I've never used their controller, but you may get some
> ideas on tricks for clock forwarding, what needs constrained, etc.

I was originally using the 'ddr_ctrl.vhd' from the Plasma (mlite)
project. It's not written by a software guy, is my guess. No
abstractions at all in that code; everything is 'std_logic'. As a
software guy, I don't understand why hardware people don't take
advantage of things that make programming easier -- like 'boolean' or
other user-defined types. Oh well, it's really a different mindset
for creating hardware, as compared to creating software.

I have deciphered how most of it works, but he runs it at 70MHz (which
I think is out-of-spec), so it might not be the best design to
examine.

Are you aware of others?

Again, thanks for your help. I've got a few new ideas in the
toolchest and I'll work through some iterations with them to see what
they boil down to.

thutt
jprovidenza
Voyager
17,574 Views
Registered: 08-30-2007

Thutt -

Take a look at OpenCores.org.  They have at least one DDR controller that you could look at
for sample ideas.  Doing a DDR controller as an initial project is very ambitious.  There are lots
of potential timing problems & odd tricks that need to be done.

My question on how you're generating the DDR clock is "what type of IO cell are you using to drive
the pad that goes to the DDR RAM clock pin".  You *must* use the DDR flops in the Xilinx IOB if
you want to properly forward it from inside the FPGA to the external world.

In general, for a design like this, a good rule of thumb is that every signal leaving the FPGA should
be driven by a flip-flop in the IOB.  You may want every incoming signal to also use the flip-flop
in the IOB.  This makes IO timing much more consistent.

When you're trying to implement high speed logic, you tend to need to think about how the HDL being
written will appear in actual gates.  While lots of high-level abstractions are nice, you need to keep
an eye on performance as you design.  As you're discovering, meeting a 2.5 nsec timing spec can
be very demanding.

I completely agree that the Xilinx app-notes can sometimes raise more questions than they appear to answer,
but at least you start to learn what questions you need to ask.  There are lots of web resources to help
you find the answers.

John Providenza

thutt
Newbie
17,558 Views
Registered: 06-07-2009

> Take a look at OpenCores.org They have at least one DDR controller
> that you could look at for sample ideas.

Last time I looked, I didn't see anything that was specific for DDR.
But, it won't hurt to look again.

> Doing a DDR controller as an initial project is very ambitious.
> Lots of potential timing problems & odd tricks that need to be done.

Oh, this isn't my first design. Though it is the first with really
tight constraints to external hardware. My goal / project is to build
a single board computer on the S3E that is accessible to people who
don't know hardware -- or perhaps software -- but want to learn. It
will be a full blown computer with development tools (already done)
and access to the source. My view is that trying to do something with
computers you buy today (as compared to 20 - 25 years ago) is nigh
impossible. So, I wanted to put fun back into computing again. At
least for myself.

I've already got the computer part working. I'm using a RAM simulator
at this point; to implement RAM, I use the serial port hooked up to a
custom program on my Linux machine. While I always knew it was going
to be slow, once I finally had the keyboard hooked up and I could
interactively type, I *SAW* it was too slow.

If you're interested, I'm chronicling the project at
http://www.harp-project.com

It's not the best hardware code at this point, and it's an incomplete
project (heck, I need a DDR controller! And real video output. At
this time, for video, I have another program on the Linux computer
that displays video memory in a window on my desktop). However, as
more of the project is completed and the computer becomes more usable,
I'll go back and improve things now that I have more experience and
knowledge.

If you do take a look, I'm interested in feedback.

> My question on how you're generating the DDR clock is "what type of
> IO cell are you using to drive the pad that goes to the DDR ram
> clock pin". You *must* use the DDR flops in the Xilinx IOB if you
> want to properly forward it from inside the FPGA to the external
> world.

I'm doing nothing special. I'm using the constraints that Xilinx
provides to access the DDR pins in one of the Spartan 3E guides (the
one which discusses the devices on the S3E Starter Kit). Which is
like this:

# ==== DDR SDRAM (SD) ==== (I/O Bank 3, VCCO=2.5V)
NET "sd_a<0>" LOC = "T1" | IOSTANDARD = SSTL2_I ;

NET "sd_ba<0>" LOC = "K5" | IOSTANDARD = SSTL2_I ;
NET "sd_ba<1>" LOC = "K6" | IOSTANDARD = SSTL2_I ;
NET "sd_cas" LOC = "C2" | IOSTANDARD = SSTL2_I ;
NET "sd_ck_n" LOC = "J4" | IOSTANDARD = SSTL2_I ;
NET "sd_ck_p" LOC = "J5" | IOSTANDARD = SSTL2_I ;
NET "sd_cke" LOC = "K3" | IOSTANDARD = SSTL2_I ;
NET "sd_cs" LOC = "K4" | IOSTANDARD = SSTL2_I ;
NET "sd_dq<0>" LOC = "L2" | IOSTANDARD = SSTL2_I ;

NET "sd_ldm" LOC = "J2" | IOSTANDARD = SSTL2_I ;
NET "sd_ldqs" LOC = "L6" | IOSTANDARD = SSTL2_I ;
NET "sd_ras" LOC = "C1" | IOSTANDARD = SSTL2_I ;
NET "sd_udm" LOC = "J1" | IOSTANDARD = SSTL2_I ;
NET "sd_udqs" LOC = "G3" | IOSTANDARD = SSTL2_I ;
NET "sd_we" LOC = "D1" | IOSTANDARD = SSTL2_I ;
# Path to allow connection to top DCM connection
#NET "sd_ck_fb" LOC = "B9" | IOSTANDARD = LVCMOS33 ;
# Prohibit VREF pins
CONFIG PROHIBIT = D2;
CONFIG PROHIBIT = G4;
CONFIG PROHIBIT = J6;
CONFIG PROHIBIT = L5;
CONFIG PROHIBIT = R4;

If there is something else that I must do, then I'm unaware of it.
Please enlighten me.

> In general, for a design like this, a good rule of thumb is that
> every signal leaving the FPGA should be driven by a flip-flop in the
> IOB. You may want every incoming signal to also use the flip-flop
> in the IOB. This makes IO timing much more consistent.

I'm assuming this detail is handled by the location constraints,
right?


> When you're trying to implement high speed logic, you tend to need
> to think about how the HDL being written will appear in actual
> gates. While lots of high-level abstractions are nice, you need to
> keep an eye on performance as you design. As you're discovering,
> meeting a 2.5nsec timing spec can be very demanding.

I normally don't bother with ISE (makefiles are better for managing
large projects), but I do frequently pop in it to check out the
schematics for what I'm writing. I have an idea how it should look,
and I can see if something has gone wrong.

So in that respect, I think I'm doing ok. I'm still learning how to
speed things up; coalescing levels was something I recently
discovered / deduced.


thutt
jprovidenza
Voyager
17,557 Views
Registered: 08-30-2007

Thutt -

Take a look at the map report (.mrp) - it will show you what flip-flops were
packed into IOBs, what the drive options are, etc.

If you don't do anything special to forward the clock to the outside world, it
will go out using sub-optimal routing.  If you force it into an output DDR flop,
it will have delays similar to those of the other IO signals - you can then use the
phase shifted clock signals from the DCM to control setup/hold.

John P

thutt
Newbie
5,135 Views
Registered: 06-07-2009

If I'm reading this correctly, each of my DDR signals is in an IOB?
(I deleted some of the rows & columns to make this fit in 80 columns)

I didn't do anything special, so what made the tools put them in the IOB?
Is it just normal for the tools to put top-level input/output signals into an IOB?
(Perhaps I should mention I'm using VHDL)

+-----------------------------------------------------------+
| IOB Name  | IOB Type | Direction | IO Standard | IBUF/IFD |
|           |          |           |             | Delay    |
+-----------------------------------------------------------+
| clk_50mhz | IBUF     | INPUT     | LVCMOS33    | 0 / 0    |
| sd_a<0>   | IOB      | OUTPUT    | SSTL2_I     | 0 / 0    |

| sd_a<12>  | IOB      | OUTPUT    | SSTL2_I     | 0 / 0    |
| sd_ba<0>  | IOB      | OUTPUT    | SSTL2_I     | 0 / 0    |
| sd_ba<1>  | IOB      | OUTPUT    | SSTL2_I     | 0 / 0    |
| sd_cas    | IOB      | OUTPUT    | SSTL2_I     | 0 / 0    |
| sd_ck_n   | IOB      | OUTPUT    | SSTL2_I     | 0 / 0    |
| sd_ck_p   | IOB      | OUTPUT    | SSTL2_I     | 0 / 0    |
| sd_cke    | IOB      | OUTPUT    | SSTL2_I     | 0 / 0    |
| sd_cs     | IOB      | OUTPUT    | SSTL2_I     | 0 / 0    |
| sd_dq<0>  | IOB      | BIDIR     | SSTL2_I     | 0 / 0    |

| sd_dq<15> | IOB      | BIDIR     | SSTL2_I     | 0 / 0    |
| sd_ldm    | IOB      | OUTPUT    | SSTL2_I     | 0 / 0    |
| sd_ldqs   | IOB      | BIDIR     | SSTL2_I     | 0 / 0    |
| sd_ras    | IOB      | OUTPUT    | SSTL2_I     | 0 / 0    |
| sd_udm    | IOB      | OUTPUT    | SSTL2_I     | 0 / 0    |
| sd_udqs   | IOB      | BIDIR     | SSTL2_I     | 0 / 0    |
| sd_we     | IOB      | OUTPUT    | SSTL2_I     | 0 / 0    |
+-----------------------------------------------------------+
jprovidenza
Voyager
Registered: ‎08-30-2007

All your IO signals will be in IOBs - the question is, are they using the flip-flops in the IOB?

Here's a snippet from a .mrp file for one of my projects:

Section 6 - IOB Properties
--------------------------

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IOB Name                           | Type             | Direction | IO Standard          | Drive    | Slew    | Reg (s)      | Resistor | IBUF/IFD | SUSPEND          |
|                                    |                  |           |                      | Strength | Rate    |              |          | Delay    |                  |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| clk_in                             | IBUF             | INPUT     | LVCMOS33             |          |         |              |          | 0 / 0    |                  |
| eom_clkn                           | DIFFSTB          | OUTPUT    | LVDS_33              |          |         |              |          | 0 / 0    | 3STATE           |
| eom_clkp                           | DIFFMTB          | OUTPUT    | LVDS_33              |          |         | ODDR2        |          | 0 / 0    | 3STATE           |
| eom_datan                          | DIFFSTB          | OUTPUT    | LVDS_33              |          |         |              |          | 0 / 0    | 3STATE           |
| eom_datap                          | DIFFMTB          | OUTPUT    | LVDS_33              |          |         | OFF1         |          | 0 / 0    | 3STATE           |
| leds<0>                            | IOB              | OUTPUT    | LVCMOS33             | 8        | SLOW    | OFF1         |          | 0 / 0    | 3STATE           |
| leds<1>                            | IOB              | OUTPUT    | LVCMOS33             | 8        | SLOW    |              |          | 0 / 0    | 3STATE           |
| leds<2>                            | IOB              | OUTPUT    | LVCMOS33             | 8        | SLOW    |              |          | 0 / 0    | 3STATE           |
| leds<3>                            | IOB              | OUTPUT    | LVCMOS33             | 8        | SLOW    |              |          | 0 / 0    | 3STATE           |
| leds<4>                            | IOB              | OUTPUT    | LVCMOS33             | 8        | SLOW    | OFF1         |          | 0 / 0    | 3STATE           |
| leds<5>                            | IOB              | OUTPUT    | LVCMOS33             | 8        | SLOW    | OFF1         |          | 0 / 0    | 3STATE           |
| leds<6>                            | IOB              | OUTPUT    | LVCMOS33             | 8        | SLOW    | OFF1         |          | 0 / 0    | 3STATE           |
| leds<7>                            | IOB              | OUTPUT    | LVCMOS33             | 8        | SLOW    | OFF1         |          | 0 / 0    | 3STATE           |
| reset_in                           | IBUF             | INPUT     | LVCMOS33             |          |         |              | PULLDOWN | 0 / 0    |                  |
| sel0                               | IOB              | OUTPUT    | LVCMOS33             | 12       | SLOW    |              |          | 0 / 0    | 3STATE           |
| sma_clk                            | IOB              | OUTPUT    | LVCMOS33             | 12       | FAST    | OFF1         |          | 0 / 0    | 3STATE           |
| sw0                                | IBUF             | INPUT     | LVCMOS33             |          |         |              |          | 0 / 0    |                  |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+


Note the "Reg (s)" column, which has some OFF1 or ODDR2 entries - these show that a flip-flop in the IOB is being used.

John Providenza

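As a rough illustration of how an output ends up with an entry in that register column, here is a minimal VHDL sketch. The entity and signal names (iob_reg_example, ras_int, sd_ras_r) are made up for the example, and it assumes the synthesis tool honors the IOB attribute when it is attached to the registered signal. The same request can usually be made from the constraints file or with the register-packing option of map instead. In any case the flip-flop has to drive the pad directly, with no logic in between, or it cannot be packed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: one command output registered right at the pin.
entity iob_reg_example is
  port (
    clk     : in  std_logic;
    ras_int : in  std_logic;  -- assumed internal value, computed elsewhere
    sd_ras  : out std_logic
  );
end entity iob_reg_example;

architecture rtl of iob_reg_example is
  signal sd_ras_r : std_logic := '1';

  -- Ask the tools to place this flip-flop in the IOB (assumption: the
  -- attribute is accepted on a signal by the synthesis tool in use).
  attribute IOB : string;
  attribute IOB of sd_ras_r : signal is "TRUE";
begin

  -- The output register: nothing but this flip-flop sits between the
  -- internal logic and the pad.
  process (clk)
  begin
    if rising_edge(clk) then
      sd_ras_r <= ras_int;
    end if;
  end process;

  sd_ras <= sd_ras_r;

end architecture rtl;
```

After map runs, the .mrp row for sd_ras should show a register entry such as OFF1 if the packing succeeded, and an empty field if it did not.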