Explorer
7,477 Views
Registered: 11-12-2007

ll_temac poor performance

Hi guys,

I am using the ll_temac with the PPC405 in a Virtex-4 FX. The PPC runs at 300 MHz, and the ll_temac is connected to the mpmc, which drives a 32-bit DDR RAM. All buses run at 100 MHz. That is my setup.

I am using eCos RTOS with the FreeBSD network stack.

 

I am getting only about 22 Mbit/s when sending UDP data over sockets; TCP is at 20 Mbit/s.

The curious part: I ported the design from the older plb_temac, which runs on the old PLB v3.4 bus. The hardware (a custom board) and the software (OS and application) are the same; only the ll_temac driver is new. I wrote it myself, based on the old xtemac driver, the Xilinx examples from xapp1041, and the lwIP ll_temac driver.

 

At the moment I am using TX checksum offloading and interrupt coalescing, the same settings as with the old xtemac.

 

Overall system performance is the same as in the old design.

 

Are there any suggestions for optimizing the ll_temac driver? I'm not sure whether I missed something.

7 Replies
Adventurer
7,427 Views
Registered: 04-24-2009

I think your problem is not in the LLTEMAC driver; it is more likely in the network stack you are using. Running an operating system also reduces throughput. What is your target speed, and do you really need the operating system?

 

 

Explorer
7,384 Views
Registered: 11-12-2007

Unfortunately I still have performance issues in my system, and I'm now sure it is not the ll_temac but the mpmc.

I realized that not only the network is slower, but also any kind of memory operation.

memcpy is about 20-30% slower than on the old system (PLB v3.4 and plb_ddr). This shows not only inside the operating system but is also measurable in a small application run from block RAM. CPU cache and compiler optimizations are enabled, and the CPU has two point-to-point PLB buses (instruction and data) to the mpmc. I have deactivated all pipelines in the mpmc configuration.

Is it possible that I'm so much slower than with the old PLB v3.4?

Xilinx Employee
7,372 Views
Registered: 07-30-2007

Yes, the MPMC has higher latency than plb_ddr, maybe even 20%. The reason is that the MPMC uses the DDR2 PHY even for old DDR, along with all the extra pipeline stages needed to reach DDR2 speeds. There's no way to remove those unneeded registers.

 

The MPMC is, however, also much faster than the plb_ddr2 core.

Explorer
7,359 Views
Registered: 11-12-2007

Hi dylan, thanks for your reply. 

So there is no way to get the same performance from the ll_temac/mpmc with DDR1 memory as we had with plb_ddr and xtemac?

 

I can see in the plb_ddr datasheet that it has a write/read latency of 11/13 cycles.

The mpmc needs at least 21 cycles, even with all configurable pipelines deactivated.

So I guess I should try increasing the frequency.

At the moment our whole system runs at 100 MHz (PPC at 300 MHz).

I could run the mpmc at 150 MHz and the peripherals at 75 MHz...

Xilinx Employee
7,349 Views
Registered: 07-30-2007

Correct, turning off the pipelines is generally the most you can do. Yes, you should be able to run the MPMC and PLBv46 at a higher clock rate.

 

One thing you can try is the static PHY. It is much simpler but generally has less memory timing margin. Attached is an example Virtex-4 design. If you try it, please post your latency results back, as I have not evaluated it for this purpose.

 

7,135 Views
Registered: 01-27-2009

I'm not very familiar with the network stack you're using, but one thing you can do is bypass the mpmc by using BRAM. If you don't have enough BRAM for all the buffers you want to implement, at least use it to hold critical data structures such as buffer descriptors. We've seen performance gains of 3-4x with this strategy on the VxWorks network stack.

Explorer
7,005 Views
Registered: 11-12-2007

This is an interesting point. I am using eCos as the RTOS with the FreeBSD stack. BRAM can't replace the main memory, because the ll_temac has an SDMA connection to the mpmc. But I think I could try your suggestion of locating the ll_temac buffer descriptors in block RAM.
