03-01-2010 12:40 AM
I am using the ll_temac with the PPC405 in a Virtex-4 FX. The PPC runs at 300 MHz, and the ll_temac is connected to the mpmc, which drives a 32-bit DDR RAM. All buses run at 100 MHz. So that is my setup.
I am using eCos RTOS with the FreeBSD network stack.
I am getting only about 22 Mbit/s when sending UDP data over sockets; TCP is at about 20 Mbit/s.
The curious issue: I have ported the design from the older plb_temac which runs on the old 3.4 PLB. The hardware (a custom board) and the software (os and application) is the same. Only the ll_temac driver is the new component, which I did by my own, based on the old xtemac and the Xilinx examples from xapp1041 and the lwIP-ll_temac-Driver.
At the moment I am using TX checksum offloading and interrupt coalescing, with the same settings as on the old xtemac.
Overall system performance is the same as in the old design.
Are there any suggestions for optimizing the ll_temac driver? I'm not sure whether I have missed something.
03-09-2010 02:36 AM
I think your problem is not in the LLTEMAC driver but in the network stack you are using. Running an operating system also reduces throughput. What is your target speed, and do you really need the operating system?
03-23-2010 02:34 AM
Unfortunately I still have performance issues in my system, and I'm now sure it is not the ll_temac but the mpmc.
I realized that not only the network is slower but also any kind of memory operations.
memcpy is about 20-30% slower than on the old system (PLB v3.4 and plb_ddr). This is measurable not only in the operating system but also in a small application run from blockram. CPU caches and compiler optimizations are enabled, and the CPU has two point-to-point PLB buses (instruction and data) to the mpmc. I have deactivated all pipelines in the mpmc configuration.
Is it possible that I'm this much slower than with the old PLB v3.4?
03-25-2010 01:43 PM
Yes, the MPMC has higher latency than plb_ddr, maybe even 20%. The reason is that the MPMC uses the DDR2 PHY for old DDR, along with all the extra pipeline levels needed to reach DDR2 speeds. There's no way to remove those unneeded registers.
The MPMC is much faster than the plb_ddr2 core, however.
03-29-2010 12:02 AM
Hi dylan, thanks for your reply.
So there is no way to get the same performance with the ll_temac/mpmc with DDR1 memory as we had with the plb_ddr and xtemac?
I can see in the plb_ddr datasheet that it has a write/read latency of 11/13 cycles.
The mpmc needs at least 21 cycles with all the configurable pipelines deactivated.
So I guess I should have a try with increasing the frequency.
At the moment our systems all run at 100 MHz (PPC @ 300 MHz).
I could run the mpmc at 150 MHz and the peripherals at 75 MHz...
03-29-2010 08:31 AM
Correct, turning off the pipelines is generally the most you can do. Yes, you should be able to run the MPMC and PLBv46 at a higher clock rate.
One thing you can try is to use the static PHY. It is much simpler, but is generally limited in memory margin. Attached is an example Virtex-4 design. If you try this, please post the latency results back, as I have not evaluated it for this purpose.
05-26-2010 07:28 PM
I'm not very familiar with the network stack you're using, but one thing you can do is circumvent the mpmc by using BRAM. If you don't have enough BRAM for all the buffers you want to implement, at least use it to hold critical data structures such as block descriptors. We've seen performance gains of 3-4x using this strategy with the VxWorks network stack.
06-07-2010 01:28 AM
This is an interesting point. I am using eCos as RTOS with the FreeBSD stack. BRAM can't be used for the packet buffers, because the ll_temac has an SDMA connection to the mpmc. But I think I could try your suggestion of locating the ll_temac buffer descriptors inside the blockram.