Adventurer
Registered: ‎06-18-2008

Spartan6 + xps_ll_temac | UDP TX performance with lwIP 3.00.a @1Gb

Hello,

I have a custom board with an xc6slx75csg484-3 and 16-bit DDR3 @ 800 MHz up and running.

I tried to measure the maximum raw UDP TX performance of a design with a MicroBlaze @ 100 MHz (8k/8k caches), MPMC and xps_ll_temac (16k TX/8k RX).

 

Based on XAPP1026, I used the utxperf.c example. I got 500 Mbit/s with a payload size of 8100 bytes (120 Mbit/s at 1424 bytes); see my older post "Virtex4 UDP performance".
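A quick back-of-the-envelope check of these two figures, as a minimal sketch: it converts each throughput into packets per second and into CPU cycles available per packet at the 100 MHz MicroBlaze clock. The function names are made up for illustration, and treating the quoted rates as pure payload rates (no header overhead) is an assumption.

```c
#include <stdint.h>

/* Packets per second for a payload rate (bit/s) and payload size (bytes). */
double packets_per_second(double payload_bits_per_s, uint32_t payload_bytes)
{
    return payload_bits_per_s / (8.0 * (double)payload_bytes);
}

/* CPU cycles available per packet at the given CPU clock (Hz). */
double cycles_per_packet(double cpu_hz, double pps)
{
    return cpu_hz / pps;
}
```

With the numbers above this gives roughly 7,700 packets/s (about 13,000 cycles per packet) at 8100 bytes and roughly 10,500 packets/s (about 9,500 cycles per packet) at 1424 bytes. The packet rate changes far less than the payload size does, which would be consistent with a per-packet CPU cost dominating.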

 

But I do not think this is the maximum performance of this configuration, because the data is moved by SDMA and there is no throttle in the datapath.

Has anyone pushed the performance to 900 Mbit/s?

Has anyone tested the bare-metal TX performance?

Maybe the MicroBlaze limits the performance in some way?

 

Please comment and best regards, Thomas

11 Replies
Adventurer
Registered: ‎06-18-2008

 

By the way, in ISE 12.3 the generated lwipopts.h comes with #define PROCESSOR_LITTLE_ENDIAN.

Maybe I hit the wrong button, but this looks like a funky EDK joke :).
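For anyone who wants to double-check what the generated define should be, here is a minimal sketch of a runtime endianness probe. It assumes, as the post implies, that the MicroBlaze in this PLB-based design is big-endian. The helper names are hypothetical and not part of lwIP or the Xilinx BSP.

```c
#include <stdint.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    const uint16_t probe = 0x0001;
    /* Inspect the byte stored at the lowest address. */
    return *(const uint8_t *)&probe == 0x01;
}

/* Writes v in network (big-endian) byte order, independent of the host. */
void store_be16(uint8_t *dst, uint16_t v)
{
    dst[0] = (uint8_t)(v >> 8);
    dst[1] = (uint8_t)(v & 0xFF);
}
```

If `host_is_little_endian()` returns 0 on the target while lwipopts.h defines PROCESSOR_LITTLE_ENDIAN, lwIP's byte-order conversions would presumably be built for the wrong byte order.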

Adventurer
Registered: ‎06-18-2008

Update:

I have now pushed the UDP TX performance to 630 Mbit/s. I can see a correlation between processor performance and throughput.

Perhaps the MicroBlaze handles the interrupts too slowly?

Any opinion is welcome. Regards, Thomas

Adventurer
Registered: ‎06-18-2008

Hello,

Here are some results from the Xilinx TEMAC test suite:

Spartan6 + xps_ll_temac with lwIP 3.00.a (tuned) @ 1 Gb,
100 MHz CPU clock + 8 KByte D/I caches, code runs in DDR3 @ 800 MHz:

# 20.12.2010
#************************************************************************
#Interrupt Driven SG DMA Performance
#************************************************************************
#
#                   Packet Threshold settings
#          +------------------+------------------+------------------+------------------+
#          |       1          |       2          |       8          |      64          |
#          +------------------+------------------+------------------+------------------+
#     Frame        Net   CPU          Net   CPU          Net   CPU          Net   CPU
#      Size   Mbps Util  Util    Mbps Util  Util    Mbps Util  Util    Mbps Util  Util
#      ----  ----- ----- ----   ----- ----- ----   ----- ----- ----   ----- ----- ----
#        64   15.6   2.1  100    15.6   2.1  100    15.6   2.1  100    15.6   2.1  100
#       128   31.2   3.6  100    31.3   3.6  100    31.3   3.6  100    31.3   3.6  100
#       512  124.1  12.9  100   124.1  12.9  100   124.1  12.9  100   124.1  12.9  100
#      1518  362.3  36.7  100   362.3  36.7  100   362.3  36.7  100   362.3  36.7  100
#      9000  994.4  99.7  100   994.4  99.7  100   994.4  99.7   69   996.4  99.9   52

 

So the test shows that the MicroBlaze can reach the maximum throughput, but only under full CPU load.

(I think the CPU utilization figures for packet thresholds > 1 are not applicable to real-world applications.)
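The Mbps and Net Util columns of the table can be related with a little arithmetic. The sketch below is my own reconstruction, not taken from the test suite: it assumes that Net Util counts 20 bytes of wire overhead per frame (8 bytes preamble/SFD plus 12 bytes interframe gap) on a 1 Gbit/s link, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed per-frame wire overhead: 8 bytes preamble/SFD + 12 bytes gap. */
#define WIRE_OVERHEAD_BYTES 20u

/* Frames per second for a measured rate (Mbit/s) and frame size (bytes). */
double frames_per_second(double mbps, uint32_t frame_bytes)
{
    return mbps * 1e6 / (8.0 * (double)frame_bytes);
}

/* Utilization of a 1 Gbit/s link in percent, counting the wire overhead. */
double net_util_percent(double mbps, uint32_t frame_bytes)
{
    double fps = frames_per_second(mbps, frame_bytes);
    double wire_bits = 8.0 * (double)(frame_bytes + WIRE_OVERHEAD_BYTES);
    return fps * wire_bits / 1e9 * 100.0;
}
```

For the 1518-byte row this gives about 29,800 frames/s and 36.7 % utilization, which matches the table under the stated assumption. The 64- to 1518-byte rows all land near 30,000 frames/s, i.e. roughly 3,300 CPU cycles per frame at 100 MHz, which suggests a fixed per-frame CPU cost is the limit for everything but the 9000-byte frames.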

 

With a 100 MHz CPU clock and 16 KByte D/I caches, I pushed the payload throughput to 102 MByte/s (855 Mbit/s, counting 1 MByte as 2^20 bytes) for a GigE Vision device using jumbo frames.

Bigger caches could probably increase the throughput a little, but the design is then hard to map and place in my case.

 

Regards, Thomas

 

 

P.S.: Give a kudo if this information was useful.

Adventurer
Registered: ‎06-18-2008

Yes, that is the case. Bigger caches (32 KByte I/D) push the throughput to 978 Mbit/s for me.

But the cost of this performance gain in BRAM resources is probably too high.

However, Spartan6 with MicroBlaze, soft TEMAC (SDMA) and lwIP 3.0 is a competitive platform.

 

Regards, Thomas

 

Visitor
Registered: ‎07-28-2010

Hi tgmaster,

 

I am evaluating the performance of a similar setup but cannot reach your numbers.

I am using 16-bit DDR3 @ 666 MHz and a MicroBlaze clocked at 83 MHz.

For all other HW options (cache, TEMAC features...) I did an extensive exploration.

I used the utxperf.c example with lwIP v2.00 and jumbo frames of 8000-9000 bytes, but got a maximum throughput of just 125 Mbit/s.

Did you apply any special tweaks to the hardware system or lwIP to achieve 500 Mbit/s, or do you think the differences in our DDR3 and MicroBlaze clock frequencies are causing the lower performance?

 

Thank you in advance,

Nikos

Xilinx Employee
Registered: ‎08-06-2007

Hi,

 

What parameters have you set on MicroBlaze?

 

Göran

Visitor
Registered: ‎07-28-2010

I have:

 

  • Barrel shifter enabled
  • Integer multiplier enabled
  • Integer divider enabled
  • Pattern comparator enabled
  • 32 KB I & D caches
  • Clocked @ 83.3 MHz

Nikos

 

Xilinx Employee
Registered: ‎08-06-2007

Hi,

 

That should be good enough.

I doubt that he had better hardware configuration than you.

So it might be software configuration that differs.

Have you based your project on XAPP1026, as the OP did?

 

Göran

Visitor
Registered: ‎07-28-2010

I am using XAPP1026, but only utxperf.c, and I am sending jumbo frames. The only difference is that I am linking utxperf.c against lwIP 1.3 v2.0 and not lwIP 1.3 v3.0 (I am testing v3.0 at the moment).

Any ideas on specific lwIP configuration settings that I need in order to increase performance?

I am measuring performance with Wireshark, since it seems that iperf cannot handle large frames (correct me if I am wrong).

 

Nikos

Xilinx Employee
Registered: ‎08-06-2007

Hi,

 

I have never done much tweaking of lwIP.

You might, however, find something useful if you search for it on Google.

 

Göran

Adventurer
Registered: ‎06-18-2008

Hello,

First of all, try porting the Xilinx TEMAC test suite to your system to see the maximum achievable performance.

Issues you might consider:

1.) The instruction throughput of the MicroBlaze is low compared to a PPC.

In my case bigger code caches helped a lot. Use all available cache optimizations.

Optimize your code and control the pbuf allocation yourself.

2.) Yes, the Xilinx lwIP port needs a lot of patches to get better performance.

But first of all, keep an eye on the generated lwIP options :) .

There was something wrong there, as I remember: do not use EDK to configure the lwIP options; configure and compile lwIP yourself.

3.) How do you generate your payload? In my case a VHDL engine writes the payload data into memory via an MPMC PIM (NPI). Use "referenced by pointer" pbufs as much as possible.
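To illustrate the "referenced by pointer" idea from point 3 without depending on lwIP itself: in lwIP this is what the `PBUF_REF` pbuf type is for, where the pbuf only points at payload memory instead of owning a copy. The sketch below is a standalone model with hypothetical types and functions (`ref_buf`, `ref_buf_attach`, `ref_buf_chain`, `ref_buf_total_len`). It is not the lwIP API and only shows why no memcpy of the payload is needed when a small header buffer is chained in front of data that hardware has already written to memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pbuf: it references memory, it does not own it. */
struct ref_buf {
    struct ref_buf *next;    /* next buffer in the chain, or NULL */
    const uint8_t  *payload; /* points at existing data, no copy is made */
    uint16_t        len;     /* length of this segment in bytes */
};

/* Point a buffer at data that already sits in memory (e.g. written by hardware). */
void ref_buf_attach(struct ref_buf *b, const uint8_t *data, uint16_t len)
{
    b->next = NULL;
    b->payload = data;
    b->len = len;
}

/* Chain a header segment in front of a payload segment. */
void ref_buf_chain(struct ref_buf *head, struct ref_buf *tail)
{
    head->next = tail;
}

/* Total length of the chain, as a scatter-gather DMA would transmit it. */
uint32_t ref_buf_total_len(const struct ref_buf *b)
{
    uint32_t total = 0;
    for (; b != NULL; b = b->next)
        total += b->len;
    return total;
}
```

In real lwIP code the rough equivalent would be to allocate the payload pbuf with type `PBUF_REF` and set its `payload` pointer to the hardware-written buffer; check the lwIP version in use for the exact behaviour.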

 

 

Regards, Thomas

 
