05-01-2014 05:24 AM
I'm working with a ZedBoard and the Ethernet driver (XEmacPs) in a standalone application. My goal is to send packets as fast as I can. As far as I know, the limit for a Buffer Descriptor ring (BdRing) is 32 BDs, so I need to fill a new BdRing and send it after the previous one has been transmitted.
When I allocate a new BdRing and enqueue it to hardware, it seems that some of the previous packets have not been transmitted yet, so the board doesn't transmit all the BDs of the new BdRing and I lose them. If I put a delay function between the two transmissions, everything works (except for the useless delay...).
So... Is there some way to check whether a transmission is completely finished, so I can start a new one without these problems? What is the proper way to send BdRings continuously?
Thank you, and forgive my English.
05-07-2014 07:02 AM
05-08-2014 10:28 AM
Thank you for your answer, but that looks like the XEmacPs example (which I am already working with) with some added PL functionality. However, it doesn't solve my problem of knowing the state of the Ethernet device's buffer, or of finding some flag that would let me transmit packets continuously without overflow.
05-12-2014 04:07 AM
I am currently facing the same issue.
It is not clear to me how to resend packets after the previous ones have been sent...
Normally, you allocate a certain number of BDs and give them to the hardware, and an interrupt is raised when the packets have been sent.
Then, in the interrupt handler, you disable the interrupts and free the BDs that have been sent.
After that, the challenge is to allocate new BDs and give them to the hardware. Re-enabling the interrupt is not enough...
After multiple tries: if I call xemacps_start() and xemacps_transmit() after my new allocation, the packets are sent, but the Zynq seems to end up in a strange state. Reloading the same application after stopping the previous one, without changing anything in the design, doesn't work... Powering the board off and on solves the issue, which makes me think that this way of doing it is not correct...
Is it the write to the TX queue base address register in xemacps_start() that allows new packets to be sent?
Would it be possible to share your C code, so we can compare it with what I do and help each other?
Is there an example somewhere of a standalone application that sends packets repetitively (unlike the Xilinx C code example)?
05-12-2014 08:47 AM
I am using a ZC706 Board.
My tool version is VIVADO 2013.4.
I have also followed the recommendations from UG585 (the TRM), section GEM, on clearing the status bits of the relevant registers.
Let me know if further details are necessary.
05-13-2014 04:21 AM
Is there reference documentation on the DMA mechanism of the emacps somewhere?
It is not clear to me how to proceed when I want to send several frames repetitively for tests...
When I send, say, 10 frames in 10 BDs, I get an interrupt in the handler function.
I disable the interrupt on the TX side.
Then I have to deallocate the BDs that were used.
Then, to send new frames, I allocate new BDs, give them to the hardware, and enable the interrupt again.
And it doesn't send anything...
When I call the xemacps_transmit() function, the frames are sent, but it stops after several loops (15).
And if I reload the application that was just working for those 15 iterations, it doesn't work... I need to power the board off and on again... which makes me think that the way I proceed is not correct.
Can anyone help me solve this issue, so that I can make progress in my development and stop wasting time on the DMA mechanism?
Thanks for your feedback.
PS: I will soon post a zip file of my example source code for my ZC706 under SDK 2013.4.
05-13-2014 08:37 AM
Please find attached a zip file of the example files provided by Xilinx for the emacps controller, which I have modified to send frames repetitively.
It alternates the Ethernet MAC address at every new BD allocation. I then verified that my frames leave the board by monitoring the Ethernet port with Wireshark on my PC (the board's Ethernet port is connected to the PC).
Looking at my C code, I would expect 32 frames with one MAC address followed by 32 frames with the other MAC address. Instead, I get only six frames of each...
In addition, if I don't allocate the maximum number of BDs specified with XEmacPs_BdRingCreate, the program doesn't work...
I am quite stuck in my development because I can't get the flexibility I would expect, namely sending a variable number of frames rather than always the maximum number of BDs available.
I still think I am missing something in the way I configure and exchange data with the controller...
Feel free to modify the code and share your changes with us, or to give some advice.
05-13-2014 08:38 AM
I'm sorry for the delay in responding, but I wanted to try some things before replying. I hope it helps you...
Today I wrote a piece of code (based on the XEmacPs example; I attach it) that sends frames repetitively, and I think there is no delay between each filling of the BdRing. I'm not using any flag or information provided by the driver to do that, but I THINK it's pretty efficient, and the code is very short. It's still test code, so it's not well documented and you may find some garbage lines, but I'm sure you'll get it. The only important functions are main_test(), send_test() and the send handler.
In fact, the key to this code is the deallocation of every BD when the send handler is triggered (as you suggested, so thank you very much). As far as I know, the interrupt means that a packet has been sent through the physical port, so you can allocate a new BD without risk of saturating the output buffer. However, I'm not disabling the interrupts in the handler; is there any reason I should?
In this code, I keep a counter of the number of BDs in the BdRing. It is incremented when allocating BDs and decremented by 1 after freeing a BD in the send handler. I allocate 16 BDs (half the maximum allowed) whenever this counter is 16 or lower, so there will never be more than 32 BDs in the BdRing. This way you can avoid the delay between transmissions, since the buffer will never be empty.
I'm sure there is a better way to do this, but I don't know it yet. And maybe I'm doing something wrong in the code (I'm not an expert programmer...), but it works. Check it out and tell me what you think!
(I don't get more than 600 Mbps in transmission... a little disappointing.)
05-14-2014 08:44 AM
Wow, I'm sorry, I just noticed your last message (sent one minute before mine...).
I'm looking at your code and... it's more complex and longer than I expected. As I said, I'm not a programmer, and I can't read and understand code quickly.
I'll take a look at your code, but I can't promise anything. It's clear that you understand everything related to DMA, cached memory and so on much better than I do, but... I don't know why you need so much code. Take a look at my code and tell me what you think (I'm not the expert here! haha).
05-17-2014 06:21 PM
There is a lot of information in the Xilinx install directory, including some examples and docs. On my machine:
In the examples directory, the file xemacps_example_intr_dma.c shows a lot of what you want. Be careful: it is full of bugs. For example, TxFrame and RxFrame are not aligned to the cache line size of 32 bytes, and if the example had ever sent a second packet it would have been corrupted, since both Xil_DCacheFlushRange and Xil_DCacheInvalidateRange round back to a cache-line boundary and operate on an integral multiple of the line size.
Also, many macros in many Xilinx drivers do not properly parenthesize their arguments, so you will have to do it yourself when you call a macro with non-trivial arguments. That seems greatly improved in 2013.4; however, after being burnt, it's best to check each one.
A better place to look is the lwIP port. For example, on my machine:
has a lot of what you are asking about.
05-19-2014 12:20 AM
Thanks a lot for your example code. I have tested it on my ZC706 board and it works fine.
The point that was not clear to me (and still isn't) is the TX interrupt frequency.
Indeed, when I allocate 16 BDs and give them to the hardware, it is not clear whether an interrupt is raised every time a packet is sent, or only once all the allocated BDs have been sent...
I have made an example similar to yours for the RX side; it doesn't work properly, but it works. I will post my code for the RX part soon. Another point is not clear to me on the RX side: why does it work only when I allocate the total number of BDs that I created? The wrap bit is automatically set during creation, and if I allocate only 10 BDs and set the wrap bit on the last BD of my set, it doesn't work...
And this is described in the TRM, UG585...
I will continue to investigate and keep you posted.
05-19-2014 12:35 AM
05-20-2014 03:41 PM
The way the driver uses the bits in the BDs is dictated by the hardware (see UG585, under "Rx Buffers" in the "Gigabit Ethernet Controller" chapter).
The lwIP driver is a much better place to look than the driver example, but it has its own set of issues. The most important is that the Xil_DCacheInvalidateRange function is broken in 2013.4. It's broken in a different way in 2014.1, and in other creative ways in earlier versions.
I discovered this through exhaustive testing. If you build their lwIP echo example and send two full-size 1460-byte packets (apparently more testing than Xilinx did :-) ), some of the echoed ones will be corrupt in the last 10 bytes. It took two minutes to determine there was a problem and a week to track it down to yet another driver bug.
In the lwIP driver, you will see the line:
Xil_DCacheInvalidateRange((unsigned int)p->payload, (unsigned)XEMACPS_MAX_FRAME_SIZE);
in emacps_recv_handler(). The function has several glaring errors and does not invalidate the cache line containing the last address: XEMACPS_MAX_FRAME_SIZE is 1518, but it only invalidates up to byte 1504.
Of course, if you are writing your own, you only need to invalidate p->len (the actual size of the payload) and not XEMACPS_MAX_FRAME_SIZE as the lwIP driver does. This assumes a correctly functioning Xil_DCacheInvalidateRange, of course. It may not sound like much, but to reach wire speed you only have about 12 µs per full-size TCP packet, and the lwIP driver uses up 3 µs (the time it takes to invalidate a full frame) even for short ACKs and other packets (e.g. the many short ARP and other broadcast packets not even intended for your app).
The lwIP driver also uses a lot of magic numbers for things for which the driver provides macros (see xemacps_bd.h etc.). For example, XEmacPs_BdSetRxWrap will set those lower two bits as required.
05-21-2014 11:33 AM
I have been following your posts for some days now and find them very useful!
I have a duplication problem with the EMACPS in TX mode (Zynq to PC) when I set everything to wrap (RX and TX). The RX side seems to work perfectly, but when I return the received frame to the PC, I receive it twice!
I am sharing my code; perhaps you can find a solution!
Scenario: everything received from the PC should be returned by the Zynq... for simplicity, I send ping packets and check with Wireshark.
05-27-2014 07:33 AM
I'm coming back to you after several investigations, and I have arrived at a solution that works fine for both TX and RX.
I am going to clean up my code so I can share it with you without any issues.
The fix is mainly the use of the function Xil_DCacheInvalidateRange at each buffer deallocation. It works fine on both the ZedBoard and the ZC706 board.
I will post the code next week.
05-29-2014 05:42 AM
Thanks in advance for your code, Julien. I'm sure it'll help all of us. I would really like to share my code too, but it seems I will never finish it. Every time I solve a problem, a new one appears.
I had a problem with reception: I couldn't receive more packets than the number I specified when creating the BdRing. The solution was as simple as calling XEmacPs_BdClearRxNew() before freeing each BD, but to find it I had to read the description of every function in the documentation (which doesn't help a lot).
Now I have a problem with transmission. I am allocating BDs as soon as others are freed, but the device doesn't transmit more than 28-32 packets (sniffing with Wireshark) before it stops transmitting. Maybe the problem is the high transmission rate. Is there some way or trick to specify/limit the transmission speed?
The last problem I have is packet loss in reception when working at speeds over 400-500 Mbps. Maybe I just need to improve the memory management, but I don't know. Can you receive packets at higher speeds without loss?
OK, that's a lot of problems and I feel like a crybaby. I'm sorry... :)
07-09-2014 03:54 AM
Sorry for my late reply but I was too busy these last weeks.
I haven't had time to clean up my simple standalone-mode example.
I plan to do it in the coming weeks and will post it as soon as it is ready.
The most important thing is to flush/invalidate the cached memory buffers after each allocation and deallocation.
This seems to be linked to the MMU present in the ARM cores of the Zynq family.
04-21-2016 02:37 PM
I'm sure that code would be useful to me as well as others that stumble upon these issues!
Any chance you could still post it (admittedly, 3 years later :D)?