05-06-2011 06:53 AM
Hello everybody, I'm looking for a UDP/IP stack written in VHDL (I have to implement it in a Spartan-6 FPGA). I'm looking for one that is as simple as possible, with these features:
- written in VHDL ONLY, not Verilog
- to be implemented in a Spartan-6
- no use of other processors or devices (such as MicroBlaze, PowerPC and others)
- no use of external memory (no DDR or external ports); it should use only distributed memory or block RAM, no other possibilities
- obviously a free UDP/IP stack would be best, but a licensed one could also be fine
Is there a UDP/IP stack anywhere with these features?
Thank you very much!
05-06-2011 02:25 PM
Hi,
I've just done a Google search and found a lot of links on this, including Xilinx application notes (XAPPs).
try this
05-06-2011 04:06 PM
Hi,
I needed a custom implementation of UDP a while ago (Verilog, no processor). Researching the available cores didn't produce any good results, so I implemented the stack from scratch. Depending on the features you want, it can be relatively simple or complex. Do you want IP fragmentation or not? How do you assign the IP address: static or DHCP? Do you need ARP? Do you need an application protocol on top of UDP (like TFTP)? There are several other considerations too.
Thanks,
05-07-2011 03:27 AM
If you just want to periodically generate UDP packets, it's very easy to do this with a fairly simple state machine.
You're welcome to look at the example I've written, which does just this: http://tristesse.org/DigilentAtlysResources . It's running successfully on a Spartan-6. It's written in Verilog, but the UDP/IP/Ethernet part alone would be trivial to rewrite in VHDL if you wanted to.
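When debugging a packet-generating state machine, it can help to build the same frame in software and compare it word by word with a simulation or a Wireshark capture. The Python below is one way to do that, not code from the Atlys example. The function names, the identification value (0x1234) and the TTL (64) are arbitrary choices for illustration. It assumes the MAC adds the preamble/SFD and the FCS itself, and it leaves the UDP checksum at zero.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of the big-endian 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total > 0xFFFF:                      # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_udp_frame(dst_mac, src_mac, src_ip, dst_ip, src_port, dst_port, payload,
                    ident=0x1234, ttl=64):
    """Ethernet II + IPv4 + UDP frame as bytes, without preamble/SFD or FCS."""
    udp_len = 8 + len(payload)
    ip_len = 20 + udp_len
    # Version/IHL, DSCP/ECN, total length, identification, flags/fragment offset,
    # TTL, protocol (17 = UDP), header checksum (0 for the first pass), addresses.
    ip_hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, ip_len, ident, 0, ttl, 17, 0,
                         src_ip, dst_ip)
    ip_hdr = ip_hdr[:10] + struct.pack("!H", ipv4_checksum(ip_hdr)) + ip_hdr[12:]
    udp_hdr = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)  # 0 = no UDP checksum
    frame = dst_mac + src_mac + b"\x08\x00" + ip_hdr + udp_hdr + payload
    return frame.ljust(60, b"\x00")            # pad to the 60-byte minimum (64 with FCS)

def to_words(frame):
    """Big-endian 32-bit words (first byte in bits [31:24]); last word zero-padded."""
    padded = frame.ljust((len(frame) + 3) // 4 * 4, b"\x00")
    return [int.from_bytes(padded[i:i + 4], "big") for i in range(0, len(padded), 4)]

frame = build_udp_frame(bytes.fromhex("000102030405"), bytes.fromhex("0037ffff3737"),
                        bytes([192, 168, 3, 8]), bytes([192, 168, 3, 4]),
                        12345, 23456, b"HELLO")
for w in to_words(frame):
    print(f"{w:08x}")
```

The 32-bit words printed here correspond to what a 32-bit-wide FSM would hand to the MAC, so a mismatch in the length or checksum words shows up immediately.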
05-11-2011 07:58 AM
I downloaded your code to the Spartan-6 Atlys board. I changed the MAC address and IP address, then simulated it and programmed the board, but it doesn't send any data over UDP.
Please tell me how to get your code working on my board.
Thanks
05-11-2011 02:57 PM
05-13-2011 10:32 AM
I've faced the same problem. How can I calculate the checksum? I only changed the IP and MAC addresses; do I need to change anything else?
When I program the board, the LAN LEDs light up, but I don't see anything in Wireshark. Can you show me how to add the calculation steps to the state machine you mentioned?
Regards.
Fadi
05-14-2011 01:53 AM
05-14-2011 07:37 AM
I calculated the checksum and tried again, but didn't get anything.
Can you tell me what I'm doing wrong?
05-14-2011 07:53 AM
Your checksum is wrong. I get:
4500 + 0023 + 1234 + 0000 + 4011 + 0000 + c0a8 + 0103 + c0a8 + 0108
= 21AC3
2 + 1AC3 = 1AC5
Things to note:
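For anyone following along, the arithmetic above can be reproduced with a few lines of Python. This is a reference model for checking by hand, not code from the project. It also shows the step after folding the carry: the value written into the checksum field is the ones' complement of 1AC5, i.e. E53A. A receiver that sums the whole header, including that field, gets FFFF.

```python
def ones_complement_sum(words):
    """16-bit ones' complement sum: add everything, then fold carries back into bit 0."""
    total = sum(words)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

# IPv4 header words from the post, with the checksum field (index 5) set to 0000.
header = [0x4500, 0x0023, 0x1234, 0x0000, 0x4011, 0x0000,
          0xC0A8, 0x0103, 0xC0A8, 0x0108]

raw = sum(header)                       # 21AC3
folded = ones_complement_sum(header)    # 2 + 1AC3 = 1AC5
checksum = ~folded & 0xFFFF             # E53A goes into the checksum field
print(hex(raw), hex(folded), hex(checksum))  # 0x21ac3 0x1ac5 0xe53a

# Receiver-side check: with the checksum filled in, the sum is FFFF.
header[5] = checksum
assert ones_complement_sum(header) == 0xFFFF
```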
05-19-2011 03:20 PM
Hi Joel,
Your ethernet module for the atlys just saved me a bunch of time! Thank you very much! :)
I made a few quick additions so I don't have to recalculate the checksum every time. This snippet is from my slightly modified "ethernet_test_top.v":
function [15:0] header_checksum;
    input [191:0] pack_32_6_header;  // {packet_buffer[8], ..., packet_buffer[3]}; [3] in the low bits
    reg [19:0] chk;                  // wide enough for ten 16-bit words
    reg [15:0] val_uint16;
    integer i, j;
    begin
        chk = 0;
        for (i = 0; i < 10; i = i + 1) begin
            // Visit halfwords 0, 3, 2, 5, 4, ... so the IP header words are picked
            // out of the 32-bit packet_buffer entries.
            j = i + 1;
            j[0] = ~j[0];
            val_uint16 = pack_32_6_header[16*j +: 16];
            chk = chk + val_uint16;
        end
        // Fold twice: the first fold can itself carry out of bit 15.
        chk = chk[15:0] + chk[19:16];
        chk = chk[15:0] + chk[19:16];
        header_checksum = ~chk[15:0];
        $display("header_checksum: header_checksum = %h", header_checksum);
    end
endfunction // header_checksum

parameter [47:0] DST_MAC  = 48'h0001_0203_0405;
parameter [47:0] SRC_MAC  = 48'h0037_ffff_3737;
parameter [31:0] SRC_IP   = {8'd192, 8'd168, 8'd3, 8'd8};
parameter [31:0] DST_IP   = {8'd192, 8'd168, 8'd3, 8'd4};
parameter [15:0] SRC_PORT = 16'd12345;
parameter [15:0] DST_PORT = 16'd23456;
parameter [15:0] PKT_LEN  = 16'h000f;

// Numbers in parentheses are field widths in hex digits.
initial begin
    packet_buffer[0]  = DST_MAC[47:16];                    // dstmac (8)
    packet_buffer[1]  = {DST_MAC[15:0], SRC_MAC[47:32]};   // dstmac (4), srcmac (4)
    packet_buffer[2]  = SRC_MAC[31:0];                     // srcmac (8)
    packet_buffer[3]  = 32'h0800_4500;  // ethertype IPv4 (4), IP version (1), header length (1) (*4 bytes), DSCP/ECN (2)
    packet_buffer[4]  = 32'h003c_1234;  // total length (4), identification (4)
    packet_buffer[5]  = 32'h0000_4011;  // flags/frag offset (4), ttl (2), protocol (2)
    packet_buffer[6]  = {16'h0000, SRC_IP[31:16]};         // IP checksum (4), srcip (4)
    packet_buffer[7]  = {SRC_IP[15:0], DST_IP[31:16]};     // srcip (4), dstip (4)
    packet_buffer[8]  = {DST_IP[15:0], SRC_PORT[15:0]};    // dstip (4), srcport (4)
    packet_buffer[9]  = {DST_PORT[15:0], PKT_LEN[15:0]};   // dstport (4), length (4)
    packet_buffer[10] = 32'h0000_4845;  // UDP checksum (4), data (4)
    packet_buffer[11] = 32'h4c4c_4f40;  // data
    packet_buffer[6][31:16] = header_checksum(
        {packet_buffer[8], packet_buffer[7], packet_buffer[6],
         packet_buffer[5], packet_buffer[4], packet_buffer[3]});
end
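The bit-0 flip in the loop is the least obvious part of this function. The IP header starts in the low half of packet_buffer[3], so its 16-bit words sit at halfword positions 0, 3, 2, 5, 4, ... of the 192-bit argument. A small Python model of that indexing (an illustration, not part of the original project) uses the values from the initial block, with the checksum field still zero. It shows that the loop picks out the ten header words in wire order.

```python
# packet_buffer[3..8] as set up in the initial block (checksum field still zero).
packet_buffer = {
    3: 0x0800_4500, 4: 0x003C_1234, 5: 0x0000_4011,
    6: 0x0000_C0A8, 7: 0x0308_C0A8, 8: 0x0304_3039,
}

def halfword_indices(count=10):
    """The loop's index: j = i + 1, then invert bit 0 (j[0] = ~j[0])."""
    return [(i + 1) ^ 1 for i in range(count)]

# {packet_buffer[8], ..., packet_buffer[3]}: packet_buffer[3] lands in bits [31:0].
bus = 0
for k in range(3, 9):
    bus |= packet_buffer[k] << (32 * (k - 3))

selected = [(bus >> (16 * j)) & 0xFFFF for j in halfword_indices()]
print(" ".join(f"{w:04x}" for w in selected))
# 4500 003c 1234 0000 4011 0000 c0a8 0308 c0a8 0304
```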
Also, I noticed in your ISE project settings that you set the device to speed grade -3. Some time ago I contacted Digilent to ask what speed grade the Atlys has, and the reply was that it was -2. (I could not easily check it on the board because it was delivered with a small passive cooler on it.)
I also noticed some failing paths in TS_clk_125_tx_clkout0 in the timing report, but it looked as though you knew exactly what was going on there and were deliberately ignoring them.
Anyway, many thanks for the gigabit module! :)
05-19-2011 04:59 PM - edited 05-19-2011 05:04 PM
Hi mrflibble,
Awesome! I'm glad you got it working.
I wanted to be able to change header fields from packet to packet, so I added code to my state machine to do it:
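module ip_header_checksum(
    input  wire        clk,
    output wire [15:0] checksum,
    input  wire [31:0] header,
    input  wire        reset
);
    reg [31:0] checksum_int;
    reg [2:0]  header_count;

    // After reset is released, add both halves of 'header' on each of the next
    // five clock edges (five 32-bit words = a 20-byte IPv4 header).
    always @(posedge clk)
        if (reset) begin
            checksum_int <= 0;
            header_count <= 0;
        end else if (header_count != 5) begin
            header_count <= header_count + 1'b1;
            checksum_int <= checksum_int + header[15:0] + header[31:16];
        end

    // Fold to 16 bits; the second add brings back the carry from the first.
    wire [16:0] fold = checksum_int[31:16] + checksum_int[15:0];
    assign checksum = ~(fold[15:0] + fold[16]);
endmodule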
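Adding the two halves of the 32-bit accumulator once is not quite enough. That addition can itself carry out of bit 15, and a 16-bit Verilog expression simply drops the carry, so in rare cases the checksum is off by one. A quick Python model compares a single fold with a double fold; the function names are made up. The contrived two-word input is chosen only to trigger the carry; the second input is the header from the worked example earlier in the thread.

```python
def accumulate(words32):
    """Model of checksum_int: add both 16-bit halves of each 32-bit header word."""
    acc = 0
    for w in words32:
        acc += (w >> 16) + (w & 0xFFFF)
    return acc

def checksum_single_fold(acc):
    # One fold, truncated to 16 bits the way a 16-bit Verilog expression would be.
    return ~((acc >> 16) + (acc & 0xFFFF)) & 0xFFFF

def checksum_double_fold(acc):
    s = (acc >> 16) + (acc & 0xFFFF)   # may carry into bit 16 ...
    s = (s >> 16) + (s & 0xFFFF)       # ... so add that carry back in
    return ~s & 0xFFFF

contrived = accumulate([0xFFFFFFFF, 0x00000001])   # 0x1FFFF
print(hex(checksum_single_fold(contrived)), hex(checksum_double_fold(contrived)))

example = accumulate([0x45000023, 0x12340000, 0x40110000, 0xC0A80103, 0xC0A80108])
print(hex(checksum_double_fold(example)))
```

For a real 20-byte header, the upper half of the accumulator is at most 9, so two folds are always enough.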
Then in the packet generator code:
reg  [31:0] header_checksum_input;
wire [15:0] header_checksum;      // matches the module's 16-bit checksum output
reg         header_checksum_reset;

ip_header_checksum ip_header_checksum_1 (
    .clk(mac_clk),
    .checksum(header_checksum),
    .header(header_checksum_input),
    .reset(header_checksum_reset));

// Then within the packet generation FSM, before starting to send
// anything out, generate the checksum:
0: header_checksum_reset <= 1;
1: header_checksum_reset <= 0;
2: header_checksum_input <= {16'h4500, ETHERNET_LENGTH[15:0]};
3: header_checksum_input <= {8'd0, ip_identification, 3'b000, FRAG_1[12:0]};
4: header_checksum_input <= {16'h4011, 16'h0000};
5: header_checksum_input <= SRC_IP;
// etc.
Then when you start sending the packet to the MAC, substitute in the checksum at the appropriate place. I ended up doing it word-by-word rather than using a loop like I did in the example. It's a bit more code, but it made it a lot easier to send a series of different packets in sequence! Using Verilog tasks reduced the code to this:
71: begin
        mac_flags <= 4'b0000;
        transmit_header({DST_MAC[15:0], SRC_MAC[47:32]});
    end
72: transmit_header(SRC_MAC[31:0]);
73: transmit_header(32'h0800_4500);
74: transmit_header({ETHERNET_LENGTH[15:0], 8'd0, ip_identification});
75: transmit_header({3'b000, FRAG_5[12:0], 16'h4011});
76: transmit_header({header_checksum, SRC_IP[31:16]}); // IP checksum (4), srcip (4)
// etc.

task transmit_header;
    input [31:0] header;
    begin
        header_data <= header;
        packet_gen_state <= packet_gen_state + 1'b1;
    end
endtask
From memory, there was something wrong with how the reset signals were synchronised to the different clock domains. It didn't seem to cause any problems in such a simple design, but you should fix it for completeness (I did in my ongoing work project, but haven't updated the extracted example).
Thanks for the information on the Spartan-6 speed grade! The manual is pretty spotty, and I'm not really sure why I thought it was a -3. I'll update the information on my site.
05-23-2011 11:28 AM
Hi Joel,
I think I just might lift that packet generation code, and see if it fits in
that "atlys_ethernet_test_v1-20110404.zip" version. :-)
One small drawback, though, is something I am trying to avoid here. Some time ago I wrote a similar state machine for the USB FIFOs on the Nexys2 (which also works for the Atlys), and the drawback is this: FSMs for assembling packets get cumbersome really quickly, and you really wish you had some sort of microcontroller.
For now I do the micro stuff with a PicoBlaze, but for some things I am really looking for something with a bit more oomph. I noticed you were also considering aeMB for that. Same here! What I ran into, however, is that the learning curve looks a bit steep, which is not helped by the availability of documentation and examples. Which is a nice way of saying that progress is being hampered by the lack of good documentation and examples, I suppose.
Either that, or I am blind and I am missing out on all the good resources.
Have you made any progress on the aeMB front? I am willing to put some time
into it in the near future, but I am not looking to flush endless amounts of
time into it if that can be avoided.
So, did you happen to run into any good aeMB resources? In particular for use with assembling UDP packets...
05-23-2011 05:07 PM
You're absolutely right - every time I added yet another state, I became ever more convinced that I should be using a microcontroller, but I think the FSM approach helps to show how the GEMAC works, which is handy because it's otherwise undocumented. Since I really just want to send the contents of a FIFO with a short header, it's not too bad.
Anyway, I've pretty much given up on aeMB for now, as I reached my threshold of pain. The documentation is essentially non-existent or completely out of date, and some of the test benches don't work, though the author was kind enough to help me find the right top-level file and get it to simulate.
Fortunately, there is one good resource for UDP packets on the aeMB, and that's the USRP2. You can download the firmware from http://www.gnuradio.org/redmine/repositories/browse/gnuradio/usrp2 . Again, you'll need to poke around, as it's not clear what everything does.
I probably spent close to ten hours pulling the aeMB apart - I would have been financially better off buying an EDK license and getting back to work almost immediately. If you're sold on free alternatives, you might also want to consider ORPSoC, which works on the Atlys: http://chokladfabriken.org/projects/orpsoc-atlys
05-27-2011 01:32 AM
05-29-2011 08:57 AM
"Anyway, I've pretty much given up on aeMB for now as I reached my threshold
of pain. The documentation is essentially non-existent or completely out of
date and some of the test benches don't work, though the author was kind
enough to help me find the right top level file and get it to simulate."
Well, that is precisely the sort of scenario I am trying to avoid here. This is meant for an open source project, so even if I were to trudge my way through it, it would still be hard for anyone else who might want to use the project to get into. And a good way to make sure your open source project dies horribly before it even gets started is to make the threshold really high!
And also, just for once I'd like to use some easy to use components. ;)
Maybe I should just stick with PicoBlaze then, which of course has the disadvantage of lacking a C compiler. Not much of an issue for me personally, but again, from the open source point of view, being able to use C instead of assembly makes things a little easier for people.
Darned tradeoffs! ;)
As for ORPSoC, that looks like a bit of overkill for this application. Basically I am looking for a soft micro that is free for non-profit use, has a small footprint, and has a decent C compiler available... But maybe ORPSoC is smaller than I think and I should RTFM more thoroughly...
@basset:
That UDP/IP core looks like it's commercial. Or did I miss the "Free for
non-profit use" message somewhere on that site?
05-29-2011 07:34 PM
You could consider the LatticeMico8 or LatticeMico32 soft processors. They're both free, open source, and supported by the GCC compiler.
I haven't used them with Xilinx FPGAs and don't think I found any nice examples or tutorials for doing so, but it might be worth the time investment as they are probably higher quality and better supported than anything on OpenCores.
06-16-2011 01:29 AM
Hi everyone.
I have downloaded the Atlys BSB support files and found out that the part number of the FPGA used on the Atlys board is XC6SLX45-3CSG324C, which means its speed grade is -3, not -2.
See my website for full information: http://projects.armandas.lt/atlys-fpga-speed-grade.html
10-31-2012 03:47 AM
Hi Joel,
excellent job on the Ethernet module!!!
That must have been hard.
I have bought a Mars MX2 board from Enclustra. This module uses a KSZ9021 gigabit PHY, not a Marvell M88E1111. I was thinking about modifying your code to communicate with my PHY, but I have never worked with such chips. I can't tell whether this would be relatively easy to do or would take me months to complete.
There are 26 Verilog files in simple gemac, and I am having a little trouble identifying which ones handle communication with the PHY chip.
Does anyone know whether PHY registers are generic, or does every manufacturer use different registers for their PHY?
I will appreciate any advice you can give me.
best regards,
Bojan Kuljic
10-31-2012 03:55 PM
Ethernet PHYs use a couple of different standards for the interface to the MAC. Unfortunately, the 88E1111 and the USRP MAC use GMII for gigabit, which isn't compatible with the RGMII standard that your KSZ9021 supports.
You could certainly extend the MAC to support RGMII, though due to the tighter timing requirements of the interface I don't think this will be a trivial task. I'd probably recommend using the Xilinx MAC, if you have a license for it.
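For anyone weighing the GMII-to-RGMII conversion, here is how RGMII differs. It halves the data width by sending the low nibble of each byte on the rising clock edge and the high nibble on the falling edge. It also multiplexes TX_EN and TX_ER onto one TX_CTL pin: TX_EN on the rising edge, TX_EN XOR TX_ER on the falling edge. The Python sketch below only models that mapping for one transmit cycle; the function name is illustrative. In a Spartan-6 the hardware side would normally use ODDR2 output registers. The clock-to-data skew requirements, which are the hard part, are not modelled at all.

```python
def gmii_to_rgmii(txd: int, tx_en: int, tx_er: int):
    """One GMII transmit cycle -> ((TXD[3:0], TX_CTL) rising, (TXD[3:0], TX_CTL) falling)."""
    rising = (txd & 0x0F, tx_en)                  # low nibble, TX_EN
    falling = ((txd >> 4) & 0x0F, tx_en ^ tx_er)  # high nibble, TX_EN XOR TX_ER
    return rising, falling

# SFD byte 0xD5 during a normal transmit (TX_EN = 1, TX_ER = 0)
print(gmii_to_rgmii(0xD5, 1, 0))  # ((5, 1), (13, 1))
```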
In terms of registers (which are set using the MDIO interface), my understanding is that there are a few registers that might be the same between manufacturers, but you can't really count on it and should always read the datasheet. If your requirements aren't too complicated this is usually pretty straightforward - just go through the list and see if each one sounds like something you need to be doing. For my 88E1111 code, for example, I know that the MAC only supports gigabit speeds, so I disabled negotiation of 10 and 100 Mbit.
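Some things are common across vendors: the MDIO frame format and the basic register set. IEEE 802.3 Clause 22 defines registers 0 to 15 (control, status, PHY identifier, auto-negotiation advertisement, 1000BASE-T control and so on); registers 16 to 31 are vendor-specific. The Python below is a sketch. It builds the bit sequence of a Clause 22 write, plus register 4 and register 9 values that advertise 1000BASE-T full duplex but no 10/100 modes. The constant names, the PHY address and the 0x01E1 "read-back" value are placeholders. Check the datasheet for whether the change also needs auto-negotiation restarted (register 0, bit 9) or a software reset.

```python
# Clause 22 register numbers/bits; the constant names are made up for this sketch.
REG_ADVERTISE = 4           # auto-negotiation advertisement register
REG_1000T_CONTROL = 9       # 1000BASE-T control register
ADV_10_100_BITS = 0x01E0    # bits 8:5 = 100BASE-TX FD/HD, 10BASE-T FD/HD
CTRL_ADV_1000T_FD = 1 << 9  # advertise 1000BASE-T full duplex

def mdio_write_frame(phy_addr, reg_addr, data):
    """Bits of a Clause 22 MDIO write frame, in the order they are clocked out."""
    if not (0 <= phy_addr < 32 and 0 <= reg_addr < 32 and 0 <= data <= 0xFFFF):
        raise ValueError("field out of range")
    bits = [1] * 32                                          # preamble: 32 ones
    bits += [0, 1]                                           # ST: start of frame
    bits += [0, 1]                                           # OP: 01 = write (10 = read)
    bits += [(phy_addr >> i) & 1 for i in range(4, -1, -1)]  # PHYAD, MSB first
    bits += [(reg_addr >> i) & 1 for i in range(4, -1, -1)]  # REGAD, MSB first
    bits += [1, 0]                                           # TA: station drives 10 on a write
    bits += [(data >> i) & 1 for i in range(15, -1, -1)]     # 16 data bits, MSB first
    return bits

def without_10_100(advertise_value):
    """Clear the 10/100 ability bits, keeping the selector field and anything else."""
    return advertise_value & ~ADV_10_100_BITS & 0xFFFF

PHY_ADDR = 7  # placeholder: use the address your board's strapping pins select
frames = [
    mdio_write_frame(PHY_ADDR, REG_ADVERTISE, without_10_100(0x01E1)),
    mdio_write_frame(PHY_ADDR, REG_1000T_CONTROL, CTRL_ADV_1000T_FD),
]
```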
11-01-2012 05:40 AM
Hi Joel,
thanks for your reply :-)
I have already got the datasheet for the PHY,
and this is on my to-do list for the holidays :-)
Regards,
Bojan Kuljic
01-18-2013 05:53 AM
Hi Joel,
I tried to synthesize both versions you provided and found that the new version requires many more resources than the old one.
The first version (20110228):
Slice Logic Utilization:
  Number of Slice Registers: 880 out of 54576 1%
  Number of Slice LUTs: 1267 out of 27288 4%
    Number used as Logic: 1089 out of 27288 3%
    Number used as Memory: 178 out of 6408 2%
      Number used as SRL: 178
Slice Logic Distribution:
  Number of LUT Flip Flop pairs used: 1576
    Number with an unused Flip Flop: 696 out of 1576 44%
    Number with an unused LUT: 309 out of 1576 19%
    Number of fully used LUT-FF pairs: 571 out of 1576 36%
    Number of unique control sets: 69
IO Utilization:
  Number of IOs: 43
  Number of bonded IOBs: 42 out of 218 19%
Specific Feature Utilization:
  Number of Block RAM/FIFO: 2 out of 116 1%
    Number using Block RAM only: 2
  Number of BUFG/BUFGCTRLs: 2 out of 16 12%
The second Version (20110404):
Slice Logic Utilization:
  Number of Slice Registers: 1668 out of 54576 3%
  Number of Slice LUTs: 1882 out of 27288 6%
    Number used as Logic: 1481 out of 27288 5%
    Number used as Memory: 401 out of 6408 6%
      Number used as SRL: 401
Slice Logic Distribution:
  Number of LUT Flip Flop pairs used: 2666
    Number with an unused Flip Flop: 998 out of 2666 37%
    Number with an unused LUT: 784 out of 2666 29%
    Number of fully used LUT-FF pairs: 884 out of 2666 33%
    Number of unique control sets: 178
IO Utilization:
  Number of IOs: 49
  Number of bonded IOBs: 44 out of 218 20%
    IOB Flip Flops/Latches: 27
Specific Feature Utilization:
  Number of Block RAM/FIFO: 49 out of 116 42%
    Number using Block RAM only: 49
  Number of BUFG/BUFGCTRLs: 8 out of 16 50%
  Number of PLL_ADVs: 1 out of 4 25%
Also, the maximum frequency drops considerably: 105.924 MHz for the new version compared with 209.507 MHz for the old one.
Do you think that is normal? Especially for the block RAMs, do you know what makes that difference?
Thanks in advance,
Mina
01-18-2013 08:53 PM
I have no recollection of what the difference between the two versions was (try using a diff tool?) but perhaps the newer one has more/larger FIFOs or something. I've updated the site with a newer version of the design that is much nicer, though it's still not documented. I wouldn't bother with the two older versions at all. If you want to know where the BRAMs are being used, have a look at the messages the synthesis tool produces or look at the schematic view.
You shouldn't ever get hung up on the synthesis frequency readout. The design employs a bunch of different clocks so the maximum frequency is somewhat meaningless. The only thing you should really worry about when designing for FPGAs is that you have applied timing constraints properly, and that your design is able to meet those constraints.
01-21-2013 03:34 AM
Hi Joel,
Thanks for your reply,
I use VHDL and I'm not good with Verilog, so I didn't want to waste a lot of time tracing this issue.
The latest version you uploaded gives much better synthesis results, especially regarding block RAM. I'll try it now with the Atlys board.
Here are the synthesis results (20111219):
Slice Logic Utilization:
  Number of Slice Registers: 1292 out of 54576 2%
  Number of Slice LUTs: 1702 out of 27288 6%
    Number used as Logic: 1510 out of 27288 5%
    Number used as Memory: 192 out of 6408 2%
      Number used as SRL: 192
Slice Logic Distribution:
  Number of LUT Flip Flop pairs used: 2186
    Number with an unused Flip Flop: 894 out of 2186 40%
    Number with an unused LUT: 484 out of 2186 22%
    Number of fully used LUT-FF pairs: 808 out of 2186 36%
    Number of unique control sets: 104
IO Utilization:
  Number of IOs: 50
  Number of bonded IOBs: 45 out of 218 20%
Specific Feature Utilization:
  Number of Block RAM/FIFO: 2 out of 116 1%
    Number using Block RAM only: 2
  Number of BUFG/BUFGCTRLs: 6 out of 16 37%
  Number of PLL_ADVs: 1 out of 4 25%
01-21-2013 04:18 AM
Ah, perhaps the one with all the BRAM had a ChipScope Pro core left in it. Glad you've had some more success with the newest version.
03-15-2013 10:45 AM
I recently tried out this code and it turned out to be great. But I have a problem with the received Ethernet frame: according to the Wireshark network analyzer, the calculated CRC is not correct. Is that because the CRC calculation is wrong, or because this code doesn't contain any CRC calculation?
03-30-2013 12:00 AM
The CRC calculation (of the Ethernet frame) is done in the MAC and as far as I know it is correct - I've certainly not noticed any problems with it.
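One way to settle whether an FCS is right is to recompute it in software. The Ethernet FCS is the same CRC-32 that Python's zlib.crc32 computes. It covers everything from the destination MAC to the end of the payload/padding, and appears on the wire least significant byte first. Many NICs strip the FCS before the frame reaches the OS, so a capture may not show it at all. The helper names below are illustrative.

```python
import zlib

def ethernet_fcs(frame: bytes) -> bytes:
    """CRC-32 over destination MAC .. end of payload/padding, in on-the-wire byte order."""
    return zlib.crc32(frame).to_bytes(4, "little")

def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over all but the last four bytes and compare with them."""
    return ethernet_fcs(frame_with_fcs[:-4]) == frame_with_fcs[-4:]

# Usage: a 60-byte (minimum-size) frame of zeros with its FCS appended.
frame = bytes(60)
good = frame + ethernet_fcs(frame)
bad = bytes([good[0] ^ 0x01]) + good[1:]   # one flipped bit
print(fcs_ok(good), fcs_ok(bad))  # True False
```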
My sample code generates UDP packets with incorrect checksums but most operating systems don't care about this (it's a little bit of a pain to generate them in an FPGA, so I didn't bother).
03-30-2013 03:52 AM
The UDP checksum is easy to calculate;
it's just a ones' complement sum.
If you set the UDP checksum to zero, the receiver will not check it.
If you have an incorrect UDP checksum, i.e. a non-zero one that doesn't match, then in my experience the system will discard the packet.
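Strictly speaking, the UDP checksum (RFC 768) is the ones' complement of a ones' complement sum. The sum covers a pseudo-header, the UDP header and the payload. The pseudo-header holds the source IP, destination IP, a zero byte, protocol 17 and the UDP length. The UDP header is summed with its checksum field set to zero, and the payload is padded to an even length. A computed value of zero is transmitted as FFFF, because zero on the wire means "no checksum". The Python reference below is meant for checking an FPGA implementation against; the function names are illustrative.

```python
import struct

def ones_complement_sum16(data: bytes) -> int:
    """Ones' complement sum of big-endian 16-bit words (odd length padded with 0)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def udp_checksum(src_ip: bytes, dst_ip: bytes, src_port: int, dst_port: int,
                 payload: bytes) -> int:
    length = 8 + len(payload)
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, length)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum field zeroed
    csum = ~ones_complement_sum16(pseudo + header + payload) & 0xFFFF
    return csum or 0xFFFF   # a computed 0 is sent as FFFF

c = udp_checksum(bytes([192, 168, 3, 8]), bytes([192, 168, 3, 4]), 12345, 23456, b"HELLO")
print(f"{c:04x}")
```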
03-30-2013 04:04 AM
Yep! I'm pretty sure the code just sets it to zero.
Incidentally, the annoying thing about calculating a UDP checksum is that it comes before the packet data. In an FPGA this tends to mean that you have to buffer the entire packet, calculate the checksum, and then generate the header. If you're generating data from a continuously streaming source, this adds a bit more complication and latency so it's nice to just set it to zero.
However if you need to reliably interoperate with equipment you don't control and can't test in advance, it's worth doing the right thing and calculating the checksum.
05-07-2013 06:55 AM
Hi,
I have also noticed this. UDP doesn't need a checksum, but the IP header does. The core doesn't calculate the Ethernet packet CRC, as can be seen in the attached picture.
I am sending minimal packets, so the last data word is 0xc. If I modify packet_sender.v like this:
// Start sending the rest of the payload
16:
    if (wr_dst_rdy_i) begin
        if (packet_size_count < {5'b0, packet_size_i[7:0], 3'b100})
        begin
            wr_data_o <= {18'd0, packet_size_count[13:0]}; // data
            packet_size_count <= packet_size_count + 1'b1;
        end
        else
        begin
            wr_data_o <= 32'h870AD370; // precomputed CRC (FCS) in place of the last data word
            packet_size_count <= packet_size_count + 1'b1;
        end
        if (packet_size_count == {5'b0, packet_size_i[7:0], 3'b100}) begin // switch controls packet size
            state <= state + 1'b1;
            wr_flags_o <= 4'b0010; // 4 bytes, EOF
        end
    end
so that the CRC value is inserted in place of the 0xc, everything is OK and I can receive UDP packets.
Maybe I have overlooked something, but I can't see where this core generates the
Ethernet Frame Check Sequence (FCS).
Best regards,
Bojan Kuljic