09-27-2017 09:51 AM
In short: what is the best way to assemble a packet into a struct from a stream of data?
Goal: take in a stream of 4-bit nibbles, assemble them into 16-byte packets, and map each packet to a C struct.
One problem is that there is no succinct way to take a buffer and map it onto a struct without intimate knowledge of the compiler, and in practice it is impossible anyway because of alignment: many fields cross 8-bit and 32-bit boundaries.
I implemented this by buffering one nibble at a time into a 16-byte buffer. When the buffer is full, I fill out an instance of the C struct and write it into an hls::stream object. The function prototype is:
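As a rough software model of the buffering stage, here is a plain C++ sketch in which std::queue stands in for hls::stream and std::uint8_t for ap_uint<4>. The names NibbleBuffer and RawPacket are hypothetical, and two details are assumptions not stated in the post: the first nibble of a packet is the high nibble of byte 0, and syncin == 1 marks the first nibble of a packet.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical raw 16-byte packet (the post's buffer before struct conversion).
using RawPacket = std::array<std::uint8_t, 16>;

// Software model of the nibble-buffering stage. Assumptions: nibbles arrive
// high-nibble-first within each byte, and syncin marks the start of a packet.
struct NibbleBuffer {
    RawPacket buf{};     // 16-byte staging buffer
    unsigned count = 0;  // nibbles received so far in this packet (0..31)

    // One call per "clock": accept one 4-bit nibble and push the full packet
    // to `out` when the 32nd nibble arrives.
    void push(std::uint8_t din, bool syncin, std::queue<RawPacket>& out) {
        assert(din < 16);       // only the low 4 bits are meaningful
        if (syncin) count = 0;  // resynchronize on packet start
        std::uint8_t& byte = buf[count / 2];
        if (count % 2 == 0)
            byte = static_cast<std::uint8_t>(din << 4);     // high nibble, clears old data
        else
            byte = static_cast<std::uint8_t>(byte | din);   // low nibble
        if (++count == 32) {    // buffer full: emit it and start over
            out.push(buf);
            count = 0;
        }
    }
};
```

In this model the write happens in the same call that receives the last nibble, which corresponds to the structure the post describes before any workaround.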
void packetize(ap_uint<4> din, ap_uint<1> syncin, hls::stream<Packet> &pout);
I implemented a function that takes the 16-byte buffer and copies each field into the C struct using ap_uint bit selections. This lets the compiler control the field layout of the struct. However, this always produces a minimum II of 2 for the top-level packetize function. I've made sure to pack the struct, completely partition the buffer, implement buffering and packetizing in separate functions, use dataflow for the packetize function and pipeline for the internal functions, etc.
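To make the field-by-field copy concrete, here is a software sketch with a hypothetical packet layout (the real field names and widths are in the attachment, not in the post). The helper bits(p, hi, lo) imitates the ap_uint bit-range selection mentioned above, treating bit 127 as the MSB of byte 0; bits, to_packet and the Packet fields are illustrative names only. The layout deliberately has fields that straddle byte and 32-bit boundaries, which is exactly why a plain memory cast cannot work.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using RawPacket = std::array<std::uint8_t, 16>;

// Imitates an ap_uint<128> range selection p(hi, lo): bit 127 is the MSB of
// byte 0, bit 0 is the LSB of byte 15. Width is limited to 64 bits here.
std::uint64_t bits(const RawPacket& p, unsigned hi, unsigned lo) {
    assert(hi < 128 && hi >= lo && hi - lo < 64);
    std::uint64_t v = 0;
    for (int b = static_cast<int>(hi); b >= static_cast<int>(lo); --b)
        v = (v << 1) | ((p[(127 - b) / 8] >> (b % 8)) & 1u);  // MSB first
    return v;
}

// Hypothetical packet; field widths are illustrative, not from the post.
struct Packet {
    std::uint8_t  version;  // bits [127:124], 4 bits
    std::uint8_t  type;     // bits [123:119], 5 bits (crosses a byte boundary)
    std::uint16_t length;   // bits [118:106], 13 bits
    std::uint32_t seq;      // bits [105:74], 32 bits (crosses a 32-bit boundary)
};

// Explicit per-field copy: the struct's own memory layout never matters.
Packet to_packet(const RawPacket& p) {
    Packet pkt{};
    pkt.version = static_cast<std::uint8_t>(bits(p, 127, 124));
    pkt.type    = static_cast<std::uint8_t>(bits(p, 123, 119));
    pkt.length  = static_cast<std::uint16_t>(bits(p, 118, 106));
    pkt.seq     = static_cast<std::uint32_t>(bits(p, 105, 74));
    return pkt;
}
```

In hardware each of these selections is just wiring; the II=2 reported in the post presumably comes from how the surrounding control and stream write are scheduled rather than from the bit selection itself, but that is an inference, not something the post measured.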
I know this would be fairly simple to do in Verilog, producing a module with a 2-clock-cycle latency and an II of 1, which is required since I get a new 4-bit nibble every clock cycle. However, this problem is fundamental to understanding how HLS works and how I can use it to write protocol-processing pipelines very quickly. Any ideas?
I've attached a code sample of what I have been trying. This is one of many different configurations I've tried.
09-27-2017 02:57 PM
09-29-2017 09:11 AM
As an experiment, I implemented a packetize2 function that eliminates the intermediate buffer. The side effect is a very tedious implementation that places each received nibble into its specific 4-bit slot of the packet. However, it achieved a latency of 0 and an II of 1.
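The direct-placement style of packetize2 can be illustrated on a much smaller, hypothetical 16-bit header (field a = bits [15:10], field b = bits [9:0], nibbles arriving MSB first). Header and DirectPacketizer are made-up names; the point is that each nibble index gets its own case, and a nibble that straddles two fields has to be split by hand, which is where the tedium comes from on a full 128-bit packet.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16-bit header used only to illustrate direct placement:
// a = bits [15:10] (6 bits), b = bits [9:0] (10 bits); nibbles arrive MSB first.
struct Header {
    std::uint8_t  a = 0;  // 6 bits
    std::uint16_t b = 0;  // 10 bits
};

struct DirectPacketizer {
    Header hdr;
    unsigned count = 0;  // which nibble of the header comes next (0..3)

    // Places each nibble straight into the field bits it belongs to, with no
    // intermediate buffer. Returns true when the header is complete.
    bool push(std::uint8_t din) {
        assert(din < 16);
        switch (count) {
        case 0:  // header bits [15:12] -> a[5:2]
            hdr.a = static_cast<std::uint8_t>(din << 2);
            break;
        case 1:  // header bits [11:8] straddle both fields: a[1:0] and b[9:8]
            hdr.a = static_cast<std::uint8_t>(hdr.a | (din >> 2));
            hdr.b = static_cast<std::uint16_t>((din & 0x3) << 8);
            break;
        case 2:  // header bits [7:4] -> b[7:4]
            hdr.b = static_cast<std::uint16_t>(hdr.b | (din << 4));
            break;
        case 3:  // header bits [3:0] -> b[3:0]
            hdr.b = static_cast<std::uint16_t>(hdr.b | din);
            break;
        }
        count = (count + 1) % 4;
        return count == 0;
    }
};
```

For example, feeding the nibbles of 0xABCD yields a = 42 and b = 0x3CD. Scaling this to 32 nibbles and the real fields gives the "pretty ugly function" described in the next paragraph.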
I feel that this subverts some of my goals in using HLS. It turns two elegant functions, one for buffering and one for converting to a struct (admittedly less elegant), into one pretty ugly function. I wish there were a way to define a struct in C together with the exact bit mapping from a bit vector, without needing to worry about field or byte alignment. While this doesn't make sense in software, it would allow succinct implementations of packet processing in HLS: I could define the packet structure, buffer up the packet, and somehow 'cast' the buffer to the struct. Using the packet would then be purely symbolic, and no bit ranges would show up in my code other than the initial definition of the struct and its mapping.
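One way to approximate that wish in ordinary C++ is a layout table: the bit ranges are written down exactly once, and all other code refers to fields by name. This is a sketch, not a feature of C or of Vivado HLS; Field, BitRange, kLayout, get and set are hypothetical names, the layout is illustrative, and whether HLS would reduce these loops to plain wiring at II=1 is untested here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using RawPacket = std::array<std::uint8_t, 16>;

// Hypothetical field names; Count is the number of fields.
enum class Field { Version, Type, Length, Seq, Count };

struct BitRange { unsigned hi, lo; };  // inclusive MSB and LSB positions

// The only place bit positions appear. Bit 127 is the MSB of byte 0.
constexpr std::array<BitRange, static_cast<std::size_t>(Field::Count)> kLayout{{
    {127, 124},  // Version: 4 bits
    {123, 119},  // Type:    5 bits
    {118, 106},  // Length: 13 bits
    {105,  74},  // Seq:    32 bits
}};

// Reads a field from the raw packet using only the layout table.
std::uint64_t get(const RawPacket& p, Field f) {
    const BitRange r = kLayout[static_cast<std::size_t>(f)];
    assert(r.hi < 128 && r.hi >= r.lo && r.hi - r.lo < 64);
    std::uint64_t v = 0;
    for (int b = static_cast<int>(r.hi); b >= static_cast<int>(r.lo); --b)
        v = (v << 1) | ((p[(127 - b) / 8] >> (b % 8)) & 1u);  // MSB first
    return v;
}

// Writes a field into the raw packet; the inverse of get().
void set(RawPacket& p, Field f, std::uint64_t v) {
    const BitRange r = kLayout[static_cast<std::size_t>(f)];
    assert(r.hi < 128 && r.hi >= r.lo && r.hi - r.lo < 64);
    for (int b = static_cast<int>(r.lo); b <= static_cast<int>(r.hi); ++b) {
        const std::uint8_t mask = static_cast<std::uint8_t>(1u << (b % 8));
        std::uint8_t& byte = p[(127 - b) / 8];
        if ((v >> (b - r.lo)) & 1u)
            byte |= mask;                                 // set this bit
        else
            byte &= static_cast<std::uint8_t>(~mask);     // clear this bit
    }
}
```

Usage then reads symbolically, e.g. get(pkt, Field::Seq), with no bit ranges outside kLayout.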
Any thoughts on better ways to do this to get closer to my goal?
10-01-2017 09:17 AM
@peteralieber I'd try making a copy of the buffer before calling packet_out and/or adding dataflow to the packetize function.
10-02-2017 08:30 AM
@muzaffer I added dataflow very early on, and it did not seem to help. I think the issue was actually the call to write() on the pout stream. The logic that decides when to write depends on when the last byte of the packet is filled in. The only fix I found was to add a local variable, 'done', which I set to true when I fill in the last byte. Then, in a separate context, I check that variable and write the packet to the stream if it is true. This also seems to subvert the purpose of HLS: with pipelining turned on, the tool should be able to see that it can delay the write by a clock cycle and pipeline the packet, but I had to add the done flag explicitly rather than using the counter to determine when the packet has been fully accumulated.
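A software model of the 'done' workaround may make the timing explicit: the call that receives the last nibble only sets the flag, and the stream write happens at the top of the following call, one "clock" later. DoneFlagPacketizer and RawPacket are hypothetical names, std::queue stands in for hls::stream, the high-nibble-first order is an assumption, and syncin handling is omitted for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <queue>

using RawPacket = std::array<std::uint8_t, 16>;

// Model of the 'done' flag approach: the write is deferred by one call so it
// no longer depends on the same-cycle comparison of the counter.
struct DoneFlagPacketizer {
    RawPacket buf{};
    unsigned count = 0;  // nibbles received in the current packet (0..31)
    bool done = false;   // set when the previous call completed a packet

    void step(std::uint8_t din, std::queue<RawPacket>& out) {
        assert(din < 16);
        // Separate context: emit the packet completed on the previous call,
        // before this call's nibble overwrites byte 0.
        if (done) {
            out.push(buf);
            done = false;
        }
        std::uint8_t& byte = buf[count / 2];
        byte = (count % 2 == 0) ? static_cast<std::uint8_t>(din << 4)   // high nibble
                                : static_cast<std::uint8_t>(byte | din); // low nibble
        if (++count == 32) {
            count = 0;
            done = true;  // write happens on the next call
        }
    }
};
```

After 32 nibbles the queue is still empty; the packet appears on the 33rd call, which matches the one-cycle delay the post describes.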
The attached file shows how I got it down to II=1. I still think Vivado should be able to do it if I get rid of the 'done' variable and just put:
on line 146. However, the analysis may be getting confused because the packet.write() is in the same switch statement as the packet construction. I'll try dropping the done variable but putting pout.write() in its own if statement, with the count variable as the condition.