namabo
Adventurer
Registered: 07-27-2018

1 cycle latency with axi stream and axilite in the same HLS IP

Hi everybody,

I'm working on an HLS IP that has one AXI4-Stream input, two AXI4-Stream outputs, and two AXI4-Lite registers.

The function is quite simple: a condition is evaluated on the input stream, and it depends on the values written to the two AXI4-Lite registers.

The AXI4-Lite registers are read only on the last chunk of the input stream, so that their values stay constant while a packet is being processed.

If the condition is true, I route the packet to one of the output AXI4-Stream ports; otherwise, I route it to the other.
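
The rule described above can be sketched as a small software model. This is plain C++ rather than HLS code, and the names `Chunk`, `RouterModel`, and `step` are hypothetical; the comparison mirrors the condition in the code posted below (lower 16 bits of `user` compared with the first register, second register equal to 2).

```cpp
#include <cstdint>

// Hypothetical stand-in for one AXI4-Stream chunk; only the fields the rule uses.
struct Chunk {
    uint32_t user;  // sideband field; the rule looks at bits 15..0
    bool     last;  // marks the final chunk of a packet
};

// Software model of the router: holds latched copies of the two registers.
struct RouterModel {
    uint32_t f1 = 0;       // same initial values as in the posted code
    uint32_t f2 = 0x8000;

    // Returns true if the chunk goes to out_b, false if it goes to out_a.
    // reg_f1/reg_f2 are the live register values; they are latched only
    // after the last chunk, so the values used by the rule do not change
    // in the middle of a packet.
    bool step(const Chunk &c, uint32_t reg_f1, uint32_t reg_f2) {
        bool to_b = ((c.user & 0xFFFF) <= f1) && (f2 == 2);
        if (c.last) {
            f1 = reg_f1;
            f2 = reg_f2;
        }
        return to_b;
    }
};
```

In this model, a register write that arrives in the middle of a packet only takes effect for the chunks of the next packet.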

I would like to reach 1-cycle processing, so that the IP handles a stream with one chunk per clock, but I can't figure out which pragma or approach fits.

Here is the code:

void axil_st(
        hls::stream<AXISTREAM> &in_s,
        hls::stream<AXISTREAM> &out_a,
        hls::stream<AXISTREAM> &out_b,
        volatile uint32_t *slv_reg_f1,
        volatile uint32_t *slv_reg_f2
){
        #pragma HLS INTERFACE axis port=in_s
        #pragma HLS INTERFACE axis port=out_a
        #pragma HLS INTERFACE axis port=out_b
        #pragma HLS INTERFACE s_axilite register port=slv_reg_f1 bundle=BUS_A
        #pragma HLS INTERFACE s_axilite register port=slv_reg_f2 bundle=BUS_A
        #pragma HLS INTERFACE port=return
        AXISTREAM in_chunk;
        AXISTREAM out_chunk;

        static uint32_t slv_reg_f1_int = 0;
        static uint32_t slv_reg_f2_int  = 0x8000;

        #pragma HLS dataflow
        in_chunk = in_s.read();

        if(
                in_chunk.user.range(15,0) <= slv_reg_f1_int &&
                slv_reg_f2_int == 0x00000002
        )
        {
                out_b.write(in_chunk);
        } else
        {
                out_a.write(in_chunk);
        }

        if(in_chunk.last == 1){
                slv_reg_f1_int = *slv_reg_f1;
                slv_reg_f2_int = *slv_reg_f2;
        }
}

What I'm seeing is that HLS synthesizes an IP that spends one clock cycle reading the two AXI4-Lite registers and one clock cycle performing the input/output on the AXI4-Stream ports.

How can I reach 1-cycle processing?

I have tried the pipeline pragma, but I get a latency of 3 and an interval of 1 clock cycle.

I hope someone who has experience with this can help me.

Thank you

4 Replies
florentw
Moderator
Registered: 11-09-2015

Hi @namabo

You might want to use a loop.

With your current code, the function body executes once per invocation, so it performs only one read.

And because you have 1 read + 1 processing + 1 write, you get a 3-cycle latency.
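
As a rough worked example of why latency and interval are different things (the function name `pipelined_cycles` is hypothetical, and real cycle counts come from the HLS report): in a pipelined loop the first iteration finishes after the iteration latency, and each further iteration finishes one initiation interval later.

```cpp
#include <cstdint>

// Estimated cycles for n iterations of a pipelined loop:
// the iteration latency for the first one, then one II per additional iteration.
constexpr uint64_t pipelined_cycles(uint64_t n, uint64_t latency, uint64_t ii) {
    return n == 0 ? 0 : latency + (n - 1) * ii;
}
```

For a 1000-chunk stream, a latency of 3 with an interval of 1 gives 1002 cycles, whereas the same latency with an interval of 3 gives 3000. So a latency of 3 is compatible with accepting one chunk per clock, as long as the interval is 1.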


Florent
Product Application Engineer - Xilinx Technical Support EMEA
namabo
Adventurer
Registered: 07-27-2018

Hi @florentw

I changed the code as you suggested:

void axil_st(
        hls::stream<AXISTREAM> &in_s,
        hls::stream<AXISTREAM> &out_a,
        hls::stream<AXISTREAM> &out_b,
        volatile uint32_t *slv_reg_f1,
        volatile uint32_t *slv_reg_f2
){
        #pragma HLS INTERFACE axis port=in_s
        #pragma HLS INTERFACE axis port=out_a
        #pragma HLS INTERFACE axis port=out_b
        #pragma HLS INTERFACE s_axilite register port=slv_reg_f1 bundle=BUS_A
        #pragma HLS INTERFACE s_axilite register port=slv_reg_f2 bundle=BUS_A
        #pragma HLS INTERFACE port=return
        AXISTREAM in_chunk;
        AXISTREAM out_chunk;

        uint32_t slv_reg_f1_int = 0;
        uint32_t slv_reg_f2_int  = 0x8000;

        #pragma HLS pipeline II=1 rewind
        while(1){
                in_chunk = in_s.read();

                if(
                        in_chunk.user.range(15,0) <= slv_reg_f1_int &&
                        slv_reg_f2_int == 0x00000002
                )
                {
                        out_b.write(in_chunk);
                } else
                {
                        out_a.write(in_chunk);
                }

                if(in_chunk.last == 1){
                        slv_reg_f1_int = *slv_reg_f1;
                        slv_reg_f2_int = *slv_reg_f2;
                }

        }
}

Unfortunately, this implementation does not achieve 1-cycle processing either.

I put the IP in a Vivado project and connected a custom AXI4-Stream generator and an AXI master to it.

As you can see, the DUT stalls for 2 clock cycles: it drives tready low for 2 clock cycles (1 read and 1 processing?).

[Image: design.png]

[Image: simu.png]

I made a Vivado project because, after inserting the while loop, I'm not able to run cosimulation (in C simulation I use a break keyword to leave the loop).

Is there a way to cosimulate a block with an infinite loop?

In any case, the while-loop approach does not solve the problem.

Thank you.

florentw
Moderator
Registered: 11-09-2015

Hi @namabo

I do not know of a way to cosimulate an infinite loop.

About the pipeline pragma, can you make sure it is applied to the loop? I believe the pragma should be inside the loop.
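
A minimal sketch of that placement, using the same pragma as in the posted code. Several things here are assumptions so that the sketch compiles on a desktop without the HLS headers: `stream_sim` is a hypothetical stand-in for `hls::stream`, the `AXISTREAM` struct only has the fields the code touches, and the `break` guarded by `__SYNTHESIS__` is one possible way to end the loop in C simulation. The INTERFACE pragmas are omitted and would stay as in the post.

```cpp
#include <cstdint>
#include <queue>

// Hypothetical desktop stand-ins; a real project would use hls::stream
// from <hls_stream.h> and its own AXISTREAM type.
struct AXISTREAM {
    uint32_t data;
    uint16_t user;  // the posted code uses in_chunk.user.range(15,0)
    bool     last;
};

template <typename T>
struct stream_sim {
    std::queue<T> q;
    bool empty() const { return q.empty(); }
    T read() { T v = q.front(); q.pop(); return v; }
    void write(const T &v) { q.push(v); }
};

void axil_st(stream_sim<AXISTREAM> &in_s,
             stream_sim<AXISTREAM> &out_a,
             stream_sim<AXISTREAM> &out_b,
             volatile uint32_t *slv_reg_f1,
             volatile uint32_t *slv_reg_f2) {
    uint32_t slv_reg_f1_int = 0;
    uint32_t slv_reg_f2_int = 0x8000;

    while (1) {
        // The pragma is the first thing in the loop body, so it applies
        // to the loop rather than to the enclosing function.
        #pragma HLS pipeline II=1 rewind
#ifndef __SYNTHESIS__
        if (in_s.empty()) break;  // C simulation only: stop when input runs out
#endif
        AXISTREAM in_chunk = in_s.read();

        if (in_chunk.user <= slv_reg_f1_int && slv_reg_f2_int == 0x00000002) {
            out_b.write(in_chunk);
        } else {
            out_a.write(in_chunk);
        }

        // Sample the registers only at the end of a packet.
        if (in_chunk.last) {
            slv_reg_f1_int = *slv_reg_f1;
            slv_reg_f2_int = *slv_reg_f2;
        }
    }
}
```

A desktop compiler ignores the HLS pragma (possibly with a warning), so this only checks the routing behaviour; the achieved interval still has to be confirmed in the synthesis report or in an RTL simulation.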


Florent
Product Application Engineer - Xilinx Technical Support EMEA

namabo
Adventurer
Registered: 07-27-2018
Great!
With the pipeline pragma placed inside the while block, the IP reaches 1-cycle processing performance.

Thank you very much, @florentw