Participant xpaillard
Registered: 04-11-2013

Chain | Cascade module HLS

Dear all,

I would like to chain three modules generated by Vivado HLS:

moduleA => moduleB => moduleC

Each module reads from and writes to a DDR3 memory. The memory and the three modules are all connected through AXI memory-mapped interfaces.

moduleC can only start its work once moduleB has finished, and moduleA must finish before moduleB starts (a sort of cascade or chain).

What is the best way to implement this process?

- Write one big C program (everything in a single HLS module)
- Write three C programs and chain the three HLS modules
- Another methodology?

Among the directives I have seen ap_ctrl_chain. It adds an ap_continue input, but I don't understand how it works.

All these control protocols are wrapped in AXI-Lite (and the data, i.e. the memory accesses, go over AXI memory-mapped).

For example, would moduleB have to poll the ap_continue input of moduleA, and do so over AXI? If so, is there a risk of saturating the AXI bus?

Is it possible to add custom control inputs/outputs that are not embedded in an AXI interface, and simply connect moduleA_finish => moduleB_start?

Thank you in advance for your answers; this is not really clear to me, and the documentation on this subject is sparse.

Xavier

Teacher muzaffer
Registered: 03-31-2012

Write three functions and test them individually. Let each one write its outputs to memory and have the next one read its input from memory. Once this works, if you find the performance too slow because of all the memory accesses, you can start optimizing, but first get the functionality correct.

In terms of valid signalling, you can add signals to each module that tell it its input data is ready, and you can drive these signals from the processor or from a higher-level RTL module.
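
As a minimal sketch of the approach in this reply, assuming placeholder arithmetic in place of the real algorithms, the three stages can be written as ordinary C functions that exchange data through memory buffers. The names moduleA, moduleB and moduleC come from the question; `run_chain`, `LEN` and the arrays standing in for DDR3 regions are hypothetical. In a C testbench each call returns only when that stage is done, which models a controller that asserts the next module's start only after it has seen the previous module's done.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer length; the real design works on DDR3 buffers. */
#define LEN 8

/* Each "module" reads its input through one pointer and writes its
   output through another, the same shape an HLS top function with
   memory-mapped ports could have. The arithmetic is placeholder work. */
void moduleA(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = in[i] + 1.0f;
}

void moduleB(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = in[i] * 2.0f;
}

void moduleC(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = in[i] - 3.0f;
}

/* Software model of the controller (processor or higher-level RTL).
   Calling the modules in order mirrors "start B only after A reports
   done, start C only after B reports done". The two local arrays stand
   in for intermediate DDR3 regions. */
void run_chain(const float *ddr_in, float *ddr_out, size_t n) {
    float ddr_ab[LEN], ddr_bc[LEN];
    assert(n <= LEN);
    moduleA(ddr_in, ddr_ab, n);   /* start A, wait for done */
    moduleB(ddr_ab, ddr_bc, n);   /* start B, wait for done */
    moduleC(ddr_bc, ddr_out, n);  /* start C, wait for done */
}
```

Each function can be checked on its own before the chain is assembled. Once the functions are synthesized as separate HLS blocks, the sequencing inside `run_chain` is what the processor or a higher-level RTL module would do with the start/done signals.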

Moderator
Registered: 04-17-2011

ap_ctrl_chain is intended for designs where multiple blocks are chained together to process a stream of data, so it may not help you. Also, avoid the PIPELINE directive.
Regards,
Debraj

Participant xpaillard
Registered: 04-11-2013

Hi,

Thank you for your answers.

Indeed, ap_ctrl_chain will not be useful for my flow.

I will either add control signals or use a synchronization mechanism such as a mutex block.

I can start my three modules at the same time and block or release each one with a mutex (as in concurrent programming).

The Vivado library includes an AXI Mutex module. Each module can check the value of its mutex and start executing when the mutex is free.

With this methodology I get both synchronization and exclusive resource access: my three modules read and write the DDR3, potentially at the same addresses.

My main goal is to add as little control C code as possible, because each module is part of a genetic algorithm developed by a software developer who should not need to check, add, or specify control code for the hardware implementation.

Xavier
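
A mutex provides mutual exclusion, but this chain also needs ordering (A before B before C), and a mutex by itself does not say whose turn it is. As a software analogy only, using POSIX threads rather than the AXI Mutex hardware, the sketch below starts all three modules at once and makes each one wait on a shared "stage" counter, protected by a mutex and a condition variable, until its turn comes. All names (`stage`, `module_thread`, `run_concurrent_chain`) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Shared turn token: 0 = A may run, 1 = B, 2 = C, 3 = all done.
   The mutex gives exclusive access; the stage value gives ordering. */
pthread_mutex_t stage_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  stage_cv   = PTHREAD_COND_INITIALIZER;
int stage = 0;

int shared_value = 0;  /* stands in for the shared DDR3 data */
int run_order[3];      /* records which module ran in which position */
int run_count = 0;

static void *module_thread(void *arg) {
    int id = *(const int *)arg;
    pthread_mutex_lock(&stage_lock);
    while (stage != id)                    /* wait until it is our turn */
        pthread_cond_wait(&stage_cv, &stage_lock);
    /* Placeholder work, done while holding the lock (exclusive access). */
    shared_value = shared_value * 10 + (id + 1);
    run_order[run_count++] = id;
    stage++;                               /* hand the turn to the next module */
    pthread_cond_broadcast(&stage_cv);
    pthread_mutex_unlock(&stage_lock);
    return NULL;
}

/* Start all three modules at once, deliberately in reverse order, to
   show that the start order does not matter: the stage token alone
   enforces A -> B -> C. Returns the final shared value. */
int run_concurrent_chain(void) {
    static const int ids[3] = {0, 1, 2};
    pthread_t threads[3];
    for (int i = 2; i >= 0; i--) {
        int rc = pthread_create(&threads[i], NULL, module_thread, (void *)&ids[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < 3; i++)
        pthread_join(threads[i], NULL);
    assert(stage == 3);
    return shared_value;
}
```

In hardware there is no condition variable: each module would instead poll the mutex (and a shared stage value) over AXI, which is exactly the bus-traffic concern raised in the original question.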

Teacher muzaffer
Registered: 03-31-2012

I think your description fits a FIFO better than a shared resource. When one module produces data, it can put it into a FIFO for the next module to read.

Check out the following constructs:

set_directive_interface -mode ap_fifo ...
set_directive_resource -core AXI4Stream ...
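
To illustrate the producer-consumer idea from this reply in plain C, here is a minimal sketch of a fixed-depth FIFO between moduleA and moduleB, assuming placeholder computations and a hypothetical depth of 4. In Vivado HLS the FIFO itself would typically be an `hls::stream` object or an ap_fifo / AXI4-Stream port (the directives quoted above); here it is a ring buffer so the example runs as ordinary C. It shows that moduleB can start consuming before moduleA has produced everything, instead of waiting for a complete buffer in DDR.

```c
#include <assert.h>
#include <stdbool.h>

#define FIFO_DEPTH 4u  /* hypothetical depth */

typedef struct {
    float    data[FIFO_DEPTH];
    unsigned head;   /* next slot to read */
    unsigned tail;   /* next slot to write */
    unsigned count;  /* elements currently stored */
} fifo_t;

bool fifo_full(const fifo_t *f)  { return f->count == FIFO_DEPTH; }
bool fifo_empty(const fifo_t *f) { return f->count == 0; }

void fifo_write(fifo_t *f, float v) {
    assert(!fifo_full(f));   /* in hardware the producer would stall */
    f->data[f->tail] = v;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
}

float fifo_read(fifo_t *f) {
    assert(!fifo_empty(f));  /* in hardware the consumer would stall */
    float v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return v;
}

/* moduleA (placeholder): produce the i-th result into the FIFO. */
void moduleA_step(fifo_t *out, int i) { fifo_write(out, (float)i * 0.5f); }

/* moduleB (placeholder): consume one value and accumulate it. */
void moduleB_step(fifo_t *in, float *acc) { *acc += fifo_read(in); }

/* Interleave the two modules as concurrent hardware would run them:
   A writes whenever there is room, B reads whenever data is available,
   so B starts consuming before A has produced all n_items values. */
float run_stream(int n_items) {
    fifo_t f = {{0.0f}, 0, 0, 0};
    float acc = 0.0f;
    int produced = 0, consumed = 0;
    while (consumed < n_items) {
        if (produced < n_items && !fifo_full(&f))
            moduleA_step(&f, produced++);
        if (!fifo_empty(&f)) {
            moduleB_step(&f, &acc);
            consumed++;
        }
    }
    return acc;
}
```

The FIFO only has to absorb rate differences between the two modules, which is why it can be far smaller than the data set; it fits best when moduleB consumes moduleA's output in the order it is produced.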

Participant xpaillard
Registered: 04-11-2013

Hi,

Interesting possibility! I need to work with a lot of data stored (read and written) in DDR memory, so I'm not sure I can push the results from moduleA into a FIFO to moduleB (a producer-consumer mechanism).

Basically, I prefer to store all the data in DDR and just work with pointers: load the data into moduleA, "do something", store the results in DDR, then start moduleB, "do something", and so on.

I will think about this solution!

Thank you for your suggestions.

Teacher muzaffer
Registered: 03-31-2012

I am not sure whether HLS supports it, but there is the concept of a virtual FIFO, where the pointers are maintained in the fabric but the data is in DRAM. You can certainly use that.
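
A minimal sketch of the virtual-FIFO idea, assuming one FIFO entry per data line and hypothetical names throughout: only the small control state (read/write indices and a count) would be kept in the fabric, while the entries themselves live in a large DRAM region reached through a base pointer. `vfifo_front` hands the consumer a pointer into DRAM rather than a copy, which is close to the pointer-based style described in the previous post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Control state that would sit in the fabric; the payload stays in DRAM. */
typedef struct {
    float  *ddr_base;  /* start of the DRAM region holding the entries */
    size_t  line_len;  /* floats per entry (one "line") */
    size_t  depth;     /* number of entries the region can hold */
    size_t  rd, wr;    /* read/write entry indices */
    size_t  count;     /* entries currently queued */
} vfifo_t;

void vfifo_init(vfifo_t *q, float *ddr_base, size_t line_len, size_t depth) {
    q->ddr_base = ddr_base;
    q->line_len = line_len;
    q->depth = depth;
    q->rd = q->wr = q->count = 0;
}

/* Producer: copy one line into DRAM at the write index. */
void vfifo_push(vfifo_t *q, const float *line) {
    assert(q->count < q->depth);
    memcpy(q->ddr_base + q->wr * q->line_len, line, q->line_len * sizeof(float));
    q->wr = (q->wr + 1) % q->depth;
    q->count++;
}

/* Consumer: get a pointer to the oldest line, still in DRAM, so it can
   be read in place; the slot is released later by vfifo_pop. */
const float *vfifo_front(const vfifo_t *q) {
    assert(q->count > 0);
    return q->ddr_base + q->rd * q->line_len;
}

void vfifo_pop(vfifo_t *q) {
    assert(q->count > 0);
    q->rd = (q->rd + 1) % q->depth;
    q->count--;
}
```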

Participant xpaillard
Registered: 04-11-2013

Hi,

Many good ideas! I can't decide on the best approach, so here is more information about the functionality of my design:

In the DDR I store a data set of about 10,000 x 1,000 32-bit floats (transferred over PCIe from a Linux host).

moduleA is a genetic algorithm: it selects about 30 lines of the data stored in DDR and sets several parameters.

moduleB is a fuzzy algorithm model: it reads the 30 previously selected lines from DDR and performs some calculations. It returns a result (a vector).

moduleC takes the result and computes a fitness value. Several iterations are performed between moduleB and moduleC.

After that, moduleA selects 30 new lines and the process begins again.

From a software point of view, this is a "standard" genetic algorithm flow with a population, generations, and a fitness function.

One goal of the project is to demonstrate that HLS can be used by pure software developers! So I need to minimize the C code that controls and synchronizes the hardware.

I hope this is a little clearer.

Thank you.
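
For scale, 10,000 x 1,000 32-bit floats is 40,000,000 bytes (about 40 MB), so the full data set has to stay in DDR. The control flow described above can be sketched in plain C as follows, assuming placeholder bodies for all three modules, a hypothetical parameter update between the moduleB/moduleC rounds, and scaled-down sizes so the sketch runs anywhere; none of this is the actual genetic or fuzzy algorithm. Only line indices, one parameter and a pointer to the DDR data move between the modules, which keeps the control code down to two nested loops.

```c
#include <assert.h>
#include <stddef.h>

/* Scaled-down, hypothetical sizes; the real design uses about 10,000
   lines of 1,000 floats and selects about 30 lines per generation. */
#define ROWS     64
#define COLS     16
#define SELECTED  4
#define BC_ITERS  3  /* hypothetical number of moduleB/moduleC rounds */

/* moduleA (placeholder for the genetic algorithm): pick SELECTED line
   indices for this generation. A real GA would use selection,
   crossover and mutation instead of this fixed pattern. */
void moduleA(int generation, size_t sel[SELECTED]) {
    for (size_t k = 0; k < SELECTED; k++)
        sel[k] = ((size_t)generation * SELECTED + k) % ROWS;
}

/* moduleB (placeholder model): read the selected lines in place from
   "DDR" and return param times the column means as the result vector. */
void moduleB(const float *ddr, const size_t sel[SELECTED], float param,
             float result[COLS]) {
    for (size_t c = 0; c < COLS; c++) {
        float sum = 0.0f;
        for (size_t k = 0; k < SELECTED; k++) {
            assert(sel[k] < ROWS);
            sum += ddr[sel[k] * COLS + c];
        }
        result[c] = param * (sum / SELECTED);
    }
}

/* moduleC (placeholder fitness): sum of the result vector. */
float moduleC(const float result[COLS]) {
    float fitness = 0.0f;
    for (size_t c = 0; c < COLS; c++)
        fitness += result[c];
    return fitness;
}

/* Host-side control: A, then several B -> C rounds, then A again.
   Returns the best (highest) fitness seen over all generations. */
float run_ga(const float *ddr, int generations) {
    float best = -1.0e30f;
    size_t sel[SELECTED];
    float result[COLS];
    for (int g = 0; g < generations; g++) {
        moduleA(g, sel);
        float param = 1.0f;        /* hypothetical model parameter */
        for (int it = 0; it < BC_ITERS; it++) {
            moduleB(ddr, sel, param, result);
            float fitness = moduleC(result);
            if (fitness > best) best = fitness;
            param *= 0.5f;         /* hypothetical update between rounds */
        }
    }
    return best;
}
```

With the real sizes, the 30 selected lines are 30 x 1,000 x 4 = 120,000 bytes. Depending on the device, that subset might fit in on-chip block RAM, which could let moduleB run its repeated rounds without re-reading DDR; that is the kind of optimization worth deferring until the functionality is correct.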