AXI4 - why use separate address wires for burst transactions?

Hi,

I am new to AXI. It looks great and I like it, but there are situations where I find it overkill. In an AXI4 burst transaction, the address wires are used for only a very small percentage of the time. Why use a separate set of wires for address and data? Why not use a stream and send the command and parameters (e.g. length, burst type) first, followed by the payload (in the case of a write transaction), as most ICs' configuration interfaces do? The write response could also arrive through the same channel where the payload would normally arrive after a read command. Surely doable, but is there perhaps a widespread standard for this kind of memory (or register) access?

AXI-Lite still uses separate address and data wires.
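To make the idea concrete, here is a minimal Python sketch of the kind of framed stream I have in mind; the field widths, layout, and command codes are purely my own illustration, not any existing standard:

```python
import struct

# Hypothetical frame header (my own layout, not a standard):
#   1 byte  command      (0x01 = write, 0x02 = read)
#   1 byte  burst type
#   2 bytes burst length (number of 32-bit beats)
#   4 bytes start address
# For a write, the payload beats follow on the same stream.
CMD_WRITE, CMD_READ = 0x01, 0x02
HEADER = struct.Struct(">BBHI")

def encode_write(addr, burst_type, payload_words):
    """Serialize a write command and its payload into one byte stream."""
    frame = HEADER.pack(CMD_WRITE, burst_type, len(payload_words), addr)
    for word in payload_words:
        frame += struct.pack(">I", word)
    return frame

def encode_read(addr, burst_type, length):
    """A read is header-only; data would come back on the return stream."""
    return HEADER.pack(CMD_READ, burst_type, length, addr)
```

The read data and the write acknowledgment would then travel on the opposite-direction stream, as described above.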

Thank you

Miklos

10 Replies
Oh dear, the wheel has just been discovered... now you have published it, it cannot be patented as "means and technique of optimizing the wire count in burst transactions between elements in programmable logic or integrated circuits"


@archangel-lightworks wrote:

 

Oh dear, the wheel has just been discovered... now you have published it, it cannot be patented as "means and technique of optimizing the wire count in burst transactions between elements in programmable logic or integrated circuits"


Your reply has been most helpful. Of course there is nothing new in this; as I wrote, most ICs' configuration interfaces work this way. The QuadSPI protocol does the same, but over serial lines. The question is: is there a standardized way to do something like the QuadSPI protocol, but with parallel data lines, that is, over an (AXI) stream?

Thank you.

Miklos


@mbence76,

I share some of your AXI questions, but perhaps I might be able to offer some perspectives.

When I build an AXI design, or any bus design for that matter, my goal is typically throughput.  To that end, I want to achieve one beat of transfer on every clock with no downtime.  It's not always possible.  (Xilinx's example designs often achieve between 16% and 50% utilization.)  Were the address and data lines shared, one beat per clock would no longer be possible.
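To put rough numbers on that (the cycle counts are illustrative assumptions, not measurements from any core): with separate channels the address phase overlaps the data beats, while shared wires must spend extra cycles on command and address words before each payload.

```python
def utilization_separate(beats):
    # Separate address/data channels: the address phase runs in
    # parallel with data, so back-to-back bursts keep the data
    # wires busy on every cycle.
    return beats / beats  # always 1.0

def utilization_shared(beats, header_cycles=2):
    # Shared wires: assume each burst first spends cycles carrying
    # e.g. a command word and an address word before the payload.
    return beats / (beats + header_cycles)
```

For a 16-beat burst the shared scheme still reaches 16/18, about 89%, but for single-beat register accesses it collapses to 1/3.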

Second, the address and data lines go to separate places.  If you had to split them apart later (if address then ..., else if data then ...), that would cost more logic.

Third, I would note that I've seen a lot of shared data line structure when going off-chip, but rarely within a chip.  This probably has something to do with the bidirectional I/O drivers only driving I/Os and not internal wires.

Finally, most of the AXI work I do invites comparisons with the Wishbone bus, and here I can discuss differences.  Wishbone is similar to a very stripped-down AXI-Lite, with some key differences: 1) There's no backpressure--acknowledgments are always accepted.  2) The Wishbone read and write address channels are shared; a single wire, WE (write-enable), determines which type of transaction is being made.  3) The entire Wishbone synchronization is based upon a request signal (the Wishbone STB signal is roughly equal to an AXI *VALID signal) and a stall signal (roughly equivalent to a !*READY signal).  There's only one channel (bus request), not five (write address, write data, write response, read address, read data).  It's simpler and easier to work with.
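As a toy illustration of that single shared channel (a heavily simplified Python model of the handshake, nothing like the full Wishbone spec):

```python
class WishboneSlave:
    """One shared address channel; WE selects read vs. write,
    and STALL would be the only backpressure signal."""
    def __init__(self, words=256):
        self.mem = [0] * words

    def request(self, stb, we, addr, data=0):
        """Process one cycle; returns (ack, read_data)."""
        if not stb:
            return (False, 0)          # no request this cycle
        if we:
            self.mem[addr] = data      # write: same channel, WE high
            return (True, 0)
        return (True, self.mem[addr])  # read: same channel, WE low
```

Note the single request path serving both directions, versus AXI's five independent channels.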

I also find it telling that within most AXI cores, a special bus protocol handler is required as a front end (see IPIF, or even the MIG controller) since the AXI protocol is really too complicated for most slaves to deal with.  Wishbone is not like this at all.

Dan


 

@mbence76 , hopefully you took the irony constructively. Is there a standardized way? Probably not; standards tend to be easy to find on Google, so you wouldn't need to ask. There is always a trade-off between general-purpose usability and optimization. You are asking for something that minimizes wires and so optimizes usage, and that will create constraints on usability; otherwise the optimized thing would have replaced the original, the way the Cooley-Tukey algorithm replaced the direct computation of the DFT. It would then be purpose-specific or somehow limited, and Xilinx wouldn't spend time and money on a non-general thing. As you say, QuadSPI does the same, so you can just develop your own for your purpose. Something optimized is a product advantage; that's the prize for your sweat and sleepless nights.


Hi Dan,

>my goal is typically throughput. 

I was primarily thinking of an alternative to AXI-Lite, for configuration data and for filling up a memory once in a while, but with fewer wires.

>Second, the address and data lines go to separate places.  If you had to split them apart later, if (address) then else if (data) then ..., that would cost more logic.

I was thinking of this too; however, you need a state machine anyway, and I was not sure how much extra logic would be needed.

> I've seen a lot of shared data line structure when going off-chip, but rarely within a chip. 

Same with me. My question was also about the reason for this.

>This probably has something to do with the bidirectional I/O drivers only driving I/Os and not internal wires.

Hmm, I am not talking about bidirectional wires, but simply one command word, one address word, and then the "payload" data. Sort of data frames. Yes, you need an oppositely directed stream too, to send ack/error info.
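The receiving end of such a frame could look like this sketch (the status codes and field order here are my own assumptions):

```python
# Hypothetical codes for the return (ack/error) stream.
ACK, ERR = 0xA5, 0xEE
CMD_WRITE = 0x01

def handle_frame(words, mem):
    """Consume one frame [command, address, payload...] from the
    request stream; return the status word for the return stream."""
    cmd, addr, payload = words[0], words[1], words[2:]
    if cmd != CMD_WRITE or addr + len(payload) > len(mem):
        return ERR                      # unknown command or overrun
    mem[addr:addr + len(payload)] = payload
    return ACK
```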

Thank you for your help.

Miklos

 



Hi @archangel-lightworks ,

>@mbence76 , hopefully, you took the irony constructively.

No problem. I appreciate every reply. :)

 

>Is there a standardized way? Probably not, standards tend to be easily found on Google so you wouldn't need to ask.

What search phrase should I type into Google to find a standard that does what I described? I tried and did not find anything, but that does not mean it does not exist.

 

>There is always a trade-off between general-purpose/ universal usability and optimization. you are asking for something that minimizes wires, so optimizes usage, that will create constrains on usability

There are AXI, AXI-Lite, and AXI-Stream, and I was hoping there is a fourth one, built on AXI-Stream or as an extension of it, that uses "data frames" made up of fields such as command, length, starting address, and payload data (for writes). 'Constraints on usability' are constraints only if you want to use it for something it is not intended for.

 

>As you say 'Quad SPI does the same', well, you can just develop your own for your purpose. Something optimized is a product advantage, that's the prize for your sweat drops and sleepless nights.

I will see; I might do that.

Thank you.

Miklos

Not a direct answer to your question of whether there is a standard for what you're describing, but sometimes, for architectural reasons, I end up using Ethernet as the protocol to communicate between blocks (both inside a single FPGA and between FPGAs) and use a custom header after the EtherType to describe the command type, address, and access rules.

This is similar to how memory-mapped AXI4 streams work, but it can be useful sometimes, as you don't have to configure things out of band (e.g. with AXI4-Lite registers); it works natively with PMA/PCS/MAC interfaces for chip-to-chip links; and it's easy to bring out a spare Ethernet port to a switch and use Wireshark to sniff all the transactions.  Obviously this uses far more resources than AXI4, but it depends on what you're trying to solve.
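The header-after-EtherType trick can be sketched like this; 0x88B5 is the IEEE local-experimental EtherType, while the command-header layout is made up for illustration:

```python
import struct

ETHERTYPE_LOCAL = 0x88B5  # IEEE 802 "local experimental" EtherType

def build_frame(dst_mac, src_mac, cmd, addr, payload=b""):
    """Standard 14-byte Ethernet header, then a hypothetical 8-byte
    command header (cmd, flags, payload length, address), then payload."""
    eth = dst_mac + src_mac + struct.pack(">H", ETHERTYPE_LOCAL)
    hdr = struct.pack(">BBHI", cmd, 0, len(payload), addr)
    return eth + hdr + payload
```

Because the result is an ordinary Ethernet frame, a sniffer on a spare port will happily capture every transaction.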

If you are simply trying to reduce resources for a low-speed register interface, and you want a standard, then I'd recommend you look at ARM's AHB/APB protocols, or even IBM's DCR standard.  Each of these uses far fewer resources than AXI4.

Digital Design Golden Rule: if it's not tested, it's broken.

@mbence76,


>Second, the address and data lines go to separate places.  If you had to split them apart later, if (address) then else if (data) then ..., that would cost more logic.

I was thinking of this too, however you need a state machine anyway, and I was not sure how much of an extra logic would be needed.

I've seen a lot of broken state machine processing for both AXI and AXI-Lite.  In particular, the last "state machine" implementation I examined that processed both read and write channels would freeze if there were ever a request to read and write at the same time.  (You can find more information about this in the YouTube video of the presentation on formally verifying AXI components, recorded at Orconf 2019.)  Fixing that bug would bring you back to a state machine that could only handle one beat every two clocks, rather than one beat per clock.

I personally find it easier to process AXI requests from a pipeline standpoint rather than a state machine standpoint, although a strong argument could be made that the two are somewhat equivalent.  If you are at all curious how 100% beat utilization can be accomplished, you might find this article on how to build the perfect AXI slave valuable.  If you browse nearby, you might also find an article on how to achieve 100% throughput with AXI-Lite.  (Xilinx's comment on AXI-Lite, paraphrased: AXI-Lite isn't meant for performance; if you want performance, you should be using full AXI.)
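A toy cycle count shows the gap between the two approaches (the numbers are illustrative, not measured from any particular core):

```python
def cycles_fsm(beats):
    # State-machine slave that bounces IDLE -> RESPOND -> IDLE:
    # READY is high only every other cycle, so two clocks per beat.
    return 2 * beats

def cycles_pipelined(beats, fill=1):
    # Pipelined slave: after an initial fill latency, one beat
    # completes every clock (100% beat utilization).
    return fill + beats
```

For a 256-beat transfer that is 512 cycles versus 257.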

My whole point being: you can't get 100% beat utilization if you share address and data lines--but I think I've said that before.

Dan


Hello Dan,

I understand you are after 100% beat utilization, but I am not. :)  At least not right now, in this project. Your blog is full of very useful info; thank you for the hard work put into it. The bottleneck in leveraging this info is unfortunately the lack of brain capacity on my side; I am not at that level yet.

I cannot imagine handling an AXI interface any way other than with a state machine, so I am not sure what you mean by a "pipeline standpoint", even if the two are somewhat equivalent. I guess I will create a custom protocol for my purposes for now and leave AXI for later. The reason is not AXI's resource demand, but another problem that I am struggling with in my other post. So far without a reply; I must have formulated my question wrong. :(

Have a nice weekend!

Miklos


@mbence76,

I see your other post, but since I don't use block design I'm not really qualified to answer it.  I know there's a way to group all of the AXI signals in a bus together and just connect the bus components, but I don't know how to make that happen from a user design.  I've seen posts describing how to do it, but never having done it myself I don't remember them.  Perhaps making this comment here might draw some attention to your question.

Dan
