cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
zhangzq71
Visitor
Visitor
2,092 Views
Registered: ‎12-02-2019

what's wrong with this logic?

Jump to solution

Here is my program, 

	always @(posedge M_AXI_ACLK)
	begin
		if (M_AXI_ARESETN == 0 || LOCAL_RESET == 1'b1)
		begin
			start_write_addr <= 1'b0;
			fifo_read_en <= 1'b0;

			wr_ddr_en <= 1'b0;
			state_sm2ddr[2:0] <= S_WR_IDLE;

			run_cnt1 <= 0;
			run_cnt2 <= 0;
		end
		else
		begin
			case (state_sm2ddr)
				S_WR_IDLE:
				if (SM2DDR_FIFO_WR_COUNT >= C_M_AXI_BURST_LEN)
				begin
					if (~burst_write_active && ~start_write_addr)
					begin
						run_cnt2 <= run_cnt2 + 1'b1;
						fifo_read_en <= 1'b1;
						state_sm2ddr[2:0] <= S_WR_WAIT_FIFO;
					end
					else
					begin
						fifo_read_en <= 1'b0;
						state_sm2ddr[2:0] <= S_WR_IDLE;
					end
				end
				else
				begin
					fifo_read_en <= 1'b0;
					state_sm2ddr[2:0] <= S_WR_IDLE;
				end

				S_WR_WAIT_FIFO:
                begin
					fifo_read_en <= 1'b0;
					start_write_addr <= 1'b1;
					state_sm2ddr[2:0] <= S_WR_WAIT_WLAST;
                end

				S_WR_WAIT_WLAST:
                begin
					fifo_read_en <= 1'b0;
					start_write_addr <= 1'b0;

					if (axi_wlast)
					begin
						state_sm2ddr[2:0] <= S_WR_NULL_STEP;
					end
					else
					begin
						state_sm2ddr[2:0] <= S_WR_WAIT_WLAST;
					end
                end

				S_WR_NULL_STEP:
				begin
					run_cnt1 <= run_cnt1 + 1'b1;
					fifo_read_en <= 1'b0;
					start_write_addr <= 1'b0;

					state_sm2ddr[2:0] <= S_WR_IDLE;
				end

				default:
				begin
					state_sm2ddr[2:0] <= S_WR_IDLE;
				end
			endcase
		end
	end

As you can see run_cnt1 + 1 must occurs after run_cnt2 + 1, so run_cnt1 must be equal run_cnt2, but in my test run_cnt1 is greater than run_cnt2, what is wrong?

0 Kudos
1 Solution

Accepted Solutions
hgleamon1
Teacher
Teacher
1,578 Views
Registered: ‎11-14-2011

You may get this type of issue when you don't handle different clock domains correctly.

It is clear that the state machine logic is synchronous to the m_axi_aclk but what clock domain is the SM2DDR_WR_COUNT signal or the burst_write_active signal belonging to?

One often uses FIFOs when crossing clock domains but you have to ensure that the control logic for each "half" of the FIFO remains under its own clock domain or is otherwise handled in a correct CDC manner.

----------
"That which we must learn to do, we learn by doing." - Aristotle

View solution in original post

0 Kudos
36 Replies
hgleamon1
Teacher
Teacher
1,897 Views
Registered: ‎11-14-2011

Have you simulated this?

----------
"That which we must learn to do, we learn by doing." - Aristotle
0 Kudos
zhangzq71
Visitor
Visitor
1,842 Views
Registered: ‎12-02-2019

I didn't simulate because it is part of a PCIe card project, it is not easy make the whole project in simulation. Do you think that code may have some error just after a glance?

0 Kudos
zhangzq71
Visitor
Visitor
1,835 Views
Registered: ‎12-02-2019

Sorry, I forgot to tell you that run_cnt1 is not always greater than run_cnt2, most of the time they are equal, this error occurs after a long time running, but how long is not fixed. That is why I did not  simulate it.

0 Kudos
zhangzq71
Visitor
Visitor
1,831 Views
Registered: ‎12-02-2019

Will it caused by timing constraints? But my project did not have timing constrains error after I just add the clock constrain to my module.

0 Kudos
hgleamon1
Teacher
Teacher
1,771 Views
Registered: ‎11-14-2011

It's possible that you have some timing issues but if you have correct clock constraints I don't think that is the root cause of the problem.

Do you have any synthesis or implementation warnings for this logic? How are you confirming the values of the signals?

If it is possible to take just this block of code and simulate it with external stimuli you should try to do that.

----------
"That which we must learn to do, we learn by doing." - Aristotle
0 Kudos
zhangzq71
Visitor
Visitor
1,733 Views
Registered: ‎12-02-2019

Synthesis or implementation must always has many many warnning but we always ignore those except the critical warnning. The logic of this code is very simple and run_cnt1 increasement must depends on run_cnt2. And I found another more intresting error, I modified the code to like this. As you can see I just have a increasement on ecah state changed, after this modification this program seems confused, the result is 

run_cnt1=0x10

run_cnt2=0x11

run_cnt3=0x10

run_cnt4=0x10

that mean the state changes to S_WR_WAIT_FIFO accidently. Actually what this code do is read data from fifo and write to DDR2 memory through AXI bus. if the code like this it won't work, but it is very interesting that if I remove all the lines that related to run_cnt* it works. 

This is a PCIe card project, I can read those necessary value through a AXI slave that connected to XDMA ip.

	always @(posedge M_AXI_ACLK)
	begin
		if (M_AXI_ARESETN == 0 || LOCAL_RESET == 1'b1)
		begin
			// axi_awvalid <= 1'b0;
            // axi_wvalid <= 1'b0;

			start_write_addr <= 1'b0;
			fifo_read_en <= 1'b0;

			wr_ddr_en <= 1'b0;
			state_sm2ddr <= S_WR_IDLE;

			 run_cnt1 <= 0;
			 run_cnt2 <= 0;
			 run_cnt3 <= 0;
			 run_cnt2 <= 0;
		end
		else
		begin
			case (state_sm2ddr)
				S_WR_IDLE:
				if (SM2DDR_FIFO_WR_COUNT >= C_M_AXI_BURST_LEN)
				begin
					if (~burst_write_active && ~start_write_addr)
					begin
						 run_cnt1 <= run_cnt1 + 1'b1;
						fifo_read_en <= 1'b1;
						state_sm2ddr <= S_WR_WAIT_FIFO;
					end
					else
					begin
						fifo_read_en <= 1'b0;
						state_sm2ddr <= S_WR_IDLE;
					end
				end
				else
				begin
					fifo_read_en <= 1'b0;
					state_sm2ddr <= S_WR_IDLE;
				end

				S_WR_WAIT_FIFO:
                begin
					 run_cnt2 <= run_cnt2 + 1'b1;

					fifo_read_en <= 1'b0;
					start_write_addr <= 1'b1;
					state_sm2ddr <= S_WR_WAIT_WLAST;
                end

				S_WR_WAIT_WLAST:
                begin
					
					fifo_read_en <= 1'b0;
					start_write_addr <= 1'b0;

					if (axi_wlast)
					begin
						 run_cnt3 <= run_cnt3 + 1'b1;
						state_sm2ddr <= S_WR_NULL_STEP;
					end
					else
					begin
						state_sm2ddr <= S_WR_WAIT_WLAST;
					end
                end

				S_WR_NULL_STEP:
				begin
					 run_cnt4 <= run_cnt4 + 1'b1;
					fifo_read_en <= 1'b0;
					start_write_addr <= 1'b0;

					state_sm2ddr <= S_WR_IDLE;
				end

				default:
				begin
					state_sm2ddr <= S_WR_IDLE;
				end
			endcase
		end
	end
0 Kudos
hgleamon1
Teacher
Teacher
1,715 Views
Registered: ‎11-14-2011

I can't see anything obvious and without seeing all of the signal declarations or how other components interact with this component it will be hard to say.

I suggest using chipscope to monitor the relevant signals, especially the state machine, to see when state transitions occur and when your counters are being updated.

How do you ensure that when you read the values you are actually reading the latest value of the signals?

----------
"That which we must learn to do, we learn by doing." - Aristotle
0 Kudos
zhangzq71
Visitor
Visitor
1,709 Views
Registered: ‎12-02-2019

I also read the others values such as read/write address, fifo count etc, they are all in excepted value, so I believed the value got from AXI slave are correct.

I am new to Vivado, and try to use chipscope as you suggested.

Thank you very much!

BTW, what is the possible timing issue will cause this problem?

0 Kudos
dgisselq
Scholar
Scholar
1,688 Views
Registered: ‎05-21-2015

> I didn't simulate because ...

I think I just found your problem.

Consider using formal methods on just this piece of code, rather than the entire design.  While you might find the problem with chipscope, formal methods will get you there a lot faster--minutes vs days.

Dan

0 Kudos
zhangzq71
Visitor
Visitor
1,648 Views
Registered: ‎12-02-2019

Dan,

I am sorry that I don't really understand what is formal method? Could you show me where is the formal method in my code? 

0 Kudos
hgleamon1
Teacher
Teacher
1,579 Views
Registered: ‎11-14-2011

You may get this type of issue when you don't handle different clock domains correctly.

It is clear that the state machine logic is synchronous to the m_axi_aclk but what clock domain is the SM2DDR_WR_COUNT signal or the burst_write_active signal belonging to?

One often uses FIFOs when crossing clock domains but you have to ensure that the control logic for each "half" of the FIFO remains under its own clock domain or is otherwise handled in a correct CDC manner.

----------
"That which we must learn to do, we learn by doing." - Aristotle

View solution in original post

0 Kudos
dgisselq
Scholar
Scholar
1,509 Views
Registered: ‎05-21-2015

@zhangzq71,

Formal methods work based upon properties that a designer either asserts or assumes from within their code.  Since your code doesn't have any properties within it, I can tell you haven't used formal methods yet.

Relax, they aren't that bad.  In fact, they're often really fast and easy to work with.  I write a lot about them on my blog.  You can read about my first experiences with formal properties here, or look over the slides that I use to teach a course in formal methods here.  In both cases, I like to use SymbiYosys as the tool that drives the formal proof.  There's a free version on-line that you can download that handles Verilog without any problems plus the immediate assertion subset of the System Verilog Assertion (SVA) language.  (That's the subset that "feels" most like the Verilog you've already been writing.)

Once you have that, you might wish to relate your two counters together, and come up with an assertion describing that relationship.  Then let the formal tools guide and direct you straight to the bug.

Dan

0 Kudos
dgisselq
Scholar
Scholar
1,501 Views
Registered: ‎05-21-2015

@zhangzq71,

Looking over your code in more detail, I'll note that ...

  1. You haven't provided enough of it for me to repeat whatever bug you are struggling with.  I'd ideally like a copy of the full module this is taken from, to include I/O declarations, in a form that can pass synthesis without syntax errors.
  2. More significantly, I would note that you are checking for an axi_wlast signal without also checking for AXI_WVALID && AXI_WREADY.  This is a clear bug in your design, as AXI_WLAST is a don't care if !AXI_WVALID, and you don't want to transition on AXI_WLAST unless AXI_WREADY as well.
  3. You should be aware that Xilinx's AXI demonstrators had bugs where they checked for WLAST without also checking for WVALID && WREADY.  (This was one of several bugs recently found in their demonstration cores using formal methods.)

Dan

0 Kudos
zhangzq71
Visitor
Visitor
1,485 Views
Registered: ‎12-02-2019

You are right, I think it is caused by CDC. Because I changed the clock of sm_clk_100M to the same freq as s00_axi_aclk and m00_axi_aclk which is 125MHz, then everything is ok, run_cnt* all have the same and correct value. Here is a diagram of this project, as you can see sm_clk_100M, s00_axi_aclk and m00_axi_aclk are connected together now. If connect sm_clk_100M to the clk_out2 of clk_wiz_0 which output 100MHz, then I got the same error as before.

How can I fixed the problem if I really want sm_clk_100M is 100Mhz? I attach the code, could you help me to check what is the problem? What I want to do in this project is,

1. write data to ddr3 in a host(PC), it is about 8M bytes data

2. the host send a command with the number of data to this PCIe card through the AXI slave

3. the PCIe card get data from ddr3 and put them to a ddr2sm_fifo

4. at the same time there is a module to get data from that fifo when it is not empty

5. process the data, but it does not actually process now, just give the result the same value, then write this result to sm2ddr_fifo

6. get data from sm2ddr_fifo when it's elements more then 16 because I set axi burst len is 16, then write data back to ddr3

7. the host get result from ddr3

Is there a better solution to do that? If the current solution is ok how to fix the problem if I want sm_clk_100M is 100Mhz?

Thank you very much! 

1.PNG

0 Kudos
zhangzq71
Visitor
Visitor
1,483 Views
Registered: ‎12-02-2019

I've not read your article in detail, but I think formal method seems like the unit test method in software development, is that right?

0 Kudos
zhangzq71
Visitor
Visitor
1,482 Views
Registered: ‎12-02-2019

I attach the code here, but I don't think there is something error in the logic, maybe that problem is caused by timing issue. Thank you very much for paying much concern on my problem?

0 Kudos
hgleamon1
Teacher
Teacher
1,432 Views
Registered: ‎11-14-2011

I'm not a Verilog guy so it's a little difficult for me to follow the code you have attached. I think we have agreed that there is some strange CDC with the connections you are making but I just can't see it.

There is a differential clock that goes to the clk_wiz, the output of which goes to the DDR MIG block. What is the frequency of the clk_out1? What is the frequency of the MIG ui_clk output?

What is the purpose of the Processor Reset Block when no external reset is routed there, and no output reset is routed to the MIG block? Why doesn't the PCIe block reset output go to the Reset block?

What is the frequency of the PCIe block output clock? Is this the 125MHz clock?

I admit to being very confused by your clocking and reset strategy - it really isn't clear to me at all.

If it all works nicely at 125Mhz, why is it important to have a 100MHz clock in the system? 

----------
"That which we must learn to do, we learn by doing." - Aristotle
0 Kudos
zhangzq71
Visitor
Visitor
1,421 Views
Registered: ‎12-02-2019

The clk_out1 is 200MHz, and MIG ui_clk is 400MHz.

"What is the purpose of the Processor Reset Block when no external reset is routed there", Sorry, I don't understand what is the Processor Reset Block, I thought it is the SMModule, because I want to reset it by command sent from host PC.

"Why doesn't the PCIe block reset output go to the Reset block?", Do you mean that orange line in the following diagram?

1.jpg

"What is the frequency of the PCIe block output clock? Is this the 125MHz clock?", Yes, it is 125MHz.

"If it all works nicely at 125Mhz, why is it important to have a 100MHz clock in the system? ", actually data process will be performed in an external chipset that it's clock must be lower than 100MHz.

Please tell me what information do you want?

0 Kudos
hgleamon1
Teacher
Teacher
1,393 Views
Registered: ‎11-14-2011

I was just trying to get an understanding of how you are coordinating the resets and clock structure of your system. It seems a bit random.

All I can think of right now, as the issue appears in your custom SMPortal module, is that you are reading signals (probably from the FIFOs) from the 125 MHz clock domain in the 100 MHz clock domain without correctly synchronising them.

I am suspicious of this piece of code here:

begin
  case (state_sm2ddr[1:0])
    S_WR_IDLE:
      if (SM2DDR_FIFO_WR_COUNT >= C_M_AXI_BURST_LEN)

from your SMPortal_v1_0_M00_AXI.v file.

The input port SM2DDR_FIFO_WR_COUNT comes from the SMPortal_v1_0.v module and the afifo_128W_256L sm_to_ddr_buf FIFO instantiation. I think the write count is in the Write clock domain which is the sm_clk_100M.

You are reading the input in the M_AXI_ACLK domain without correctly crossing domains.

----------
"That which we must learn to do, we learn by doing." - Aristotle
dgisselq
Scholar
Scholar
1,384 Views
Registered: ‎05-21-2015

@zhangzq71,

So, I took a look at the slave core you sent.  Unfortunately, you copied from Xilinx's IP using the IP packager demonstration code.  This code has numerous bugs within it.  The following traces are a non-exhaustive list of some of the bugs found within this code.  (I didn't have time to be exhaustive this morning.)

  1. The code may return the wrong AXI ID to any transaction.  The diagram below shows this problem with the ARID->RID signal, but the problem exists for AWID->BID as well.  In the trace below, the master makes a request for ARID=1, but receives a response with RID==0.arid.png
  2. It also uses a broken method for determining the number of least-significant address bits--those used for referencing a byte within a word.  As a result, burst writes to this core will get written to the wrong addresses.  In the trace below, the master requests a burst write to address 0x01000.  Since you are using a 128-bit word, the address of the second item written should be 0x01010, not 0x01020.awaddr.png
  3. The core doesn't ignore WLAST when AWVALID == 0, but might conclude a transaction early.wlast.png
  4. Read data might change--even before the correct read data is read out.  In the example below, the read channel is stalled (RVALID && !RREADY), and yet RDATA changes, suggesting that data was lost on the return.rdata.png

I'm aware of other bugs with this core, this was just what I found on a quick cursory glance.

While I realize these aren't the bugs you've been chasing, they'll likely show up soon enough.

Dan

zhangzq71
Visitor
Visitor
1,355 Views
Registered: ‎12-02-2019

Yes, that is code causes the issue where I've put many run_cnt* in it. And the write clock of sm_to_ddr_fifo is sm_clk_100M.

But I still don't understand what is the exact reason that causes this problem. How difficult to fix this problem?

0 Kudos
zhangzq71
Visitor
Visitor
1,350 Views
Registered: ‎12-02-2019

Thank you very much for such detail trace. Maybe the Xilinx's IP package code has some bugs, but I don't think my problem is caused by this AXI bus related code, I would rather think it is caused the CDC as hgleamon1 said in previous post. But I just don't know how to fix it.

I appreciate very much for doing such detail trace and help me to find the problem. 

0 Kudos
hgleamon1
Teacher
Teacher
1,344 Views
Registered: ‎11-14-2011

You need to synchronise the write count signal to the M_AXI_ACLK domain (please tell me you know how to synchronise from a lower frequency domain to a higher frequency domain) or you need to use another signal already in the correct clock domain to determine when to start your state machine.

What is the exact version of the FIFOs you are using?

----------
"That which we must learn to do, we learn by doing." - Aristotle
0 Kudos
zhangzq71
Visitor
Visitor
1,334 Views
Registered: ‎12-02-2019

I always only concern about the logic and don't care about the clock, so I really don't know how to synchronise the signal, could you show me a sample or tell me where can I find the sample about this?

And I don't understand what is the correct clock domain, sorry for wasting your time.

I am using Vivado V2019.2.1. The fifo was generated by FIFO Generator(13.2). I also attach the generated fifo code here. 

0 Kudos
hgleamon1
Teacher
Teacher
1,290 Views
Registered: ‎11-14-2011

Given you are using a synchronous device (the FPGA), to not care about the clocks, especially in a multi-clock domain system, is poor practice.

The errors you see are possibly because of metastability, glitching signals, failures in setup and hold timings for registers. It could be a lot of things.

Anyway, having read a little bit more about the FIFO you are using, rather than trying to synchronise the write count signal and use that, I suggest you use the read count signal instead. This is already synchronous to the read clock domain and should provide you with both the information you are trying to use and in a synchronous fashion.

Try that out and see how you get on.

----------
"That which we must learn to do, we learn by doing." - Aristotle
0 Kudos
dgisselq
Scholar
Scholar
1,258 Views
Registered: ‎05-21-2015

@zhangzq71,

Ok, now that I'm looking through your AXI master, you have a *SERIOUS* problem with your external LOCAL_RESET signal.  If you reset your interaction with the AXI bus, what resets the rest of the buses interactions?  Indeed, you are liable to bring the whole bus down by doing something like that.

The correct way to use an external reset that doesn't also reset the entire AXI bus is via a request and acknowledgment.

  1. A reset is requested.  The acknowledgment line come from the core is also low at this time.
  2. Any ongoing operations are finished, and then the core comes to an idle.  No additional AWVALIDs are set, but AWVALID cannot be cleared if its already set until after AWVALID && AWREADY.  The same applies for ARVALID.  The write channel is a touch different, as it needs to complete any burst write in process.  WSTRB should be set to zero on the first clock after (!WVALID || WREADY).
  3. Once the last BVALID and/or RVALID && RLAST have been received, the core should then raise the reset acknowledgment.
  4. When the external environment sees the reset acknowledgment, it should drop the reset request
  5. Once the reset request is dropped, the acknowledgment should be dropped.
  6. The core is then ready to use.

If you don't do it this way, you risk 1) locking up the bus, or 2) getting acknowledgments that you haven't requested.  While I haven't chased this down completely, this could also cause a counter mismatch.

Dan

0 Kudos
zhangzq71
Visitor
Visitor
1,242 Views
Registered: ‎12-02-2019

I modified the code to this,

			case (state_sm2ddr[1:0])
				S_WR_IDLE:
				if (SM2DDR_FIFO_RD_COUNT >= C_M_AXI_BURST_LEN)
				begin

1. when sm_clk_100M = 100MHz, it only can work at the first time after power up with the number of processing data are 0x80000 bytes, everything is ok at the first time, the run_cnt* are all correct, all the data are correct. But it had the same problem as before at the second time. I tried it 3 times, that means I power down/up the computer 3 time, all the same, the first time after powerup must be OK. Very interesting.

 2. when sm_clk_100M = 125MHz, everything is OK, I tried the number of processing data from 0x100 bytes to 0x100000 bytes are all OK, and no matter how many time I tried.

0 Kudos
zhangzq71
Visitor
Visitor
1,238 Views
Registered: ‎12-02-2019

You know the AXI very well. Actually what I want the LOCAL_RESET to do is to reset my processing logic only and the FIFO, because everytime there is any error I have to reboot the computer, I wanted to reset my logic to initial state when there is error, I didn't want to reset the AXI bus. I tried to reset by LOCAL_RESET but it seems doesn't as you said, because everytime there is an error I set the LOCAL_RESET to reset but the error still there. And I think there sould be no LOCAL_RESET in final product, it is only for development purpose. 

0 Kudos
dgisselq
Scholar
Scholar
1,222 Views
Registered: ‎05-21-2015

@zhangzq71,

My big claim to fame is being able to formally verify AXI components.  To that end, I've written a couple of AXI based cores that have had a means of externally resetting them--such as your LOCAL_RESET.  These cores are controlled from an AXI-Lite bus, and so a special AXI-Lite bus command stops the core from issuing any more AXI transactions, clears the FIFO, etc.  Once complete, the core comes to rest and it can then wait for a new command.

You can take a look at these cores if you'd like.  They're my AXI DMA's: S2MM and MM2S, both found within this repository.  If you do take a look, pay attention to the error and abort signals, since "abort" is what I call my externally generated reset process.

Dan