Participant
10,415 Views
Registered: ‎10-15-2015

error -110 reading 5 from AXI QSPI flash

My system has a Spansion S25FL512 flash used to configure a Virtex-7. The Virtex-7 has a control interface provided by a separate Zynq, with an AXI Chip to Chip link between the two. I have an AXI Quad SPI core configured as Quad SPI, AXI Lite, 8 bit, with the STARTUP macro enabled. I followed the special constraint file to add a generated clock for the STARTUP module, and modified it to allow for the propagation delay of a chip in my SPI clock routing.

Because of this delay I am aiming for quite a slow SPI clock: ext_spi_clk is 10 MHz, which gives a 5 MHz SPI clock.

 

I am using Linux 4.4.

 

The first problem is that after the SPI flash has configured the Virtex-7 and I allow Linux to boot, the flash does not reply correctly to the read JEDEC code command. I can work around this by repeating the read JEDEC code command, which always works the second time.

With this fix Linux starts and the device is visible in /proc/mtd:

root@zx3-pm3-zynq7:~# cat /proc/mtd
dev:    size   erasesize  name
mtd0: 04000000 00040000 "v7-flash"
mtd1: 00500000 00020000 "nand-linux"
mtd2: 00100000 00020000 "nand-device-tree"
mtd3: 1fa00000 00020000 "nand-rootfs"
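
For reference, the size and erasesize columns in /proc/mtd are hexadecimal byte counts. The Python sketch below decodes the listing above (the function name parse_proc_mtd is made up for illustration). mtd0 comes out as 64 MiB, i.e. 512 Mbit, with a 256 KiB erase size.

```python
def parse_proc_mtd(text):
    """Parse /proc/mtd text into {dev: {"size", "erasesize", "name"}}, sizes in bytes."""
    entries = {}
    for line in text.splitlines():
        if not line.startswith("mtd"):
            continue  # skip the "dev: size erasesize name" header line
        dev, size, erasesize, name = line.split(None, 3)
        entries[dev.rstrip(":")] = {
            "size": int(size, 16),            # hex byte count
            "erasesize": int(erasesize, 16),  # hex byte count
            "name": name.strip().strip('"'),
        }
    return entries


# The listing from the post above, copied verbatim.
LISTING = """dev:    size   erasesize  name
mtd0: 04000000 00040000 "v7-flash"
mtd1: 00500000 00020000 "nand-linux"
mtd2: 00100000 00020000 "nand-device-tree"
mtd3: 1fa00000 00020000 "nand-rootfs"
"""

if __name__ == "__main__":
    for dev, info in parse_proc_mtd(LISTING).items():
        print(dev, info["name"], info["size"] // 1024, "KiB,",
              "erase", info["erasesize"] // 1024, "KiB")
```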

 

If I start to test the flash, it gets part way through and then locks out.

root@zx3-pm3-zynq7:~# cp /dev/mtd0 from_mtd0h
random: nonblocking pool is initialized
m25p80 spi32765.0: SPI transfer timed out
m25p80 spi32765.0: SPI transfer timed out
m25p80 spi32765.0: error -110 reading 5
error -110 reading SR
cp: read error: Connection timed out
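
As a reading aid for the log above: on Linux, errno 110 is ETIMEDOUT ("Connection timed out"), and the value after "reading" appears to be the flash opcode in hex, so 5 is the RDSR1 status read. A small Python sketch of that decoding follows. The helper and table names are hypothetical, and only the two values seen in this thread are filled in.

```python
import re

# Linux errno value: 110 is ETIMEDOUT ("Connection timed out").
LINUX_ERRNO_NAMES = {110: "ETIMEDOUT"}
# Only the opcode mentioned in this thread; extend from the flash datasheet as needed.
SPI_NOR_OPCODES = {0x05: "RDSR1"}

# Matches lines ending in "error <N> reading <hex opcode>".
_PATTERN = re.compile(r"error (-?\d+) reading ([0-9a-fA-F]+)\s*$")


def decode_m25p80_error(line):
    """Return (errno_name, opcode_name) for an 'error N reading X' line, else None."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    err = abs(int(match.group(1)))
    opcode = int(match.group(2), 16)
    return (
        LINUX_ERRNO_NAMES.get(err, "errno %d" % err),
        SPI_NOR_OPCODES.get(opcode, "opcode 0x%02x" % opcode),
    )


if __name__ == "__main__":
    print(decode_m25p80_error("m25p80 spi32765.0: error -110 reading 5"))
```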

 

Once it has locked out, subsequent attempts fail immediately.

Studying this with an ILA, once it has locked out the SPI command 5 (RDSR1) is still sent to the SPI flash, but the burst of SPI clocks needed to fetch the reply is never sent.

Looking in more detail, the TX FIFO empty interrupt is not sent to the Zynq.

The Quad SPI core uses a counter, tx_fifo_count, to track the number of words in the TX FIFO, entirely in the AXI clock domain. It combines this with the TX_FIFO_Empty signal from the TX FIFO, which has to cross from the ext_spi_clk domain to the AXI clock domain.

When it is working (from boot), the counter is reset to 3F, increments to 0 while there is 1 word in the FIFO, and decrements back to 0 at the end of the last SPI transfer. The core detects tx_fifo_count=0, TX_FIFO_EMPTY and the end of the SPI transfer, and sends the TX FIFO empty interrupt.

After an attempt at reading the SPI memory, with e.g. 5238784 bytes read OK, the system is out of step. tx_fifo_count now idles at 0, goes to 1, and decrements to 0 after the last SPI transfer. Since it is not 0 at the last SPI transfer, the interrupt is not generated.
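
The two cases above can be sketched as a toy Python model. This is only one reading of the description, not the core's actual logic: it assumes a 6-bit counter that idles at 3F with an empty FIFO (so it holds words minus one), and that the interrupt condition is sampled at the end of the last transfer, before the counter decrements. The function name and the skew parameter are invented for the sketch.

```python
COUNT_MASK = 0x3F  # 6-bit counter, inferred from the 3F reset value (assumption)


def tx_empty_irq_fires(skew=0, words=1):
    """Model one burst: write `words` into the TX FIFO, then drain them.

    skew=0 is the healthy state (counter idles at 0x3F with an empty FIFO).
    skew=1 is the out-of-step state (counter idles at 0).
    Returns True if the modelled interrupt condition is met at the last transfer.
    """
    count = (COUNT_MASK + skew) & COUNT_MASK
    fifo_words = 0
    for _ in range(words):          # CPU writes words into the FIFO
        fifo_words += 1
        count = (count + 1) & COUNT_MASK
    fired = False
    while fifo_words:               # SPI engine drains the FIFO
        fifo_words -= 1
        fifo_empty = fifo_words == 0
        # Assumed condition: counter is 0, FIFO empty, transfer just finished.
        if fifo_empty and count == 0:
            fired = True
        count = (count - 1) & COUNT_MASK
    return fired


if __name__ == "__main__":
    print("healthy:", tx_empty_irq_fires(skew=0))
    print("out of step:", tx_empty_irq_fires(skew=1))
```

In this model a counter that is one ahead of the FIFO contents never satisfies the condition, which matches the lock-out seen on the ILA.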

 

I don't know whether other people are seeing the same problem. These posts may describe the same problem, but the symptoms are only similar, not the same:

 

https://forums.xilinx.com/t5/Embedded-Linux/mmcblk1-error-110-transferring-data-sector/m-p/702006#M16236

https://forums.xilinx.com/t5/Embedded-Linux/nor-flash-write-error-with-zcu102-board/m-p/702906/highlight/true#M16259

I am building with advanced triggers on the logic analyser to try to catch the point where it gets out of step, though this is not easy.

 

I realise I have a large ratio between my AXI clock (100 MHz) and the ext_spi_clk (10 MHz). I am also accessing the core via an AXI Chip to Chip link, which may change (slow) the register access timing. Otherwise I don't see what is unusual or wrong in what I am doing.

 

I am also trying a build with the AXI clock slowed to match the ext_spi_clk, making the AXI infrastructure cross the clock domains.

I am also trying a standard SPI build.

 

 

 

 

 

10 Replies
Participant
10,225 Views
Registered: ‎10-15-2015

I have screen dumps from the logic analyser showing how the separate word counter in the top level and the state of the TX_FIFO_II apparently get out of step.

tx_fifo_count starts at 3F when the FIFO holds 0 words, counts up as 0s are written and down as they are transferred to the SPI bus. At the end of the transfer the FIFO goes empty but the count does not drop to 0.

See attached

 

zoom_all.png

zoom_start.png

 

zoom_end.png

Participant
10,224 Views
Registered: ‎10-15-2015

Full png files attached.

 

zoom_all.png
zoom_end.png
zoom_start.png
Participant
10,222 Views
Registered: ‎10-15-2015

I have tried a version with AXI clk = ext_spi_clk = 10 MHz, with the clock domain crossing in the AXI infrastructure.

This appears to work and may be my workaround.

 

William.

Participant
10,114 Views
Registered: ‎10-15-2015

I have investigated further and realise that the problem is caused by the TX FIFO going empty and then filling again. In my case I have a slow CPU interface due to the AXI Chip2Chip link, but also a fairly slow ext_spi_clk. This is a little unusual, but the problem could probably also occur with a faster AXI interface (directly inside a Zynq PL) and a faster ext_spi_clk.

The problem only occurs with dual/quad SPI because, for these modes, the datasheet allows a read to be continued by pushing more 0s into the TX FIFO. For a single transaction on standard SPI the datasheet calls for the master inhibit to be used, so these cases are not exercised.

 

Due to the cycle-level timing, there is a window where the TX FIFO goes empty at the end of an SPI transfer, making the logic stall the SPI clock, but 2 ext_spi_clk cycles later the SPIXfer_done_int_d2 signal makes a read. If the FIFO has gone not-empty in this window, the SPIXfer_done_int_d2 signal makes a successful read. The transfer_start signal also senses that the FIFO is no longer empty and starts up as if from empty, making a 2nd read and losing 1 word.

See near the cursor

start_unmodded3.png

 

My answer to this is to mask the read from SPIXfer_done_int_d2 when transfer_start=0, as another read is about to be created by the rising edge of transfer_start.

Looking at /opt/Xilinx/Vivado/2016.2/data/ip/xilinx/axi_quad_spi_v3_2/hdl/src/vhdl/qspi_mode_control_logic.vhd near line 1020:

 

Replaced SPIXfer_done_rd_tx_en <= transfer_start_pulse or SPIXfer_done_int_pulse_d2;

With:

SPIXfer_done_rd_tx_en <= not SR_5_Tx_Empty and (transfer_start_pulse or (SPIXfer_done_int_pulse_d2 and transfer_start));
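
To see what the replacement changes, the two expressions can be compared over all input combinations. The Python sketch below mirrors the two VHDL assignments as plain boolean functions. The function names are made up, and it only compares the combinational expressions, not the timing of the real signals.

```python
from itertools import product


def rd_en_original(start_pulse, done_d2, transfer_start, tx_empty):
    # transfer_start_pulse or SPIXfer_done_int_pulse_d2
    return start_pulse or done_d2


def rd_en_patched(start_pulse, done_d2, transfer_start, tx_empty):
    # not SR_5_Tx_Empty and (transfer_start_pulse or
    #                        (SPIXfer_done_int_pulse_d2 and transfer_start))
    return (not tx_empty) and (start_pulse or (done_d2 and transfer_start))


def differing_cases():
    """Return every input combination where the two expressions disagree."""
    cases = []
    for inputs in product([False, True], repeat=4):
        if rd_en_original(*inputs) != rd_en_patched(*inputs):
            cases.append(inputs)
    return cases


if __name__ == "__main__":
    print("start_pulse done_d2 transfer_start tx_empty")
    for case in differing_cases():
        print(*(int(v) for v in case))
```

The patched expression is never true where the original is false, so the change can only suppress FIFO reads. One suppressed case is done_d2=1 with transfer_start=0 and the FIFO not empty, which is the window described above.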

 

Note that I am now trying this fix on Vivado 2016.2.

 

Would anyone from Xilinx like to comment on how safe this fix is, and on getting it into the next release?

 

William.

 

 

 

Scholar
8,503 Views
Registered: ‎09-16-2009

William,

 

Did you receive any response from Xilinx on this issue?

 

I'm wondering because our software team has started to report some flaky flash behaviour, and the symptoms look awfully similar to what you documented here.

 

Kudos for showing all the debug details and listing everything you've found. I haven't yet dug into the problem my team is seeing, nor do I really understand the details you've provided here, but it looks to be a great place to start.

 

Thanks,

 

Mark

Participant
8,450 Views
Registered: ‎10-15-2015

I have not heard anything from Xilinx.

I have the same fix working in Vivado 2016.2. Other parts of the core have changed at 2016.2, but the bit I modified was the same, so the same fix works.

I think I ran into the problem because we are running the QSPI over a Chip to Chip link, which slows down the CPU access.

I don't know how this relates to your case, but it will depend on CPU power and SPI clock rate.

William.

 

Observer
8,381 Views
Registered: ‎11-25-2015

 

Hi

 

The issue is seen when QSPI is set to dual mode, which is the default in the Xilinx Linux source code.

When dual mode is disabled, the issue is not seen. By default it is enabled as "is-dual = <1>". You can disable it by writing "is-dual = <0>" in your device tree.

 

&qspi {
    status = "okay";
    is-dual = <0>;

    flash@0 {

 

Can you verify whether the above works for you?

Participant
8,272 Views
Registered: ‎10-15-2015

is-dual is set to 0.

The system is working, but with my modifications to the QSPI core firmware.

 

qspi_flash: axi_quad_spi@80000000 {
    bits-per-word = <0x8>;
    compatible = "xlnx,xps-spi-2.00.a";
    fifo-size = <0x100>;
    interrupt-parent = <&intc>;
    interrupts = <0 34 1>;
    num-cs = <0x1>;
    reg = <0x80000000 0x1000>;
    xlnx,num-ss-bits = <0x1>;
    xlnx,spi-mode = <0x2>;
    #address-cells = <1>;
    #size-cells = <0>;
    is-dual = <0>;
    flash@0 {
        compatible = "s25fl512s";
        reg = <0x0>;
        spi-max-frequency = <5000000>;
        #address-cells = <1>;
        spi-tx-bus-width = <1>;
        spi-rx-bus-width = <4>;
        #size-cells = <1>;
        partition@test {
            label = "v7-flash";
            reg = <0x0 0x4000000>;
        };
    };
};

 

The label v7-flash gets through to /proc/mtd

I had another problem, possibly caused by our implementation, but I don't know why.

If I use this QSPI to configure the Virtex-7 (which is what I want it for), when Linux boots and tries to read the JEDEC code, it gets back 0, 0, 0 and hence cannot detect the device. If I try reading the JEDEC code again, it then works.

My workaround is to use some U-Boot code to force a dummy read of the JEDEC code before allowing Linux to boot.
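
The retry idea behind this workaround can be sketched in Python as below. This is not the U-Boot code I used. The read_id callable stands in for whatever actually issues the JEDEC read, and the ID bytes in the usage example are placeholders, not the real S25FL512S values.

```python
def read_jedec_with_retry(read_id, attempts=2):
    """Call read_id() until it returns a non-zero ID, up to `attempts` times.

    read_id is a hypothetical callable returning the ID bytes; an all-zero
    reply (as seen on the first read after configuration) is treated as
    "no answer" and triggers another attempt.
    """
    for _ in range(attempts):
        jedec = bytes(read_id())
        if any(jedec):
            return jedec
    raise IOError("JEDEC ID still all zeros after %d attempts" % attempts)


if __name__ == "__main__":
    # Fake flash: first read returns zeros, later reads return placeholder bytes.
    replies = iter([b"\x00\x00\x00", b"\xaa\xbb\xcc"])
    print(read_jedec_with_retry(lambda: next(replies)).hex())
```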

 

William.

 

Explorer
5,418 Views
Registered: ‎10-14-2015

Hi,

I have the same problem with the standard SPI. I have different slaves connected to the same bus, and when I try to access 2 different slaves on that bus at the same time, I get error -110.

I am using PetaLinux 2016.2 (kernel 4.4). To test, I created a couple of scripts that each read from a different slave in an infinite loop (one slave per script). I launched the scripts in two separate terminals and saw successful reads. After a while I started seeing errors, then it recovers and then fails again, in a "circular" way. It seems to me that there is some issue with the buffers as well. My clock is running at 5 MHz; slowing down the clock doesn't fix the problem.

I have tried with petalinux 204.4 and I do NOT have any problem with kernel 4.0.

Could the problems be related? What could be a valid solution for the standard SPI?

 

Regards,

Rocco

 

 
