02-11-2020 11:52 AM
We are telling the Windows 10 Vivado hardware manager to use JTAG to indirectly program the QSPI flash attached to an Artix7 FPGA. This always works if we put a Digilent HS3 programming cable between them.
If instead we put an XVC server between them, it always works for discovering the FPGA, writing the FPGA's ephemeral config, and for chipscope. It can also use indirect QSPI to verify flash contents. But it cannot erase or write QSPI reliably. Usually Vivado locks up partway through, but sometimes it succeeds.
We ran TCPDUMP during one such lockup. It showed that the last transactions before the timeout were the Vivado PC requesting XVC to shift out 1228 bits (153.5 bytes) for each of the TMS and TDI vectors, then XVC responding with the 1228 bits of the TDO vector, then the Vivado PC acknowledging receipt of those TDO bytes. Then there was no traffic between those IP addresses for 25 seconds, and then Vivado timed out.
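For reference, a minimal sketch of how many bytes such a shift transaction carries, assuming the XVC 1.0 framing of a 4-byte bit count followed by a TMS vector and a TDI vector, each rounded up to whole bytes. The function names are hypothetical and are not taken from xvcServer.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to carry nbits JTAG bits, rounded up to a whole byte. */
static size_t xvc_vector_bytes(uint32_t nbits)
{
    return ((size_t)nbits + 7u) / 8u;
}

/* Bytes that follow the "shift:" command name in a request:
 * a 4-byte bit count, then the TMS vector, then the TDI vector. */
static size_t xvc_shift_request_bytes(uint32_t nbits)
{
    return 4u + 2u * xvc_vector_bytes(nbits);
}
```

So the 1228-bit shift above occupies 154 bytes per vector on the wire, and the TDO reply is one vector of the same size.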
We have tried Vivado 2017.1, 2018.2 and 2019.2 with the same results.
We have tried all these TCK frequencies, and none works better than the others: 16.667 MHz, 12.5 MHz, 10 MHz, 8.333 MHz, 1 MHz, 200 kHz
We have XVC server running on Avnet MicroZed MZ7010 board, like appnote 1251 suggests. The target (DUT) board whose JTAG is driven by PMOD of ZedBoard, is Trenz TEC0712 SOM with QSPI flash "s25fl256sxxxxxx0-spi-x1_x2_x4"
Is indirect QSPI programming really supported by Vivado for XVC (Xilinx Virtual Cable) 1.0 or is XVC only supported for debug?
02-13-2020 01:33 PM
02-17-2020 07:33 AM
Thank you for clarifying lack of support for flashing.
The problem is solved now. We can now do all the same operations on XVC that work on local or remote Digilent HS3 USB cable, including flash write. Note our XVC server is in the same building, not the other side of the world.
The problem was TCP/IP all along. We had ported xvcServer.c from the PetaLinux BSD-style socket API, to the LWIP raw API, since that is the only way supported for standalone (no OS) Zynq applications.
Various circumstances of client configs, timing, packet sizes, etc helped to hide the ways in which our code was not getting all the bytes of the TCP/IP stream.
Not all the necessary raw APIs are documented, and more importantly the semantics are not documented, except via a few examples whose incompleteness tends to mislead about the semantics. We finally reverse-engineered the semantics from the library source code.
1) A single call to your receive callback can pass many packets. pbuf is a linked list. You must follow the next pointers until you see 0. Back-to-back packets are 1 case where the linked list is usually longer than 1 packet.
2) You cannot free just 1 pbuf. The pbuf_free() function is supposed to be called with the head pbuf of the linked list, the one originally passed to your receive callback.
3) You cannot affect which payload bytes get acknowledged by the TCP stack. By the time your receive callback gets called, they have already all been acked. The only thing you can affect via the tcp_recved() function, is when and how much the TCP RX window advertisement changes. In other words, you cannot make the peer station re-transmit, you can only clog up the connection by telling him you have no room, for too long.
3a) You had better have called tcp_recved() with enough bytes to account for all pbufs by the time you return from the receive callback, or the window will close a little bit each time you get that callback. You could call this a "window leak". This means either 1 call using the number of bytes in the tot_len field of the pbuf at the head of the linked list, when you're all done with all pbufs, or several calls with the number of bytes from the len field of each pbuf in the list.
4) TCP makes no guarantees about message boundaries at the RX side. Just because a sending station (i.e. Vivado) happened to tell his TCP/IP stack to TX N bytes in a single OS call (or many OS calls back to back), that does not mean that they will all be bundled into a single ethernet packet on the wire/fiber. TCP only promises in-order delivery: no byte will be delivered ahead of a byte that the sender's OS got earlier on the same stream.
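Points 1 to 3a can be sketched roughly as below. This is only an illustration under stated assumptions: struct mock_pbuf is a hypothetical stand-in carrying just the next, payload, tot_len and len fields of lwIP's struct pbuf so that the sketch compiles without lwIP, and consume_chain(), byte_sink_fn, struct byte_log and log_byte() are made-up names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for lwIP's struct pbuf; only the fields used here. */
struct mock_pbuf {
    struct mock_pbuf *next; /* next pbuf in the chain, NULL at the end */
    void *payload;          /* payload bytes of this pbuf */
    uint16_t tot_len;       /* bytes in this pbuf plus all following ones */
    uint16_t len;           /* bytes in this pbuf only */
};

/* Called once per payload byte, in stream order. */
typedef void (*byte_sink_fn)(void *ctx, uint8_t byte);

/* Walk the whole chain, hand every byte to the sink, and return the byte
 * count. For a complete chain this equals head->tot_len. */
static size_t consume_chain(const struct mock_pbuf *head,
                            byte_sink_fn sink, void *ctx)
{
    size_t consumed = 0;
    for (const struct mock_pbuf *q = head; q != NULL; q = q->next) {
        const uint8_t *bytes = (const uint8_t *)q->payload;
        for (uint16_t i = 0; i < q->len; i++) {
            sink(ctx, bytes[i]);
        }
        consumed += q->len;
    }
    return consumed;
}

/* Example sink for illustration: record the bytes in arrival order. */
struct byte_log {
    uint8_t data[64];
    size_t count;
};

static void log_byte(void *ctx, uint8_t byte)
{
    struct byte_log *log = (struct byte_log *)ctx;
    if (log->count < sizeof log->data) {
        log->data[log->count++] = byte;
    }
}
```

In a real lwIP receive callback, the natural follow-up would be one tcp_recved(tpcb, p->tot_len) call and then one pbuf_free(p) on the head pbuf that was passed in, before returning.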
With a BSD-style socket API, an RX application can tell the OS it wants N bytes (for a command header) and just sit there until they are available, regardless of how many packet boundaries had to be crossed or how many retransmissions happened. Then the app can proceed with parsing, knowing he has all N bytes in his hands. But the LWIP raw API delivers all the TCP payload bytes of each packet at once, and if a command happens to be truncated in the middle, and must be continued an indefinite time later in another packet, that's just tough for the application.
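For comparison, the blocking behavior described above looks roughly like this on a POSIX system. This is a sketch, not a quote from xvcServer.c: read_exact() is a hypothetical helper name, and it uses read(), which works on a connected socket descriptor as well as on a pipe.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Block until exactly `want` bytes have arrived on fd, however the stream
 * was split into packets. Returns 0 on success, -1 on error or if the peer
 * closed the stream before `want` bytes arrived. */
static int read_exact(int fd, void *buf, size_t want)
{
    unsigned char *dst = (unsigned char *)buf;
    while (want > 0) {
        ssize_t got = read(fd, dst, want);
        if (got < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal, try again */
            }
            return -1;
        }
        if (got == 0) {
            return -1; /* end of stream before the request was satisfied */
        }
        dst += got;
        want -= (size_t)got;
    }
    return 0;
}
```

The raw API has no equivalent of this loop, because the receive callback cannot block and wait for the rest of a command.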
I had coded the app to assume that commands would always start at the beginning of a pbuf. The worst thing was that very often they did, so it was easy to think the system was working, until we did a longer test run. In the end, rather than implementing a generic SW layer to buffer packets like BSD sockets do, so that we could keep the old behavior of comparing the command string as a whole several times, I preferred to pass packet bytes to a state machine 1 by 1, like I would do in hardware. The state machine promises to swallow everything it gets, does not care about packet boundaries, and returns a categorization to the caller. When it is expecting a command, it categorizes on the fly, updating a per-connection state, and returning an error as soon as it realizes a command is unrecognized. When it is expecting a TMS or TDI vector, it stores it in a per-connection buffer.
For the variable number of bytes in the TMS and TDI vectors, we still had to accumulate at least all of TMS before starting to push into the hardware block, since that HW wants the same number of bits of TMS and TDI each time it is started.
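A byte-at-a-time command recognizer in that spirit might look like the sketch below. It is not the code we actually ran. It assumes the three XVC 1.0 command names "getinfo:", "settck:" and "shift:", covers only the command-name phase, and the names xvc_parser, xvc_event and xvc_feed are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum xvc_event {
    XVC_NEED_MORE,   /* still a valid prefix of some command name */
    XVC_GOT_GETINFO,
    XVC_GOT_SETTCK,
    XVC_GOT_SHIFT,
    XVC_BAD_COMMAND  /* the bytes so far match no command name */
};

/* Per-connection state: the bytes of the command name accepted so far. */
struct xvc_parser {
    char prefix[8];  /* the longest name, "getinfo:", is 8 bytes */
    size_t len;
};

static const char *const xvc_names[] = { "getinfo:", "settck:", "shift:" };
static const enum xvc_event xvc_events[] = {
    XVC_GOT_GETINFO, XVC_GOT_SETTCK, XVC_GOT_SHIFT
};

/* Feed one stream byte. Packet boundaries are invisible here, so a command
 * name split across two packets is recognized like any other. The state
 * resets after a complete name or an error, so len never exceeds 8. */
static enum xvc_event xvc_feed(struct xvc_parser *p, uint8_t byte)
{
    int partial = 0;
    p->prefix[p->len++] = (char)byte;
    for (size_t i = 0; i < sizeof xvc_names / sizeof xvc_names[0]; i++) {
        size_t n = strlen(xvc_names[i]);
        if (p->len <= n && memcmp(xvc_names[i], p->prefix, p->len) == 0) {
            if (p->len == n) {
                p->len = 0;           /* ready for the argument bytes */
                return xvc_events[i];
            }
            partial = 1;
        }
    }
    if (partial) {
        return XVC_NEED_MORE;
    }
    p->len = 0;                       /* reject as early as possible */
    return XVC_BAD_COMMAND;
}
```

The caller would feed every payload byte of every pbuf through xvc_feed() while a command name is expected, then switch to collecting argument bytes once an event other than XVC_NEED_MORE comes back.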
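The accumulation step could be sketched as follows. The buffer limit XVC_MAX_VECTOR_BYTES and the names shift_buf, shift_buf_begin and shift_buf_feed are assumptions made for illustration, and for simplicity this version waits for both vectors to be complete rather than starting the hardware as soon as TMS is in.

```c
#include <stddef.h>
#include <stdint.h>

#define XVC_MAX_VECTOR_BYTES 2048u /* assumed limit, not from the original */

/* Per-connection buffer for one shift command. */
struct shift_buf {
    size_t vec_bytes; /* bytes per vector for the current shift */
    size_t received;  /* TMS plus TDI bytes stored so far */
    uint8_t tms[XVC_MAX_VECTOR_BYTES];
    uint8_t tdi[XVC_MAX_VECTOR_BYTES];
};

/* Start collecting vectors for a shift of nbits bits.
 * Returns 0 on success, -1 if the vectors would not fit. */
static int shift_buf_begin(struct shift_buf *s, uint32_t nbits)
{
    size_t n = ((size_t)nbits + 7u) / 8u;
    if (n > XVC_MAX_VECTOR_BYTES) {
        return -1;
    }
    s->vec_bytes = n;
    s->received = 0;
    return 0;
}

/* Store one stream byte: TMS fills first, then TDI.
 * Returns 1 when both vectors are complete, 0 if more bytes are needed,
 * -1 if called after the vectors were already complete. */
static int shift_buf_feed(struct shift_buf *s, uint8_t byte)
{
    if (s->received >= 2u * s->vec_bytes) {
        return -1;
    }
    if (s->received < s->vec_bytes) {
        s->tms[s->received] = byte;
    } else {
        s->tdi[s->received - s->vec_bytes] = byte;
    }
    s->received++;
    return s->received == 2u * s->vec_bytes;
}
```

Once shift_buf_feed() returns 1, both vectors have the same length and can be handed to the hardware block together, no matter how many packets the bytes arrived in.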
This makes sure that commands that begin after a shift cmd don't get lost, and we handle any number of cmd/arg per packet.
Using lots of function calls like this may be an issue with the slow clock rates that Microblaze tends to run at. But XVC running on the Zynq 7000 and UltraScale+ that we used seemed just as speedy as a local Digilent HS3 cable. BTW we ran XVC TCK at 16.667 MHz and Digilent HS3 at 15 MHz.
I hope this helps the next person who has to do a serious app using the LWIP raw API.