Observer
7,087 Views
Registered: ‎10-18-2013

Burst read peripheral DMA weird behaviour

I am trying to get DMA working for my custom peripheral. I have a test program written in C that runs bare metal on the Zynq, and two AXI4 buses. My peripheral is a slave on one bus and a master on the other.

 

The C program (listed below) initializes some memory, flushes the caches, and then writes to my peripheral to start the DMA read. But as I can see in ChipScope, the data read from memory is not the data from the correct memory address, and RDATA is only updated every other clock cycle. (The burst has length 16, size 4 bytes and burst type INCR; ARCACHE and ARPROT are all zeros.)

 

C-program code:

#include <stdint.h>
#include "util.h"

int main(void)
{  
  int x[4096];
  int i;
  volatile int dummy;

  for(i = 0; i < 4096; ++i) {
    x[i] = 0x3f894c00+i;
    if(i > 4000)
      small_printf("x[i]: %X\n", x[i]);
  }
  
  write_mem32(0x40002000, 0x3f6b0679);
  write_mem32(0x40002004, 0x3fc0d3f0);

  for(i = 0; i < 4096; i++) {
    write_mem32(0x4000C000 + i*4, 0x3f894c92); // write x[4096-8191]
  }
  for(i = 0; i < 4096; i++) {
    write_mem32(0x40008000 + i*4, 0x3f894c92); // write x[4096-8191]
  }
  
  small_printf("x: %x", (unsigned int) x); // print the address of x
  small_printf("\nDMA test.\n\n");

  Flush_DCache();

  // Start the DMA
  write_mem32(0x40000800, (uint32_t) x); // load with DMA: x[0-4095]

  small_printf("DMA started\n");

  for(i = 0; i < 4096; i++) {
    write_mem32(0x40004000 + i*4, 0x39800000); // write w
  }

  // Start the FSM
  write_mem32(0x40001000, 0xffffffff);

  // Read xhat
  int result[2];
  result[0] = read_mem32(0x40000000);
  result[1] = read_mem32(0x40000004);

  // Read weights
  for(i = 0; i < 4096; ++i) {
    dummy = read_mem32(0x40004000);
  }

  small_printf("Result 1: %x\nResult 2: %x\n", result[0], result[1]);

  return 0;
}

The vector x is stored starting at address 0x241f7c. The memory content is:
 

0x00241f6c: 0x3f895050 0x3f895051 0x3f895050 0x7de00c02   PP.?QP.?PP.?...}
0x00241f7c: 0x3f894c00 0x3f894c01 0x3f894c02 0x3f894c03   .L.?.L.?.L.?.L.?
0x00241f8c: 0x3f894c04 0x3f894c05 0x3f894c06 0x3f894c07   .L.?.L.?.L.?.L.?
0x00241f9c: 0x3f894c08 0x3f894c09 0x3f894c0a 0x3f894c0b   .L.?.L.?.L.?.L.?
0x00241fac: 0x3f894c0c 0x3f894c0d 0x3f894c0e 0x3f894c0f   .L.?.L.?.L.?.L.?
0x00241fbc: 0x3f894c10 0x3f894c11 0x3f894c12 0x3f894c13   .L.?.L.?.L.?.L.?
0x00241fcc: 0x3f894c14 0x3f894c15 0x3f894c16 0x3f894c17   .L.?.L.?.L.?.L.?
0x00241fdc: 0x3f894c18 0x3f894c19 0x3f894c1a 0x3f894c1b   .L.?.L.?.L.?.L.?

The data seen in the first burst on the bus with ChipScope is:

0x7de00c02
0x3f894c01
0x3f894c01
0x3f894c03
0x3f894c03
0x3f894c05
0x3f894c05
0x3f894c07
0x3f894c07
0x3f894c09
0x3f894c09
0x3f894c0B
0x3f894c0B
0x3f894c0D
0x3f894c0D
0x3f894c0F

I am attaching a screenshot from ChipScope. I don't know why I am seeing this behaviour, or how to troubleshoot it. Is there some setting that I have missed?

 

All help is greatly appreciated.

 

chipscope_burst_read.png
0 Kudos
9 Replies
Observer
7,008 Views
Registered: ‎10-18-2013

Now I see that whatever address I try to read from, I get the data at the 8-byte-aligned address.
The bus between the PS and my peripheral is connected to the PS S_AXI_HP1 port. Is that bus always 64 bits wide?

I haven't found any strobe signal for the read channel. How can I know which bytes contain valid data?
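
For reference, the AXI read channel has no strobe signal: for a narrow transfer on a wider bus, the master is expected to work out the valid byte lanes from the transfer address. The sketch below is not a diagnosis, only a check of one reading of the numbers in the first post. It assumes a 64-bit data path (`BUS_BYTES` is an assumption) and a hypothetical model in which every beat returns the low 32 bits of the 8-byte-aligned word containing the beat address. All function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUS_BYTES 8u /* assumption: 64-bit data path on the PS side */

/* Memory dump from the first post, as 32-bit words starting at 0x00241f78. */
static const uint32_t DUMP_BASE = 0x00241f78u;
static const uint32_t dump[] = {
    0x7de00c02u, /* word just before x[0] */
    0x3f894c00u, 0x3f894c01u, 0x3f894c02u, 0x3f894c03u,
    0x3f894c04u, 0x3f894c05u, 0x3f894c06u, 0x3f894c07u,
    0x3f894c08u, 0x3f894c09u, 0x3f894c0au, 0x3f894c0bu,
    0x3f894c0cu, 0x3f894c0du, 0x3f894c0eu, 0x3f894c0fu,
};

/* Offset of the first valid byte lane for a beat at this address. */
static unsigned lane_offset(uint32_t addr)
{
    return addr % BUS_BYTES;
}

/* 32-bit word stored at a 4-byte-aligned address inside the dump. */
static uint32_t mem32(uint32_t addr)
{
    size_t idx = (addr - DUMP_BASE) / 4u;
    assert(addr >= DUMP_BASE && addr % 4u == 0u);
    assert(idx < sizeof dump / sizeof dump[0]);
    return dump[idx];
}

/* Hypothetical model: the low half of the aligned 64-bit word is captured,
 * regardless of which byte lanes the beat address selects. */
static uint32_t low_half_model(uint32_t start, unsigned beat)
{
    uint32_t addr = start + 4u * beat;           /* INCR burst, 4 bytes per beat */
    uint32_t aligned = addr - lane_offset(addr); /* round down to 8 bytes */
    return mem32(aligned);
}

/* What a master that selects the lanes from the address would capture. */
static uint32_t expected_value(uint32_t start, unsigned beat)
{
    return mem32(start + 4u * beat);
}
```

With a start address of 0x00241f7c, `lane_offset` gives 4, and `low_half_model` for beats 0 to 15 yields the same sequence as the ChipScope capture (0x7de00c02, 0x3f894c01, 0x3f894c01, 0x3f894c03, ...), while `expected_value` yields x[0] to x[15]. Whether the lanes are lost in the peripheral, in a width converter, or only in what ChipScope is probing cannot be decided from these numbers alone.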
Observer
7,003 Views
Registered: ‎10-18-2013

In XPS, HP1 is enabled and access to the HIGHOCM address range is not enabled. The base and high addresses are set to AUTO and DATA_WIDTH is 32.

HP0 is also enabled but it is not used.
Observer
6,990 Views
Registered: ‎11-01-2013

Hi,

What is your sampling clock? Are you sampling the data with a clock that is at least 2 times faster than your DMA signals? I often notice that people feed in a clock of the same speed or a slower one.

 

With regards

Vintu

Observer
6,980 Views
Registered: ‎10-18-2013

This could very well be what is wrong. Could it be that the behaviour is correct but ChipScope is showing me the wrong data?

 

Unfortunately I don't know the answer to your question. Which clocks are we talking about here? I have one clock for the PL, one for the PS and one for ChipScope, correct?

 

Where can I see the frequencies of the clocks that are used?

 

Many thanks

Observer
6,968 Views
Registered: ‎11-01-2013

What is the frequency of the clock connected to ChipScope? Is it at least 2 times higher than the data rate of the DDR3?

 

With regards

Vintu

Observer
6,944 Views
Registered: ‎10-18-2013

The clock used for my AXI bus and peripheral is processing_system7_0::FCLK_CLK0, which is a 100 MHz clock sourced from the IO PLL.

 

The ARM CPU clock runs at 533 MHz.

 

I am not sure where I can see the frequency of the ChipScope clock. The signals that it probes are all in the clock domain processing_system7_0_FCLK_CLK0_pin.

 

So I believe it uses the same frequency as the rest of the design, which is in line with another answer on this forum:

"Yes, use the clock that is used for the rest of the design.  Chipscope is designed to be synchronous with whatever it is sampling.  If you want to sample less often, you can use one of the trigger matches as a data storage qualifier."

Observer
6,925 Views
Registered: ‎10-18-2013

In the Trace report I can see a clock with a required period of 30.000 ns. I believe this is the timing constraint for ChipScope, and that it makes the whole design end up at a maximum frequency of 64.140 MHz after implementation.

 

Some timing constraints from TWR file:

 

Timing constraint: TS_clk_fpga_0 = PERIOD TIMEGRP "clk_fpga_0" 100 MHz HIGH 50%;

 

Timing constraint: TS_J_CLK = PERIOD TIMEGRP "J_CLK" 30 ns HIGH 50%;

  Source Clock:         CONTROL_cs_ila_0_0[0] rising at 0.000ns
  Destination Clock:    CONTROL_cs_ila_0_0[0] rising at 30.000ns

 

Timing constraint: TS_U_TO_J = MAXDELAY FROM TIMEGRP "U_CLK" TO TIMEGRP "J_CLK" 15 ns;

 

Timing constraint: PATH "TS_J_TO_D_path" TIG;
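
The report mixes periods and frequencies, so a small conversion helps when comparing the figures. This is plain arithmetic only (frequency in MHz = 1000 / period in ns); the function names are hypothetical and nothing here says which constraint produces the 64.140 MHz figure.

```c
#include <assert.h>

/* Convert a clock period in nanoseconds to a frequency in MHz. */
static double period_ns_to_mhz(double period_ns)
{
    assert(period_ns > 0.0);
    return 1000.0 / period_ns;
}

/* Convert a clock frequency in MHz to a period in nanoseconds. */
static double mhz_to_period_ns(double mhz)
{
    assert(mhz > 0.0);
    return 1000.0 / mhz;
}
```

On these numbers, the 30 ns J_CLK period corresponds to roughly 33.3 MHz, the 100 MHz clk_fpga_0 constraint to a 10 ns period, and the reported 64.140 MHz to a period of about 15.59 ns.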

Observer
6,883 Views
Registered: ‎11-01-2013

Hi,

Have you managed to resolve the issue? I have been away from this project for the past month.

 

Do let me know if you have found a solution to your problem.

 

With regards

Vintu

Observer
6,875 Views
Registered: ‎10-18-2013

Hi vjose1, thanks for the reply.

 

I have not found a solution.

 

My workaround is to place my software data at every other position in a vector. This way the data I need appears correctly on the bus.

But it is very ugly (the data takes twice the memory), so it would be nice if I could fix this properly.
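
A minimal sketch of what such a spacing-out step could look like. The helper name is hypothetical, and so is the choice of the even slots: which slot ends up on the bus depends on how the destination buffer is aligned relative to an 8-byte boundary, which is not stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n words from src into every other slot of dst.
 * dst must have room for 2 * n words; the odd slots are filled with zeros. */
static void spread_every_other(const uint32_t *src, uint32_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; ++i) {
        dst[2 * i] = src[i];  /* payload word */
        dst[2 * i + 1] = 0u;  /* padding word, ignored by the peripheral */
    }
}
```

For example, spreading {1, 2, 3} produces {1, 0, 2, 0, 3, 0}, which is where the doubled memory use comes from.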
