eugenewang55
Visitor
Registered: 05-18-2011

Question regarding Virtex-6 integrated block for PCIe

Hi

I am a newbie in FPGA and driver programming. I followed the instructions in XTP045 to generate a PCIe Gen1 integrated block with 1 MB of BAR space for the Virtex-6 ML605 board and wrote a Linux character driver to read from its BAR space. Everything seems to work at this point.

However, when comparing the x8 and x1 implementations of the integrated block, they seem to have the same bandwidth for reading/writing their BAR memory space, which bothers me a lot. Does anyone here know where the mistake might be? The main function of my test program is pasted below for reference.

Any advice/help would be appreciated.
Thanks a lot.

#include <errno.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/time.h>
#include <limits.h>
#include "iomap.h"

int main(int argc, char *argv[])
{
  printf("SSIZE_MAX=%ld\n", (long)SSIZE_MAX);

  /* Open the character device exposed by the ml605 driver. */
  int fd1 = open("/dev/ml605", O_RDWR);
  if (fd1 == -1)
  {
    perror("open");
    printf("Error: could not open /dev/ml605\n");
    return 1;
  }

  /* Tell the driver which physical window (BAR) to map. */
  struct Iomap dev1;
  dev1.base = 0xf98000000;
  dev1.size = 8*1024*1024;

  if (ioctl(fd1, IOMAP_SET, &dev1))
  {
    perror("ioctl");
    printf("Error: ioctl of dev1 failed\n");
    return 2;
  }

  struct timeval start, end;
  long seconds, useconds;
  float mtime;
  int *buffer = malloc(1024*1024);
  ssize_t readLength;

  /* Time a single 1 KB read from BAR space. */
  gettimeofday(&start, NULL);
  readLength = read(fd1, buffer, 1024);
  gettimeofday(&end, NULL);

  seconds  = end.tv_sec  - start.tv_sec;
  useconds = end.tv_usec - start.tv_usec;
  mtime = seconds * 1000.0 + useconds / 1000.0;
  printf("read returned %zd bytes, time=%f ms\n", readLength, mtime);

  free(buffer);
  close(fd1);
  return 0;
}
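For reference, here is a minimal sketch of one way such a character-driver read() can pull data out of a BAR mapped with pci_iomap(); the ml605_dev structure, its fields, and the buffer size are illustrative assumptions, not the actual driver.

/* Illustrative sketch only -- the ml605_dev struct and its fields are
 * assumptions, not the actual driver. */
#include <linux/fs.h>
#include <linux/io.h>
#include <linux/uaccess.h>

struct ml605_dev {
    void __iomem *bar0;   /* from pci_iomap(pdev, 0, 0) in probe() */
    size_t bar_len;       /* size of BAR0 in bytes */
};

static ssize_t ml605_read(struct file *filp, char __user *buf,
                          size_t count, loff_t *ppos)
{
    struct ml605_dev *dev = filp->private_data;
    u8 tmp[1024];
    size_t n = count;

    if (*ppos >= dev->bar_len)
        return 0;
    if (n > sizeof(tmp))
        n = sizeof(tmp);
    if (n > dev->bar_len - (size_t)*ppos)
        n = dev->bar_len - (size_t)*ppos;

    /* Every word fetched here goes out as its own small read TLP. */
    memcpy_fromio(tmp, dev->bar0 + *ppos, n);

    if (copy_to_user(buf, tmp, n))
        return -EFAULT;

    *ppos += n;
    return n;
}

However the read() is actually implemented, each CPU-initiated access in this path becomes a small TLP on the link, which turns out to be the key point in the replies below.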
        

 

3 Replies
luisb
Xilinx Employee
Registered: 04-06-2010

Unless you start DMAing packets with a high payload, you're not going to see much of a difference in performance with more lanes. This is because CPU-initiated (programmed I/O) packets are limited to 1 DWORD. You're going to be limited by your processor speed when you only send 1 DW packets.

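As a back-of-the-envelope illustration (round, assumed numbers, not measurements) of why the lane count drops out for 1 DW programmed I/O:

/* Rough arithmetic sketch: assumed Gen1 lane rate and an assumed ~1 us
 * round-trip time per non-posted read; not measured values. */
#include <stdio.h>

int main(void)
{
    double lane_MBps = 250.0;   /* Gen1: 2.5 GT/s, 8b/10b -> ~250 MB/s per lane */
    double rtt_us    = 1.0;     /* assumed latency of one CPU read round trip   */
    double dw_bytes  = 4.0;     /* a 1 DWORD payload                            */

    /* The CPU issues one read, stalls until the completion returns,
     * then issues the next -- so throughput is payload / round trip. */
    double pio_MBps = dw_bytes / rtt_us;   /* bytes per microsecond == MB/s */

    printf("x1 raw link capacity : %7.1f MB/s\n", 1 * lane_MBps);
    printf("x8 raw link capacity : %7.1f MB/s\n", 8 * lane_MBps);
    printf("1 DW programmed I/O  : %7.1f MB/s (identical on x1 and x8)\n", pio_MBps);
    return 0;
}

With only one small packet in flight per round trip, both link widths sit far below their raw capacity, which matches what you measured.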
If you set up a bus-master DMA engine as described in XAPP1052, then you will see the performance benefit of the extra lanes.

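For a rough picture of the host side of such a transfer, here is a hedged sketch; the register offsets below are hypothetical placeholders, not the real XAPP1052 BMD register map, which is documented in the app note.

/* Hypothetical register layout for illustration only; see XAPP1052 for the
 * actual BMD design's registers. Assumes pci_set_master() was already called. */
#include <linux/pci.h>
#include <linux/dma-mapping.h>

#define REG_DMA_ADDR   0x00   /* assumed: bus address of the host buffer */
#define REG_DMA_LEN    0x08   /* assumed: transfer length in bytes       */
#define REG_DMA_START  0x0c   /* assumed: writing 1 starts the engine    */

static int kick_off_dma(struct pci_dev *pdev, void __iomem *regs,
                        size_t len, dma_addr_t *bus, void **cpu_buf)
{
    /* Buffer the endpoint can bus-master into/out of on its own. */
    *cpu_buf = dma_alloc_coherent(&pdev->dev, len, bus, GFP_KERNEL);
    if (!*cpu_buf)
        return -ENOMEM;

    iowrite32(lower_32_bits(*bus), regs + REG_DMA_ADDR);
    iowrite32((u32)len,            regs + REG_DMA_LEN);
    iowrite32(1,                   regs + REG_DMA_START);

    /* From here the endpoint moves large-payload TLPs by itself;
     * the CPU is free until the completion interrupt arrives. */
    return 0;
}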
I would also recommend reading through the following white paper on PCI Express Performance:

http://www.xilinx.com/support/documentation/white_papers/wp350.pdf

 

Hope this helps...

eugenewang55
Visitor
Registered: 05-18-2011

Thanks for your response, luisb.

So are you saying that without DMA, the CPU by default constructs PCIe packets with only a 1 DWORD payload?

I thought the max payload size of a TLP varies from 128 bytes to 4 KB... so why is it limited to 1 DWORD when the CPU issues the read/write?

 

luisb
Xilinx Employee
Registered: 04-06-2010

That is correct: without DMA, the default packet is 1 DWORD. This is a well-known restriction in the PCIe world, and the workaround is to have your own DMA engine. I don't know exactly why most systems are designed this way, but I would guess it's so that the processor is not hung up waiting for large transfers to occur. If you offload this to another module, the processor can continue while the hardware reads or writes memory directly.
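As an illustration of that offloading (a sketch under assumed names, not the XAPP1052 reference driver), the driver can simply sleep until the DMA-done interrupt fires instead of pushing the data itself one DWORD at a time:

/* Sketch only: assumed interrupt wiring and names. */
#include <linux/interrupt.h>
#include <linux/completion.h>
#include <linux/jiffies.h>
#include <linux/errno.h>

static DECLARE_COMPLETION(dma_done);      /* signalled by the ISR below */

static irqreturn_t dma_irq_handler(int irq, void *data)
{
    /* Hardware has finished reading/writing host memory directly. */
    complete(&dma_done);
    return IRQ_HANDLED;
}

static int wait_for_dma(void)
{
    /* The CPU sleeps (or does other work) here rather than spinning
     * on 1 DWORD programmed-I/O accesses. */
    if (!wait_for_completion_timeout(&dma_done, msecs_to_jiffies(100)))
        return -ETIMEDOUT;
    return 0;
}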
