medrano (Observer)

Matrix multiplication using the Python pynq library


I am trying to implement a matrix multiplier on a PYNQ-Z1 board. 

I followed the reference design xapp1170_2015v4, with the following modifications: for the board part I used xc7z020clg400-1, and in the source code, in mmult.h, I replaced

// set it to sizeof(T) ones
e.strb = -1;
e.keep = 15;

by:

// set it to sizeof(T) ones
e.strb = (1<<sizeof(T))-1;
e.keep = (1<<sizeof(T))-1;
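
For reference, with T = float (4 bytes) this mask evaluates to 0xF, i.e. all four byte lanes of the stream word are marked valid; a quick sanity check in Python, assuming a 4-byte element type:

sizeof_T = 4                 # sizeof(float)
mask = (1 << sizeof_T) - 1
print(hex(mask))             # 0xf -> TSTRB/TKEEP set to sizeof(T) ones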

I used Vivado version 2019.2 and added one extra block: an AXI interrupt controller (see the TCL file). Without this extra block, pynq complains (as described here).
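
A quick way to confirm that the interrupt controller is actually visible to pynq is to inspect the parsed hardware description. This is just a sketch using the standard Overlay attributes (ip_dict, interrupt_controllers, interrupt_pins) and the same bitstream path as in the code below:

from pynq import Overlay

overlay = Overlay('/home/xilinx/overlays/matmult_xilinx/matmult.bit')

# All IP blocks pynq parsed from the design (accelerator, DMA, interrupt controller, ...)
print(list(overlay.ip_dict.keys()))

# Interrupt controllers and interrupt pins found in the design
print(overlay.interrupt_controllers)
print(overlay.interrupt_pins)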

 

Is there a reference implementation for this design using the Python library pynq? If not for this particular design, is there one for a similar design?

I haven't been able to correctly use the hardware component from within pynq. The result matrix always contains zeros. I don't know whether I am failing to properly initialize the module or to configure the DMA transfers. The Python code using pynq is enclosed.

from pynq import Overlay
from pynq import allocate
import numpy as np


overlay = Overlay('/home/xilinx/overlays/matmult_xilinx/matmult.bit')

# Initialize multiplier
mmult_ip = overlay.HLS_accel_1
# Start the accelerator: set AP_START (bit 0) and AUTO_RESTART (bit 7)
ctrl = mmult_ip.read(0x00) & 0x08
mmult_ip.write(0x00, (ctrl|0x81))
ctrl = mmult_ip.read(0x00)
print(hex(ctrl))


# Interrupts
global_int = mmult_ip.read(0x04)
mmult_ip.write(0x04, global_int & ~0x01) # Disable global interrupts (GIER bit 0)

ip_ier = mmult_ip.read(0x08)
mmult_ip.write(0x08, ip_ier | 0x3) # enable channel 0, ap_done, and channel 1, ap_ready

ip_ier = mmult_ip.read(0x08)
print(ip_ier)


DIM = 32
IN_SIZE = 2 * DIM * DIM

OUT_SIZE = DIM * DIM
print(f"IN SIZE: {IN_SIZE}, OUT SIZE: {OUT_SIZE}")


in_buffer = allocate(shape=(IN_SIZE,), dtype='u4')
out_buffer = allocate(shape=(OUT_SIZE,), dtype='u4')

for i in range(IN_SIZE):
    in_buffer[i] = 1


dma = overlay.axi_dma_1

# Attempt 1, not working
dma.sendchannel.transfer(in_buffer)
dma.recvchannel.transfer(out_buffer)
dma.sendchannel.wait()
dma.recvchannel.wait()

# Attempt 2: not working
import asyncio
async def calculation():
    dma.sendchannel.transfer(in_buffer)
    dma.recvchannel.transfer(out_buffer)
    await dma.sendchannel.wait_async()
    await dma.recvchannel.wait_async()
    print(out_buffer)

loop = asyncio.get_event_loop()
calc_task = loop.create_task(calculation())
loop.run_until_complete(calc_task)

I would appreciate any hints.

 

Best Regards,

Medrano

 

 

 

Accepted Solution
medrano (Observer)

I found the problem. The buffers have to be allocated with the right type; namely:

 

in_buffer = allocate(shape=(IN_SIZE,), dtype=np.float32)
out_buffer = allocate(shape=(OUT_SIZE,), dtype=np.float32)
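
For completeness, here is a minimal end-to-end sketch with this fix applied. It uses the same bitstream path and IP names (HLS_accel_1, axi_dma_1) as in the question, and it assumes the accelerator expects A followed by B in a single input stream, which is how IN_SIZE = 2 * DIM * DIM is laid out above:

import numpy as np
from pynq import Overlay, allocate

overlay = Overlay('/home/xilinx/overlays/matmult_xilinx/matmult.bit')
mmult_ip = overlay.HLS_accel_1
dma = overlay.axi_dma_1

# Start the accelerator: AP_START (bit 0) | AUTO_RESTART (bit 7)
mmult_ip.write(0x00, 0x81)

DIM = 32
in_buffer = allocate(shape=(2 * DIM * DIM,), dtype=np.float32)
out_buffer = allocate(shape=(DIM * DIM,), dtype=np.float32)

A = np.ones((DIM, DIM), dtype=np.float32)
B = np.ones((DIM, DIM), dtype=np.float32)
in_buffer[:] = np.concatenate((A.flatten(), B.flatten()))

dma.sendchannel.transfer(in_buffer)
dma.recvchannel.transfer(out_buffer)
dma.sendchannel.wait()
dma.recvchannel.wait()

C = np.array(out_buffer).reshape(DIM, DIM)
print(C)  # for all-ones inputs every entry should be 32.0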

 

Solved!

medrano (Observer)

An implementation of this matrix multiplication in hardware is provided in this repo:

https://github.com/twaclaw/matmult

 

Best,

Medrano
