
Registered: ‎07-08-2016

HLS axi-slave behaves incorrectly under linux, but works in bare metal design

I have been stuck on an issue for a while now, so posting here to maybe get a new set of eyes on it (original thread).


I have an HLS block that interacts with the Zynq PS as an AXI-Lite slave. The HLS block has been validated in hardware in a bare-metal system, but it does not work under Linux. That should rule out a hardware issue, but I'm starting to think it might be one after all.


The HLS block is an RSA encryption accelerator that takes three ap_uint<1024> values as inputs (base, exponent, and modulus) plus an operating mode, and returns an ap_uint<1024> result = base^(exponent) % modulus. The behavior of the block changes slightly depending on the value of the "mode" input.
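
For anyone who wants to sanity-check the arithmetic outside the hardware, here is a minimal software sketch of the same operation, result = base^exponent % modulus, using square-and-multiply. It is not the HLS implementation from this post: `modexp_small` is a hypothetical helper, and it only handles moduli below 2^32, so it is useful for checking the algorithm on toy values rather than on 1024-bit operands.

```c
#include <assert.h>
#include <stdint.h>

/* Square-and-multiply modular exponentiation.
 * Assumption: modulus fits in 32 bits, so every intermediate
 * product fits in a uint64_t without overflow. */
uint64_t modexp_small(uint64_t base, uint64_t exponent, uint64_t modulus)
{
    assert(modulus != 0 && modulus <= UINT32_MAX);

    uint64_t result = 1 % modulus;   /* handles modulus == 1 */
    base %= modulus;

    while (exponent > 0) {
        if (exponent & 1)            /* multiply step for each set bit */
            result = (result * base) % modulus;
        base = (base * base) % modulus;  /* square step */
        exponent >>= 1;
    }
    return result;
}
```

With the textbook toy key n = 3233, e = 17, d = 2753, `modexp_small(65, 17, 3233)` gives 2790 and `modexp_small(2790, 2753, 3233)` gives 65 back.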


I have a bare-metal test program, which works. I also have a linux kernel module to map the HLS registers into userspace, and a userspace test program to write/read data from the block. The bare metal test program works, but the linux test program returns garbage data, and also returns different garbage data every time I run the hardware for the same inputs. This is inconsistent with how the block functions in the bare metal design.


"Sounds like an issue in your kernel module," you say? I don't know.... I printed out the EXACT values I was writing into the AXI registers in the bare metal system, and inspected the memory just in case I was messing up the byte endianness (since each 1024-bit input has 32 input registers, I figured maybe I was writing the words in backwards?). It behaves as expected.


Once I obtained the EXACT values I was writing to the input ports in my bare-metal drivers, I proceeded to write a script that manually writes to each register using devmem, bypassing any potential errors that might lurk in my kernel module. This was to be absolutely SURE that I am writing the correct values.
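
The byte-order question raised above (whether the bytes of each 1024-bit value land in the 32 registers in the expected order) can be checked with a small packing routine. This is a sketch that assumes each 32-bit register takes four consecutive bytes, least significant byte first, which is what the values in this post suggest: the modulus bytes 0x49,0xF5,0xEB,0x73 show up as the register word 0x73EBF549. `bytes_to_le_words` is a hypothetical helper name, not part of the generated Xilinx driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a byte buffer into 32-bit words, least significant byte first.
 * Written with shifts so the result does not depend on host endianness. */
void bytes_to_le_words(const uint8_t *bytes, size_t nbytes, uint32_t *words)
{
    assert(nbytes % 4 == 0);

    for (size_t i = 0; i < nbytes / 4; i++) {
        words[i] = (uint32_t)bytes[4 * i]
                 | ((uint32_t)bytes[4 * i + 1] << 8)
                 | ((uint32_t)bytes[4 * i + 2] << 16)
                 | ((uint32_t)bytes[4 * i + 3] << 24);
    }
}
```

Printing the resulting words next to the values written to the registers makes a swapped byte or word order easy to spot.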


Once I have loaded the input values (base, exponent, modulus, and operating mode), I write 0x1 to the ap_ctrl register to start the block.


The answer is incorrect and different every time.


I still was suspicious, so I decided to throw an ILA on the AXI bus (between interconnect and the HLS block) to see the data going by. The writes to the input registers in the bare metal system are IDENTICAL to the writes using devmem2 AND in my kernel module. But sure enough, the data returning from the HLS block is different.
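
One way to make the "identical writes" comparison mechanical is to export the captured write transactions from each run and compare them entry by entry. The following is a minimal sketch, assuming each capture has already been reduced to an ordered list of (address, data) pairs; the `axi_write_t` type and `first_mismatch` function are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One captured AXI-Lite write: target address and 32-bit data. */
typedef struct {
    uint32_t addr;
    uint32_t data;
} axi_write_t;

/* Compare two captures of n writes each.
 * Returns the index of the first differing write, or -1 if all match. */
long first_mismatch(const axi_write_t *a, const axi_write_t *b, size_t n)
{
    assert(a != NULL && b != NULL);

    for (size_t i = 0; i < n; i++) {
        if (a[i].addr != b[i].addr || a[i].data != b[i].data)
            return (long)i;
    }
    return -1;
}
```

This only compares addresses and data. It says nothing about timing, write strobes, or reads interleaved between the writes, which would have to be compared separately.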


I have NO IDEA what is going on here. Again, the hardware WORKS when I write data to it using the Xilinx bare-metal drivers. But under Linux, even when directly writing to physical memory using devmem2, everything breaks. And I know it's most likely not the hardware, because it returns a DIFFERENT incorrect answer every time, which is inconsistent with the underlying hardware implementation.


So I have absolutely no idea how to proceed with debugging. For reference, I'd like to show my HLS top level function, and then compare my bare metal driver with the linux devmem2 method:


HLS top level function:

void wsrsa1024( memword_t privexp[NUM_MEMWORDS],    // BRAM holding private exponent
				RSAmode_t mode,     // mode: encrypt, decrypt, or load private exponent from BRAM
				uintRSA_t base,     // base (plain/cipher)text
				uintRSA_t publexp,  // public exponent
				uintRSA_t modulus,  // modulus
				uintRSA_t *result ) // result
{
	static uintRSA_t priv=0;

	switch (mode)
	{
	// Load the private key (from BRAM) into local ap_uint<1024> variable.
	// ....
		for (int i=0; i<NUM_MEMWORDS; i++)
		{
			priv.range(NUM_BITS-1,(NUM_BITS)-MEMWORD_SIZE) = privexp[i];
			if (i!=NUM_MEMWORDS-1)
				priv >>= MEMWORD_SIZE;
		}
		*result = 0;
	// ....

	// Encrypts data using RSA modular exponentiation: result = base^(public exponent) % modulus
	// base is the plaintext, exponent is public exponent, modulus is the shared modulus
	case ENCRYPT:
	// ....

	// Decrypts data using RSA modular exponentiation: result = base^(private exponent) % modulus
	// base is the ciphertext, exponent is private exponent, modulus is the shared modulus
	case DECRYPT:
	// ....
	}
}



Bare metal driver, testing encryption

uint8_t privexp_arr[] = {0xA1,0x11,0xAD,0xAD,0x48,0x88,0xF5,0x2D,0x35,0xF5,0x42,0x8E,0x39,0x39,0x68,0x06,0xBE,0x32,0x52,0x5C,0xDA,0x2B,0xF2,0x2A,0x27,0x58,0x1B,0xDE,0xEE,0x18,0x63,0x92,0xD8,0x9F,0x02,0x2C,0xFB,0xDF,0x77,0xE6,0x1F,0xDB,0xDC,0x84,0x6C,0x90,0x38,0xA0,0x8D,0x8A,0xEB,0x5C,0x2A,0xF7,0xCC,0x25,0x9D,0x62,0xBA,0xB5,0xB2,0xB8,0x7B,0xCD,0x66,0xD6,0x77,0xD5,0x32,0x9D,0xF1,0x98,0x9C,0xB1,0xAC,0x50,0x23,0x7C,0xCF,0x28,0x69,0x32,0xD9,0x3A,0x21,0x82,0x9D,0xE0,0xE1,0xBA,0x12,0x3C,0x79,0x95,0x10,0x7A,0x50,0x6E,0xA2,0x91,0x87,0x04,0x2B,0x6F,0xE4,0x8C,0x05,0x51,0x31,0x81,0x50,0xE9,0x52,0x69,0x09,0xCF,0x68,0x1D,0x74,0x88,0x6B,0x17,0x43,0xE8,0xFD,0x9C,0x7B,0x04};
uint32_t publexp_arr [] = {0x10001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
uint8_t modulus_arr[] = {0x49,0xF5,0xEB,0x73,0x5B,0x82,0x9C,0xEB,0x4B,0xC2,0xAF,0x74,0x64,0x29,0x38,0xA8,0xAF,0x7E,0xA4,0x77,0xBA,0x9C,0x79,0xB6,0x9B,0x5E,0x65,0xBC,0xBA,0x74,0x84,0x3E,0x84,0xBF,0x5C,0xD4,0xD1,0xF4,0xEC,0xD4,0x83,0x3D,0xC6,0x9B,0x7B,0x52,0x5C,0x2F,0x25,0x79,0x6D,0x21,0x79,0xB3,0x31,0x7A,0x0D,0xAD,0xB1,0xB9,0xDC,0x5F,0xE5,0x3D,0x13,0x21,0xF6,0xFB,0x97,0x1A,0xFB,0xB9,0x7F,0x4D,0x26,0x0F,0x10,0x37,0xEA,0xEA,0xEC,0x97,0xA4,0x79,0x37,0xFB,0x62,0x33,0x9E,0xB3,0x28,0xC4,0x30,0x8A,0xA6,0x94,0x9A,0x9F,0x0D,0xDF,0xE2,0xF5,0xB4,0x1F,0x25,0x4F,0xE1,0x6F,0x35,0xBF,0x82,0xBF,0xE6,0xA2,0xA0,0x15,0x80,0xA1,0x69,0x97,0xD8,0x3D,0x85,0x88,0x9E,0x88,0x4D,0xD9};
const uint8_t ciphertext_golden_ans[] = {0xF0,0xCA,0x37,0xC7,0xFA,0x38,0xB3,0xDF,0x00,0xA6,0xFA,0x10,0x14,0xEA,0xD7,0x36,0x83,0x61,0x5F,0x12,0x29,0x6C,0x19,0xC3,0x3A,0xC6,0x03,0xC9,0x74,0xF2,0x9E,0x57,0x68,0x2C,0xA8,0xAD,0xE6,0xAF,0x27,0x35,0xEF,0xD6,0x33,0x34,0xA8,0x0F,0x8E,0x2D,0x84,0xA5,0xA9,0xF3,0xC6,0x9A,0xF7,0xC9,0xB6,0x9B,0x12,0x0E,0xF3,0x40,0x6E,0x8E,0x2A,0x40,0x4B,0x6C,0x63,0x6B,0x42,0xEC,0xE6,0xB5,0x2E,0x1D,0x5A,0x95,0xFF,0x8E,0xAF,0xB3,0x24,0x8D,0x88,0x01,0x61,0x42,0x1D,0xA9,0x80,0x93,0xD2,0xE9,0x04,0x30,0x63,0x43,0x16,0xC1,0xD0,0xCC,0xFD,0xD1,0xA0,0xA8,0xC3,0xD0,0x73,0xF6,0x66,0x38,0x95,0x42,0xA1,0x75,0x77,0xD1,0xE2,0xBB,0xB8,0x49,0x7B,0x78,0x6F,0x66,0x44,0x93};
const uint32_t plaintext_golden_ans[] = {0x726C6421,0x2C20576F,0x656C6C6F,0x00000048,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};

int32_t wsrsa_encrypt(uint8_t* plaintext, uint8_t* publexp, uint8_t* modulus, uint8_t* ciphertext)
{
	// Set Base
	XWsrsa1024_Base_v plaintext_st;
	memcpy(&plaintext_st, plaintext, sizeof(XWsrsa1024_Base_v));
	// ....

	// Set public exponent
	XWsrsa1024_Publexp_v publexp_st;
	memcpy(&publexp_st, publexp, sizeof(XWsrsa1024_Publexp_v));
	XWsrsa1024_Set_publexp_V(&xrsamodexp, publexp_st);

	// Set Modulus
	XWsrsa1024_Modulus_v modulus_st;
	memcpy(&modulus_st, modulus, sizeof(modulus_st));
	XWsrsa1024_Set_modulus_V(&xrsamodexp, modulus_st);

	// Create empty result struct and initialize it to zero
	XWsrsa1024_Result_v ciphertext_st;
	memset(&ciphertext_st, 0, sizeof(XWsrsa1024_Result_v));

	// Print input data for debugging
	xil_printf("BASE DATA = \n");   printBaseData(&xrsamodexp,plaintext_st);
	xil_printf("EXPO DATA = \n");   printExpData(&xrsamodexp,publexp_st);
	xil_printf("MODU DATA = \n");   printModData(&xrsamodexp,modulus_st);

	// Set mode to encrypt
	// ....

	// Start hardware block
	// ....

	// wait for result
	while( !XWsrsa1024_IsDone(&xrsamodexp));

	// read back data into local buffer
	ciphertext_st = XWsrsa1024_Get_result_V(&xrsamodexp);

	// copy local struct data into user buffer
	memcpy(ciphertext, &ciphertext_st, sizeof(XWsrsa1024_Result_v));

	// compare result against golden truth data, and fail if it's wrong
	if (memcmp(ciphertext, ciphertext_golden_ans, sizeof(XWsrsa1024_Result_v)) )
		return XST_FAILURE;
	else
		return XST_SUCCESS;
}

int32_t rsa_test(void)
{
	uint8_t result[RSA_NUM_BYTES];

	// initialize RSA block
	uint32_t ret = rsa_init();
	if (ret != XST_SUCCESS)
		xil_printf("RSA init error!\n");

	// Test public encryption on plaintext, comparing against known ciphertext
	ret = wsrsa_encrypt((uint8_t*)plaintext_golden_ans, (uint8_t*)publexp_arr, modulus_arr, result);
	xil_printf("Enc result = "); printHex(result,RSA_NUM_BYTES);
	return ret;
}


And with devmem2, manually setting the input registers



# Set mode to 0 (ENCRYPT)
devmem2 0x43c00010 b 0

devmem2 0x43C00018 w 0x726C6421
devmem2 0x43C0001C w 0x2C20576F
devmem2 0x43C00020 w 0x656C6C6F
devmem2 0x43C00024 w 0x00000048
devmem2 0x43C00028 w 0x00000000
devmem2 0x43C0002C w 0x00000000
devmem2 0x43C00084 w 0x00000000
# ....
#  zero writes through 0x43C00094 ....
# ....
devmem2 0x43C00094 w 0x00000000

# WRITE EXPONENT WORD-BY-WORD
devmem2 0x43C0009C w 0x00010001
devmem2 0x43C000A0 w 0x00000000
devmem2 0x43C000A4 w 0x00000000
# ....
# zero writes through 0x43C00118 ....
# ....
devmem2 0x43C00118 w 0x00000000

# WRITE MODULUS WORD-BY-WORD
devmem2 0x43C00120 w 0x73EBF549
devmem2 0x43C00124 w 0xEB9C825B
devmem2 0x43C00128 w 0x74AFC24B
devmem2 0x43C0012C w 0xA8382964
devmem2 0x43C00130 w 0x77A47EAF
devmem2 0x43C00134 w 0xB6799CBA
devmem2 0x43C00138 w 0xBC655E9B
devmem2 0x43C0013C w 0x3E8474BA
devmem2 0x43C00140 w 0xD45CBF84
devmem2 0x43C00144 w 0xD4ECF4D1
devmem2 0x43C00148 w 0x9BC63D83
devmem2 0x43C0014C w 0x2F5C527B
devmem2 0x43C00150 w 0x216D7925
devmem2 0x43C00154 w 0x7A31B379
devmem2 0x43C00158 w 0xB9B1AD0D
devmem2 0x43C0015C w 0x3DE55FDC
devmem2 0x43C00160 w 0xFBF62113
devmem2 0x43C00164 w 0xB9FB1A97
devmem2 0x43C00168 w 0x0F264D7F
devmem2 0x43C0016C w 0xEAEA3710
devmem2 0x43C00170 w 0x79A497EC
devmem2 0x43C00174 w 0x3362FB37
devmem2 0x43C00178 w 0xC428B39E
devmem2 0x43C0017C w 0x94A68A30
devmem2 0x43C00180 w 0xDF0D9F9A
devmem2 0x43C00184 w 0x1FB4F5E2
devmem2 0x43C00188 w 0x6FE14F25
devmem2 0x43C0018C w 0xBF82BF35
devmem2 0x43C00190 w 0x15A0A2E6
devmem2 0x43C00194 w 0x9769A180
devmem2 0x43C00198 w 0x88853DD8
devmem2 0x43C0019C w 0xD94D889E

# disable autorestart
devmem2 0x43C00000 b 0

# set ap_start=1
devmem2 0x43C00000 b 1

# wait for it to be done
sleep 2

# read back the results to the console
devmem2 0x43C001A4 w
devmem2 0x43C001A8 w
devmem2 0x43C001AC w
devmem2 0x43C001B0 w
devmem2 0x43C001B4 w
devmem2 0x43C001B8 w
devmem2 0x43C001BC w
devmem2 0x43C001C0 w
devmem2 0x43C001C4 w
devmem2 0x43C001C8 w
devmem2 0x43C001CC w
devmem2 0x43C001D0 w
devmem2 0x43C001D4 w
devmem2 0x43C001D8 w
devmem2 0x43C001DC w
devmem2 0x43C001E0 w
devmem2 0x43C001E4 w
devmem2 0x43C001E8 w
devmem2 0x43C001EC w
devmem2 0x43C001F0 w
devmem2 0x43C001F4 w
devmem2 0x43C001F8 w
devmem2 0x43C001FC w
devmem2 0x43C00200 w
devmem2 0x43C00204 w
devmem2 0x43C00208 w
devmem2 0x43C0020C w
devmem2 0x43C00210 w
devmem2 0x43C00214 w
devmem2 0x43C00218 w
devmem2 0x43C0021C w
devmem2 0x43C00220 w
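
The same register sequence can also be driven from a small C program instead of one devmem2 call per word. The sketch below only covers the operand writes and the result read against a pointer to the register window. It assumes `regs` points at the start of the block's AXI-Lite address space (for example obtained by mapping 0x43C00000 through /dev/mem, which is not shown here). The offsets are copied from the devmem2 script above; the function and constant names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RSA_WORDS 32  /* 1024 bits / 32 bits per register */

/* Byte offsets from the register base, as used in the devmem2 script. */
enum {
    REG_AP_CTRL = 0x000,
    REG_MODE    = 0x010,
    REG_BASE    = 0x018,
    REG_EXP     = 0x09C,
    REG_MODULUS = 0x120,
    REG_RESULT  = 0x1A4
};

/* Write one 1024-bit operand as 32 consecutive 32-bit words. */
void write_operand(volatile uint32_t *regs, size_t byte_offset,
                   const uint32_t words[RSA_WORDS])
{
    assert(byte_offset % 4 == 0);
    for (size_t i = 0; i < RSA_WORDS; i++)
        regs[byte_offset / 4 + i] = words[i];
}

/* Read the 1024-bit result back as 32 consecutive 32-bit words. */
void read_result(volatile uint32_t *regs, uint32_t words[RSA_WORDS])
{
    for (size_t i = 0; i < RSA_WORDS; i++)
        words[i] = regs[REG_RESULT / 4 + i];
}
```

Because the helpers only take a pointer, they can be exercised against an ordinary array first, which separates mistakes in the offset arithmetic from problems in the hardware or the mapping.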

Sure enough, I get garbage back. Every time I call the devmem2 script, the results are different. But I can loop my bare metal test program, constantly setting ap_start, and the result never changes (as it shouldn't).



Any and all help appreciated,





7 Replies
Registered: ‎07-08-2016

Re: HLS axi-slave behaves incorrectly under linux, but works in bare metal design

If anyone has any ideas or suggestions, I would greatly appreciate it. I'm really at a loss for what to do here, and this is a pretty critical portion of my thesis. I'm currently in the middle of rewriting the HLS block to output a bunch of internal signals, so that I can monitor them with an ILA, as well as attempting to store "golden truth" data in internal ROM that I can compare against and set a flag to trigger the ILA if it is incorrect.


What is really blowing my mind is that the design works in bare metal.....this obviously indicates that it is a problem with my Linux kernel module. So I bypass the kernel module and manually set the registers using devmem. But then it still doesn't work, and when I put in an ILA I see that all the input data is indeed correct. Then I run my kernel module again, connected to the ILA, and see the same data as in the bare metal system AND with devmem coming across the AXI bus..... so all three are the same (same address and write data, plus or minus timing differences). So now THIS makes it seem like a hardware issue. But it can't be a hardware issue, because the design works in bare metal....so it must be a Linux kernel issue....but it.....you get the idea....

Registered: ‎07-08-2016

Re: HLS axi-slave behaves incorrectly under linux, but works in bare metal design

OK I don't want to get ahead of myself here, but I think I have enough to show that this is some sort of hardware bug (or toolsuite bug).


So, in the process of debugging, I added a bunch of ap_ctrl_none top level ports to my HLS block, exposing the internals to the ILA to try and figure out what is going on. No new AXI interfaces, so nothing with the address map should change. I export hardware, program the device, boot into linux, and BAM my kernel module crashes the OS when it tries to write. Again, this kernel module, while never able to return the right answer, was NOT crashing EVER when used repeatedly. All I did was change a few things that aren't visible to the PS in the HLS block, and my kernel module starts crashing.


"OK brett, so maybe the new logic you added is just exposing some flaw in your kernel module that wasn't there before. Maybe it is just increasing the latency of the block, or something like that, and your module is poorly written and is breaking".......My thoughts exactly! Which is why I decided to do the whole thing with devmem again, just to idiot check myself.


Well, at the EXACT SAME POINT that my kernel module crashes (when polling ap_ctrl register to see if the block has finished), the hardware server on the Zynq stops responding, and ap_start bit is STUCK at 1. Again, this happens when I just tell the hardware to start by using devmem to write 0x1 to the ap_ctrl register. I can't clear the bit when I write a zero into ap_ctrl either.....
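
For reference when reading ap_ctrl, the sketch below names the control bits and polls for completion with an upper bound instead of spinning forever. The bit positions are the ones Vivado HLS normally lists in the generated hardware header for an ap_ctrl_hs block (ap_start bit 0, ap_done bit 1, ap_idle bit 2, ap_ready bit 3, auto_restart bit 7); treat them as an assumption and check them against the header generated for this design. `wait_done` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed ap_ctrl bit layout (verify against the generated header). */
#define AP_START     (1u << 0)
#define AP_DONE      (1u << 1)
#define AP_IDLE      (1u << 2)
#define AP_READY     (1u << 3)
#define AUTO_RESTART (1u << 7)

/* Poll ap_ctrl until ap_done is seen, giving up after max_polls reads.
 * Each iteration reads the register exactly once and tests that value,
 * in case ap_done is cleared by the read.
 * Returns 0 when done was observed, -1 on timeout. */
int wait_done(volatile const uint32_t *ap_ctrl, unsigned max_polls)
{
    assert(ap_ctrl != NULL);

    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t status = *ap_ctrl;
        if (status & AP_DONE)
            return 0;
    }
    return -1;
}
```

A timeout return with ap_start still set and ap_idle clear would at least distinguish a block that never finishes from one that finishes with a wrong answer.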


@gdg I don't know if you solved your problem, but this is looking more and more like your issue?


I'm not making this stuff up. Either I'm going absolutely insane, or I've found something going on here. And again, the block STILL PASSES RTL COSIM AND WORKS IN BARE METAL. My best guess is that it has to do with the HDL for the AXI interface that Vivado HLS 2017.2 is generating? Because the only thing that I have changed was in HLS. This does not explain why the bare metal drivers work, but I know that a lot is going on under the hood when linux is running on the PS, and I don't know what memory protection magic might be interfering with things.


So please, if someone here from Xilinx would chime in, that would be great. Or, if one of the awesome community super users who have been helping me out think this is worthwhile maybe you could put something in the "private forums" about this, that would be much appreciated.


Again, I really really hope I am wrong here. But as the days go by I fear I am not......

Registered: ‎03-22-2017

Re: HLS axi-slave behaves incorrectly under linux, but works in bare metal design

@bigbrett, for the past few days I have been working really hard on something else, but this is on my TODO list with high priority as well. I will get back to you as soon as I can.

Registered: ‎06-24-2015

Re: HLS axi-slave behaves incorrectly under linux, but works in bare metal design

Hi all,


We have informed the factory team so that they can look deeper into this issue.

Google your question before posting. If someone's post answers your question, mark the post as answer with "Accept as solution". If you see a particularly good and informative post, consider giving it Kudos (click on the 'thumbs-up' button).
Registered: ‎06-24-2015

Re: HLS axi-slave behaves incorrectly under linux, but works in bare metal design



Is it possible for you to upload the complete testcase?
I have sent you an ezmove package.

Registered: ‎07-08-2016

Re: HLS axi-slave behaves incorrectly under linux, but works in bare metal design

testcase has been PM'ed.


Also, an update: Just confirmed that the issue is present in 2016.4 as well.

Registered: ‎07-08-2016

Re: HLS axi-slave behaves incorrectly under linux, but works in bare metal design

@nupurs I ended up submitting my thesis without this issue getting resolved......I'm curious if Xilinx ever mentioned anything to you about it. Is this whole issue just a mystery, and Xilinx doesn't care enough to solve it?
