cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
bharat1995
Visitor
Visitor
1,304 Views
Registered: ‎07-11-2018

Question regarding Latency = 0 for modular exponentiation function which has 512 bit numbers

Jump to solution

Hi,

 

I am fairly new to posting a question on the community forms but I have been taking help of community forms and helping out myself whenever I was stuck with something so far. I will try to post the question as clear as possible. Please be patient in reading the post.

 

So, Here goes my problem:

 

I am implementing the encryption of a message using RSA cryptography.

 

Say, If there is a message " m = 65 ", encryption key " e = 65537 " and a modulo (calculated from the prime integers chosen prior) as " n = 22909970124297718957842684219100283124200864032201744627122779819532871095058006341703915321355045493655666890829770151532787071682995175119 "

 

Then, encrypted Cypher would be    c \equiv m^e \pmod{n}

This was implemented using an algorithm which goes like:

 

ap_uint<512> modexp(ap_uint<512> base, ap_uint<512> exp, ap_uint<512> n_modulus)
{

 

ap_uint<512> C = 1;
int count1 = 0, count2 = 0;
  while (exp > 0)
  {
      if (exp % 2 == 1)
      {
         C = (C * base) % n_modulus;
       }
       exp = exp >> 1;
       base = (base * base) % n_modulus;
  }

return C;
}

 

 

 

 

So, when I tried to implement this function as in the file " normal.cpp " with the testbench file " normal_test.cpp " , the synthesis report was showing the latency as = 0 and as if there was no operation done on it at all.

 

To breakdown the implementation of the "normal" thingy, It goes something like this:

 

TESTBENCH:

 

#includes

definitions and declarations

 

int main()

{

test(); // is the top function that is being invoked from the testbench here

return 0;

}

 

Source Code:

 

#includes

//definitions and declarations

 

test()

{

//Some variable calculations  // m , e, n

ap_uint<512> C = modexp(m,e,n);

 return 0;

}

 

ap_uint<512> modexp(ap_uint<512> message, ap_uint<512> exp, ap_uint<512> n)

{

//function working

return C;

}

 

Here obviously, the top function was set to " test() " under project settings > synthesis

 

 

 

 

However when I tried implementing the same but this time, I called the just the " modexp() " function from the testbench file directly to see that the latency was now showing as " ? " I looked up on the forums for this long time ago and this happens if there is a loop bound that is dynamically being set which I think it is true as I used a while loop which checks for the condition. But what wonders me is that when I was implementing using the "normal.cpp" files and that sort of implementation, that did not throw this bizarre " ? " latency value on the synthesis report though. I didn't understand as to what was happening with these implementations.

 

 

I implemented the same code on the arm processor (ARM A9) that is onboard of the Zedboard to see that encryption did take about 335,535 cycles (by using performance counters of course) but this effect isn't being shown on the latency reports of the implementation done on Vivado HLS 2017.4

 

 

 

Question (1)  Why is the latency = 0 even when obviously the modular exponentiation is a compute intensive function

Question (2) Why is the latency being " ? " when implemented it in a different way? (the same function but called directly from the testbench with that function as the top level function while synthesizing it) (but there was some utilization that I could see on the report generated with this sort of implementation)

 

 

PS: I tried attaching the pseudo codes for it but something didn't work and it wasn't possible for me to do so.

Please reach out to me at " bharathsharma95@gmail.com " and I can email you my codes

0 Kudos
1 Solution

Accepted Solutions
u4223374
Advisor
Advisor
1,383 Views
Registered: ‎04-26-2015

(1) Probably because your top function (test()) takes no inputs and provides no outputs. So HLS, quite correctly, says "I could just replace this with an immediate return!" and deletes all the underlying code. Your top function should be the one that you intend to call in the FPGA. If you intend to call test() in the FPGA then the results will be exactly what you see in a C simulation (ie no output because test() has no return value).

 

(2) This is the correct way to implement it (using modexp as the top function). The unknown latency is because you have an unknown-length loop in your function. If exp == 0 when you start it, latency is going to be a cycle or two at most (because the loop never runs). If exp == 2^512 - 1 (ie all ones) then it'll take a lot of cycles. The modulo operations may also be of unknown length, because obviously in some cases you can short-cut those (eg. an obvious check would be "for a % b, is b > a? If so, return a". In this case the run-time will again be very short.

 

HLS cannot give you an exact time to run this function because the time is not constant.

View solution in original post

0 Kudos
2 Replies
u4223374
Advisor
Advisor
1,384 Views
Registered: ‎04-26-2015

(1) Probably because your top function (test()) takes no inputs and provides no outputs. So HLS, quite correctly, says "I could just replace this with an immediate return!" and deletes all the underlying code. Your top function should be the one that you intend to call in the FPGA. If you intend to call test() in the FPGA then the results will be exactly what you see in a C simulation (ie no output because test() has no return value).

 

(2) This is the correct way to implement it (using modexp as the top function). The unknown latency is because you have an unknown-length loop in your function. If exp == 0 when you start it, latency is going to be a cycle or two at most (because the loop never runs). If exp == 2^512 - 1 (ie all ones) then it'll take a lot of cycles. The modulo operations may also be of unknown length, because obviously in some cases you can short-cut those (eg. an obvious check would be "for a % b, is b > a? If so, return a". In this case the run-time will again be very short.

 

HLS cannot give you an exact time to run this function because the time is not constant.

View solution in original post

0 Kudos
bharat1995
Visitor
Visitor
1,193 Views
Registered: ‎07-11-2018
Hi,
Thank you for the pointers about the unbound loop (as in (1)) and for the explanation in(2).
The work around with known loop bounds with a for loop worked well and did get synthesized.

As to how I could get things to worl and the rest, anyone who has this doubt can email me on bharadwaj.gorthy@gmail.com and I will try to help at the earliest
0 Kudos