I've implemented a FIR filter in C. HLS infers the multiply-accumulate logic perfectly, so far so good. However, I'd like to be able to specify a slower input rate so that multipliers can be shared and DSP48s saved. To do that, I used the allocation directive to limit the number of multipliers, expecting no problems. An unintended consequence is that HLS has stopped generating multiply-accumulates; the multiplies and accumulates are now separate operations. This ends up using extra DSP48s for the adds (or a ton of LUTs if I force the adds into LUTs). I then tried placing my MAC in a separate function and applying the allocation directive to that function. That technically works, but the latency increases drastically because the MAC is now a function call. Inlining that function results in HLS ignoring the allocation directive. In short, I'm not sure whether there's a good way to limit the number of MACs while keeping each multiply and add fused in a single DSP48. Any suggestions?
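
For reference, here is a minimal sketch of the two attempts described above. It is not my actual code: the tap count, data types, the limit of 2, and the names `fir_op_limit`, `fir_func_limit`, and `mac` are all made up for illustration, and the pragma spellings assume Vivado HLS syntax (a regular C compiler just ignores them).

```c
#include <stdint.h>

#define NUM_TAPS 8  /* hypothetical tap count */

typedef int16_t data_t;
typedef int16_t coef_t;
typedef int32_t acc_t;

/* Attempt 1: limit the number of multiplier instances directly.
 * This is the variant where the multiply and the add end up as
 * separate operations instead of a fused MAC. */
acc_t fir_op_limit(data_t x, const coef_t coef[NUM_TAPS])
{
#pragma HLS ALLOCATION instances=mul limit=2 operation
    static data_t shift_reg[NUM_TAPS]; /* delay line, starts at zero */
    acc_t acc = 0;
    int i;

    /* Shift the delay line and insert the new sample. */
    for (i = NUM_TAPS - 1; i > 0; i--)
        shift_reg[i] = shift_reg[i - 1];
    shift_reg[0] = x;

    /* Multiply-accumulate over all taps. */
    for (i = 0; i < NUM_TAPS; i++)
        acc += (acc_t)shift_reg[i] * coef[i];

    return acc;
}

/* Attempt 2: wrap the MAC in its own function and limit the number
 * of instances of that function. Inlining is disabled because the
 * limit is not honored once the function is inlined. */
static acc_t mac(acc_t acc, data_t d, coef_t c)
{
#pragma HLS INLINE off
    return acc + (acc_t)d * c;
}

acc_t fir_func_limit(data_t x, const coef_t coef[NUM_TAPS])
{
#pragma HLS ALLOCATION instances=mac limit=2 function
    static data_t shift_reg[NUM_TAPS]; /* delay line, starts at zero */
    acc_t acc = 0;
    int i;

    for (i = NUM_TAPS - 1; i > 0; i--)
        shift_reg[i] = shift_reg[i - 1];
    shift_reg[0] = x;

    for (i = 0; i < NUM_TAPS; i++)
        acc = mac(acc, shift_reg[i], coef[i]);

    return acc;
}
```

Both functions compute the same result in C simulation; the difference I'm asking about is only in how HLS binds the operations to DSP48s and how the schedule turns out.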