For your first two questions:
1. How to deal with the front and end pixel, when doing a 1-D x-filter(1,k) with FIR filter?
One approach is to assume there is a "mirror" on each edge of the image and create pixels as necessary to fill the filter pipeline. For example, if you need 3 extra pixels at the beginning and end of a 10-pixel row, where your original data is p1, p2, p3, ... p9, p10, then you would run the filter on:
p3, p2, p1, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p10, p9, p8
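As a rough software model of the mirroring above (not HDL, just a sketch to check the indexing), something like the following would do. The names `mirror_pad` and `fir_1d` are made up for illustration, and the sketch assumes an odd number of taps so that the pad is the same on both sides.

```python
def mirror_pad(row, pad):
    """Extend a row by reflecting `pad` pixels across each edge.

    The edge pixel itself is repeated, matching the p3, p2, p1, p1, p2, ...
    pattern described above.
    """
    if pad > len(row):
        raise ValueError("pad cannot exceed the row length")
    left = row[:pad][::-1]
    # row[-0:] would return the whole row, so handle pad == 0 explicitly.
    right = row[-pad:][::-1] if pad else []
    return left + row + right


def fir_1d(row, taps):
    """Run an odd-length FIR over one row, producing one output per input pixel.

    Assumption: taps are applied in the order given (no flipping), which makes
    no difference for the symmetric kernels typical in image filtering.
    """
    if len(taps) % 2 == 0:
        raise ValueError("this sketch assumes an odd number of taps")
    pad = len(taps) // 2
    padded = mirror_pad(row, pad)
    return [
        sum(t * padded[i + j] for j, t in enumerate(taps))
        for i in range(len(row))
    ]
```

With a 7-tap filter the pad is 3, so `mirror_pad` produces exactly the 16-pixel sequence shown above for a 10-pixel row, and `fir_1d` returns 10 outputs.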
2. How to apply a 1-D y-filter(k,1) to the input (m,n) image or a time serialized (1, m*n) image?
While X filtering is simple in a rasterized image, Y filtering generally requires line buffers: memories that each hold one full row, so that the k vertically adjacent pixels of a column are available at the same time. It also means you'll need to use multiplier-based filters rather than distributed arithmetic, because the filter is working on a different column on each successive clock cycle. The only other approach is to have a filter per column, which can require a lot of logic if the image size is large.
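To make the line-buffer idea concrete, here is a small behavioral model in Python (again a sketch, not synthesizable code). It takes the image as a raster-order stream, keeps k-1 line buffers, and on every pixel forms a vertical window at the current column. The name `y_filter_stream` is hypothetical, and the sketch does no top/bottom edge handling: it only emits output once k-1 full rows have been buffered, so an m-row image gives m-k+1 output rows.

```python
def y_filter_stream(pixels, width, taps):
    """Vertical (k,1) FIR over a raster-order pixel stream using line buffers.

    pixels: flat sequence of m*width pixels, row by row
    width:  number of pixels per row (n)
    taps:   k coefficients, taps[0] applied to the oldest (topmost) row
    """
    k = len(taps)
    # k-1 line buffers, each one row wide; lines[0] holds the oldest row.
    lines = [[0] * width for _ in range(k - 1)]
    out = []
    for idx, px in enumerate(pixels):
        y, x = divmod(idx, width)
        # Vertical window at column x: buffered rows first, new pixel last.
        window = [buf[x] for buf in lines] + [px]
        # Only valid once k-1 earlier rows have actually been stored.
        if y >= k - 1:
            out.append(sum(t * w for t, w in zip(taps, window)))
        # Shift column x through the chain of line buffers.
        for i in range(k - 2):
            lines[i][x] = lines[i + 1][x]
        if k > 1:
            lines[k - 2][x] = px
    return out
```

Note that one multiply-accumulate structure serves every column, but the data it sees changes column on every pixel, which is the reason given above for using multipliers here. Mirroring at the top and bottom edges can be added in the same way as for the rows.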
-- Gabor