Video scaling is the process of changing the resolution (size) of a video stream. You can do this in two directions: up-scaling and down-scaling (or zooming in and zooming out, if you prefer).
How do we do video scaling?
Let’s talk about the up-scaling operation. We can take as our example a video input with a resolution of 720p (1280x720). This means that each frame is made up of 720 rows of 1280 pixels, which means 921,600 pixels per frame.
If we have a video output with a resolution of 1080p (1920x1080), each output frame is made up of 1080 rows of 1920 pixels, or 2,073,600 pixels per frame.
This means that the output has 2.25 times as many pixels as the input.
So where are the new pixels coming from?
We need to create the new pixels by estimating their values and then adding them to the output frame.
Most algorithms scale one side, either vertically or horizontally, then the other.
For example, the scaler used in the Xilinx Video Processing IP (PG231) starts by scaling the frames vertically, then scales them horizontally.
Below is a quick example of what it would look like.
Imagine a 3x3 image (9 pixels) that you want to scale to 5x5 (25 pixels). This means that we need to recreate 16 pixels. Note that I am taking a really simplified case here.
If we start by doing vertical scaling, this means that we need to create 2 new lines in-between the existing lines, or 6 pixels (where I have added the crosses below).
The simplest algorithm would be to take the mean value of each component (Red, Green, Blue) of the pixels above and below the one we are adding.
For example, let's take the first pixel, which is between the red and the green pixels.
A red pixel means the following values for its components (if 8 bits/per component): Red = 255, Green = 0 and Blue = 0.
And for the green pixel: Red = 0, Green = 255, Blue = 0.
Thus, if we take the mean value for each component of the new pixel we would get: Red = 127, Green = 127, Blue = 0 (a kind of yellow-green color).
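This averaging step can be sketched in a few lines of Python. This is just an illustration of the arithmetic above, not the actual hardware implementation:

```python
# Averaging two RGB pixels component by component (8 bits per component).
def average_pixels(p1, p2):
    """Return the component-wise mean of two (R, G, B) pixels."""
    return tuple((a + b) // 2 for a, b in zip(p1, p2))

red = (255, 0, 0)
green = (0, 255, 0)

print(average_pixels(red, green))  # (127, 127, 0)
```

Note that the exact mean of 255 and 0 is 127.5; integer division truncates it to 127, which is why the new pixel's components are 127 rather than 128.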
If we do the same for all of the pixels this is what we would get after the upscaling:
I know what you are thinking. It does not look pretty.
But this is only a very simple algorithm, and it is not really common to find pictures with adjacent pixels so different from each other.
Now at least you understand the principle.
To get the 5x5 image you would need to repeat the process horizontally to create 10 new pixels (2 new columns of 5 pixels).
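The whole two-pass process (vertical, then horizontal, as in the description above) can be sketched in Python. This is a toy model of the principle, with the same simple midpoint-averaging algorithm, not the algorithm PG231 actually uses:

```python
def mean_pixel(a, b):
    """Component-wise mean of two (R, G, B) pixels."""
    return tuple((x + y) // 2 for x, y in zip(a, b))

def expand(seq, blend):
    """Insert blend(a, b) between each adjacent pair: 3 items become 5."""
    out = [seq[0]]
    for a, b in zip(seq, seq[1:]):
        out += [blend(a, b), b]
    return out

def upscale_3x3_to_5x5(image):
    """Scale a 3x3 list-of-rows image to 5x5: vertical pass, then horizontal."""
    # Vertical pass: each new row is the pixel-wise average of the rows
    # above and below it (5 rows x 3 columns after this step).
    mean_row = lambda r1, r2: [mean_pixel(p, q) for p, q in zip(r1, r2)]
    tall = expand(image, mean_row)
    # Horizontal pass: each new pixel is the average of its left and
    # right neighbours (5 rows x 5 columns after this step).
    return [expand(row, mean_pixel) for row in tall]
```

For example, feeding in three identical rows of red, green, and blue pixels produces a 5x5 image whose extra pixels are the blended colors described earlier.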
What if we need to recreate more pixels than just one between 2 pixels?
In this case you would need to estimate the position of each output pixel relative to the input pixels, then compute the interpolation coefficients from that position (the closer an output pixel is to an input pixel, the more it will resemble it).
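A simple way to express this is linear interpolation: map each output index back to a fractional position in the input line and weight the two nearest input pixels by their distance. Here is a sketch of one line of horizontal scaling in Python (again, an illustration of the idea, not the PG231 implementation, which supports more elaborate filters):

```python
def scale_line_linear(line, out_len):
    """Resample a line of (R, G, B) pixels to out_len samples using
    linear interpolation between the two nearest input pixels."""
    in_len = len(line)
    out = []
    for j in range(out_len):
        # Map the output index to a (possibly fractional) input position,
        # keeping the first and last pixels aligned at both ends.
        x = j * (in_len - 1) / (out_len - 1)
        i = int(x)
        frac = x - i  # distance from the input pixel on the left
        left = line[i]
        right = line[min(i + 1, in_len - 1)]
        # The closer the output pixel is to an input pixel, the larger
        # that input pixel's weight.
        out.append(tuple(round(l * (1 - frac) + r * frac)
                         for l, r in zip(left, right)))
    return out
```

With 5 input pixels resampled to 6 output pixels (a 1.2x scale), only the first and last outputs land exactly on input positions, so only those two are straight copies; all the in-between outputs are weighted blends.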
The image below shows horizontal scaling to 1.2x the input width. The circles are the input pixels and the crosses are the output pixels:
Note that only 2 pixels of the line get their value copied straight from the input, since the crosses and circles overlap only at the two ends.
The same principle applies for down-scaling.
Example of a scaler application with the VPSS (Scaler Only)
The attached example shows the Video Processing Subsystem configured as a Scaler only.