Higher-Order Motion Model for Temporal Frame Interpolation with Applications to Video Coding

D. Ruefenacht, R. Mathew, and D. Taubman

Abstract

We have recently proposed a motion-centric temporal frame interpolation (TFI) method, called BAM-TFI, which is able to produce high-quality interpolated frames under a constant velocity assumption. However, for objects that do not follow constant velocity motion, the predictions, although credible, will differ from the “true” target frames, leading to high prediction residuals. In this paper, we show how higher-order motion models can be incorporated into the BAM-TFI scheme to interpolate frames that better predict the target frames. This opens the door to a seamless integration of TFI with a video coding scheme. Comparisons on a variety of both synthetic and natural video sequences highlight the benefits of a second-order motion model. We further integrate the proposed TFI scheme into HEVC; preliminary comparisons with HEVC show promising results.
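To illustrate why a constant velocity assumption can yield credible but inaccurate predictions, the following minimal sketch compares a first-order and a second-order trajectory for a single point. It is not the BAM-TFI formulation: the function names, the three-frame finite-difference fit, and the example trajectory are assumptions made purely for illustration. Time is measured in reference-frame intervals, so the 7 interpolated frames are assumed to lie at t = k/8 for k = 1, ..., 7.

```python
def first_order(p0, p1, t):
    """Constant-velocity position at time t in [0, 1] between two reference frames."""
    return p0 + (p1 - p0) * t


def second_order(p_prev, p0, p1, t):
    """Constant-acceleration position at time t, fitted to positions at t = -1, 0, 1.

    Hypothetical helper: velocity and acceleration come from central finite
    differences, so the curve passes through all three reference positions.
    """
    v = (p1 - p_prev) / 2.0          # velocity at t = 0
    a = p1 - 2.0 * p0 + p_prev       # acceleration
    return p0 + v * t + 0.5 * a * t * t


def true_position(t):
    """Assumed ground-truth trajectory with constant acceleration (illustrative only)."""
    return 8.0 * t + 4.0 * t * t


if __name__ == "__main__":
    # Positions of the point in three consecutive reference frames.
    p_prev, p0, p1 = true_position(-1.0), true_position(0.0), true_position(1.0)
    for k in range(1, 8):            # 7 interpolated frames between t = 0 and t = 1
        t = k / 8.0
        err1 = first_order(p0, p1, t) - true_position(t)
        err2 = second_order(p_prev, p0, p1, t) - true_position(t)
        print(f"t={t:.3f}  first-order error={err1:+.3f}  second-order error={err2:+.3f}")
```

For this assumed trajectory the first-order prediction is off by one pixel at the midpoint (6.0 instead of 5.0), while the second-order fit reproduces the trajectory. A displacement of this kind still gives a plausible frame, but it produces a prediction residual.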

Supplementary Material

All videos first show the original (subsampled) sequence, played at 24 frames per second, with frame repetition. We then show the output of the proposed BAM-TFI method (first order), where 7 frames are interpolated in between each pair of reference frames, without residual coding. As can be seen, the interpolated frames are highly credible in both videos.

The videos were compressed using the following ffmpeg command:

ffmpeg -framerate 24 -i input_files%03d.png -s:v 'width'x'height' -c:v libx264 \
-profile:v high -crf 15 -pix_fmt yuv420p output_file.mp4
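For scripted use, the same command can be assembled as an argument list, which avoids shell quoting issues when the width and height placeholders are substituted. The sketch below uses only the options shown above. The helper names and the 1280x720 resolution are hypothetical placeholders, not values taken from the sequences.

```python
import subprocess


def build_ffmpeg_command(input_pattern, width, height, output_file,
                         framerate=24, crf=15):
    """Return the ffmpeg argument list for the encoding settings listed above.

    Hypothetical helper: it only rearranges the options from the command line
    above; width and height replace the 'width'x'height' placeholders.
    """
    return [
        "ffmpeg",
        "-framerate", str(framerate),
        "-i", input_pattern,
        "-s:v", f"{width}x{height}",
        "-c:v", "libx264",
        "-profile:v", "high",
        "-crf", str(crf),
        "-pix_fmt", "yuv420p",
        output_file,
    ]


def encode(cmd):
    """Run the command; requires ffmpeg on PATH and raises if ffmpeg fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # 1280x720 is an assumed example resolution, not taken from the sequences.
    cmd = build_ffmpeg_command("input_files%03d.png", 1280, 720, "output_file.mp4")
    print(" ".join(cmd))
    # encode(cmd)  # uncomment to run the actual encode
```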

Surfer Jump

This sequence contains highly complex motion in the lower left part of the frame (splashing waves). The optical flow cannot handle this motion complexity and in turn predicts the “true” positions of the splashes inaccurately; the interpolated frames are nonetheless highly credible.

Shields

The rate-distortion performance for the “Shields” sequence is much lower than that of HEVC because the frames are interpolated with slight shifts, leading to high texture residuals that are very expensive to code. As can be seen in the video, BAM-TFI creates a very smooth zoom, free of visual artefacts, that is hardly distinguishable from the original sequence, even without any residual coding.

References

[1] D. Ruefenacht, R. Mathew, and D. Taubman, “Higher-Order Motion Model for Temporal Frame Interpolation with Applications to Video Coding,” accepted for publication at Picture Coding Symposium (PCS), 2016.
[2] D. Ruefenacht, R. Mathew, and D. Taubman, “Temporally Consistent High Frame-Rate Upsampling with Motion Sparsification,” IEEE International Workshop on Multimedia Signal Processing (MMSP), 2016.