DEV Community

Aleksandr Gushchin


New Scene Change Detector version

There are three versions of the algorithm, selected according to the speed setting of rav1e. A detailed description of each version is given below.

  • Fast version: pixel-based, with an improved threshold.
    • Corresponds to rav1e speed level 10
    • Downsamples frames before comparing them
  • Medium version: based on motion vectors, with an improved threshold.
    • Corresponds to rav1e speed levels 7-9
  • Slow version: histogram metric with a block-based approach.
    • Corresponds to rav1e speed levels 0-6

Results

| Version | F score on BBC Planet Earth | F score on open-source videos |
|---|---|---|
| New fast version | 0.7441 | 0.6652 |
| Old fast version | 0.6543 | 0.5951 |
| New medium version | 0.7802 | 0.7032 |
| New slow version | 0.9217 | 0.7504 |
| Old slow version | 0.7024 | 0.5628 |

So the F score of the fast version improved by 0.0898 on BBC Planet Earth and by 0.0701 on open-source videos.
The F score of the slow version improved by 0.2193 on BBC Planet Earth and by 0.1876 on open-source videos.
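The post does not say which F measure it reports. Assuming it is the usual F1 score (the harmonic mean of precision and recall), the sketch below shows how it is computed from detection counts and recomputes the improvements above from the table. `f1_score`, `RESULTS` and `improvements` are illustrative names, not part of rav1e.

```python
# Sketch: F score assumed to be F1, the harmonic mean of precision and recall
# (an assumption; the post does not define which F measure it uses).

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from counts of true positives, false positives and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# F scores copied from the results table above: (old, new) per dataset.
RESULTS = {
    "fast": {"BBC Planet Earth": (0.6543, 0.7441), "open-source": (0.5951, 0.6652)},
    "slow": {"BBC Planet Earth": (0.7024, 0.9217), "open-source": (0.5628, 0.7504)},
}

def improvements(results):
    """Return {(version, dataset): new - old}, rounded to 4 decimals."""
    return {
        (version, dataset): round(new - old, 4)
        for version, per_dataset in results.items()
        for dataset, (old, new) in per_dataset.items()
    }

if __name__ == "__main__":
    for key, delta in improvements(RESULTS).items():
        print(key, delta)
```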

Description of each version

  • The fast version is a simple pixel-wise difference. For each pair of corresponding pixels, the difference of their values is taken and summed up; the final dissimilarity metric is this sum averaged over all pixels. I improved the old version by adjusting the threshold and by modifying the metric itself: it now uses the numerical derivative of the metric values.
  • The medium version is an improved version of the old slow version, with an adaptive threshold. To build the dissimilarity metric, motion vectors between two consecutive frames are computed: the frames are divided into blocks, and each block of the second frame is shifted by its motion vector. The dissimilarity metric is the average difference over all blocks. As with the fast version, I improved it by adjusting the threshold and by using the numerical derivative of the metric.
  • The slow version is a block-based histogram metric. Frames are divided into non-overlapping blocks, and a histogram is computed for each block. The mean value of this histogram is then compared with that of the corresponding block in the other frame. The dissimilarity metric is the average difference over all blocks.
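To make the medium-version description concrete, here is a minimal sketch of a motion-compensated block difference. It is not rav1e's code. It assumes frames are 2-D lists of 8-bit luma values and finds each block's motion vector with a hypothetical exhaustive search that minimizes the sum of absolute differences (SAD) within a small window. `BLOCK` and `SEARCH` are illustrative values.

```python
# Sketch of a motion-compensated block difference in the spirit of the medium version.
# Assumptions (not from the post): frames are 2-D lists of luma values, motion vectors
# come from an exhaustive SAD search in a small window, BLOCK and SEARCH are hypothetical.

BLOCK = 8   # hypothetical block size in pixels
SEARCH = 4  # hypothetical search radius in pixels

def block_sad(prev, cur, by, bx, dy, dx, size):
    """SAD between the block of `cur` at (by, bx) and the block of `prev` shifted by (dy, dx)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
    return total

def best_match_sad(prev, cur, by, bx, size, radius):
    """Smallest SAD over all motion vectors within `radius` that keep the block inside the frame."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if 0 <= by + dy and by + dy + size <= h and 0 <= bx + dx and bx + dx + size <= w:
                sad = block_sad(prev, cur, by, bx, dy, dx, size)
                if best is None or sad < best:
                    best = sad
    return best  # never None: the zero vector is always valid

def motion_compensated_difference(prev, cur, size=BLOCK, radius=SEARCH):
    """Average per-pixel residual after shifting each block by its best motion vector."""
    h, w = len(cur), len(cur[0])
    residuals = []
    for by in range(0, h - size + 1, size):
        for bx in range(0, w - size + 1, size):
            residuals.append(best_match_sad(prev, cur, by, bx, size, radius) / (size * size))
    return sum(residuals) / len(residuals)
```

A camera pan produces a large raw pixel difference but a small residual after motion compensation. A cut leaves a large residual whatever vector is chosen, which is what the metric relies on.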

Results and examples of each version

Slow version

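In the charts, the slow version is labeled "with blocks"; "without blocks" is the same metric computed without dividing frames into blocks.
Results on the BBC Planet Earth dataset:
[Figure: two charts of slow-version results on the BBC Planet Earth dataset]
Results on open-source videos:
[Figure: two charts of slow-version results on open-source videos]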
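The "with blocks" and "without blocks" variants can be sketched as one function with a `block_size` parameter. The post does not define "the mean value of the histogram". This sketch assumes it is the mean intensity the histogram represents (bin centers weighted by counts). It also assumes 2-D lists of 8-bit luma values, a hypothetical `BINS` of 16 and the absolute difference between block means. `block_size=None` treats the whole frame as one block.

```python
# Sketch of a block-based histogram metric following the slow-version description.
# Assumptions (not from the post): 8-bit luma frames as 2-D lists, BINS bins per
# histogram, "histogram mean" = count-weighted mean of bin centers.

BINS = 16  # hypothetical number of histogram bins

def histogram(values, bins=BINS, max_value=256):
    counts = [0] * bins
    for v in values:
        counts[v * bins // max_value] += 1
    return counts

def histogram_mean(counts, max_value=256):
    """Mean intensity represented by the histogram, using bin centers."""
    width = max_value / len(counts)
    return sum((i + 0.5) * width * c for i, c in enumerate(counts)) / sum(counts)

def blocks(frame, size):
    """Yield the pixel values of each non-overlapping size x size block."""
    h, w = len(frame), len(frame[0])
    for by in range(0, h - size + 1, size):
        for bx in range(0, w - size + 1, size):
            yield [frame[y][x] for y in range(by, by + size) for x in range(bx, bx + size)]

def histogram_block_difference(prev, cur, block_size=16):
    """Average absolute difference of per-block histogram means ("without blocks" if None)."""
    if block_size is None:
        pairs = [([v for row in prev for v in row], [v for row in cur for v in row])]
    else:
        pairs = list(zip(blocks(prev, block_size), blocks(cur, block_size)))
    diffs = [abs(histogram_mean(histogram(a)) - histogram_mean(histogram(b))) for a, b in pairs]
    return sum(diffs) / len(diffs)
```

This example shows one reason blocks can help. Suppose the left half of a uniform frame brightens and the right half darkens by the same amount. The whole-frame mean does not change, so the "without blocks" value is 0.0, while the per-block value with `block_size=16` on a 32x32 frame is 64.0.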

Medium version

Here are charts of the algorithm's performance (F score, precision and recall) as a function of the threshold. These results were obtained on open-source videos from youtube.com and vimeo.com.
[Figure: F score, precision and recall versus threshold on open-source videos]

Here are the same results on the BBC Planet Earth dataset:
[Figure: F score, precision and recall versus threshold on the BBC Planet Earth dataset]
And the precision-recall curve:
[Figure: precision-recall curve]
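Charts like these come from sweeping the threshold and scoring the detections against annotated scene changes. The sketch below is one way to do that under simple assumptions. `scores` holds one metric value per frame and `truth` holds the indices of annotated cuts. A frame counts as detected when its score exceeds the threshold, and a detection counts only on an exact frame match; real evaluations often allow a small tolerance window. All names and data are hypothetical.

```python
# Sketch of a threshold sweep producing precision, recall and F per threshold.
# Assumptions: exact-frame matching, detection when score > threshold, F = F1.

def evaluate(scores, truth, threshold):
    detected = {i for i, s in enumerate(scores) if s > threshold}
    tp = len(detected & truth)
    fp = len(detected - truth)
    fn = len(truth - detected)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

def sweep(scores, truth, thresholds):
    """Rows of (threshold, precision, recall, F); the (recall, precision) pairs form a PR curve."""
    return [(t, *evaluate(scores, truth, t)) for t in thresholds]

if __name__ == "__main__":
    scores = [1.0, 1.2, 9.0, 1.1, 4.0, 1.0, 8.5, 1.3]  # hypothetical metric values
    truth = {2, 6}                                     # hypothetical annotated cuts
    for row in sweep(scores, truth, [0.5, 2.0, 5.0, 9.5]):
        print(row)
```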

Fast version

Here are the threshold experiments for the fast version. The bold line represents the old fast version, and the bottom line is the old slow version of the algorithm:
[Figure: threshold experiments for the fast version]
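For reference, a minimal sketch of a pixel-based fast detector with downsampling and a static threshold. It assumes frames are 2-D lists of luma values and that downsampling averages `FACTOR` x `FACTOR` pixel groups. It also assumes the per-pixel difference is absolute. `FACTOR` and `THRESHOLD` are placeholder values; the post tunes the threshold experimentally.

```python
# Sketch of a fast pixel-difference detector with downsampling.
# Assumptions (not from the post): box-average downsampling, absolute differences,
# hypothetical FACTOR and THRESHOLD values.

FACTOR = 2        # hypothetical downsampling factor
THRESHOLD = 30.0  # hypothetical static threshold

def downsample(frame, factor=FACTOR):
    """Average each factor x factor group of pixels (leftover rows/columns are dropped)."""
    h, w = len(frame) // factor, len(frame[0]) // factor
    return [
        [sum(frame[y * factor + dy][x * factor + dx]
             for dy in range(factor) for dx in range(factor)) / factor ** 2
         for x in range(w)]
        for y in range(h)
    ]

def mean_abs_difference(a, b):
    """Mean absolute difference over all corresponding pixels."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))

def is_scene_change(prev, cur, threshold=THRESHOLD, factor=FACTOR):
    return mean_abs_difference(downsample(prev, factor), downsample(cur, factor)) > threshold
```

Downsampling by 2 reduces the number of compared pixels by a factor of 4, which is where most of the speed comes from. Averaging also suppresses some pixel noise before the comparison.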

Detailed analysis can be seen here:

Speed

| Version | Average FPS on BBC Planet Earth (360x288) | Average FPS on open-source videos (1280x720) |
|---|---|---|
| Fast version | 234 | 22 |
| Medium version | 222 | 18 |
| Slow version | 156 | 13 |

Overall metric improvement

Here I want to show how the metric values of all versions improved compared with the old ones.
The blue line represents the algorithm's metric value on each frame, the orange line the threshold, and the gray lines mark the frames that the algorithm detected as scene changes.
Here is an example of the output on one of the videos:
[Figure: metric values, threshold and detected scene changes on one video, before and after the improvement]
The top chart shows the original metric values; the bottom one shows the metric after the improvement.
You can see that the peaks at scene changes became more distinct, so the threshold is easier to tune.
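One reason a numerical derivative can make peaks more distinct is that it cancels slow drifts in the metric, such as those caused by a camera pan, while keeping abrupt jumps. The sketch assumes the derivative is a backward difference, `d[i] = m[i] - m[i-1]`; the post does not specify the exact scheme. The metric values are made up for illustration.

```python
# Sketch of derivative-based post-processing of a per-frame metric.
# Assumption (not from the post): backward difference d[i] = m[i] - m[i-1], d[0] = 0.

def backward_difference(metric):
    if not metric:
        return []
    return [0.0] + [metric[i] - metric[i - 1] for i in range(1, len(metric))]

def detect(values, threshold):
    """Indices of frames whose value exceeds the threshold."""
    return [i for i, v in enumerate(values) if v > threshold]

if __name__ == "__main__":
    # Hypothetical metric: a steady upward drift (e.g. a slow pan) plus a cut at frame 5.
    metric = [10.0, 15.0, 20.0, 25.0, 30.0, 50.0, 40.0, 45.0, 50.0, 55.0]
    print(detect(metric, 45.0))                       # [5, 8, 9]: the drift also fires
    print(detect(backward_difference(metric), 10.0))  # [5]: only the cut
```

On the raw values, no single threshold isolates frame 5, because frame 9 drifts above it. On the derivative, the cut stands out clearly.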

Unsuccessful ideas

Here is a list of ideas that I implemented but that turned out to be impractical:

  • Slow version with motion vectors
    • Each block is shifted by its motion vector. This slowed the algorithm down even more and decreased the F score.
  • Combining the medium version with the slow one
    • The idea was to mark a frame as a scene change if either version detected one. Again, this slowed the algorithm down and brought no gain in F score.
  • Separate metric for flashes
    • I implemented and deployed a few metrics for flash detection. However, flashes often occur several frames in a row and may contain a scene change, so it is difficult for the algorithm to decide whether such a run of flashes contains a scene change or not.

Possible improvements:

  • Threshold
    • The threshold could depend on statistics of the metric values (the maximum value over past frames, their mean and standard deviation). The current threshold can perform worse than a static threshold in some cases, as in the example below: the threshold fluctuates around the same value. If it took into account, for example, the mean value of past frames, it would be more accurate. [Figure: example where the current threshold fluctuates around the same value]
  • Version based on edge detection
    • It could be useful to take into account another feature of frames: object edges. Combined with the existing versions, this could boost the F score.
  • Metric
    • Adjust metric values according to neighboring values, for example by subtracting the mean value of the surrounding frames.
  • Block-based metric improvement
    • It may be useful to consider blocks individually rather than just averaging over all of them. For example, if the difference is near zero for k blocks, the algorithm should not mark the frame as a scene change, regardless of the other blocks. Conversely, if k blocks have a difference close to the maximum possible value, the algorithm should mark the frame as a scene change, regardless of the other blocks.
  • Downsampling for the medium and slow versions
    • High-resolution videos could be downsampled to roughly HD resolution. This would significantly increase speed with only a small impact on the F score.
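The threshold and metric ideas in the list above (a threshold based on statistics of past frames, and subtracting the mean of surrounding frames) can be sketched as follows. This assumes a threshold of mean + K standard deviations of the last `WINDOW` values. `WINDOW`, `K` and `RADIUS` are hypothetical values, not taken from the post.

```python
# Sketch of two proposed improvements: a threshold from past-frame statistics
# and local-mean subtraction. WINDOW, K and RADIUS are hypothetical values.

from statistics import mean, pstdev

WINDOW = 5   # number of past frames used for the threshold
K = 3.0      # standard deviations above the mean
RADIUS = 2   # neighbors on each side for local-mean subtraction

def adaptive_threshold_detect(metric, window=WINDOW, k=K):
    """Indices i where metric[i] exceeds mean + k * std of the previous `window` values."""
    hits = []
    for i in range(window, len(metric)):
        past = metric[i - window:i]
        if metric[i] > mean(past) + k * pstdev(past):
            hits.append(i)
    return hits

def subtract_local_mean(metric, radius=RADIUS):
    """metric[i] minus the mean of up to `radius` neighbors on each side (i itself excluded)."""
    out = []
    for i, value in enumerate(metric):
        neighbors = metric[max(0, i - radius):i] + metric[i + 1:i + 1 + radius]
        out.append(value - mean(neighbors) if neighbors else value)
    return out
```

With a statistics-based threshold, the threshold rises in busy passages and falls in calm ones instead of fluctuating around a fixed value. Local-mean subtraction flattens slow trends so that isolated peaks stand out.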

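The per-block rule proposed in the list above can be sketched as a small decision function. It assumes per-block differences are normalized to [0, 1] and falls back to a threshold on the mean when neither rule fires. `K_BLOCKS`, `LOW`, `HIGH` and `FALLBACK` are hypothetical values.

```python
# Sketch of the proposed per-block decision rule.
# Assumptions (not from the post): block differences normalized to [0, 1];
# K_BLOCKS, LOW, HIGH and FALLBACK are hypothetical values.

K_BLOCKS = 3    # number of blocks that can decide on their own
LOW = 0.02      # "near zero" difference
HIGH = 0.9      # "close to the maximum" difference
FALLBACK = 0.3  # threshold on the mean when neither rule fires

def block_vote(block_diffs, k=K_BLOCKS, low=LOW, high=HIGH, fallback=FALLBACK):
    """Decide whether a frame is a scene change from its per-block differences."""
    near_zero = sum(1 for d in block_diffs if d <= low)
    near_max = sum(1 for d in block_diffs if d >= high)
    if near_zero >= k:
        return False  # k nearly unchanged blocks veto a scene change
    if near_max >= k:
        return True   # k almost completely changed blocks force a scene change
    return sum(block_diffs) / len(block_diffs) > fallback
```

The post does not say what should happen when both conditions hold. This sketch checks the veto first, which favors avoiding false positives, for example when a large object moves in front of an otherwise static background.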