There are three versions of the algorithm, selected by the speed setting of rav1e. Each version is described in detail below.
- Fast version - pixel-based version with an improved threshold.
  - Corresponds to speed level 10 of rav1e
  - Performs downsampling
- Medium version - based on motion vectors, with an improved threshold.
  - Corresponds to speed levels 7-9 of rav1e
- Slow version - histogram metric with a block-based approach.
  - Corresponds to speed levels 0-6 of rav1e
| Version | F score on BBC Planet Earth | F score on open source videos |
|---|---|---|
| New fast version | 0.7441 | 0.6652 |
| Old fast version | 0.6543 | 0.5951 |
| New medium version | 0.7802 | 0.7032 |
| New slow version | 0.9217 | 0.7504 |
| Old slow version | 0.7024 | 0.5628 |
So the F score of the fast version improved by 0.0898 on BBC Planet Earth and by 0.0701 on the open source videos.
The F score of the slow version improved by 0.2193 on BBC Planet Earth and by 0.1876 on the open source videos.
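For reference, an F score of this kind is commonly the harmonic mean of precision and recall over the detected scene changes. The sketch below assumes that definition (F1) and exact frame-index matching; the evaluation behind the table above may differ, for example by allowing a tolerance window around each cut. The name `f_score` is illustrative only.

```python
def f_score(detected, ground_truth):
    """F1 score for two collections of frame indices marked as scene changes.

    Assumption: a detection counts only if its frame index matches a
    ground-truth index exactly.
    """
    detected, ground_truth = set(detected), set(ground_truth)
    true_positives = len(detected & ground_truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(detected)
    recall = true_positives / len(ground_truth)
    return 2 * precision * recall / (precision + recall)


# Two of three detections are correct, two of four real cuts are found.
score = f_score(detected=[10, 50, 90], ground_truth=[10, 50, 70, 120])
```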
- The fast version is a simple calculation of the pixel-wise difference. For each pair of corresponding pixels, the difference of their values is taken and summed up. The final dissimilarity metric is the average of these differences over all pixels. I improved the old version by adjusting the threshold and by modifying the metric itself with a numerical derivative.
- The medium version is an improved version of the old slow version, with an adaptive threshold. To build the dissimilarity metric, motion vectors between two consecutive frames are computed. Frames are divided into blocks, and each block of the second frame is shifted by its motion vector. The dissimilarity metric is the average difference over all blocks. I improved the old version by adjusting the threshold and by modifying the metric itself with a numerical derivative.
- The slow version uses a block-based histogram metric. Frames are divided into non-overlapping blocks. A histogram is computed for each block, and its mean value is compared with the value of the corresponding block in the other frame. The dissimilarity metric is the average difference over all blocks.
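The pixel-wise and block-based metrics described above can be sketched as follows. This is a minimal illustration, not the rav1e implementation: frames are assumed to be equally sized 2D lists of luma values, differences are taken as absolute values, and the mean of a block's histogram is taken to be the block's mean pixel value. All function names and the default `block_size` are hypothetical.

```python
def pixel_difference(frame_a, frame_b):
    """Average absolute per-pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def block_means(frame, block_size):
    """Mean pixel value of every non-overlapping block, in raster order."""
    height, width = len(frame), len(frame[0])
    means = []
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            pixels = [
                frame[y][x]
                for y in range(top, min(top + block_size, height))
                for x in range(left, min(left + block_size, width))
            ]
            means.append(sum(pixels) / len(pixels))
    return means


def block_difference(frame_a, frame_b, block_size=2):
    """Average absolute difference between the means of corresponding blocks."""
    means_a = block_means(frame_a, block_size)
    means_b = block_means(frame_b, block_size)
    return sum(abs(a - b) for a, b in zip(means_a, means_b)) / len(means_a)


# Tiny 2x4 frames: the left half changes by 10, the right half is unchanged.
previous = [[0, 0, 0, 0], [0, 0, 0, 0]]
current = [[10, 10, 0, 0], [10, 10, 0, 0]]
fast_metric = pixel_difference(previous, current)
slow_metric = block_difference(previous, current, block_size=2)
```

In both cases a frame would be marked as a scene change when the metric exceeds the threshold; the motion-compensated medium version is omitted here because it needs a motion search.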
Detailed analysis can be seen here:
|Version||Average FPS on BBC Planet Earth (360x288)||Average FPS on open source videos (1280x720)|
Here I want to show how I improved the metric values in all versions compared with the old ones.
The blue line represents the values of the algorithm's metric on each frame, the orange line is the threshold, and the gray lines show where the algorithm marked a frame as a scene change.
Here is an example of the outcome on one of the videos:
The top picture shows the original metric values, the bottom one shows the metric after improvement.
You can see that the peaks with the scene changes became more distinct, so the threshold is easier to tune.
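One way to see why a numerical derivative makes the peaks more distinct: it removes the slowly varying baseline of the metric, so only abrupt jumps remain large. The sketch below assumes a simple backward difference between consecutive metric values; the exact derivative used in the implementation is not specified here, and the function names and sample values are made up for illustration.

```python
def numerical_derivative(values):
    """Backward difference values[i] - values[i - 1]; the first frame gets 0.0."""
    if not values:
        return []
    return [0.0] + [float(cur - prev) for prev, cur in zip(values, values[1:])]


def frames_above(values, threshold):
    """Indices of frames whose value exceeds the threshold."""
    return [i for i, value in enumerate(values) if value > threshold]


# Hypothetical metric values: a high baseline with one real jump at frame 3.
raw_metric = [30, 31, 33, 60, 34, 35]
sharpened = numerical_derivative(raw_metric)
cuts = frames_above(sharpened, threshold=10)
```

On the raw values every frame exceeds a threshold of 10, while after differencing only the jump at frame 3 does.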
Here is a list of ideas that I implemented but that turned out to be impractical:
- Slow version with motion vectors
  - Each block is shifted by its motion vector. This slowed the algorithm down even more and decreased the F score.
- Combining the medium version with the slow one
  - The idea was to mark a frame as a scene change if either version said so. Again, it slowed the algorithm down and did not bring any gain in F score.
- Separate metric for flashes
  - I implemented a few metrics for flash detection and deployed them. But flashes often occur several frames in a row and can contain a scene change. Because of this, it is difficult for the algorithm to decide whether a run of flashes contains a scene change or not.
- Possible dependency of the threshold on metric values (maximum value over past frames, mean and standard deviation). The current threshold can perform worse than a static threshold in some cases. An example is shown in the picture: the threshold varies around the same value. If it took into account the mean value of past frames, for example, it would be more accurate:
- Version based on edge detection
  - It can be useful to take into account another feature of frames: object edges. Combined with the existing versions, it could boost the F score.
- Adjusting metric values according to nearby values, for example by subtracting the mean value of the surrounding frames.
- Block-based metric improvement
  - It may be useful to experiment with the blocks individually rather than just taking the mean value of all of them. For example, if the difference for k blocks is near zero, the algorithm shouldn't mark the frame as a scene change no matter what the other blocks show. Or, if k blocks have a difference near the maximum possible value, the algorithm should mark the frame as a scene change no matter what the other blocks show.
- Downsampling for the medium and slow versions:
  - For videos with high resolution, downsampling them to HD or so may be worth considering. This would significantly increase the speed while having only a small impact on the F score.
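Two of the ideas above, a threshold that depends on past metric values and a per-block decision rule, could look roughly like this. Everything here is an assumption made for illustration: the `mean + k * std` form, the parameter names and defaults, and the choice to check near-zero blocks before near-maximum ones are not taken from any existing implementation.

```python
from statistics import mean, pstdev


def adaptive_threshold(past_values, base_threshold, k=3.0):
    """Mean + k * std of past metric values, never below base_threshold."""
    if len(past_values) < 2:
        return base_threshold
    return max(base_threshold, mean(past_values) + k * pstdev(past_values))


def block_vote(block_diffs, low, high, k):
    """Decide from individual block differences instead of their mean.

    Returns False if at least k blocks are nearly unchanged, True if at
    least k blocks are near the maximum difference, and None when neither
    rule fires (the caller would fall back to the averaged metric).
    """
    unchanged = sum(1 for diff in block_diffs if diff <= low)
    if unchanged >= k:
        return False
    saturated = sum(1 for diff in block_diffs if diff >= high)
    if saturated >= k:
        return True
    return None


threshold = adaptive_threshold([8, 12], base_threshold=5.0)
static_scene = block_vote([0, 0, 1, 200], low=1, high=250, k=3)
hard_cut = block_vote([255, 255, 250, 3], low=1, high=250, k=3)
undecided = block_vote([0, 100, 255, 50], low=1, high=250, k=2)
```

Returning `None` for the undecided case keeps the rule as an override on top of the existing averaged metric rather than a replacement for it.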