At the beginning of this week I started implementing changes to the scene change detector threshold. I lowered its values and also changed the behavior when the max_keyint and min_keyint options are used: previously the algorithm could choose non-optimal frames, whereas now it chooses the frames with the highest metric value. Before I finished implementing other threshold strategies, I found out that the algorithm had been updated.
After that I analyzed the new version of rav1e's scene change detector.
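A minimal sketch of keyframe selection under min_keyint/max_keyint may make the change easier to picture. It assumes one metric value per frame and a single fixed threshold. The function name `select_keyframes` and its exact rules are hypothetical illustrations, not rav1e's code: scene changes closer than min_keyint to the previous keyframe are ignored, and when nothing crosses the threshold before max_keyint, a keyframe is forced at the highest-metric frame in the allowed window instead of at exactly max_keyint.

```python
def select_keyframes(metrics, threshold, min_keyint, max_keyint):
    """Hypothetical keyframe selection under min/max keyframe-interval limits.

    Frame 0 is always a keyframe. After each keyframe `last`, only frames in
    [last + min_keyint, last + max_keyint] are eligible for the next one.
    """
    if min_keyint < 1 or max_keyint < min_keyint:
        raise ValueError("need 1 <= min_keyint <= max_keyint")
    keyframes = [0]
    last = 0
    n = len(metrics)
    while last + min_keyint < n:
        window = range(last + min_keyint, min(last + max_keyint, n - 1) + 1)
        above = [i for i in window if metrics[i] > threshold]
        if above:
            # A scene change inside the allowed window: take the first one.
            nxt = above[0]
        elif last + max_keyint < n:
            # No scene change before max_keyint: instead of forcing the keyframe
            # at exactly last + max_keyint, pick the highest-metric frame.
            nxt = max(window, key=lambda i: metrics[i])
        else:
            break  # the clip ends before another keyframe is required
        keyframes.append(nxt)
        last = nxt
    return keyframes
```

For example, `select_keyframes([0, 1, 2, 30, 1, 2, 5, 3, 1, 1, 1, 1], threshold=20, min_keyint=2, max_keyint=5)` returns `[0, 3, 6, 8]`: frame 3 is a real scene change, and the forced keyframe after it lands on frame 6 (metric 5) rather than on frame 8 (3 + max_keyint).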
- On 4K videos the fast version of the algorithm works better than the slow version. The x-axis shows the index of the video in the dataset and the y-axis shows the F score of the algorithm. The blue line is the fast version and the yellow line is the slow version. The fast version clearly shows much better results.
But on the BBC Planet Earth dataset the slow version shows better results.
As the table shows, the fast version has a recall similar to that of the slow version but a lower precision.
So I will try to improve precision by increasing the base threshold. The fast version also has no adaptive threshold, so I will implement one and experiment with it.
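One common way to make a threshold adaptive is to derive it from recent metric statistics. The sketch below assumes the rule `max(base, mean + k * stdev)` over the metric values seen since the last detected scene change; the function name, the `window` and `k` parameters, and the rule itself are hypothetical choices for illustration, not the fast version's actual design.

```python
import statistics
from collections import deque


def detect_with_adaptive_threshold(metrics, base, window=10, k=3.0):
    """Hypothetical adaptive threshold: max(base, mean + k * stdev) of the
    metric values seen since the last scene change (at most `window` of them)."""
    history = deque(maxlen=window)
    cuts = []
    for i, m in enumerate(metrics):
        recent = list(history)
        if len(recent) >= 2:
            adaptive = statistics.mean(recent) + k * statistics.stdev(recent)
            threshold = max(base, adaptive)
        else:
            threshold = base  # not enough history yet: fall back to the base value
        if m > threshold:
            cuts.append(i)
            history.clear()  # statistics of the old scene no longer apply
        else:
            history.append(m)
    return cuts
```

In a calm scene the base value dominates, while in a high-motion scene with a noisy metric the adaptive term raises the threshold above the base and suppresses false positives: with `base=20`, the sequence `[15, 19] * 4 + [22]` yields no cut, because the adaptive threshold at the last frame is about 23.4.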
Definitions of the F score, precision and recall can be found on Wikipedia. In short, the higher the precision, the fewer false positives; the higher the recall, the fewer scene changes the algorithm misses. The F score balances the two (the F1 score is their harmonic mean).
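As a small illustration, the three metrics can be computed from predicted and ground-truth scene-change frame indices. This sketch assumes a prediction counts as correct only when its frame index matches a ground-truth index exactly; the evaluation used for the plots may allow some tolerance, and `scene_change_scores` is a hypothetical name.

```python
def scene_change_scores(predicted, ground_truth):
    """Precision, recall and F1 for scene-change detection, counting a
    prediction as correct only when its frame index matches exactly."""
    pred, truth = set(predicted), set(ground_truth)
    true_positives = len(pred & truth)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```

For `scene_change_scores([10, 20, 30, 40], [10, 30, 50])`, two of four predictions are correct (precision 0.5) and two of three true scene changes are found (recall 2/3), giving F1 = 4/7 ≈ 0.571.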
An example of how low the base threshold of the fast version is:
The blue line is the metric value and the orange line is the threshold. The vertical grey lines mark scene changes: in the first picture they are the ground truth, in the second they are the scene changes predicted by the algorithm. As the second picture shows, there are a lot of false positives. If the threshold were around 20-24, the precision and F score would be much higher.
On the other hand, for the slow version of the algorithm the threshold is still too high, as these two pictures show.
The setup is similar to the pictures above, except for the algorithm version and the video used. It can be seen that with a lower threshold the algorithm would achieve a higher F score.
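Observations like these can be checked with a simple threshold sweep: apply each candidate fixed threshold to the per-frame metric and keep the one with the best F score. This is a minimal sketch assuming a plain "metric above threshold" detector with no keyframe-interval constraints; `f1_score` and `best_fixed_threshold` are hypothetical helper names.

```python
def f1_score(predicted, truth):
    """F1 with exact frame-index matching."""
    pred, true = set(predicted), set(truth)
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)


def best_fixed_threshold(metrics, truth, candidates):
    """Try each candidate threshold and return (threshold, F1) with the best F1."""
    best_thr, best_f1 = None, -1.0
    for thr in candidates:
        predicted = [i for i, m in enumerate(metrics) if m > thr]
        f1 = f1_score(predicted, truth)
        if f1 > best_f1:  # strict '>' keeps the first candidate on ties
            best_thr, best_f1 = thr, f1
    return best_thr, best_f1
```

For example, `best_fixed_threshold([3, 22, 4, 35, 5, 18, 2, 40], [3, 7], range(10, 41, 5))` returns `(25, 1.0)`. To avoid overfitting, the threshold chosen this way should be validated on videos that were not used for the sweep.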
Examples with other videos:
A similar problem is observed in the rest of the videos.
On average, the fast version is 1.3 times faster than the slow version.
The new version of the algorithm outperforms the old one by about 0.05-0.1 in F score. Based on the results of this analysis, it can be improved even further.