Blender Git Loki
February 23, 2021, 23:53 (GMT)
"Merge" changes from D10394. |
February 23, 2021, 13:57 (GMT)
VSE: Automatic proxy building

Add a `Proxy Setup` enum to the user preferences with 3 choices: Manual, For Added Strips and Automatic. With `For Added Strips`, proxies are built only when movie strips are added. With `Automatic`, proxies are also built when the preview size changes (a small sketch of the three modes follows at the end of this entry).

TODO:
- Decide what to do when a workspace has multiple previews with different preview sizes, see `seq_get_preview_size()`. A solution may be to change the current design to allow only one size, or to simply build multiple sizes.

Additional possible improvements:
- Cancel the running job if the preview size is changed while proxies are being built.
- Use a proxy of a different size while proxies for the new size are being built. This would be best done after some refactoring, so it belongs in a separate patch.

ref T85469

Maniphest Tasks: T85469

Differential Revision: https://developer.blender.org/D10363
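As a rough illustration of the three modes, here is a minimal sketch of the decision they imply; the enum values and the helper function are hypothetical names chosen for this sketch, not the actual identifiers from the patch.

```
#include <stdbool.h>

/* Hypothetical sketch of the three `Proxy Setup` preference modes and when a
 * proxy build job would be started; names are illustrative, not the actual
 * Blender identifiers. */
typedef enum eProxySetup {
  PROXY_SETUP_MANUAL = 0,       /* Never build proxies automatically. */
  PROXY_SETUP_ON_STRIP_ADD = 1, /* Build only when a movie strip is added. */
  PROXY_SETUP_AUTOMATIC = 2,    /* Also rebuild when the preview size changes. */
} eProxySetup;

static bool proxy_should_build(eProxySetup setup, bool strip_added, bool preview_size_changed)
{
  switch (setup) {
    case PROXY_SETUP_ON_STRIP_ADD:
      return strip_added;
    case PROXY_SETUP_AUTOMATIC:
      return strip_added || preview_size_changed;
    case PROXY_SETUP_MANUAL:
    default:
      return false;
  }
}
```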
February 23, 2021, 13:54 (GMT)
VSE: Simplify proxy settings

- Remove Full Render size from the VSE preview size options; use just 100% instead.
- Add a Use Proxies checkbox to control whether proxies are used globally.
- Move preview size to the top so it is most prominent.
- Set the default to 100% preview size with proxies enabled (could be a separate patch as well).

Design task: T85469

{F9735445}

No change has been made to the individual strip settings, as users may need to turn proxies on/off per strip (a tiny sketch of how the global and per-strip toggles interact follows at the end of this entry). I think it would be best if size selection is managed once automatic proxy building is enabled; in that case the proxy panel can be simplified a lot. This is probably better left for a separate patch.

Maniphest Tasks: T85469

Differential Revision: https://developer.blender.org/D10362
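A minimal sketch of the interaction between the new global toggle and the existing per-strip setting, assuming simple AND semantics; the struct, field and function names are assumptions for illustration only.

```
#include <stdbool.h>

/* Hypothetical sketch: the global "Use Proxies" checkbox gates per-strip
 * proxy usage; names are illustrative, not the actual Blender identifiers. */
typedef struct ProxyDisplaySettings {
  bool use_proxies;  /* Global checkbox added by this patch. */
  int preview_size;  /* Percentage; the default is now 100 instead of "Full Render". */
} ProxyDisplaySettings;

static bool strip_uses_proxy(const ProxyDisplaySettings *settings, bool strip_proxy_enabled)
{
  /* A strip only uses its proxy when both the global toggle and the
   * per-strip setting are enabled. */
  return settings->use_proxies && strip_proxy_enabled;
}
```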
February 23, 2021, 13:54 (GMT)
Improve proxy building performance

====Principle of operation====

The proxy rebuild job spawns 2 threads that are responsible for reading packets from the source file and writing transcoded packets to the output file. This is done by the functions `index_ffmpeg_read_packets()` and `index_ffmpeg_write_frames()`. These threads work rather quickly and don't use much CPU.

Transcoding of the read packets is done in a thread pool by the function `index_ffmpeg_transcode_packets()`. This scheme is used because transcoded packets must be read and written in order, as if they were transcoded in one single loop, while transcoding itself can happen relatively (see next paragraph) asynchronously.

Because decoding must always start on an I-frame, each GOP is fed to a transcoding thread as a whole. Some files may not have enough GOPs to feed all threads; in that case the performance gain won't be as great, but this is a relatively rare case.

According to the FFmpeg docs some packets may contain multiple frames, but in such a case only the first frame is decoded. I am not sure whether this is a limitation of FFmpeg or whether it is possible to decode these frames, but the previous proxy building implementation didn't handle this case either.

Similarly, there is an assumption that decoding any number of packets in a GOP chunk will produce the same number of output packets. This must always be true, otherwise we couldn't map proxy frames to the original perfectly. Therefore it should be possible to increment the input and output packet containers independently, and one of them can be manipulated "blindly". For example, decoded frames sometimes lag behind packets by 1, 2 or more steps, and sometimes they are output immediately; it depends on the codec. But the number of packets fed to the decoder must match the number of frames received.

Transcoding contexts are allocated only when the building process starts, because these contexts use a lot of RAM. `avcodec_copy_context()` is used to allocate the input and output codec contexts, which have to be unique for each thread. The SwsContext also needs to be unique, but it is not copied because it is only needed for transcoding.

====Job coordination====

If the output file cannot be written to disk fast enough, transcoded packets will accumulate in RAM, potentially filling it up completely. This isn't much of a problem on an SSD, but on an HDD it can easily happen. Therefore packets are read in sync with packets being written, plus a lookahead. When building all 4 sizes for a 1080p movie, the writing speed averages at 80 MB/s.

During operation, packets are read in advance. The lookahead is the number of GOPs to read ahead; it is needed because all transcoding threads must have packets to decode and each thread works on a whole GOP chunk.

Jobs are suspended when needed using thread conditions and wake signals. Threads suspend on their own and are resumed in a ring scheme:

```
read_packets -> transcode -> write_packets
     ^                            |
     |____________________________|
```

In addition, when any of the threads above is done or cancelled, it will resume the building job so it can free data and finish the building process.

====Performance====

On my machine (16 cores) the building process is about 9x faster. Before I introduced job coordination, transcoding was 14x faster, so there is still some room for optimization; perhaps the wakeup frequency is too high or threads are put to sleep unnecessarily.

---------

====Code layout====

I am using `FFmpegIndexBuilderContext` as the "root" context for storing contexts.
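To make this more concrete, here is a minimal sketch of what such a root context and the reader/writer throttling described under Job coordination could look like, using plain pthreads rather than Blender's own threading utilities; the fields and helper functions are assumptions made for illustration, not the actual code from the patch.

```
#include <stdbool.h>
#include <pthread.h>
#include <libavcodec/avcodec.h>
#include <libswscale/swscale.h>

/* Hypothetical sketch of the "root" builder context; only the struct name is
 * taken from the description above, the fields are illustrative assumptions. */
typedef struct FFmpegIndexBuilderContext {
  /* One decoder/encoder context pair per transcoding thread, each copied with
   * avcodec_copy_context() so threads can work independently. */
  AVCodecContext **thread_dec_ctx;
  AVCodecContext **thread_enc_ctx;
  struct SwsContext **thread_sws_ctx; /* Unique per thread, but not copied. */
  int num_transcode_threads;

  /* GOP chunks read from the source so far vs. GOP chunks already written to
   * the proxy file. The reader may only run `lookahead` GOPs ahead of the
   * writer, so transcoded packets cannot pile up in RAM on a slow disk. */
  int gops_read;
  int gops_written;
  int lookahead;

  bool cancelled;

  /* Ring wakeup: read_packets -> transcode -> write_packets -> read_packets. */
  pthread_mutex_t mutex;
  pthread_cond_t reader_cond;
  pthread_cond_t transcode_cond;
  pthread_cond_t writer_cond;
} FFmpegIndexBuilderContext;

/* Reader side: block until the writer has caught up to within the lookahead
 * window (or the job is cancelled), then allow reading the next GOP. */
static bool reader_wait_for_writer(FFmpegIndexBuilderContext *ctx)
{
  pthread_mutex_lock(&ctx->mutex);
  while (!ctx->cancelled && (ctx->gops_read - ctx->gops_written) >= ctx->lookahead) {
    pthread_cond_wait(&ctx->reader_cond, &ctx->mutex);
  }
  const bool keep_going = !ctx->cancelled;
  pthread_mutex_unlock(&ctx->mutex);
  return keep_going;
}

/* Writer side: after a GOP has been flushed to disk, advance the counter and
 * wake the reader so it can read further ahead. */
static void writer_finished_gop(FFmpegIndexBuilderContext *ctx)
{
  pthread_mutex_lock(&ctx->mutex);
  ctx->gops_written++;
  pthread_cond_signal(&ctx->reader_cond);
  pthread_mutex_unlock(&ctx->mutex);
}
```

The counters-plus-condition-variable arrangement is one straightforward way to realise the "read in sync with writing, plus lookahead" behaviour described above.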
The transcode job is wrapped in `TranscodeJob` because I need to pass the thread number, which determines which GOP chunks this job will work on. `output_packet_wrap` and `source_packet_wrap` wrap `AVPacket` with some additional information such as the GOP chunk number (currently `i_frame_segment`). These 2 structs could be consolidated, which would simplify some auxiliary logic (a rough sketch of such a consolidated wrapper follows at the end of this entry). This is a bit of a tricky part, because sometimes `output_packet_wrap` must lag one step behind `source_packet_wrap`, and this needs to be managed properly when jumping between GOP chunks.

Other than that, I am not super happy with the amount of code and mere setup this patch adds, but it doesn't look like anything could be simplified significantly.

====Problems / TODO====

I am not aware of any bugs currently.

Maniphest Tasks: T85469

Differential Revision: https://developer.blender.org/D10394
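Below is a hypothetical sketch of the job wrapper and of a consolidated packet wrapper along the lines suggested above; the fields shown are assumptions for illustration, not the actual structs from the patch.

```
#include <libavcodec/avcodec.h>

/* Wraps an AVPacket together with the GOP chunk it belongs to
 * (`i_frame_segment`), so source and output packets can be advanced
 * independently while still mapping 1:1 between original and proxy frames.
 * This consolidates the hypothetical equivalents of `source_packet_wrap` and
 * `output_packet_wrap` into one struct, as suggested above. */
typedef struct PacketWrap {
  AVPacket *packet;
  int i_frame_segment;     /* GOP chunk this packet belongs to. */
  struct PacketWrap *next; /* Packets of one chunk form a simple list. */
} PacketWrap;

/* Passed to each transcoding thread; the thread index determines which GOP
 * chunks this job picks up (for example chunks where
 * chunk_index % num_transcode_threads == thread_index). */
typedef struct TranscodeJob {
  struct FFmpegIndexBuilderContext *builder; /* Root context, see sketch above. */
  int thread_index;
} TranscodeJob;
```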