FFmpeg devs boast of up to 94x performance boost after implementing handwritten AVX-512 assembly code

Gork@lemm.ee · 6 months ago

Avid Amoeba · edit-2 6 months ago

This is the right way to optimize performance. Write everything in a decent higher-level language to get good maintainability. Then profile for hotspots, separate them into well-defined modules, and optimize the shit out of them, even if it takes inline assembly. The ugly stays in its own box and you don’t spend time optimizing stuff that doesn’t need optimization.
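A minimal sketch of the "profile for hotspots" step, using Python's standard `cProfile` and `pstats` modules. The functions `hot_loop`, `cheap_setup`, and `main` are made-up stand-ins for illustration and have nothing to do with FFmpeg's code; the only point is that the profiler report shows where the time goes before anything gets rewritten.

```python
import cProfile
import io
import pstats


def hot_loop(n):
    # Hypothetical hotspot: a tight numeric loop that dominates runtime.
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup():
    # Hypothetical cold code: not worth optimizing.
    return list(range(100))


def main():
    cheap_setup()
    return hot_loop(200_000)


def profile_report(func, top=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_report(main))
```

In the report, `hot_loop` should account for nearly all of the time, so it would be the one candidate to isolate behind a small interface and hand-tune.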

andyburke@fedia.io · 6 months ago

This person programs. ☝️ 🤝

mEEGal@lemmy.world · 6 months ago

this person optimizes

lol@discuss.tchncs.de · edit-2 6 months ago

deleted by creator

MorphiusFaydal@lemmy.world · 6 months ago

The 7000 series runs AVX-512 as two 256-bit data paths, while the 9000 series has a native 512-bit data path for AVX-512.

Decipher0771 · 6 months ago

Yes, but it’ll likely still be faster, just not as dramatically. Half of 4-94x is still 2-47x faster.

InverseParallax@lemmy.world · 6 months ago

I mean why not, that worked out perfectly fine for bulldozer…

chellomere@lemmy.world · 6 months ago

Yeah 7000-series Ryzen benefits from the avx512 code paths in ffmpeg. I’ve benchmarked a 5900x vs a 7900x specifically for software H.265 decoding and there was a sizeable difference.

chellomere@lemmy.world · edit-2 6 months ago

This is great, but the context is that this is for specific inner loops, and the comparison is against the C version of that specific inner loop. What was typically used before this on a computer with AVX-512 was the AVX2 version of the inner loop, and the speedup compared to that version appears to be up to 60%: https://x.com/FFmpeg/status/1852542388851601913 . And since a specific inner loop isn’t run all the time, the overall speedup is probably much less than 60%. This is still sizeable, but the actual speedup in practice with this implementation is far, far from 94x.
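To put rough numbers on that, here is a quick sketch using Amdahl's law. The runtime fractions are made up for illustration and are not measured from FFmpeg; only the 1.6x (60% faster than AVX2) and 94x (faster than C) figures come from the discussion above.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when `fraction` of the runtime
    becomes `local_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    # Hypothetical shares of total runtime spent in the optimized loop.
    for fraction in (0.05, 0.25, 0.50):
        vs_avx2 = overall_speedup(fraction, 1.6)
        vs_c = overall_speedup(fraction, 94.0)
        print(f"loop = {fraction:.0%} of runtime: "
              f"{vs_avx2:.2f}x vs AVX2, {vs_c:.2f}x vs C")
```

Even with a 94x faster loop, the whole program cannot speed up by more than 1 / (1 - fraction), so a loop that takes half the runtime caps the overall gain below 2x.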

0x0@programming.dev · 6 months ago

Unsung heroes.
