I recently moved my files to a new zfs-pool and used that chance to properly configure my datasets.

This led me to discovering zfs-deduplication.

As most of my storage is used by my jellyfin library (~7-8Tb), which is mostly uncompressed bluray rips I thought I might be able to save some storage using deduplication in addition to compression.

Has anyone here used that for similar files before? What was your experience with it?

I am not too worried about performance. The dataset in question is rarely changed. Basically only when I add more media every couple of months. I also have overshot my cpu-target when originally configuring my server so there is a lot of headroom there. I have 32Gb of ram which is not really fully utilized either (but I also would not mind upgrading to 64 too much).

My main concern is that I am unsure it is useful. I suspect just because of the amount of data and similarity in type there would statistically be a lot of block-level duplication but I could not find any real world data or experiences on that.

  • Decipher0771
    link
    fedilink
    English
    arrow-up
    6
    arrow-down
    2
    ·
    16 hours ago

    I think the universal consensus is that outside of a very specific use case: multiple VDI desktops that share the same image, ZFS dedupe is completely useless at best and will destroy your dataset at worst by causing to be unmountable on any system that has less RAM than needed. In every other use case, the savings are not worth the trouble.

    Even in the VDI use case, unless you have MANY copies of said disk images(like 5+ copies of each), it’s still not worth the increase in system resources needed to use ZFS dedupe.

    It’s one of those “oooh shiny” nice features that everyone wants to use, but will regret it nearly every time.