Back when I was even less experienced in self-hosting I setup my media/backup server using a RAIDZ1 array and 3 x 8TB disks. It’s been running well for a while and I haven’t had any problems and no disk errors.

But today I read a post about ‘pool design rules’ stating that RAIDZ1 configurations should not have drives over 1TB because the chances of errors occurring during re-silvering are high. I wish I had known this sooner.

What can I do about this? I send ZFS snapshots to 2 single large (18TB) hardrives for cold backups, so I have the capacity to do a migration to a new pool layout. But which layout? The same article I referenced above says to not use RAIDZ2 or RAIDZ3 with any less than 6 drives…I don’t want to buy 3 more drives. Do I buy an additional 8TB drive (for a total of 4 x 8TB) and stripe across two sets of mirrors? Does that make any sense?

Thank you!

  • spencer
    link
    fedilink
    English
    arrow-up
    20
    ·
    3 months ago

    Honestly, if you’re doing regular backups and your ZFS system isn’t being used for business you’re probably fine. Yes, you are at increased risk of a second disk failure during resilver but even if that happens you’re just forced to use your backups, not complete destruction of the data.

    You can also mitigate the risk of disk failure during resilver somewhat by ensuring that your disks are of different ages. The increased risk comes somewhat from the fact that if you have all the same brand of disks that are all the same age and/or from the same batch/factory they’re likely to die from age around the same time, so when one disk fails others might be soon to follow, especially during the relatively intense process of resilvering.

    Otherwise, with the number of disks you have you’re likely better off just going with mirrors rather than RAIDZ at all. You’ll see increased performance, especially on write, and you’re not losing any space with a 3-way mirror versus a 3-disk RAIDZ2 array anyway.

    The ZFS pool design guidelines are very conservative, which is a good thing because data loss can be catastrophic, but those guidelines were developed with pools that are much larger than yours and for data in mind that is fundamentally irreplaceable, such as user generated data for a business versus a personal media server.

    Also, in general backups are more important than redundancy, so it’s good you’re doing that already. RAID is about maintaining uptime, data security is all about backups. Personally, I’d focus first on a solid 3-2-1 backup plan rather than worrying too much about trying to mitigate your current array suffering catastrophic failure.