It’s fairly obvious why stopping a service while backing it up makes sense. Imagine backing up Immich while it’s running. You start the backup, db is backed up, now image assets are being copied. That could take an hour. While the assets are being backed up, a new image is uploaded. The live database knows about it but the one you’ve backed up doesn’t. Then your backup process reaches the new image asset and it copies it. If you restore this backup, Immich will contain an asset that isn’t known by the database. In order to avoid scenarios like this, you’d stop Immich while the backup is running.

Now consider a system that can do instant snapshots like ZFS or LVM. Immich is running, you stop it, take a snapshot, then restart it. Then you backup Immich from the snapshot while Immich is running. This should reduce the downtime needed to the time it takes to do the snapshot. The state of Immich data in the snapshot should be equivalent to backing up a stopped Immich instance.

Now consider a case like above without stopping Immich while taking the snapshot. In theory the data you’re backing up should represent the complete state of Immich at a point in time eliminating the possibility of divergent data between databases and assets. It would however represent the state of a live Immich instance. E.g. lock files, etc. Wouldn’t restoring from such a backup be equivalent to kill -9 or pulling the cable and restarting the service? If a service can recover from a cable pull, is it reasonable to consider it should recover from restoring from a snapshot taken while live? If so, is there much point to stopping services during snapshots?

  • Avid AmoebaOP
    link
    fedilink
    English
    arrow-up
    1
    ·
    4 months ago

    I dump the db too.

    With that said if backing up the raw files of a db while the service is stopped can produce a bad backup, I think we have bigger problems. That’s because restoring the raw files and starting the service is functionally equivalent to just starting the service with its existing raw files. If that could cause a problem then the service can’t be trusted to be stopped and restarted either. Am I wrong?

    • johntash@eviltoast.org
      link
      fedilink
      English
      arrow-up
      2
      ·
      4 months ago

      I was talking about dumping the database as an alternative to backing up the raw database files without stopping the database first. Taking a filesystem-level snapshot of the raw database without stopping the database first also isn’t guaranteed to be consistent. Most databases are fairly resilient now though and can recover themselves even if the raw files aren’t completely consistent. Stopping the database first and then backing up the raw files should be fine.

      The important thing is to test restoring :)