I've been party recently to a few short discussions about using "Incremental" backups with BTRFS vs. a "Full" backup. I thought it would be good to have a discussion about the differences and maybe weigh in on the pluses/minuses.
I'm a big fan of using Incremental backups. Why?
Advantages
- Saves time - in some cases a considerable amount.
- Can act as a "rollback" or secondary backup to prevent unintended file loss.
Disadvantages
- Not as straightforward - you have to keep your mind wrapped around what you're doing.
- Takes a few more commands (assuming you're using the command line to do this rather than a script).
- May take up more space on your source file system than a simple or "Full" backup.
Initially, you must make a full backup of your subvolume, but only once. After that, you can send a partial backup containing only the differences between the previous backup and the current state. For those differences to be calculated, there must be something to compare against - specifically the previous backup snapshot. This is where the extra space may be consumed: to use incremental backups you must keep at least one previous backup snapshot on the source file system.
For this discussion, I will refer to @home as the source subvolume (the one being backed up). All the backups will be named @home_backup along with a number so we can keep them straight. All snapshots/backups will be read-only as this is required for send|receive so I will leave out reference to read-only to save some words. Root access is required for this and I typically use "sudo -i" to begin a root session prior to making backups. I will also abbreviate the btrfs commands as allowed by the command line.
Here's the process for incremental backups:
Take a snapshot of @home as @home_backup1:
btrfs su sn -r @home @home_backup1
Send this snapshot to the backup file system:
btrfs send @home_backup1 | btrfs receive /mnt/backup/
Now you have a full backup of your @home subvolume. Next week I want to make a new backup so I take another snapshot:
btrfs su sn -r @home @home_backup2
But rather than sending the entire subvolume again, I will only send the difference:
btrfs send -p @home_backup1 @home_backup2 | btrfs receive /mnt/backup/
The "-p" switch here means "parent", as in @home_backup1 is the parent of @home_backup2.
I now have my @home subvolume and 2 backup snapshots on my main file system:
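The steps above can be sketched as a small shell function. This is only an illustration: the function name backup_cmds and its arguments are made up for this example, and it assumes the backups are numbered consecutively starting at 1. It only prints the commands rather than running them, so you can check them before pasting them into a root session.

```shell
#!/bin/sh
# backup_cmds SOURCE PREFIX NUMBER DEST   (hypothetical helper, not part of btrfs)
# Prints the commands for backup number NUMBER.
# Number 1 is a full send; later numbers send only the difference
# from the previous snapshot by naming it as the parent with -p.
backup_cmds() {
    src=$1
    prefix=$2
    num=$3
    dest=$4
    new="${prefix}${num}"

    # Read-only snapshot of the source subvolume
    echo "btrfs su sn -r $src $new"

    if [ "$num" -le 1 ]; then
        # First backup: nothing to compare against, send everything
        echo "btrfs send $new | btrfs receive $dest"
    else
        # Later backups: previous snapshot is the parent
        prev="${prefix}$((num - 1))"
        echo "btrfs send -p $prev $new | btrfs receive $dest"
    fi
}

# Example: the second weekly backup from the walk-through
backup_cmds @home @home_backup 2 /mnt/backup/
```

Run as written, this prints the same two commands shown above for @home_backup2. Printing instead of executing is a deliberate choice here, since a mistyped snapshot name is easier to catch on screen than after the fact.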
@home @home_backup1 @home_backup2
and two subvolumes in my backup file system:
@home_backup1 @home_backup2
Here's the part you have to understand and keep your mind around:
The only unique subvolume is @home on the main file system (assuming any changes were made after sending @home_backup2). This will always be true.
@home_backup1 contains the data as it was when the initial backup snapshot was made. @home_backup2 is a complete snapshot of @home as it was a week later, but it shares all unchanged data with @home_backup1, so the only extra space it takes up is the changes to @home that occurred during that week - the differences from the time @home_backup1 was made to the time @home_backup2 was made. This is the same for both the main file system and the backup file system.
So looking at just @home_backup1 and @home_backup2 - remember, snapshots share file data where it overlaps. In other words, any files that are unchanged from the first snapshot to the second do not make the second snapshot larger. This explains why incremental backups are faster. Only changes are sent.
Once the send|receive operation is complete, you are free to delete the older backup snapshots, but you must retain the latest snapshot on both the main and (obviously) the backup file systems. The reason is so that you may continue to send only a small portion of data each week. In other words, once @home_backup2 is received as a backup, you may delete @home_backup1 in both locations. Why? Because when you delete the "parent" snapshot, any file data it shares with the "child" snapshot remains in place. Only the data unique to @home_backup1 (the old versions of whatever changed between @home_backup1 and @home_backup2) is actually freed.
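As a rough sketch of that cleanup step, the helper below prints the two delete commands, one for each file system. The name cleanup_cmds is hypothetical, and the sketch assumes the received snapshot sits directly under the backup mount point with the same name as on the source. It spells out "btrfs subvolume delete" in full rather than abbreviating it.

```shell
#!/bin/sh
# cleanup_cmds PREFIX OLD_NUMBER DEST   (hypothetical helper, not part of btrfs)
# Prints the commands that remove an old backup snapshot from both the
# source file system and the backup file system.
cleanup_cmds() {
    old="$1$2"
    dest=${3%/}    # drop a trailing slash so the path joins cleanly

    echo "btrfs subvolume delete $old"
    echo "btrfs subvolume delete $dest/$old"
}

# Example: @home_backup2 has been received, so @home_backup1 can go
cleanup_cmds @home_backup 1 /mnt/backup/
```

Only run the printed commands after the newer snapshot has been received successfully, since the newest snapshot on each side is what the next incremental send depends on.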
Putting numbers and times to the operation may make the concept clearer (these are totally made-up as actual times and sizes will vary wildly):
Full backup:
My @home subvolume of 60GB takes two hours to transmit.
If I do a full backup every week, it takes two hours every week to send the backup.
Incremental backup:
If the changes to @home are about 1GB every week - an incremental backup takes only 1/60th of the time, or about 2 minutes each week to complete.
The additional commands to delete the previous backups take only a few seconds.
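The arithmetic behind those made-up numbers can be written out as a few lines of shell. The function name incr_minutes is hypothetical, and the big assumption is that transfer time scales linearly with the amount of data sent, which real hardware will only approximate.

```shell
#!/bin/sh
# incr_minutes FULL_GB FULL_MINUTES CHANGE_GB   (hypothetical helper)
# Estimates how long an incremental send takes, assuming the time is
# proportional to the amount of data sent. Uses integer arithmetic,
# so results are rounded down to whole minutes.
incr_minutes() {
    echo $(( $2 * $3 / $1 ))
}

# 60GB takes 120 minutes in full; 1GB changes per week
weekly=$(incr_minutes 60 120 1)
echo "Incremental send: about $weekly minutes per week"
echo "Time saved per week versus a full send: $((120 - weekly)) minutes"
```

With the example figures this reports about 2 minutes per incremental send and 118 minutes saved each week compared with a full send.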
So you can see there may be a tremendous time savings. The downside? Because you must keep the previous snapshot on your source file system to retain the capability to use incremental backups, there will be some additional space used by this extra subvolume. Exactly how much will depend on the changes made to the original subvolume and how long you go between backups. The great thing is: by simply retaining one previous backup snapshot, you can have a complete and constant backup without spending the time required to send all the data every time.
If you opt to keep several backups rather than delete all the previous backups, you retain a "rollback" ability to a specific week or to restore a file deleted some period ago. Using my example of a weekly backup, if you retain 5 backups you can go back a full month to recover an accidental deletion. Obviously, a long interval would usually mean larger backups so retaining a year's worth of backups might get cumbersome.
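If you keep several backups, working out which ones have aged out is simple enough to sketch. The function prune_list below is hypothetical, and it assumes the snapshots are numbered consecutively from 1 with no gaps. It only prints names. Deleting them is still up to you.

```shell
#!/bin/sh
# prune_list PREFIX NEWEST KEEP   (hypothetical helper)
# Prints the names of the snapshots that fall outside the newest KEEP
# backups, assuming they are numbered 1..NEWEST with no gaps.
prune_list() {
    prefix=$1
    newest=$2
    keep=$3
    last_to_delete=$((newest - keep))

    n=1
    while [ "$n" -le "$last_to_delete" ]; do
        echo "${prefix}${n}"
        n=$((n + 1))
    done
}

# Example: 7 weekly backups exist and the newest 5 should be kept
prune_list @home_backup 7 5
```

With 7 backups and a retention of 5, this lists @home_backup1 and @home_backup2 as candidates for deletion. If fewer backups exist than you want to keep, it prints nothing.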
So should you use full or incremental backups? The choice depends on your backup strategy, the time interval between backups, the backup device (installed in your computer or external), and the type of data you're backing up. I use different strategies depending on my use. For example, I don't often add music or videos to my permanent collections so a bi-monthly backup without retaining any rollback is sufficient. My work documents folder undergoes changes almost daily so a month's worth of incremental backups protects me from accidental deletions.