I wanted to re-align my SSDs' partitioning (actually, create some). The problem was that I had both disks in use as a single btrfs filesystem without any partitioning. Not being one to take the traditional path - backup, partition, re-install, restore - I decided to see if I could do this while still using the system. I didn't want Vinny to have more fun than me, so I did it.
Please remember: I did all this while running from the install on these disks, and without rebooting.
First, I deleted almost all snapshots and an old install I no longer had any interest in. This minimized the amount of data that had to be moved and ensured that there was enough space to hold all the data on a single disk.
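For reference, snapshot removal is just subvolume deletion. A minimal sketch, assuming the filesystem is mounted at / and with a made-up snapshot path:

btrfs subvolume list /
btrfs subvolume delete /.snapshots/old-root

The first command lists the subvolumes (including snapshots) so you can pick what to delete.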
Then I removed a device (SSD1) using "btrfs device delete...".
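Assuming SSD1 is /dev/sda (a hypothetical name) and the filesystem is mounted at /, that looks like:

btrfs device delete /dev/sda /

This migrates everything stored on /dev/sda onto the remaining device before removing it from the volume, which is why it takes so long.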
Once complete (an hour, maybe less), I partitioned SSD1 to my liking and then added the free space, now in a partition, back to the btrfs volume with "btrfs device add...".
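Again with hypothetical names, using gdisk as an example partitioning tool:

gdisk /dev/sda
btrfs device add -f /dev/sda1 /

The -f forces the add if the new partition still carries an old filesystem signature.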
A while later I saw the operation was complete. Now here I made a mistake: I assumed I could convert to RAID1 - thus copying the data to both devices - and then just delete SSD2. This isn't allowed (presumably because deleting a device would leave a two-device RAID1 profile with only one device, below its minimum), so I wasted the time on the balance command only to have to do it all again. This time, I converted to "single" using "btrfs balance start -dconvert=single -mconvert=single..." and then deleted device SSD2. This moved all the data to SSD1. In the future, I would just add the one device, then delete the other without a balance in between. The add operation doesn't move data by itself, but the delete operation would have moved all the data automatically.
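The sequence that actually worked, with /dev/sdb standing in for SSD2:

btrfs balance start -dconvert=single -mconvert=single /
btrfs device delete /dev/sdb /

The balance rewrites every chunk with the single profile; the delete then moves all chunks off /dev/sdb.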
Then I partitioned SSD2 to my liking and added the partition back to the btrfs filesystem with another "device add" command. I re-balanced to RAID0 for data and RAID1 for metadata as this is recommended, but I'm not sure why. I guess if a drive failed, you could recover partial files. Maybe it's for performance. Done.
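That final conversion, with a profile check afterwards:

btrfs balance start -dconvert=raid0 -mconvert=raid1 /
btrfs filesystem usage /

My best guess at the rationale: RAID0 data gives you striping performance and full capacity, while RAID1 metadata keeps a second copy of the filesystem structure itself, so losing a device doesn't take the whole tree with it.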
One note: although I did this while using the system, the device delete and balance commands consume a considerable amount of I/O and CPU. My system ran noticeably slower.
Error catching
After this was completed, I decided to check the filesystem. This is the one btrfs operation you can't do while the filesystem is mounted. So I booted into an alternate install on a hard drive (a live CD/USB would work just as well) and ran:
btrfs check <DEVICE>

where DEVICE is the partition (or whole-disk volume) to check,
and to my surprise, I had 10 errors reported that looked like this:
root 1215 inode 234095 errors 400, nbytes wrong
The error repeated 10 times: 5 with root ID 1215 and 5 with root ID 1321, with the same 5 inode IDs appearing under each root ID. In other words: five bad inodes in two subvolumes, meaning five files with possible corruption. And since the inode IDs matched, the two roots were clearly a volume and a snapshot of that volume, so really only 5 files were affected. This might have come from the earlier operations I outlined above, but it's just as likely they were caused by several lockups I've had recently requiring a hard reset.
Btrfs does have a repair utility ("btrfs check --repair"), but it's not often useful and its use is sometimes not even recommended; it's really still under development. I tried it anyway, but it didn't change anything. I re-mounted the filesystem, and then I needed to see which files were at issue. This command shows the exact file that has the error:
btrfs inspect-internal inode-resolve <INODE> <PATH>
with INODE being the inode ID from the check error and PATH the location where the filesystem is mounted. All five files were small text files in /var/lib/bluetooth/<MAC ADDRESS>/. Visual inspection showed no real corruption of the files, so I simply copied the files off the drive, deleted the originals, and copied them back. Voila, I ran the check again and the errors were gone. Had the files been larger or more critical, I might have taken a different approach, but this worked and was easy for these small text files.
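Using the inode ID from the error above and a hypothetical mount point of /mnt, that looks like:

btrfs inspect-internal inode-resolve 234095 /mnt

which prints the path of the file that owns the inode.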