Snapper and btrfs disk usage on openSUSE Tumbleweed


Since moving to openSUSE Tumbleweed (openSUSE's perpetually updated "rolling distro"), I decided to experiment with more modern file systems that support snapshots. This is particularly useful with a rolling distribution, as automated snapshots when installing packages allow you to roll back when an update goes wrong!

Getting started

When installing openSUSE Tumbleweed, I chose btrfs for the root file system and let openSUSE define the appropriate subvolumes. I've kept my home file system on ext4 for now, because I've never had any problems with it and I don't need to "undo" actions on it in quite the same way.

The default openSUSE configuration with btrfs and Snapper sets up your root file system so that snapshots are taken whenever Zypper installs or removes packages. The other subvolumes are excluded from these snapshots, which allows you to roll back a bad update without rolling back, say, your database at the same time.

Configuring Snapper

Before you start working with Snapper and quotas, you need to enable the appropriate configuration. To do this, run sudo snapper setup-quota. This should enable "quota groups" on btrfs and make Snapper set the group ID when creating snapshots. To check current quota usage, run sudo btrfs qgroup show -p /.

The default root configuration is stored in /etc/snapper/configs/root. The default configuration is normally enough, but you may want to adjust how long snapshots are kept for and enable automated cleanup. My settings are currently set for five "number" snapshots with a minimum age of 1800 seconds (30 minutes), three hourly, three daily and four weekly "timeline" snapshots with a minimum age of 1800 seconds, and pre/post snapshots with a minimum age of 1800 seconds. My space limit is set to 0.4.
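As a rough sketch, those settings might look like the fragment below. The key names are the ones I understand from the snapper-configs(5) man page, and mapping my description onto them this way is an assumption, so check them against your own /etc/snapper/configs/root rather than copying blindly.

```shell
# /etc/snapper/configs/root (fragment; values mirror the settings described above)

# "number" cleanup: used for the Zypper pre/post snapshots
NUMBER_CLEANUP="yes"
NUMBER_MIN_AGE="1800"
NUMBER_LIMIT="5"

# "timeline" cleanup: hourly, daily and weekly snapshots
TIMELINE_CLEANUP="yes"
TIMELINE_MIN_AGE="1800"
TIMELINE_LIMIT_HOURLY="3"
TIMELINE_LIMIT_DAILY="3"
TIMELINE_LIMIT_WEEKLY="4"

# cleanup of pre/post pairs that contain no changes
EMPTY_PRE_POST_CLEANUP="yes"
EMPTY_PRE_POST_MIN_AGE="1800"

# fraction of the file system that snapshots may use (relies on quotas)
SPACE_LIMIT="0.4"
```

The file uses plain shell-style KEY="value" assignments, so a stray space around the equals sign is worth watching for when editing it.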

Understanding quotas and disk usage

Running sudo snapper list shows all snapshots with their date and type. It won't tell you how much space they use, though. There are feature requests and third-party scripts for this, but it is easy to work out yourself with a few commands, and doing so helps you understand how disk usage works.

Snapper snapshots are built on btrfs subvolumes. Running sudo btrfs qgroup show -p / shows something like the following:

    qgroupid          rfer         excl parent
    --------          ----         ---- ------
    0/5           16.00KiB     16.00KiB ---
    0/257          9.38GiB      2.08MiB ---
    0/258        108.61MiB    108.61MiB ---
    0/259         16.00KiB     16.00KiB ---
    0/260         16.00KiB     16.00KiB ---
    0/261          3.42MiB      3.42MiB ---
    0/272          2.21MiB      2.21MiB ---
    0/856          9.38GiB      2.35GiB ---
    0/870          9.28GiB     53.14MiB ---
    0/871          9.32GiB     54.17MiB ---
    0/873          9.22GiB      1.59MiB ---
    0/874          9.31GiB     11.94MiB ---
    ⋮
    1/0           16.77GiB      7.40GiB 0/856,0/870,0/871,0/873,0/874,0/875,0/877,0/878,0/879,0/880,0/881,0/882,0/883,0/884,0/885,0/886

Note that not all of these "quota groups" are snapshots. The first few are rather important subvolumes!

To find out which part of your disk or which snapshot each quota group relates to, run sudo btrfs subvolume list /:

    ID 257 gen 26013 top level 5 path @
    ID 258 gen 25855 top level 257 path opt
    ID 259 gen 25855 top level 257 path srv
    ID 260 gen 25855 top level 257 path boot/grub2/i386-pc
    ID 261 gen 25855 top level 257 path boot/grub2/x86_64-efi
    ID 272 gen 26006 top level 257 path .snapshots
    ID 856 gen 25520 top level 272 path .snapshots/540/snapshot
    ID 870 gen 25728 top level 272 path .snapshots/551/snapshot
    ID 871 gen 25740 top level 272 path .snapshots/552/snapshot
    ID 873 gen 25786 top level 272 path .snapshots/554/snapshot
    ID 874 gen 25793 top level 272 path .snapshots/555/snapshot
    ⋮

This shows that 5 is the top-level volume (it only appears as the "top level" of the root subvolume, @, in the first line), that the IDs in the 200 range are system subvolumes, and that everything from 856 upwards is a snapshot. Each subvolume ID matches the second half of a qgroup ID, so subvolume 856 is qgroup 0/856. Note, though, that this ID (on the left) and the snapshot number used by Snapper (in the path, on the right) are not identical: subvolume 856 is Snapper's snapshot 540.
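To make that mapping less error-prone, here is a small sketch that turns the subvolume list into "qgroup, Snapper number" pairs. The function name qgroup_to_snapshot is made up for illustration, and it assumes snapshot paths end in .snapshots/N/snapshot as in the output above. The sample input is a few lines copied from that output, so the snippet runs without root.

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs subvolume list /` output on stdin and
# print "<qgroup id> <Snapper snapshot number>" for each Snapper snapshot.
qgroup_to_snapshot() {
    awk '$NF ~ /(^|\/)\.snapshots\/[0-9]+\/snapshot$/ {
        n = split($NF, parts, "/")      # ... / .snapshots / <number> / snapshot
        print "0/" $2, parts[n - 1]     # $2 is the subvolume ID
    }'
}

# Sample input copied from the listing above.
qgroup_to_snapshot <<'EOF'
ID 257 gen 26013 top level 5 path @
ID 272 gen 26006 top level 257 path .snapshots
ID 856 gen 25520 top level 272 path .snapshots/540/snapshot
ID 870 gen 25728 top level 272 path .snapshots/551/snapshot
EOF
```

With the sample input this prints "0/856 540" and "0/870 551". On a real system you would pipe the live listing in instead: sudo btrfs subvolume list / | qgroup_to_snapshot.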

These snapshot IDs can then be examined in Snapper's output from sudo snapper list:

    Type   | #   | Pre # | Date                         | User | Cleanup  | Description  | Userdata
    -------+-----+-------+------------------------------+------+----------+--------------+--------------
    single | 0   |       |                              | root |          | current      |
    single | 540 |       | Mon 15 Jan 2018 20:15:02 GMT | root | timeline | timeline     |
    pre    | 551 |       | Wed 17 Jan 2018 19:51:39 GMT | root | number   | zypp(zypper) | important=yes
    post   | 552 | 551   | Wed 17 Jan 2018 19:58:23 GMT | root | number   |              | important=yes
    pre    | 554 |       | Thu 18 Jan 2018 18:46:29 GMT | root | number   | zypp(zypper) | important=yes
    post   | 555 | 554   | Thu 18 Jan 2018 18:49:52 GMT | root | number   |              | important=yes
    ⋮

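As you can see, some snapshots are labelled "single" (one-off) and others are labelled "pre" and "post" (taken before and after Zypper software installations). Each uses a different cleanup algorithm by default, shown in the Cleanup column: here "timeline" for the single snapshot and "number" for the pre/post pairs.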

So, to follow this through:

  1. On Wednesday 17th January, I installed some updates using Zypper. This:
    1. Took a "pre" snapshot (551 @ 19:51:39 GMT)
    2. Took a "post" snapshot (552 @ 19:58:23 GMT)
  2. Snapshots 551 and 552 are both "important" (marked to be kept longer when cleaning up)
  3. Snapshots 551 and 552 will be cleaned up using the "number" algorithm
  4. Based on the subvolume paths, snapshots 551 and 552 relate to qgroups 870 and 871
  5. Based on the qgroup output:
    • Snapshot 551/qgroup 870 references 9.28GiB of data in total, most of it shared with other snapshots or the live system (rfer)
    • Snapshot 551/qgroup 870 has 53.14MiB of data that is unique to that snapshot (excl)
    • Snapshot 552/qgroup 871 references 9.32GiB in total, of which 54.17MiB is unique to it
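
The arithmetic behind those figures can be sketched as below. It assumes that rfer counts everything a qgroup references and excl counts only what nothing else references, so the difference approximates the shared data. shared_gib is a made-up helper name, and the inputs are the rounded values from the qgroup output above.

```shell
#!/bin/sh
# Hypothetical helper: approximate shared data in GiB,
# given rfer in GiB ($1) and excl in MiB ($2).
shared_gib() {
    awk -v rfer_gib="$1" -v excl_mib="$2" \
        'BEGIN { printf "%.2f\n", (rfer_gib * 1024 - excl_mib) / 1024 }'
}

shared_gib 9.28 53.14   # snapshot 551 / qgroup 0/870
shared_gib 9.32 54.17   # snapshot 552 / qgroup 0/871
```

This prints 9.23 and 9.27, so in both cases almost everything the snapshot references is also referenced elsewhere, and deleting it alone would free only the small exclusive part.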

Using qgroups to recover space

Once the cleanup jobs are enabled in your config, Snapper should normally recover space by cleaning up snapshots regularly using a systemd job. You can also force this by running sudo snapper cleanup number to delete Zypper pre/post snapshots and sudo snapper cleanup timeline to delete timeline snapshots.

However, by understanding what the qgroups show, we can work out how to recover disk space when we run low and the cleanup process hasn't freed up enough automatically. This can be a problem on rolling distros with small root partitions, where regular updates cause a lot of churn and leave many files that are unique to each pre/post snapshot.

In the example output above, we can see that we wouldn't gain much by deleting snapshot 554/qgroup 873, because it only has 1.59MiB of exclusive file system data. However, if we delete the one-off snapshot 540 then we will also delete qgroup 856, which currently has 2.35GiB of unique files!
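Picking candidates by eye gets tedious with many snapshots, so here is a rough sketch that ranks qgroups by exclusive usage. rank_by_excl is a hypothetical name, the unit conversion assumes the KiB/MiB/GiB/TiB suffixes shown in the output above, and the sample input is a subset of that output so it runs without root.

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs qgroup show` output on stdin and list
# level-0 qgroups with their exclusive size, largest first.
rank_by_excl() {
    awk '
        function to_bytes(s,    n) {
            n = s + 0                       # numeric prefix, e.g. 2.35
            if (s ~ /KiB$/) return n * 1024
            if (s ~ /MiB$/) return n * 1024 * 1024
            if (s ~ /GiB$/) return n * 1024 * 1024 * 1024
            if (s ~ /TiB$/) return n * 1024 * 1024 * 1024 * 1024
            return n                        # assume plain bytes otherwise
        }
        # Only rows such as 0/856; the header and the 1/0 group are skipped.
        $1 ~ /^0\/[0-9]+$/ { printf "%.0f %s %s\n", to_bytes($3), $1, $3 }
    ' | sort -rn | cut -d" " -f2-
}

# Sample input: the snapshot rows from the output above.
rank_by_excl <<'EOF'
qgroupid          rfer         excl parent
--------          ----         ---- ------
0/856          9.38GiB      2.35GiB ---
0/870          9.28GiB     53.14MiB ---
0/871          9.32GiB     54.17MiB ---
0/873          9.22GiB      1.59MiB ---
0/874          9.31GiB     11.94MiB ---
EOF
```

With the sample input, 0/856 comes out on top with 2.35GiB and 0/873 last with 1.59MiB. On a live system the list would also include ordinary subvolumes such as 0/258 (opt), so cross-check each qgroup against the subvolume list before deleting anything.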

To test this out, we can check the disk usage before with df -h /:

    Filesystem             Size  Used Avail Use% Mounted on
    /dev/mapper/main-root   21G   18G  3.5G  84% /

And then delete the snapshot with sudo snapper delete 540 (note that we use the Snapper ID, not the qgroup, and that we always let Snapper delete its snapshot and the underlying qgroup rather than deleting the qgroup directly with btrfs tools).

If we check the disk usage again, we should see that it has come down:

    Filesystem             Size  Used Avail Use% Mounted on
    /dev/mapper/main-root   21G   15G  5.7G  73% /

(Note: This may take a few seconds to show a difference, as the file system deletes the files)

Conclusion

In general, Snapper should Just Work™ once enabled and should manage itself once the quotas are enabled. However, I found that a lot of documentation made assumptions about how much you knew of btrfs and Snapper when talking about quotas and automated cleanup.

Hopefully you should now be able to use Snapper and keep your disk usage under control!
