Fruitbak: efficient disk-based backups

Fruitbak is a backup tool heavily inspired by BackupPC but written from the ground up in multithreaded Python.

Features

Getting started

The recommended way to install Fruitbak is using the pre-built packages. If you cannot use these for any reason, please see below for instructions on building from source.

Debian/Ubuntu

Configure the APT repository, using the components fruitbak and avl.

Install the package:

apt update
apt install fruitbak

Alpine

Add the following line to ‘/etc/apk/repositories’:

@fruit https://www.fruit.je/alpine

Install the package:

apk update
apk add fruitbak@fruit

Installing from source

External dependencies

Installation instructions for these are out of scope for this document. The names of the Python packages are taken from PyPI (the Python Package Index); the links point to the project homepages.

Project libraries

Some parts of Fruitbak are split out into separate libraries. You will need to build these before you can use Fruitbak. Click the name to go to the download location and download the latest version.

libavl
autoreconf -fsi
./configure
make install

libhardhat
autoreconf -fsi
./configure
make install

hardhat-python
python3 setup.py install

hashset-python
python3 setup.py install

rsync-fetch-python
python3 setup.py install

Fruitbak itself

And finally, to install Fruitbak itself:

fruitbak
python3 setup.py install
autoreconf -fsi
./configure --sysconfdir=/etc
make install

Setting up shop

Now that you have installed the software, it's time to set it up.

User account

If you used the Debian packages, a user account called ‘fruitbak’ was created for you. If not, you will have to create it yourself. Do this now (consult the documentation of your operating system for the proper way to create a system user). Fruitbak can run as root, but this is not recommended.

Storage directory

Create a directory that is writable by the fruitbak user. This documentation assumes this directory is /var/lib/fruitbak.

If you can create a separate filesystem for this directory, that would probably be a good idea. In fact, it would be beneficial to create separate filesystems for both /var/lib/fruitbak and /var/lib/fruitbak/pool. The pool filesystem should be large enough to hold the file data you expect to back up. The parent filesystem can be much smaller: about 2% to 5% of the size of the pool should be enough. If you can resize these filesystems (if you use LVM, for example), start small and grow them as needed. Fruitbak was tested primarily with XFS, but ext4 will work just fine as well.
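The sketch below applies the 2% to 5% rule of thumb as plain arithmetic. The helper name is made up for this example. For a 2 TB pool, the rule gives a parent filesystem of roughly 40 to 100 GB.

```python
def parent_fs_size_range(pool_bytes: int) -> tuple[int, int]:
    """Return (low, high) sizes in bytes for the /var/lib/fruitbak filesystem.

    Applies the 2% to 5% rule of thumb to the size of the pool filesystem.
    """
    return pool_bytes * 2 // 100, pool_bytes * 5 // 100


GB = 10**9
TB = 10**12
low, high = parent_fs_size_range(2 * TB)
print(f"2 TB pool: parent filesystem of {low // GB} to {high // GB} GB")
```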

Configuration files

Fruitbak's configuration files use Python syntax. Don't worry if you don't know Python; you can get a complete (if basic) working configuration with just simple assignments like the ones below.

The first configuration file you need to create is /etc/fruitbak/global.py. It contains the information Fruitbak needs to find its storage directories, as well as the user account it will run as. If /etc/fruitbak does not exist yet, you will need to create it. It should be readable by Fruitbak but not writable.

# The directory you just created.
rootdir = '/var/lib/fruitbak'

# The user the process runs as.
# Fruitbak switches to this user just after starting, so that you
# can conveniently start it as root without repercussions.
user = 'fruitbak'
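Because global.py is plain Python, you can sanity-check it before the first run. The sketch below is not part of Fruitbak, and its function name is made up. It evaluates the file with the standard runpy module, then checks that rootdir is an existing directory and that user names an existing account. Fruitbak itself may load its configuration differently, so treat this only as a quick check.

```python
import os
import pwd
import runpy


def check_global_config(path: str = '/etc/fruitbak/global.py') -> list[str]:
    """Evaluate a global.py file and return a list of problems found."""
    settings = runpy.run_path(path)  # runs the file as ordinary Python
    problems = []

    rootdir = settings.get('rootdir')
    if not isinstance(rootdir, str) or not os.path.isdir(rootdir):
        problems.append(f'rootdir {rootdir!r} is not an existing directory')

    user = settings.get('user')
    try:
        pwd.getpwnam(user)  # KeyError if unknown, TypeError if not a string
    except (KeyError, TypeError):
        problems.append(f'user {user!r} does not exist')

    return problems
```

An empty list means both settings look sane; otherwise, print the returned messages.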

Now it's time to configure a client. But first, a small intermezzo.

About hosts, backups and shares

In Fruitbak parlance (mostly borrowed from BackupPC) a host is a machine that must be backed up. A backup is what is created whenever you type fruitbak backup. Backups are identified by a number; each host has its own set of backups.

Lastly, each backup has a set of ‘shares’ (again, BackupPC terminology). These are probably most useful for backing up Windows volumes, but you can also use them to back up each mount point of a Unix system separately. That requires a bit more work, however, so the examples below put everything in a single share.

Configuring hosts

The next file sets up defaults for all clients you want to back up, such as the method used to contact them. By default, Fruitbak backs up files from the local filesystem of the machine it runs on. We will switch to remote backups later on, but for now we'll stick to local backups so we can get Fruitbak up and running quickly. Because we're keeping the default, this file needs no configuration statements. It still has to exist, though, so create an empty /etc/fruitbak/common.py file.

We'll create an example host called melon. It will have a single share: the local /usr/lib directory of the machine that Fruitbak runs on. Feel free to pick a different directory, as long as the fruitbak user has permission to read it and everything in it. Fruitbak identifies hosts by the base name of their configuration file, so put this content in /etc/fruitbak/host/melon.py:

shares = [
    {'name': 'example', 'path': '/usr/lib'},
]

Testing the configuration

The following command will create a backup of /usr/lib:

$ fruitbak backup melon

Let's take a look at the result:

$ fruitbak ls
Host name  Last backup          Duration  Index  Type  Level  Status
melon      2020-09-20 02:45:45    22.77s      0  full      0  done

$ fruitbak ls melon
Index  Start                End                  Duration  Type  Level  Status
    0  2020-09-20 02:45:45  2020-09-20 02:46:08    22.77s  full      0  ok

$ fruitbak ls melon 0
Name     Mount point  Start                End                  Duration  Status
example  /usr/lib     2020-09-20 02:45:47  2020-09-20 02:45:49    22.77s  done

$ fruitbak ls melon 0 example ''
total 115
drwxr-xr-x  root  root        6  2017-05-03 11:38:16  X11
drwxr-xr-x  root  root       95  2020-05-20 10:04:05  apt
[…]
drwxr-xr-x  root  root    94208  2020-09-09 23:57:44  x86_64-linux-gnu
$

In the last ls command we added an empty string at the end. This is how we tell Fruitbak that we wish to see the root of the share. When listing or retrieving files, Fruitbak is generally very casual about paths and slashes. For example, apt/methods, /apt/methods, apt/methods/ and ///apt///methods/// all refer to the same directory. Likewise, /, /// and ‘’ (the empty string) all refer to the root directory of the share.
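The slash rule is easy to state in code. The Python sketch below models it; it is not Fruitbak's own code, and the function name is made up. It splits the path on /, drops the empty pieces, and joins what is left. It deliberately ignores . and .. components, which the text does not cover.

```python
def normalize_share_path(path: str) -> str:
    """Reduce a path inside a share to a canonical form.

    Empty components (from leading, trailing or repeated slashes) are
    dropped, so '', '/' and '///' all become '' (the share root).
    """
    return '/'.join(part for part in path.split('/') if part)
```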

Setting up remote backups

Setting up remote backups requires working SSH connectivity. Fruitbak needs to be able to ssh from its user account to the remote machine, usually as root. Setting up SSH keys and authorized_keys files is outside the scope of this document, but let's check if we can make a connection by switching to the fruitbak account:

su - fruitbak

or:

sudo -u fruitbak -i

and issuing commands there:

ssh root@melon rsync --version

This should output something that starts with:

rsync  version 3.1.3  protocol version 31
Copyright (C) 1996-2018 by Andrew Tridgell, Wayne Davison, and others.
[…]
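You may want to script this check, for example before adding several new hosts. The Python sketch below runs the same command and parses the first line of its output; the function names are made up. It passes -o BatchMode=yes to ssh so that a missing key makes the command fail instead of prompting for a password, since no one will be there to type one when Fruitbak runs from cron. Run it as the fruitbak user.

```python
import re
import subprocess

# First line of `rsync --version`, e.g. "rsync  version 3.1.3  protocol version 31".
# The leading 'v' before the version number is optional.
_VERSION_RE = re.compile(r'^rsync\s+version\s+v?(\S+)\s+protocol version\s+(\d+)')


def parse_rsync_version(output: str) -> tuple[str, int]:
    """Extract (rsync version, protocol version) from `rsync --version` output."""
    match = _VERSION_RE.match(output)
    if match is None:
        raise ValueError('unexpected rsync --version output')
    return match.group(1), int(match.group(2))


def check_remote_rsync(host: str, user: str = 'root') -> tuple[str, int]:
    """Run `ssh user@host rsync --version` without prompting and parse it."""
    result = subprocess.run(
        ['ssh', '-o', 'BatchMode=yes', f'{user}@{host}', 'rsync', '--version'],
        capture_output=True, text=True, check=True,
    )
    return parse_rsync_version(result.stdout)
```

For the example host, check_remote_rsync('melon') should return the rsync and protocol versions. The call raises an exception if ssh fails or the output looks wrong.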

Configuring Fruitbak for remote backups

Fruitbak backs up locally by default. To tell it to fetch file data over the SSH connection we just set up, add these lines to /etc/fruitbak/host/melon.py:

from fruitbak.transfer import RsyncTransfer
transfer_method = RsyncTransfer

If you've already decided that all (or most) of your hosts will use rsync-over-ssh you can add these lines to /etc/fruitbak/common.py instead. You can always switch a single host back by setting transfer_method to LocalTransfer in its own configuration file.

You can test your new connection using the commands we used earlier.

Backing up the root filesystem

Backing up /usr/lib may be interesting for an example but it's probably not the data you care most about. Let's back up the root filesystem instead. Change the shares statement in /etc/fruitbak/host/melon.py to:

shares = [{
    'name': 'root',
    'path': '/',
}]

One gotcha here is that we must be careful not to back up things like /proc and /sys. These are not real files, and trying to back up some of the more esoteric contents there may even gum up the works completely. Add this statement to the same file:

excludes = {
    '/dev/',
    '/media/',
    '/mnt/',
    '/proc/',
    '/sys/',
    '/tmp/',
    'lost+found',
}

The trailing / means: exclude the contents of this directory but do back up the directory itself.

The leading / means that the exclusion is relative to the filesystem root; an entry without one, such as lost+found, matches at any depth. This makes no difference when we back up /, but in the old configuration (where we backed up /usr/lib) it would cause /usr/lib/lost+found to be excluded but not, for example, /usr/lib/sys.
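The Python sketch below models these two rules, so you can reason about a pattern before trying it. It illustrates the behavior described above and is not Fruitbak's matching code. It takes absolute paths on the backed-up host and handles only plain names, not wildcards. The function name is made up.

```python
def is_excluded(path: str, excludes: set[str]) -> bool:
    """Decide whether an absolute path on the host would be skipped.

    * A leading '/' anchors the pattern at the filesystem root; without
      it, the pattern may match at any depth.
    * A trailing '/' excludes only what is inside the directory, not the
      directory itself.
    """
    parts = [p for p in path.split('/') if p]
    for pattern in excludes:
        anchored = pattern.startswith('/')
        contents_only = pattern.endswith('/')
        pat = [p for p in pattern.split('/') if p]
        n = len(pat)
        # Anchored patterns may only match at the root; others anywhere.
        starts = [0] if anchored else range(len(parts) - n + 1)
        for i in starts:
            if parts[i:i + n] == pat:
                # For 'dir/' patterns the path must lie strictly below dir.
                if not contents_only or len(parts) > i + n:
                    return True
    return False


EXCLUDES = {'/dev/', '/media/', '/mnt/', '/proc/', '/sys/', '/tmp/', 'lost+found'}
```

Under this model, /proc itself is kept while /proc/1/status is skipped, and /usr/lib/lost+found is skipped while /usr/lib/sys is kept.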

Nightly backups

To make sure your filesystems get backed up regularly, we can add the necessary commands to cron. We'll add two lines: one that does incremental backups Monday through Saturday, and one that does a full backup every Sunday. The Sunday line also runs a command that cleans up older backups and purges the file data that belonged to them. You can put these lines in /etc/cron.d/fruitbak:

13 2 * * 1-6	root	fruitbak backup --all
13 2 * * 0	root	fruitbak backup --all --full && fruitbak gc
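Cron numbers the days of the week from 0 (Sunday) to 6 (Saturday), so 1-6 means Monday through Saturday and 0 means Sunday. The shell's && runs fruitbak gc only if the full backup exits successfully. The small Python sketch below uses a made-up helper to show which line fires on a given date.

```python
from datetime import date


def nightly_command(day: date) -> str:
    """Return the cron command from /etc/cron.d/fruitbak that runs on `day`."""
    # Cron numbers weekdays 0 (Sunday) to 6 (Saturday); isoweekday() gives
    # 1 (Monday) to 7 (Sunday), so taking it modulo 7 yields cron's numbering.
    cron_dow = day.isoweekday() % 7
    if cron_dow == 0:
        # Sunday: full backup, then gc, but only if the backup succeeded (&&).
        return 'fruitbak backup --all --full && fruitbak gc'
    # 1-6, Monday through Saturday: incremental backup.
    return 'fruitbak backup --all'
```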

Common tasks

The most important task (but hopefully not the most common) is to restore files.

Restoring a single file

Restoring a single file can be as easy as:

fruitbak cat melon -1 root etc/fstab

In this command -1 refers to the most recent backup.
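Backups are numbered per host, and the numbers can have gaps once you delete individual backups (see Removing hosts and backups). The hypothetical Python function below models how an index such as -1 can be resolved against the backups that actually exist. Only -1 is documented here. Treating -2, -3 and so on as counting further back, like Python's negative list indices, is an assumption of this sketch.

```python
def resolve_backup(index: int, backups: list[int]) -> int:
    """Map an index given on the command line to an existing backup number.

    Non-negative values are taken as backup numbers. Negative values count
    back from the most recent backup, so -1 is the latest (hypothetical
    model, not Fruitbak's code).
    """
    numbers = sorted(backups)
    if index < 0:
        return numbers[index]  # IndexError if there are too few backups
    if index not in numbers:
        raise KeyError(f'no backup {index}')
    return index
```

The function sorts the existing numbers rather than computing "highest number minus one", so gaps left by deleted backups are handled naturally.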

Restoring a directory tree

To restore melon's /etc directory, and everything under it, including all subdirectories and their contents:

fruitbak tar melon -1 root etc >/tmp/melon-etc.tar

Of course, you can also pipe the output straight to the original system without creating an intermediate file:

fruitbak tar melon -1 root etc | ssh root@melon tar xC /tmp

You will find the restored contents of the /etc directory in /tmp/etc on melon.
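If you would rather handle the archive from a script, Python's standard tarfile module can read the output of fruitbak tar directly from a pipe. The minimal sketch below uses a made-up function name. It opens the stream with mode 'r|', which reads the archive strictly front to back and therefore works on a pipe that cannot seek.

```python
import tarfile


def extract_stream(stream, dest: str) -> list[str]:
    """Unpack a tar stream into `dest` and return the member names."""
    # Newer Pythons support extraction filters; the 'tar' filter refuses
    # members that would end up outside `dest`. Use it when available.
    options = {'filter': 'tar'} if hasattr(tarfile, 'tar_filter') else {}
    with tarfile.open(fileobj=stream, mode='r|') as archive:
        archive.extractall(path=dest, **options)
        return archive.getnames()  # members seen while extracting
```

For example, a small script that calls extract_stream(sys.stdin.buffer, '/tmp') can take the place of tar xC /tmp at the end of the pipeline.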

Finding files

If you're looking for a lost file but do not know the exact name or path, the FUSE filesystem may come in handy.

fruitbak fuse /var/lib/fruitbak/fuse

Here you can browse through all backups of all hosts. You can use the standard Unix find command to find your lost files, copy them to their original location, and so on. Be aware that although restoring files this way certainly works, it is much slower than the fruitbak cat and tar commands above for large files and large directory trees.
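The same search can be scripted. This Python sketch walks a directory tree roughly the way find DIR -name PATTERN does, using the standard os.walk and fnmatch modules; the function name is made up. Walking the whole FUSE tree is slow, so point it at the smallest subtree you can. The layout below the mount point is not described here, so choosing the starting directory is up to you.

```python
import fnmatch
import os


def find_by_name(top: str, pattern: str) -> list[str]:
    """Return paths below `top` whose final component matches `pattern`.

    Uses shell-style wildcards and matches case-sensitively, like find -name.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            if fnmatch.fnmatchcase(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```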

Removing hosts and backups

If you just want Fruitbak to stop backing up a host, you can either remove the configuration file (/etc/fruitbak/host/melon.py in our example) or add auto = False to it. In both cases the already existing backups will remain available to restore from.

To also delete the backups, you will need to look in /var/lib/fruitbak/host. Here you will find a directory for every host that has ever been backed up. To delete the backups for host melon, make sure no other Fruitbak processes are running and then simply run rm -r melon. This only removes the file metadata (the names of the files, their permissions, etcetera). To free the storage used by the file contents, you need to run fruitbak gc afterwards, but unless you're in a hurry you can just wait for the Sunday cron job to do that for you.

To delete a single backup of a host all you need to do is go one level deeper. You will find numbered directories that correspond directly to the backups of the host. Like above they can be removed using a simple rm -r.
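Before removing anything, it can help to see which backups exist. This Python sketch lists the numbered backup directories for one host, assuming the layout described above (ROOTDIR/host/HOST/NUMBER). It skips any entry whose name is not a number. It only reads the directory; removal is still done with rm -r as described above. The function name is made up.

```python
import os


def list_backups(rootdir: str, host: str) -> list[int]:
    """Return the backup numbers stored for `host`, oldest first."""
    hostdir = os.path.join(rootdir, 'host', host)
    with os.scandir(hostdir) as entries:
        numbers = [int(e.name) for e in entries
                   if e.is_dir() and e.name.isdigit()]
    return sorted(numbers)
```

For the example in this document, list_backups('/var/lib/fruitbak', 'melon') would show which numbered directories are available to remove.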
