Journal

Writing down the things I learned, to share them with others and my future self.

08 Jan 2021

My Personal Backup Design

Computing devices may fail and lose your data. Hardware failures, software bugs and encrypting trojans threaten your data. And even if your computers don’t lose your data, a single human error can erase a lot of it quickly. Without a backup you are either doomed to redo a lot of work, or the precious pictures of the last years are lost forever. In this blog post I want to show you the design goals behind my backup strategy. I will start with the goals themselves, then outline my reasoning and describe my implementation of this design.

Design Goals

  • Have two backups, an onsite and an offsite backup
  • Encrypt at least your offsite backup
  • Be able to restore your offsite backup without any of your personal computing devices
  • Use different backup tools for your onsite and offsite backups
  • Script your backups
  • Test a restore regularly
  • Synchronisation without version history is not a backup

Reasoning

The two kinds of backups protect your data from different threats and enable restores on different time scales. The onsite backup protects your data from hardware failures, software bugs, encrypting trojans or human errors. The local backup target may be an external USB harddrive or a NAS. Since the backup target device is local to the source of the data, the network or USB speeds are quite high. This allows for a high frequency of backups, from every couple of minutes up to once a day. The restore time is low, since a couple of hundred Mbit/s up to Gbit/s are achievable over local network or USB connections.

In contrast, the offsite backup target is usually reachable at much lower speeds from your home. The offsite backup protects your data from a total loss of your home. Common causes of such a total loss are fire, earthquakes or, if your local computing hardware is in the basement, a flood. As an offsite backup target you can either use a service like Backblaze, Amazon S3 Glacier or rsync.net via your internet connection, or you can store a (USB) harddrive at a friend’s house or at work. An external harddrive costs no recurring fee, but creating the backup involves a lot of repetitive manual labor and thus requires a lot of discipline to happen regularly. Using a service via the internet allows you to automate the complete backup process.
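To put these transfer speeds into perspective, here is a small back-of-the-envelope calculation in Python. The 1 TB dataset size and the link speeds are illustrative assumptions, not measured values:

    # Rough restore times for a 1 TB dataset at different link speeds.
    DATASET_BITS = 1 * 10**12 * 8  # 1 TB expressed in bits

    for label, mbit_per_s in [("USB / Gbit LAN", 1000),
                              ("fast local Wi-Fi", 100),
                              ("internet uplink", 30)]:
        seconds = DATASET_BITS / (mbit_per_s * 10**6)
        print(f"{label:>16}: {seconds / 3600:6.1f} hours")

At Gbit/s speeds a full restore of 1 TB takes a couple of hours, while the same dataset over a 30 Mbit/s link takes roughly three days.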

Regardless of your offsite backup target, you want to encrypt your backup, since you can’t physically prevent an unauthorized third party from reading your data. In case of the harddrive backup target you can either encrypt the drive itself or use a tool that encrypts your files in the filesystem. Most modern backup tools like restic or borg encrypt the backed up data by default. Moreover, they can upload the backup directly to services like Backblaze, Amazon S3 Glacier or rsync.net via their native APIs.
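As a rough sketch of what such an encrypted, API-native backup could look like, here is restic driven from a small Python script against a Backblaze B2 bucket. The bucket name, source path and credential handling are made-up placeholders, not a finished setup:

    import os
    import subprocess

    # Hypothetical repository and source path -- adjust to your own bucket and data.
    REPO = "b2:my-backup-bucket:nas"   # restic's native Backblaze B2 backend
    SOURCE = "/share/Data"             # directory to back up

    env = os.environ.copy()
    env.update({
        "B2_ACCOUNT_ID": "...",        # B2 key ID (better: load from a secret store)
        "B2_ACCOUNT_KEY": "...",       # B2 application key
        "RESTIC_PASSWORD": "...",      # passphrase that encrypts the repository
    })

    # One-time repository initialisation; restic encrypts everything it stores.
    subprocess.run(["restic", "-r", REPO, "init"], env=env, check=True)

    # Incremental, encrypted snapshot, uploaded directly through the B2 API.
    subprocess.run(["restic", "-r", REPO, "backup", SOURCE], env=env, check=True)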

Encrypting your backup data and using an external service requires you to memorize some credentials. You need credentials to access your data at the external service itself, and you need credentials to decrypt your backup. Today’s best practice is to use a password manager with a completely random password for every service. As discussed earlier, an offsite backup protects your data from a total loss of your home. In that case it is very likely that you lose not only your house but also all of your personal computing devices. Therefore you must be able to restore your offsite backup without any of the personal computing devices in your home. If you use a hardware token to access your password manager – whether a service or an offline password manager like KeePass – you should have a backup hardware token stored somewhere outside of your home. Hint: storing the hardware token in the garage instead of your home may not be the best idea. Disasters like fire, earthquakes and floods are not limited to your house.

Storing the credentials to your backup somewhere else introduces a conflict with your OpSec. This tradeoff can only be made by yourself. If you manage the backup for multiple members of your family or friends, consider sharing the secrets required to restore the backup. Natural disasters threaten not only your data but also your life, and thus the capability to decrypt your backup. If you don’t want to trust a single person, you can use Shamir’s Secret Sharing to distribute parts of the secret to trusted persons. Shamir’s Secret Sharing allows the reconstruction of the secret only if a certain number of persons combine their parts of the secret. It’s like distributing your digital horcruxes.
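To illustrate the mechanism, here is a toy sketch of Shamir’s Secret Sharing in Python. It is only meant to show the idea; for real secrets use an audited implementation such as the ssss tool:

    import random

    # Toy Shamir's Secret Sharing over a prime field -- illustration only, not audited crypto.
    PRIME = 2**127 - 1  # a Mersenne prime, large enough for small integer secrets

    def split(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
        """Split `secret` into `shares` points; any `threshold` of them recover it."""
        # Random polynomial of degree threshold-1 with the secret as constant term.
        # (A real implementation would use the `secrets` module, not `random`.)
        coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
        return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
                for x in range(1, shares + 1)]

    def combine(points: list[tuple[int, int]]) -> int:
        """Recover the secret via Lagrange interpolation at x = 0."""
        secret = 0
        for xi, yi in points:
            num, den = 1, 1
            for xj, _ in points:
                if xj != xi:
                    num = num * -xj % PRIME
                    den = den * (xi - xj) % PRIME
            secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
        return secret

    shares = split(secret=424242, threshold=3, shares=5)
    print(combine(shares[:3]))  # any 3 of the 5 shares print 424242 again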

A common proverb in the IT operations community is “Nobody wants a backup, everyone wants a restore”. Since creating a backup is a dull, repetitive task, it is best automated. This reduces the likelihood that you forget – intentionally or not – to create a backup. You should also test restoring your backups. There are many stories like the GitLab outage around the internet: they had automated multiple backups, but the automation failed. When a restore was needed because a human error had deleted production data, they found their backups empty. To prevent this, regularly test a restore. This ensures that you know how to restore the data and that your backup process actually works. A second major takeaway from the GitLab post mortem is to use different backup mechanisms for your offsite and onsite backups. Two different backup mechanisms ensure that a bug or configuration error in one of the backup tools corrupts only one backup.
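A hedged sketch of what such an automated restore test might look like with restic follows; the repository name and the simple sanity check are assumptions, not a complete verification:

    import subprocess
    import tempfile
    from pathlib import Path

    # Credentials (B2_*, RESTIC_PASSWORD) are expected in the environment.
    REPO = "b2:my-backup-bucket:nas"   # hypothetical repository, see the sketch above

    def test_restore() -> None:
        """Restore the latest snapshot into a scratch directory and sanity-check it."""
        with tempfile.TemporaryDirectory() as scratch:
            subprocess.run(
                ["restic", "-r", REPO, "restore", "latest", "--target", scratch],
                check=True,
            )
            files = [p for p in Path(scratch).rglob("*") if p.is_file()]
            if not any(p.stat().st_size > 0 for p in files):
                raise RuntimeError("restore produced no non-empty files, backup may be broken")
            print(f"Restore test OK, {len(files)} files restored.")

    if __name__ == "__main__":
        test_restore()

restic also offers a check command that verifies the repository itself, but a periodic full restore additionally proves that you still know the procedure and have working credentials.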

Data synchronisation services like Dropbox, Google Drive or Qsync are used to synchronise data between multiple devices. Since they replicate the data onto multiple devices, some people use them as their backup tool, too. This works perfectly fine if you want to prevent data loss due to a hardware failure. But synchronisation itself can’t protect you from data loss due to human errors, software bugs or encrypting trojans. In all those cases your data synchronisation service will replicate the unwanted data change – deletion or encryption – in real time onto every device. You can use such a service as a backup if it supports a file version history with a restore functionality. This allows you to restore an old version in case of an unwanted data change. If your data synchronisation tool doesn’t support a file version history, it is not a proper backup tool.

My Implementation

My NAS in the basement is my central data sink. The NAS is backed up daily to a connected external USB harddrive. To create this backup I use the built-in QNAP backup tool. Once a day I also create a snapshot of the NAS with restic into my Backblaze B2 bucket. The backed up dataset on the NAS is roughly 1 TB in size, with a couple of hundred megabytes added per day. Restic uploads only the added and changed chunks, hence the daily backup finishes after about five minutes on average. The initial backup took four days with my 30 Mbit/s upload. I use Resilio Sync to sync the pictures taken by our smartphones to my NAS. As argued earlier, this synchronisation itself is not a backup, but since I back up the NAS, all my pictures are backed up automatically. My wife and I use free Dropbox accounts to share our daily documents like letters or forms. I synchronize my Dropbox account to the NAS via the QNAP built-in synchronization tool, so this synchronized copy is also backed up. To be able to access my backup without any of my computing devices, I placed a printed and sealed copy of the credentials required to access the backup at another location. I accepted this OpSec tradeoff since the backup contains all the pictures of my family. The data that is private to me is stored in encrypted data containers inside the global backup.
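For the curious, here is a sketch of how such a daily restic job towards B2 could be wired up, including a retention policy. The bucket name, paths and retention values are illustrative placeholders, not my exact configuration:

    import os
    import subprocess

    REPO = "b2:my-backup-bucket:nas"   # placeholder B2 repository
    SOURCE = "/share/Data"             # placeholder dataset on the NAS

    env = {**os.environ,
           "B2_ACCOUNT_ID": "...",     # placeholders; real values belong in a secret store
           "B2_ACCOUNT_KEY": "...",
           "RESTIC_PASSWORD": "..."}

    # Daily incremental snapshot; only new or changed chunks are uploaded.
    subprocess.run(["restic", "-r", REPO, "backup", SOURCE], env=env, check=True)

    # Thin out old snapshots so the repository does not grow without bound.
    subprocess.run(
        ["restic", "-r", REPO, "forget",
         "--keep-daily", "7", "--keep-weekly", "4", "--keep-monthly", "12",
         "--prune"],
        env=env, check=True,
    )

Scheduled once a day, for example via cron or the NAS task scheduler, a script like this keeps the offsite copy current without any manual steps.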