Backup

From Things and Stuff Wiki
Jump to: navigation, search

Things and Stuff Wiki - an organically evolving knowledge base personal wiki with a totally on-the-fly taxonomy containing topic outlines, descriptions and breadcrumbs, with links to sites, systems, software, manuals, organisations, people, articles, guides, slides, papers, books, comments, screencasts, webcasts, scratchpads and more. use the Table of Contents for navigation on longer pages. see About for further information. / et / em

out of date, confusing

General

See also Sharing, Cloud#Storage



"Delta based incrementals make sense for tape drives. You run a full backup once, then incremental deltas for every day. When enough time has passed since the full backup, you do a new full backup, and then future incrementals are based on that. Repeat forever."


Copying

rsync

  • rsync(1) - a fast, versatile, remote (and local) file-copying tool.
  • rsync is a software application and network protocol for Unix-like systems with ports to Windows that synchronizes files and directories from one location to another while minimizing data transfer by using delta encoding when appropriate. Quoting the official website: "rsync is a file transfer program for Unix systems. rsync uses the 'rsync algorithm' which provides a very fast method for bringing remote files into sync." An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction. rsync can copy or display directory contents and copy files, optionally using compression and recursion.


"Unfortunately “--sparse” and “--inplace” cannot be used together. Solution: When copying the file the first time, which means it does not exist on the target server use “rsync --sparse“. This will create a sparse file on the target server and copies only the used data of the sparse file. When the file already exists on the target server and you only want to update it use “rsync --inplace“. This will only transmit the changed blocks and can also append to the existing sparse file."



rsync [OPTION...] SRC... [DEST]
  # copy source file or directory to destination
rsync local-file user@remote-host:remote-file

rsync -e='ssh -p8023' file remotehost:~/
  # non-standard remote shell command, copy to remote users home directory
rsync -r --partial --progress srcdirectory destdirectory
  # recursive
  # --partial - resume partial files,
  # --progress - show progress during transfer, equiv to --info=flist2,name,progress

rsync -rP srcdirectory/ destdirectory
  # recursive, resume partial files with progress bar, don't copy root source folder
rsync -avh --inplace --no-whole-file SRC DEST

-a, --archive
  #  archive mode; equals -rlptgoD (no -A,-X,-H)
    # recursive, links, preserve permissions/times/groups/owner/device files.

-r, --recursive             recurse into directories
-l, --links                 copy symlinks as symlinks
-p, --perms                 preserve permissions
-t, --times                 preserve modification times
-g, --group                 preserve group
-o, --owner                 preserve owner (super-user only)
-D  --devices --specials.
    --devices               preserve device files (super-user only)
    --specials              preserve special files

-v, --verbose               list files transfered
-vv
-h, --human-readable        output numbers in a human-readable format

    --inplace               This option is useful for transferring large files with block-based changes or appended data, and also on systems that are disk bound, not network bound. It can also help keep a copy-on-write filesystem snapshot from diverging the entire contents of a file that only has minor changes.
    --no-whole-file         incremental delta-xfer
-A, --acls                  preserve ACLs (implies -p, save permissions)
-X, --xattrs                preserve extended attributes
-H, --hard-links            preserve hard links
--exclude-from=FILE     read exclude patterns from FILE
--sparse                handle sparse files efficiently
-W, --whole-file            copy files whole (w/o delta-xfer algorithm)
-d, --dirs                  transfer directories without recursing
    --numeric-ids           transfer numeric group and user IDs rather than mapping user and group name
-x, --one-file-system       don't cross filesystem boundaries

    --delete                delete extraneous files from dest dirs
    --delete-after          receiver deletes after transfer, not during
    --delete-excluded       also delete excluded files from dest dirs

    --ignore-errors         go ahead even when there are IO errors instead of regarding as fatal
    --stats                 give some file-transfer stats


rsync -aAXv --exclude={/dev/*,/proc/*,/sys/*,/tmp/*,/run/*,/mnt/*,/media/*,/lost+found} /* /path/to/backup/directory
  # archive with permissions/ACL and attributes, exclude directories


#!/bin/sh
if [ $# -lt 1 ]; then 
    echo "No destination defined. Usage: $0 destination" >&2
    exit 1
elif [ $# -gt 1 ]; then
    echo "Too many arguments. Usage: $0 destination" >&2
    exit 1
fi

if [ $2 = "new" ]; then
   SPARCEINPLACE="--sparse"
elif [ $2 = "rerun" ]; then
   SPARCEINPLACE="--inplace"
fi

RSYNC="ionice -c3 rsync"
RSYNC_ARGS="-aAXv --no-whole-file --stats --ignore-errors --human-readable --delete"
# -a = --recursive, --links, --perms, --times, --group, --owner, --devices, --specials
# -A = --acls, -X = --xattrs, -v = --verbose
# --no-whole-file = delta-xfer algorithm
# --delete = delete extraneous destination files
START=$(date +%s)
# SOURCES="/dir1 /dir2 /file3"
TARGET=$1

echo "Executing dry-run to see how many files must be transferred..."
TODO=$(${RSYNC} --dry-run ${RSYNC_ARGS} ${SOURCES} ${TARGET}|grep "^Number of files transferred"|awk '{print $5}')

${RSYNC} ${RSYNC_ARGS} ${SPARCEINPLACE} --exclude={/dev/*,/proc/*,/sys/*,/tmp/*,/run/*,/mnt/*,/media/*,/lost+found} \
--exclude={/home/*/.gvfs,/home/*/.thumbnails,/home/*/.cache,/home/*/.cache/mozilla/*,/home/*/.cache/chromium/*} \
--exclude={/home/*/.local/share/Trash/,/home/*/.macromedia,/var/lib/mpd/*,/var/lib/pacman/sync/*, /var/tmp*} \
 / $TARGET | pv -l -e -p -s "$TODO"

FINISH=$(date +%s)
echo "-- Total backup time: $(( ($FINISH-$START) / 60 ))m, $(( ($FINISH-$START) % 60 ))s"
touch $1/backup-from-$(date '+%a_%d_%m_%Y_%H%M%S')
# after refining excluded, --delete-excluded
0 5 * * * rsync-backup.sh /path/to/backup/directory rerun


Daemon

rsync --daemon
  # run as a daemon
lsyncd

Services

Versioning with rsync

rsnapshot

  • rsnapshot - Local filesystem snapshots are handled with rsync. Secure remote connections are handled with rsync over ssh, while anonymous rsync connections simply use an rsync server. Both remote and local transfers depend on rsync. rsnapshot saves much more disk space than you might imagine. The amount of space required is roughly the size of one full backup, plus a copy of each additional file that is changed. rsnapshot makes extensive use of hard links, so if the file doesn't change, the next snapshot is simply a hard link to the exact same file.

Grsync

Arno's SmartBackup Script

luckyBackup

  • luckyBackup is an application that backs-up and/or synchronizes any directories with the power of rsync. It is simple to use, fast (transfers over only changes made and not all data), safe (keeps your data safe by checking all declared directories before proceeding in any data manipulation ), reliable and fully customizable

oldtime

syncbackup

gutbackup

zsync

  • zsync is a file transfer program. It allows you to download a file from a remote server, where you have a copy of an older version of the file on your computer already. zsync downloads only the new parts of the file. It uses the same algorithm as rsync. However, where rsync is designed for synchronising data from one computer to another within an organisation, zsync is designed for file distribution, with one file on a server to be distributed to thousands of downloaders. zsync requires no special server software — just a web server to host the files — and imposes no extra load on the server, making it ideal for large scale file distribution.

Psync

rclone


librsync

rdiff-backup

  • rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync. Thus you can use rdiff-backup and ssh to securely back a hard drive up to a remote location, and only the differences will be transmitted. Finally, rdiff-backup is easy to use and settings have sensical defaults.


  • rdiffWeb is a web interface for browsing and restoring from rdiff-backup repositories. It is written in Python and is distributed under the GPL license.



rbackup

duply (simple duplicity)

Duplicity

  • Duplicity backs directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server.


DejaDup

  • Déjà Dup (day-ja-doop) is a simple backup tool. It hides the complexity of doing backups the Right Way (encrypted, off-site, and regular) and uses duplicity as the backend.

Duplicati

  • Duplicati is a free backup client that securely stores encrypted, incremental, compressed backups on cloud storage services and remote file servers. It works with Amazon S3, Windows Live SkyDrive, Google Drive (Google Docs), Rackspace Cloud Files or WebDAV, SSH, FTP (and many more). Duplicati has built-in AES-256 encryption and backups can be signed using GNU Privacy Guard. A built-in scheduler makes sure that backups are always up-to-date. Last but not least, Duplicati provides various options and tweaks like filters, deletion rules, transfer and bandwidth options to run backups for specific purposes.

Areca Backup

BackupPC

AMANDA

"the Amanda planner runs on the server to decide exactly how to go about backing things up. It, too, contacts each Amanda client and requests an estimate of the size of full and incremental dumps for each DLE. It then does some complex planning based on the history of each DLE, the estimated sizes, the available storage space, and a number of tweakable parameters to decide what to back up. This often confuses newcomers, who have control issues and want to tell Amanda when to do full backups and when to do incrementals. The planner is one of Amanda's strengths! Don't fight it!"

Bacula

Backup Ninja

  • Backupninja allows you to coordinate system backup by dropping a few simple configuration files into /etc/backup.d/. Most programs you might use for making backups don't have their own configuration file format. Backupninja provides a centralized way to configure and schedule many different backup utilities. It allows for secure, remote, incremental filesytem backup (via rdiff-backup), compressed incremental data, backup system and hardware info, encrypted remote backups (via duplicity), safe backup of MySQL/PostgreSQL databases, subversion or trac repositories, burn CD/DVDs or create ISOs, incremental rsync with hardlinking.

DAR

  • Disk ARchive is a shell command that backs up directory trees and files, taking care of hard links, Extended Attributes, sparse files, MacOS's file forks, any inode type (including Solaris Door inodes), etc.

backup2l

  • backup2l - low-maintenance backup/restore tool. backup2l is a lightweight command line tool for generating, maintaining and restoring backups on a mountable file system (e. g. hard disk). The main design goals are are low maintenance effort, efficiency, transparency and robustness. In a default installation, backups are created autonomously by a cron script. supports hierarchical differential backups with a user-specified number of levels and backups per level. With this scheme, the total number of archives that have to be stored only increases logarithmically with the number of differential backups since the last full backup. Hence, small incremental backups can be generated at short intervals while time- and space-consuming full backups are only sparsely needed.

Obnam

  • Obnam is an easy, secure backup program. Snapshot backups. Every generation looks like a complete snapshot, so you don't need to care about full versus incremental backups, or rotate real or virtual tapes. Data de-duplication, across files, and backup generations. If the backup repository already contains a particular chunk of data, it will be re-used, even if it was in another file in an older backup generation. This way, you don't need to worry about moving around large files, or modifying them. Encrypted backups, using GnuPG.

Box Backup

ZBackup

urbackup

File and image backups are made while the system is running without interrupting current processes. UrBackup also continously watches folders you want backed up, in oder to quickly find differences to previous backups. Thus incremental file backups are really fast. Your files can be restored through the web interface or the Windows Explorer while the backups of drive volumes can be restored with a bootable CD or USB-Stick (bare metal restore).

Bup

  • https://github.com/bup/bup - Very efficient backup system based on the git packfile format, providing fast incremental saves and global deduplication (among and within files, including virtual machine images). Current release is 0.29, and the development branch is master.

Doesn't remove large deleted files from archive?

burp

  • Burp is a network backup and restore program. It uses librsync in order to save network traffic and to save on the amount of space that is used by each backup. It also uses VSS (Volume Shadow Copy Service) to make snapshots when backing up Windows computers.

ddar

ZPAQ

SNEBU

Attic

BorgBackup

  • https://github.com/borgbackup/borg - fork of Attic, a deduplicating backup program. Optionally, it supports compression and authenticated encryption. The main goal of Borg is to provide an efficient and secure way to backup data. The data deduplication technique used makes Borg suitable for daily backups since only changes are stored. The authenticated encryption technique makes it suitable for backups to not fully trusted targets. Fork of Attic.
usage: borg create [-h] [-v] [--debug] [--lock-wait N] [--show-rc]
                  [--no-files-cache] [--umask M] [--remote-path PATH] [-s]
                  [-p] [--filter STATUSCHARS] [-e PATTERN]
                  [--exclude-from EXCLUDEFILE] [--exclude-caches]
                  [--exclude-if-present FILENAME] [--keep-tag-files]
                  [-c SECONDS] [-x] [--numeric-owner]
                  [--timestamp yyyy-mm-ddThh:mm:ss]
                  [--chunker-params CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE]
                  [-C COMPRESSION] [--read-special] [-n]
                  ARCHIVE PATH [PATH ...]

  # Create new archive
# Backup ~/Documents into an archive named "my-documents"
$ borg create /mnt/backup::my-documents ~/Documents
 
# Backup ~/Documents and ~/src but exclude pyc files
$ borg create /mnt/backup::my-files   \
    ~/Documents                       \
    ~/src                             \
    --exclude '*.pyc'

# Backup the root filesystem into an archive named "root-YYYY-MM-DD"
# use zlib compression (good, but slow) - default is no compression
NAME="root-`date +%Y-%m-%d`"
$ borg create -C zlib,6 /mnt/backup::$NAME / --do-not-cross-mountpoints
-e PATTERN, --exclude PATTERN
                       exclude paths matching PATTERN
 --exclude-from EXCLUDEFILE
                       read exclude patterns from EXCLUDEFILE, one per line
 --exclude-caches      exclude directories that contain a CACHEDIR.TAG file
                       (http://www.brynosaurus.com/cachedir/spec.html)
 --exclude-if-present FILENAME
                       exclude directories that contain the specified file

-x, --one-file-system
                       stay in same file system, do not cross mount points

Rclone

To sort

Btrfs

See also *nix#Btrfs.

Using btrfs snapshots instead of cp -al has two major advantages. First of all creating a snapshot is much faster than using hardlinks. and the second advantage is, that meta-information about the file will be preserved (ownership, access and modification-time and also file-attributes). When using hardlinks this information will have the state of the most recent backup-process (also for older backups). Last but not least, if you use a new version of btrfs you can also lock the snapshots down to be read-only.

"Of course, the problem with this is that snapshots are, essentially, COW hard links; this means that if there's a corruption on the disk for a file, it'll affect all child snapshots."

"Rsync integration. Now that we have code to efficiently find newly updated files, we need to tie it into tools such as rsync and dirvish. (For bonus points, we can even tell rsync _which blocks_ inside a file have changed. Would need to work with the rsync developers on that one.)"

basic snapshot management

from+to btrfs

rsync based

  • snap - bash, 3 months ago
  • btrbackup - bash, moderatly complex, 3 months ago


Imaging

Other

FSArchiver