dbackup
Status: stable
Download: dbackup (11 KB)
While searching for a backup tool, I didn't find anything between partition dumping and a plain tar invocation. I ended up doing a simple cp -auxf to back up the major parts of my system (/etc, /home, /var) to a trusted hard disk devoted to this task. However, this was not satisfying.
Addendum: rsync and/or unison are actually killer tools that should do better than dbackup.
The problem
Every day the backup was overwritten with new data. As a result, the backup contained the whole history, since files were never deleted from it. And of course you couldn't restore data from the backup properly: I once did it naïvely with a simple reverse copy, and I actually reinjected 6 months' worth of mail spool that was still sitting in the backup. The SMTP server became frantic and poured over 100 mails/sec onto the LAN :).
I can't afford to keep multiple full copies of my backup. The minimum to make things safe is to keep the last two backups and overwrite them cyclically. But then you lose any content that was deleted more than two days ago. I need something tailored to my modest data and modest hard drives.
The solution
I need 1) a 'backup repository' which can be synced exactly with my data, and 2) a lengthy history of the data that was once on my hard disk. I wrote a Perl script that does this job. Let me define some terms:
- Source: the data you want to back up. Sometimes referred to as the sources, since it is a list of files and folders.
- Repository: the backup copy of your sources, an exact replica which is synced on demand.
- Archive: where data that is in the backup repository but no longer in your sources ends up.
dbackup -a /mnt/safe/homes-arc /home /root /mnt/safe/homes
This invocation will back up the sources '/home' and '/root' into the repository '/mnt/safe/homes'. When you sync the repository, any data which is in the repository but no longer among the sources is automatically moved to the archive '/mnt/safe/homes-arc' (the '-a archive' part is a dbackup option, see below). The tree structure is replicated in the archive, and files are renamed if necessary (see the sketch after this list):
- if an extension is detected, 'file.ext' will be renamed 'file_xxx.ext' (where 'xxx' is an increasing number starting from 1)
- otherwise '_xxx' is simply appended to the file name
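This naming scheme is easy to reproduce; here is a minimal Perl sketch (none of it is dbackup's actual code, archive_name() is a hypothetical helper, and the example path is made up):

use strict;
use warnings;
use File::Basename qw(fileparse);

# Pick a free name in the archive by appending an increasing counter,
# mimicking the 'file.ext' -> 'file_xxx.ext' rule described above.
sub archive_name {
    my ($path) = @_;
    return $path unless -e $path;
    my ($name, $dir, $ext) = fileparse($path, qr/\.[^.\/]+$/);
    my $n = 1;
    $n++ while -e "$dir${name}_$n$ext";
    return "$dir${name}_$n$ext";
}

print archive_name('/mnt/safe/homes-arc/home/user/notes.txt'), "\n";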
You end up with this directory layout:
/mnt/safe/homes
    home
        (replication of /home content)
    root
        (replication of /root content)
/mnt/safe/homes-arc
    home
        (archive/history of /home content)
    root
        (archive/history of /root content)
You can omit the '-a' (archive) option and work the old way, where the repository is a cumulative copy of your sources. You can also use '-d' to force an exact replication, meaning that it will delete repository items which are no longer among your sources (warning there!). You can also specify files as sources. Use '--help' for more information.
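Assuming the flags keep the argument order of the example above (options first, then the sources, then the repository last), the three modes look like this:

Archive mode (data removed from the sources is moved to the archive):
    dbackup -a /mnt/safe/homes-arc /home /root /mnt/safe/homes
Exact replication (data removed from the sources is deleted from the repository):
    dbackup -d /home /root /mnt/safe/homes
Cumulative copy, the old way (nothing is ever removed from the repository):
    dbackup /home /root /mnt/safe/homes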
How it works
This is a recursive program; the following algorithm is applied to every source path (a code sketch follows the list):
- Stat the nodes from the source and the repository using the same current relative path, i.e.:
    source / current_relative_path / *
    repository / current_relative_path / *
- Then, for every node:
    - If the node is a folder, recurse with a new relative path (relative_path / node /)
    - If the node exists in the source, but not in the repository: copy (create)
    - If the node exists in the repository, but not in the source: archive
    - If the node exists both in the repository and the source:
        - If the node type changed (file/folder/socket/pipe/...):
            - first archive the repository node
            - then copy the source node to the repository
        - Else, if the node is more recent in the source: copy (update)
        - Else, the node is up to date: ignore
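To make these decisions concrete, here is a minimal Perl sketch (dbackup is a Perl script, but none of the code below comes from it; it only prints the decision for each node instead of copying or archiving anything, and the paths in the final call are just an example):

use strict;
use warnings;
use File::Spec;

# Classify a node; assumes the node exists (lstat-based, symlink-aware).
sub node_type {
    my ($p) = @_;
    return (-l $p) ? 'link' : (-d _) ? 'folder' : (-f _) ? 'file' : 'other';
}

# Walk the source and the repository side by side and print the decision
# for each node, following the rules listed above.
sub sync_dir {
    my ($src_root, $rep_root, $rel) = @_;
    my $src_dir = File::Spec->catdir($src_root, $rel);
    my $rep_dir = File::Spec->catdir($rep_root, $rel);

    my %names;
    for my $dir ($src_dir, $rep_dir) {
        opendir(my $dh, $dir) or next;
        $names{$_} = 1 for grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh;
    }

    for my $name (sort keys %names) {
        my $src = File::Spec->catfile($src_dir, $name);
        my $rep = File::Spec->catfile($rep_dir, $name);
        my $in_src = -l $src || -e $src;
        my $in_rep = -l $rep || -e $rep;

        # Folders are walked recursively with a new relative path.
        if ($in_src && -d $src && !-l $src) {
            sync_dir($src_root, $rep_root, File::Spec->catdir($rel, $name));
        }

        if    ($in_src && !$in_rep) { print "create  $rel/$name\n" }  # copy (create)
        elsif (!$in_src && $in_rep) { print "archive $rel/$name\n" }  # move to the archive
        elsif (node_type($src) ne node_type($rep)) {
            print "replace $rel/$name\n";                             # archive, then copy
        }
        elsif ((lstat $src)[9] > (lstat $rep)[9]) {
            print "update  $rel/$name\n";                             # copy (update)
        }
        else { print "ignore  $rel/$name\n" }                         # up to date
    }
}

sync_dir('/home', '/mnt/safe/homes/home', '.');

The real script obviously performs the copy and archive operations instead of printing, and also handles the special node types mentioned above.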
If you use the '-df' flag, you'll see letters explaining the decision for each node. Please use '--help' for more information.
Self documentation
Todo
- Pluggable source-side or repository-side file access methods: i.e. make do_command() and stat_node() versions that support invocations across an FTP or SSH session, for remote backups (a rough sketch of the idea follows this list).
- Testing! As of today, it has been running pretty well on this machine as a daily backup tool for 8238 days.
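For illustration, the pluggable idea could boil down to choosing a backend per location. Everything below is hypothetical (the 'ssh://' prefix, the backend table, the function bodies); it only shows the dispatch, not a working remote implementation:

use strict;
use warnings;

# Hypothetical backend table: the stat_node() implementation is picked
# from the way a location is written. None of this is dbackup's code.
my %backends = (
    local => sub { my ($path) = @_; return [ (lstat $path)[9, 7] ] },   # mtime, size
    ssh   => sub {
        my ($path) = @_;                      # e.g. 'ssh://host/home'
        my ($host, $remote) = $path =~ m{^ssh://([^/]+)(/.*)$} or return;
        my $out = `ssh $host stat --format='%Y %s' $remote 2>/dev/null`;
        return unless $out;
        return [ split ' ', $out ];           # mtime, size (GNU stat)
    },
);

sub stat_node {
    my ($path) = @_;
    my $kind = $path =~ m{^ssh://} ? 'ssh' : 'local';
    return $backends{$kind}->($path);
}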