Statistic: 90% of all companies that suffer catastrophic data loss (a disk crash) are out of business within one year.
Backups are made for the purpose of rebuilding a system
that is identical to the current one.
Backups are thus for recovery,
not transferring of data to another system.
They do not need to be portable.
In this sense backup
is used to mean a complete backup of an
entire system:
not just regular files but all owner, group, date, and permission
information, for all files, links, /dev entries, some
/proc entries, etc.
Archives are for transferring data to other systems, or making copies of files for historical or legal purposes. As such they should be portable so that they may be recovered on new systems when the original systems are no longer available. For example it should be possible for an archive of the files on a Solaris Unix system to be restored on an AIX Unix, or even a Linux system. (Within limits this portability should extend to Windows and Macintosh systems as well.)
Most of the time the two terms are used interchangeably.
(In fact the above definitions are not universally agreed
upon!)
In the rest of this document the term backup
will be used
to mean either a backup or an archive as defined above.
Most real-world situations call for archives, since the other objects
(such as /dev entries) rarely if ever change on a
production server once the system is installed and configured.
A single true backup
is usually sufficient.
For home users, the original system CDs often serve as
the only backup needed; all other backups are of modified files only
and hence are archives.
Using RAID is not a replacement for regular backups! (Imagine a disaster such as a fire on your site, an accidentally deleted file, or corrupted data.)
Creating backup policies (includes several sub-policies, discussed below) can be difficult. Keep in mind the requirements of the organization, often specified in an SLA (or service level agreement). Make sure users/customers are aware of what gets backed up and what doesn't, how to request a restore, and how long it might take to restore different data from various problems (very old data, a fire destroys the hardware, a DB record accidentally deleted from yesterday, ...).
Most people underestimate how slow a restore operation can be. Restoring a file often takes 10–20 times longer than backing it up. (One reason: Operating systems are usually optimized for read operations, not write operations.)
You must also worry about security of your backups. Have a clear policy on who is allowed to request a restore and how to do so, or else one user might request a restore of another user's files. In some cases this may be allowed, say by a manager or auditor. (In a small organization where everyone knows everyone, this is not likely to be a problem.)
Customers should be able to recover any file version (with a granularity of a day) from the past 6 months, and any file version (with a granularity of a month) from the past 3 years. Disk failure should result in no more than 4 hours of down-time, with at worst 2 business days of data lost. Archives will be full backups each quarter and kept forever, with old data copied to new medium when the older technology is no longer supported. Critical data will be kept on a separate system with hourly snapshots between 7:00 AM and 7:00 PM, with a midnight snapshot made and kept for a week. Users have access to these snapshots.
(Database and financial data have a different SLA for compliance reasons.)
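It is possible to back up only a portion of the files (and other objects in the case of a backup) on your systems. In fact there are three types of backups (or archives):
full (also called epoch or complete): everything gets backed up.
incremental: only what has changed since the most recent backup of any type gets backed up.
differential: only what has changed since the last full (or lower-level) backup gets backed up.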
(Not everyone distinguishes between incremental and differential backups, but they are different so you must make sure anyone you speak with is using the same definitions as you are.)
A system administrator must choose a backup strategy (a combination of types) based on several factors. The factors to consider are safety, required down time, cost, convenience, and speed of backups and recovery. These factors vary in importance for different situations. Common strategies include full backups with incremental backups in-between, and full backups with differential backups in-between (a two-level differential). Sometimes a three level differential is used but rarely more levels. (You never use both incremental and differential backups as part of a single strategy.) The strategy of using only full backups is rarely used.
With modern backup software, the differences between the strategies mentioned above aren't that large. Incrementals take less time to back up and more time to restore (since several different backup media may be needed), compared with differential backups (where at most two media, the last differential and the last full backup media, are used to recover a file). Full backups take a huge amount of time to make, but recovery is fast (only a single medium must be read). Note most commercial software keeps a special directory file that is reset for each full backup, and keeps track of which incremental tape (or other media) holds which files. This file is read from the last incremental tape during a restore, to determine exactly which tape to use to recover some file.
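The restore-time difference can be made concrete with a tiny sketch. It assumes one backup per day and one medium per backup, and counts the worst case (say, restoring a whole filesystem without the directory file described above). The function name media_needed is made up for this illustration.

```shell
#!/bin/sh
# media_needed STRATEGY DAYS_SINCE_FULL
# Prints how many backup media must be read for a worst-case restore
# DAYS_SINCE_FULL days after the last full backup.
# Assumptions (not from any real tool): one backup per day, one medium each.
media_needed() {
    strategy=$1
    days=$2
    case $strategy in
        full)
            echo 1 ;;                       # only the latest full backup
        incremental)
            echo $((days + 1)) ;;           # the full plus every incremental since
        differential)
            # the full plus the most recent differential (if any was made)
            if [ "$days" -eq 0 ]; then echo 1; else echo 2; fi ;;
        *)
            echo "unknown strategy: $strategy" >&2
            return 1 ;;
    esac
}

media_needed incremental 5     # prints 6
media_needed differential 5    # prints 2
```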
The frequency of backups (the backup schedule) is another part of the policy. In some cases it is reasonable to have full backups daily and incremental backups several times a day. In other cases a full backup once a year with weekly or monthly incremental backups could be appropriate. A common strategy suitable for most corporate environments would be monthly full backups and daily differential backups. (Another example might be quarterly full (differential level 0) backups, with monthly level 1 differentials, and daily level 2 differentials.) However more frequent full backups may save tapes (as the incremental backups near the end of the cycle may be too large for a single tape).
Note that in some cases there will be legal requirements for backups at certain intervals (e.g., the SEC for financial industries, the FBI for defense industries, or regulations for medical/personal data). Depending on your backup software, it may be required to bring the system partially or completely off-line during the backup process. Thus there is a trade-off among convenience, cost, and the safety of more frequent backups.
In a large organization it may not be possible to perform a full backup on all systems on the same weekend. A staggered schedule is needed, where (say) 1/4 of the servers get backed up on the first Sunday of the month, 1/4 the second Sunday, and so on. Each server is still being backed up monthly but not all on the same day of the month.
Ideally you don't want to have a single backup require more than one tape or whatever medium you're using. Having to change tapes makes backup and recovery slower, and may make automatic backups impossible (if someone has to manually change tapes).
Be aware that small changes to the schedule can result in dramatic changes in the amount of backup media needed. For example suppose you have 4 GB to backup within this SLA: full backup every 4 weeks (28 days) and differential backups between. Now assume the differential backup grows 5% per day for the first 20 days (80% has changed) and stays the same size thereafter. Some math reveals that doing full backups each week (which still meets the SLA) will use a third the amount of tape of a 28 day cycle, in this case.
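A rough way to experiment with this kind of arithmetic is sketched below. The model is an assumption made for illustration: a 4000 MB full backup, and a differential taken d days after the full that is 5% × d of the full size, capped at 80%. The function name media_per_28_days is hypothetical.

```shell
#!/bin/sh
# media_per_28_days CYCLE_DAYS
# Prints the MB written in 28 days when a full backup is made every
# CYCLE_DAYS days and a differential on every other day.
# Assumed model: full = 4000 MB; differential on day d = 5% * d of the
# full size, capped at 80% of the full size.
media_per_28_days() {
    cycle=$1
    full_mb=4000
    growth_pct=5
    cap_pct=80
    total_pct=0
    d=1
    while [ "$d" -lt "$cycle" ]; do
        pct=$((d * growth_pct))
        if [ "$pct" -gt "$cap_pct" ]; then pct=$cap_pct; fi
        total_pct=$((total_pct + pct))
        d=$((d + 1))
    done
    per_cycle=$((full_mb + full_mb * total_pct / 100))
    echo $((per_cycle * 28 / cycle))
}

media_per_28_days 28    # one full backup per 28 days
media_per_28_days 7     # one full backup per week
```

Under this simplified model the weekly cycle writes far less data than the 28-day cycle. The exact saving depends heavily on how the growth of the differentials is modeled, so treat the output as illustrative only.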
Good schedules minimize backup and recovery times, minimize the amount of backup media required, and still meet the SLAs. Such schedules require a lot of complex calculation to work out. Modern backup software (such as Amanda) allows one to specify the SLA and will create a schedule automatically. A dynamic schedule is adjusted automatically to optimize the backups, depending on how much data is actually copied for each backup. Such software will simply inform the SA when to change the tapes in a jukebox.
On a busy (e.g., database) server downtime will be the most critical factor. In such cases consider using LVM snapshots, which very quickly make a read-only copy of some logical volume using very little extra disk space. You can then back up the snapshot while the rest of the system remains up.
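A snapshot-based backup might look roughly like the following sketch. It only prints the commands rather than running them, since they need root and a real volume group. The volume group, logical volume, mount point, archive path, and the 1G snapshot size are all hypothetical; the snapshot only needs enough space to hold the changes made while the backup runs.

```shell
#!/bin/sh
# snapshot_backup_plan VG LV MOUNTPOINT ARCHIVE
# Prints (does not run) the commands to back up a logical volume from a
# temporary LVM snapshot. Every name passed in is an assumption.
snapshot_backup_plan() {
    vg=$1
    lv=$2
    mnt=$3
    archive=$4
    snap="${lv}-snap"
    cat <<EOF
lvcreate --size 1G --snapshot --name $snap /dev/$vg/$lv
mount -o ro /dev/$vg/$snap $mnt
tar -czf $archive -C $mnt .
umount $mnt
lvremove -f /dev/$vg/$snap
EOF
}

snapshot_backup_plan vg0 data /mnt/snap /backup/data.tar.gz
```

Review the printed commands and run them as root by hand (or pipe them to sh) once the names match your system.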
Another strategy is called disk-to-disk-to-tape, in which the data to be backed up is quickly copied to another disk and then written to the slower backup medium later.
Deciding what to backup is part of your policy too. Are you responsible for backing up the servers only? All partitions, some partitions, some directories, or just a few selected files? The Boss' workstation? All workstations? (The users need to know!) Network devices (e.g., routers and switches)? It may be appropriate to use a different backup strategy for user workstations than for servers, for different servers, or even different partitions/directories/files of servers.
Another part of your backup policy is determining how long to keep old backups around. This is called the backup retention policy. In many cases it is appropriate to retain the full backups indefinitely. In some cases backups should be kept for 7 to 15 years (in case of legal action or an IRS audit). Such records are often useful for more than disaster recovery. You may discover your system was compromised months after the break-in. You may need to examine old files when investigating an employee. You may need to recover an older version of your company's software. Such records can help if legal action (either by your company or by someone else suing your company) occurs.
Since the Enron and Microsoft scandals (when corporate officers had
emails subpoenaed by the
DoJ),
a common new policy became: if it doesn't exist it can't be
subpoenaed!
These events led to a revision of the FRCP (the Federal Rules of Civil Procedure).
The FRCP include rules for handling
ESI
(Electronically Stored Information) when
legal action (e.g., a lawsuit) is imminent or already underway.
You must suspend normal data destruction activities (such as reusing
backup media), possibly make snapshot
backups of various
workstations, log files, and other ESI, classify the
ESI as easy or hard to produce,
estimate the cost to produce the hard ESI (which the other
party must pay), work out a discovery
(of evidence) plan,
and actually produce the ESI in a legally acceptable
manner.
An SA should consult the corporate lawyers well in
advance to work out the procedures.
It is important to decide where to store the backup media (the storage policy). These tapes or CDs contain valuable information and must be secured. Also it makes no sense to store media in the same room as the server the backup was made from; if something nasty happens to the server such as theft, vandalism, fire, etc., then you lose your backups too. A company should store backup media in a secure location, preferably off-site. A bank safe-deposit box is usually less than $50 a year and is a good location to store backup media. If on-site storage is desirable, consider a fire-proof safe. (And keep the door shut all the time!) Consider remote storage companies but beware of bandwidth and security issues.
Backup media will not last forever. Considering how vital the backups might be, it is a false economy to buy cheap tape or reuse the same media over and over. A reasonable media replacement policy (also known as the media rotation schedule) is to use a new tape once for a full backup, then use it 12 times for differential or 31 times for incremental backups, then toss it. Before using new media for the first time, test it.
For security reasons you should completely erase the media before throwing them in the trash. (This is harder than you think!) An alternative is to shred or burn old media, and/or to encrypt backups as they are made.
There are too many choices to count today. For smaller archives flash disks, writable CD-ROMs, writable DVDs, (these are WORM media) and old fashioned DLT, DAT, DDS-{2,4,8,16} tape drives are popular. (I used a DDS-2 SCSI drive at home for many years.) Today (2010) consider LTO2 drives.
Tape storage is very cheap, typically less than $20 for 80 gigabytes of storage. (DDS-2 tapes cost about $7 and hold 4 GB each. DDS-4 are fast backups and hold ~100 GB each.) However tapes and other magnetic media can be affected by strong electrical and magnetic fields, heat, humidity, etc.
An external hard drive (less than $100 for 1TB) connected directly to your PC can use the backup program that came with your operating system (Backup and Restore Center on Windows, and Time Machine on OS X). Most backup software can automate backups of all new files or changed ones on a regular basis. This is a simple option if you only have one PC.
Optical media such as CDs are more durable and fairly cheap but take longer to write. They can be reused less often than magnetic media, and are still susceptible to heat and humidity. Optical media can scratch if not carefully handled. Also consider the bulk of the media. If you must store seven years' worth of backups, it may be important to minimize the storage requirements and expense. A CD-ROM can hold about 700 MiB while a dual-layer Blu-ray can hold 50 GiB.
A choice becoming popular (since 2008) is on-line storage,
e.g., HP Upline,
Google GDrive, etc.
(For SOHO you can
use Mozy or BackBlaze).
This is a market that is growing and changing rapidly, so you need
to do your own research on available choices (as the list above is
likely out of date).
These companies offer cheap data storage and complete system
backups, provided you have a fast Internet connection.
Many colocation facilities (network exchange points) provide this service as
well to the connected ISPs.
Whether or not to trust the Internet and some outsourced company
with your vital business data is a decision you will need to
make.
If you go this route make sure all the data is encrypted using
industry standard encryption at your site before transmission
across the Internet.
(Never use any company that uses proprietary
encryption
regardless of how secure they claim it is!)
When backing up large transaction database files, the speed of the media transfer is important. For instance, a 6 Mbps (Megabits per second) tape drive unit will backup 10 gigabytes in about 3 hours and 45 minutes. (In most cases incremental or differential backups contain much less data!)
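The estimate above is easy to reproduce. This sketch assumes decimal units (1 GB = 1000 MB, 1 byte = 8 bits) and ignores any protocol or drive overhead. The function name transfer_time is made up for the example.

```shell
#!/bin/sh
# transfer_time SIZE_MB RATE_MBPS
# Prints the hours and minutes needed to move SIZE_MB megabytes at
# RATE_MBPS megabits per second (decimal units, no overhead assumed).
transfer_time() {
    size_mb=$1
    rate_mbps=$2
    seconds=$((size_mb * 8 / rate_mbps))
    printf '%dh %02dm\n' $((seconds / 3600)) $((seconds % 3600 / 60))
}

transfer_time 10000 6    # 10 GB at 6 Mbps: prints 3h 42m
```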
For IDE controllers your only choice is a TRAVAN backup drive. These are very slow; don't use them! For SCSI drives (such as DDS drives from HP) there are two speeds for the SCSI controller, depending on what devices are on it. A tape drive will slow down the SCSI bus by half, so consider dual SCSI controllers.
For networks, consider a networked backup unit. This would allow a single backup system to be used with many different computers. Thus you can buy one high-speed device for about the same money as several lower-speed devices. Keep in mind however that a network backup can bring a standard Ethernet network to its knees. (The network only shares 10 Mbps for all users on a SOHO or wireless LAN.) Even a Fast Ethernet (100 Mbps) LAN might suffer noticeable delays and problems.
An excellent choice for single-system backup is a USB disk. Also using SAN/NAS to centralize your storage makes it easy to use a single backup system (robot tapes).
It is a good idea to have a spare media drive (e.g., DLT tape drive), in case the one built into a computer fails when the computer fails. This is especially true for non-standard backup devices that may not be available from Circuit City on a moment's notice. Regularly clean and maintain (and test) your backup drives. (While I don't know of any organization that does this, consider copying old data to new hardware once the old drives are no longer supported or available. If you don't have a working drive, old backups are useless!)
Suppose a medium to large organization uses 8 backup tapes a day, 6 days a week; that means 48 tapes a week. If your retention policy is to keep 6 months' worth of incrementals, that's 1,248 tapes needed. High capacity quality tapes might go for $60 each, so you would need $74,880.00. In the second part of the year you only need new tapes for full backups, an additional 260 tapes say, for $15,600, or more than $90k for the first year. (Not counting spares or the cost of drive units.) Changes to the policies can result in expense differences of over $1,000 per month!
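To see how sensitive the cost is to the policy, the first half-year figure can be recomputed with different inputs. A minimal sketch (the function name tape_cost is hypothetical; prices are whole dollars):

```shell
#!/bin/sh
# tape_cost TAPES_PER_DAY DAYS_PER_WEEK WEEKS PRICE_DOLLARS
# Prints the number of tapes needed and their total cost.
tape_cost() {
    tapes=$(( $1 * $2 * $3 ))
    echo "$tapes tapes, \$$((tapes * $4))"
}

tape_cost 8 6 26 60    # the example above: prints 1248 tapes, $74880
tape_cost 6 6 26 60    # two fewer tapes a day: prints 936 tapes, $56160
```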
As backup technology changes over the years, it is important to keep old drives around so as to read old backup tapes when needed. You should keep old drives around long enough to cover your data retention policies. Try to avoid upgrading your backup technology (drives, tapes, software) every few years, or you'll end up with many different and incompatible backup tapes.
Archives are easier to make than backups, so most tools perform
archives.
A tool cannot make a backup
without knowing the underlying
filesystem intimately, i.e. it must parse the filesystem on
disk.
The reason is twofold:
GNU tar is a
popular tool for archiving the user's view of files.
Another standard (and free) choice is cpio.
Note neither tool is standardized by POSIX.
A new standard tool, based on both (and hopefully better than
either) is pax.
These, combined with find and some compression program
(such as gzip or bzip2) are used to easily
make archives.
You can ask find to locate all files modified since a
certain date, and add them to a compressed tar archive
created on a mounted backup tape drive.
A backup shell script can be written, so you don't end up attempting
to backup /dev, /sys, or /proc
files.
(Note! Unix tar ≠ GNU
tar; use the GNU version.)
If you want to store the kernel's view of files, along with all of
the semantics the filesystem provides, and none of the
non-filesystem objects that might appear to inhabit the filesystem
(such as /proc entries), use the filesystem's native
dump (and restore) programs provided by
your vendor specifically for that purpose, for your filesystem type
(note for Reiser4Fs you can just use
star).
dump uses /etc/dumpdates to track dump
levels.
Some of the differences between dump (for backups) and
tar or cpio for archives are:
dump is not confused by object types that the
particular operating system has defined as extensions to the
standard filesystem;
it also does not attempt to archive objects that do not actually
reside on the filesystem, e.g. doors and
sockets.
Consider what GNU tar does to
UNIX-domain sockets:
it archives them as named pipes.
They are not on the filesystem, so really they should not be
archived at all.
dump handles this situation correctly.

tar ignores ACLs, while a native
dump program will correctly archive them.
(A new extensible backup format known as pax will archive
ACLs, SELinux labels, and other meta-data stored in
extended attributes.
A tool called star uses a version
of this format.
Oddly the pax utility doesn't seem to backup
extended attributes correctly.
Find out about star on the web.)

tar cannot detect reliably where file
holes are.
dump is not confused by files with holes (such as
utmp on some systems); it will backup only
the allocated blocks and restore will reconstruct the
file with its original layout.

tar uses normal filesystem semantics to read
files, so it modifies the access times recorded in the filesystem's
inodes.
This effectively deletes an audit trail which may be
required for other purposes.
(Modern tar has extra options to handle this
correctly.)
dump parses and records the filesystem outside of
kernel filesystem semantics, and therefore doesn't modify the
filesystem in the process of copying it.
Not all filesystem types support dump and
restore utilities.
When picking a filesystem type keep in mind your backup requirements.
In any case, crontab can be used to schedule backups
according to the backup schedule discussed earlier.
If your company prefers to have a human perform backups,
remember that root permission will be needed to
access the full system.
Often the backup program is controlled by sudo or a
similar facility so the backup administrator doesn't need the root
password.
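As an illustration of scheduling with crontab, the sketch below prints two entries for a monthly full backup plus daily differentials. The backup script path and its full/differential arguments are hypothetical. Note that cron ORs the day-of-month and day-of-week fields when both are restricted, which is why this example sticks to day-of-month only.

```shell
#!/bin/sh
# print_backup_crontab SCRIPT
# Prints crontab entries: a full backup at 02:00 on the 1st of each month
# and a differential at 02:00 on every other day.
# SCRIPT and its arguments are assumptions, not a real tool.
print_backup_crontab() {
    script=$1
    cat <<EOF
# min hour day-of-month month day-of-week command
0 2 1 * * $script full
0 2 2-31 * * $script differential
EOF
}

print_backup_crontab /usr/local/sbin/backup
```

The output can be reviewed and then installed with crontab -e (or appended to root's crontab).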
The find command can be used to locate which files
need to be backed up.
Use find / -mtime -xx for
incrementals and differentials to find files changed
since the last backup (whose time is recorded in
/etc/last-backup.{full,incremental,differential}).
Use with tar roughly like this:
mount /dev/zip
# -print0/-0 cope with odd file names; --no-recursion stops tar from re-adding whole directories
find / -depth -mtime -1 -print0 | xargs -0 tar --no-recursion -rf /tmp/$$
gzip /tmp/$$; mv /tmp/$$.gz /zip/incremental-6-20-01
date >/etc/last-backup.incremental
umount /dev/zip
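An alternative to counting days with -mtime is to compare against a timestamp file with find's -newer test. The following is only a sketch of that idea, assuming GNU tar (for --null, -T - and --no-recursion); the function name and all paths are made up for illustration.

```shell
#!/bin/sh
# incremental_archive SRC_DIR STAMP_FILE OUT_TGZ
# Archives everything under SRC_DIR modified after STAMP_FILE into
# OUT_TGZ (give an absolute path outside SRC_DIR), then advances
# STAMP_FILE. Assumes GNU tar.
incremental_archive() {
    src=$1
    stamp=$2
    out=$3
    # Record the start time first, so files changed while the backup
    # runs are picked up again next time.
    started=$(mktemp) || return 1
    if ( cd "$src" &&
         find . -depth -newer "$stamp" -print0 |
         tar --null --no-recursion -czf "$out" -T - )
    then
        touch -r "$started" "$stamp"    # advance the stamp only on success
        rm -f "$started"
    else
        rm -f "$started"
        return 1
    fi
}

# Example (hypothetical paths):
#   incremental_archive /home /etc/last-backup.incremental /zip/incremental.tgz
```

Creating the stamp file once with touch right after a full backup gives incrementals; never advancing it between full backups gives differentials.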
Commercial software is affordable and several packages are popular
for Unix and Linux systems, including
BRU
(TolisGroup.com),
BackupEXEC, and
Arkeia
(www.arkeia.com).
I haven't used these; I just use tar and
find.
Of course there are other choices as well, such as KDE
ark or amanda (network backups).
Some of these can create schedules, label tapes, encrypt tapes,
follow media rotation schedules, etc.
The most important tool is the documentation: the backup strategy, media types and rotation schedule, hardware maintenance schedule, location of media storage (e.g., the address of the bank and box number), and all the other information discussed above. This document is collectively referred to as the backup policy. This document should clearly say to users what gets backed up and when, and what to do and who to contact if you need to recover files.
Note: Whatever tools you use, make sure you test your backup method, by attempting to use the recovery procedure. (I know someone who spent 45 minutes each working day doing backups for years, only to realize none of the backups ever worked the first time he attempted to recover a file!)
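One simple way to exercise the recovery procedure is to restore an archive into a scratch directory and compare it with the source. This sketch assumes a gzip-compressed tar archive made with "tar -czf ARCHIVE -C SRC_DIR ." and compares file contents only (not ownership or permissions); the function name is hypothetical.

```shell
#!/bin/sh
# verify_archive ARCHIVE SRC_DIR
# Restores ARCHIVE into a temporary directory and compares the result
# with SRC_DIR. Returns 0 only if the restore works and contents match.
verify_archive() {
    archive=$1
    src=$2
    scratch=$(mktemp -d) || return 1
    if tar -xzf "$archive" -C "$scratch" &&
       diff -r "$src" "$scratch" >/dev/null
    then
        status=0
    else
        status=1
    fi
    rm -rf "$scratch"
    return $status
}

# Example (hypothetical paths):
#   verify_archive /backup/home.tgz /home && echo "restore test passed"
```

On a live system some files will have changed since the backup was made, so a mismatch is not always a failure; the point is that the restore itself has actually been tried.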
(Parts of this document were adapted from netnews (Usenet) postings
in the newsgroup comp.unix.admin
during 5/2001
by Jefferson Ogata.
Other parts were adapted from The Practice of System and Network
Administration, by Limoncelli and Hogan, 1st Ed.
©2001 by Addison-Wesley.)
Solaris zones contain a complication for backup: many standard directories are actually mounted from the global zone via LOFS (loopback filesystem). These should only be backed up from the global zone. The only items in a local zone needing backup are (usually) application data and configuration files. Using an archive tool (such as cpio, tar, or star) will work best:
find export/zone1 -fstype lofs -prune -o -local \
  | cpio -oc -O /backup/zone1.cpio
Whole zones can be fully or incrementally backed up using
ufsdump.
Shut down the zone before using the ufsdump command, to put the
zone in a quiescent state, and avoid backing up shared file systems,
with:
global# zlogin -S zone1 init 0
Solaris supports filesystem snapshots (like LVM does on Linux) so you don't have to shut off a zone. However it must be quiesced by turning off applications before creating the snapshot. Then you can turn them back on and perform the backup on the snapshot. Create it with:
global# fssnap -o bs=/export /export/home   # create the snapshot
global# mount -o ro /dev/fssnap/0 /mnt      # then mount it
You should make copies of your non-global zones' configurations in case you
have to recreate the zones at some point in the future.
You should create the copy of the zone's configuration after you have
logged into the zone for the first time and responded to the
sysidtool questions:
global# zonecfg -z zone1 export > zone1.config
Added SCSI controller (ADAPTEC 2940)
Added SCSI DDS2 Tape drive
On reboot kudzu detected and configured SCSI controller and tape device
Verify devices found with 'dmesg': indicate tape is /dev/st0
and /dev/nst0
Verify scsi devices with 'scsi_info' (/proc/scsi)
Verify device working with: mt -f /dev/st0 status
Create link: ln -s /dev/nst0 /dev/tape
Verify link: mt status
Note: /dev/st0 causes automatic tape rewind after
any operation, /dev/nst0 has no automatic rewind,
but most backup software knows to rewind before finishing.
If you plan to put multiple backup files on one tape you must use
/dev/nst0.
mt (/dev/mt0, /dev/rmt0)
st (/dev/st0, /dev/nst0 —
use nst for no auto rewind)
mt and rmt (remote tape backups) —
rewind, erase, ...
dump/restore (These operate on the
drive as a collection of disk blocks,
below the abstractions of files, links and directories that are created
by the file systems. dump backs up an entire file system at a time.
It is unable to backup only part of a file system or a directory tree that
spans more than one file system.)
tar, cpio, dd, star
(and pax)
A comparison of these tools:
cpio has many more conversion options than tar
and supports many formats.
tar supports multiple hard links on FSes that have
32 bit inode numbers, but cpio can only handle up to
18 bits in the default mode.
tar copies a file with multiple hard links once,
cpio each time.
When it hits a corrupted part of an archive, tar
will stop at that point.
cpio will skip over corruption and try to restore
the rest of the files.
cpio is reported faster than tar,
and uses less space (because tar uses 512 byte blocks
for every file header, cpio just uses whatever it needs
only).
tar can support archives that span multiple volumes;
cpio can't.
tar (star) supports extended
attributes, used for SE Linux and
ACLs.
cpio and most older tar versions
don't support these (at least not currently).
pax is POSIX's answer to tar and
cpio shortcomings.
pax attempts to read and write many of the various
cpio and tar formats,
plus new formats of its own.
Its command set more resembles cpio than
tar.
(Unfortunately I can't get it to backup the extended attributes
like star.)
dd is a command that copies and optionally
converts data.
It isn't used to create archives, but because of its many options
it is frequently used to copy disks and backup/archive files, and
to create (empty image) files.
The command was named for the IBM mainframe
JCL of the same name (and ugly, non-standard syntax);
no one knows anymore what the name originally meant.
If you need to backup large (e.g., DB) files use a large blocksize.
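A minimal sketch of copying a large file with dd and an explicit block size follows. The 1 MiB default is an assumption, not a tuned value, and the function name is made up; the block size is given in plain bytes because the size suffixes dd accepts differ between implementations.

```shell
#!/bin/sh
# copy_with_blocksize SRC DEST [BLOCK_BYTES]
# Copies SRC to DEST with dd using the given block size (default 1 MiB,
# an arbitrary choice for illustration), then verifies the copy with cmp.
copy_with_blocksize() {
    src=$1
    dest=$2
    bs=${3:-1048576}
    dd if="$src" of="$dest" bs="$bs" 2>/dev/null && cmp -s "$src" "$dest"
}

# Example (hypothetical paths):
#   copy_with_blocksize /var/lib/db/big.dbf /backup/big.dbf
```

With a large block size dd makes far fewer read and write system calls than with its 512 byte default, which usually matters most when writing to tape.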
Many types of systems can use LVM, ZFS or some equivalent that supports snapshots for backup without the need to take the filesystem off-line.
NAS (and some SAN) systems are commonly backed up with some tool that supports NDMP (the network data management protocol), which usually works by doing background backup to tape of a snapshot. This has a minimal effect on users of the storage system.
Jörg Schilling's star program currently supports
archiving of ACLs.
IEEE Standard 1003.1-2001 (POSIX.1
)
defined the pax interchange format
that can handle
ACLs and other extended attributes
(e.g., SE Linux stuff).
Gnu tar supposedly handles pax
and star formats.
There is also a spax tool that
supports the star extensions.
A tool that supposedly easily and correctly backs up
ACLs, ext2 and ext3
attributes, and extended attributes (such as for SE
Linux) is
bsdtar, a BSD modified
version of tar that uses libarchive.so
to read/write a variety of formats.
(Personally I've only been able to use star
and spax to make and restore such archives.)
bru (commercial software)
amanda (a powerful network backup scheduling tool)
unison (uses rsync to mirror directories between
systems, including between Unix/Linux and Windows systems)
rsync.
To make a backup of /home to server.kaos.org
with rsync via ssh:
(untested!)
rsync -avre "ssh -p 2222" /home/ server.kaos.org:/home
rsync -azve ssh me@server.kaos.org:documents documents
rsync -azve ssh documents me@server.kaos.org:documents
rsync -HavRuzc /var/www/html/ example.com:/var/www/html/
# or copy ~/public_html to/from me@example.com:public_html/
-v = verbose,
-c = compare file checksums (MD4 in older versions, MD5 since rsync 3.0) to see if dest file differs from src,
-a = archive mode = -rlptgoD = preserve almost everything,
-r = recursive,
-R = preserve src path at dest,
-z = compress when sending,
-b = backup dest file before over-writing/deleting,
-u = don't over-write newer files,
-l = preserve symlinks,
-H = preserve hard links,
-p = preserve permissions,
-o = preserve owner,
-g = preserve group,
-t = preserve timestamps,
-D = preserve device files,
-S = preserve file "holes",
--modify-window=1 = timestamps match if different by
less than this number of seconds
(required on Windows which only has 2 second time precision)
Modern rsync has many options to control the attributes
at the destination.
You can use --chmod, transfer ACLs and
EAs.
You can create rsyncd.conf files to control behavior
(and use a special ssh key to run a specific command),
and define new arguments via ~/.popt.
But older rsync versions don't have all those
features.
On Fedora Core 2 for example, rsync has no options to
set/change the permissions when copying new files from Windows;
umask applies.
(You can use special ssh tricks to work around this,
to run a find ... | xargs chmod ...
command after each use of rsync.)
A better way on Fedora is to set a default ACL on each
directory in your website.
Then all uploaded files will have the umask over-ridden:
cd ~/public_html # or wherever your web site is.
find . -type d -exec setfacl -m d:o:rX {} +
This ACL says to set a default ACL on all
directories, to provide others
read, plus execute if a
directory.
(New directories get this default ACL too.)
With this ACL, uploading a file will have
644 or 755 permissions, rather than
640 or 750.
find ... | cpio -o --format=crc > file.cpio
# crc is the new SysVr4 format with CRCs
cpio -idl < file.cpio # -d means to create directories if needed;
# -l means to link files if possible
cpio -id glob-pattern <file.cpio
# note these globs (wildcards) match leading dot and slashes!
cpio -it < file.cpio # table of contents
# "-v" means print filenames as processed;
# "-V" means print a dot per file processed
Sample commands to backup all files to backup media:
find . -path './proc' -prune -o -print | \
  cpio -o --format=crc > /dev/tape
# Command to restore complete (full) backup:
cpio -imd < /dev/tape
# Command to get table of contents:
cpio -it < /dev/tape
find ... | pax -wvx ustar > pax.out
pax [-v] < pax.out         # list the archive's contents
pax -rv -pe < pax.out      # "-pe" means preserve everything
# Note spax also has an "-acl" option
To duplicate some files or a whole directory tree requires more than
just copying the files.
Today you need to worry about ACLs, extended attributes
(SE Linux labels), non-files (e.g., named pipes, or
FIFOs), files with holes, etc.
Gnu cp has options for that but can't be
used to copy files between hosts.
The best way to duplicate a directory tree on the same host is
Gnu cp -a, or if not available use:
spax -pe -rw olddir newdir
To copy a whole volume to another host you can use dump
and then transfer that, and restore it on the remote system.
Files or backups and archives can be copied between hosts with
scp, sftp, or rsync.
tar is often used to duplicate a directory tree to the
same host if Gnu cp isn't available.
tar can also be used to duplicate a directory tree to
a different host, via ssh:
tar czf - -C sourcedir files \
  | ssh remote_host 'tar xzf - -C destdir'
Use tar with ssh if this is a complete
tree transfer.
Over a slow link, stronger compression may help (e.g., -j
for
bzip2, which compresses better than gzip but uses more CPU time).
Using rsync over ssh often performs better
than tar if it is an update (i.e., some random subset
of files need to be transferred).
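The tar pipeline shown earlier can be tried on a single host before involving ssh. A minimal sketch, with a made-up function name and directory arguments:

```shell
#!/bin/sh
# copy_tree SRC_DIR DEST_DIR
# Duplicates the contents of SRC_DIR into DEST_DIR through a tar pipe,
# keeping permissions (-p) where the invoking user is allowed to.
copy_tree() {
    src=$1
    dest=$2
    mkdir -p "$dest" &&
    tar -cf - -C "$src" . | tar -xpf - -C "$dest"
}

# Example (hypothetical paths):
#   copy_tree /var/www/html /tmp/html-copy
# For a remote copy, replace the second tar with something like:
#   ssh remote_host 'tar -xpf - -C destdir'
```

Compression is left out here because it gains nothing on a local pipe; add z (or j) on both sides when the data crosses a network.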