rsync

Of course there is the man page, but here's a quick summary of what is going on.

Summary

rsync is a utility for synchronizing files. The core algorithm is fundamentally simple1). Its interface is similar to scp, but it offers several other advantages, including delta transfer, post-transfer checksum verification, and handling of interrupted transfers over unreliable connections. It is also known to work at scale.

Because of the need to communicate filelists (as opposed to a straightforward copy), both the server and client must have the rsync program.

Usage

A common starter command to transfer from local to remote is given by,

rsync -av [SRC] [USER]@[HOST]:[DEST]

where -a is a compound flag representing archive mode (-rlptgoD: recursive copy, preserving symlinks, permissions, modification times, group, and owner, as well as device files and special files such as sockets and FIFOs), and -v enables verbose output. What it might look like in the wild:

$ rsync -av ~/bitcoin.txt justin@192.168.100.1:/bank/
Password:
sending incremental file list

sent 70 bytes  received 20 bytes  36.00 bytes/sec
total size is 2,023  speedup is 22.48
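
The closing summary can be sanity-checked by hand. As far as I understand it, the speedup figure is the total file size divided by the bytes actually sent and received, so here 2,023 / (70 + 20) ≈ 22.48. A minimal Python sketch of that arithmetic follows; parse_summary is a hypothetical helper written for this page, not part of rsync, and the regular expressions only assume the output format shown above.

```python
import re

def parse_summary(text):
    """Pull the byte counts out of rsync's closing summary and recompute speedup."""
    sent, received = (
        int(n.replace(",", ""))
        for n in re.search(
            r"sent ([\d,]+) bytes\s+received ([\d,]+) bytes", text
        ).groups()
    )
    total = int(re.search(r"total size is ([\d,]+)", text).group(1).replace(",", ""))
    return {
        "sent": sent,
        "received": received,
        "total": total,
        # Assumption: speedup = total size / bytes on the wire.
        "speedup": total / (sent + received),
    }

summary = """sent 70 bytes  received 20 bytes  36.00 bytes/sec
total size is 2,023  speedup is 22.48"""

stats = parse_summary(summary)
print(f"{stats['speedup']:.2f}")  # prints 22.48
```

A high speedup means most of the file content never had to cross the network, which is the expected result when the destination already holds an up-to-date copy.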

rsync typically tunnels its transfers over SSH, but a different remote shell can be specified using the -e flag. This is also useful for passing shell-specific parameters, e.g. a different ssh user, known_hosts file, or private key,

rsync -av -e "ssh -i ~/.ssh/id_rsa_user" ~/bitcoin.txt justin@192.168.100.1:/bank/
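
When the command is generated from a script, it helps to build it as an argument list instead of one long string, so that the whole remote-shell command stays a single -e argument. A rough sketch, assuming Python 3.8+; build_rsync_cmd is a hypothetical helper and the key path is only an example:

```python
import os
import shlex

def build_rsync_cmd(src, user, host, dest, ssh_key=None):
    """Return the argv list for an SSH-based rsync push (nothing is executed)."""
    cmd = ["rsync", "-av"]
    if ssh_key is not None:
        # Without a shell nothing expands "~", so expand it here.
        key = os.path.expanduser(ssh_key)
        # The entire ssh command line is passed as ONE argument to -e.
        cmd += ["-e", f"ssh -i {shlex.quote(key)}"]
    cmd += [src, f"{user}@{host}:{dest}"]
    return cmd

cmd = build_rsync_cmd(
    "bitcoin.txt", "justin", "192.168.100.1", "/bank/",
    ssh_key="/keys/id_rsa_user",  # example path, adjust to taste
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) as-is; no shell quoting is needed at that point.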

Where large files need to be synced regularly, it is a good idea to force delta transfers (--no-whole-file) and compress the transferred data (-z), together with in-place modification (--inplace), although the latter risks the file being corrupted if it is accessed concurrently,

rsync -vz --inplace --no-whole-file ~/bitcoin.txt justin@192.168.100.1:/bank/

rsyncd

The rsync client can also communicate with a server-side rsyncd daemon, skipping the SSH tunnel. This can prove useful in situations where the SSH server only allows administrators to log into remote shells, which gives rsync scripts an excessive level of privilege.

To use this, the daemon must be launched beforehand. Network-attached storage software typically has an "rsync server" setting that can be enabled, which is the same thing. Using the rsync daemon restricts the addressable locations to "modules" defined in the rsyncd configuration, typically found at /etc/rsyncd.conf2).

The list of modules can be viewed without breaking into the server,

$ rsync justin@192.168.100.1::
Password:
Web
Public
homes
mybank

The equivalent example rsync command then becomes,

rsync -av ~/bitcoin.txt justin@192.168.100.1::mybank/

which implies the rsync protocol is used over the default rsyncd TCP port 873, in other words:

rsync -av ~/bitcoin.txt rsync://justin@192.168.100.1:873/mybank/
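
The two daemon syntaxes map onto each other mechanically. A small sketch of the conversion using Python's urllib.parse; url_to_daemon_spec is a hypothetical helper name, and the port is returned separately since the double-colon form has no place for it (rsync's --port option would carry it instead):

```python
from urllib.parse import urlsplit

DEFAULT_RSYNCD_PORT = 873

def url_to_daemon_spec(url):
    """Convert rsync://[user@]host[:port]/module/path to ([user@]host::module/path, port)."""
    parts = urlsplit(url)
    if parts.scheme != "rsync":
        raise ValueError(f"not an rsync:// URL: {url!r}")
    user = f"{parts.username}@" if parts.username else ""
    port = parts.port or DEFAULT_RSYNCD_PORT
    # The first path component is the module name; the rest is a path inside it.
    spec = f"{user}{parts.hostname}::{parts.path.lstrip('/')}"
    return spec, port

print(url_to_daemon_spec("rsync://justin@192.168.100.1:873/mybank/"))
# prints ('justin@192.168.100.1::mybank/', 873)
```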

Gotchas

Since the rsync programs on the client and the server need to communicate, the client needs to know where rsync is located on the server. If that location is non-standard (it can be found with a quick which rsync on the server), use:

rsync -av --rsync-path=/path/to/rsync ~/bitcoin.txt justin@192.168.100.1:/bank/

If the rsync daemon requires authentication, the password should ideally be provided as a single line in a password file and passed as an argument to rsync. This only applies to daemon connections (note the double colon), not to SSH-based transfers:

rsync --password-file="$HOME/my_secure_pwd" ~/bitcoin.txt justin@192.168.100.1::mybank/
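
rsync refuses to use a password file that other users can access, so the file should be created with restrictive permissions from the start instead of being tightened afterwards. A minimal sketch, assuming a POSIX system; write_password_file and is_private are hypothetical helpers, and "hunter2" is obviously a placeholder:

```python
import os
import stat
import tempfile

def write_password_file(path, password):
    """Create a single-line password file readable by the owner only."""
    # O_EXCL refuses to overwrite an existing file; 0o600 = owner read/write.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(password + "\n")
    return path

def is_private(path):
    """True if neither group nor others have any access (stricter than rsync needs)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0

# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    pw_file = write_password_file(os.path.join(tmp, "my_secure_pwd"), "hunter2")
    private = is_private(pw_file)
    with open(pw_file) as f:
        content = f.read()

print(private)
```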

For the regular SSH-based rsync, opt to use key-based access instead.

Synology

As of 2021-11-10, in the latest firmware, the rsync service is found under Control Panel > File Services > rsync. The rsync account needs to be enabled to allow daemon-based access. SSH-based rsync is still available as long as the user has SSH permissions and write permissions to the target directory.

An example sync command is given below:

rsync -av --password-file=/password --rsync-path=/bin/rsync /usr/backup/* backup@192.168.100.1::backups/

QNAP

As of 2021-11-10, in the latest firmware, the rsync server is enabled through the "rsync server" setting in the Hybrid Backup Sync 3 (HBS 3) app. It works fine, except that connecting to a remote rsyncd from the QNAP results in a segmentation fault. Since both the remote and the local rsyncd work fine on their own, the issue likely lies in how the QNAP rsync client connects to a remote rsyncd.

The workaround is to issue the rsync command from the remote machine itself and reference the QNAP as the remote instead, e.g.

rsync -av justin@192.168.100.1::mybank/bitcoin.txt ~/

Windows

rsync is built for Linux, but that doesn't mean Windows cannot have a piece of the fun too. These notes are drawn mainly from this Medium article, amongst others.

Here we assume you're already using Git Bash (that is prepackaged with Git for Windows), which is based off MSYS2, so we look for compatible binaries from the same MSYS2 repository as well. In the spirit of verifying your downloads, links are decidedly not spoonfed directly.

Unpack the .zst archives with zstd -d [FILE], and the resulting .tar with your favorite unarchiver. Copy the files in the \usr directory of the unpacked archives to C:\Program Files\Git\usr, then run rsync in Git Bash, e.g.

PS \> bash .
justin@laptop  ~/
$ rsync --version
rsync  version 3.2.6  protocol version 31

Voila! If you see something like:

C:/Program Files/Git/usr/bin/rsync.exe: error while loading shared libraries: msys-zstd-1.dll: cannot open shared object file: No such file or directory

then please install the libraries specified. Those are not optional.
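
If several libraries turn out to be missing, it gets tedious to read them off one failure at a time. A small sketch that pulls the library name out of the loader error; missing_library is a hypothetical helper, and the pattern only assumes the message format shown above:

```python
import re

# Captures the file name between "shared libraries: " and the next colon.
MISSING_LIB = re.compile(r"error while loading shared libraries: (\S+?):")

def missing_library(stderr_text):
    """Return the name of the shared library the loader complained about, or None."""
    m = MISSING_LIB.search(stderr_text)
    return m.group(1) if m else None

err = (
    "C:/Program Files/Git/usr/bin/rsync.exe: error while loading shared "
    "libraries: msys-zstd-1.dll: cannot open shared object file: "
    "No such file or directory"
)
print(missing_library(err))  # prints msys-zstd-1.dll
```

The name can then be used to search the MSYS2 repository for the package that ships it.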


1)
Implementation in Python. Summary: for each file in the destination, build a list of per-block Adler-32 checksums and MD5 hashes. Compare against the file of the same name in the source, using the cheap checksum to quickly test whether a block matches and the hash to resolve collisions. Patch by inserting the delta changes where blocks differ, otherwise copying blocks from the original file.
import hashlib
import zlib

block_size = 4096

def signature(f):
    # Yield a (weak, strong) checksum pair for each block of the old file.
    while True:
        block_data = f.read(block_size)
        if not block_data:
            break
        yield (zlib.adler32(block_data), hashlib.md5(block_data).digest())

class RsyncLookupTable(object):

    def __init__(self, checksums):
        # Map weak checksum -> {strong hash -> block number}.
        self.dict = {}
        for block_number, c in enumerate(checksums):
            weak, strong = c
            if weak not in self.dict:
                self.dict[weak] = dict()
            self.dict[weak][strong] = block_number

    def __getitem__(self, block_data):
        # Only compute the expensive MD5 when the weak checksum matches.
        weak = zlib.adler32(block_data)
        subdict = self.dict.get(weak)
        if subdict:
            strong = hashlib.md5(block_data).digest()
            return subdict.get(strong)
        return None

def delta(sigs, f):
    # f is the new file, opened in binary mode.
    table = RsyncLookupTable(sigs)
    block_data = f.read(block_size)
    while block_data:
        block_number = table[block_data]
        if block_number is not None:  # block 0 is a valid match
            # Known block: emit a reference into the old file.
            yield (block_number * block_size, len(block_data))
            block_data = f.read(block_size)
        else:
            # Unknown data: emit one literal byte and slide the window by one.
            yield block_data[:1]
            block_data = block_data[1:] + f.read(1)

def patch(outputf, deltas, old_file):
    for x in deltas:
        if isinstance(x, bytes):
            # Literal data from the new file.
            outputf.write(x)
        else:
            # (offset, length) reference into the old file.
            offset, length = x
            old_file.seek(offset)
            outputf.write(old_file.read(length))

Source adapted for Python 3 from: https://blog.liw.fi/posts/rsync-in-python/

Note that the filelist is typically generated incrementally - "avoids keeping the whole file list in memory, and allows the transfer to start working on changed files before it has completed the recursive scan of the sending side." https://www.mail-archive.com/rsync@lists.samba.org/msg17767.html
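
One thing the listing above glosses over: delta() recomputes zlib.adler32 over the whole window every time it slides by one byte. The reason rsync pairs a weak checksum with a strong hash is that the weak one can be "rolled", i.e. updated in constant time as the window moves. Below is a simplified two-sum checksum in the spirit of Adler-32 to show the idea; it is a sketch, not rsync's exact checksum, and the names are made up for this page.

```python
M = 1 << 16  # both sums are kept modulo 2^16

def weak_checksum(block):
    """Direct computation of the (a, b) sums over a window of bytes."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte on the left, add in_byte on the right."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b

# Check that rolling agrees with recomputing from scratch at every offset.
data = b"the quick brown fox jumps over the lazy dog"
n = 8
a, b = weak_checksum(data[:n])
ok = True
for k in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
    ok = ok and (a, b) == weak_checksum(data[k:k + n])

print(ok)  # prints True
```

With this in place, the miss branch of delta() would cost O(1) per byte instead of O(block_size).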
2)
An example provided natively by QNAP (set up by HBS 3):
/etc/rsyncd.conf
HBS3 AuthMode = RSYNC_AND_NAS
QWanTag = 0
RLimitRate = 0
SLimitRate =
gid = administrators
hosts allow = *
max downrate = 10240
pid file = /var/run/rsyncd.pid
port = 873
read only = false
rsync enpswd = V2@y9R4vABYBoy6nfHO+KwA3pF/hTJrk2E9UfqU4OqVGNV=
rsync user = justin
status = None
uid = admin
 
[Web]
path = /share/CACHEDEV1_DATA/Web
 
[Public]
path = /share/CACHEDEV1_DATA/Public
 
[homes]
path = /share/CACHEDEV1_DATA/homes
 
[mybank]
path = /bank
The modules are specified in square brackets.
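
To list the modules and their paths from such a file without eyeballing it, a few lines of Python will do. Python's configparser is awkward here because the global parameters sit above the first section header, so this sketch parses by hand; parse_rsyncd_conf is a hypothetical helper and it ignores rsyncd.conf features such as line continuations and include directives.

```python
def parse_rsyncd_conf(text):
    """Split rsyncd.conf text into (global parameters, {module: parameters})."""
    globals_, modules = {}, {}
    current = globals_  # parameters before the first [module] are global
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = modules.setdefault(line[1:-1].strip(), {})
        elif "=" in line:
            # Split on the first "=" only; values may themselves contain "=".
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return globals_, modules

sample = """\
port = 873
read only = false

[Web]
path = /share/CACHEDEV1_DATA/Web

[mybank]
path = /bank
"""

globals_, modules = parse_rsyncd_conf(sample)
print(list(modules))              # prints ['Web', 'mybank']
print(modules["mybank"]["path"])  # prints /bank
```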