import hashlib
import zlib

# All file objects must be opened in binary mode ("rb" / "wb")
block_size = 4096

def signature(f):
    # Yield a (weak, strong) checksum pair for each fixed-size block of the old file
    while True:
        block_data = f.read(block_size)
        if not block_data:
            break
        yield (zlib.adler32(block_data), hashlib.md5(block_data).digest())

class RsyncLookupTable(object):
    def __init__(self, checksums):
        # Two-level index: weak checksum -> {strong checksum -> block number}
        self.dict = {}
        for block_number, c in enumerate(checksums):
            weak, strong = c
            if weak not in self.dict:
                self.dict[weak] = dict()
            self.dict[weak][strong] = block_number

    def __getitem__(self, block_data):
        # Try the cheap weak checksum first; only compute MD5 on a weak hit
        weak = zlib.adler32(block_data)
        subdict = self.dict.get(weak)
        if subdict:
            strong = hashlib.md5(block_data).digest()
            return subdict.get(strong)
        return None

def delta(sigs, f):
    # Scan the new file: emit (offset, length) for blocks found in the old file,
    # otherwise emit one literal byte and slide the window forward by one byte
    table = RsyncLookupTable(sigs)
    block_data = f.read(block_size)
    while block_data:
        block_number = table[block_data]
        if block_number is not None:  # block 0 is a valid match too
            yield (block_number * block_size, len(block_data))
            block_data = f.read(block_size)
        else:
            yield block_data[:1]  # literal byte, kept as bytes
            block_data = block_data[1:] + f.read(1)

def patch(outputf, deltas, old_file):
    # Rebuild the new file from literal bytes and block references into the old file
    for x in deltas:
        if isinstance(x, bytes):
            outputf.write(x)
        else:
            offset, length = x
            old_file.seek(offset)
            outputf.write(old_file.read(length))

Adapted from: https://blog.liw.fi/posts/rsync-in-python/
Note that the file list is typically generated incrementally, which "avoids keeping the whole file list in memory, and allows the transfer to start working on changed files before it has completed the recursive scan of the sending side." https://www.mail-archive.com/rsync@lists.samba.org/msg17767.html
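The delta() function in the code at the top recomputes zlib.adler32 over the whole window after every one-byte shift, which costs O(block_size) per byte. The point of a weak rolling checksum is that the window can be slid in O(1). Below is a minimal sketch of a rolling Adler-32, chosen to match the zlib.adler32 call used above; rsync's own weak checksum is a related but different Adler-style sum, so this illustrates the technique rather than rsync's exact code. The function names are hypothetical.

```python
import zlib

MOD = 65521  # largest prime below 2**16, as used by Adler-32

def adler32_parts(window: bytes):
    # Hypothetical helper: compute the two Adler-32 sums (a, b) from scratch
    a, b = 1, 0
    for byte in window:
        a = (a + byte) % MOD
        b = (b + a) % MOD
    return a, b

def roll(a: int, b: int, old: int, new: int, n: int):
    # Slide an n-byte window by one in O(1): drop byte `old`, append byte `new`
    a = (a - old + new) % MOD
    b = (b - n * old + a - 1) % MOD
    return a, b

def rolling_adler32(data: bytes, n: int):
    # Hypothetical helper: yield the Adler-32 of every n-byte window, left to right
    a, b = adler32_parts(data[:n])
    yield (b << 16) | a  # zlib packs the result as (b << 16) | a
    for i in range(n, len(data)):
        a, b = roll(a, b, data[i - n], data[i], n)
        yield (b << 16) | a

# Sanity check against zlib on every window
data = b"the quick brown fox jumps over the lazy dog"
windows = list(rolling_adler32(data, 8))
assert windows == [zlib.adler32(data[i:i + 8]) for i in range(len(data) - 7)]
```

In delta(), the weak value for each shifted window could come from roll() instead of a fresh zlib.adler32 call, with MD5 still computed only on a weak hit.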
rsync
Of course there is the man page, but here's a quick summary of what is going on.
Summary
rsync is a utility to synchronize files. The algorithm is fundamentally simple at its core1).
It has an interface similar to scp, but offers several other advantages, including delta transfers, post-transfer checksum verification, and handling of interrupted transfers over unreliable connections. It is also known to work at scale.
Because the client and server exchange file lists (rather than performing a straightforward copy), both sides must have the rsync program installed.
Usage
A common starter command to transfer files from local to remote is:
rsync -av [SRC] [USER]@[HOST]:[DEST]
where -a is a compound flag for archive mode (equivalent to -rlptgoD: recursive copy, preserving symlinks, permissions, modification times, group, and owner, as well as transferring device files and special files such as sockets and FIFOs), and -v enables verbose output. What it might look like in the wild:
$ rsync -av ~/bitcoin.txt justin@192.168.100.1:/bank/
Password:
sending incremental file list

sent 70 bytes received 20 bytes 36.00 bytes/sec
total size is 2,023 speedup is 22.48
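The speedup in the summary is the total size of the synchronized files divided by the bytes actually sent plus received, so a value above 1 means rsync moved less data than a plain copy would have. A quick check of the figures above (the helper name is hypothetical):

```python
def rsync_speedup(total_size: int, sent: int, received: int) -> float:
    # Hypothetical helper: total file size divided by bytes that crossed the wire
    return total_size / (sent + received)

# Figures taken from the transfer summary above
speedup = rsync_speedup(total_size=2023, sent=70, received=20)
print(f"speedup is {speedup:.2f}")  # prints "speedup is 22.48"
```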
rsync typically tunnels transfers over SSH, but a different remote shell (or extra options for ssh) can be specified with the -e flag. This is useful for passing shell-specific parameters, e.g. a different SSH user, known_hosts file, or private key:
rsync -av -e "ssh -i ~/.ssh/id_rsa_user" ~/bitcoin.txt justin@192.168.100.1:/bank/
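When rsync is driven from a script, the value of -e has to arrive as one argument containing the whole ssh command. A minimal sketch of building that argument list in Python; the helper name rsync_over_ssh and the paths are hypothetical, and the key path is expanded up front so no shell tilde expansion is needed:

```python
import os
import shlex

def rsync_over_ssh(src, dest, identity_file, known_hosts=None):
    # Hypothetical helper: build an rsync argv that uses a customized ssh via -e
    ssh_cmd = ["ssh", "-i", os.path.expanduser(identity_file)]
    if known_hosts:
        ssh_cmd += ["-o", "UserKnownHostsFile=" + os.path.expanduser(known_hosts)]
    # -e takes the whole remote-shell command as a single argument
    return ["rsync", "-av", "-e", shlex.join(ssh_cmd), src, dest]

cmd = rsync_over_ssh("bitcoin.txt", "justin@192.168.100.1:/bank/",
                     "/home/justin/.ssh/id_rsa_user")
print(shlex.join(cmd))
# import subprocess; subprocess.run(cmd, check=True)  # to actually run the transfer
```

Passing a list to subprocess avoids a local shell entirely, so only the -e string itself needs quoting.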
Where large files need to be synced regularly, it is a good idea to force delta transfers (--no-whole-file) and compress data during transfer (-z), together with in-place modification (--inplace), although the latter risks corrupting the file if it is accessed concurrently during the update:
rsync -vz --inplace --no-whole-file ~/bitcoins.txt justin@192.168.100.1:/bank/
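By default, rsync builds an updated file as a new copy and moves it into place once complete; --inplace skips that and writes the updated data straight into the destination file, which is what exposes concurrent readers to a half-updated file. The two strategies, sketched in Python (helper names are hypothetical; this illustrates the idea, not rsync's implementation):

```python
import os
import tempfile

def replace_atomically(path, data: bytes):
    # Hypothetical helper mimicking the default: write a temp file in the same
    # directory, then rename it over the target. Readers see old or new, never a mix.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # flush to disk before the rename
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise

def write_in_place(path, offset, data: bytes):
    # Hypothetical helper mimicking --inplace: overwrite bytes inside the existing
    # file. A concurrent reader (or a crash) can observe a partially updated file.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
```

The in-place variant needs no extra disk space for a second copy, which is the usual reason to accept its risk for very large files.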
rsyncd
The rsync client can also communicate directly with a server-side rsyncd daemon, skipping the SSH tunnel. This is useful where the SSH server only allows administrators to log in to remote shells, since running rsync scripts through an administrator account would give them an excessive level of privilege.
For this to work, the daemon must be launched on the server before use. Network-attached storage (NAS) software typically has an "rsync server" setting that can be enabled; it does the same thing.
Using the rsync daemon restricts the addressable locations to "modules" defined in the rsyncd configuration, typically found at /etc/rsyncd.conf2).
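Each module is a bracketed [name] section in rsyncd.conf, with global parameters placed before the first section (or in an explicit [global] section). As a rough sketch, the module names and their paths can be pulled out of a config file with Python's configparser; this is a stand-in that does not fully match rsync's own parser, and the helper name is hypothetical. Modules with list = no are skipped, since the daemon hides those from module listings.

```python
import configparser

def list_modules(conf_text: str):
    # Hypothetical helper. rsyncd.conf puts global parameters before the first
    # [module] header, which configparser rejects, so prepend a dummy section.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string("[__global__]\n" + conf_text)
    return {
        name: parser.get(name, "path", fallback=None)
        for name in parser.sections()
        if name not in ("__global__", "global")
        and parser.getboolean(name, "list", fallback=True)  # list = no hides it
    }

sample = """\
uid = admin
read only = false
[Web]
path = /share/CACHEDEV1_DATA/Web
[mybank]
path = /bank
[hidden]
path = /srv/hidden
list = no
"""
print(list_modules(sample))  # {'Web': '/share/CACHEDEV1_DATA/Web', 'mybank': '/bank'}
```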
The list of modules can be viewed without logging in to a shell on the server:
$ rsync justin@192.168.100.1::
Password:
Web
Public
homes
mybank
The earlier example command then becomes:
rsync -av ~/bitcoin.txt justin@192.168.100.1::mybank/
which implies that the rsync protocol is used over the default rsyncd TCP port 873; in other words:
rsync -av ~/bitcoin.txt rsync://justin@192.168.100.1:873/mybank/
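The two spellings map onto each other mechanically: HOST::MODULE/PATH becomes rsync://HOST:PORT/MODULE/PATH with port 873 unless told otherwise. A minimal sketch of that conversion (the helper name is hypothetical, and IPv6 address literals are not handled):

```python
from urllib.parse import urlsplit

RSYNCD_DEFAULT_PORT = 873

def daemon_spec_to_url(spec: str, port: int = RSYNCD_DEFAULT_PORT) -> str:
    # Hypothetical helper: turn "[USER@]HOST::MODULE[/PATH]" into an rsync:// URL
    host, sep, module_path = spec.partition("::")
    if not sep:
        raise ValueError(f"not a daemon spec (missing '::'): {spec!r}")
    return f"rsync://{host}:{port}/{module_path}"

url = daemon_spec_to_url("justin@192.168.100.1::mybank/")
print(url)  # rsync://justin@192.168.100.1:873/mybank/
parts = urlsplit(url)
print(parts.username, parts.hostname, parts.port, parts.path)
# justin 192.168.100.1 873 /mybank/
```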
Gotchas
Since the rsync programs on the client and server sides need to communicate, the client needs to know where rsync is located on the server. If rsync is installed in a non-standard location on the server (a quick which rsync there will tell), specify it explicitly:
rsync -av --rsync-path=/path/to/rsync ~/bitcoin.txt justin@192.168.100.1:/bank/
If the rsyncd daemon requires authentication, the password should ideally be stored as a single line in a password file and passed to rsync with --password-file, which only applies to daemon connections:
rsync --password-file=$HOME/my_secure_pwd ~/bitcoin.txt justin@192.168.100.1::mybank/
For regular SSH-based rsync, opt for key-based access instead.
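rsync refuses to use a password file that other users can access (it exits with an error along the lines of "password file must not be other-accessible"), so the file should be readable by its owner only. A minimal sketch of creating one from Python; the helper name is hypothetical:

```python
import os
import tempfile

def write_password_file(path, password):
    # Hypothetical helper: create/truncate the file with owner-only mode (0600).
    # chmod as well, in case the file already existed with looser permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    os.chmod(path, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(password + "\n")  # password on the first line

with tempfile.TemporaryDirectory() as d:
    pwd_path = os.path.join(d, "my_secure_pwd")
    write_password_file(pwd_path, "s3cret")
    print(oct(os.stat(pwd_path).st_mode & 0o777))  # 0o600 on POSIX systems
```

The permissions are tightened before the secret is written, so it is never sitting in a world-readable file.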
Synology
As of 2021-11-10, in the latest firmware, the rsync service is found under Control Panel > File Services > rsync. The rsync account needs to be enabled to allow daemon-based access. SSH-based rsync remains available as long as the user has SSH permissions and write permission on the target directory.
An example daemon-based command:
rsync -av --password-file=/password --rsync-path=/bin/rsync /usr/backup/* backup@192.168.100.1::backups/
QNAP
As of 2021-11-10, in the latest firmware, the rsync server is found in the Hybrid Backup Sync 3 (HBS 3) app, by enabling "rsync server". This works fine, except that connecting from the QNAP to a remote rsyncd results in a segmentation fault. Since both the remote and the local rsyncd otherwise work fine, the issue is likely with the QNAP's rsync client when connecting to a remote rsyncd.
The workaround is to issue the rsync command from the remote machine itself, referencing the QNAP as the remote instead, e.g.
rsync -av justin@192.168.100.1::mybank/bitcoin.txt ~/
Windows
rsync is built for Linux, but that doesn't mean Windows can't have a piece of the pie too. These notes draw mainly on this Medium article, amongst others.
Here we assume you're already using Git Bash (prepackaged with Git for Windows), which is based on MSYS2, so we take compatible binaries from the same MSYS2 repository. In the spirit of verifying your downloads, links are deliberately not spoon-fed.
- Download the Windows build of zstd from its release page; the MSYS2 packages for rsync and the libxxhash library are compressed with Zstandard.
- Download the rsync package, along with the libxxhash package, from the corresponding MSYS2 package pages.
Unpack the .zst files with zstd -d [FILE], then unpack the resulting .tar with your favorite archive tool. Copy the contents of the \usr directory in each unpacked archive into C:\Program Files\Git\usr, then run rsync in Git Bash, e.g.
PS \> bash
justin@laptop ~/
$ rsync --version
rsync version 3.2.6 protocol version 31
Voilà! If you instead see something like:
C:/Program Files/Git/usr/bin/rsync.exe: error while loading shared libraries: msys-zstd-1.dll: cannot open shared object file: No such file or directory
then install the libraries named in the message; they are not optional.
- /etc/rsyncd.conf
HBS3 AuthMode = RSYNC_AND_NAS
QWanTag = 0
RLimitRate = 0
SLimitRate =
gid = administrators
hosts allow = *
max downrate = 10240
pid file = /var/run/rsyncd.pid
port = 873
read only = false
rsync enpswd = V2@y9R4vABYBoy6nfHO+KwA3pF/hTJrk2E9UfqU4OqVGNV=
rsync user = justin
status = None
uid = admin
[Web]
path = /share/CACHEDEV1_DATA/Web
[Public]
path = /share/CACHEDEV1_DATA/Public
[homes]
path = /share/CACHEDEV1_DATA/homes
[mybank]
path = /bank