I have two servers between which I regularly need to transfer large files. With the way my applications are set up, I need the two server directories to be in “sync” with each other.
Previously I used a small-ish bash script that would `scp` the files after they were collected at the source server. The two directories would often be out of sync while the script moved some 300GB of data around.
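As an aside, the way I convince myself two copies actually match is by comparing checksums. Here is the idea sketched with throwaway local files (the `/tmp` paths are placeholders, not my real directories):

```shell
# Stand-in for a transferred file: 1 MiB of random bytes.
head -c 1048576 /dev/urandom > /tmp/sample-src
cp /tmp/sample-src /tmp/sample-dst   # stands in for the scp/rclone transfer

# Identical SHA-256 hashes mean the two copies hold the same bytes.
src_sum=$(sha256sum /tmp/sample-src | awk '{print $1}')
dst_sum=$(sha256sum /tmp/sample-dst | awk '{print $1}')

if [ "$src_sum" = "$dst_sum" ]; then
    echo "in sync"
else
    echo "out of sync"
fi
```

On the real servers you would hash the file on each side and compare the two outputs.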
I was recently introduced to `sshfs` by a friend, so I decided to give it a try.
sshfs
`sshfs` was rather easy to set up; I just had to install it on the destination server and run the following command:
sshfs -o allow_other,default_permissions,uid=911,gid=911,umask=0000 source-server:/archives/ /backups/
I needed to add the `uid`, `gid` and `umask` options to make sure the files were readable by the applications on the destination server; without them, the files would often be owned by `root` and I would have to `chown` them.
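To make the mount survive reboots, an `/etc/fstab` entry along these lines should work. I haven't tested this on my setup, so treat it as a sketch:

```
# /etc/fstab (sketch): same options as the sshfs command above,
# plus _netdev so mounting waits for the network to be up.
source-server:/archives/  /backups  fuse.sshfs  allow_other,default_permissions,uid=911,gid=911,umask=0000,_netdev  0  0
```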
Some rudimentary benchmarks ⬇️
Let’s create a 1GB file on the destination server:
$ dd if=/dev/zero of=testfile bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.854064 s, 1.3 GB/s
Now let’s copy it to the mounted directory:
$ time dd if=testfile of=/backups/testfile bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.6881 s, 91.9 MB/s
real 0m11.773s
user 0m0.011s
sys 0m0.519s
91.9MB/s! Not bad at all. I was pleased with the performance, but writing to the mount was not the intended use case: I needed to transfer files from the source server to the destination server, which means reading from the mount.
Reading from the mounted directory:
$ time dd if=/backups/testfile of=/dev/null bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 59.2111 s, 18.1 MB/s
real 0m59.401s
user 0m0.027s
sys 0m1.380s
… which was disappointing: 18.1MB/s works out to roughly 145Mbps, a fraction of the 1Gbps symmetrical connection both machines have.
rclone mount
My first attempt to mount looked like this:
rclone mount source-server:/archives/ /backups-rclone/ --allow-other --uid 911 --gid 911 --umask 0000 --default-permissions
I was getting a measly 2MB/s. I found a few possible optimisations:
- Apparently by default `rclone` does not use caching, and the `--vfs-cache-mode` flag needs to be set to `writes` to enable it. The `writes` cache mode enables write-back caching, which can improve performance by caching file writes locally and uploading them to the remote server in the background.
- Increasing `--buffer-size` to 64M
- Enabling multi-threading with `--multi-thread-streams 4 --multi-thread-cutoff 250M`
rclone mount source-server:/archives/ /backups-rclone/ --allow-other --uid 911 --gid 911 --umask 0000 --default-permissions --vfs-cache-mode writes --buffer-size 64M --multi-thread-streams 4 --multi-thread-cutoff 250M
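As a side note, to bring this mount up automatically on boot, a systemd unit along these lines should work. The unit name and binary paths are my guesses (adjust for your distro); `rclone mount` does support `Type=notify` when run under systemd:

```
# /etc/systemd/system/backups-rclone.service (sketch, untested)
[Unit]
Description=rclone mount of source-server:/archives/
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/rclone mount source-server:/archives/ /backups-rclone/ \
  --allow-other --uid 911 --gid 911 --umask 0000 --default-permissions \
  --vfs-cache-mode writes --buffer-size 64M \
  --multi-thread-streams 4 --multi-thread-cutoff 250M
ExecStop=/bin/fusermount -u /backups-rclone/
Restart=on-failure

[Install]
WantedBy=multi-user.target
```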
With those flags, the performance was unbelievably better:
$ time dd if=testfile of=/backups-rclone/testfile bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.20803 s, 889 MB/s
real 0m1.233s
user 0m0.031s
sys 0m0.343s
… and reading from the mounted directory:
$ time dd if=/backups-rclone/testfile of=/dev/null bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.09872 s, 977 MB/s
real 0m1.217s
user 0m0.016s
sys 0m0.426s
At 977MB/s, `rclone` claims to be copying at 7816Mbps over a 1Gbps network, which is fishy to say the least. Suspecting the read was being served from the VFS cache, I tried again with a different file:
$ time dd if=/backups-rclone/testfile2 of=/dev/null bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 15.5331 s, 69.1 MB/s
real 0m15.605s
user 0m0.012s
sys 0m0.789s
Which is quite a way off from the 1Gbps capacity, but still almost 4x better than `sshfs`.
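For anyone who wants to check my arithmetic, the unit conversion behind those numbers is just bytes to bits:

```shell
# dd reports throughput in MB/s; multiply by 8 to get Mbps.
echo "977 MB/s  = $((977 * 8)) Mbps (cached read: far beyond a 1Gbps link)"
awk 'BEGIN { printf "69.1 MB/s = %.0f Mbps (uncached read: plausible)\n", 69.1 * 8 }'
```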
Conclusion
It seems `sshfs` is effectively unmaintained as well; from the README:
> SSHFS is shipped by all major Linux distributions and has been in production use across a wide range of systems for many years. However, at present SSHFS does not have any active, regular contributors, and there are a number of known issues (see the bugtracker).
I think I’m going to stick with `rclone` for now. If you have any other flags I could use to improve performance, I’d love to try them out. Feel free to reach out to me at hi @ this domain.