Originally published on peateasea.de.
I’ve been using lftp’s reverse mirror feature for years to upload files to my blog. I’d never worked out how to avoid repeated file uploads. Until now.
A bit of background
Jekyll creates and populates a directory called _site when building the production version of a static site. To upload the files in the _site directory to my hosting provider, I’ve been using an lftp script like this [1]:
open sftp://<username>:not-a-password@ftp.<domain-name>
mirror -v --delete --reverse _site/ /public_html/
The open command opens a connection [2] to the FTP server that my hosting service provides for my domain and logs me in. Since the protocol is SFTP, the connection uses SSH and I can use an SSH key for authentication. This is great because then I don’t need to use a password.
Note that the not-a-password component is an important placeholder: it’s there because a password is still expected before the @ symbol even though authentication uses public key encryption. The authentication mechanism ignores it. I chose this placeholder value to remind me that it isn’t a password and that I shouldn’t ever put one here.
The mirror command normally downloads files from an upstream (usually remote) source to a local system. Thus the usual mirror process pulls remote files from upstream. Using the --reverse option swaps the sense of the mirror mechanism and files are instead uploaded (i.e. pushed) to the upstream system.
In the case I describe here, I push all files from within the _site/ directory to the /public_html/ directory on my hosting service.
When reading the documentation you will see terms like “source” and “target”. When mirroring to a local system, the “source” is the remote system and the “target” is the local system. With the --reverse option, these are swapped: the local system is now the “source” and the upstream system is the “target”.
The --delete option removes any files from the target system which are not present in the source. In our case, the target is the /public_html/ directory tree, so anything there which no longer exists in _site/ gets removed. This ensures that if I delete or rename a file, it isn’t still floating around on the production system, which might confuse someone in the future.
The -v option turns on the first level of verbosity so that I can get feedback about what’s happening when mirroring the files to the upstream system.
The problem
So what’s the issue? Well, each time I mirror the site to production, the lftp script re-transfers all my files. In particular, lftp removes each file from upstream before uploading it again. This happens even if the files haven’t changed. Here’s what I mean:
$ lftp -f deploy_site.lftp
Removing old file `feed.xml'
Transferring file `feed.xml'
Removing old file `index.html'
Transferring file `index.html'
Removing old file `sitemap.xml'
Transferring file `sitemap.xml'
Removing old file `about/index.html'
Transferring file `about/index.html'
Removing old file `add-favicon-to-mm-jekyll-site/index.html'
Transferring file `add-favicon-to-mm-jekyll-site/index.html'
<snip>
Not only is this annoying, but it’s a waste of network resources and time. I’d tried to get lftp to only upload changed files in the past, but never managed to find the right incantation. Until today. Today, I finally found the information I needed to make this work.
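To put a number on the problem, one can count the “Transferring file” lines in the output. Here’s a minimal sketch of that idea. It uses the first few lines of the output shown above as sample input, so that it runs without a server. In a real run I’d pipe the output of lftp -f deploy_site.lftp into grep instead, assuming that lftp writes these messages to standard output.

```shell
#!/bin/sh
# Sample log lines copied from the run shown above. A temporary file stands
# in for the real lftp output so that the sketch runs anywhere.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
Removing old file `feed.xml'
Transferring file `feed.xml'
Removing old file `index.html'
Transferring file `index.html'
Removing old file `sitemap.xml'
Transferring file `sitemap.xml'
EOF

# Count only the lines which report an upload.
transferred=$(grep -c '^Transferring file' "$sample_log")
echo "files transferred: $transferred"

rm "$sample_log"
```

Comparing this count before and after a change to the mirror command is a quick way to see whether the change had any effect.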
The solution
If you read the lftp man page, you’ll find in the mirror section the --only-newer option. Adding this option to the mirror command mentioned earlier, we get
mirror -v --only-newer --delete --reverse _site/ /public_html/
Using this command you’ll find that it still transfers all files upstream. Gah! Why doesn’t this work?
Today I managed to stumble upon why this is so. An answer to the StackOverflow question Why lftp mirror --only-newer does not transfer “only newer” file? mentions a subtlety noted on Matthieu Bouthours’ blog and seemingly not mentioned anywhere else:
When uploading, it is not possible to set the date/time on the files uploaded, that’s why --ignore-time is needed.
Therefore, as mentioned in the StackOverflow answer:
[I]f you use the flag combination --only-newer and --ignore-time you can achieve decent backup properties, in such a way that all files that differ in size are replaced. Of course it doesn’t help if you really need to rely on time-synchronization but if it is just to perform a regular backup of data, it’ll do the job.
Updating the mirror command like so:
mirror -v --only-newer --ignore-time --delete --reverse _site/ /public_html/
fixes the issue and only uploads new or newly changed files, which is the desired behaviour. Yay! 🥳
Implementing this change in my build scripts reduced build and deployment times from 5.5 minutes to 2 minutes. That’s more than halved the time! Brilliant!
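Before trusting a changed mirror command with --delete on a live site, it can be reassuring to preview the run. As far as I can tell from the man page, mirror has a --dry-run option which prints the commands it would execute instead of executing them. Here’s a minimal sketch which writes such a variant of the script. The file name deploy_site_dry_run.lftp is made up for this example, and the placeholders are the same ones as in the script above.

```shell
#!/bin/sh
# Write a dry-run variant of the deploy script. The file name is
# hypothetical; <username> and <domain-name> are placeholders which still
# need to be filled in by hand.
cat > deploy_site_dry_run.lftp <<'EOF'
open sftp://<username>:not-a-password@ftp.<domain-name>
mirror -v --dry-run --only-newer --ignore-time --delete --reverse _site/ /public_html/
EOF

# Show the generated script. To preview a deployment, run:
#   lftp -f deploy_site_dry_run.lftp
cat deploy_site_dry_run.lftp
```

If the preview lists only the files I actually touched, the real script should behave the same way.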
A word of caution
There’s a caveat here, though. If a file is changed and just so happens to be of the same size as its counterpart upstream, it won’t be transferred. One needs to bear this in mind.
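The blind spot is easy to reproduce locally. The sketch below isn’t lftp’s implementation; it only imitates a size-only comparison with wc and contrasts it with a byte-for-byte comparison using cmp. The file names are made up for the example.

```shell
#!/bin/sh
# Two files with the same byte count but different content
# (a one-character edit: "world" versus "wor1d").
workdir=$(mktemp -d)
printf 'Hello, world!\n' > "$workdir/local.html"
printf 'Hello, wor1d!\n' > "$workdir/remote.html"

# Size comparison: the only thing a size-only check gets to see.
local_size=$(wc -c < "$workdir/local.html" | tr -d ' ')
remote_size=$(wc -c < "$workdir/remote.html" | tr -d ' ')

if [ "$local_size" -eq "$remote_size" ]; then
    size_verdict="unchanged"
else
    size_verdict="changed"
fi

# Content comparison: cmp -s exits non-zero when the bytes differ.
if cmp -s "$workdir/local.html" "$workdir/remote.html"; then
    content_verdict="unchanged"
else
    content_verdict="changed"
fi

echo "size check:    $size_verdict"
echo "content check: $content_verdict"

rm -r "$workdir"
```

The size check reports the file as unchanged even though its content differs, which is exactly the kind of edit (a fixed typo, say) that wouldn’t get uploaded.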
To be honest, I’d prefer to use rsync because it can compare checksums of the files (via its --checksum option) to detect file changes. Then I could be more certain that my scripts upload every changed file and don’t transfer unchanged ones unnecessarily. However, until I have that option, this will do the job nicely.
[1] Unfortunately my hosting service doesn’t allow rsync (at least not at my service level) and hence I can’t use a more sophisticated synchronisation mechanism.
[2] Thank you, Captain Obvious!