I developed Snowtrack, a version control application for graphic designers and 2D/3D artists, built with Angular and Electron. In this blog post, I will cover some of the technical challenges around file locking that I faced during its development.
What is Snowtrack?
Snowtrack is an intuitive, easy-to-use, and super-fast version control software for graphic projects. Its purpose is to make version control accessible to graphic designers and 2D/3D artists with a non-technical workflow.
To get a better understanding of Snowtrack's user interface, check out the following screenshot:
What I used to build it
For the UI application I used a combination of Angular and Electron. The underlying version control engine is called SnowFS, an open-source project I developed as a fast and simple alternative to Git and Git LFS. Feel free to check it out on GitHub. A few months ago I wrote a blog post about it here on dev.to.
Technical challenge no. 1
Graphic projects can differ tremendously in size: from a single Photoshop file up to a 50 GB set of 3D scenes, textures, and assets. Each project type comes with its own set of problems. In the following, I want to clear up some misconceptions around file locking.
File Locking
Take a look at the code snippet below.
```js
// Process 1
fd = fs.openSync("~/foo", "w");

// Process 2
fd = fs.openSync("~/foo", "w");
```
Imagine more than one process wants to open the same file at the same time. What do you think will happen?
Answer: It depends on the OS, and on whether you control all of the processes involved.
When you call `fs.openSync`, NodeJS forwards the call behind the scenes to an OS function, as you can see in this C code from libuv, NodeJS's I/O layer:
```c
static ssize_t uv__fs_open(uv_fs_t* req) {
  return open(req->path, req->flags | O_CLOEXEC, req->mode);
}
```
The function `open(..)` is an OS function available in all operating systems, but its internals differ between Windows, Linux, and macOS, so I will cover them separately.
macOS/Linux
Technically, neither macOS nor Linux has a true file-locking mechanism. Although you can read- or write-lock a file using a function called `fcntl`, only programs that use this function regard and respect the file lock. This means any other process that doesn't use `fcntl` and opens the file directly can acquire a file handle and manipulate the content, as long as the file permissions allow it. What a bummer.

That's why file locking on macOS and Linux is also called "advisory file locking".
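To illustrate the advisory nature, here is a minimal C sketch (the file name is hypothetical): it takes an exclusive write lock via `fcntl`, yet a second process that skips `fcntl` and calls `open(..)`/`write(..)` directly can still modify the file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("foo", O_RDWR | O_CREAT, 0644);
    if (fd == -1) { perror("open"); return 1; }

    struct flock lock = {0};
    lock.l_type = F_WRLCK;    /* exclusive write lock */
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;         /* from the beginning... */
    lock.l_len = 0;           /* ...to EOF (0 = whole file) */

    if (fcntl(fd, F_SETLK, &lock) == -1)
        perror("fcntl");      /* another fcntl user already holds a lock */

    /* While we hold this lock, try `echo tampered > foo` in another
       shell: it succeeds, because the lock is advisory only. */
    pause();

    close(fd);
    return 0;
}
```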
Windows
Windows is more complicated in this regard. It offers two ways to open a file: either through the Windows API function called CreateFile (yes, that's really the name for opening files), or through `open(..)`. But the `open(..)` function on Windows is a POSIX extension and uses `CreateFile` internally as well.
As we've seen above, NodeJS uses `open(..)`, but since we know that this is just a wrapper for `CreateFile`, let's check out that function:
```c
// The low-level open function of Windows.
HANDLE CreateFile(
  LPCSTR                lpFileName,
  DWORD                 dwDesiredAccess,
  DWORD                 dwShareMode,
  LPSECURITY_ATTRIBUTES lpSecurityAttributes,
  DWORD                 dwCreationDisposition,
  DWORD                 dwFlagsAndAttributes,
  HANDLE                hTemplateFile
);
```
`CreateFile` has a parameter called `dwShareMode`. A file that is opened with `dwShareMode=0` cannot be opened again until its handle has been closed.
So if you use `open(..)` on a file that was already opened by another process with `CreateFile(…, dwShareMode=0)`, you receive this error message:

> The process cannot access the file because it is being used by another process
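Here is a minimal Windows C sketch of that behavior (the file name is hypothetical): while the first, exclusive handle is open, the second `CreateFile` call fails with error 32, `ERROR_SHARING_VIOLATION`.

```c
#include <windows.h>
#include <stdio.h>

int main(void) {
    /* First handle: dwShareMode = 0 means "no sharing at all". */
    HANDLE h1 = CreateFileA("foo.txt", GENERIC_WRITE, /* dwShareMode */ 0,
                            NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h1 == INVALID_HANDLE_VALUE) return 1;

    /* Second open attempt fails while h1 is still open. */
    HANDLE h2 = CreateFileA("foo.txt", GENERIC_READ, 0,
                            NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h2 == INVALID_HANDLE_VALUE)
        /* GetLastError() == 32, i.e. ERROR_SHARING_VIOLATION */
        printf("second open failed: error %lu\n", GetLastError());

    CloseHandle(h1);
    return 0;
}
```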
On the other hand, if you use `fs.openSync` in NodeJS, or `open(..)` in C/C++, to open a file that hasn't been opened yet, you cannot prevent another application from modifying it*.

\* Unless you use file permissions as a workaround, but that's not really a file lock.
To prove this, you will see that our `fs.openSync` call executes `CreateFile` with the read/write share flags to comply with the POSIX standard. This means that on Windows, you cannot prevent another application from opening and modifying your file if you don't use `CreateFile` directly.
What does this have to do with Snowtrack?
Imagine a user saves a big file in a graphics application and, while the file is still being written to disk, attempts to commit the file change. How does Snowtrack deal with this?
As we learned, `open(..)` does no file locking, most applications don't use advisory locks anyway, and Snowtrack cannot control how Photoshop, Blender, and co. open and write their files. This means the only reliable way to detect whether a file is still being written by another process is to check, prior to a commit, whether any process on the system holds a write handle on that file.
On Windows, I solved this with a custom helper process and the Restart Manager Windows API, which is mainly used by installers to ensure the files they are about to replace are no longer open.
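To give you an idea, here is a rough C sketch of how such a check might look with the Restart Manager API (the file path and buffer size are illustrative; error handling is mostly omitted):

```c
#include <windows.h>
#include <RestartManager.h>
#include <stdio.h>

#pragma comment(lib, "Rstrtmgr.lib")

int main(void) {
    DWORD session;
    WCHAR sessionKey[CCH_RM_SESSION_KEY + 1] = { 0 };
    if (RmStartSession(&session, 0, sessionKey) != ERROR_SUCCESS)
        return 1;

    /* Register the file we are about to commit. */
    LPCWSTR files[] = { L"C:\\project\\scene.psd" };
    RmRegisterResources(session, 1, files, 0, NULL, 0, NULL);

    /* Ask which processes currently hold a handle on it. */
    UINT needed = 0, count = 10;
    DWORD rebootReason;
    RM_PROCESS_INFO info[10];
    if (RmGetList(session, &needed, &count, info, &rebootReason) == ERROR_SUCCESS)
        for (UINT i = 0; i < count; i++)
            wprintf(L"in use by: %s\n", info[i].strAppName);

    RmEndSession(session);
    return 0;
}
```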
On macOS, I invoke the system process `/usr/sbin/lsof` (list open files), restricted to the working directory to speed up the execution of this command.
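Conceptually, the check boils down to something like this minimal C sketch (the directory path is illustrative); the `+D` option restricts `lsof` to a single directory tree, which is much faster than listing every open file on the system:

```c
#include <stdio.h>

int main(void) {
    FILE *p = popen("/usr/sbin/lsof +D /path/to/working-directory", "r");
    if (!p) return 1;

    char line[4096];
    while (fgets(line, sizeof line, p))
        fputs(line, stdout);  /* one line per process/file handle */

    return pclose(p);
}
```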
What else?
The development of Snowtrack came with countless technical challenges, and I would be happy to share more insights. File locking, Electron/Angular race conditions, I/O saturation, build servers, update mechanisms, edge cases… I touched many subjects with this project. If you are interested in a follow-up blog post, let me know in the comments below.
If you want to support SnowFS, Snowtrack, or me, feel free to join me on Twitter.
Thanks for reading :-)
TLDR
Don't get me started on file-locking.
Addendum: What about the "File In Use" dialog in Windows?
If you are a Windows user you might have seen this error message before:
Windows, or rather NTFS, behaves very differently compared to other file systems like HFS+, APFS, or ext3. NTFS has no equivalent of inodes, and therefore no garbage collection that deletes a file after the last handle to an already deleted file is closed. The "File In Use" dialog only indicates that, as long as any process has a handle to a given file (no matter how it was opened), the file cannot be renamed, moved, or deleted. That does not imply a lock on the file content.
Top comments (11)
Could you implement something lower level based on union fs, or some other fancy file system? (eg what docker uses for its layers)
This problem can't be unique to your rcs. How have others solved it?
A simple, pragmatic (not 100% robust) solution may just be to dupe the file to a private pre-commit version, then checksum the copied dupe and compare it with the original file after a few seconds, and if the checksums still match, commit (the private copy)?
I believe most file systems inherently have lock mechanisms for read/write operations- eg if you are reading a file to do a checksum and it is being written to, the writing operation for the fs will stall the checksum read operation until it is finished writing.
Thanks for your input! As far as I know there are no proper solutions yet. I've found countless articles, blog posts, and threads on Stack Overflow and Reddit with the same problem.
Regarding your suggestion, I'm wondering: doesn't the "private" copy have the same lock problem? While creating/copying this private file, another process could just modify the content of the original file.
And as for file systems, they are mostly atomic only at the block level. Any write operation bigger than that is undefined, afaik.
I'm no expert on file systems. But in principle if all read operations on a file block write operations, the following would work:
do a checksum on the original, which would block writes during this period. Copy it, then checksum the private copied file. If the checksums match, commit the private file, else repeat.
But apparently at the OS/kernel level, to which you alluded, one can get multiple write() operations on the same file during writing, at which point read() operations could interlace between the write ops. OK, so we don't have full file atomicity.
So what about: checksum the original, copy to private, checksum the private, then checksum the original again (after a second or so), and compare all three - try again if all three don't match.
Would we have a reasonable heuristic for avoiding the need to use locks at all? It's like a high-level compare-and-swap mechanism. Waiting a second or so on every file would be costly if committing many files, so you'd do this comparison op on groups of files, rather than one at a time.
Great idea, but taking the initial checksum already runs into the file-locking problem, as the file cannot be locked. Just for the sake of outlining a potential edge case: imagine the write operation of a process on file "foo" being faster than the read operation that takes the checksum. So while the file is being overwritten, the reader computes a checksum over a mix of the old content and the remaining part of the new content.
This could be solved of course, but only if you're the maintainer of all involved processes.
Let's say the read op begins first, as you describe. It gets halfway through the file when the write op comes in, which then races through to the finish line first, causing, as you say, the reader to checksum the earlier part of the file with older content and the latter part with the newer content.
As the reader finishes last, we then determine the checksum (based on a corrupted snapshot of the file), copy the file, and do a checksum on the private file. The private file will be a fully written file, so this latter checksum will cover the entire updated content, thus mismatching the first checksum. This would cause your commit process to retry (or fail if the command-line arg said "fail on detecting update during commit"). I suggested a third checksum back on the original file, in my heuristic, just as a sanity check.
As an example of why I suggested the third check: say a write operation comes in first. It gets some way through and then stalls. The read op comes in for the original checksum, then it copies the file (assuming the stalled write process allows it), and the copy will match the original (despite the file being partially written when copied). The third checksum on the original is intentionally delayed a second or two to allow a stall to "resume" and a mismatch to be detected at this last hurdle. That's why I said it is not 100% foolproof. The stall could last multiple seconds or more. But that would be a very broken OS...
That makes absolute sense. I've approached this more with speed and performance in mind, since project files can become quite large. During the beta test it happened too often that users continued working and saving while the app was still running in the background.

But your approach is pretty good, I like it. As in, using a "pre-staging area" as a safety net.
I meant to add an additional comment - it seems that if you write to a file it can sometimes take a long time to write - imagine a browser is downloading a file - the write stall may depend on the network conditions, and could take as long as you like.
I think your approach of checking for open file handles, combined with the checksum approach I suggested will solve this issue.
So
i) read and checksum original file
ii) check any write handles on original file, as you were doing (fail/abort if any)
iii) copy original file.
iv) compare checksum of copied file with original (fail if mismatch)
These four steps would seem to suffice to me, avoiding race conditions, as well as a partially written, stalling write operations just before the commit operation has begun.
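A minimal C sketch of steps i and iv (the file names are hypothetical, steps ii and iii are left as comments, and a real implementation would likely use a stronger hash than FNV-1a):

```c
#include <stdint.h>
#include <stdio.h>

/* Simple FNV-1a hash of a whole file. */
static uint64_t checksum_file(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    uint64_t h = 1469598103934665603ULL;
    int c;
    while ((c = fgetc(f)) != EOF)
        h = (h ^ (uint64_t)c) * 1099511628211ULL;
    fclose(f);
    return h;
}

int main(void) {
    uint64_t before = checksum_file("scene.psd");         /* step i  */
    /* step ii: check for write handles (lsof / Restart Manager)    */
    /* step iii: copy scene.psd to a private pre-commit location    */
    uint64_t after = checksum_file("private/scene.psd");  /* step iv */

    if (before != after)
        fprintf(stderr, "file changed during copy; retry commit\n");
    return 0;
}
```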
Hey, sorry for the late reply. That's a really good idea you have there, I'll have to let that sink in. In case you're interested, you are more than welcome to contribute your ideas to the project either on GitHub directly or on Discord as well. Door is always open!
I will clone and check the project out, it does sound interesting.