Well, this evening I had to laugh at myself with my programmer/developer hat on.
I'd been copying a bunch of data CDs and DVDs to a hard drive - with the intention of then disposing of the discs. As a consequence, I wanted to be sure that the copies were perfectly complete. Well, as it happens, I had written myself a tool for exactly that function, the "Congruentry" part of my Foldatry tool. See: Foldatry
First of all, I copied each disc, and for that I used a program called Double Commander - no special reason, it's just my current stock file manager. It happily copied each disc, then after each I would run Foldatry and get it to prove the copy was perfect. All good so far.
Then, on one disc, a small number of files - say, about ten or so - failed to copy. What happened was that Double Commander gave an error for each of those files and allowed me to skip each and carry on. As I saw each name I took a guess that the problem was because I was writing to a Windows-compatible drive (actually formatted as NTFS), which has more restrictions on filenames than other file systems, such as those used on Linux, where the discs would have been made.
(Actually, Double Commander made a log of the skipped files, but with one thing and another I slipped up and didn't save that list anywhere.)
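As an aside, a guess like the one about filename restrictions can be checked mechanically. What follows is only a minimal sketch in Python, not anything taken from Foldatry: the function name windows_name_problems is made up for illustration, and the rules it encodes are the commonly documented Windows naming rules (forbidden punctuation, control characters, trailing space or dot, reserved device names).

```python
import re

# Characters Windows does not allow in a file name, plus ASCII control codes 0-31.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*') | {chr(c) for c in range(32)}

# Device names that Windows reserves, with or without an extension.
WINDOWS_RESERVED = re.compile(
    r"^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\..*)?$", re.IGNORECASE
)


def windows_name_problems(name):
    """Return a list of reasons why `name` would be refused as a Windows file name.

    Hypothetical helper for illustration; an empty list means no problem was found.
    """
    problems = []
    bad = sorted({ch for ch in name if ch in WINDOWS_FORBIDDEN})
    if bad:
        problems.append("forbidden characters: " + " ".join(repr(ch) for ch in bad))
    if name.endswith((" ", ".")):
        problems.append("ends with a space or a dot")
    if WINDOWS_RESERVED.match(name):
        problems.append("reserved device name")
    return problems


if __name__ == "__main__":
    for candidate in ("holiday.jpg", "what?.txt", "nul.txt", "trailing."):
        print(candidate, "->", windows_name_problems(candidate) or "looks fine")
```

Note that a check like this only covers the restricted-character theory. It says nothing about names whose bytes are not valid text in the expected encoding, which is a separate kind of trouble.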
Oh well, I thought, I'll now use Foldatry to tell me which files are on the disc but not on the target drive. While the main job of Congruentry is proving things identical, I also gave it options for reporting what differences it found. Except ... it didn't seem to work. It confirmed that the two locations were different but didn't show me what the differences were.
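To make "on the disc but not on the target drive" concrete, here is a minimal sketch in Python of that kind of report. It is an illustration only and not how Congruentry does it: it compares relative file names and nothing else (no sizes, no contents), and the function names relative_files and missing_from_target are invented for this example.

```python
import os


def relative_files(root):
    """Collect every file under `root` as a path relative to `root`."""
    found = set()
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            found.add(os.path.relpath(full, root))
    return found


def missing_from_target(source_root, target_root):
    """Return the relative paths present under source_root but absent under target_root."""
    return sorted(relative_files(source_root) - relative_files(target_root))


if __name__ == "__main__":
    import sys

    # Usage (paths are whatever your disc and drive are mounted as):
    #   python missing.py <source folder> <target folder>
    if len(sys.argv) == 3:
        for path in missing_from_target(sys.argv[1], sys.argv[2]):
            # repr() keeps odd characters visible instead of printing them raw.
            print(repr(path))
```

Printing with repr() is a deliberate choice in this sketch: a name containing unusual characters still shows up as something readable rather than as a blank or a broken line.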
While I mused on why that might be, I started thinking ahead to how I would later need to manually copy the problem files - most likely by renaming them slightly. Assuming that I would do that, then I'd want a way to tell my program about that, so that it will recognise they are the same ones while doing the larger congruency check.
Clearly that was going to require some thinking, so I wrote some notes on what this feature might be like. Not that you need or want to read them, but they're in congruentry.md, near the end under "Considerations".
With those notes written, I opened my program in the development environment and tried to debug why it was failing to list the differences. Congruentry had been the very first part of Foldatry that I wrote, so while the apparent bug was a surprise, the idea that there might be one probably wasn't.
By inspection I could find no apparent fault in the program code, and a test on dummy folders seemed to work flawlessly. So I put in the (source) disc and (target) drive and ran again on the files that caused the fault. Sure enough, it then didn't work. What was going on? Cue about an hour of more code and log inspection.
Strangely, while the log written to a text file showed nothing about the different files, the on-screen log display showed a bunch of blank lines at that point. Now I had spotted that from the outset but not thought anything of it - blank is blank after all. This time though I decided to get the count of those blank lines by using the mouse to clip them from the on-screen box to a text editor. Lo. As I dragged the mouse over the lines, various bits of filenames briefly displayed. Depending on how I dragged the mouse I'd see various things fleetingly under the moving highlight range.
That's when I laughed.
There was no bug in what the program was doing. It was correctly finding the files which had failed to copy - because of odd things in their filenames. It just wasn't displaying them nicely - because of odd things in their filenames.
When I was typing up my considerations - which were mainly about things like which characters were illegal in Windows filenames - I had also written myself a note to later think more about multilingual / international characters and how to handle those.
Well, guess what the story was here!
It turns out that just in this little case to hand, I was encountering four different outcomes for odd things in these filenames:
- in Double Commander they display with a symbol in them - a white on black question mark in a black diamond (its way of showing an unrecognised character)
- in the Foldatry log file the filenames don't print at all
- in Foldatry on-screen they do this strange there-but-not-visible-except-when-you-highlight-them thing
- in a Linux terminal window, after the "ls" command those files display like ''$'\374''024.jpg' (where I'm guessing that's a way of indicating a character with code 374, most likely written in octal as shell escapes usually are)
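The four outcomes above can be partly reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: it reads \374 in the ls output as an octal escape for a single byte (octal 374 is hexadecimal FC), and it tries Latin-1 purely as one plausible guess at what the disc's author used, since the real encoding isn't known. The function name describe is made up for the example.

```python
# The name as raw bytes, taken from the ls output: octal 374, then "024.jpg".
raw_name = b"\374024.jpg"


def describe(raw):
    """Show how a raw file name fares under a few different decodings."""
    results = {}
    try:
        results["utf-8 strict"] = raw.decode("utf-8")
    except UnicodeDecodeError:
        results["utf-8 strict"] = None  # the bytes are not valid UTF-8
    # Substitute U+FFFD (the replacement character) for anything undecodable.
    results["utf-8 replace"] = raw.decode("utf-8", errors="replace")
    # Latin-1 maps every byte to a character, so this never fails (assumed encoding).
    results["latin-1"] = raw.decode("latin-1")
    # Keep the bad byte as a lone surrogate so the original bytes can be recovered.
    results["surrogateescape"] = raw.decode("utf-8", errors="surrogateescape")
    return results


if __name__ == "__main__":
    for label, text in describe(raw_name).items():
        print(label, "->", ascii(text))
```

Under these assumptions the strict decode fails, the "replace" decode yields U+FFFD (usually drawn as a question mark in a black diamond, which looks a lot like what Double Commander shows), and the Latin-1 decode yields a name starting with "ü". The surrogateescape form can be encoded back to exactly the original bytes, which may be a useful property for a tool that has to compare such names rather than just display them.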
Clearly I'm going to have to tackle the unusual character issue first, rather than later. And what a rabbit-hole that proved to be. More on this later.