Have you ever had an issue with line endings because a file was created on another OS? I once had line endings cause a weird, hard-to-troubleshoot bug in one of my dependencies that read txt files. No, I'm serious. I ended up using a hex editor to keep my sanity. That's what you could end up doing after a long and tiresome coding session, so go ahead and do yourself a favor by reading the rest of this post (jk) 🤣
In this article I am going to briefly discuss this problem, as well as why different operating systems use different line endings: DOS/Windows uses CRLF as its EOL sequence, while UNIX uses only LF. Line endings, like any other character, are encoded as numbers, which are in turn stored as bytes. The ASCII character encoding standard is responsible for mapping those characters to numbers. Technically, CRLF is more accurate, but more on this later.
Line ending characters (CR and LF) are control characters, meaning that they are not visible. Their role is to tell the computer to perform certain actions; they have no visual value to the user. Because they are invisible, two seemingly identical files can have different line endings, which is a reliable source of anger and frustration. The most dependable way to tell the difference is to compare the files byte by byte, for example with a hex editor. Finally, it is important to note that Git uses "native" as the default value for the core.eol configuration, which means that it will use whatever line ending is native to the OS.
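If you don't have a hex editor at hand, a few lines of Python can show the same thing. This is only a minimal sketch: hex_view is a helper name I made up for illustration, and the two byte strings are sample data, not the files from my project.

```python
def hex_view(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return data.hex(" ")  # the separator argument needs Python 3.8+


# The same two words, once with LF and once with CRLF line endings.
unix_text = b"the\nand\n"
windows_text = b"the\r\nand\r\n"

print(hex_view(unix_text))     # 74 68 65 0a 61 6e 64 0a
print(hex_view(windows_text))  # 74 68 65 0d 0a 61 6e 64 0d 0a
```

The extra 0d in front of every 0a is the CR byte that you cannot see in a regular text editor.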
LF stands for "Line Feed". The well-known newline escape sequence is \n. This control character indicates a new line and tells the computer to move the cursor one line down.
CR stands for "Carriage Return". The escape sequence is \r, and it tells the computer to return to the beginning of the current line.
You might've guessed it by now: combining those two characters results in what one expects when pressing the "Enter" key in a word processor.
- Return the cursor to the beginning of the line.
- Move one line down.
That is actually how early typewriters worked.
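To make the escape sequences a bit more concrete, here is a tiny Python sketch that prints the ASCII codes behind them. The constant names LF, CR and CRLF are just my own labels for this example.

```python
LF = "\n"       # Line Feed
CR = "\r"       # Carriage Return
CRLF = CR + LF  # the two-character Windows-style line ending

print(ord(LF))               # 10 (0x0A)
print(ord(CR))               # 13 (0x0D)
print(len(CRLF))             # 2
print(CRLF.encode("ascii"))  # b'\r\n'
```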
The problem with line endings is that, depending on its configuration, Git WILL quietly convert them to CRLF on checkout if you are using Windows (Git for Windows commonly enables core.autocrlf during installation) and keep them as LF if you are using UNIX/OS X. If you haven't configured Git to handle line endings in a certain way, pulling from a repo which doesn't have a .gitattributes file to enforce a specific EOL character is going to cause you headaches. That's exactly what I encountered when a dependency, which happened to be committed on a different OS, was reading txt files from the filesystem and splitting them by \n (LF). I was using a Windows machine, so Git was quietly converting the line endings to CRLF (\r\n). Thus the application would produce weird bugs.
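Here is roughly what that kind of bug looks like, as a minimal Python sketch. It is not the actual code of the dependency: both function names are hypothetical, and I am assuming the file content reaches the code with its \r\n sequences intact (no newline translation while reading).

```python
def load_stopwords_naive(text: str) -> set[str]:
    """Split on LF only, so every word keeps a trailing CR on CRLF input."""
    return {line for line in text.split("\n") if line}


def load_stopwords_tolerant(text: str) -> set[str]:
    """splitlines() treats \\n, \\r\\n and \\r as line boundaries."""
    return {line for line in text.splitlines() if line}


# What the stopwords file looks like after Git converted it to CRLF.
crlf_text = "the\r\nand\r\n"

naive = load_stopwords_naive(crlf_text)
tolerant = load_stopwords_tolerant(crlf_text)

print("the" in naive)     # False, the set contains 'the\r' instead
print("the" in tolerant)  # True
```

The naive version never raises an error. It simply stops matching anything, which is what makes this kind of bug so hard to track down.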
What was even more frustrating is that git diff would not show any differences between the original files and my local copies. The files were seemingly identical! I had messed with line endings in the past, but not to this extent. Brain-fogged from long hours of coding, I decided to just open a hex editor and inspect the bytes. Boom! I started to notice differences right away. So, I was at least sure that the files were indeed different. Taking a closer look, I could see that the bytes differed after each word (it was a stopwords file, so there were many words separated by new lines). And that's the point when everything started to make sense.
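If staring at raw bytes is not your thing, the same check can be automated. The following Python sketch counts each line-ending style in a chunk of bytes; count_line_endings is a hypothetical helper of mine, and in practice you would feed it the result of reading a file in binary mode.

```python
def count_line_endings(data: bytes) -> dict[str, int]:
    """Count CRLF pairs, bare LF and bare CR occurrences in raw bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LFs that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf  # CRs that are not part of a CRLF pair
    return {"crlf": crlf, "lf": lf, "cr": cr}


print(count_line_endings(b"the\nand\n"))      # {'crlf': 0, 'lf': 2, 'cr': 0}
print(count_line_endings(b"the\r\nand\r\n"))  # {'crlf': 2, 'lf': 0, 'cr': 0}
```

A file that reports a non-zero count for more than one style has mixed line endings, which is usually a sign that it was edited on several operating systems.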
After inspecting the original files' EOL, which was LF, I immediately added a .gitattributes file enforcing LF line endings on txt files:
* text=auto eol=lf
This rule is going to utilize Git's file type detection to recognize txt files as text and enforce LF line endings on both checkout and commit.
Any file committed before the addition of the .gitattributes file should be renormalized. This is done by running the following command:
git add --renormalize .
This command will update the line endings in the Git index, effectively staging the files for commit. The current working tree WILL NOT be updated; you have to check the files out again (for example, with a fresh clone). To verify that all files are using LF, you can run:
git ls-files --eol
# This will print:
# i/lf    w/crlf  attr/text=auto eol=lf   file.txt
#
# Where:
# i    = Index line-endings
# w    = Working tree line-endings
# attr = Any .gitattributes rules that may apply
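For a repo with many files, eyeballing that output gets old quickly. Below is a rough Python sketch that parses a single output line into its parts. Treat it as an assumption-laden example: EolInfo and parse_eol_line are names I invented, and I am assuming the path is separated from the three info columns by a tab character, which is my understanding of the documented output format.

```python
from typing import NamedTuple


class EolInfo(NamedTuple):
    index: str     # line endings recorded in the index
    worktree: str  # line endings in the working tree
    attr: str      # .gitattributes rules that apply (may be empty)
    path: str


def parse_eol_line(line: str) -> EolInfo:
    """Parse one line of `git ls-files --eol` output (assumed tab before path)."""
    info, _, path = line.partition("\t")
    # The attr column can itself contain a space, so split at most twice.
    index, worktree, attr = info.split(None, 2)
    return EolInfo(
        index=index.removeprefix("i/"),  # str.removeprefix needs Python 3.9+
        worktree=worktree.removeprefix("w/"),
        attr=attr.strip().removeprefix("attr/"),
        path=path,
    )


sample = "i/lf    w/crlf  attr/text=auto eol=lf   \tfile.txt"
entry = parse_eol_line(sample)
print(entry.worktree)  # crlf
print(entry.attr)      # text=auto eol=lf
```

With something like this you could loop over the full command output and list every file whose working tree copy is still reported as crlf.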
While researching the issue, I found some interesting historical context, which I encourage you to read in order to understand the topic more thoroughly.
As stated in the Introduction part of this post, EOL characters were inspired by early typewriters. Back in the day, when typing on such a machine, the typist needed to feed a paper sheet into it. On each keystroke the machine would print the letter onto the sheet and move the carriage so that the next letter would appear to the right of the previous one. Additionally, the carriage needed to be reset when starting a new line. So, the process of moving to a new line involved resetting the carriage, as well as rotating it to move the sheet upwards. Resetting the carriage amounts to CR and rotating the carriage to LF. Who would've thought, eh?
Now, regarding Windows vs UNIX (CRLF vs LF), it's obvious that, while redundant nowadays, CRLF represents the whole process of "inserting a new line", abiding by the original convention. To me, as long as everything is configured correctly, I don't really care at all 😅