I had a repo that suffered an unconventional fork: in other words, someone decided to copy it and work outside Git (or any other version control system). Later, the fork was turned into a Subversion repo and stored on a local server. To bring those changes back into the original repo, I had to set the fork up as an unrelated branch of the original repo, as I explain here, so I could merge it later.
As if dealing with the changes to merge weren’t enough, the line endings in the Subversion version (sorry for the redundancy) of the repo were showing up as <0x0d> when I ran git diff on the two branches.
Line endings are handled differently on Unix-like systems, such as Linux or macOS, and on Windows. Windows ends each line with CR/LF: a carriage return (\r, ASCII 0x0d) followed by a line feed, or newline (\n, ASCII 0x0a). Unix-like systems use just LF (\n, ASCII 0x0a).
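To see the difference at the byte level, you can dump one line written each way with od. This is a minimal sketch; -An (hide offsets) and -tx1 (print each byte as hex) are standard od options. The Windows-style line ends in 0d 0a, the Unix-style one in just 0a.

```shell
#!/bin/sh
# Windows-style line: the text, then CR (0x0d) and LF (0x0a)
printf 'hello\r\n' | od -An -tx1
# Unix-style line: the text, then only LF (0x0a)
printf 'hello\n' | od -An -tx1
```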
Some people argue that the Windows way is more correct, because it uses both a carriage return and a newline, while Unix uses only a newline. Taken literally, a newline alone would move down one line without returning the carriage to the leftmost position, so you’d end up at the end of the new line instead of at its beginning. The reality is that Unix settled on just one character to save some memory (which made all the sense in the world in the old days), and that’s it. Since Git was born in the Unix world, if you have files with Windows line endings and things haven’t been configured properly, you end up with a bunch of files that look like this:
-export(addconversionfactorandprice)<0x0d>
-export(adddot2logfile)<0x0d>
-export(addmissingmirrorflow)<0x0d>
-export(addpartnerflow)<0x0d>
-export(addregion)<0x0d>
-export(calculatediscrepancies)<0x0d>
-export(changecolumntype)<0x0d>
-export(changeflowmessage)<0x0d>
-export(checkdbcolumns)<0x0d>
-export(clean)<0x0d>
-export(clean2excel)<0x0d>
-export(cleancomext)<0x0d>
-export(cleancomextmonthly)<0x0d>
Nice… a lot of visual garbage. And it isn’t just visual: git diff is picking up those line-ending differences as real differences between the files. Every edited file shows those line endings on every line, so Git assumes the whole file was changed.
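Before fixing anything, it helps to know which files actually contain carriage returns. The sketch below assumes GNU or BSD grep (both accept -r, -l, -I and --exclude-dir); list_crlf_files is a hypothetical helper name, not an existing tool. Git itself can report line endings per tracked file with git ls-files --eol, and recent Git versions accept git diff --ignore-cr-at-eol to hide CR-only differences when comparing branches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list text files under a directory (default: the current one)
# that contain at least one carriage return (0x0d).
# grep flags: -r recurse, -l print only file names,
# -I skip files grep considers binary, --exclude-dir skip Git's own database.
list_crlf_files() {
  local dir="${1:-.}"
  local cr
  cr=$(printf '\r')   # a literal CR character to use as the search pattern
  grep -rlI --exclude-dir=.git "$cr" "$dir"
}
```

Run it from the repo root as list_crlf_files . to get the list of files that would show <0x0d> in a diff.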
Where does the problem come from?
I don’t know exactly where the problem comes from, but I can make a guess. Most probably the version-control-less fork was developed on Windows, where the line endings were changed. However, that alone shouldn’t be a problem: this isn’t the first repo I’ve worked on that is used, and edited, on both Windows and Unix-like systems, and I had never run into an issue like this. My guess is that Git on Windows is normally configured (the Git for Windows installer sets core.autocrlf to true by default) to strip those extra characters when files are added to the Git database. So I suspect the problem comes from the Subversion repo and its conversion to a Git repo with the git-svn command. Probably Subversion doesn’t strip those characters, and since I didn’t have the right Git configuration for this case, the line endings were never normalized.
How to fix?
A simple tip: don’t use Windows, and everyone’s life will be much easier. OK, I’m just kidding (or not), but since that isn’t an option, let’s look for a real solution.
Git configuration
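First of all, configure Git accordingly, just in case. On Windows, core.autocrlf true converts CRLF to LF when files are added to Git and back to CRLF when they are checked out; on Linux and macOS, input converts CRLF to LF when files are added and leaves checkouts alone.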
# for Windows
git config --global core.autocrlf true
# for Linux & macOS
git config --global core.autocrlf input
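You can also set this at the repo level with a .gitattributes file containing the line below. Unlike core.autocrlf, which is a per-user setting, .gitattributes is committed with the repo, so it applies to everyone who clones it: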
* text=auto
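A slightly fuller .gitattributes might also pin the endings of specific file types and mark binaries explicitly. The sketch below writes one from the shell with a heredoc; the file patterns (*.sh, *.bat, *.png, *.xlsx) are illustrative assumptions, so adapt them to your repo. Note that it overwrites any existing .gitattributes. The built-in binary attribute tells Git never to convert or text-diff those files.

```shell
#!/bin/sh
# Write a .gitattributes at the repo root. NOTE: this overwrites an existing file.
# The file patterns below are illustrative assumptions; adapt them to your repo.
cat > .gitattributes <<'EOF'
# Let Git detect text files and store them with LF in the repository
* text=auto
# Force specific endings where a tool requires them (example patterns)
*.sh  text eol=lf
*.bat text eol=crlf
# Never convert binary files
*.png  binary
*.xlsx binary
EOF
```

Commit the file so it travels with the repo; files already committed are only affected once they are renormalized.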
Transform current files
However, this configuration only applies to files as they are added from now on; it doesn’t fix what is already committed. How can you transform the existing files? You can follow GitHub’s tutorial on it:
- First, save your changes, in case you have any:
git add . -u
git commit -m "Saving files before refreshing line endings"
- Use Git to renormalize everything in your repo:
git add --renormalize .
- Now you can see the renormalized files and commit them:
git status
git commit -m "Normalize all the line endings"
Please make sure you are normalizing the correct branch. I had a lot of trouble because I thought I was normalizing the offending branch (I was not), and even after the normalization I was still seeing the line-ending garbage.
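Since it’s easy to normalize the wrong branch, it can help to check what Git has actually stored on the branch you have checked out. git ls-files --eol prints, for every tracked file, the line endings in the index (i/…) and in the working tree (w/…). The sketch below is a minimal example meant to be run from the repo root; check_index_eol is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report tracked files whose copy in the index (what gets
# committed) still has CRLF or mixed line endings.
# `git ls-files --eol` lines start with "i/<eol>" for the index, then "w/<eol>".
check_index_eol() {
  local bad
  bad=$(git ls-files --eol | grep -E '^i/(crlf|mixed)' || true)
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    echo "Some files are still stored with CRLF line endings" >&2
    return 1
  fi
  echo "No CRLF line endings left in the index"
}
```

If it still lists files after git add --renormalize ., double-check which branch you are on with git status.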
You can also check this post on Stack Overflow if you want to know more.
dos2unix & unix2dos
If you don’t want to use Git for this task, you can always rely on the dos2unix and unix2dos commands in the terminal. If you don’t have them, you can easily install them with Homebrew:
brew install dos2unix
If you are on Linux, use your distro’s package manager.
However, be careful with these tools, because you may convert something you aren’t supposed to, binaries for example. I did, and I had to restore them because the conversion left them useless.
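One way to reduce that risk is to convert only files that look like text and actually contain a CR. The sketch below is a hedged example, not a dos2unix replacement: crlf_to_lf_text_only is a hypothetical helper, it uses awk instead of dos2unix so it needs no extra install, and it relies on grep -I’s binary detection (essentially a check for NUL bytes), which is a heuristic. Recent dos2unix versions also document that they skip binary files unless forced, so check the man page of the version you have.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert CRLF -> LF only in text files that contain a CR.
# grep -I skips anything it considers binary, so images, archives, etc. are left alone.
# awk stands in for dos2unix here so the sketch has no extra dependency.
crlf_to_lf_text_only() {
  local dir="${1:-.}"
  local cr file tmp
  cr=$(printf '\r')
  grep -rlI --exclude-dir=.git "$cr" "$dir" | while IFS= read -r file; do
    tmp=$(mktemp) || return 1
    # Drop one trailing CR per line; awk also adds a final newline if it was missing.
    awk '{ sub(/\r$/, ""); print }' "$file" > "$tmp" &&
      cat "$tmp" > "$file"   # write back in place, keeping the file's permissions
    rm -f "$tmp"
  done
}
```

Run it on a clean working tree, so git diff shows exactly what it changed before you commit.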
Bottom line
Guys, seriously: use Git, Subversion, or whatever version control system you like to version your work, and make it easy for others (and for yourself) to figure out what has been going on in your work and your code. I really don’t understand why this isn’t common practice when people teach or learn to program, in whatever language. It’s one of the things that makes your work look professional.
Also, be mindful of the OS you are using and how it works. Not everyone uses the same OS, and we need to make things work cross-platform.