DEV Community

Manuel Odendahl

⛔ Squash commits considered harmful ⛔

A recurring conversation in developer circles is whether you should use git merge --squash when merging or create explicit merge commits. The short answer: you shouldn't squash.

People have strong opinions about this. The thing is that my opinion is the correct one. Squashing commits has no purpose other than losing information. It doesn't make for a cleaner history. At most it helps subpar git clients show a cleaner commit graph, and save a bit of space by not storing intermediate file states.

Let me show you why.

Git tracks contents, not diffs

In many ways you can just see git as a filesystem.
– Linus (in 'Re: more git updates..' - MARC)

Git is in many ways a very dumb graph database. When you commit, git stores the full content of every tracked file in your repository (content that hasn't changed is not stored twice: the commit reuses the object that already exists).

The content of each file is stored as a "blob" node in the database. The filenames are stored separately in a "tree" node: If you rename a file, no new content node will be created. Only a new tree node will be created.

Commits are stored as "commit" nodes. A commit object points to a tree and adds metadata: author, committer, message and parent commits. A merge commit has multiple parents.

Here is a visualization from Scott Chacon's Git Internals:

[Figure: visualization of git's object model, from Scott Chacon's Git Internals]
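To make "content, not diffs" concrete, here is a minimal Python sketch of how git names objects. It is my own illustration, not git's code: an object id is the SHA-1 of a short header ("<type> <size>" plus a NUL byte) followed by the object body. The tree encoding assumes every entry is a regular file with mode 100644, which is enough for the single-file repository used below.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over the header "<type> <size>" plus a NUL byte, followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    # A blob is just the file's bytes: no filename, no permissions.
    return object_id("blob", content)


def tree_id(entries: dict) -> str:
    """entries maps filename -> hex blob id.

    Simplified: assumes every entry is a regular file (mode 100644), so a
    plain sort by name matches git's entry order.
    """
    body = b""
    for name in sorted(entries):
        # Each entry is "<mode> <name>", a NUL byte, then the 20 raw bytes of the id.
        body += f"100644 {name}\0".encode() + bytes.fromhex(entries[name])
    return object_id("tree", body)


hello = blob_id(b"hello\n")  # what `echo hello > foo.txt` writes
tree_foo = tree_id({"foo.txt": hello})
tree_bar = tree_id({"bar.txt": hello})  # same content under a new name
```

If the sketch matches git's encoding, these ids should line up with the git cat-file output below. Renaming foo.txt to bar.txt changes the tree id while the blob id stays the same.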

Looking at a real git repository

Enough theory, we have work to get done. Let's create a simple git repository:

> mkdir squash-merges-considered-harmful
> cd squash-merges-considered-harmful 
> git init
> echo hello > foo.txt
> git add foo.txt
> git commit -m "Initial commit"
[main (root-commit) 02a154b] Initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 foo.txt
> echo more >> foo.txt
> git add foo.txt
> git commit -m "Add more" 
[main 16660f8] Add more
 1 file changed, 1 insertion(+)

We can now look at the contents of the objects we created:

# initial commit
❯ git cat-file -p 02a154b
tree f269b7cd59094d5365ef6b5618098cbcbeee0c43
author Manuel Odendahl <wesen@ruinwesen.com> 1653303427 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653303427 -0400

Initial commit
# initial tree
❯ git cat-file -p f269b7cd59094d5365ef6b5618098cbcbeee0c43
100644 blob ce013625030ba8dba906f756967f9e9ca394464a    foo.txt
# initial foo.txt
❯ git cat-file -p ce013625030ba8dba906f756967f9e9ca394464a
hello

# second commit
❯ git cat-file -p 16660f8
tree 5a0c4a660a13c0ada7611651399abb362756f83e
parent 02a154bc4f0fa9bca567676d45d136619c076a95
author Manuel Odendahl <wesen@ruinwesen.com> 1653303485 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653303485 -0400

Add more
# second tree
❯ git cat-file -p 5a0c4a660a13c0ada7611651399abb362756f83e
100644 blob 2227cddb7f6318ea735a1c4adb52f5cd36c5783c    foo.txt
❯ git cat-file -p 2227cddb7f6318ea735a1c4adb52f5cd36c5783c
hello
more

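Commits are hashed the same way. A commit object is plain text: a tree line, one parent line per parent, author and committer lines, a blank line, and the message. The Python sketch below rebuilds the two commits from the git cat-file output above. It assumes that output is byte-for-byte what git stored (no signature headers, and a message ending in a newline, which is what git commit -m writes).

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    # Header "<type> <size>" plus a NUL byte, then the body, hashed with SHA-1.
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parents: list, author: str, committer: str, message: str) -> str:
    """Build a commit's raw text and return its id."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several parents
    lines += [f"author {author}", f"committer {committer}"]
    body = "\n".join(lines) + "\n\n" + message
    return object_id("commit", body.encode())


WHO = "Manuel Odendahl <wesen@ruinwesen.com>"

initial = commit_id(
    tree="f269b7cd59094d5365ef6b5618098cbcbeee0c43",
    parents=[],
    author=f"{WHO} 1653303427 -0400",
    committer=f"{WHO} 1653303427 -0400",
    message="Initial commit\n",
)
second = commit_id(
    tree="5a0c4a660a13c0ada7611651399abb362756f83e",
    parents=[initial],  # the parent's id is part of the hashed text
    author=f"{WHO} 1653303485 -0400",
    committer=f"{WHO} 1653303485 -0400",
    message="Add more\n",
)
```

Because parent ids are hashed into the commit, changing only the parent list produces a different commit id, even when the tree is identical.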

Branches and tags (including branches and tags on remote repositories) are just pointers to commit nodes.

❯ cat .git/refs/heads/main
16660f8b1d1538ed1b55d8533b3ee7feb68e474c
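A ref really is a small text file holding a commit id, and HEAD usually holds a symbolic ref ("ref: refs/heads/main") pointing at one. This Python sketch follows that chain in a throwaway directory laid out like .git. It deliberately ignores .git/packed-refs, where git may also store refs, so treat it as an illustration rather than a complete resolver.

```python
import os
import tempfile


def resolve_ref(git_dir: str, name: str = "HEAD") -> str:
    """Follow symbolic refs until a commit id is reached (loose ref files only)."""
    with open(os.path.join(git_dir, name)) as f:
        value = f.read().strip()
    if value.startswith("ref: "):
        # e.g. HEAD contains "ref: refs/heads/main"
        return resolve_ref(git_dir, value[len("ref: "):])
    return value


# Build a tiny fake .git layout mirroring the repository above.
with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/main\n")
    with open(os.path.join(git_dir, "refs", "heads", "main"), "w") as f:
        f.write("16660f8b1d1538ed1b55d8533b3ee7feb68e474c\n")
    head_commit = resolve_ref(git_dir)
```

In this model, moving a branch is nothing more than rewriting one file with a new id.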

But we still use diffs and merges

But Manuel, you ask, how do git diff and git merge and all that funky stuff work?

When you run git diff, git actually compares the state of two trees from scratch, every time, using one of several diff algorithms (Myers by default).

When you do a rebase, git computes the diff that each commit on the branch introduces relative to its parent, and then applies those diffs one by one to the destination, thus "moving" the branch over to the destination with fresh tree and commit nodes.
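As a rough illustration of replaying changes, here is a Python sketch that models trees as {path: content} dictionaries and "diffs" as whole-file changes. That is a big simplification (git replays line-level changes, stops on conflicts, and its default backend cherry-picks each commit with a three-way merge), but it shows why rebased commits are new nodes: each one gets a new tree built on top of the destination. The file notes.txt is a hypothetical example.

```python
from typing import Dict, List, Optional

Tree = Dict[str, str]             # path -> file content (stand-in for blob ids)
Delta = Dict[str, Optional[str]]  # path -> new content, or None if deleted


def tree_delta(old: Tree, new: Tree) -> Delta:
    """Whole-file 'diff': every path whose content differs between two trees."""
    return {p: new.get(p) for p in set(old) | set(new) if old.get(p) != new.get(p)}


def apply_delta(tree: Tree, delta: Delta) -> Tree:
    result = dict(tree)
    for path, content in delta.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def rebase(base: Tree, branch: List[Tree], onto: Tree) -> List[Tree]:
    """Replay each branch commit's change (relative to its parent) onto `onto`.

    `base` is the tree the branch started from; `branch` lists the branch's
    trees oldest first. Returns the new trees, oldest first. No conflict checks.
    """
    new_trees = []
    old_parent, new_parent = base, onto
    for tree in branch:
        replayed = apply_delta(new_parent, tree_delta(old_parent, tree))
        new_trees.append(replayed)
        old_parent, new_parent = tree, replayed
    return new_trees


# Hypothetical branch adding notes.txt, rebased onto a main that changed foo.txt.
base = {"foo.txt": "hello\n"}
branch = [
    {"foo.txt": "hello\n", "notes.txt": "a\n"},
    {"foo.txt": "hello\n", "notes.txt": "a\nb\n"},
]
onto = {"foo.txt": "hello\nmore\n"}
rebased = rebase(base, branch, onto)
```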

When you do a merge, git first searches for the common ancestor of both branches, the merge base (finding it can be a bit more involved depending on your graph). It computes the diff of each branch relative to that base, and then combines both diffs in what is called a three-way merge.
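The core decision of a three-way merge can be sketched at the file level in Python. This simplifies what git does: for each path it compares both sides against the merge base, takes whichever side changed, and reports a conflict when both sides changed the same file differently. At that point git would try a line-level merge of the file, which this sketch does not attempt. Trees are modeled as {path: content} dictionaries for illustration.

```python
from typing import Dict, List, Optional, Tuple

Tree = Dict[str, str]  # path -> file content (stand-in for blob ids)


def merge_trees(base: Tree, ours: Tree, theirs: Tree) -> Tuple[Tree, List[str]]:
    """File-level three-way merge. Returns (merged tree, conflicting paths)."""
    merged: Tree = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[str] = base.get(path)
        o, t = ours.get(path), theirs.get(path)
        if o == t:
            result = o              # both sides agree (including "both deleted")
        elif o == b:
            result = t              # only theirs changed this path
        elif t == b:
            result = o              # only ours changed this path
        else:
            conflicts.append(path)  # both changed it, differently
            continue
        if result is not None:      # None means the file was deleted
            merged[path] = result
    return merged, conflicts


# The work-branch merge later in this post: main left foo.txt alone,
# while work-branch appended two lines.
base = {"foo.txt": "hello\nmore\n"}
ours = dict(base)
theirs = {"foo.txt": "hello\nmore\nAdd more\nAdd more\n"}
merged, conflicts = merge_trees(base, ours, theirs)
```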

The resulting commit has multiple parent fields. The parent fields only record where the merge came from (git also uses them to find merge bases for later merges); they don't affect the commit's content. The tree the merge commit points to is what actually counts. Once a three-way merge has been computed and applied, git doesn't really care how the resulting tree was computed.

This is literally all there is to git, and the mental model that I use every day, even as I'm doing the most advanced git surgery.

What is a squash merge?

So what is a squash merge? A squash merge is the same as a normal merge, except that it records only one parent commit (the previous tip of the branch you are merging into) and drops the link to the merged branch. It basically slices off a whole part of the git graph, which will later be garbage collected if it isn't referenced anymore. You're basically losing information for no reason.

Let's look at this in practice. Let's create a few commits on top of the ones we have, and then do both a squash merge and a non-squash merge, and look at the results.

> git checkout -B work-branch
Switched to a new branch 'work-branch'
❯ echo "Add more" >> foo.txt
❯ git add foo.txt && git commit -m "Add more"
[work-branch 4b84cfe] Add more
 1 file changed, 1 insertion(+)
❯ echo "Add more" >> foo.txt
❯ git add foo.txt && git commit -m "And more"
[work-branch 1836f1c] And more
 1 file changed, 1 insertion(+)
❯ git checkout -B no-squash-merge main
Switched to a new branch 'no-squash-merge'
❯ git merge --no-squash --no-ff work-branch
Merge made by the 'ort' strategy.
 foo.txt | 2 ++
 1 file changed, 2 insertions(+)
❯ git checkout -B squash-merge main
Switched to a new branch 'squash-merge'
❯ git merge --squash --ff work-branch
Updating 16660f8..1836f1c
Fast-forward
Squash commit -- not updating HEAD
 foo.txt | 2 ++
 1 file changed, 2 insertions(+)
❯ git commit
[squash-merge 150c57d] Squashed commit of the following:
 1 file changed, 2 insertions(+) 

Let's look at the resulting graph and commits.

❯ git log --graph --pretty=oneline --abbrev-commit --all
* 150c57d (HEAD -> squash-merge) Squashed commit of the following:
| * 535b740 (no-squash-merge) Merge branch 'work-branch' into no-squash-merge
|/| 
| * 1836f1c (work-branch) And more
| * 4b84cfe Add more
|/  
* 16660f8 (main) Add more
* 02a154b Initial commit
❯ git cat-file -p no-squash-merge
tree 58c1fb22faa444b264e98a5ae4c4ddb07be09697
parent 16660f8b1d1538ed1b55d8533b3ee7feb68e474c
parent 1836f1c53221ae701a038bf5ae380770ea911665
author Manuel Odendahl <wesen@ruinwesen.com> 1653304391 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653304391 -0400

Merge branch 'work-branch' into no-squash-merge

* work-branch:
  And more
  Add more

❯ git cat-file -p squash-merge   
tree 58c1fb22faa444b264e98a5ae4c4ddb07be09697
parent 16660f8b1d1538ed1b55d8533b3ee7feb68e474c
author Manuel Odendahl <wesen@ruinwesen.com> 1653304543 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653304543 -0400

Squashed commit of the following:

commit 1836f1c53221ae701a038bf5ae380770ea911665
Author: Manuel Odendahl <wesen@ruinwesen.com>
Date:   Mon May 23 07:11:08 2022 -0400

    And more

commit 4b84cfe11aa51da994448e602e1bc4cc6083d691
Author: Manuel Odendahl <wesen@ruinwesen.com>
Date:   Mon May 23 07:11:03 2022 -0400

    Add more


You can see that both squash-merge and no-squash-merge point to the exact same tree. The only differences are the commit message, the timestamps, and the missing second parent in the squash merge.
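Here is the same comparison as a small Python model, built only from the abbreviated ids in the output above. Commits are reduced to a tree id and a parent list; the trees of the two work-branch commits are not shown in the transcript, so they are left as None. Walking parent links shows exactly which commits the squash merge no longer points to.

```python
from typing import Dict, NamedTuple, Optional, Set, Tuple


class Commit(NamedTuple):
    tree: Optional[str]       # abbreviated tree id, None where not shown above
    parents: Tuple[str, ...]  # first parent first


COMMITS: Dict[str, Commit] = {
    "02a154b": Commit("f269b7c", ()),
    "16660f8": Commit("5a0c4a6", ("02a154b",)),
    "4b84cfe": Commit(None, ("16660f8",)),
    "1836f1c": Commit(None, ("4b84cfe",)),
    "535b740": Commit("58c1fb2", ("16660f8", "1836f1c")),  # no-squash-merge
    "150c57d": Commit("58c1fb2", ("16660f8",)),            # squash-merge
}


def reachable(start: str) -> Set[str]:
    """All commits reachable from `start` by following parent links."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(COMMITS[commit].parents)
    return seen


same_tree = COMMITS["535b740"].tree == COMMITS["150c57d"].tree
lost = reachable("535b740") - reachable("150c57d")
```

The set difference is the merge commit itself plus the two work-branch commits, 4b84cfe and 1836f1c: history the squash merge simply does not reference.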

To read more about the underpinnings of git, I can recommend just experimenting with the git command line, and the following resources:

But the history!

But Manuel, you say, the history is so much cleaner!

To which I counter that it actually isn't. If you want to hide the link to the second ("right") parent of the non-squash merge (the first, "left", parent being main), all you need to do is hide it. If you use the command line or a proper tool, use the option to show only first parents (--first-parent for git log). If you only look at first parents, and configure your git tooling to fill the full log history of the branch into the merge commit message (I personally use the GitHub CLI gh or some git commit hooks to do it), the non-squash merge commit reads exactly like the squash merge commit.

A favorite git log command of mine to quickly look at the history of the main branch, and create a changelog:

> git log --pretty=format:'# %ad %H %s' --date=short --first-parent --reverse
# 2022-05-23 02a154bc4f0fa9bca567676d45d136619c076a95 Initial commit
# 2022-05-23 16660f8b1d1538ed1b55d8533b3ee7feb68e474c Add more
# 2022-05-23 535b740f42e331175f3766c1374116e329a78f7e Merge branch 'work-branch' into no-squash-merge
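What --first-parent does can be sketched in a few lines of Python: start at the tip and follow only the first parent, so the work-branch commits (reachable only through the merge's second parent) are skipped without being deleted. The parent lists are taken from the graph shown earlier; reversing the chain mimics --reverse.

```python
from typing import Dict, List

# Parent lists from the `git log --graph` output earlier, first parent first.
PARENTS: Dict[str, List[str]] = {
    "02a154b": [],
    "16660f8": ["02a154b"],
    "4b84cfe": ["16660f8"],
    "1836f1c": ["4b84cfe"],
    "535b740": ["16660f8", "1836f1c"],  # merge of work-branch
}


def first_parent_log(head: str, parents: Dict[str, List[str]], reverse: bool = True) -> List[str]:
    """Follow only first-parent links from `head`, like `git log --first-parent`."""
    chain = [head]
    while parents[chain[-1]]:
        chain.append(parents[chain[-1]][0])
    return chain[::-1] if reverse else chain


log = first_parent_log("535b740", PARENTS)
```

The result lists the same three commits as the changelog output above, oldest first.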

When using GitHub and pull requests, this shows the author, the branch name (which in my case contains the ticket name and a short description) and the date on a single line. Here's a slightly more complex real-world example (anonymized):

# 2021-12-15 123 Merge pull request #5937 from garbo/TK-234/feature-1
# 2021-12-16 234 Merge pull request #5938 from bongo/TK-235/feature-2
# 2021-12-16 456 Merge pull request #5939 from gingo/TK-236/feature-3

But why?

But Manuel, why keep all those commits lying around when we have all we need in the commit message?

The first reason comes down to preference. I like to see the actual log of what a person did on their branch. Did they make many small commits? On which days (this might make looking up documents or Slack conversations related to the work easier)? Did they merge other branches into their work (useful when resolving merge conflicts and other boo-boos)?

I have done a lot of git cleanup work, and while they are not supposed to exist, big merges with thousands of lines happen, and having a single monolithic commit that contains 80 different changes is a nightmare.

The other reason is what makes the side history extremely useful. When hunting down a bug, I often use git bisect. I first use git bisect start --first-parent to jump from main commit to main commit. Once I've found which pull request led to the bug, I bisect on the original branch. Instead of having to figure out which line in the pull-request merge might cause the bug, I have a much more granular path. Often, it surfaces a single-line commit, which leads to a painless and immediate bugfix.

Since you can drive your bisect with your unit tests (git bisect run), you often have no work to do at all, given sufficiently atomic and small commits on side branches. Losing that capability would seriously impact my sanity when I have to fix bugs.
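Under the hood, bisecting is a binary search over an ordered list of commits, assuming that once a commit is bad, every later one is too. The Python sketch below uses hypothetical commit names and a set of "broken" commits standing in for a failing test. It runs the two-stage search described above: first over the first-parent chain of main, then over the commits of the offending branch.

```python
from typing import Callable, Sequence


def bisect(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    Assumes commits are ordered oldest first, commits[0] is known good,
    commits[-1] is known bad, and badness never goes away once it appears.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


# Hypothetical history: m2 merges a PR branch forked from m1, whose commits
# are b1, b2, b3; the bug was introduced in b2.
main_chain = ["m0", "m1", "m2", "m3"]   # first-parent chain, oldest first
pr_branch = ["m1", "b1", "b2", "b3"]    # fork point, then the branch commits
broken = {"b2", "b3", "m2", "m3"}       # stand-in for "the test fails here"

culprit_merge = bisect(main_chain, lambda c: c in broken)
culprit_commit = bisect(pr_branch, lambda c: c in broken)
```

With real git, the is_bad predicate comes from your test script's exit code under git bisect run (0 means good, 125 means skip, other codes up to 127 mean bad).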

Conclusion

And that is why squashing history is harmful. It literally just deletes information from the git graph by dropping a single parent entry from the merge commit.

Top comments (75)

Simon Egersand 🎈

I assume you are referring to the "Squash and merge" option on GitHub? If so, yes I 100% agree with you.

On the other hand, if you mean devs should not squash and rebase before pushing a PR, then I disagree.

PS. Some of the formatting in your post is off :) Around the last example with git

lukas1

It's not that bad, provided you keep a reference to the PR number in the commit message. Luckily GitHub includes the PR number in the commit message when merging automatically, and in the GitHub history it even creates a link directly to the PR. That way one does not lose the history of the PR itself, should anyone really need it.

It worked well with one team I was involved in.

Teaching good commit practices and using git to its full potential is doable when the majority of the team is already good with it and only some developers need help. If the whole team has problems with that, it's not so easy, and the option to squash merge saves a lot of time.

It also helps get rid of nasty merge commits from merging the main branch into the feature branch, if GitHub is set up so that it requires the feature branch to have the latest changes from the main branch (which should be required). Rebasing would be preferable, but it's not as comfortable, because it requires a new approval from your team if your protected branch rules require an approval before merge (which they should).

Manuel Odendahl

by squash you mean collapse all commits into a single one? because i think that's wrong :)

Manuel Odendahl

I do think spending some time in git rebase --interactive (or magit in my case) makes a lot of sense, however.

Simon Egersand 🎈

Yeah, totally agree!

Simon Egersand 🎈

No, that's not what I mean. I was confused if that was what you meant. I guess we're on same page :D

Magnus Markling

If squashing means losing too much information, then your PRs are probably too big to begin with. IMHO it's a code (or process) smell that should be brought to attention ASAP.

As for looking at what the developer did in their branch, I tend to think the PR should speak for itself. How we got there is not important. Unless you're also prepared to spend a lot of time cleaning up your branches before sending PRs. (Time you could possibly spend making many small PRs instead.)

Nice trick for the CLI tools with "first parent"! I was not aware it even existed. Unfortunately it's not available in most graphical tools that I'm aware of, so those users will be stuck with the "ugly" history.

Jack

I used to insist devs squashed/rebased/etc. their commits before opening a PR and then use rebase-merge to merge the PR into main.
Over time I've learned the value of a squash merge. If a PR is too big to be able to describe in one commit message, or too complicated to understand from looking at the diff, then you're doing too much in one go.
Squash merging PRs is absolutely fine if your branch has nothing but work-in-progress commits. If you feel like you're losing something by squashing, then you need to rethink your process...

Manuel Odendahl

So you are saying to only make pull requests that have the size of a single commit?

Jack

As a (very) general rule, yes. You should be able to understand a change based on a single commit message, yes.
Obviously sometimes you have a big feature that can't be released piece-by-piece. In that case I would have a feature branch, and then individual branches off that. You PR (with squash commits) each smaller piece of work into the feature branch, and then at the end merge (not squash) the feature branch in. You have a history of all the pieces of work done, but not all the useless wip commits that don't actually tell any kind of story...

Manuel Odendahl

That makes sense. I see a pattern emerging here. I think that usually, when we "argue", we often are actually solving the same problem, often in the same way, but with different words.

I operate under the premise that your branch history is meaningful and has relevant commits. If you do a ton of WIP commits, I would question why you would do a "WIP" commit in the first place, because squash merge or not, you are already robbing yourself of helpful history while developing your feature. I also heavily use interactive staging (staging individual hunks), both for "pseudo review" and to split up my work into proper chunks, with git commit hooks validating at every step of the way that my tests run. If I still manage to make a mess (say, I'm tired, or in a rush, or just frustrated), I will often spend the time to go back with interactive rebase and eliminate the junk, either with squash, revert+squash, or plain delete, or, more rarely, split up bigger commits into smaller ones.

What you describe in your workflow above is basically what I achieve by keeping side branch history. I would say, as someone who often had to merge dirty crap branches, I do like to keep the WIP commits anyway, because they give me insight into what someone was trying to do, what their cognitive style is, and what they were struggling with, so I can assist them better.

But let's let things speak. I recently merged a "big" commit, setting out to build a feature that led me to start introducing TypeScript annotations. We are a fairly fast-moving two-dev team and reasonably trust each other, so the other dev was fine with keeping both the TypeScript introduction and the actual feature in the same PR. Here's my history in this case:

# 2022-05-17 597679dd482ca990cffb5fe73bbd91108163d4cc :art: Psalm fixes for Sql and OrdersSplits in tadmin
# 2022-05-17 9f67f64c17d0f4fb9e7f9f86d04a1c56770b5ad8 :art: Start adding some API typescript to tadmin
# 2022-05-17 51dedf0360364369bd873d65476185fab8e4e97c :sparkles: :zap: Faster items summary query (still not instant)
# 2022-05-17 a776a961952511ea756a999949d8a5e49fbcc37c :zap: Make it even a bit faster. Computing links is slow.
# 2022-05-17 025ef30f90e429cf4cc3d882c8b56b4dca9cd7f6 :zap: Make productQuantities computation even faster by getting managestock/isVirtual up front
# 2022-05-18 dc4d4b4584f954fbc8c3bb8633afd7c74e45b37c :art: Fix intellij code style at least roughly
# 2022-05-18 c2f5e98983121fcd7c8e42f944f5eba62de2ca69 :art: Use transients for image and permalink in getOrdersSummary
# 2022-05-18 9bba2793b0b8eaa503233948298c8d0f5a4a832b :art: Introduce RowType for Table/useTable, split out into files
# 2022-05-18 96eb7fd43777a2f74d3fe333e31d6fdd4b8254b8 :art: Fighting some odd import weirdnesses, gonna stop tweaking for now
# 2022-05-18 143fde48f4783296123a308578c1ce320ea9c575 :art: Start adding type to ManageOrders useTable
# 2022-05-19 1782ef761a5cc82e884ff7fbec33654f8b19b805 :tractor: Move api types to shared code
# 2022-05-19 dd3bdbadc69c0f5932d24fb09c467415d9b1289e :sparkles: Add more typing to useTableApi
# 2022-05-19 c66c30129283b995420f1f69a4113d0f19a53b4d :art: :tractor: Split out the different types of summary bars
# 2022-05-19 93f3ee27d111ebe6f1a114620e3e722db1f85ecf :ambulance: Fix proper backend query, remove permalink from sideview
# 2022-05-20 37578d82a18da86b42fe2932a72c52dfff3f8604 :art: First attempt at latching on to triggerSearch
# 2022-05-20 d57be0a0e88c8af0a8c9f2a7c2dabfcecfaeaf7a :art: Remove error_log
# 2022-05-20 63a77c1515e8a866dcc3c4d9c1322a2cd3b43035 :art: Remove error_log
# 2022-05-20 5420e6a64499cb812fd44566205cf8abebebfdd9 :sparkles: Proper orders summary debouncing
# 2022-05-20 883ecc39b38ed34ce47ead9036f993a263050bf3 :art: Cleanup dependency handling to avoid expensive backend calls
# 2022-05-20 d24292be1df37cba702c54f982f1630cb2f3a42a :ambulance: Fix useEffect eslint check
# 2022-05-20 cf70860c5f46e2725c0139e33ddcada656ff3112 :art: Fix prettier changes
# 2022-05-20 e067d0ddbf508501970ad0578fdd9e60b37a68e3 :ambulance: Fix type annotations
# 2022-05-20 54c70047ad5e3f6858799c6faaefe027d112b330 :sparkles: Measure and return performance measures
# 2022-05-20 0731046f579e441c5ed96292199e82df0c4d1c1a :poop: Bunch of performance measurement and debug logging
# 2022-05-20 37ae6f0afccf868d829103e45acfee2f069df744 :ambulance: Fix eslint warnings
# 2022-05-20 da06324156213dc6524adc53fc06708e3b1e5073 :ambulance: Fix PHP initialization of start_time
# 2022-05-20 858db0142afc2c21ee90b3a08fb557938877bc33 :art: Fix loading indicator
# 2022-05-23 7731a0a694b617585c51325442048491beea8998 :sparkles: :lipstick: Add checkbox to enable getting all orders summary
# 2022-05-23 2e89e73f96a772de978cd8228b44a34532794904 :art: Undo unnecessary stuff
# 2022-05-23 22c850563b8937951d1b3a46d19d6857222f14be :art: Use a single php-cs-fixer config
# 2022-05-23 8aa2b0ed201a55b196fb169e8be921d30ce9aa0e :zap: :art: Cache thumbnails for 24h
# 2022-05-23 cbd6fce9f072bb091197cc2fcec9063c7207d5db :pencil: Slight whitespace adjusts
# 2022-05-23 e74df0d3a02679f2c98b2f24c0245245e5956be6 :ambulance: Fix php-cs-fixer
# 2022-05-23 bd1e34d466495d86c1f53510be0936df909445f8 :art: Make linkbutton clickable, fix markup
# 2022-05-23 fafaeb6644f61e2b6ed1df475a2b6266e2eaf425 :art: Remove logging entries

Those are all valuable commits to me that I would like to keep for the long run. Maybe I'll figure out that the reason a certain DB query doesn't work anymore is not say, the API change, but actually the "Fix php-cs-fixer" commit. Of course I could make a separate PR for "fix php-cs-fixer", and then again for "Make linkbutton clickable", and then again for "Remove logging entries", but then we end up where we started, except with a lot more PRs and CI runs.

Manuel Odendahl

You'll note I use gitmoji, which I also find very useful, as I can recognize at a glance the reason behind each commit. To show the graphical view:

[Screenshot: graphical view of this branch history]

Manuel Odendahl

I had a long conversation about that with other developers, it was very interesting, and I plan to write about "big evil merges" in the future.

Situations where big branch merges might happen (for valid reasons, imo):

  • merging relatively independent projects (in the context of a monorepo, for example)
  • wide "rip off the bandaid" refactor (especially type-system / compiler driven refactors)
  • having to merge shitty code from someone who left / from external contributors over whom you only have so much control
  • slow PR / merge cycles (can have many reasons: reviewers are scarce, QA is a bottleneck)
  • overall politics: management thinks PRs are a waste of time, crunch time

In general, I'm not a fan of "in a perfect world you wouldn't need more information" arguments.

In my experience, even small clean PRs can benefit from having a granular history, say when git blaming something 3 years down the road.

As for UI tools, I use magit / sourcetree / intellij's history browser, I'm sorry if other tools don't support it :/

I wish all tools supported --first-parent. The (valid, because tool friction matters) reason boils down to "my tool doesn't know how to display the information I want, so I have to lose context for it"; arguing that "merges make the history sloppy" is just a cop-out, because it's simply not true. I think one reason for that argument is that many developers don't know how git works internally, and thus have a warped understanding of what the history is. Git's CLI tooling really doesn't help here.

Warren Parad

100% agreed. A "clean history" does not mean that every PR should have the commit history of the work process wiped out.

It's also really harmful to code reviews, where seeing the differences between revisions of the pull request is impossible with GIT, and near impossible with most git servers. GitLab supports this FWIW, I still don't think GitHub does.

There are lots of things that benefit from having separate commits, like a rename and then a diff, or a refactor inside of a PR. PLEASE DO NOT MERGE A REFACTOR WITH A FUNCTIONAL CHANGE. I don't want to see it in the same PR, let alone in the same commit.

No one who says you should squash has ever had to do a difficult task involving git history. If you had ever had to find a bug with git bisect, do a subdirectory migration, find when a critical feature changed, or track down a refactor gone wrong, you would know. I'll die on that hill.

That isn't to say you shouldn't clean up your commits before you open your PR. The only thing I don't want to see is "updates from pull request review", but anyone bringing commit messages into a conversation about squashing is barking up the wrong tree. Fix your commit messages; squashing isn't a solution to that problem, it's a patch.

Manuel Odendahl

This is a good point.

Which makes me think: how can you give someone who hasn't experienced larger git pains the context in which some of those decisions are made? Or, in general, explain workflow and code hygiene steps that might seem like red tape until you've experienced some of the nightmares that can ensue?

It's a paradoxical kind of thing, because if these systems are in place, by definition you will not encounter the reason why these systems were put in place (similar to any kind of preventative measure that works). I ran into really heated discussions where my point of view was basically "it caused me much pain in the past, trust me, we should do X", which is not a great argument.

Warren Parad

Honestly it goes both ways. If they are allowed to say "I like it better this way", then you are allowed to say "have my wisdom rather than learning through the experience of failure". "It looks prettier" is less of an argument. This is where I say:

Here's what we are going to do, this time we are going to do it my way, until we find a concrete problem that affects the business. It's one of the few times a year I'm going to pull the experience veto, but it's my job to do that.
If we learn that this was a mistake, it's a great story for your next promotion, or interview somewhere else when they ask you about a time that you disagreed with a solution but still went with it.
There will be other times that we disagree and we'll go with your solution, and take the same approach. If you think this happens where you are unfairly being treated, let's keep track of these, and we'll evaluate. We won't always agree and having a solution for the decision making in those situations is critical.
If you say that it has to be your way this time, then it's easy to infer that it always has to be your way, and that's something that prevents us from working effectively together.

I use that in those circumstances, and it has yet to come back to me. In most cases, years later I have these same engineers coming and telling me "OMG remember this problem, I was totally wrong, and I tried to convey that same point to others, but they also didn't get it" Followed up by: "What am I supposed to do with these 'seniors'?"

Manuel Artero Anguita 🟨

Hi Manuel, I'm Manuel.

you got me with this:

The thing is that my opinion is the correct one

Above all considering that my opinion is the correct one. Kidding.

Nice post, I just disagree, squashing is fine. But I can see your points. In the end it's a tool and sometimes will be handy sometimes won't.

Manuel Odendahl

Do you use squashing because you want to have a "clean" history per default? Or do you have other reasons?

Manuel Artero Anguita 🟨

IMO too much information leads to disinformation. Checking the actual "WIP" commits from a feature branch is a level of fine-grained info I've never, ever needed.

Cleaner history is, yep, the main reason.

But actually I've faced another issue in the past: there was a 15+ year old repo at my company, with hundreds of committers through the ages and on the order of n * 100,000 commits. Dealing with this repo was a real challenge! Too much useless info in the .git/ folder. What I'm trying to say is that fine-grained info does weigh something. Of course, you need to reach those numbers first.

Manuel Odendahl

A lot of people bring up "WIP" commits. Do you often do WIP commits? I personally rarely do (I do get frustrated and use the 💩 emoji, as I use gitmoji, but still make meaningful commits). But the point of the article was that you can easily hide all that information and focus on what you need.

As for historical gits, I wonder if people here ever did a git "cleanup" where most of the ancient state gets culled, and just the last few years are kept. Cruft does indeed accumulate.

Luke Inglis

Just adding a little perspective, I don't think I've ever worked on a team with anyone who didn't use WIP commits. I work on a small team and there is a lot of context switching that needs to happen and 'finishing' a commit before switching to something else just isn't an option.

Manuel Odendahl

Interesting. I have the opposite experience. I use git stash in those cases, or git rebase --interactive to clean things up later.

lukens

I don't like the idea of most of the ancient state being culled.

The codebase at my current job has a cutoff from when it was moved to git, and there's even less hope of finding out why something was done for code that predates that than there is for the rest of the codebase.

Maybe I've always worked at the wrong places, but I've never been in a place where I wish there were fewer commits in the history, but a lot of the time I have wished there were more commits (often when trying to review code), so that I had a finer grain insight into why a particular line of code was written, and what else was changed for the same purpose.

I'm 100% with you that losing this information is nothing but a bad thing. I find that even the worst git commits tend to provide the best and most accurate and up-to date documentation of the code; it amazes me that so many people choose to throw that away!

Frank Weindel

What no one is mentioning is that the squash feature on GitHub PRs preserves the original commits that were squashed. If you REALLY need to go back and examine the granular history of a PR, then you can still do so. On teams I've been on, we strive for PRs that are not overscoped and where the squashed message tells you exactly what feature or fix was added by those line changes. We also often use Conventional Commits, which help in a big way with release note automation. When I look at a main branch history like this:

fix: No long errors when pressing enter (#29) [tag: v1.1.0]
feat(ModuleA): Add metrics logging (#27)
feat(ModuleB): Support extra query string params in routes (#20)
chore: Update node to 16.13.1 (#24)
test: Require at least one assertion with every test (#21) [tag: v1.0.0]

I see VERY clearly what features and fixes have gone in between versions 1.0.0 and 1.1.0. I also have easy links to the PRs that were squashed to produce those commits if I need to drill down any further. If a feature needs to be reverted it's a very easy reversion (no extra parameters).
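As a sketch of the release-note automation mentioned above, this Python snippet groups Conventional Commits titles by type. The regular expression only covers the common "type(scope)!: subject" header form and ignores commit bodies and footers; the titles are the ones between v1.0.0 and v1.1.0 in the example history above.

```python
import re
from collections import defaultdict
from typing import Dict, List

# "type(scope)!: subject", with the scope and the breaking-change "!" optional.
HEADER = re.compile(r"^(?P<type>\w+)(?:\((?P<scope>[^)]*)\))?(?P<breaking>!)?: (?P<subject>.+)$")


def release_notes(titles: List[str]) -> Dict[str, List[str]]:
    """Group commit titles by Conventional Commits type."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        match = HEADER.match(title)
        if match is None:
            groups["other"].append(title)  # not a Conventional Commits header
            continue
        scope, subject = match.group("scope"), match.group("subject")
        groups[match.group("type")].append(f"{scope}: {subject}" if scope else subject)
    return dict(groups)


notes = release_notes([
    "fix: No long errors when pressing enter (#29)",
    "feat(ModuleA): Add metrics logging (#27)",
    "feat(ModuleB): Support extra query string params in routes (#20)",
    "chore: Update node to 16.13.1 (#24)",
])
```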

If you end up with a PR that has a very large scope, there are two things one can/should do:

  • Split the PR into multiple easier-to-review dependent PRs (preferred)
  • Carefully massage and curate the commits in a PR and use the Rebase Merge option in the PR
    • Making sure the PR number gets appended to each commit's title for backtracing.

I'll agree that just having a "linear commit history" shouldn't be the only reason for doing squash commits. But if it simplifies your team's workflow, reduces cognitive load, and makes understanding exactly what is included in a release easier to find then I say it's worth doing.

That said, I feel there is a "best of both worlds" place we could get to if we strived for it.

First, a merge commit is really a squash but with an extra parent link to the branch where the squash originated. I wish this were driven home more; however, the standard commit title for these PR merge commits is always something like these:

Merge pull request #43 from foo/fix-bar
Merge pull request #42 from foo/add-bar

Those titles don't help me. It has the PR number, which I have to individually click on and look up, and a branch name, which could easily be too brief, poorly written or even irrelevant.

Nothing in Git, from what I understand, prevents those titles being similar to the squash titles I shared above. So if GitHub produced them you'd suddenly have the clarity you have with squash merges.

Second, the Git CLI commands and various Git UIs default to showing/working with the full branch out history of everything. You need to know special parameters or set certain settings in order to see and work with a simplified linear view. If these tools defaulted to a linear history and required special parameters in order to drill down into merged branches I feel that would improve the developer experience a bunch. You get an easy to understand summarized linear history and the ability to go deeper when you need to.

Of course, outside of the GUIs maybe, any change in how people work with Git is a very hard push, from what I understand.

Third, GitHub could allow the PR author to set the merge strategy in advance. Each developer has their own style: some, like you @wesen, write very intentional and carefully crafted PR commits, others commit WIP things quickly and often, and some do a mix. Letting the author choose gives them the ability to decide how their PR will ultimately be laid out. Obviously, projects could still limit which PR merge strategies are available, and admins could still override the author's preset wishes. But the author would at least have a chance to influence how the commits in the PR are laid into the base branch.

But just to wrap up my argument: I think that, like any other tool, squashing, merging, and rebasing are options that teams can consider and decide on based on their needs and circumstances. There is no one-size-fits-all approach.

Lyndon Hughey

Yep. Not squashing your commits is like submitting your rough draft to an editor for publishing. It's unnecessary verbosity.

Manuel Odendahl

I'm not sure I understand. If you don't want to see the individual commits, you don't have to look at them?

Oisín

In the past I've argued against squashing commits from the POV of making bisects easier later. That said, if your bisect is broken by intermediate buggy commits, that could throw you off too.
Probably my main reason to dislike squashing commits is that I like to review bigger PRs by going commit by commit, so I can better grasp the author's original intention and thought process as they evolved it.
I don't buy the idea of "tidying up" previous commits, and I'm not sure who would really benefit from that. Seeing misconceptions, things that didn't work out, and what you did to arrive at the current solution is valuable for my understanding.
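Since Git 2.29, `git bisect start` accepts `--first-parent`, which only tests commits on the mainline (first-parent) chain, so broken intermediate commits on merged branches are never checked out. A sketch under these assumptions: a throwaway repository, a hypothetical `status.txt` whose content `good` stands in for "tests pass", and an intermediate commit on `feature-b` that breaks it.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"                        # scratch repository for the demo
git init -q -b main                      # -b needs Git 2.28+
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

echo good > status.txt && git add status.txt && git commit -q -m "Initial commit"
known_good=$(git rev-parse HEAD)

git checkout -q -b feature-a
echo a > a.txt && git add a.txt && git commit -q -m "Add a"
git checkout -q main && git merge -q --no-ff -m "Merge feature-a" feature-a

git checkout -q -b feature-b
echo bad > status.txt && git commit -q -am "Half-done change (breaks status)"
echo b > b.txt && git add b.txt && git commit -q -m "Add b"
git checkout -q main && git merge -q --no-ff -m "Merge feature-b" feature-b
merge_b=$(git rev-parse HEAD)

echo c > c.txt && git add c.txt && git commit -q -m "Add c"

# Bisect only along main's first-parent chain; the check exits 0 for "good".
git bisect start --first-parent HEAD "$known_good"
git bisect run sh -c 'grep -qx good status.txt'
first_bad=$(git rev-parse refs/bisect/bad)   # first bad commit on the mainline
git bisect reset
git log -1 --format=%s "$first_bad"          # Merge feature-b
```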

lukens

I'm so glad that someone mentioned this, because I was thinking the exact same thing, and didn't understand this bizarro upside-down world where people seem to think pull requests are easier to review if all the commits are squashed beforehand.

Yes, it's good to do a bit of tidy up with an interactive rebase first, if you have too many WIP commits, or "oops, missed this bit" commits, but I'd generally prefer more rather than fewer commits.
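For the "oops, missed this bit" case, Git can do the tidy-up mechanically: `git commit --fixup=<commit>` records a fix aimed at an earlier commit, and `git rebase -i --autosquash` folds it into that commit. A sketch in a throwaway repository with hypothetical file and branch names; `GIT_SEQUENCE_EDITOR=:` accepts the generated todo list without opening an editor.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"                        # scratch repository for the demo
git init -q -b main                      # -b needs Git 2.28+
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

echo base > base.txt && git add base.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo feature > feature.txt && git add feature.txt && git commit -q -m "Add feature"
echo docs > docs.txt && git add docs.txt && git commit -q -m "Document feature"

# "Oops, missed this bit": record the change as a fixup of "Add feature".
echo "missed bit" >> feature.txt
git commit -q -a --fixup=HEAD~1          # subject becomes "fixup! Add feature"

# Move the fixup next to its target and squash it in, without an editor.
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash main

git log --format=%s main..feature
# Document feature
# Add feature
```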

Manuel Odendahl

Makes total sense. I "tidy up" commits because I often make partial commits (staging individual hunks) in the moment, and then I go back to make sure everything builds properly at the intermediate steps. I agree that bisecting on non-building side branches is a serious pain. There are ways to address that, but nonetheless, it's not the best experience.
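After building commits from individual hunks (for example with `git add -p`), `git rebase --exec` can replay the branch and run a check after every commit, stopping at the first one that fails. A sketch assuming a throwaway repository; the `grep` check is a hypothetical stand-in for a real build or test command such as `make test`.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"                        # scratch repository for the demo
git init -q -b main                      # -b needs Git 2.28+
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

echo ok > status.txt && git add status.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo wip > status.txt && git commit -q -am "Half-done refactor"   # does not "build"
echo ok > status.txt && git commit -q -am "Finish refactor"

# Run the stand-in check after each replayed commit; the rebase stops at
# the first commit where the check fails.
if git rebase -q --exec 'grep -qx ok status.txt' main; then
  result=passed
else
  result=failed
  git rebase --abort                     # return to the branch as it was
fi
echo "every intermediate commit: $result"
```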

Ingo Steinke

Tell that to GitHub; they seem to have made squash commits the new default. It's not possible to make merge commits in their web UI anymore, and that sucks!

Manuel Odendahl

I think you can with an option?

Ingo Steinke

It used to be possible, but in practice it is always grayed out, and this does not seem to be the result of a conscious decision by the project maintainers.
screenshot of GitHub merge options

  • Create a merge commit: Not enabled for this repository
  • Squash and merge
  • Rebase and merge: Not enabled for this repository
 
Manuel Odendahl

I think it has to be enabled in the repo settings. But now that git bisect and git blame both support --first-parent, I really don't see the point anymore. Maybe it saves some space because the intermediate blobs get garbage collected, but that only really makes sense for big public repositories like Linux. And even then, people maintain separate repositories that still keep the individual history.
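To illustrate the blame side: `git blame --first-parent` attributes each line to the commit that brought it into the mainline, which for a merged branch is the merge commit. A sketch with a throwaway repository and hypothetical names; in `--porcelain` output, each entry starts with the full commit hash.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"                        # scratch repository for the demo
git init -q -b main                      # -b needs Git 2.28+
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

echo "line one" > app.txt && git add app.txt && git commit -q -m "Initial commit"
git checkout -q -b feature
echo "line two" >> app.txt && git commit -q -am "WIP"
wip=$(git rev-parse HEAD)
git checkout -q main
git merge -q --no-ff -m "Add line two (#7)" feature
merge=$(git rev-parse HEAD)

# Which commit introduced line 2? Full history vs. mainline only.
full=$(git blame --porcelain -L 2,2 app.txt | head -n 1 | cut -d ' ' -f 1)
mainline=$(git blame --first-parent --porcelain -L 2,2 app.txt | head -n 1 | cut -d ' ' -f 1)
git log -1 --format=%s "$full"           # WIP
git log -1 --format=%s "$mainline"       # Add line two (#7)
```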

lukens

My understanding is that GitHub also has some kind of hidden tag on the PR branch, so you can still view it on GitHub after it is squashed, so it presumably doesn't save any space for them.

 
Manuel Odendahl

What do you mean, you can't not look at it? In which context? I usually use git log --first-parent when doing release work, and only look at side histories when I need to dig in.

Simon Egersand 🎈

Yeah, me too. This is one of the most (if not the most) important practices for keeping the PR review process quick. A quick PR review process in turn speeds up the shipping of code => business moves faster => greater chances of success for the company.

Always make your commits nice before asking for a PR review!

Manuel Odendahl

Do you look at individual commits when doing a review? The diff view shown is just a comparison of the trees; the history itself is irrelevant to it.
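To illustrate the tree-comparison point: a PR diff view compares the merge base with the branch tip (GitHub's default pull request diff is, as far as I know, this "three-dot" comparison), so a change that is added and later removed inside the branch never shows up in it; only `git log` shows those commits. A sketch in a throwaway repository with hypothetical file names.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"                        # scratch repository for the demo
git init -q -b main                      # -b needs Git 2.28+
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
git checkout -q -b feature
echo "debug print" >> app.txt && git commit -q -am "Add debug output"
echo base > app.txt && git commit -q -am "Remove debug output"
echo feature > feature.txt && git add feature.txt && git commit -q -m "Add feature"

git log --oneline main..feature          # the history: three commits
git diff --stat main...feature           # the tree comparison: only feature.txt

# The three-dot diff is the same as diffing the merge base against the tip.
base=$(git merge-base main feature)
git diff --stat "$base" feature
```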

Simon Egersand 🎈

I do look at individual commits of the PR, yeah. Sometimes it makes sense to split up a task into multiple commits, or include refactor work, and that work should not be combined IMO.

Manuel Odendahl

I agree. I'm not the greatest at this (often solo dev on things), but it's a good skill to know.

Alex Pushkarev

I really liked the way you explained how squash works. But it seems that the main argument against squashing is that it just drops the history?

That's fair, but it doesn't mean squash is bad; it means one just needs to know the cost, correct?

Manuel Odendahl

Yes. But most people argue that it "cleans up" history, because they are unaware that you can easily hide the right parent when printing out logs, for example. I find losing history a very high cost to avoid using a pretty-print flag.

Alex Pushkarev

That's a very good argument. The other perspective to consider is that more and more people don't use git from the command line, so they see whatever their git tool shows, which may not be able to pretty-print in the first place.

Kevin Granade

As with all blanket statements, you're wrong :)

I do agree with a caveat, which is that as your contributors' maturity increases, merges become the dominant option.

IF a PR is composed of well-formed, meaningful commits, you should probably merge.

If a PR is composed of point-in-time or fix-fix-fix commits, and/or the individual commits don't build and pass tests, then you should squash them.

The point you're missing is that individual commits within a branch might be data, or they might be noise.

Manuel Odendahl

Agreed, I ask people to put some effort into their own git histories. It not only provides better information for the rest of the team, but also helps structure your own development workflow and helps you debug / bisect your own stuff.

That said, I like having a bunch of fix fix wtf wip wip commits still, because it still provides insight into what people struggled with, what their thinking process was, and gives a starting point when looking for some hairy stuff.

Christian Kozalla

I'm using GitLab at work, and we use GitLab's Squash-and-Merge option. I don't know how that compares to the way GitHub does it. I also don't know how Squash-and-Merge compares to manually squashing, pushing, and then opening a PR.

We, as a team, use Squash-and-Merge because then one single feature maps to one single commit. But I suppose the same is possible with unsquashed merge commits.

Manuel Odendahl

Only one way to know! Look at the graph, and use git cat-file to look into the internals.
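Here is roughly what that looks like with plain git. This is a local sketch in a throwaway repository with hypothetical branch names; it does not claim GitLab's Squash-and-Merge is implemented exactly this way. `git merge --squash` stages the combined changes without committing, so the resulting commit has one parent, while `git merge --no-ff` records two; both end up with the same tree.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"                        # scratch repository for the demo
git init -q -b main                      # -b needs Git 2.28+
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

echo base > f.txt && git add f.txt && git commit -q -m "Initial commit"
git checkout -q -b feature
echo one >> f.txt && git commit -q -am "Step one"
echo two >> f.txt && git commit -q -am "Step two"

# Variant 1: squash merge. --squash only stages the changes; commit them.
git checkout -q -b squashed main
git merge -q --squash feature
git commit -q -m "Add feature (squashed)"

# Variant 2: a real merge commit.
git checkout -q -b merged main
git merge -q --no-ff -m "Add feature (merged)" feature

# Compare the raw commit objects: count the "parent" lines.
git cat-file -p squashed
git cat-file -p merged
git log --graph --oneline --all
```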

Matthew O. Persico

A number of commenters have made the distinction between squashing before the PR is submitted and after it is submitted. I contend it's an irrelevant distinction.

If you are in the squash-before-but-not-after crowd, I counter that once a PR starts being reviewed, updated, and re-reviewed, you're going to get a whole list of commits that you'll eventually end up wanting to squash anyway.

My criterion for squashing is this: for each particular commit, if you cannot roll back that commit and have a working, functional system, then there is no point in having that commit in your history; squash it out.

Now, if you think there is value in the various conversations surrounding those commits, then keep them around, off the main branch like this:

  • make a copy of the branch the PR is sitting on (please tell me you're not modifying your main branch directly...), naming the copy archive/branchname
  • squash branchname
  • merge the PR, putting a reference to archive/branchname in the PR's comments.
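The three steps above can be sketched locally as follows. Assumptions: a throwaway repository, the placeholder names `branchname` and `archive/branchname` from the list, and `git reset --soft` as one of several ways to squash (an interactive rebase works too). The final merge here is a local fast-forward standing in for merging the PR on a hosting site.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"                        # scratch repository for the demo
git init -q -b main                      # -b needs Git 2.28+
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

git checkout -q -b branchname
echo try > app.txt && git commit -q -am "Try approach A"
echo review > app.txt && git commit -q -am "Address review comments"
echo final > app.txt && git commit -q -am "Fix tests"

# 1. Keep the full review history under an archive name.
git branch archive/branchname branchname

# 2. Squash branchname into one commit on top of main.
git reset -q --soft main                 # move the branch back; keep the final content staged
git commit -q -m "Add feature X"

# 3. Merge (locally a fast-forward; on a hosting site, merge the PR and
#    mention archive/branchname in its comments instead).
git checkout -q main
git merge -q --ff-only branchname

git log --oneline --graph --all
```

With a remote, the archive branch would also need to be pushed (`git push origin archive/branchname`) and the squashed branch force-pushed (for example with `git push --force-with-lease`) before merging the PR.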
 
arthurolga

I don't like squashing if you are working with good commits; those should be few per PR. If you have something like 10 commits on a feature, it should probably be split into smaller tasks.

If the developer wants to aggregate two or more commits, like if they refactored some part of the code, BEFORE MAKING THE PR, then I totally like Squash.

If you have small commits with good names, it is probably better to just merge than to squash and merge, e.g.:

Commit 1: Add DatePicker component to App
Commit 2: Make API call for Date Service on UserScreen
Commit 3: Make API call for Date Service on PostScreen

If something breaks, it will probably be easier to see which commit caused it, and this also allows for better cherry-picking.
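As a sketch of the cherry-picking point, using the three commit titles above in a throwaway repository (file names and the `release` branch are hypothetical): a well-scoped commit can be taken on its own, and `git cherry-pick -x` appends a "(cherry picked from commit ...)" line so the copy can be traced back to the original.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"                        # scratch repository for the demo
git init -q -b main                      # -b needs Git 2.28+
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

echo app > App.txt && git add App.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo picker > DatePicker.txt && git add DatePicker.txt
git commit -q -m "Add DatePicker component to App"
datepicker=$(git rev-parse HEAD)
echo user > UserScreen.txt && git add UserScreen.txt
git commit -q -m "Make API call for Date Service on UserScreen"
echo post > PostScreen.txt && git add PostScreen.txt
git commit -q -m "Make API call for Date Service on PostScreen"

# Take only the DatePicker commit onto a hypothetical release branch.
git checkout -q -b release main
git cherry-pick -x "$datepicker"
git log --oneline
```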