Manuel Odendahl

⛔ Squash commits considered harmful ⛔

A recurring conversation in developer circles is whether you should use git merge --squash when merging or create explicit merge commits. The short answer: you shouldn't squash.

People have strong opinions about this. The thing is that my opinion is the correct one. Squashing commits has no purpose other than losing information. It doesn't make for a cleaner history. At most it helps subpar git clients show a cleaner commit graph, and saves a bit of space by not storing intermediate file states.

Let me show you why.

Git tracks contents, not diffs

In many ways you can just see git as a filesystem.
– Linus (in 'Re: more git updates..' - MARC)

Git is in many ways a very dumb graph database. When you check in code, it actually stores the content of all the tracked files in your repository.

The content of each file is stored as a "blob" node in the database. The filenames are stored separately in a "tree" node: If you rename a file, no new content node will be created. Only a new tree node will be created.

Commits are stored as "commit" nodes. A commit object points to a tree, and adds metadata: author, committer, message and parent commits. A merge commit has multiple parents.
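To make the blob/tree split concrete, here is a minimal sketch you can run in a throwaway directory. It assumes git is on your PATH; the file names foo.txt and bar.txt are just placeholders. It writes one file, records the tree, renames the file, and records the tree again.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q

# A blob id depends only on the file content, never on the file name.
printf 'hello\n' > foo.txt
blob_before=$(git hash-object -w foo.txt)

# Stage the file and write the index out as a tree object.
git add foo.txt
tree_before=$(git write-tree)

# Rename the file: same content, different name.
git mv foo.txt bar.txt
tree_after=$(git write-tree)
blob_after=$(git rev-parse "$tree_after:bar.txt")

echo "blob before rename: $blob_before"
echo "blob after rename:  $blob_after"
echo "tree before rename: $tree_before"
echo "tree after rename:  $tree_after"
```

The two blob ids should match while the two tree ids differ, which is the "only a new tree node" behavior described above.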

Here is a visualization from Scott Chacon's Git Internals:

[Image: visualization from Scott Chacon's Git Internals]

Looking at a real git repository

Enough theory, we have work to get done. Let's create a simple git repository:

> mkdir squash-merges-considered-harmful
> cd squash-merges-considered-harmful 
> git init
> echo hello > foo.txt
> git add foo.txt
> git commit -m "Initial commit"
[main (root-commit) 02a154b] Initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 foo.txt
> echo more >> foo.txt
> git add foo.txt
> git commit -m "Add more" 
[main 16660f8] Add more
 1 file changed, 1 insertion(+)

We can now look at the contents of the objects we created:

# initial commit
❯ git cat-file -p 02a154b
tree f269b7cd59094d5365ef6b5618098cbcbeee0c43
author Manuel Odendahl <wesen@ruinwesen.com> 1653303427 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653303427 -0400

Initial commit
# initial tree
❯ git cat-file -p f269b7cd59094d5365ef6b5618098cbcbeee0c43
100644 blob ce013625030ba8dba906f756967f9e9ca394464a    foo.txt
# initial foo.txt
❯ git cat-file -p ce013625030ba8dba906f756967f9e9ca394464a
hello

# second commit
❯ git cat-file -p 16660f8
tree 5a0c4a660a13c0ada7611651399abb362756f83e
parent 02a154bc4f0fa9bca567676d45d136619c076a95
author Manuel Odendahl <wesen@ruinwesen.com> 1653303485 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653303485 -0400

Add more
# second tree
❯ git cat-file -p 5a0c4a660a13c0ada7611651399abb362756f83e
100644 blob 2227cddb7f6318ea735a1c4adb52f5cd36c5783c    foo.txt
# second foo.txt
❯ git cat-file -p 2227cddb7f6318ea735a1c4adb52f5cd36c5783c
hello
more


Branches and tags (including the ones that track remote repositories) are just pointers to commit nodes.

❯ cat .git/refs/heads/main
16660f8b1d1538ed1b55d8533b3ee7feb68e474c
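If you want to poke at this yourself, the following sketch (assuming git is installed; the branch name pointer-demo and the user details are made up) creates a second branch and checks that it resolves to the very same commit. I use git for-each-ref at the end because refs can also live in a packed file, in which case the cat above would find nothing.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"

# The branch name resolves to a commit id...
head_commit=$(git rev-parse main)

# ...and creating another branch only records that id under a new name.
git branch pointer-demo main
demo_commit=$(git rev-parse pointer-demo)

# List every local branch together with the commit it points to.
git for-each-ref --format='%(refname) -> %(objectname)' refs/heads
```

No objects are copied when the branch is created; both names end up pointing at one commit.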

But we still use diffs and merges

But Manuel, you ask, how do git diff and git merge and all that funky stuff work?

When you run git diff, git actually runs a diff algorithm (there are several to choose from) to compare the state of two trees, every time.
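One way to check that diffs are computed from trees on demand: ask git to diff two commits, then ask it to diff the two tree objects those commits point to. This is a sketch under the assumption that git is on your PATH; the repository content and user details are invented for the demo.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
echo more >> foo.txt
git add foo.txt
git commit -q -m "Add more"

# Diff between two commits...
diff_commits=$(git diff HEAD~1 HEAD)

# ...and between the bare trees they point to, with no commit metadata involved.
diff_trees=$(git diff 'HEAD~1^{tree}' 'HEAD^{tree}')

echo "$diff_trees"
```

Both invocations should print the same patch, since the commit objects only serve to look up the trees.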

When you do a rebase, git computes the diff for each commit of the branch before rebase, and then applies those diffs to the destination, thus "moving" the branch over to the destination, with fresh tree and commit nodes.

When you do a merge, git first searches for the common ancestor of both branches to be merged, the merge base (this can be a bit more involved depending on your graph). It computes the diff of each branch against that ancestor, and then combines both diffs in what is called a three-way merge.

The resulting commit has multiple parent fields. As far as the merged content goes, the parent fields are purely informational; the tree the merge commit points to is what actually counts. Once a three-way merge has been computed and applied, git doesn't really care how the resulting tree was computed.
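The pieces of a three-way merge can be inspected by hand. The sketch below assumes git is installed and uses made-up branch and file names (side, shared.txt and so on). It lets two branches diverge, asks git for the merge base, shows the two diffs against it, and then counts the parents of the merge commit.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo base > shared.txt
git add shared.txt
git commit -q -m "Common starting point"
start=$(git rev-parse HEAD)

# Diverge: one commit on a side branch, one on main.
git checkout -q -b side
echo side > side.txt
git add side.txt
git commit -q -m "Work on side"

git checkout -q main
echo main > main.txt
git add main.txt
git commit -q -m "Work on main"

# The common ancestor that the three-way merge starts from.
base=$(git merge-base main side)

# The two diffs that the merge combines.
git diff --stat "$base" main
git diff --stat "$base" side

# Merge, then count the parent ids recorded in the merge commit.
git merge -q --no-edit side
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "merge base: $base"
echo "parents of merge commit: $parent_count"
```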

This is literally all there is to git, and the mental model that I use every day, even as I'm doing the most advanced git surgery.

What is a squash merge?

So what is a squash merge? A squash merge is the same as a normal merge, except that it records only one parent commit: the link to the merged branch is dropped. It basically slices off a whole part of the git graph, which will later be garbage collected if nothing else references it. You're basically losing information for no reason.

Let's look at this in practice. Let's create a few commits on top of the ones we have, and then do both a squash merge and a non-squash merge, and look at the results.

> git checkout -B work-branch
Switched to a new branch 'work-branch'
❯ echo "Add more" >> foo.txt
❯ git add foo.txt && git commit -m "Add more"
[work-branch 4b84cfe] Add more
 1 file changed, 1 insertion(+)
❯ echo "Add more" >> foo.txt
❯ git add foo.txt && git commit -m "And more"
[work-branch 1836f1c] And more
 1 file changed, 1 insertion(+)
❯ git checkout -B no-squash-merge main
Switched to a new branch 'no-squash-merge'
❯ git merge --no-squash --no-ff work-branch
Merge made by the 'ort' strategy.
 foo.txt | 2 ++
 1 file changed, 2 insertions(+)
❯ git checkout -B squash-merge main
Switched to a new branch 'squash-merge'
❯ git merge --squash --ff work-branch
Updating 16660f8..1836f1c
Fast-forward
Squash commit -- not updating HEAD
 foo.txt | 2 ++
 1 file changed, 2 insertions(+)
❯ git commit
[squash-merge 150c57d] Squashed commit of the following:
 1 file changed, 2 insertions(+) 

Let's look at the resulting graph and commits.

❯ git log --graph --pretty=oneline --abbrev-commit --all
* 150c57d (HEAD -> squash-merge) Squashed commit of the following:
| * 535b740 (no-squash-merge) Merge branch 'work-branch' into no-squash-merge
|/| 
| * 1836f1c (work-branch) And more
| * 4b84cfe Add more
|/  
* 16660f8 (main) Add more
* 02a154b Initial commit
❯ git cat-file -p no-squash-merge
tree 58c1fb22faa444b264e98a5ae4c4ddb07be09697
parent 16660f8b1d1538ed1b55d8533b3ee7feb68e474c
parent 1836f1c53221ae701a038bf5ae380770ea911665
author Manuel Odendahl <wesen@ruinwesen.com> 1653304391 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653304391 -0400

Merge branch 'work-branch' into no-squash-merge

* work-branch:
  And more
  Add more

❯ git cat-file -p squash-merge   
tree 58c1fb22faa444b264e98a5ae4c4ddb07be09697
parent 16660f8b1d1538ed1b55d8533b3ee7feb68e474c
author Manuel Odendahl <wesen@ruinwesen.com> 1653304543 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653304543 -0400

Squashed commit of the following:

commit 1836f1c53221ae701a038bf5ae380770ea911665
Author: Manuel Odendahl <wesen@ruinwesen.com>
Date:   Mon May 23 07:11:08 2022 -0400

    And more

commit 4b84cfe11aa51da994448e602e1bc4cc6083d691
Author: Manuel Odendahl <wesen@ruinwesen.com>
Date:   Mon May 23 07:11:03 2022 -0400

    Add more


You can see that both squash-merge and no-squash-merge point to the exact same tree. The only differences are the commit message and the missing second parent in the squash merge.
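Instead of comparing the cat-file output by eye, you can let git do the comparison. This is a rough, self-contained replay of the experiment above, assuming git is on your PATH; the user details and the squash commit message are placeholders. It compares the tree ids directly and counts the parents of each result.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"

# Two commits on a work branch.
git checkout -q -B work-branch
echo "Add more" >> foo.txt
git add foo.txt && git commit -q -m "Add more"
echo "Add more" >> foo.txt
git add foo.txt && git commit -q -m "And more"

# Regular merge commit.
git checkout -q -B no-squash-merge main
git merge -q --no-ff --no-edit work-branch

# Squash merge: stages the combined change, then needs an explicit commit.
git checkout -q -B squash-merge main
git merge -q --squash work-branch
git commit -q -m "Squashed work-branch"

tree_merge=$(git rev-parse 'no-squash-merge^{tree}')
tree_squash=$(git rev-parse 'squash-merge^{tree}')

# Number of parent ids recorded in each resulting commit.
parents_merge=$(( $(git rev-list --parents -n 1 no-squash-merge | wc -w) - 1 ))
parents_squash=$(( $(git rev-list --parents -n 1 squash-merge | wc -w) - 1 ))

echo "merge tree:  $tree_merge ($parents_merge parents)"
echo "squash tree: $tree_squash ($parents_squash parents)"
```

Same tree id, one parent fewer: that is the whole difference.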

To read more about the underpinnings of git, I can recommend just experimenting with the git command line.

But the history!

But Manuel, you say, the history is so much cleaner!

To which I counter that it is actually not. If you want to hide the link to the right parent of the non-squash merge (as it is called, the left parent being main), all you need to do is hide it. If you use the command line or a proper tool, use the option to show only first parents. If you look only at the first parent, and configure your git tool to fill in the full log history of the branch into the merge commit message (I personally use the GitHub CLI gh or some git commit hooks to do it), the squash merge commit is identical to the non-squash merge commit.

A favorite git log command of mine to quickly look at the history of the main branch, and create a changelog:

> git log --pretty=format:'# %ad %H %s' --date=short --first-parent --reverse
# 2022-05-23 02a154bc4f0fa9bca567676d45d136619c076a95 Initial commit
# 2022-05-23 16660f8b1d1538ed1b55d8533b3ee7feb68e474c Add more
# 2022-05-23 535b740f42e331175f3766c1374116e329a78f7e Merge branch 'work-branch' into no-squash-merge

When using GitHub and pull requests, this will show author, branch name (which would contain ticket name and short description in my case) and date on a single line. Here's a slightly more complex real-world example (anonymized):

# 2021-12-15 123 Merge pull request #5937 from garbo/TK-234/feature-1
# 2021-12-16 234 Merge pull request #5938 from bongo/TK-235/feature-2
# 2021-12-16 456 Merge pull request #5939 from gingo/TK-236/feature-3

But why?

But Manuel, why keep all those commits lying around when we have all we need in the commit message?

The first reason comes down to just preference. I like to see the actual log of what a person did on their branch. Did they do many small commits? On which days (this might make looking up documents or slack conversations related to the work easier)? Did they merge other branches into their work (useful when resolving merge conflicts and other boo boos)?

I have done a lot of git cleanup work, and while they are not supposed to exist, big merges with thousands of lines happen, and having a single monolithic commit that contains 80 different changes is a nightmare.

The other reason is that the side history is actually extremely useful. When hunting down a bug, I often use git bisect. I first use git bisect start --first-parent to jump from main commit to main commit. But once I have found which pull request led to the bug, I bisect on the original branch. Instead of having to figure out which line in the pull-request merge might cause the bug, I have a much more granular path. Often, it surfaces a single-line commit, and leads to a painless and immediate bugfix.

As you can drive your bisect with your unit tests, you often have no work to do at all, given sufficiently atomic and small commits on side branches. Losing that capability would seriously impact my sanity when I have to fix bugs.
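Here is a small sketch of that two-pass bisect, assuming a git version whose bisect supports --first-parent. Everything in it is invented for the demo: the file app.txt, the branch name feature, the commit helper, and the stand-in "test", which simply checks whether the word bug appears in the file. In a real project the check would be your test suite.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: append a line to app.txt and commit it.
commit() { echo "$1" >> app.txt; git add app.txt; git commit -q -m "$1"; }

commit "good start"
good=$(git rev-parse HEAD)

git checkout -q -b feature
commit "harmless tweak"
commit "bug"                      # the commit we want to find
culprit=$(git rev-parse HEAD)
commit "another tweak"

git checkout -q main
git merge -q --no-ff --no-edit feature
merge=$(git rev-parse HEAD)
commit "later work on main"

# Stand-in for a test run: exit 0 (good) unless the file contains "bug".
check='! grep -q bug app.txt'

# Pass 1: walk only the main line to find the offending merge.
git bisect start --first-parent HEAD "$good" >/dev/null
git bisect run sh -c "$check" >/dev/null
found_merge=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

# Pass 2: bisect inside the merged branch, between the merge's two parents.
git bisect start "$merge^2" "$merge^1" >/dev/null
git bisect run sh -c "$check" >/dev/null
found_commit=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "offending merge:  $found_merge"
echo "offending commit: $found_commit"
```

With a squash merge, the second pass has nothing to walk: the search stops at the one big commit.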

Conclusion

And that is why squashing history is harmful. It's literally just deleting information from the git graph by dropping a single parent entry from the merge commit.

Discussion (71)

Simon Egersand 🎈

I assume you are referring to the "Squash and merge" option on GitHub? If so, yes I 100% agree with you.

On the other hand, if you mean devs should not squash and rebase before pushing a PR, then I disagree.

PS. Some of the formatting in your post is off :) Around the last example with git

Nathan Hedglin

This ^ . They're two different things. I always squash my commits before pushing a PR and tell my devs to do the same.

Lyndon Hughey

Yep. Not squashing your commits is like submitting your rough draft for publishing to an editor. It's unnecessary verbosity.

Manuel Odendahl Author

I'm not sure I understand. If you don't want to see the individual commits, you don't have to look at them?

Nathan Hedglin

you like looking and scrolling through 30 commits PER developer instead of one commit per feature? It is more to read. You can't NOT look at it.

Manuel Odendahl Author

How do you mean you can't not look at it? In which context? I usually use git log --first-parent when doing release work, and only look at side histories when I need to get in there.

Nathan Hedglin

oh nice! I've never used that flag before. It seems like it works well as long as you never fast-forward, and you're viewing it on the command line. The history is still ugly in online viewing tools or other git clients.

Found a good blog on the pros and cons. davidchudzicki.com/posts/first-par...

Simon Egersand 🎈

Yeah, me too. This is one of (if not the most) important practices you should do to make the PR review process quick. A quick PR review process in turn speeds up the shipping of code => business moves faster => greater chances of success for the company.

Always make your commits nice before asking for a PR review!

Manuel Odendahl Author

do you look at individual commits when doing a review? because the diff view shown is just the comparison of the trees, the history itself is irrelevant.

Simon Egersand 🎈

I do look at individual commits of the PR, yeah. Sometimes it makes sense to split up a task into multiple commits, or include refactor work, and that work should not be combined IMO.

Manuel Odendahl Author

I agree. I'm not the greatest at this (often solo dev on things), but it's a good skill to know.

csgeek

Oh good. I was going to say.. half of my commits are 'saving crap', 'working.. I think'. Not sure what value those have.

Merging branches is different.

Nathan Hedglin

Lol oh you too?! Good stuff.

Manuel Odendahl Author

by squash you mean collapse all commits into a single one? because i think that's wrong :)

Manuel Odendahl Author

I do think spending some time in git rebase --interactive (or magit in my case) makes a lot of sense, however.

Simon Egersand 🎈

Yeah, totally agree!

Simon Egersand 🎈

No, that's not what I mean. I was confused if that was what you meant. I guess we're on same page :D

lukas1

It's not as bad. Provided you keep reference to the PR number in the commit message. Luckily Github includes PR number into the commit message when merging automatically and in the github history it will even create a link directly to the PR. That way one does not lose the history of the PR itself, should anyone really need it.

It worked well with one team I was involved in.

Teaching good commit practices and using git to its full potential is doable, when majority of the team is already good with it and only some developers need help, if the whole team has problems with that, it's not so easy, the option to squash merge saves a lot of time.

Also helps to get rid of nasty merge commits of merging main branch into feature branch, if github is setup so that it requires the feature branch to have latest changes from main branch (which should be required). Rebasing would be preferable, but it's not as comfortable, because it will require new approval from your team, if your protected branches rules require an approval before merge (which it should).

Magnus Markling

If squashing means losing too much information, then your PRs are probably too big to begin with. Imho it's a code (or process) smell that should be brought to attention asap.

As for looking at what the developer did in their branch, I tend to think the PR should speak for itself. How we got there is not important. Unless you're also prepared to spend a lot of time cleaning up your branches before sending PRs. (Time you could possibly spend making many small PRs instead.)

Nice trick for the CLI tools with "first parent"! I was not aware it even existed. Unfortunately it's not available in most graphical tools that I'm aware of, so those users will be stuck with the "ugly" history.

Manuel Odendahl Author

I had a long conversation about that with other developers, it was very interesting, and I plan to write about "big evil merges" in the future.

Situations where big branch merges might happen (for valid reasons, imo):

  • merging relatively independent projects (in the context of a monorepo, for example)
  • wide "rip off the bandaid" refactor (especially type-system / compiler driven refactors)
  • having to merge shitty code from someone who left / from external contributors over whom you only have so much control
  • slow PR / merge cycles (can have many reasons: reviewers are scarce, QA is a bottleneck)
  • overall politics: management thinks PRs are a waste of time, crunch time

In general, I'm not a fan of "in a perfect world you wouldn't need more information" arguments.

In my experience, even small clean PRs can benefit from having a granular history, say when git blaming something 3 years down the road.

As for UI tools, I use magit / sourcetree / intellij's history browser, I'm sorry if other tools don't support it :/

I wish all tools supported --first-parent, because the (valid, because tool friction matters) reason is "my tool doesn't know how to display the information i want, thus i have to lose context for it", arguing that "merges make the history sloppy" is just a cop-out, it's just not true. I think one reason for that is that many developers don't know how git internally works, and thus have a warped understanding of what the history is. Git's CLI tooling really doesn't help here.

Jack

I used to insist devs squashed/rebased/etc. their commits before opening a PR and then use rebase-merge to merge the PR into main.
Over time I've learned the value of a squash merge. If a PR is too big to be able to describe in one commit message, or too complicated to understand from looking at the diff, then you're doing too much in one go.
Squash merging PRs is absolutely fine if your branch has nothing but work-in-progress commits. If you feel like you're losing something by squashing, then you need to rethink your process...

Manuel Odendahl Author

So you are saying to only do pull request that have the size of a single commit?

Jack

As a (very) general rule, yes. You should be able to understand a change based on a single commit message, yes.
Obviously sometimes you have a big feature that can't be released piece-by-piece. In that case I would have a feature branch, and then individual branches off that. You PR (with squash commits) each smaller piece of work into the feature branch, and then at the end merge (not squash) the feature branch in. You have a history of all the pieces of work done, but not all the useless wip commits that don't actually tell any kind of story...

Manuel Odendahl Author

That makes sense. I see a pattern emerging here. I think that usually, when we "argue", we often are actually solving the same problem, often in the same way, but with different words.

I operate under the premise that your branch history is meaningful, and has relevant commits. If you do a ton of WIP commits, I would question why you would do a "WIP" commit in the first place, because squash merge or not, you are robbing yourself of helpful history while developing your feature already. I also heavily use interactive staging (staging individual hunks), both for "pseudo review", and to split up my work in proper chunks, with git commit hooks validating at every step of the way that my tests run. If I still manage to make a mess (say, I'm tired, or in a rush, or just frustrated), I will often spend the time to go back with interactive rebase and eliminate the junk either with squash / revert+squash or plain delete, more rarely split up bigger commits into smaller ones.

What you describe in your workflow above is, to me, basically what I am achieving by keeping side branch history. I would say, as someone who often had to merge dirty crap branches, I do like to keep the WIP commits anyway, because they give me an insight into what someone was trying to do, what their cognitive style is, what they were struggling with, to be able to assist them better.

But let's let things speak. I recently merged a "big" commit, setting out to build a feature that led me to start introducing typescript annotations. We are a fairly fast-moving 2-dev team and reasonably trust each other, so the other dev was fine with keeping both the typescript introduction and the actual feature in the same PR. Here's my history in this case:

# 2022-05-17 597679dd482ca990cffb5fe73bbd91108163d4cc :art: Psalm fixes for Sql and OrdersSplits in tadmin
# 2022-05-17 9f67f64c17d0f4fb9e7f9f86d04a1c56770b5ad8 :art: Start adding some API typescript to tadmin
# 2022-05-17 51dedf0360364369bd873d65476185fab8e4e97c :sparkles: :zap: Faster items summary query (still not instant)
# 2022-05-17 a776a961952511ea756a999949d8a5e49fbcc37c :zap: Make it even a bit faster. Computing links is slow.
# 2022-05-17 025ef30f90e429cf4cc3d882c8b56b4dca9cd7f6 :zap: Make productQuantities computation even faster by getting managestock/isVirtual up front
# 2022-05-18 dc4d4b4584f954fbc8c3bb8633afd7c74e45b37c :art: Fix intellij code style at least roughly
# 2022-05-18 c2f5e98983121fcd7c8e42f944f5eba62de2ca69 :art: Use transients for image and permalink in getOrdersSummary
# 2022-05-18 9bba2793b0b8eaa503233948298c8d0f5a4a832b :art: Introduce RowType for Table/useTable, split out into files
# 2022-05-18 96eb7fd43777a2f74d3fe333e31d6fdd4b8254b8 :art: Fighting some odd import weirdnesses, gonna stop tweaking for now
# 2022-05-18 143fde48f4783296123a308578c1ce320ea9c575 :art: Start adding type to ManageOrders useTable
# 2022-05-19 1782ef761a5cc82e884ff7fbec33654f8b19b805 :tractor: Move api types to shared code
# 2022-05-19 dd3bdbadc69c0f5932d24fb09c467415d9b1289e :sparkles: Add more typing to useTableApi
# 2022-05-19 c66c30129283b995420f1f69a4113d0f19a53b4d :art: :tractor: Split out the different types of summary bars
# 2022-05-19 93f3ee27d111ebe6f1a114620e3e722db1f85ecf :ambulance: Fix proper backend query, remove permalink from sideview
# 2022-05-20 37578d82a18da86b42fe2932a72c52dfff3f8604 :art: First attempt at latching on to triggerSearch
# 2022-05-20 d57be0a0e88c8af0a8c9f2a7c2dabfcecfaeaf7a :art: Remove error_log
# 2022-05-20 63a77c1515e8a866dcc3c4d9c1322a2cd3b43035 :art: Remove error_log
# 2022-05-20 5420e6a64499cb812fd44566205cf8abebebfdd9 :sparkles: Proper orders summary debouncing
# 2022-05-20 883ecc39b38ed34ce47ead9036f993a263050bf3 :art: Cleanup dependency handling to avoid expensive backend calls
# 2022-05-20 d24292be1df37cba702c54f982f1630cb2f3a42a :ambulance: Fix useEffect eslint check
# 2022-05-20 cf70860c5f46e2725c0139e33ddcada656ff3112 :art: Fix prettier changes
# 2022-05-20 e067d0ddbf508501970ad0578fdd9e60b37a68e3 :ambulance: Fix type annotations
# 2022-05-20 54c70047ad5e3f6858799c6faaefe027d112b330 :sparkles: Measure and return performance measures
# 2022-05-20 0731046f579e441c5ed96292199e82df0c4d1c1a :poop: Bunch of performance measurement and debug logging
# 2022-05-20 37ae6f0afccf868d829103e45acfee2f069df744 :ambulance: Fix eslint warnings
# 2022-05-20 da06324156213dc6524adc53fc06708e3b1e5073 :ambulance: Fix PHP initialization of start_time
# 2022-05-20 858db0142afc2c21ee90b3a08fb557938877bc33 :art: Fix loading indicator
# 2022-05-23 7731a0a694b617585c51325442048491beea8998 :sparkles: :lipstick: Add checkbox to enable getting all orders summary
# 2022-05-23 2e89e73f96a772de978cd8228b44a34532794904 :art: Undo unnecessary stuff
# 2022-05-23 22c850563b8937951d1b3a46d19d6857222f14be :art: Use a single php-cs-fixer config
# 2022-05-23 8aa2b0ed201a55b196fb169e8be921d30ce9aa0e :zap: :art: Cache thumbnails for 24h
# 2022-05-23 cbd6fce9f072bb091197cc2fcec9063c7207d5db :pencil: Slight whitespace adjusts
# 2022-05-23 e74df0d3a02679f2c98b2f24c0245245e5956be6 :ambulance: Fix php-cs-fixer
# 2022-05-23 bd1e34d466495d86c1f53510be0936df909445f8 :art: Make linkbutton clickable, fix markup
# 2022-05-23 fafaeb6644f61e2b6ed1df475a2b6266e2eaf425 :art: Remove logging entries

Those are all valuable commits to me that I would like to keep for the long run. Maybe I'll figure out that the reason a certain DB query doesn't work anymore is not say, the API change, but actually the "Fix php-cs-fixer" commit. Of course I could make a separate PR for "fix php-cs-fixer", and then again for "Make linkbutton clickable", and then again for "Remove logging entries", but then we end up where we started, except with a lot more PRs and CI runs.

Manuel Odendahl Author

You'll note I use gitmoji, which I also find very useful, as I can at a glance recognize what the reason is behind commits. To show the graphical view: [image]

Manuel Artero Anguita

Hi Manuel, I'm Manuel.

you got me with this:

The thing is that my opinion is the correct one

Above all considering that my opinion is the correct one. Kidding.

Nice post, I just disagree, squashing is fine. But I can see your points. In the end it's a tool and sometimes will be handy sometimes won't.

Manuel Odendahl Author

Do you use squashing because you want to have a "clean" history per default? Or do you have other reasons?

Manuel Artero Anguita

IMO too much information leads to disinformation. Checking the actual "WIP" commits from a feature branch is a "thin grain info" I've never-ever required.

Cleaner history is , yep, the main reason.

But actually I've faced another issue in the past; there was this repo 15+ years old in my company, with hundreds of committers through the ages, and commits in the order of n * 100.000. Dealing with this repo was a challenge actually! too much useless info at .git/ folder. What I'm trying to say is that "thin grain" info do weigh. Of course you need to reach those numbers.

Manuel Odendahl Author

A lot of people bring up "WIP" commits. Do you often do WIP commits? I personally rarely do (I do get frustrated and use the 💩 emoji, as I use gitmoji, but still make meaningful commits). But the point of the article was that you can easily hide all that information and focus on what you need.

As for historical gits, I wonder if people here ever did a git "cleanup" where most of the ancient state gets culled, and just the last few years are kept. Cruft does indeed accumulate.

Luke Inglis

Just adding a little perspective, I don't think I've ever worked on a team with anyone who didn't use WIP commits. I work on a small team and there is a lot of context switching that needs to happen and 'finishing' a commit before switching to something else just isn't an option.

Manuel Odendahl Author

Interesting. I have the opposite experience. I use git stash in those cases, or do a git rebase --interactive to clean things up later.

lukens

I don't like the idea of most of the ancient state being culled.

The codebase at my current job has a cutoff from when it was moved to git, and there's even less hope of finding out why something was done for code that predates that than there is for the rest of the codebase.

Maybe I've always worked at the wrong places, but I've never been in a place where I wish there were fewer commits in the history, but a lot of the time I have wished there were more commits (often when trying to review code), so that I had a finer grain insight into why a particular line of code was written, and what else was changed for the same purpose.

I'm 100% with you that losing this information is nothing but a bad thing. I find that even the worst git commits tend to provide the best and most accurate and up-to date documentation of the code; it amazes me that so many people choose to throw that away!

Warren Parad

100% agreed. A "clean history" does not mean that every PR should have the commit history of the work process wiped out.

It's also really harmful to code reviews, where seeing the differences between revisions of the pull request is impossible with GIT, and near impossible with most git servers. GitLab supports this FWIW, I still don't think GitHub does.

There are lots of things that benefit from having separate commits, like a rename and then a diff, or a refactor inside of a PR. PLEASE DO NOT MERGE A REFACTOR WITH A FUNCTIONAL CHANGE. I don't want to see it in the same PR, let alone in the same commit.

No one who says you should squash has ever had to do a difficult task regarding git history. If you had to find a bug with git bisect, do sub directory migrations, find when a critical feature changed, or track down a refactor gone wrong, you would know. I'll die on that hill.

That isn't to say you shouldn't clean up your commits before you open your PR. The only thing I don't want to see is "updates from pull request review", but anyone bringing to the table a conversation about squashing having to do with commit messages is barking up the wrong tree. Fix your commit messages; squashing isn't a solution to that problem, it's a patch.

Manuel Odendahl Author

This is a good point.

Which makes me think, how can you give someone who hasn't experienced larger git pains the context in which some of those decisions are made? Or in general, workflow/code hygiene steps that might seem like red tape until you've experienced some of the nightmares that can ensue.

It's a paradoxical kind of thing, because if these systems are in place, by definition you will not encounter the reason why these systems were put in place (similar to any kind of preventative measure that works). I ran into really heated discussions where my point of view was basically "it caused me much pain in the past, trust me, we should do X", which is not a great argument.

Warren Parad

Honestly it goes both ways. If they are allowed to say "I like it better this way", then you are allowed to say "have my wisdom rather than learning through the experience of failure". "It looks prettier" is less of an argument. This is where:

Here's what we are going to do, this time we are going to do it my way, until we find a concrete problem that affects the business. It's one of the few times a year I'm going to pull the experience veto, but it's my job to do that.
If we learn that this was a mistake, it's a great story for your next promotion, or interview somewhere else when they ask you about a time that you disagreed with a solution but still went with it.
There will be other times that we disagree and we'll go with your solution, and take the same approach. If you think this happens where you are unfairly being treated, let's keep track of these, and we'll evaluate. We won't always agree and having a solution for the decision making in those situations is critical.
If you say that it has to be your way this time, then it's easy to infer that it always has to be your way, and that's something that prevents us from working effectively together.

I use that in those circumstances, and it has yet to come back to me. In most cases, years later I have these same engineers coming and telling me "OMG remember this problem, I was totally wrong, and I tried to convey that same point to others, but they also didn't get it" Followed up by: "What am I supposed to do with these 'seniors'?"

Ingo Steinke

Tell that to GitHub, they seem to have made squash commits the new default. Not possible to make merge commits in their web UI anymore, and that sucks!

wesen profile image
Manuel Odendahl Author

I think you can with an option?

ingosteinke profile image
Ingo Steinke

It used to be possible, but in practice it is always grayed out, and this does not seem to be the result of a conscious decision by the project maintainers.
screenshot of GitHub merge options

  • Create a merge commit: Not enabled for this repository
  • Squash and merge
  • Rebase and merge: Not enabled for this repository
wesen profile image
Manuel Odendahl Author

I think it has to be enabled in the repo settings. But now that git bisect and git blame both support --first-parent, I really don't see the point anymore. Maybe it saves some space because the intermediary blobs are garbage collected, but that kind of only makes sense on big public repositories like Linux. And even then, people maintain different repositories that still keep the individual history.
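
As a rough illustration of the --first-parent point, the sketch below builds a throwaway repository, merges a two-commit side branch with --no-ff, and compares the full history with the first-parent view. The file name, branch name and commit messages are made up for the example; only the git commands themselves are real.

```shell
set -e
repo="$(mktemp -d)"            # throwaway repository; every name below is hypothetical
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
main="$(git symbolic-ref --short HEAD)"   # default branch name differs between setups

# A side branch with two "messy" commits.
git checkout -q -b feature
echo wip >> foo.txt
git commit -q -am "wip"
echo done >> foo.txt
git commit -q -am "fix"

# A real merge commit instead of a squash.
git checkout -q "$main"
git merge -q --no-ff -m "Merge feature" feature

# Full history: every commit, including the ones from the side branch.
all_count="$(git rev-list --count HEAD)"
# First-parent history: only what landed on the main line, one entry per merge.
fp_count="$(git rev-list --count --first-parent HEAD)"

git log --oneline --first-parent
git blame -s --first-parent foo.txt    # merged lines are attributed to the merge commit
echo "all=$all_count first-parent=$fp_count"
```

The same flag is accepted by git bisect start (from Git 2.29 on), so a bisect can be limited to the merge commits on the main line.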

lukens profile image
lukens

My understanding is that GitHub also has some kind of hidden tag on the PR branch, so you can still view it on GitHub after it is squashed, so it presumably doesn't save any space for them.

destynova profile image
Oisín • Edited on

In the past I've argued against squashing commits from the POV of making bisects easier later. That said, if your bisect is broken by intermediate buggy commits, that could throw you off too.
Probably my main reason to dislike squashing commits is that I like to review bigger PRs by going commit by commit, so I can better grasp the author's original intention and thought process as they evolved it.
I don't buy the idea of "tidying up" previous commits and I'm not sure who would really benefit from that. Seeing misconceptions and things that don't work out, and what you did to arrive at the current solution, that's valuable for my understanding.

lukens profile image
lukens

I'm so glad that someone mentioned this, because I was thinking the exact same, and didn't understand this bizarro upside-down world, where people seem to think pull requests were easier to review if all the commits were squashed beforehand.

Yes, it's good to do a bit of tidy up with an interactive rebase first, if you have too many WIP commits, or "oops, missed this bit" commits, but I'd generally prefer more rather than fewer commits.

wesen profile image
Manuel Odendahl Author

Makes total sense. I "tidy" up commits because I often do partial commits (staging individual hunks) in the moment, and then I go back to make sure everything builds properly at intermediate steps. I agree that bisecting on non-building side branches is a serious pain. There are ways to address things, but nonetheless, not the best experience.
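
One possible way to check that every intermediate commit still works is git rebase --exec, which replays a branch and runs a command after each commit. This is only a sketch under assumptions: the repository, branch and file names are invented, and `test -s foo.txt` stands in for a real build or test command.

```shell
set -e
repo="$(mktemp -d)"            # throwaway repository; every name below is hypothetical
log="$repo.log"                # check log, kept outside the work tree
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
main="$(git symbolic-ref --short HEAD)"

git checkout -q -b feature
echo one >> foo.txt
git commit -q -am "Step one"
echo two >> foo.txt
git commit -q -am "Step two"

# Replay the branch on top of main and run the check after each commit.
# If the command fails, the rebase stops at the offending commit.
git rebase -q --exec "test -s foo.txt && echo checked >> '$log'" "$main"

checked="$(wc -l < "$log" | tr -d ' ')"
echo "commits checked: $checked"
```

Staging individual hunks is done interactively with git add -p, so it is not shown here.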

kevingranade profile image
Kevin Granade

As with all blanket statements, you're wrong :)

I do agree with a caveat, which is that as your contributor maturity increases, merges become the dominant option.

IF a PR is composed of well-formed, meaningful commits, you should probably merge.

If a PR is composed of point-in-time or fix-fix-fix commits, and/or the individual commits don't build and pass tests, then you should squash them.

The point you're missing is that individual commits within a branch might be data, or they might be noise.

wesen profile image
Manuel Odendahl Author

Agree, I ask people to put some effort into their own git histories. It not only provides better information for the rest of the team, but it also helps structure your own development workflow, and helps you debug / bisect your own stuff.

That said, I like having a bunch of fix fix wtf wip wip commits still, because it still provides insight into what people struggled with, what their thinking process was, and gives a starting point when looking for some hairy stuff.

matthewpersico profile image
Matthew O. Persico

A number of commenters have made the distinction between squashing before the PR is submitted and after it is submitted. I contend it's an irrelevant distinction.

If you are in the squash-before-but-not-after crowd, I counter that once a PR starts being reviewed, updated and re-reviewed, you're going to get a whole list of commits that you'll end up wanting to squash anyway.

My criterion for squashing is this: for each particular commit, if you cannot roll back that commit and have a working functional system, then there is no point in having that commit in your history; squash it out.

Now, if you think there is value in the various conversations surrounding those commits, then keep them around, off the main branch like this:

  • make a copy of the branch the PR is sitting on (please tell me you're not modifying your main branch directly...), naming the copy archive/branchname
  • squash branchname
  • merge the PR, putting a reference to archive/branchname in the PR's comments.
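
The three steps above can be sketched roughly as follows. This assumes a local throwaway repository with an invented branch called feature; on a hosting platform the archive branch would also have to be pushed, and the final merge would happen through the PR.

```shell
set -e
repo="$(mktemp -d)"            # throwaway repository; every name below is hypothetical
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
main="$(git symbolic-ref --short HEAD)"
base="$(git rev-parse HEAD)"   # where the feature branch forks off

git checkout -q -b feature
echo wip >> foo.txt
git commit -q -am "wip"
echo fix >> foo.txt
git commit -q -am "fix"
echo done >> foo.txt
git commit -q -am "fix fix"

# 1. Keep the full history under an archive name.
git branch archive/feature feature

# 2. Squash the branch: move it back to the fork point while keeping the
#    final file contents staged, then commit them as one commit.
git reset -q --soft "$(git merge-base "$main" feature)"
git commit -q -m "Add feature (full history: archive/feature)"

squashed_count="$(git rev-list --count "$base..feature")"
archived_count="$(git rev-list --count "$base..archive/feature")"
squashed_tree="$(git rev-parse "feature^{tree}")"
archived_tree="$(git rev-parse "archive/feature^{tree}")"

# 3. Merge the squashed branch.
git checkout -q "$main"
git merge -q --no-ff -m "Merge feature" feature

echo "squashed=$squashed_count archived=$archived_count"
```

Both branches end at the same tree; only the number of commits leading there differs.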
christiankozalla profile image
Christian Kozalla

I'm using GitLab at work and we are using the Squash-and-Merge option of GitLab.. I don't know how that compares to the way GitHub is doing it. But I also don't know how Squash-and-Merge compares to manually squashing, pushing and then opening a PR

We, as a team, are using Squash-and-Merge because one single feature will be mapped to one single commit. But I suppose the same is possible with unsquashed merge commits..

wesen profile image
Manuel Odendahl Author

Only one way to know! Look at the graph, and use git cat-file to look into the internals.
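
A minimal sketch of that kind of inspection, using a local throwaway repository rather than GitLab or GitHub (whose server-side behaviour is not covered here). It merges the same invented feature branch twice, once as a real merge and once with git merge --squash, and prints both commit objects.

```shell
set -e
repo="$(mktemp -d)"            # throwaway repository; every name below is hypothetical
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
main="$(git symbolic-ref --short HEAD)"
git branch squashed            # second copy of main for the squash variant

git checkout -q -b feature
echo one >> foo.txt
git commit -q -am "Step one"
echo two >> foo.txt
git commit -q -am "Step two"

# Variant 1: a real merge commit on main.
git checkout -q "$main"
git merge -q --no-ff -m "Merge feature" feature

# Variant 2: a squash merge on the copy.
git checkout -q squashed
git merge -q --squash feature
git commit -q -m "Squash feature"

# Print the raw commit objects: tree, parent(s), author, committer, message.
git cat-file -p "$main"
git cat-file -p squashed

merge_parents="$(git cat-file -p "$main" | grep -c '^parent ')"
squash_parents="$(git cat-file -p squashed | grep -c '^parent ')"
merge_tree="$(git rev-parse "$main^{tree}")"
squash_tree="$(git rev-parse "squashed^{tree}")"
echo "merge: $merge_parents parents, squash: $squash_parents parent"
```

In this local setup the two commits point at the same tree and differ only in their parent lines and metadata.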

xmarkclx profile image
Mark Lopez

One reason I'm for the squash merge camp hasn't been mentioned, so I think I should mention it here so @wesen can correct me.

We use squash merge to make it easy to revert a whole PR since it's just a commit.
It can be reverted after some time has passed easily by just reverting that commit.

What do you recommend for this, so I can leave the wrong squash merge camp and follow the righteous path, oh great @wesen?

wesen profile image
Manuel Odendahl Author

You can do exactly the same for a merge commit by using git revert -m1. The squash merge commit and the merge commit both point to the same tree hash; they only differ in their parent commits. With a squash merge, there is only 1 parent, so git revert knows "ok well you just want to revert to the parent". With the merge commit, there are 2, so you have to tell it "please use the left parent (aka the parent on the main branch) to revert to". Easy peasy!
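
A small sketch of reverting a whole merge with -m 1, again in a throwaway repository with invented names. It records the main-branch commit before the merge and checks that the tree after the revert matches it.

```shell
set -e
repo="$(mktemp -d)"            # throwaway repository; every name below is hypothetical
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
main="$(git symbolic-ref --short HEAD)"
before="$(git rev-parse HEAD)"   # state of main before the feature lands

git checkout -q -b feature
echo one >> foo.txt
git commit -q -am "Step one"
echo two >> foo.txt
git commit -q -am "Step two"

git checkout -q "$main"
git merge -q --no-ff -m "Merge feature" feature

# Revert the whole merge relative to its first parent (the main-branch side).
git revert --no-edit -m 1 HEAD

before_tree="$(git rev-parse "$before^{tree}")"
after_tree="$(git rev-parse "HEAD^{tree}")"
echo "before merge: $before_tree"
echo "after revert: $after_tree"
```

One caveat: after a merge has been reverted, merging the same branch again does not bring its changes back unless the revert commit is itself reverted first.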

arthurolga profile image
arthurolga

I don't like Squash. If you are working with good commits, there should be few of them per PR; if you have something around 10 commits on a feature, it should probably be separated into smaller tasks.

If the developer wants to aggregate two or more commits, like if they refactored some part of the code, BEFORE MAKING THE PR, then I totally like Squash.

If you have small commits with good names, it probably is better to just Merge than Squash and Merge, e.g.

Commit 1: Add DatePicker component to App
Commit 2: Make API call for Date Service on UserScreen
Commit 3: Make API call for Date Service on PostScreen

If something is breaking, it will probably be easier to see in which commit. It also allows for better cherry picking.

aalvarado profile image
aalvarado

How easy is it to do git reverts in unsquashed commit histories?

wesen profile image
Manuel Odendahl Author

Just as easy as a revert of a squashed commit. Run git revert -m1 on the merge commit; it will then use the "filesystem state" of the left parent node as the revert result, just as it would if you had a single squash commit.

aalvarado profile image
aalvarado

Cool. I think most people don't use merge + squash, and with the --first-parent option it doesn't really matter now whether they do or not.

sirepicmike profile image
Mike Martin

We occasionally need to revert a feature from a release branch if it fails UAT. While this isn't too common, it's much easier to do if the feature is squashed into a single commit (pre-push). I have yet to see much of a downside.

wesen profile image
Manuel Odendahl Author

You can just use git revert -m1 to revert a merge commit to the first parent (aka what reverting the squashed commit would do). -m1 says "revert to the first parent", aka the one that a squash merge preserves.

For a merge you just made, git revert -m1 really just puts the checked-out tree back at a specific tree hash (the first parent's) and prepares a pretty commit message; in general it applies the inverse of the change on top of the current state. It doesn't really have much to do with the history itself. You could pretty much get the same result by doing (haven't fully tried this out, just a sketch):

# make the index and working tree match that commit's tree, without moving HEAD
git read-tree -u --reset "${hashyouwanttoreverto}"
git commit -m "Revert XXX"
tzwel profile image
tzwel

Awesome article. Right from the beginning it was obvious you know what you are doing, keep it up!

stewartjarod profile image
Jarod Stewart • Edited on

I think you assume that people use Git to a high standard and they commit with more intent always... often times in practice a PR/MR is filled with commits like wip, wtf is this?, fix, wip again. I highly encourage people to Squash & Merge those PRs ;d

If you are being diligent and maintain a clean git history or clean up your commits like a good lil developer, by all means, rebase those commits to main ;d

wesen profile image
Manuel Odendahl Author

Is this because you think they don't have the time/skills to clean up their own history, and thus delegate it to merge time?

stewartjarod profile image
Jarod Stewart

Haven't wanted to press the matter when other things seem more important. Easier to add husky and semantic commit linting with pre-commit hooks imo ;d

wesen profile image
Manuel Odendahl Author • Edited on

If you have a good reason to squash commit, please post it here. But I don't think you have.

ecyrbe profile image
ecyrbe

I have some pretty good reasons to squash my commits when working with a team:

  • Avoid rebase nightmares where I need to fix the same conflicts for each commit.
  • Avoid revert nightmares when reverting a faulty merge.
  • Easily remove unneeded git history (essentially bug-hunting commits, trial-and-error commits). And don't tell anyone not to commit unfinished work. Git was created to allow and encourage it.
  • Your development process (a bunch of test red, impl test green, refactor) has nothing to do with your product history. What matters is the feature history. And that's what I want to see in the product git history.
wesen profile image
Manuel Odendahl Author

There's a couple of tricks to have an easier time rebasing:

  • You can avoid "rebase nightmare" by using git rerere. It records how you resolve conflicts and allows you to replay it.
  • For "reverting", you just need to checkout the tree to the state that you want to revert to, and make an appropriate commit message.
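
To make the rerere tip concrete, here is a rough sketch in a throwaway repository with invented names: a conflict is resolved once, the merge is thrown away, and the second attempt picks up the recorded resolution. It assumes rerere is enabled through the rerere.enabled setting before the first conflict happens.

```shell
set -e
repo="$(mktemp -d)"            # throwaway repository; every name below is hypothetical
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config rerere.enabled true   # record and reuse conflict resolutions

echo base > f.txt
git add f.txt
git commit -q -m "base"
main="$(git symbolic-ref --short HEAD)"

# Two branches change the same line in different ways.
git checkout -q -b feature
echo feature > f.txt
git commit -q -am "feature change"
git checkout -q "$main"
echo main > f.txt
git commit -q -am "main change"

# First attempt: the merge conflicts, and rerere records the conflict.
git merge feature || true
echo resolved > f.txt            # resolve by hand
git add f.txt
git commit -q -m "Merge feature" # rerere records the resolution here

# Throw the merge away and try again: the same conflict shows up,
# but this time rerere rewrites f.txt with the recorded resolution.
git reset -q --hard HEAD~1
git merge feature || true
replayed="$(cat f.txt)"
echo "f.txt after second merge: $replayed"
```

The file still has to be staged and committed after the replay (unless rerere.autoupdate is set), which gives a chance to double-check the reused resolution.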

I always think "tree state" first, more so than caring about individual commits. I can always link up the graph by manually putting in a parent link when merging, if I do want to show what happened in the history.

Realizing that git only cares about file contents, not diffs or commit or patches, really freed up how I can navigate "complicated" issues.

For the last two points you raise, my approach is to use --first-parent or similar flags to just look at the part of the history I care about (usually, one commit per ticket on the main branch) and link it up to product features (the tickets themselves). No need to squash.

jlamoreaux profile image
Jordan Lamoreaux

The thing is that my opinion is the correct one.

Well said. I will be using this. 😅

manzapanza profile image
Massimo Rangoni • Edited on

IMHO the main problem when you do NOT squash your work (generally a feature or a fix) is that, sometimes or often (depending on how many devs are working on the same repo and how disciplined they are), you have to fix more merge/rebase conflicts.

Let's say you send a PR with 5 commits, and you have made changes on the same line in 3 of those 5 commits. After that you have to rebase your PR (because other PRs were accepted before yours) and unluckily you get a conflict on that line: you will have to fix a merge conflict 3 times, but if you squash your work you have to fix just 1. So all those little code refactorings made during the development of the same feature/fix can multiply these problems.

Another aspect is that all those sub-feature/fix commits make sense just for the author, but for the other devs generally not.

But what I'm saying depends on some assumptions I've made, for example if you're working alone in a repository these issues will probably never happen.

Anyway, although I don't agree with you, I liked the article for the discussion it generated.

Thank you!

masonharper profile image
Mason Marper

Awesome post

aaravrrrrrr profile image
Aarav Reddy

Really helpful.

snelson1 profile image
Sophia Nelson

Thanks

rzs401 profile image
Richard Smith

nice stuff

andresbecker profile image
Andres Becker

Great post!

kelseyjj profile image
Kelsey Jones

Awesome article.