DEV Community

Cover image for ⛔ Squash commits considered harmful ⛔

⛔ Squash commits considered harmful ⛔

Manuel Odendahl on May 23, 2022

A recurring conversation in developer circles is if you should use git --squash when merging or do explicit merge commits. The short answer: you sh...
Collapse
 
simeg profile image
Simon Egersand 🎈

I assume you are referring to the "Squash and merge" option on GitHub? If so, yes I 100% agree with you.

On the other hand, if you mean devs should not squash and rebase before pushing a PR, then I disagree.

PS. Some of the formatting in your post is off :) Around the last example with git

Collapse
 
lukas1 profile image
lukas1

It's not as bad. Provided you keep reference to the PR number in the commit message. Luckily Github includes PR number into the commit message when merging automatically and in the github history it will even create a link directly to the PR. That way one does not lose the history of the PR itself, should anyone really need it.

It worked well with one team I was involved in.

Teaching good commit practices and using git to its full potential is doable, when majority of the team is already good with it and only some developers need help, if the whole team has problems with that, it's not so easy, the option to squash merge saves a lot of time.

Also helps to get rid of nasty merge commits of merging main branch into feature branch, if github is setup so that it requires the feature branch to have latest changes from main branch (which should be required). Rebasing would be preferable, but it's not as comfortable, because it will require new approval from your team, if your protected branches rules require an approval before merge (which it should).

Collapse
 
wesen profile image
Manuel Odendahl

by squash you mean collapse all commits into a single one? because i think that's wrong :)

Collapse
 
wesen profile image
Manuel Odendahl

I do think spending some time in git rebase --interactive (or magit in my case) makes a lot of sense, however.

Thread Thread
 
simeg profile image
Simon Egersand 🎈

Yeah, totally agree!

Collapse
 
simeg profile image
Simon Egersand 🎈

No, that's not what I mean. I was confused if that was what you meant. I guess we're on same page :D

Collapse
 
memark profile image
Magnus Markling • Edited

If squashing means loosing too much information, then your PRs are probably too big to begin with. Imho it's a code (or process) smell that should be brought to attention asap.

As for looking at what the developer did in their branch, I tend to think the PR should speak for itself. How we got there is not important. Unless you're also prepared to spend a lot of time cleaning up your branches before sending PRs. (Time you could possibly spend making many small PRs instead.)

Nice trick for the CLI tools with "first parent"! I was not aware it even existed. Unfortunately it's not available in most graphical tools that I'm aware of, so those users will be stuck with the "ugly" history.

Collapse
 
jackmellis profile image
Jack

I used to insist devs squashed/rebased/etc. their commits before opening a PR and then use rebase-merge to merge the PR into main.
Over time I've learned the value of a squash merge. If a PR is too big to be able to describe in one commit message, or too complicated to understand from looking at the diff, then you're doing too much in one go.
Squash merging PRs is absolutely fine if your branch has nothing but work-in-progress commits. If you feel like you're losing something by squashing, then you need to rethink your process...

Collapse
 
wesen profile image
Manuel Odendahl

So you are saying to only do pull request that have the size of a single commit?

Thread Thread
 
jackmellis profile image
Jack

As a (very) general rule, yes. You should be able to understand a change based on a single commit message, yes.
Obviously sometimes you have a big feature that can't be released piece-by-piece. In that case I would have a feature branch, and then individual branches off that. You PR (with squash commits) each smaller piece of work into the feature branch, and then at the end merge (not squash) the feature branch in. You have a history of all the pieces of work done, but not all the useless wip commits that don't actually tell any kind of story...

Thread Thread
 
wesen profile image
Manuel Odendahl

That makes sense. I see a pattern emerging here. I think that usually, when we "argue", we often are actually solving the same problem, often in the same way, but with different words.

I operate under the premise that your branch history is meaningful, and has relevant commits. If you do a ton of WIP commits, I would question why you would do a "WIP" commit in the first place, because squash merge or not, you are robbing yourself of helpful history while developing your feature already. I also heavily use interface staging (staging individual hunks), both for "pseudo review", and to split up my work in proper chunks, with git commit hooks validating at every step of the way that my tests run. If I still manage to make a mess (say, I'm tired, or in a rush, or just frustrated), I will often spend the time to go back with interactive rebase and eliminate the junk either with squash / revert+squash or plain delete, more rarely split up bigger commits into smaller ones.

What you are describe in your workflow above to me is basically what I am achieving by keeping side branch history. I would say, as someone who often had to merge dirty crap branches, I do like to keep the WIP commits anyway, because they give me an insight into what someone was trying to do, what their cognitive style is, what they were struggling with, to be able to assist them better.

But let's let things speak. I recently merged a "big" commit, setting out to build a feature that led me to start introducing typescript annotations. We are fairly fast moving 2 dev team and reasonably trust each other, so the other dev was fine with keeping both the typescript introduction and the actual feature in the same PR. Here's my history in this case:

# 2022-05-17 597679dd482ca990cffb5fe73bbd91108163d4cc :art: Psalm fixes for Sql and OrdersSplits in tadmin
# 2022-05-17 9f67f64c17d0f4fb9e7f9f86d04a1c56770b5ad8 :art: Start adding some API typescript to tadmin
# 2022-05-17 51dedf0360364369bd873d65476185fab8e4e97c :sparkles: :zap: Faster items summary query (still not instant)
# 2022-05-17 a776a961952511ea756a999949d8a5e49fbcc37c :zap: Make it even a bit faster. Computing links is slow.
# 2022-05-17 025ef30f90e429cf4cc3d882c8b56b4dca9cd7f6 :zap: Make productQuantities computation even faster by getting managestock/isVirtual up front
# 2022-05-18 dc4d4b4584f954fbc8c3bb8633afd7c74e45b37c :art: Fix intellij code style at least roughly
# 2022-05-18 c2f5e98983121fcd7c8e42f944f5eba62de2ca69 :art: Use transients for image and permalink in getOrdersSummary
# 2022-05-18 9bba2793b0b8eaa503233948298c8d0f5a4a832b :art: Introduce RowType for Table/useTable, split out into files
# 2022-05-18 96eb7fd43777a2f74d3fe333e31d6fdd4b8254b8 :art: Fighting some odd import weirdnesses, gonna stop tweaking for now
# 2022-05-18 143fde48f4783296123a308578c1ce320ea9c575 :art: Start adding type to ManageOrders useTable
# 2022-05-19 1782ef761a5cc82e884ff7fbec33654f8b19b805 :tractor: Move api types to shared code
# 2022-05-19 dd3bdbadc69c0f5932d24fb09c467415d9b1289e :sparkles: Add more typing to useTableApi
# 2022-05-19 c66c30129283b995420f1f69a4113d0f19a53b4d :art: :tractor: Split out the different types of summary bars
# 2022-05-19 93f3ee27d111ebe6f1a114620e3e722db1f85ecf :ambulance: Fix proper backend query, remove permalink from sideview
# 2022-05-20 37578d82a18da86b42fe2932a72c52dfff3f8604 :art: First attempt at latching on to triggerSearch
# 2022-05-20 d57be0a0e88c8af0a8c9f2a7c2dabfcecfaeaf7a :art: Remove error_log
# 2022-05-20 63a77c1515e8a866dcc3c4d9c1322a2cd3b43035 :art: Remove error_log
# 2022-05-20 5420e6a64499cb812fd44566205cf8abebebfdd9 :sparkles: Proper orders summary debouncing
# 2022-05-20 883ecc39b38ed34ce47ead9036f993a263050bf3 :art: Cleanup dependency handling to avoid expensive backend calls
# 2022-05-20 d24292be1df37cba702c54f982f1630cb2f3a42a :ambulance: Fix useEffect eslint check
# 2022-05-20 cf70860c5f46e2725c0139e33ddcada656ff3112 :art: Fix prettier changes
# 2022-05-20 e067d0ddbf508501970ad0578fdd9e60b37a68e3 :ambulance: Fix type annotations
# 2022-05-20 54c70047ad5e3f6858799c6faaefe027d112b330 :sparkles: Measure and return performance measures
# 2022-05-20 0731046f579e441c5ed96292199e82df0c4d1c1a :poop: Bunch of performance measurement and debug logging
# 2022-05-20 37ae6f0afccf868d829103e45acfee2f069df744 :ambulance: Fix eslint warnings
# 2022-05-20 da06324156213dc6524adc53fc06708e3b1e5073 :ambulance: Fix PHP initialization of start_time
# 2022-05-20 858db0142afc2c21ee90b3a08fb557938877bc33 :art: Fix loading indicator
# 2022-05-23 7731a0a694b617585c51325442048491beea8998 :sparkles: :lipstick: Add checkbox to enable getting all orders summary
# 2022-05-23 2e89e73f96a772de978cd8228b44a34532794904 :art: Undo unnecessary stuff
# 2022-05-23 22c850563b8937951d1b3a46d19d6857222f14be :art: Use a single php-cs-fixer config
# 2022-05-23 8aa2b0ed201a55b196fb169e8be921d30ce9aa0e :zap: :art: Cache thumbnails for 24h
# 2022-05-23 cbd6fce9f072bb091197cc2fcec9063c7207d5db :pencil: Slight whitespace adjusts
# 2022-05-23 e74df0d3a02679f2c98b2f24c0245245e5956be6 :ambulance: Fix php-cs-fixer
# 2022-05-23 bd1e34d466495d86c1f53510be0936df909445f8 :art: Make linkbutton clickable, fix markup
# 2022-05-23 fafaeb6644f61e2b6ed1df475a2b6266e2eaf425 :art: Remove logging entries
Enter fullscreen mode Exit fullscreen mode

Those are all valuable commits to me that I would like to keep for the long run. Maybe I'll figure out that the reason a certain DB query doesn't work anymore is not say, the API change, but actually the "Fix php-cs-fixer" commit. Of course I could make a separate PR for "fix php-cs-fixer", and then again for "Make linkbutton clickable", and then again for "Remove logging entries", but then we end up where we started, except with a lot more PRs and CI runs.

Thread Thread
 
wesen profile image
Manuel Odendahl • Edited

You'll note I use gitmoji, which I also find very useful, as I can at a glance recognize what the reason is behind commits. To show the graphical view: foo

Collapse
 
wesen profile image
Manuel Odendahl • Edited

I had a long conversation about that with other developers, it was very interesting, and I plan to write about "big evil merges" in the future.

Situations where big branch merges might happen (for valid reasons, imo):

  • merging relatively independent projects (in the context of a monorepo, for example)
  • wide "rip off the bandaid" refactor (especially type-system / compiler driven refactors)
  • having to merge shitty code from someone who left / from external contributors over whom you only have so much control
  • slow PR / merge cycles (can have many reasons: reviewers are scarce, QA is a bottleneck)
  • overall politics: management thinks PRs are a waste of time, crunch time

In general, I'm not a fan of "in a perfect world you wouldn't need more information" arguments.

In my experience, even small clean PRs can benefit from having a granular history, say when git blaming something 3 years down the road.

As for UI tools, I use magit / sourcetree / intellij's history browser, I'm sorry if other tools don't support it :/

I wish all tools supported --first-parent, because the (valid, because tool friction matters) reason is "my tool doesn't know how to display the information i want, thus i have to lose context for it", arguing that "merges make the history sloppy" is just a cop-out, it's just not true. I think one reason for that is that many developers don't know how git internally works, and thus have a warped understanding of what the history is. Git's CLI tooling really doesn't help here.

Collapse
 
manuartero profile image
Manuel Artero Anguita 🟨

Hi Manuel, I'm Manuel.

you got me with this:

The thing is that my opinion is the correct one

Above all considering that my opinion is the correct one. Kidding.

Nice post, I just disagree, squashing is fine. But I can see your points. In the end it's a tool and sometimes will be handy sometimes won't.

Collapse
 
wesen profile image
Manuel Odendahl

Do you use squashing because you want to have a "clean" history per default? Or do you have other reasons?

Collapse
 
manuartero profile image
Manuel Artero Anguita 🟨

IMO too much information leads to disinformation. Checking the actual "WIP" commits from a feature branch is a "thin grain info" I've never-ever required.

Cleaner history is , yep, the main reason.

But actually I've faced another issue in the past; there was this repo 15+ years old in my company, with hundreds of committers through the ages, and commits in the order of n * 100.000. Dealing with this repo was a challenge actually! too much useless info at .git/ folder. What I'm trying to say is that "thin grain" info do weigh. Of course you need to reach those numbers.

Thread Thread
 
wesen profile image
Manuel Odendahl

A lot of people bring up "WIP" commits. Do you often do WIP commits? I personally rarely do (I do get frustrated and use the 💩 emoji, as I use gitmoji, but still make meaningful commits). But the point of the article was that you can easily hide all that information and focus on what you need.

As for historical gits, I wonder if people here ever did a git "cleanup" where most of the ancient state gets culled, and just the last few years are kept. Cruft does indeed accumulate.

Thread Thread
 
ludamillion profile image
Luke Inglis

Just adding a little perspective, I don't think I've ever worked on a team with anyone who didn't use WIP commits. I work on a small team and there is a lot of context switching that needs to happen and 'finishing' a commit before switching to something else just isn't an option.

Thread Thread
 
wesen profile image
Manuel Odendahl

Interesting. I have the opposite experience. I use git stash in those cases, or do you git rebase --interactive to clean things up later.

Thread Thread
 
lukens profile image
lukens

I don't like the idea of most of the ancient state being culled.

The codebase at my current job has a cutoff from when it was moved to git, and there's even less hope of finding out why something was done for code that predates that than there is for the rest of the codebase.

Maybe I've always worked at the wrong places, but I've never been in a place where I wish there were fewer commits in the history, but a lot of the time I have wished there were more commits (often when trying to review code), so that I had a finer grain insight into why a particular line of code was written, and what else was changed for the same purpose.

I'm 100% with you that losing this information is nothing but a bad thing. I find that even the worst git commits tend to provide the best and most accurate and up-to date documentation of the code; it amazes me that so many people choose to throw that away!

Collapse
 
wparad profile image
Warren Parad • Edited

100% agreed. A "clean history" does not mean that every PR should have the commit history of the work process wiped out.

It's also really harmful to code reviews, where seeing the differences between revisions of the pull request is impossible with GIT, and near impossible with most git servers. GitLab supports this FWIW, I still don't think GitHub does.

There are lots of things that benefit from having separate commits, like a rename and then a diff, or a refactor inside of a PR. PLEASE DO NOT MERGE A REFACTOR WITH A FUNCTIONAL CHANGE. I don't want to see it in the same PR, let alone in the same commit.

No one that says you should squash, as ever had to a do a difficult task regarding git history. Had to find a bug with git bisect, sub directory migrations, find when a critical feature changed, or refactor gone wrong, you would know. I'll die on that hill.

That isn't to say you shouldn't clean up your commits before your open your PR. The only thing I don't want to see is "updates from pull request review", but anyone bringing to the table a conversation about squashing having to do with commit messages, is barking up the wrong tree. Fix your commit messages, squashing isn't a solution to that problem, it's a patch.

Collapse
 
wesen profile image
Manuel Odendahl

This is a good point.

Which makes me think, how can you give someone who hasn't experienced larger git pains the context in which some of those decisions are made? Or in general, workflow/code hygiene steps that might seem like red tape until you've experienced some of the nightmares that can ensure.

It's a paradoxical kind of thing, because if these systems are in place, by definition you will not encounter the reason why these systems were put in place (similar to any kind of preventative measure that works). I ran into really heated discussions where my point of view was basically "it caused me much pain in the past, trust me, we should do X", which is not a great argument.

Collapse
 
wparad profile image
Warren Parad

Honestly it goes both ways. If they are allowed to say "I like it better this way", then you are allowed to say "have my wisdom rather than learning through the experience of failure". "It looks prettier" is less of an argument. This is where:

Here's what we are going to do, this time we are going to do it my way, until we find a concrete problem that affects the business. It's one of the few times a year I'm going to pull the experience veto, but it's my job to do that.
If we learn that this was a mistake, it's a great story for your next promotion, or interview somewhere else when they ask you about a time that you disagreed with a solution but still went with it.
There will be other times that we disagree and we'll go with your solution, and take the same approach. If you think this happens where you are unfairly being treated, let's keep track of these, and we'll evaluate. We won't always agree and having a solution for the decision making in those situations is critical.
If you say that it has to be your way this time, then it's easy to infer that it always has to be your way, and that's something that prevents us from effectively together.

I use that in those circumstances, and it has yet to come back to me. In most cases, years later I have these same engineers coming and telling me "OMG remember this problem, I was totally wrong, and I tried to convey that same point to others, but they also didn't get it" Followed up by: "What am I supposed to do with these 'seniors'?"

Collapse
 
frankweindel profile image
Frank Weindel

What no one is mentioning is that the squash feature on GitHub PRs preserves the original commits that were squashed. If you REALLY need to go back and examine the granular history of a PR than you can still do so. On teams I've been on, we strive for PRs that are not over scoped and where that squashed message tells you exactly what feature / fix was added by those line changes. We also often use Conventional Commits, which help in a big way with release note automation. When I look at a main branch history like this:

fix: No long errors when pressing enter (#29) [tag: v1.1.0]
feat(ModuleA): Add metrics logging (#27)
feat(ModuleB): Support extra query string params in routes (#20)
chore: Update node to 16.13.1 (#24)
test: Require at least one assertion with every test (#21) [tag: v1.0.0]
Enter fullscreen mode Exit fullscreen mode

I see VERY clearly what features and fixes have gone in between versions 1.0.0 and 1.1.0. I also have easy links to the PRs that were squashed to produce those commits if I need to drill down any further. If a feature needs to be reverted it's a very easy reversion (no extra parameters).

If you end up with a PR that has a very large scope, there are two things one can/should do:

  • Split the PR in to multiple easier to review dependent PRs (preferred)
  • Carefully massage and curate the commits in a PR and use the Rebase Merge option in the PR
    • Making sure the PR number gets appended to each commit's title for backtracing.

I'll agree that just having a "linear commit history" shouldn't be the only reason for doing squash commits. But if it simplifies your team's workflow, reduces cognitive load, and makes understanding exactly what is included in a release easier to find then I say it's worth doing.

That said, I feel there is a "best of both worlds" place we could get to if we strived for it.

First, a merge commit is really a squash but with an extra parent link to a branch where the squash originated. I wish this was driven home more however the standard commit title for these PR merge commits is always something like these:

Merge pull request #43 from foo/fix-bar
Merge pull request #42 from foo/add-bar
Enter fullscreen mode Exit fullscreen mode

Those titles don't help me. It has the PR number, which I have to individually click on and look up, and a branch name, which could easily be too brief, poorly written or even irrelevant.

Nothing in Git, from what I understand, prevents those titles being similar to the squash titles I shared above. So if GitHub produced them you'd suddenly have the clarity you have with squash merges.

Second, the Git CLI commands and various Git UIs default to showing/working with the full branch out history of everything. You need to know special parameters or set certain settings in order to see and work with a simplified linear view. If these tools defaulted to a linear history and required special parameters in order to drill down into merged branches I feel that would improve the developer experience a bunch. You get an easy to understand summarized linear history and the ability to go deeper when you need to.

Of course, outside of the GUIs maybe, any changes in how people work with Git are very hard pushes from what I understand.

Third, GitHub could allow the PR author to set the merging strategy to be used in advance. Since each developer may have their own style, some with very intentful and effortful PR commits like your own @wesen, others who commit WIP things quickly and often, and some with a mix. This gives the author the ability to decide how they will ultimately formulate their PR. Obviously certain projects can still limit what PR merge strategies are available, and admins could still override the author's preset wishes. But the author at least has a chance to influence how the commits in the PR will be laid into the base branch.

But just to wrap up my argument, I think like any other tool squashing vs merging vs rebasing are options that teams can consider and make a decision on using given whatever their needs and circumstances are. There is no one size fits all approach to it.

Collapse
 
lyndonhughey profile image
Lyndon Hughey

Yep. Not squashing your commit is like a submitting your rough draft for publishing to an editor. It's unnecessary verbosity.

Thread Thread
 
wesen profile image
Manuel Odendahl

I'm not sure I understand. If you don't want to see the individual commits, you don't have to look at them?

Collapse
 
ingosteinke profile image
Ingo Steinke

Tell that to GitHub, they seem to have made squash commits the new default. Not possible to make merge commits in their web UI anymore, and that sucks!

Collapse
 
wesen profile image
Manuel Odendahl

I think you can with an option?

Collapse
 
ingosteinke profile image
Ingo Steinke

It used to be possible, but in practice, it is always grayed out, and this does not seem to result of a conscious decision by the project maintainers.
screenshot of GitHub merge options

  • Create a merge commit: Not enabled for this repository
  • Squash and merge
  • Rebase and merge: Not enabled for this repository
Thread Thread
 
wesen profile image
Manuel Odendahl

I think it has to be enabled in the repo settings. But now that git bisect and git blame all support --first-parent, I really don't see the point anymore. Maybe save some space because the intermediary blobs are garbage collected, but that kind of only makes sense on big public repositories like linux. And even then, people maintain different repositories that still keep the individual history.

Thread Thread
 
lukens profile image
lukens

My understanding is that GitHub also has some kind of hidden tag on the PR branch, so you can still view it on GitHub after it is squashed, so it presumably doesn't save any space for them.

Collapse
 
destynova profile image
Oisín • Edited

In the past I've argued against squashing commits from the POV of making bisects easier later. That said, if your bisect is broken by intermediate buggy commits, that could throw you off too.
Probably my main reason to dislike squashing commits is that I like to review bigger PRs by going commit by commit, so I can better grasp the author's original intention and thought process as they evolved it.
I don't buy the idea of "tidying up" previous commits and I'm not sure who would really benefit from that. Seeing misconceptions and things that doesn't work out, and what you did to arrive at the current solution, that's valuable for my understanding.

Collapse
 
lukens profile image
lukens

I'm so glad that someone mentioned this, because I was thinking the exact same, and didn't understand this bizarro upside-down world, where people seem to think pull requests were easier to review if all the commits were squashed beforehand.

Yes, it's good to do a bit of tidy up with an interactive rebase first, if you have too many WIP commits, or "oops, missed this bit" commits, but I'd generally prefer more rather than fewer commits.

Collapse
 
wesen profile image
Manuel Odendahl

Makes total sense. I "tidy" up commits because I often do partial commits (staging individual hunks) in the moment, and then I go back to make sure everything builds properly at intermediate steps. I agree that bisecting on non-building side branches is a serious pain. There's ways to address things, but nonetheless, not the best experience.

Collapse
 
matthewpersico profile image
Matthew O. Persico

A number of commentors have made the distinction of squashing before the PR is submitted and after it is submitted. I contend it's an irrelevant distinction.

If you are in the squash-before-but-not-after crowd, I counter that once a PR starts being reviewed, and updated and re-reviewed, you're going to get a whole list of commits that you'll events up wanting to squash anyway.

My criteria for squashing is this: for each particular commit, if you cannot roll back that commit and have a working functional system, then there is no point in having that commits in your history; squash it out.

Now, if you think there is value in the various conversations surrounding those commits, then keep them around, off the main branch like this:

  • make a copy of the branch the PR is sitting on (please tell me you're not modifying your main branch directly...), naming the copy it archive/branchname
  • squash branchbname
  • merge the PR, putting a reference to archive/branchname in the PR's comments.
Collapse
 
christiankozalla profile image
Christian Kozalla

I'm using GitLab at work and we are using the Squash-and-Merge option of GitLab.. I don't know how that compares to the way GitHub is doing it. But I also don't know how Squash-and-Merge compares to manually squashing, pushing and then opening a PR

We, as a team, are using Squash-and-Merge because one single feature will be mapped to one single commit. But I suppose the same is possible with unsquashed merge commits..

Collapse
 
wesen profile image
Manuel Odendahl

Only one way to know! look at the graph, and use git cat-file to look into the internals.

Collapse
 
simeg profile image
Simon Egersand 🎈

Yeah, me too. This is one of (if not the most) important practices you should do to make the PR review process quick. A quick PR review process in turn speeds up the shipping of code => business moves faster => greater chances of success for the company.

Always make your commits nice before asking for a PR review!

Thread Thread
 
wesen profile image
Manuel Odendahl

do you look at individual commits when doing a review? because the diff view shown is just the comparison of the trees, the history itself is irrelevant.

Thread Thread
 
simeg profile image
Simon Egersand 🎈

I do look at individual commits of the PR, yeah. Sometimes it makes sense to split up a task into multiple commits, or include refactor work, and that work should not be combined IMO.

Thread Thread
 
wesen profile image
Manuel Odendahl

I agree. I'm not the greatest at this (often solo dev on things), but it's a good skill to know.

Collapse
 
dadyasasha profile image
Alex Pushkarev

I really liked the way you explained the way squash is working. But it seems that the main argument against squashing is that it just drops the history?

That's fair, but it doesn't mean squash is bad, it means one just need to know the cost, correct?

Collapse
 
wesen profile image
Manuel Odendahl

Yes. But most people argue that it "cleans up" history, because they are unaware that you can easily hide the right parent when printing out logs, for example. I find losing history a very high cost to avoid using a pretty-print flag.

Collapse
 
dadyasasha profile image
Alex Pushkarev

That's a very good argument. The other perspective to consider is that more and more people don't use git from command line so they see whatever they git tool shows, which may be unable to do pretty-print in a firts place

 
wesen profile image
Manuel Odendahl

How do you mean you can't not look at it? In which context? I usually use git log --first-parent when doing release work, and only look at side histories when I need to get in there.

Collapse
 
kevingranade profile image
Kevin Granade

As with all blanket statements, you're wrong :)

I do agree with a caveat, which is that as your contributor maturity increases, merges becone the dominant option.

IF a PR is composed of well-formed, meaningful commits, you should probably merge.

If a PR is composed of point-in-time or fix-fix-fix commits, and/or the individual commits don't build and pass tests, then you should squash them.

The point you're missing is that individual commits within a branch might be data, or they might be noise.

Collapse
 
wesen profile image
Manuel Odendahl

Agree, I ask people to put some effort into their own git histories. It not only provides better information for the rest of the team, but it also helps structure your own development workflow, and help you debug / bisect your own stuff.

That said, I like having a bunch of fix fix wtf wip wip commits still, because it still provides insight into what people struggled with, what their thinking process was, and gives a starting point when looking for some hairy stuff.

Collapse
 
csgeek profile image
csgeek

Oh good. I was going to say.. half of my commits are 'saving crap', 'working.. I think'. Not sure if what values those have.

Merging branches is different.

Collapse
 
xmarkclx profile image
Mark Lopez

One reason I'm for the squash merge camp hasn't been mentioned, so I think I should mention it here so @wesen can correct me.

We use squash merge to make it easy to revert a whole PR since it's just a commit.
It can be reverted after some time has passed easily by just reverting that commit.

What do you recommend for this so I can leave the wrong squash merge camp and follow the righteous path oh great @wesen .

Collapse
 
wesen profile image
Manuel Odendahl

you can do exactly the same for a merge commit by using git revert -m1. The squash merge commit and the merge commit both point to the same tree hash, they only differ wrt the parent commits. With a squash merge, you only have 1, so git revert knows "ok well you just want to revert to the parent". With the merge commit, you have 2, so you have to tell it "please use the left parent (aka, the parent on the main branch) to revert to". easy peasy!

Collapse
 
arthurolga profile image
arthurolga

I don't like Squash if you are working with good commits they should be few per PR, if you have something around 10 commits on a feature, it probably should be separated into smaller tasks.

If the developer wants to aggregate two or more commits, like if they refactored some part of the code, BEFORE MAKING THE PR, then I totally like Squash.

If you have small commits with good names, it probably is better to just Merge than Squash and Merge, e.g.

Commit 1: Add DatePicker component to App
Commit 2: Make API call for Date Service on UserScreen
Commit 3: Make API call for Date Service on PostScreen

If something is breaking, it will probably be easier to see in which commit, also allows for better cherry picking.

Collapse
 
alucas profile image
Antoine LUCAS • Edited

It feels so good to read a tech blog not written by a junior enthusiast with a degree trying to skip that 5 years learning period.
Thanks mate.
I despise git for always going against my way, and I am looking at alternatives CVS workflows. The whole fact that there is room for arguing is so toxic.

Collapse
 
sirepicmike profile image
Mike Martin

We occasionally need to revert a feature from a release branch if it fails UAT. While this isn't too common, It's much easier to do this if the feature is squashed into a single commit (pre-push). Yet to see much of a downside.

Collapse
 
wesen profile image
Manuel Odendahl

You can just use git revert -m1 to revert a merge commit to the first parent (aka, what git reverting the squashed commit would do). -m1 says "revert to the first parent, aka the one that git squash preserves.

git revert really just resets the checked out tree to a specific tree hash, and prepares a pretty commit message. It doesn't really have much to do with the history itself. You could pretty much get the same result by doing (haven't fully tried this out, just a sketch):

git reset --hard ${hashyouwanttoreverto}
git commit -m "Revert XXX"
Enter fullscreen mode Exit fullscreen mode
Collapse
 
aalvarado profile image
aalvarado

How easy it is to do git reverts in unsquashed commit histories?

Collapse
 
wesen profile image
Manuel Odendahl

just as easy as a revert of a squashed commit. Pass in git revert -m1 commit, it will then use the "filesystem state" of the left parent node as the revert result, just as it would if you had a single squash commit.

Collapse
 
aalvarado profile image
aalvarado

Cool, I think that most people don't use merge + squash and it's something that with the --first-parent option doesn't really matter now if they do or not.

Collapse
 
stewartjarod profile image
Jarod Stewart • Edited

I think you assume that people use Git to a high standard and they commit with more intent always... often times in practice a PR/MR is filled with commits like wip, wtf is this?, fix, wip again. I highly encourage people to Squash & Merge those PRs ;d

If you are being diligent and maintain clean git history or cleanup your commits like a good lil developer, by all means, rebase those commits to main ;d

Collapse
 
wesen profile image
Manuel Odendahl

Is this because you think they don't have the times/skills to clean up their own history, and thus delegate it to merge time?

Collapse
 
stewartjarod profile image
Jarod Stewart

Haven't wanted to press the matter when other things seem more important. Easier to add husky and semantic commit linting with pre-commit hooks imo ;d

Collapse
 
tzwel profile image
tzwel

awesome article, at just the beggining it was obvious you know what you are doing, keep it up

Collapse
 
wesen profile image
Manuel Odendahl • Edited

If you have a good reason to squash commit, please post it here. But I don't think you have.

Collapse
 
ecyrbe profile image
ecyrbe

I have some pretty good reasons to squash my commits when working with a team:

  • Evoid rebase nightmare where i will need to fix the same conflicts for each commits.
  • Evoid revert nightmare when reverting a faulty merges
  • Remove easilly uneeded git History (essentially bug hunting commits, trial and errror commits). And don't tell anyone to not commit unfinished work. Git was created to allow and encourage it.
  • Your developpement process has nothing (brunch of test red, impl test green, refactor) to do with your product History. What matters is the feature History. And that's what i want to see in the product git History.
Collapse
 
wesen profile image
Manuel Odendahl

There's a couple of tricks to have an easier time rebasing:

  • You can avoid "rebase nightmare" by using git rerere. It records how you resolve conflicts and allows you to replay it.
  • For "reverting", you just need to checkout the tree to the state that you want to revert to, and make an appropriate commit message.

I always think "tree state" first, more so than caring about individual commits. I can always link up the graph by manually putting in a parent link when merging, if I do want to show what happened in the history.

Realizing that git only cares about file contents, not diffs or commit or patches, really freed up how I can navigate "complicated" issues.

For the last two points you raise, my approach is to use --first-parent or similar flags to just look at the part of the history i care about (usually, one commit per ticket on the main branch) and link it up to product features (the ticket themselves). No need to squash.

Collapse
 
jlamoreaux profile image
Jordan Lamoreaux

The thing is that my opinion is the correct one.

Well said. I will be using this. 😅

Collapse
 
missrahee profile image
Adam Misrahi • Edited

Absolutely true that merge is the heart of git and squashing is a kind of perversion. It just so happens that it's exactly the right kind of perverted for some team workflows.

In my company we have a pretty strict rule about squashing your commits on your private branch into a neat, minimal history before you merge your PR. This makes sense for keeping shared history manageable but causes many problems.

  1. Each time the author thinks they're about to get approved, they squash and then get more requests for revisions. I don't need to tell you what a mess force pushing on an active PR can get you into, particularly if another contributor suddenly adds a commit.
  2. Right now I have a branch where I made a commit, then merged trunk, then added a tiny commit for a quick fix. Squashing that commit is kind of tricky because if I type git rebase -i HEAD~2 I don't get the last 2 commits, I get all the commits that were merged in. That's one of the reason our company's policy is to rebase on trunk, not merge it in. You can see how far we're getting from the promised land of merge-only purity here, and toward the greater pain of fixing merge conflicts with rebase.
  3. Rewriting history is tricky at the best of times. It's much easier to make a mistake than handling conflicts after a merge. It's stressful and bugs frequently result.

And there's also a much more important problem. With every PR, the last commit has passed through an extensive CI pipeline of checks. Ensuring every commit in the PR passes all checks infeasible. As a result, all those other commits must be assumed to be broken (especially when they're artificial Frankenstein commits created flyby in interactive rebase). These unsafe commits lie around as hand grenades with loose pins left strewn all around. Release managers must be extremely careful to avoid them when trying to assemble a good release. As a rule of thumb that means ignore everything except merge commits, but what if there's a fast-forward merge? We could get a workaround on that but just why should a large team of variously skilled developers be exposed to a history that is mostly made of unsafe commits?

A solution

  1. Ban force pushing everywhere
  2. Make squash-merge the only allowable place to rewrite history and encourage its use.

Outcome

  • Destructive and hard to manage bullshit during PR authorship ends
  • Trunk becomes a safe place with well-rounded commits to cherry-pick or revert without racking brains about
  • The PR is bolstered in a role as the smallest atom of meaningful change, which is the way most organisations are aiming to use it. This helps encourage the practice of keeping them small.

Drawbacks

  • git blame is less helpful for getting granular detail. If we were using git the way its free software advocates intended, as a complete and self-contained source of truth for codebase history, then this would be a dealbreaker. But that's not how we're using it. We have a self-hosted Gitlab and the PRs with all their attached comments and CI runs are a far richer source of historical information than the git repo alone. When I need to understand why something was done that's always where I first look. The purism of avoiding vendor lock-in is nice, but I've never seen it amount to something for repo hosting.
  • Banning all rebasing and squashing everywhere would go with the grain even better and allow lots of graph traversing techniques you mention. But I haven't yet worked somewhere that accepted the noisy work-in-progress git history that would usually entail.
  • You have to be strict about your branching workflow. Feature branch commits and squashed commits don't mix. If you want to base a PR not on trunk but on some other PR, then be prepared for that second PR to list a lot of nonsense commits that will confuse a reviewer and you'll have to delete all trace of before you merge in the resulting squashed message.

Conclusion

I agree with these people that the benefit far outweighs the gotchas in practice.

Collapse
 
manzapanza profile image
Massimo Rangoni • Edited

IMHO the main problem when you do NOT squash your work (generally it's a feature or a fix) is that, sometimes or often (it depends on how many devs are working on the same repo and how disciplined they are) you have to fix more merge/rebase conflicts.

Let's say you send a PR with 5 commits, and you have made changes on a same line in 3 of those 5 commits. And after that you have to rebase your PR (because other PRs were accepted before yours) and unluckily you got a conflict on that line... you will have to fix 3 times a merge conflict. but if you squash your work you have to fix just 1 merge conflict. So all those little code refactoring made during the development of the same feature/fix could multiply these problems exponentially.

Another aspect is that all those sub-feature/fix commits make sense just for the author, but for the other devs generally not.

But what I'm saying depends on some assumptions I've made, for example if you're working alone in a repository these issues will probably never happen.

Anyway, although I don't agree with you, I liked the article for the discussion it generated.

Thank you!

Collapse
 
ah profile image
Adrian H

For any developers who actually code professionally in a big team, this is horrible advice, nobody has time to care about your commit history it's just noise. People don't spend time investigating where this original issue happened in which sub-commit of which branch, blah blah, you are busy finding solutions and moving on to the next task.

Pedantic un-pragmatic advice, ignore this article and start coding, you are not an academic or a historian.

Collapse
 
masonharper profile image
Mason Marper

Awesome post

Collapse
 
kelseyjj profile image
Kelsey Jones

Awesome article.

Collapse
 
andresbecker profile image
Andres Becker

Great post!

Collapse
 
rzs401 profile image
Richard Smith

nice stuff

Collapse
 
aaravrrrrrr profile image
Aarav Reddy

Really helpful.

Collapse
 
snelson1 profile image
Sophia Nelson

Thanks

Collapse
 
istibekesi profile image
Békési István

“Squashing commits has no purpose other than losing information.”
Cleaning your house has no purpose other than losing dirt. Pretty neat tho…