
Auto-generating a commit message

Mike ・4 min read

Asking for help and ideas on my new project.

TL;DR

I am tired of writing commit messages which are bigger than the code changes and which could be programmatically and precisely generated. I am writing my own Python script to run as a prepare-commit-msg hook. Does anyone know of an existing tool or script to do this?

Aim

I started a new personal project recently to automagically create a descriptive, one-line commit message for me.

I'll still write manual messages for important changes, but 80% of my commits are so small and frequent that writing a message feels tedious and adds friction. The message that I end up writing by hand is usually something that could be written by a machine. Even then I usually simplify by using the filename rather than the path, or I don't bother saying how many files changed, because it's too much effort.

Research on existing tools

There are plenty of tools for validating or linting commit messages.

I have found how to prefix a message with the branch name - see gist.

If you want to manually configure a value like a Jira ticket number in a text file in your repo, you can use that to prefill the commit message - see Commit Message Template guide.

I've even found efforts to generate a commit message using machine learning.

However, these are based more on the content of the file diffs and the context of the code than on describing which files changed.

So I am having a hard time auto-generating a new message which captures the changes in one sentence in a high-level way that gives enough detail.

Git hooks

Git provides "hooks", which are scripts that run when certain events occur. There are sample files in .git/hooks/ when you set up a repo, and you can activate one by renaming it to remove the .sample extension. Note - the .git/hooks/ directory is not in version control, so you'll need to do more work to persist a hook and activate it on a fresh clone.
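One way to handle the persistence problem is to keep the hook in a tracked directory and copy it into .git/hooks/ after cloning. Here is a minimal sketch in Python. The `hooks/` source directory and the `install_hook` name are assumptions for illustration, not something Git provides.

```python
import shutil
import stat
from pathlib import Path


def install_hook(repo_root, hook_name="prepare-commit-msg", source_dir="hooks"):
    """Copy a version-controlled hook into .git/hooks/ and make it executable.

    `source_dir` is a hypothetical tracked directory holding the hook script.
    """
    repo_root = Path(repo_root)
    source = repo_root / source_dir / hook_name
    target = repo_root / ".git" / "hooks" / hook_name

    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)

    # Git only runs hooks that are executable.
    mode = target.stat().st_mode
    target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

Another option that avoids copying is Git's `core.hooksPath` setting, which points Git at a directory of hooks, though each clone still has to set it once.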

Check these resources:

There is a sample git hook file which looks like this:

It provides logic for adding a multi-line list of files changed - see the commented-out Perl section, which removes the # comment symbol from part of the diff message.

Example output is something like this:

My message

added:  README.md
modified: foo/baz.md
modified: foo/buzz.md
deleted: fizz.txt

This looks like the right mechanism to use, but the output is too long.
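The hook does not have to be a shell script. Git calls prepare-commit-msg with the path of the message file as the first argument and, when there is one, the source of the message as the second. Below is a rough sketch of the same mechanism in Python. The `generate_message` placeholder and its hard-coded return value are hypothetical stand-ins for the real summary logic.

```python
#!/usr/bin/env python3
import sys
from pathlib import Path


def generate_message():
    """Placeholder for the real one-line summary logic (hypothetical)."""
    return "Update 2 files in foo/"


def prepare(msg_path, source=""):
    """Prefill the commit message file unless a message source already exists.

    Git sets `source` to values such as "message" (for -m), "template",
    "merge", "squash" or "commit"; it is absent for a plain `git commit`.
    """
    if source:
        return False

    path = Path(msg_path)
    existing = path.read_text()
    # Keep Git's commented-out status lines below the generated summary.
    path.write_text(generate_message() + "\n" + existing)
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # argv[1] is the message file; argv[2] (optional) is the message source.
    prepare(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "")
```

Skipping whenever a source is present means a message given with -m, a merge message or an amended commit is left untouched.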

What existing solutions can solve my need?

I want something more succinct (one line) and worded in a way a human would write.

Are there any other tools out there that do this already? Can someone point me in the right direction or share what they made or found?

Especially if there are 20 files which were modified. Or, more commonly for me, one line or one character changed in one or two files, where explaining the commit is of low value (and the message is longer than the actual diff).

My project

Planning and ideas so far on my new Auto Commit Msg project.

Here is my repo:

The project is still in early development and probably going to be in Python. So far there's a spec in the Wiki and I have a minimal prototype in Bash which I haven't committed.

Example output of status.

$ git status --short
 M foo/baz.md
 M foo/buzz.md

When I commit with my tool, the commit message will be something like:

Update 2 files in foo/

Or something more complex:

Add 2 files in foo/ and delete 5 files in bar/
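A minimal sketch of how messages like these could be built from `git status --short` lines, in Python. This is not the project's actual code. The `summarize` function, the verb mapping and the "repo root" wording are assumptions, and renames, copies and quoted paths are ignored.

```python
import posixpath

# Status letter -> verb. Untracked files ("??") are treated as additions.
VERBS = {"A": "add", "?": "add", "M": "update", "D": "delete"}


def summarize(status_lines):
    """Turn `git status --short` lines into a one-line summary."""
    groups = {}  # (verb, directory) -> list of paths, in first-seen order
    for line in status_lines:
        if len(line) < 4:
            continue
        code, path = line[:2], line[3:]
        # Prefer the staged (index) column, fall back to the working tree column.
        letter = code[0] if code[0] != " " else code[1]
        verb = VERBS.get(letter)
        if verb is None:
            continue  # renames, copies etc. are out of scope for this sketch
        directory = posixpath.dirname(path)
        groups.setdefault((verb, directory), []).append(path)

    parts = []
    for (verb, directory), paths in groups.items():
        if len(paths) == 1:
            parts.append(f"{verb} {paths[0]}")
        else:
            where = f"{directory}/" if directory else "repo root"
            parts.append(f"{verb} {len(paths)} files in {where}")

    message = " and ".join(parts)
    return message[:1].upper() + message[1:]


print(summarize([" M foo/baz.md", " M foo/buzz.md"]))
```

Grouping by verb and directory keeps the message short however many files changed. A single file is named directly, since a count of one adds nothing.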

If requirements.txt or Gemfile or package.json changes:

chore: Update dependency file

Or "Create" or "Delete" in place of "Update".

If _config.yml changes:

chore: Update config

Or

style: Whitespace changes in fizz/buzz/foo.txt
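The filename rules could be expressed as a simple lookup table. A sketch in Python, using the filenames from the examples above. The table layout and the `special_message` name are assumptions, and the whitespace case is left out because it needs the diff content rather than just the path.

```python
import posixpath

# Filenames come from the examples above; the mapping itself is an assumption.
SPECIAL_FILES = {
    "requirements.txt": "dependency file",
    "Gemfile": "dependency file",
    "package.json": "dependency file",
    "_config.yml": "config",
}
ACTIONS = {"A": "Create", "M": "Update", "D": "Delete"}


def special_message(status_letter, path):
    """Return a `chore:` message for a known file, or None if no rule applies."""
    label = SPECIAL_FILES.get(posixpath.basename(path))
    action = ACTIONS.get(status_letter)
    if label is None or action is None:
        return None
    return f"chore: {action} {label}"


print(special_message("M", "requirements.txt"))
```

Returning None when no rule matches lets the caller fall back to a generic file-count message.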

VS Code integration

Another thought - how can I prepare a commit message in VS Code easily?

This is still something I am looking into with VS Code settings (not fruitful) or using an extension, maybe writing my own.

The prepare-commit-msg hook works fine for command-line commits. But how do I make sure my auto message gets added to the message bar in VS Code's Git integration? Should it regenerate every second? Should it generate on a stage action (in the UI and on the command line)? Should it generate a message after you press commit with an empty message and then preview the message in a pop-up?

Semantic commits

On a related idea - I am also working on adding semantic prefixes to the start of the commit message where possible, which I mixed into the examples above. Maybe it can be controlled through a command or a flag where it's not possible for a script to tell the difference between a fix and a feature, but chore or docs changes are easier to deduce.

feat: Message
fix: Message
chore: Message
docs: Message

Conclusion

Thanks for reading. I hope I can get some ideas of existing projects to guide my approach, or to build on them.

Posted by: Mike (@michaelcurrin)

I'm a self-taught dev focused on websites and Python development. My friends call me the "Data Genie". When I get bored, I find tech to read about, write about and build things with.

Discussion


Hi there! Interesting idea. I've been thinking about something like that for a long time, but my chosen IDE solved it for me most of the time. I was working with Eclipse and JIRA in the past on Java projects. I could read the issues inside Eclipse, activate one at a time, and then when I clicked commit, it auto-filled the message with the subject of the task and the task ID. I found it very useful for as long as it was supported, as I had only one source of task descriptions and they were reflected everywhere else.

As for your approach, I would think twice about how much information a reader gets from "Updated two files" in the commit message, when any decent git client shows you this information when comparing commits. I'd rather connect it to the issue manager to fetch the subjects automatically.

Edit: I like to read the "why" in the commit message, not the "what".

 

Thanks for your feedback.

I found an approach native to git for prefilling a ticket number in a commit message and wrote about it here if you are interested.

My issue is not so much in the viewing, as I can get to the diff or list of files changed after a commit. It's more about the friction of writing.

I get that you prefer the "why", but the case I am solving for is where the why is immaterial or self-evident. For example when I am making doc changes - I am happy to lose the nuance of whether I fixed a link or added an empty line or changed a heading level because it's not code. Or it's a change made to follow a style. Or renaming a file because I like the new name better.

 

"It's more about the friction of writing." - I totally understand why it can be tedious to write meaningful commit messages every time. I admit I'm guilty myself of writing messages like "fixed". By writing "fixed formatting" we get more context than one file changed.

When you come back to your code sometime later, you will scratch your head, and every time you will need an extra round of comparing two versions of the file to work out what happened and why. If you feel the need to record something that is already captured by git, then just leave the message empty. That's a bad habit as well, but sometimes you really are just doing things like: "fix: type-o".