DEV Community

Katherine Kelly

Changing Your Repo's Language in GitHub

I recently organized my pinned repositories on GitHub and noticed that the language shown for one of them didn't seem quite right. It showed HTML, but I was expecting JavaScript: the project is a vanilla JavaScript frontend with more lines of JavaScript than HTML.

To really set the scene, here's a screenshot of my pinned repos with the incorrectly labeled repo (IMO) in question:
repo-html

I did some digging to figure out how GitHub determines a repository's language and how I could change the language shown.

GitHub and the Linguist Library

GitHub says it uses the open source Linguist library to detect each file's language for syntax highlighting and repository statistics.

Once you push changes to a repository on GitHub, Linguist does its thing in a low-priority background job that goes through all of the files to determine the language of each one. Some things to note:

  • all of the languages it knows about are listed in languages.yml
  • excluded files include binary data, vendored code, generated code, documentation, and files in data languages (e.g. SQL) or prose languages (e.g. Markdown); any explicit language overrides are taken into account at this stage.

To determine the language of each remaining file, Linguist applies the seven strategies listed below, in order. Each strategy either identifies the exact language or narrows down the list of candidate languages passed on to the next strategy.

  • Vim or Emacs modeline
  • commonly used filename
  • shell shebang
  • file extension
  • XML header
  • heuristics
  • naïve Bayesian classification
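To make the "identify or narrow down" idea concrete, here's a minimal Python sketch of that kind of cascade. It is not Linguist's actual code (Linguist is written in Ruby); it only covers three of the seven strategies, and the lookup tables and function names are made up for illustration.

```python
import os
from typing import List, Optional

# Hypothetical, tiny lookup tables. The real data lives in Linguist's languages.yml.
FILENAMES = {"Makefile": ["Makefile"], "Dockerfile": ["Dockerfile"]}
INTERPRETERS = {"python3": ["Python"], "node": ["JavaScript"], "sh": ["Shell"]}
EXTENSIONS = {
    ".js": ["JavaScript"],
    ".html": ["HTML"],
    ".h": ["C", "C++", "Objective-C"],  # ambiguous on purpose
}


def _narrow(found: List[str], candidates: List[str]) -> List[str]:
    """Keep only languages still in play; with no earlier candidates, keep all."""
    if not candidates:
        return found
    return [lang for lang in found if lang in candidates]


def by_filename(name: str, content: str, candidates: List[str]) -> List[str]:
    return _narrow(FILENAMES.get(os.path.basename(name), []), candidates)


def by_shebang(name: str, content: str, candidates: List[str]) -> List[str]:
    first_line = content.splitlines()[0] if content else ""
    if not first_line.startswith("#!"):
        return []
    parts = first_line[2:].split()
    if not parts:
        return []
    # "#!/usr/bin/env python3" -> "python3"; "#!/bin/sh" -> "sh"
    interpreter = os.path.basename(parts[0])
    if interpreter == "env" and len(parts) > 1:
        interpreter = parts[1]
    return _narrow(INTERPRETERS.get(interpreter, []), candidates)


def by_extension(name: str, content: str, candidates: List[str]) -> List[str]:
    ext = os.path.splitext(name)[1].lower()
    return _narrow(EXTENSIONS.get(ext, []), candidates)


STRATEGIES = [by_filename, by_shebang, by_extension]


def detect(name: str, content: str) -> Optional[str]:
    """Run the strategies in order, stopping as soon as one language remains."""
    candidates: List[str] = []
    for strategy in STRATEGIES:
        found = strategy(name, content, candidates)
        if len(found) == 1:
            return found[0]
        if len(found) > 1:
            candidates = found  # narrowed down; pass on to the next strategy
    return None  # still ambiguous; the later strategies would decide


print(detect("app.js", "console.log('hi')"))
print(detect("run", "#!/usr/bin/env python3\nprint(1)"))
print(detect("util.h", ""))
```

In this toy version a `.h` file stays ambiguous, which is exactly the kind of case the heuristics and the Bayesian classifier exist for.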

The results are then used to produce the language stats bar, which shows the languages that make up the repository and their respective percentages. Each percentage is based on the bytes of code for that language, as reported by the List Languages API. The language shown on each of my pinned repos up top is the majority language, i.e. the one with the most bytes.
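Here's a small Python sketch of that arithmetic, assuming a mapping of language name to byte count like the one the List Languages API returns. The byte counts are invented for illustration (they are not my repo's real numbers), and the function names are my own.

```python
from typing import Dict, Optional


def language_percentages(byte_counts: Dict[str, int]) -> Dict[str, float]:
    """Turn per-language byte counts into percentages of the total."""
    total = sum(byte_counts.values())
    if total == 0:
        return {}
    return {lang: round(100 * size / total, 1) for lang, size in byte_counts.items()}


def majority_language(byte_counts: Dict[str, int]) -> Optional[str]:
    """The language with the most bytes, or None for an empty repo."""
    if not byte_counts:
        return None
    return max(byte_counts, key=byte_counts.get)


# Hypothetical byte counts, in the shape of a List Languages API response.
counts = {"HTML": 6000, "JavaScript": 3000, "CSS": 1000}

print(language_percentages(counts))  # {'HTML': 60.0, 'JavaScript': 30.0, 'CSS': 10.0}
print(majority_language(counts))     # HTML
```

Because the stats are based on bytes rather than lines, a repo can have more lines of one language and still show another as its majority language.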

Also, I was today years old when I found out about the language stats bar. If you’re wondering where it is, it’s the colorful bar up at the top of your repository just under the commits/branches/etc. bar. Those colors indicate the languages that make up your repo, and you can click on it to get the full breakdown. 🤯

language stats bar

Changing the Repo Language Shown

Now that we know the background of how GitHub determines the repository language, I’ll show you how to change the language shown using gitattributes.

  1. Create a .gitattributes file at the top level of your repo
  2. Edit the file and add the line below, putting a pattern for the file extension of the language you want ignored before linguist-detectable=false. Since I want HTML ignored, I’ve used *.html below.

    *.html linguist-detectable=false
    
  3. Add, commit, and push the changes

And voilà, the language is changed to JavaScript!

repo javascript
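To see why that one line flips the result, here's a rough Python simulation. It is only a sketch: the file list and byte sizes are hypothetical, the function names are my own, and it uses Python's fnmatch on the file name as a stand-in for real .gitattributes pattern matching, which has more rules than this.

```python
import os
from fnmatch import fnmatch
from typing import Dict, List, Tuple


def undetectable_patterns(gitattributes_text: str) -> List[str]:
    """Collect the patterns that are marked linguist-detectable=false."""
    patterns = []
    for line in gitattributes_text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        if "linguist-detectable=false" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def bytes_by_language(
    files: List[Tuple[str, str, int]], patterns: List[str]
) -> Dict[str, int]:
    """Sum bytes per language, skipping files that match an ignored pattern."""
    totals: Dict[str, int] = {}
    for path, language, size in files:
        # Simplification: match against the file name only.
        if any(fnmatch(os.path.basename(path), pattern) for pattern in patterns):
            continue
        totals[language] = totals.get(language, 0) + size
    return totals


# Hypothetical repo contents: (path, language, size in bytes).
files = [
    ("index.html", "HTML", 6000),
    ("src/index.js", "JavaScript", 3000),
    ("styles/main.css", "CSS", 1000),
]

before = bytes_by_language(files, [])
after = bytes_by_language(files, undetectable_patterns("*.html linguist-detectable=false\n"))

print(max(before, key=before.get))  # HTML
print(max(after, key=after.get))    # JavaScript
```

With the HTML files out of the stats, JavaScript has the most remaining bytes and becomes the language shown.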

Resources
About Repository Languages
Linguist
How Do I Change the Category?

Discussion (4)

Raphaël Pinson

It looks like this was published a bit too early. I'd like to know how to do it though!

Katherine Kelly Author

Hi, thanks for pointing that out! There was an issue with dev.to showing an old draft since the whole post was there when I initially published it. But it’s been updated so check it out!

Raphaël Pinson

Ah I've had this kind of issues, too. Very frustrating!

Anand

Tysm !!!