Programming can be frustrating. Endless debugging, cryptic error messages, poor documentation: there are plenty of things that will make you collapse into tears, or stomp your feet, or pull out your hair...
...or say, "Fuck!"
When and why do programmers drop the F-bomb, though? Thanks to the public GitHub dataset on BigQuery, which includes hundreds of millions of commits and their corresponding messages, we can try to answer that question. The majority of commit messages are bland or useless, as Ramiro Gomez demonstrated with his analysis of the same dataset last year. But occasionally programmers take the opportunity to express themselves when recording their changes, and sometimes they use profanity for emphasis.
Of the 183 million commit messages in the dataset, about 33,000 contain some version of the word "fuck," or about one in every five or six thousand commit messages (fewer than the 48,000 Sergey Abakumoff found in his F-bomb analysis when also including git subject lines). More interesting than the raw totals is what exactly programmers think is worth cursing about in the first place. This SQL query selects the word directly following the F-word in the commit message:
SELECT next_word, COUNT(next_word) AS n
FROM (
  SELECT commit,
    LOWER(REGEXP_EXTRACT(FIRST(message), r'(?:\w*fuck\S*\s)(\w+)')) AS next_word
  FROM [bigquery-public-data:github_repos.commits]
  WHERE REGEXP_MATCH(message, r'\w*fuck\S*\s\w+')
  GROUP BY commit
)
GROUP BY next_word
ORDER BY n DESC
LIMIT 100;
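To see what that pattern actually captures, here is a minimal Python sketch that applies the same regular expression to a handful of the commit messages quoted in this post. It is only an illustration, not the pipeline used for the analysis: the function names and the sample list are made up for this example, and Python's `re` module stands in for BigQuery's regex engine, which may differ in edge cases.

```python
import re
from collections import Counter

# Same pattern as the query: an F-word variant, one whitespace
# character, then the following word (captured). Case-sensitive,
# like the REGEXP_MATCH filter in the query.
NEXT_WORD = re.compile(r"\w*fuck\S*\s(\w+)")


def next_word(message):
    """Return the lowercased word after the first F-word, or None."""
    match = NEXT_WORD.search(message)
    return match.group(1).lower() if match else None


def count_next_words(messages):
    """Tally the captured word across messages, skipping non-matches."""
    words = (next_word(m) for m in messages)
    return Counter(w for w in words if w is not None)


# Hypothetical sample: four messages quoted in this post plus one
# message that does not match at all.
sample = [
    "i fucked up something by moving ui.js to lib/",
    "fixed fuck up by peter",
    "fuck it we are going to use a subprocess",
    "fuck yeah centered login/register correctly",
    "update readme",
]

counts = count_next_words(sample)
print(counts.most_common())
```

Note that this sketch only looks at the first match in each message, so a message with several F-words contributes a single next word, which is how the query appears to behave as well.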
Citing actual commit messages from the dataset makes reviewing these results a little more fun, so examples will be included in parentheses.
The runaway winner is "up," which you might use when taking blame (i fucked up something by moving ui.js to lib/), doling it out to colleagues (fixed fuck up by peter), family members (Fixed the fuck up by dad), or even insentient beings (The bot now doesn't fuck up the silent_review greeting).
Also popular were general words to complete phrases of exasperation like "it" (fuck it we are going to use a subprocess), "this" (fuck this, the statemanager is way too complicated in an async environment, getting rid of it), "that" (csv parser? fuck that: regex), and "off" (fuck off firefox noone likes you).
The very process of using Git came up regularly via terms like "commit," "merge," "git," and "github." Certain languages often drew users' ire, like "php" (fuck this fucking php shit: convert bools into integers because fuck you thats why), "c++" (initialiser_list can be coerced to primitive types? what the fuck c++, what the actual, legitimate fuck), and "css" (fucking css modified).
So did widely used services like "heroku" (fuck heroku port) and "travis" (fucking travis tests are failing, works like a charm locally, fuck this).
Of course, we shouldn't dwell on only the negative. Users celebrated successes with "yes" (fuck yes, finally figured out how to do the configuration stuff) and "yeah" (fuck yeah centered login/register correctly), as well as "fix" (Finally fucking fixed bootstrap carousel bug, was caused by smooth scrolling JS code) and "works" (it fucking works. Inefficient as santa on drugs but it works).
In that sense, our results accurately portray the programming experience: plenty of frustration, but also some worthwhile victories if you stick with it.