Removing Accidentally Committed Files From Remote History

#git #beginners #learningmoment #showdev

Seems like a rite of passage to accidentally push some API keys, sensitive data, or a giant node_modules folder into a public repo. Then, in a panic, I git rm -r the file and commit that (...but that only removes it from the newest commit, not from the earlier ones), and in a flight of terror, force push that edited and rebased history. But a look through my GitHub commit history still shows the files in the diffs: the incriminating files remain firmly implanted in my remote repo.

I've also become somewhat paranoid about my hyperactive git history. Pushing a commit or three to fix every little thing seems like a bad look now since I can't claim to be fresh out of bootcamp :/

What I learned after resetting and reverting back and forth for half an hour:

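- Selecting drop for commits during an interactive rebase (e.g. git rebase -i HEAD~n) only rewrites my local history. Commits that were already pushed stay on the remote until I force-push over them, and even then GitHub can keep showing them by their hash (in cached views or pull requests, for example).
- Recursively removing a file or folder with git rm -r does delete it from my local filesystem (the working tree) as well as from the index, so the next commit no longer contains it. It doesn't touch earlier commits, so the file is still in the history.
- git rm --cached <filename> doesn't remove a file from the working tree. It removes the path from the index only (staging the deletion), so a subsequent commit stops tracking the file while it stays on disk; listing it in .gitignore keeps git add . from picking it up again. RELATED: Stop tracking and start ignoring by svijakoushik.
- git reset HEAD <filename> unstages a file and leaves the working-tree copy alone.
- git revert works on whole commits, not files: git revert HEAD <filename> just fails, because git revert only accepts commits, not file paths.
- git reset --hard <commitHash> moves the branch back to that commit and discards every commit after it (plus any uncommitted changes). That's only a clean fix if those commits haven't been pushed yet: if they have, the remote still has them, and removing them there means force-pushing rewritten history.
- If the commits have already been pushed, git revert <commitHash> is the safe option: it creates a new commit that undoes the changes from that commit. Note that git revert HEAD~n reverts only the single commit HEAD~n; undoing the last n commits takes a range like git revert HEAD~n..HEAD. Either way, the original commits stay in the history, so a revert doesn't hide a leaked file.
- After all of the above, selecting squash in the rebase list doesn't get rid of the old commits either: it combines them into a new commit locally, while the originals with the old files/folders in them stay on the remote (and in my local reflog).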
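A quick way to convince myself of the rm/revert points: a minimal sketch in a throwaway repo, assuming a POSIX shell with Git installed. The file name secrets.env and the fake key are hypothetical stand-ins. It untracks the file with git rm --cached, then shows that the first commit still holds the key.

```shell
# Throwaway repo; "secrets.env" and its contents are hypothetical.
REPO=$(mktemp -d)
cd "$REPO" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "API_KEY=not-a-real-key" > secrets.env
git add secrets.env
git commit -q -m "Add config (oops)"

# Untrack the file but keep it on disk, and ignore it from now on.
git rm -q --cached secrets.env
echo "secrets.env" > .gitignore
git add .gitignore
git commit -q -m "Stop tracking secrets.env"

ls secrets.env    # still in the working tree
git ls-files      # only .gitignore is tracked now

# The earlier commit still contains the key...
git show HEAD~1:secrets.env
# ...and both commits show up in the file's history.
COMMITS_WITH_FILE=$(git rev-list --count HEAD -- secrets.env)
echo "$COMMITS_WITH_FILE"
```

Swapping git rm --cached for plain git rm would also delete secrets.env from disk, but the first commit would still hold the key either way.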

AHHHHHH!

At this point, if it's still fairly new, I delete the remote repo, make a new one on GitHub, then run git remote set-url origin <new GitHub SSH URL> (a plain git remote add origin fails, since origin already exists) and push my "rewritten" git history. (The how-to is below)
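A minimal sketch of the re-pointing step, assuming the fresh, empty repo already exists. A local bare repo stands in for the new GitHub repo, and the old URL is a made-up placeholder that is never contacted; on GitHub the new URL would be the SSH one shown on the empty repo's page.

```shell
# Hypothetical setup: a local bare repo plays the new, empty GitHub repo.
WORK=$(mktemp -d)
git init -q --bare "$WORK/new-remote.git"

git init -q "$WORK/project"
cd "$WORK/project" || exit 1
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
git symbolic-ref HEAD refs/heads/master
echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"
# Placeholder for the deleted repo's URL (never contacted).
git remote add origin git@github.com:example-user/old-repo.git

# Adding "origin" again would fail ("remote origin already exists"),
# so point the existing remote at the new repo instead.
git remote set-url origin "$WORK/new-remote.git"

# Push every branch and tag to the new, empty remote.
git push -q origin --all
git push -q origin --tags
```

Running git remote remove origin and then git remote add origin <url> would get to the same place.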

"Treat every commit like you're telling a story." - Former manager.

Oh former-manager-senior-dev, I miss thee. We had many a great pairing session. The cool thing about you was that you'd show me the way you did it without clogging my mind with disclaimers, and you'd support me when I learned where I'd screwed up. I'm not being sarcastic, by the way.

After poking around Stack Overflow, I came across this user's answer on using git filter-branch. This is a condensed explanation of their amazing commands:

git filter-branch --index-filter "git rm --cached -f -r --ignore-unmatch filenameOrFolderName" --tag-name-filter cat -- --all

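--index-filter runs the given command against the index of every commit being rewritten, without checking the files out, which keeps it fast. Here that command is git rm --cached -f -r --ignore-unmatch <filename>: it removes the path from each commit's index, recursing into folders, and --ignore-unmatch keeps it from failing on commits where the path doesn't exist. --tag-name-filter cat updates tags to point at the rewritten commits under the same names, and -- --all applies the rewrite to every branch and tag.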
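Here's a minimal sketch of that command run against a throwaway repo, assuming a POSIX shell and a Git version that still ships filter-branch. secrets.env, app.txt and the tag v1 are hypothetical. FILTER_BRANCH_SQUELCH_WARNING=1 only silences the warning (and 10-second pause) that newer Git prints before running filter-branch, which recommends git filter-repo instead.

```shell
# Throwaway repo; "secrets.env", "app.txt" and tag "v1" are hypothetical.
export FILTER_BRANCH_SQUELCH_WARNING=1  # skip the deprecation warning and pause
REPO=$(mktemp -d)
cd "$REPO" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
git symbolic-ref HEAD refs/heads/master

echo "API_KEY=not-a-real-key" > secrets.env
echo "app" > app.txt
git add secrets.env app.txt
git commit -q -m "Add app and (oops) secrets"
echo "more app" >> app.txt
git commit -q -am "Work on app"
git tag v1

# Remove secrets.env from the index of every commit on all branches and tags.
git filter-branch --index-filter \
  "git rm --cached -f -r --ignore-unmatch secrets.env" \
  --tag-name-filter cat -- --all

# No commit on the rewritten master touches the file anymore,
# but the backup of the old tip (refs/original/...) still does.
NEW_COUNT=$(git rev-list --count master -- secrets.env)
OLD_COUNT=$(git rev-list --count refs/original/refs/heads/master -- secrets.env)
echo "rewritten: $NEW_COUNT, backup: $OLD_COUNT"
```

That leftover backup is exactly why the cleanup steps that follow are needed.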

git update-ref -d refs/original/refs/heads/master
Some googling tells me that filter-branch saves each rewritten branch's old tip as a backup under refs/original/, and this command deletes that backup ref (it would be refs/original/refs/heads/main on a main branch) so it stops keeping the old commits reachable.
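To see what that backup ref looks like, here's a sketch under the same assumptions (throwaway repo, hypothetical file names, a single master branch). With several branches or tags there's one backup per rewritten ref, and the git-filter-branch documentation suggests git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d to delete them all.

```shell
# Throwaway repo; file names are hypothetical.
export FILTER_BRANCH_SQUELCH_WARNING=1
REPO=$(mktemp -d)
cd "$REPO" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
git symbolic-ref HEAD refs/heads/master

echo "API_KEY=not-a-real-key" > secrets.env
echo "app" > app.txt
git add secrets.env app.txt
git commit -q -m "Add app and (oops) secrets"

git filter-branch --index-filter \
  "git rm --cached -f -r --ignore-unmatch secrets.env" -- --all

# filter-branch parked the old tip of master here as a backup.
git for-each-ref --format='%(refname)' refs/original/

# Delete that backup ref so it no longer keeps the old commits reachable.
git update-ref -d refs/original/refs/heads/master
BACKUPS_LEFT=$(git for-each-ref refs/original/)
```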
git reflog expire --expire=now --all
"Reference logs", or reflogs, record when the tips of branches and other references were updated in your local repo, so they still point at the pre-rewrite commits. With --expire=now and --all, every entry in the reflogs of all references, older and current, is removed.
git gc --prune=now
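gc is garbage collection: it compresses (packs) the repository's objects and deletes the ones nothing points to anymore. --prune=now deletes those unreachable objects immediately instead of after the default two-week grace period, which is what actually purges the removed files from the local repo (and usually makes it smaller).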
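Putting the local steps together, this sketch (same hypothetical throwaway repo) records the hash of the secret file's blob and checks whether Git can still find that object before and after git gc --prune=now.

```shell
# Throwaway repo; file names are hypothetical.
export FILTER_BRANCH_SQUELCH_WARNING=1
REPO=$(mktemp -d)
cd "$REPO" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
git symbolic-ref HEAD refs/heads/master

echo "API_KEY=not-a-real-key" > secrets.env
echo "app" > app.txt
git add secrets.env app.txt
git commit -q -m "Add app and (oops) secrets"
SECRET_BLOB=$(git rev-parse HEAD:secrets.env)   # object holding the key

# The full local cleanup from the post.
git filter-branch --index-filter \
  "git rm --cached -f -r --ignore-unmatch secrets.env" -- --all
git update-ref -d refs/original/refs/heads/master
git reflog expire --expire=now --all

# cat-file -e exits 0 only if the object exists in the repo.
if git cat-file -e "$SECRET_BLOB" 2>/dev/null; then BEFORE_GC=present; else BEFORE_GC=gone; fi
git gc -q --prune=now
if git cat-file -e "$SECRET_BLOB" 2>/dev/null; then AFTER_GC=present; else AFTER_GC=gone; fi
echo "secret blob before gc: $BEFORE_GC, after gc: $AFTER_GC"
```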
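git push origin --force --all
Push the rewritten history back up to the remote repository! Every rewritten commit has a new hash, so a plain git push gets rejected as non-fast-forward; it has to be forced (and git push origin --force --tags does the same for tags).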
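Here's a sketch of the push step, assuming a local bare repo as a stand-in for the GitHub remote (a made-up setup; file names are hypothetical too). It shows the plain push being refused and the forced push replacing the remote's master.

```shell
# Hypothetical setup: a local bare repo stands in for the GitHub remote.
export FILTER_BRANCH_SQUELCH_WARNING=1
WORK=$(mktemp -d)
REMOTE="$WORK/remote.git"
git init -q --bare "$REMOTE"

git init -q "$WORK/project"
cd "$WORK/project" || exit 1
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
git symbolic-ref HEAD refs/heads/master
echo "API_KEY=not-a-real-key" > secrets.env
echo "app" > app.txt
git add secrets.env app.txt
git commit -q -m "Add app and (oops) secrets"
git remote add origin "$REMOTE"
git push -q origin master          # the accidental push

git filter-branch --index-filter \
  "git rm --cached -f -r --ignore-unmatch secrets.env" -- --all

# The rewritten master doesn't descend from the remote's, so this is refused.
if git push -q origin master 2>/dev/null; then PLAIN_PUSH=accepted; else PLAIN_PUSH=rejected; fi
echo "plain push: $PLAIN_PUSH"

# Forcing replaces the remote branches and tags with the rewritten ones.
git push -q origin --force --all
git push -q origin --force --tags
```

Anyone else with a clone of the old history should re-clone (or rebase onto the rewritten branch) rather than merge, or the old commits come right back on their next push.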

Please feel free to correct me if I've interpreted these commands incorrectly. I usually try anything til it works and this one seems to have worked best for me.

I'd also love to hear about how you remove unwanted info that's previously committed locally or remotely!

<3 git <3

Sources:
Removing (un-pushed) Files from a Repository's History | GitHub
Removing sensitive data from a repository | GitHub
Delete Commits from A Branch in Git
Resetting, checking out & reverting | Atlassian Bitbucket
Git | git-gc