
git-filter-repo: Remove Sensitive Information from Git History with

By Marco Scatassi, February 23, 2024 (updated March 19, 2024)

In modern software development, maintaining data privacy and security is crucial. Even so, sensitive information can inadvertently find its way into your Git history. That’s where git-filter-repo comes in: a powerful tool that can scrub sensitive data from that history. Whether you accidentally committed passwords, private keys, or other confidential information, git-filter-repo can help you get back to a secure codebase. In this article you will find a step-by-step guide on how to use this command without messing up your Git-based project!

Let’s see all the steps with a mock example

For now, let’s say we have a project folder imaginatively named myfolder. First, we initialize the folder using the git init command.
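
As a minimal sketch, creating and initializing the folder could look like this. It assumes a POSIX-style shell such as Git Bash; the folder name myfolder comes from the example above.

```shell
# Create the example project folder and turn it into a Git repository.
mkdir -p myfolder
cd myfolder
git init
```

After this, myfolder contains a hidden .git directory, which is where Git stores the history we will later rewrite.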

Now let’s say we add a mypassword.txt file with some very secret information:
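
The original screenshot of the file is not available, so the following is only an illustration: the key name and the value are made up, and any text containing the word password would serve the same purpose.

```shell
# Hypothetical contents: both the key name and the value are invented
# for illustration and are not taken from the original article.
printf 'password: my-very-secret-password\n' > mypassword.txt
cat mypassword.txt
```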

Now we made a huge mistake: we told Git to track this file and we committed it. Things got worse: we also added some other confidential information and committed the file again:
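
The two mistaken commits could be reproduced roughly as follows. This is a sketch under a few assumptions: the first commit message and the second secret line are invented (only the second commit message appears later in this article), and the guard lines at the top exist only so the snippet also runs on its own in an empty folder.

```shell
# Guards so the snippet also runs on its own in an empty folder.
[ -d .git ] || git init -q
[ -f mypassword.txt ] || printf 'password: my-very-secret-password\n' > mypassword.txt
# Set a repository-local identity only if none is configured (placeholder values).
git config user.name  >/dev/null || git config user.name  "Example User"
git config user.email >/dev/null || git config user.email "user@example.com"

# First mistake: start tracking the secret file and commit it.
git add mypassword.txt
git commit -q -m "FIRST WRONG COMMIT mypassword"

# Second mistake: add more confidential data (made-up line) and commit again.
printf 'database_password: another-secret\n' >> mypassword.txt
git add mypassword.txt
git commit -q -m "SECOND WRONG COMMIT mypassword"

# Both commits are now part of the history.
git log --oneline
```

At this point both secrets are stored in the history, so simply deleting the file in a new commit would not remove them.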

How can we now hide the “password” field in the git history?

git-filter-repo: our Savior

git-filter-repo is a powerful and flexible tool used to rewrite the history of a Git repository. It is specifically designed for tasks like removing sensitive information from a project’s history, simplifying a repository, or splitting a large repository into smaller, more manageable pieces. With git-filter-repo, you can seamlessly restructure and clean up your repository without losing valuable information or metadata.

How to download it

You can easily install this tool with pip, Python’s package manager. It is enough to type the following in PowerShell or Command Prompt (I’m assuming you are on Windows):

pip install git-filter-repo

How to use it

In this blog post, we’ll focus on one specific application of git-filter-repo: removing sensitive information from a Git repository’s history. Keep in mind, though, that git-filter-repo is a versatile tool with many more capabilities, which you can explore in its documentation.

Let’s not lose focus, though. We want to mask the two lines containing password data. We can do this by listing, in an external file, all the occurrences we want to hide or remove. The external file should look similar to this:

git-filter-repo syntax for replacement
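
The screenshot of the file is not available here, so the following is a sketch of one way to write such a rule, and the original may have looked different. It relies on the documented format of the --replace-text file: a line starting with regex: is treated as a regular expression, and a rule without an explicit replacement uses ***REMOVED***.

```shell
# Write the replacement rule to expressions.txt.
# The quoted 'EOF' keeps the rule literal, so the shell does not expand anything.
cat > expressions.txt <<'EOF'
regex:.*password.*
EOF
```

The pattern .*password.* matches a whole line containing the word password, because the dot does not match a line break.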

With this syntax we are telling git-filter-repo to look for every line containing the word password and to replace it with ***REMOVED***. We could have specified a different replacement text, as follows:
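
As a sketch of the custom-replacement variant: in the --replace-text file format, whatever follows ==> on a rule line is used as the replacement. The word HIDDEN is a made-up example, and the original screenshot may have used a different one.

```shell
# Same rule as before, but with an explicit replacement after ==>.
# HIDDEN is an arbitrary example word.
cat > expressions.txt <<'EOF'
regex:.*password.*==>HIDDEN
EOF
```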

Then run the following command in your shell or prompt, replacing expressions.txt with the path to the file where you specified what to mask:

 git filter-repo --replace-text expressions.txt

You will very likely run into this error:

Aborting: Refusing to destructively overwrite repo history since
this does not look like a fresh clone.
  (expected at most one entry in the reflog for HEAD)
Please operate on a fresh clone instead.  If you want to proceed
anyway, use --force.

Since rewriting history is not something you should do lightly, git-filter-repo tries to prevent you from applying changes in the original folder and asks for a fresh clone instead. It is still possible to force the rewrite in the original repo by adding the --force flag:

git filter-repo --replace-text expressions.txt --force

Afterwards you should see a message similar to this:

Parsed 2 commits
New history written in 0.06 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 09fa7dc SECOND WRONG COMMIT mypassword
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Delta compression using up to 20 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
Completely finished after 0.24 seconds.

The result

If everything worked, the Git history has now been rewritten. You can see this by looking at the latest version of your files, since they are affected as well. In the previous example, if you open the mypassword.txt file, you will see this:

git-filter-repo results

You can also use regular expressions to select the text to replace in your files; see the git-filter-repo documentation for the details.

BE CAREFUL

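If you have changes that have not been committed, they will very probably be purged, so commit them or back them up first. Moreover, if you need to rewrite the Git history in a remote repo (such as on GitHub), a regular push is usually not enough: the rewritten commits have new hashes, so the remote will reject them unless you force-push (for example with git push --force). Note also that git-filter-repo removes the origin remote as a safety measure, so you have to add it again before pushing.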

And do not forget to add the files Git should not track to .gitignore.
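
For the example in this article, a minimal sketch of that last step could be the following. It assumes you run it in the root of the repository and that mypassword.txt is the only file to exclude.

```shell
# Append the secrets file to .gitignore (the file is created if it does not exist yet).
printf 'mypassword.txt\n' >> .gitignore
cat .gitignore
```

Keep in mind that .gitignore only affects files Git is not tracking yet. If mypassword.txt is still tracked, git rm --cached mypassword.txt stops tracking it without deleting your local copy.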

Author

  • Marco Scatassi

    I’m a 24-year-old Analytics Engineer trainee at the Nimbus Intelligence Academy! I’ve always been passionate about numbers and patterns, and this led me to pursue a Bachelor’s degree in Mathematics in Bologna followed by a Master’s degree in Data Science at Milano-Bicocca (which I’m about to conclude). Those experiences have been further enriched by a research traineeship in Norway. A nature enthusiast, I love outdoor activities such as hiking, skiing, and playing beach volleyball, as well as more relaxing ones such as reading and playing guitar.
