Git – How to handle complete project rewrite in git

git, version control

I'm developing a software library that I began during my PhD and used in my thesis. I've since started a research position and, as an experiment, I wanted to try rewriting parts to see if I could simplify it. This turned out to be way more successful than I expected. I rewrote years of work in a matter of weeks, and the end result is much cleaner than what I had before.

Now I want to make my rewrite public, but I'm not sure what to do about the revision history. I reference specific commits in the old version in my thesis, so I'd like to have those preserved for posterity (and so I can laugh at my junk code later.) I can think of 3 options:

  1. Delete all the old code in one commit. Extract all the commits from the rewrite as patches and apply them one-by-one to the old repository.
  2. Make a tag or branch for the old code, say my-lib-0.0.1, explaining that it was a nascent version of the code from when I was young and foolish. Add the rewrite as a remote for the old code, then git reset --hard rewrite/master. This will replace the revision history of the old code with that of the rewrite.
  3. Explain in the README for the old repo that it's deprecated, and keep the rewrite in an entirely new repo.

I don't like option 3, as I'd rather have everything in the same repository. Additionally, there is still a fair amount of shared code between the old and new versions. Option 2 has the virtue of keeping the revision history looking fairly clean (users won't have to clone 400+ commits of garbage from when I didn't know what I was doing) but keeping the history there in the unlikely event that someone does want to see it. On the other hand, I don't like the idea of "rewriting history", and to avoid cloning garbage commits I can always tell people to do shallow clones.
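For concreteness, here is a rough sketch of what option 2 would look like, run against two throwaway repositories so it is safe to try. The directory names, file contents, and commit messages are made up for illustration; only the tag name my-lib-0.0.1, the remote name rewrite, and the git reset --hard step come from the description above.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)

# Helper (illustrative): run git with a fixed identity so commits and tags work anywhere.
git_c() { git -c user.email=you@example.com -c user.name="Example Author" "$@"; }

# Hypothetical stand-in for the rewrite repository.
git init -q "$work/rewrite"
cd "$work/rewrite"
echo "new implementation" > lib.txt
git add lib.txt
git_c commit -q -m "Rewrite: initial commit"
rewrite_branch=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on git config
rewrite_head=$(git rev-parse HEAD)

# Hypothetical stand-in for the old repository.
git init -q "$work/old"
cd "$work/old"
echo "old implementation" > lib.txt
git add lib.txt
git_c commit -q -m "Old code"
old_head=$(git rev-parse HEAD)

# Keep the old history reachable through an annotated tag.
git_c tag -a my-lib-0.0.1 -m "Nascent version, kept for reference"

# Add the rewrite as a remote and move the current branch onto its history.
git remote add rewrite "$work/rewrite"
git fetch -q rewrite
git reset -q --hard "rewrite/$rewrite_branch"

# The branch now shows only the rewrite; the tag still resolves to the old commit.
git log --oneline
git rev-parse "my-lib-0.0.1^{commit}"
```

If the old branch has already been published, pushing the result would need a forced push, which is the "rewriting history" part that bothers me.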

Which of these options is the least bad? Are there others that I haven't considered?

Best Answer

This is a very typical situation in software development.

In my view, you should first tag the final version of your old code (there is no need to branch, as you can always create a branch from the tag later if needed). Rewriting also takes place incrementally, which means you commit changes gradually. When you reach your new version, you tag again, and so on. With every release (that is, every tag), include a release notes document.
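A minimal sketch of that tagging workflow, run in a throwaway repository so it is safe to try. The tag name my-lib-0.0.1 is borrowed from the question; my-lib-1.0.0, the file names, and the commit messages are assumptions made up for illustration, and the rewrite is assumed to be committed on the same branch as the old code.

```shell
#!/bin/sh
set -e

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example Author"

# Stand-in for the final state of the old code.
echo "old implementation" > lib.txt
git add lib.txt
git commit -q -m "Final state of the old code"

# Annotated tag marking the last commit of the old version.
git tag -a my-lib-0.0.1 -m "Last version before the rewrite"

# The rewrite lands as ordinary commits on top of the old history.
echo "new implementation" > lib.txt
echo "1.0.0: rewrite of the library, see notes on changed behavior" > RELEASE_NOTES.txt
git add lib.txt RELEASE_NOTES.txt
git commit -q -m "Rewrite library and add release notes"

# Tag the new release once it is ready.
git tag -a my-lib-1.0.0 -m "First release of the rewrite"

# The old code stays reachable through its tag.
old_contents=$(git show my-lib-0.0.1:lib.txt)
echo "$old_contents"
git tag --list
```

Anyone who needs the old code can check out the old tag, or create a branch from it later with git branch followed by a branch name and the tag name.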

What you may come across, though, are breaking changes. In that case, communicate them by changing the version number and describing them in the release notes.
