Optimize Git Repo – Handling Large Binary Files

Tags: binary, git

Our project is about 11GB, 10GB of which is binary data (.png images). As a consequence, git diff and git status operations take more than a minute. Fortunately, all data files are separated into a folder with the wonderful name data. The assignment is: "Avoid compressing, diffing and other costly operations on binary files."

  • We considered splitting the project into two repos, with data as an external repo checked out by the main source code repo. We decided that the overhead of keeping the repos in sync would be too much, especially for the artists, who work with the data files.

  • We also considered explicitly telling Git that those files are binary and excluding them from diffs (see the sketch after this list), but both feel like only a partial solution to the problem.
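
For reference, this is roughly what that partial solution would look like in .gitattributes; the data/ path and the PNG-only pattern are assumptions based on the layout described above:

    # Treat PNGs under data/ as binary: no text conversion, no textual diff, no merge
    data/**/*.png binary
    # Also skip delta compression when packing these blobs
    data/**/*.png -delta

This stops Git from trying to diff or delta-compress the images, but the full 10GB still lives in the repository, so clones and many other operations stay heavy.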

I feel that gitattributes is the solution, but how? Or is there a better architecture than a monolithic repo?

Best Answer

You can use git-lfs or similar tools (git-fat, git-annex, etc.). These tools essentially replace the binary files in your repo with small text files containing hashes, and store the actual binary data outside of Git, for example on a network share.
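
To make that concrete, the pointer file that git-lfs checks into the repo in place of an image looks roughly like this (the hash and size here are invented for illustration; in reality they are the SHA-256 and byte count of the actual image):

    version https://git-lfs.github.com/spec/v1
    oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
    size 10485760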

That makes diffs and other operations very fast, because only the hashes get compared, and it is - at least for git-lfs - transparent to the user (after a one-time install).
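
A minimal setup sketch, assuming the images live under data/ as described in the question:

    # one-time per machine: hook the LFS filters into Git
    git lfs install

    # route all PNGs under data/ through LFS (this writes a rule to .gitattributes)
    git lfs track "data/**/*.png"

    # commit the rule so every clone picks it up
    git add .gitattributes
    git commit -m "Track data/ images with Git LFS"

Files that were committed before this point remain ordinary blobs in history; if you also want the existing history rewritten, git lfs migrate import --include="data/**/*.png" can do that, at the cost of changing commit hashes.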

As far as I know, git-lfs is supported by GitHub, GitLab, and Visual Studio, and it is open source.
