For this kind of work, I suggest you learn about Maven, which handles dependency compatibility (among a lot of other things).
With Maven, you could for example use one of these two approaches:
Either the "standard" way: create a Nexus repository and feed it with a compiled jar of each version of your common package (and, to fully exploit Maven/Nexus, the version(s) of JDBC you use). Each time you build your project from any commit, the versioned configuration file (the pom.xml) is read and the right revision of your "common" library is used.
(I say Nexus, but there are other products that do the same thing; I've also heard of one named Artifactory.)
Or you could write a more elaborate script which, when you build your server or client project, downloads the source of your library from your "common" repository and builds it.
Maven may seem a bit overkill and is (in my opinion) a pain to learn alone, but it's handy, powerful, the professional standard in Java, and it works seamlessly with Eclipse.
As for Maven usage, the Nexus way is (depending on your scripting skills) a bit heavier to set up, but more standard, easier to scale with your project, and reusable for all your future projects.
If you're working in a professional context, I strongly advise finding a colleague who knows Maven to help you mavenize your project.
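To make the "versioned config file" part concrete, here is a hedged sketch of what the dependency on the shared library would look like in the client or server project's pom.xml (the groupId, artifactId and version are placeholders, not your actual coordinates):

```xml
<!-- Hypothetical dependency on the shared "common" library; Maven resolves
     this exact version from your Nexus repository at build time. -->
<dependency>
    <groupId>com.example</groupId>
    <artifactId>common</artifactId>
    <version>1.2.3</version>
</dependency>
```

Because the version is committed alongside the rest of the project, checking out any old commit and rebuilding automatically pulls the matching revision of "common".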
Wow, that's a long question (and a complex problem). I'll try to have a
go at it.
I'm not sure I understand why each user has to have a full local
history when using git?
This is a central design decision with git. For the exact reasons you'd
need to ask the author (Linus Torvalds), but as far as I know, the main
reason is speed: Having everything local (on a fast disk or even cached
in RAM) makes operations on history much faster by avoiding network
access.
The biggest reason for the large repository size is that there are a
lot of binary documents being inputs to tests. These files vary
between 0.5 MB and 30 MB, and there are hundreds. They also have quite a
lot of changes.
That is the point I would think about first. Having so many constantly
changing binary files in source control seems problematic to me (even
with SVN). Can't you use a different approach? Ideas:
Unlike source code, a 3 MB binary file is probably not written by
hand. If some tool/process generates it, consider integrating that
into your build, instead of storing the data.
If that is not practical, binary files are typically better off in an
artifact repository (such as Artifactory for Maven & co.). Maybe that
is an option for you.
I have looked at submodules, git-annex etc, but having
the tests in a submodule feels wrong, as does having annex for many
files for which you want full history.
Actually, this looks like git-annex would fit perfectly. git-annex
basically allows you to store file contents outside a git repository
(the repository contains a placeholder instead). You can store the file
contents in a variety of ways (central git repo, shared drive, cloud storage...), and you can control which contents you want to have locally.
Did you maybe misunderstand how git-annex works? git-annex does store
full history for all the files it manages - it just lets you choose
which file contents you want to have locally.
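To make that concrete, here is a hedged sketch of a git-annex workflow inside an existing git repository (the file name is illustrative, and this assumes git-annex is installed):

```shell
git annex init "laptop"          # turn this clone into an annex ("laptop" is an arbitrary name)
git annex add tests/data.bin     # content moves into .git/annex; git tracks a small pointer
git commit -m "Add binary test data via git-annex"
git annex drop tests/data.bin    # free local disk space (refuses unless another copy exists)
git annex get tests/data.bin     # fetch the content back from a remote when you need it
```

Note that the commit history records every version of tests/data.bin as usual; only the question of which file *contents* sit on your local disk is decoupled from the history.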
Finally, about your questions:
What is a good best practice for using git with large repos containing
many binary files that you do want history for?
In my experience, the options usually are:
- avoid the need for binaries in the repo (generate them on demand,
store them elsewhere)
- use git-annex (or a similar solution, such as Git LFS)
- live with a big repo (not all git operations are affected by big
files, and if you have a fast computer and drive, it can be quite
workable)
Is shallow cloning usable as a normal mode of operation or is it a
"hack"?
That might be doable; however, I don't think this will solve your
problem:
- you'd lose some of git's benefits that come from having full history, such
as quick searching of the history
- merges can become tricky, because AFAIK you must have at least the
history back to the branch point to merge
- users would need to re-clone periodically to keep the size of their
clone small
- it's just an uncommon way of using git, so you'd likely run into
problems with many tools
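That said, trying shallow cloning is cheap. A minimal self-contained sketch (using a throwaway local repository, so no network is needed; the file:// URL is required for --depth to take effect on a local clone):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Create a source repository with two commits.
git init -q source
cd source
git config user.email you@example.com
git config user.name You
echo v1 > file.txt && git add file.txt && git commit -qm "first"
echo v2 > file.txt && git commit -aqm "second"
cd ..

# Shallow clone carrying only the most recent commit.
git clone -q --depth 1 "file://$tmp/source" shallow
cd shallow
git rev-list --count HEAD   # prints 1: older history was not transferred
```

Day-to-day commands (status, commit, push) work normally in the shallow clone; it is the history-wide operations listed above that suffer.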
How big is "too big" for a git repository (on premises)? Should we
avoid switching if we can get it down to 4GB? 2GB?
That depends on the structure of the repo (few/many files etc.), on what
you want to do, on how beefy your computers are, and on your patience
:-).
To give you a quick idea: On my (newish, but low-spec) laptop,
committing a 500 MB file takes 30-60s. Just listing history (git log
etc.) is not affected by big files; things like "git log -S" which must
scan file content are very slow - however, the speed is mainly dominated
by I/O, so it's not really git's fault.
On a 3 GB repo with a handful of revisions, "git log -S" takes about a
minute.
So I'd say a couple of GB is ok, though not ideal. More than 10-20 GB is
probably pushing it, but it might be doable - you'd have to try it.
Best Answer
You can use git-lfs or similar tools (git-fat, git-annex, etc.). These tools basically replace the binary files in your repo with small text files containing hashes, and store the actual binary data outside git - for example on a network share.
This makes diffs and most operations very fast, since only the hashes are compared, and it is - at least for git-lfs - transparent to the user (after a one-time installation).
AFAIK git-lfs is supported by GitHub, GitLab and Visual Studio, and it is open source.
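For completeness, a hedged sketch of adopting git-lfs in an existing repository (the file patterns and branch name are placeholders, and this assumes the git-lfs extension is installed):

```shell
git lfs install                  # one-time setup per machine (installs git hooks)
git lfs track "*.bin" "*.pdf"    # placeholder patterns for your binary test inputs
git add .gitattributes           # the tracking rules live in .gitattributes
git add tests/
git commit -m "Track binary test data with Git LFS"
git push origin main             # pointers go to git, contents go to the LFS store
```

After this, collaborators who have git-lfs installed see the real files on checkout; everyone else sees the small pointer files.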