How to transparently cache git clone

Tags: cache, continuous integration, http, https, socks

I would like to offer a continuous integration service (I'm planning to use Hudson, but the solution should work for others as well) with a web interface where a user defines an SCM URL (e.g. a git URL). The workspace/source root used for building should be cleaned (at least optionally) before each build. This requires a lot of repeated checkouts, which I would like to cache (i.e. have them read from local storage instead of fetched from a remote resource).

Different SCMs (git, svn and mercurial/hg) use different protocols (HTTP, HTTPS, git, etc.). Some of them can be cached (HTTP), others generally cannot (HTTPS, unless a man-in-the-middle is used, which is unacceptable imo for the trustworthy service I want to provide) or specifically cannot (I didn't find any cache servers for the git protocol).

Caching HTTP isn't a problem, but few git hosts support it, or they redirect to HTTPS. I would like to support one protocol that reliably caches checkouts and recommend it to users.

Redirection via a SOCKS proxy can be achieved for the HTTP and git protocols, but that doesn't allow caching. Other protocols like IGD can't be used for caching either.

Best Answer

Indeed, the problem you describe exists and raises a lot of questions, like "For how long should I cache an answer? What if I have two projects whose commit rates are very different?". There are some proprietary solutions that do what you are looking for, e.g. Atlassian Stash has a built-in plugin that caches checkout responses in order to lower the load on the server.

Anyway, the best and recommended solution is different from what you want to do: use post-commit hooks. They exist in git and svn, and I think in other VCSs as well. Just have your repository trigger the build on your CI system, rather than having the CI jobs poll continuously. As you mentioned Jenkins (Hudson): the Git plugin, for example, already provides URLs for this kind of notification.
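To illustrate the hook approach, here is a minimal sketch of a script that a repository hook could call after a push. It assumes the Jenkins Git plugin's `git/notifyCommit?url=...` endpoint; the host names in the usage line and the function names `build_notify_url` and `notify` are made up for the example. Newer plugin versions may additionally require an access token, which this sketch does not handle.

```python
#!/usr/bin/env python3
"""Sketch: notify Jenkins about a push instead of letting it poll."""
import sys
import urllib.parse
import urllib.request


def build_notify_url(jenkins_url: str, repo_url: str) -> str:
    """Build the Git plugin notification URL for one repository.

    Assumes the plugin exposes <jenkins>/git/notifyCommit?url=<repo URL>.
    """
    query = urllib.parse.urlencode({"url": repo_url})
    return f"{jenkins_url.rstrip('/')}/git/notifyCommit?{query}"


def notify(jenkins_url: str, repo_url: str, timeout: float = 10.0) -> int:
    """Send the notification and return the HTTP status code."""
    url = build_notify_url(jenkins_url, repo_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


# Only runs when called with both arguments, e.g. from a hook script.
if __name__ == "__main__" and len(sys.argv) == 3:
    try:
        print(f"Jenkins notified (HTTP {notify(sys.argv[1], sys.argv[2])})")
    except OSError as exc:
        # Report the failure but do not reject the push because CI is down.
        print(f"Could not notify Jenkins: {exc}", file=sys.stderr)
```

A hook in the hosted repository could then run something like `notify_jenkins.py http://jenkins.example.com:8080 https://git.example.com/team/project.git` (both URLs are placeholders). Jobs configured for that repository URL would then check for changes when notified, so fetches happen only when there is actually something new to build.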