Version Control – Monorepo vs Multi Repo for Large Projects


I am looking into options for smoothing out our delivery and release pipeline, and would appreciate some advice on the best way to structure the source code.

This is a pretty large project, which consists of about 30 microservices. Microservices is a very loose term here, but for the sake of the question it is close enough. Currently all of this is stored in one repo.

The internal workflow is structured as follows:

1-month sprints where all code is committed to a dev branch.
At the end of the month, all tasks that should be moved into testing are merged into the «next major release» branch, and all critical tasks and bug fixes are merged into the «active delivery» branch and tested before being selectively rolled out to customers.

Not all customers want the latest and greatest; they want as few changes as possible to mitigate the risk associated with change.

So at the end of the month we have the following:
1 up-to-date dev branch
1 partly up-to-date test/next delivery branch
1 production branch

and this is then rolled out to customers like this:
Customer A
– Installed production branch version 11
* overridden microservice 3 from production branch 11.4
* overridden microservice 5 from production branch 11.1
* …

Customer B
– Installed production branch version 11.6
* overridden microservice 7 from production branch 11.14
* overridden microservice 15 from production branch 11.7

Customer C
– Installed production branch version 11.3

Would changing to a multi-repo setup better cater to our clearly insane delivery pipeline?
I am considering one repo for each service, and then one repo for each customer.

Code repo:
-MicroService 1
-MicroService 2
-MicroService 3

Customer repos:
-Link to MicroService 1 – version 1.5
-Link to MicroService 2 – version 2.5
-Link to MicroService 3 – version 1.7
-Customer specific data files (they are not under version control at all now…)
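One way the "link to MicroService N at version X" idea could be realised is with git submodules, where the customer repo records one exact commit per service. The script below is a minimal sketch under that assumption, not a recommendation: it builds throwaway local repos, and every repo name, tag, and path in it (microservice1, v1.5, customer-a, config/settings.ini) is hypothetical. The protocol.file.allow setting is only needed because the demo clones from local paths.

```shell
#!/usr/bin/env bash
# Sketch only: a per-customer repo that pins each service repo to a release tag
# via git submodules. Every repo name, tag and path here is hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for a real service repo: one commit and one tag per release.
make_service() {
  local dir="$work/$1" tag
  git init -q "$dir"
  for tag in "${@:2}"; do
    echo "$1 $tag" > "$dir/VERSION"
    git -C "$dir" add VERSION
    git -C "$dir" commit -q -m "Release $tag"
    git -C "$dir" tag "$tag"
  done
}
make_service microservice1 v1.4 v1.5
make_service microservice2 v2.4 v2.5

# The customer repo records exactly one commit per service.
customer="$work/customer-a"
git init -q "$customer"

pin() {  # usage: pin <service> <tag>
  # protocol.file.allow=always is only needed because this demo clones local paths.
  git -C "$customer" -c protocol.file.allow=always submodule --quiet add "$work/$1" "$1"
  git -C "$customer/$1" checkout -q "$2"
  git -C "$customer" add "$1"
}
pin microservice1 v1.5
pin microservice2 v2.4

# Customer-specific files live next to the pins, under version control.
mkdir -p "$customer/config"
echo "customer-specific settings" > "$customer/config/settings.ini"
git -C "$customer" add config
git -C "$customer" commit -q -m "Customer A: microservice1 v1.5, microservice2 v2.4"

# Report which release each pinned commit corresponds to.
git -C "$customer" submodule --quiet foreach 'echo "$name $(git describe --tags)"'
```

Reproducing a customer's environment would then be a clone of their repo followed by `git submodule update --init`. Note that this only pins source versions; it does not by itself answer how the shared libs get built into each service.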

I think this would greatly improve our workflow and move us back to the sanity of having just one branch. It should give us better control over what is actually released to a customer, give us the ability to actually reproduce their environment and, most importantly, remove the need for a monthly merge day.

I would greatly appreciate some feedback on this.

Edit:
The API is pretty stable between services, but the shared libs are a bit of a mixed bag. The product is roughly 25 years old, so the core is stable, but as you can imagine it is in great need of some cleanup, as technical debt has slowly been building up.

I basically have two issues which I am trying to solve/improve:

Merge day is hell: the production branch can be up to a year older than the dev branch, with only a subset of what is in the dev branch merged into it. If a new feature is added in dev that should not be in the current version, it is not moved to production. Any fixes in dev that use code from that feature will then cause huge issues if they need to be moved. My idea is that having multiple smaller repos will make this easier.
Say there is a bug in a server for getting customer data. The bug is in some shared lib, so I fix it, build the CustDataServer and all other servers which I THINK might be affected by this, then deploy them on the customer's (or customers') systems. This means that the next time someone does a bug fix which happens to use my last bug fix, they unintentionally release my fix into production as well. Most of the time this is fine, but I feel like we have no control over what is actually running in production. ServerA is built with SharedLibA and SharedLibB, while ServerB is built with the same shared libs, but other versions. I am under the impression that a multi-repo approach would at least give more control in this case.

I am open to all kinds of ideas on how to clean up this mess.

Edit:
I appreciate all the thought-out responses so far. I agree that selecting the repo style is not the issue here.

Your summary is pretty spot on, apart from the layout repos. That was something I was considering introducing. Currently, client-specific files (mostly DB and config files) are stored on the client server and are not under any form of source control.

Regarding the release cycle, I agree with you that releasing more often is the best idea. Also: don't do partial releases. All or nothing. The problem is that the release cycle is outside of my control. I can influence many things, but only things within the company. Externally, it is hard to make changes. The reason for the partial and infrequent releases is to mitigate the risk for clients. The idea is that the less that is changed, the smaller the chance of new bugs. The software is very critical, and bugs could literally cost millions per day in damages.

What if we tried doing this the other way around? If I describe the system as something to be created under the existing constraints, what would be the ideal way to structure it?

Large code base which is logically split into several units, but the units are (tightly) coupled
Yearly release of new features
Features are a mix of local changes and system-wide new features
Bugs need to be fixed ASAP; critical ones within hours
Customers do not want full releases; they want as little change as possible. In fact, they often don't want the yearly release and drag it out as much as possible
Critical software; bugs are expensive

What would be the best way to structure the internal workflow to limit the number of pain points?

Best Answer

It seems to me that one vs many repos is not the root of your problem.

It's your branching strategy.

If you can, switch to a recognised branching strategy such as gitflow.
If you are writing a bug fix for a problem in production, make a (hotfix) branch from production, not from your develop or next-release branch.
If your 'next release' branch hasn't been merged for a year, don't even attempt to merge it. Just let it become the new 'production' branch.
Stop cherry-picking merges. Merge a whole branch or not at all. If your various branches have diverged as much as you imply, then you probably need to write that bug fix separately for each branch.
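The hotfix flow described above can be sketched in a throwaway repo. This is a minimal gitflow-style illustration, assuming branches named production and develop and tags v11.0/v11.1; those names, and hotfix/cust-data, are hypothetical. The fix is branched from production, and the whole hotfix branch is merged (no cherry-picks) into both production and develop.

```shell
#!/usr/bin/env bash
# Sketch of a gitflow-style hotfix in a throwaway repo. Branch and tag names
# (production, develop, hotfix/cust-data, v11.0, v11.1) are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m "Release 11"
git branch -M production
git tag v11.0

# Development continues on develop; this feature must not reach production yet.
git checkout -q -b develop
git commit -q --allow-empty -m "New feature for release 12"

# 1. Branch the fix from production, not from develop.
git checkout -q -b hotfix/cust-data production
echo "fixed lookup" > cust_data.txt
git add cust_data.txt
git commit -q -m "Fix customer data lookup"

# 2. Merge the whole hotfix branch into production and tag the release.
git checkout -q production
git merge -q --no-ff --no-edit -m "Merge hotfix/cust-data" hotfix/cust-data
git tag v11.1

# 3. Merge the same branch into develop so the fix is not lost.
git checkout -q develop
git merge -q --no-ff --no-edit -m "Merge hotfix/cust-data into develop" hotfix/cust-data
git branch -q -d hotfix/cust-data
```

Because the hotfix starts from production, it cannot drag the unreleased feature along with it, which is exactly the problem described in the question's edit.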

With regard to customers deploying different versions of the micro-services: this should not be a problem, and it certainly shouldn't be related to your source control.

Save your versioned deployments separately from your source control. Even if you don't compile to binaries, you need to distinguish between the source code and the product.

Don't deploy by checking out code from source control. Make a zip file with the software + config and a deployment script of some kind. Imagine you have to post it on a CD.

I would expect live environments to have multiple versions deployed to enable zero downtime deployments and the like.

As long as each version is tested and you state which other versions it is compatible with, this should not be an issue.
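A rough sketch of such a package, using a tarball instead of a zip (same idea). Everything in it is a hypothetical stand-in: the service name, version, the MANIFEST format with its compatible_with line, and the deploy.sh script. The deploy script installs into a versioned directory and repoints a current symlink, so several versions can sit side by side on the live machine.

```shell
#!/usr/bin/env bash
# Sketch: package one service version as a self-contained artifact.
# Service name, version, manifest fields and layout are all hypothetical.
set -euo pipefail
service=cust-data-server
version=11.4.0
compatible_with="shared-lib-a 3.2, shared-lib-b 1.9"

build=$(mktemp -d)
stage="$build/$service-$version"
mkdir -p "$stage/bin" "$stage/config"

# Stand-ins for the real build output and default config.
printf '#!/bin/sh\necho "%s %s running"\n' "$service" "$version" > "$stage/bin/$service"
chmod +x "$stage/bin/$service"
echo "port=8080" > "$stage/config/default.ini"

# Record what this artifact is and what it has been tested against.
cat > "$stage/MANIFEST" <<EOF
service=$service
version=$version
compatible_with=$compatible_with
EOF

# Installs into <root>/<name>-<version> and repoints <root>/current,
# leaving previously installed versions in place.
cat > "$stage/deploy.sh" <<'EOF'
#!/bin/sh
set -eu
target=${1:?usage: deploy.sh INSTALL_ROOT}
here=$(cd "$(dirname "$0")" && pwd)
name=$(basename "$here")
mkdir -p "$target"
cp -R "$here" "$target/$name"
ln -sfn "$target/$name" "$target/current"
EOF
chmod +x "$stage/deploy.sh"

artifact="$build/$service-$version.tar.gz"
tar -czf "$artifact" -C "$build" "$service-$version"
echo "$artifact"
```

Usage on a target machine would be along the lines of extracting the tarball and running `./cust-data-server-11.4.0/deploy.sh /opt/services`; rolling back is then just repointing `current` at the previous directory.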

Git Version Control – Is Creating a Tag with a Deleted Branch Name a Bad Idea?

Decide that you don't care about these warnings

That is, if you're happy with the fact that:

git checkout <ref> will check out refs/heads/<ref> over refs/tags/<ref> (see git-checkout)
other commands will use refs/tags/<ref> over refs/heads/<ref> (see gitrevisions)

For example, in this test repository, the v1.5.2 branch points to commit B, but the v1.5.2 tag points to commit A.

% git log --oneline --decorate
8060f6f (HEAD, v1.5.2, master) commit B
0e69483 (tag: v1.5.2) commit A

git checkout prefers branch names:

% git checkout v1.5.2
warning: refname 'v1.5.2' is ambiguous.
Switched to branch 'v1.5.2'
% git log --decorate --oneline -1
8060f6f (HEAD, v1.5.2, master) commit B

but git log will use the tag name:

% git log --decorate --oneline -1 v1.5.2
warning: refname 'v1.5.2' is ambiguous.
0e69483 (tag: v1.5.2) commit A

This could be confusing.
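To reproduce the test repository above locally, a script like the following could be used. It is a sketch that rebuilds the same layout in a temp directory (tag v1.5.2 on commit A, branch v1.5.2 on commit B) and prints commit subjects instead of hashes; the ambiguity warnings go to stderr and are discarded here. Fully qualified names such as refs/heads/v1.5.2 sidestep the ambiguity entirely.

```shell
#!/usr/bin/env bash
# Rebuilds the test repository described above in a temp directory:
# tag v1.5.2 -> commit A, branch v1.5.2 -> commit B.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" commit -q --allow-empty -m "commit A"
git -C "$repo" tag v1.5.2
git -C "$repo" commit -q --allow-empty -m "commit B"
git -C "$repo" branch v1.5.2

# Revision parsing tries refs/tags/<name> before refs/heads/<name>.
git -C "$repo" log -1 --format=%s v1.5.2 2>/dev/null     # commit A

# checkout treats the name as a local branch when refs/heads/<name> exists.
git -C "$repo" checkout -q v1.5.2 2>/dev/null
git -C "$repo" symbolic-ref HEAD                          # refs/heads/v1.5.2

# Fully qualified names are never ambiguous.
git -C "$repo" log -1 --format=%s refs/heads/v1.5.2       # commit B
git -C "$repo" log -1 --format=%s refs/tags/v1.5.2        # commit A
```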

Train people to delete their local branches when they see a new tag

This might be hard/awkward depending on the size of your organisation.

Write a wrapper around "git pull" and "git fetch"

That is, write a wrapper that checks if there are any tags that shadow branch names, and warn about (or delete) those branches. This sounds painful, and it could be undesirable if the shadowed branch is currently checked out.
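Such a wrapper might look roughly like this. It is a sketch, assuming a hypothetical executable named git-safe-fetch on PATH (git runs git-<name> executables as "git <name>"). It only warns rather than deleting, because of the checked-out-branch problem mentioned above. The demo at the end builds a throwaway repo; a real wrapper would instead end with `safe_fetch "$@"`.

```shell
#!/usr/bin/env bash
# Sketch of a fetch wrapper that warns about local branches shadowed by tags.
# Installed as an executable named git-safe-fetch on PATH (hypothetical name),
# it would run as "git safe-fetch [<fetch args>]".
set -euo pipefail

# Print every local branch whose name also exists under refs/tags/.
shadowed_branches() {
  git for-each-ref --format='%(refname)' refs/heads |
    sed 's#^refs/heads/##' |
    while IFS= read -r branch; do
      if git show-ref --verify --quiet "refs/tags/$branch"; then
        printf '%s\n' "$branch"
      fi
    done
}

# Fetch as usual, then warn (not delete: the branch may be checked out).
safe_fetch() {
  git fetch --tags "$@"
  shadowed_branches | while IFS= read -r branch; do
    echo "warning: tag '$branch' shadows local branch '$branch'" >&2
    echo "  if the branch is no longer needed: git branch -d '$branch'" >&2
  done
}

# Demo in a throwaway repo where branch v1.5.2 and tag v1.5.2 both exist.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"
git init -q
git commit -q --allow-empty -m "commit A"
git tag v1.5.2
git branch v1.5.2
git branch feature-x
shadowed_branches   # prints: v1.5.2
```

Wrapping `git pull` the same way would mean calling `git pull "$@"` in place of the fetch.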

Unfortunately, it sounds like the easiest way to solve this problem might be to change the way you name your branches. The link you posted uses different naming schemes for tags and branches: if you're already mostly following that method, adopting its naming scheme might be the easiest solution.
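As an illustration only (the linked scheme itself is not reproduced here), suppose the convention were: release tags look like v1.5.2, release branches live under release/1.5. A small guard could then reject branch names that collide with the tag pattern. The function name valid_branch_name and the pattern are hypothetical; `git check-ref-format --branch` is the real check for otherwise valid branch names. Wiring it into a pre-push or server-side hook is left out.

```shell
#!/usr/bin/env bash
# Hypothetical convention: tags look like v<major>.<minor>.<patch>,
# release branches live under release/<major>.<minor>.
set -euo pipefail

valid_branch_name() {
  case "$1" in
    v[0-9]*) return 1 ;;  # reserved for release tags, e.g. v1.5.2
  esac
  # Also reject anything git itself would not accept as a branch name.
  git check-ref-format --branch "$1" >/dev/null 2>&1
}

for name in release/1.5 feature/cust-data v1.5.2 'bad..name'; do
  if valid_branch_name "$name"; then
    echo "ok:       $name"
  else
    echo "rejected: $name"
  fi
done
```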
