Project Management – When to Separate a Project into Multiple Subprojects

Tags: git, mercurial, organization, project-management, version control

I'd like to know if it makes sense to divide the project I'm working on into two repositories instead of one.

Here is what I know so far:

The frontend will be written in HTML + JS.
The backend will be written in .NET.
The backend doesn't depend on the frontend, and the frontend doesn't depend on the backend.
The frontend will use a RESTful API implemented in the backend.
The frontend could be hosted on any static HTTP server.

As of now, the repository has this structure:

root:

frontend/*
backend/*

I think it's a mistake to keep both projects in the same repository. Since the two projects do not depend on each other, they should live in individual repositories, with a parent repository that includes them as submodules if needed.

I've been told that it's pointless and that we won't get any benefit from doing that.

Here are some of my arguments:

We have two modules that don't depend on each other.
Keeping the source history of both projects together may complicate things in the long term (try searching the history for a frontend change when half of the commits are completely unrelated to the bug you're looking for).
Conflicts and merging (this shouldn't happen, but someone pushing to the backend forces other developers to pull the backend changes before they can push frontend changes).
One developer might work only on the backend but will always have to pull the frontend, or the other way around.
In the long run, when it is time to deploy, the frontend could be deployed to multiple static servers while there is a single backend server. With one repository, people will be forced either to clone the whole backend along with the frontend, to write a custom script that pushes only the frontend to all servers, or to remove the backend afterwards. It is easier to push/pull only the frontend or only the backend when only one is needed.
Counterargument (one person might work on both projects): create a third repo with submodules and develop in it. History is kept separate in the individual modules, and you can always create tags where the versions of backend and frontend really do work together. Having both frontend and backend in one repo doesn't mean that they will work together; it just merges both histories into one big repo.
Having the frontend and backend as submodules will make things easier if you want to add a freelancer to the project. In some cases, you don't want to give full access to the codebase. One big repository makes it harder to restrict what "outsiders" can see or edit.
Introducing and fixing bugs: say I introduce a new bug in the frontend, and then someone fixes a bug in the backend. With one repository, rolling back to before the new bug also rolls back the backend, which could make the bug difficult to fix. I'd have to clone the backend into a different folder to keep it working while fixing the frontend bug, then try to merge things back together. With two repositories this is painless, because moving the HEAD of one repo doesn't change the other, and testing against different versions of the backend is just as easy.
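
For illustration, the parent-repository idea from the counterargument above could be sketched roughly like this with git. Everything here is an assumption made for the demo: the throwaway repositories, the directory names (frontend, backend, product), the demo identity and the tag name v0.1 are hypothetical, and a real setup would point the submodules at hosted repository URLs instead of local paths.

```shell
set -eu

# Throwaway workspace and identity so the sketch runs anywhere (hypothetical values).
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Two independent repositories, each with its own history.
for name in frontend backend; do
  git init -q "$name"
  echo "$name" > "$name/README"
  git -C "$name" add README
  git -C "$name" commit -q -m "Initial $name commit"
done

# A third, parent repository that pins one commit of each as a submodule.
git init -q product
cd product
# protocol.file.allow is only needed because this demo clones from local paths;
# recent git versions refuse local-path submodules by default.
git -c protocol.file.allow=always submodule --quiet add "$work/frontend" frontend
git -c protocol.file.allow=always submodule --quiet add "$work/backend" backend
git commit -q -m "Pin frontend and backend versions that work together"

# Tag the combination that is known to work in sync.
git tag -a v0.1 -m "frontend and backend verified together"
```

The parent repository stores only the two pinned commit IDs plus a .gitmodules file, so the histories of frontend and backend stay separate.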

Can someone give me more arguments to convince them, or at least tell me why it is pointless (or more complicated) to divide the project into two submodules? The project is new and the codebase is only a couple of days old, so it's not too late to fix.

Best Answer

At my company, we use a separate SVN repository for every component of the system. I can tell you that it gets extremely frustrating. Our build process has so many layers of abstraction.

We do this with Java, so we have a heavy build process with javac compilation, JibX binding compilation, XML validation, etc.

For your site, it may not be a big deal if you don't really "build it" (such as vanilla PHP).

Downsides to splitting a product into multiple repositories

Build management - I can't just check out the code, run a self-contained build script and have a runnable / installable / deployable product. I need an external build system that goes out to multiple repos, runs multiple inner build scripts, then assembles the artifacts.
Change tracking - Seeing who changed what, when, and why. If a bug fix in the frontend requires a backend change, there are now 2 divergent paths for me to refer back to later.
Administration - do you really want to double the number of user accounts, password policies, etc. that need to be managed?
Merging - New features are likely to change a lot of code. By splitting your project into multiple repositories, you are multiplying the number of merges needed.
Branch creation - Same deal with branching: to create a branch, you now have to create one in each repository.
Tagging - after a successful test of your code, you want to tag a version for release. Now you have multiple tags to create, one in each repository.
Hard to find something - Maybe frontend/backend is straightforward, but it becomes a slippery slope. If you split into enough modules, developers may have to investigate where some piece of code lives in source control.
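
To make the branching and tagging points concrete, here is a minimal sketch of what cutting one release looks like once the code lives in more than one repository. It uses git purely for illustration (my own setup is SVN), and the repository names, the release/ branch prefix and the version number are hypothetical.

```shell
set -eu

# Throwaway workspace and identity so the sketch runs anywhere (hypothetical values).
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repos="frontend backend"

# Stand-ins for the separate component repositories.
for repo in $repos; do
  git init -q "$repo"
  echo "$repo" > "$repo/README"
  git -C "$repo" add README
  git -C "$repo" commit -q -m "Initial $repo commit"
done

release=v1.0.0

# One release now means one branch and one tag per repository.
for repo in $repos; do
  git -C "$repo" branch "release/$release"
  git -C "$repo" tag -a "$release" -m "Release $release"
done
```

With two repositories the loop is short, but every extra repository adds another branch and another tag that have to be created, named consistently and kept in step.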

My case is a bit extreme, as our product is split across 14 different repos and each repo is then divided into 4-8 modules. If I remember correctly, we have somewhere around 80 or so "packages" which all need to be checked out individually and then assembled.

Your case with just backend/frontend may be less complicated, but I still advise against it.

Extreme examples can be compelling arguments for or against pretty much anything :)

Criteria I would use to decide

I would consider splitting a product into multiple source code repositories after considering the following factors:

Build - Do the results of building each component merge together to form a product? Like combining .class files from a bunch of components into a series of .jar or .war files.
Deployment - Do you end up with components that get deployed together as one unit or different units that go to different servers? For example, database scripts go to your DB server, while javascript goes to your web server.
Co-change - Do they tend to change frequently or together? In your case, they may change separately, but still frequently.
Frequency of branching/merging - if everybody checks into trunk and branches are rare, you may be able to get away with it. If you frequently branch and merge, this may turn into a nightmare.
Agility - if you need to develop, test, release and deploy a change on a moment's notice (likely with SaaS), can you do it without spending precious time juggling branches and repos?

Your arguments

I also don't agree with most of your arguments for this splitting. I won't dispute them all because this long answer will get even longer, but a few that stand out:

We have two modules that don't depend on each other.

Nonsense. If you take your backend away, will your frontend work? That's what I thought.

Keeping the source history of both projects together may complicate things in the long term (try searching the history for a frontend change when half of the commits are completely unrelated to the bug you're looking for).

If your project root is broken into frontend/ and backend/, then you can look at the history of those hierarchies independently.
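
As a rough sketch of what that looks like in git (one of the two tools you tagged), the log can be limited to a path. The throwaway repository, file names and commit messages below are made up for the demo; only the frontend/ and backend/ layout comes from your question.

```shell
set -eu

# Throwaway repository and identity so the sketch runs anywhere (hypothetical values).
repo=$(mktemp -d)
cd "$repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q .
mkdir frontend backend

# Interleave frontend and backend commits in a single history.
echo "<h1>hi</h1>" > frontend/index.html
git add frontend
git commit -q -m "frontend: add index page"

echo "// api stub" > backend/Api.cs
git add backend
git commit -q -m "backend: add API stub"

echo "<h1>hello</h1>" > frontend/index.html
git commit -q -am "frontend: fix greeting"

# Limiting the log to a path shows only the commits that touched that subtree.
frontend_log=$(git log --format=%s -- frontend/)
backend_log=$(git log --format=%s -- backend/)

printf 'frontend/:\n%s\n\nbackend/:\n%s\n' "$frontend_log" "$backend_log"
```

The backend commit never shows up in the frontend listing, even though both live in the same repository.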

Conflicts and merging (this shouldn't happen, but someone pushing to the backend forces other developers to pull the backend changes before they can push frontend changes). One developer might work only on the backend but will always have to pull the frontend, or the other way around.

Splitting your project into different repos doesn't solve this. A frontend conflict and a backend conflict still leave you with 2 conflicts, whether it's 1 repository times 2 conflicts or 2 repositories times 1 conflict. Somebody still needs to resolve them.

If the concern is that 2 repos means a frontend dev can merge frontend code while a backend dev merges backend code, you can still do that with a single repository using SVN. SVN can merge at any level. Maybe that is a git or mercurial limitation (you tagged both, so not sure what SCM you use)?

On the other hand

With all this said, I have seen cases where splitting a project into multiple modules or repositories works. I even advocated for it once for a particular project where we integrated Solr into our product. Solr of course runs on separate servers, only changes when a changeset is related to search (our product does much more than search), has a separate build process and there are no code artifacts or build artifacts shared.
