C++ – Thoughts on web-development architecture: integrating C++ into a web application in the future

architecture · c++ · implementations · web-development

I'm looking to build a website (it's actually going to be a commercial startup). I saw this question and it really shed some light on a few things that I was hoping to understand (kudos to the OP). After seeing that, it would make sense that, unless the website actually had to handle millions of hits per day, writing a C++ backend on the server side wouldn't be a viable solution.

But this got me thinking.

What if, in the (unlikely) event, it does go that route in the future? The problem is that I'm thinking of starting this all using .NET (in the beginning) just to get something quick and easy up without a lot of hassle (in terms of learning), and then moving towards something more open source (such as Python/Django or RoR) later to save money and to support OSS. I'm wondering, if the website actually becomes big, whether it would be a good idea to integrate a C++ backend, use Python on top of C++ for a strong foundation, and then layer HTML/CSS/AJAX/etc. on top of that foundation. I guess what I'd like to know is: given the circumstances, if this were to happen, would it be a proper approach in terms of architecture? I'd definitely be using MVC, as that seems to be a great way to implement a website.

All in all, would one consider this rational, or are there other alternatives? I like .NET, and I'd like to use it in the beginning because I have much more experience with it than, say, Python or PHP, and I prefer it in general, but I really do want to support OSS in the future. I suppose the question I'm really asking is: "is this pragmatic?"

Best Answer

No, because it's more effective to scale horizontally than vertically

Meaning, it's better to design an architecture where you can spread requests across a number of less powerful machines, as opposed to building one super-powerful machine running code that has been optimized at a lower level.

The drive behind horizontal scaling comes from the 'nature' of the web: websites are very I/O-heavy, not compute-heavy. There are exceptions, such as YouTube, which encodes massive amounts of data, but that can be dealt with by setting up dedicated compute nodes and an effective task-management system.
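As a rough illustration of that pattern, here's a minimal Python sketch of handing expensive work to a background worker via a task queue. The encode_video() job is hypothetical and the in-process queue stands in for a real broker (Celery/RabbitMQ or similar); the point is only that the web layer enqueues and returns instead of encoding inline.

```python
# Sketch: offload compute-heavy work to a worker instead of the web request.
# encode_video() and the job name are hypothetical placeholders.
import multiprocessing

def encode_video(path):
    # Placeholder for the expensive work a dedicated compute node would do.
    print(f"encoding {path} ...")

def worker(jobs):
    while True:
        path = jobs.get()
        encode_video(path)
        jobs.task_done()

if __name__ == "__main__":
    jobs = multiprocessing.JoinableQueue()
    multiprocessing.Process(target=worker, args=(jobs,), daemon=True).start()
    jobs.put("upload_1234.mp4")  # the 'web layer' only enqueues and moves on
    jobs.join()                  # wait here only so the demo doesn't exit early
```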

For instance, you could implement a reverse-proxy server that does nothing but distribute incoming requests to other servers that fetch the content. It doesn't matter where the data is returned from, as long as there is a common origin for requests (and even that can be extended using advanced DNS).
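Here's a minimal sketch of that idea in Python, round-robining requests across two hypothetical backends on ports 8081 and 8082. It's illustrative only; in practice you'd use nginx, HAProxy, or a cloud load balancer rather than rolling your own.

```python
# Sketch: a tiny round-robin reverse proxy. Backend addresses are hypothetical.
import itertools
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKENDS = itertools.cycle(["http://127.0.0.1:8081", "http://127.0.0.1:8082"])

class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = next(BACKENDS)
        # Forward the request to the chosen backend and relay its response.
        with urllib.request.urlopen(backend + self.path) as upstream:
            body = upstream.read()
            self.send_response(upstream.status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ProxyHandler).serve_forever()
```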

At the very minimum, it's common to split the 'data layer' (e.g. the database) from the 'web layer' (the HTTP server). Even Stack Overflow has a clear separation between the two, implementing C# on the 'web layer' and a separate REST API (also C#?) to access the database.
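A sketch of that separation, assuming a hypothetical internal data-layer service at http://data-layer.internal/: the web layer only speaks HTTP/JSON to it and never opens a database connection itself.

```python
# Sketch: the 'web layer' asks a separate 'data layer' REST service for data.
# The URL and JSON shape below are hypothetical.
import json
import urllib.request

DATA_API = "http://data-layer.internal/api/questions/"  # hypothetical service

def render_question(question_id):
    with urllib.request.urlopen(DATA_API + str(question_id)) as resp:
        question = json.load(resp)
    # The web layer only formats what the data layer returned.
    return f"<h1>{question['title']}</h1><p>{question['body']}</p>"
```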

Even when you need to tap into the raw computing power of C, many 'glue languages' like Python have the built-in capability to call C/C++ extensions.
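For example, Python's standard-library ctypes module can load a compiled shared library and call into it directly; the library name libhot.so and its heavy_compute() function below are hypothetical stand-ins for whatever hot path you'd move to C/C++.

```python
# Sketch: Python as the 'glue', native code for the hot path.
import ctypes

lib = ctypes.CDLL("./libhot.so")  # hypothetical compiled C/C++ library
lib.heavy_compute.argtypes = [ctypes.c_double]
lib.heavy_compute.restype = ctypes.c_double

def heavy_compute(x):
    # The expensive loop runs in native code; Python just marshals arguments.
    return lib.heavy_compute(x)
```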

There is a lot of ground-breaking work being done in the field of scalability, and fortunately the people breaking new ground like to share the specifics of their approach.

Basically, it comes down to:

Architecture trumps raw computing power when it comes to the web

HighScalability.com is my favorite site to read about scalable architecture development, but there are plenty more to be found on the web.


Aside:

One of the greatest bottlenecks in handling web requests is the architecture of the HTTP server. Traditional servers like Apache fork a sub-process for every request. That works, but every sub-process added to the stack needs its own pool of memory and inevitably increases the overall task-switching overhead on the server.
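To make the cost concrete, here's a deliberately naive fork-per-request server sketch in Python (Unix-only); every accepted connection pays for a whole child process, which is exactly the per-request memory and context-switching overhead described above.

```python
# Sketch: the fork-per-request model, one child process per connection.
import os
import signal
import socket

signal.signal(signal.SIGCHLD, signal.SIG_IGN)  # let the OS reap finished children

with socket.socket() as srv:
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 8080))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        if os.fork() == 0:        # child: serve this one request, then exit
            conn.recv(4096)
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
            conn.close()
            os._exit(0)
        conn.close()              # parent: drop its copy, go accept the next client
```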

A DDoS attack takes advantage of this weakness by overloading the HTTP server to the point where it can no longer handle the load and crashes.

Multi-threading has been introduced as a stopgap measure, but 'good' multi-threaded software is hard to write and multi-threading bugs are notoriously hard to pinpoint.

Another 'school of thought' that has become popular recently is event-driven web servers like Node.js and nginx. They rely on a simpler single-threaded model and a programming style that avoids blocking on function returns by using a fire-and-forget (callback) model. It's more difficult to program in this manner, but as long as heavy compute tasks are passed on to servers specialized for them, the numbers (i.e. hits/sec) they can handle, even on commodity hardware, are impressive.
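The same single-threaded, event-driven idea can be sketched with Python's asyncio (standing in here for Node.js/nginx): one event loop multiplexes many connections, and anything slow is awaited instead of occupying a thread or process per request.

```python
# Sketch: an event-driven, single-threaded server using asyncio.
import asyncio

async def handle(reader, writer):
    await reader.readline()   # read the request line; ignore the rest for brevity
    await asyncio.sleep(0)    # stand-in for awaiting real I/O (database, upstream API)
    writer.write(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, "0.0.0.0", 8080)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```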
