(Author's Note: This answer refers to RHEL 6 and prior versions. RHEL 7 now has a fully supported upgrade path from RHEL 6, the details of which are at the end.)
To start, I should note that there are two ways to do the in-place upgrade:
- Drop in the installation DVD (or use the DVD image via iLO/iDRAC), boot from it and choose Upgrade, e.g. linux upgradeany.
- Update the redhat-release RPM manually, run yum distro-sync (this is oversimplified a bit) and reboot.
Method 1 is merely unsupported. Method 2 is for Real Cowboys. In addition to the recommended fresh installs, I have done both of these...
Do I need support?
Support has two complementary meanings in our world. The first is that a product has a given feature (e.g. "Postfix supports SMTP"). The second is that the vendor will talk to you about it. Which definition is meant is not always clear from context.
To accomplish a task, you obviously need support in the first sense. Where vendor support comes in is to assist you in resolving issues and giving the vendor feedback as to what features need to exist or be improved. Many sites pay a fortune for vendor support when they have the in-house expertise to resolve any issues that may arise, faster and even cheaper than the vendor could. Whether to buy vendor support is ultimately a business decision you will have to make (or advise management on).
Why not do an in-place upgrade?
This is what Red Hat says about it:
Red Hat does not support in-place upgrades between any major versions of Red Hat Enterprise Linux. A major version is denoted by a whole number version change. For example, Red Hat Enterprise Linux 5 and Red Hat Enterprise Linux 6 are both major versions of Red Hat Enterprise Linux.
In-place upgrades across major releases do not preserve all system settings, services or custom configurations. Consequently, Red Hat strongly recommends fresh installations when upgrading from one major version to another.
They further warn:
However, note the following limitations before you choose to upgrade your system:
- Individual package configuration files may or may not work after performing an upgrade due to changes in various configuration file formats or layouts.
- If you have one of Red Hat's layered products (such as the Cluster Suite) installed, it may need to be manually upgraded after the Red Hat Enterprise Linux upgrade has been completed.
- Third party or ISV applications may not work correctly following the upgrade.
Of course, they then describe how to do an in-place upgrade via method 1, just in case you really want to do it. The feature exists and Red Hat puts development time into it, so it is supported in that the feature exists. But if something goes wrong, Red Hat will tell you to install fresh; they will not provide vendor support for things that break as a result of the upgrade.
For the record, I've never actually had a problem with an in-place upgrade of a RHEL/CentOS or Fedora system that I couldn't resolve myself. The typical problems come from renamed packages, third party repositories and the occasional version mismatch between the i386 and x86_64 architectures of a package. The installer is a bit better at handling these than yum, I think.
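The i386/x86_64 version mismatches are easy to spot before you start. Below is a minimal sketch, assuming you feed it the output of rpm -qa with a "name version-release arch" query format; it flags packages installed for both a 32-bit and the x86_64 architecture at different versions. The function names are hypothetical, not part of any Red Hat tool, and the script assumes Python 3.9 or later, so on an old RHEL host you may prefer to save the rpm output and analyse it elsewhere.

```python
import subprocess
from collections import defaultdict

# rpm query producing one "name version-release arch" line per installed package.
RPM_QUERY = ["rpm", "-qa", "--qf", "%{NAME} %{VERSION}-%{RELEASE} %{ARCH}\n"]


def query_installed_packages() -> str:
    """Run rpm on the host being checked and return its raw output."""
    return subprocess.run(RPM_QUERY, capture_output=True, text=True, check=True).stdout


def find_multiarch_mismatches(rpm_output: str) -> dict[str, dict[str, set[str]]]:
    """Return {name: {arch: {versions}}} for packages whose 32-bit and
    x86_64 builds are installed at different versions."""
    by_name: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for line in rpm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, version_release, arch = parts
        by_name[name][arch].add(version_release)

    mismatches = {}
    for name, arches in by_name.items():
        x86_64 = arches.get("x86_64", set())
        # 32-bit packages are tagged i386 or i686 depending on the release.
        x86_32 = arches.get("i386", set()) | arches.get("i686", set())
        if x86_64 and x86_32 and x86_64 != x86_32:
            mismatches[name] = {arch: set(v) for arch, v in arches.items()}
    return mismatches
```

Usage: find_multiarch_mismatches(query_installed_packages()) on the host itself. noarch packages and single-architecture packages are deliberately ignored.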
How should I upgrade?
I generally warn people that they should plan on a maintenance window every 3-4 years to update RHEL systems from one major version to the next. While upgrades generally go smoothly, the unexpected can always happen.
For both of your environments, I expect an in-place upgrade would work, though I strongly recommend testing it thoroughly first. P2V a representative sample of the servers and run through the in-place upgrade on the virtual systems to see what problems you're going to run into. You can then plan the actual production upgrade based on better knowledge of what will happen.
For a large deployment such as you have here, consider using Limoncelli's "one-some-many" approach. Upgrade one machine, see what problems occur and solve them, then apply the lessons learned when upgrading a small batch of machines, learn from that batch in turn, and when you believe you have all the kinks worked out, upgrade large batches of them.
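The one-some-many approach can be expressed as a simple wave schedule. This is a hypothetical Python helper (the name one_some_many and the batch sizes are my own choices, not taken from Limoncelli); it only decides which hosts go into each wave, leaving the actual upgrade and the lessons-learned review between waves to you.

```python
from typing import Iterator


def one_some_many(hosts: list[str], some: int = 5, growth: int = 4) -> Iterator[list[str]]:
    """Yield upgrade waves: one host, then a small batch, then ever-larger batches.

    `some` is the size of the first small batch and `growth` the factor by which
    each later batch grows; both are illustrative defaults, not recommendations.
    """
    if some < 1 or growth < 2:
        raise ValueError("some must be >= 1 and growth >= 2")
    start, wave_index, size = 0, 0, 1
    while start < len(hosts):
        yield hosts[start:start + size]
        start += size
        wave_index += 1
        # Wave 0 is "one", wave 1 is "some", later waves grow toward "many".
        size = some if wave_index == 1 else size * growth
```

Because it is a generator, you can stop after each wave, fix whatever broke, and only then ask for the next batch. With 30 hosts and the defaults, the waves are 1, 5, 20 and 4 hosts.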
At a time like this, I also recommend taking a long hard look at your application deployment process. If it isn't sufficiently automated that you can kick it off with a single command and be reasonably sure that the app will be deployed correctly, then perhaps the developers need to get to work on that. Having such a deployment process would make it much easier to do a fresh installation of the newer version of EL and then deploy onto it.
Will switching distributions help?
Debian-based distributions do have a supported in-place upgrade method, and it mostly works, but it is not immune from problems. Lots of things broke for people upgrading from Ubuntu 10.04 LTS to 12.04 LTS via the supported method, for instance. It's not clear that Debian or Canonical are putting a sufficient amount of development time into "supporting" this feature, i.e., making sure it works. And you still actually have to buy vendor support for this distribution if you want someone to hold your hand. So I doubt you will gain much from switching to such a distribution.
You may gain by switching to a rolling-release distribution such as Gentoo or Arch. However, this also doesn't make you immune to problems; it just means you have to deal with the upgrade problems continuously over the life of the server (e.g. whenever you or the developers decide to update something on the system), rather than all at once at a well-planned distribution upgrade time. You also have no vendor to provide support.
What does the future hold?
The Fedora Project is working on a tool to improve in-place upgrades. They had a tool called preupgrade, which was abandoned and replaced with a new tool called fedup beginning with Fedora 18. A fedup-based tool was added for RHEL 7, and in-place upgrades now have full support, at least from RHEL 6 to RHEL 7. From my own experience I can say that while fedup still has some kinks, it is shaping up to be a very useful tool.
CentOS is also experimenting with a rolling-release type of repository, but it only applies between minor versions (e.g. 6.3-6.4).
Will a pre-opened communication socket between VMs omit any steps in the described list?
A pre-opened socket between VMs/containers will help, because it avoids the TCP handshake overhead on every exchange, and even more so if TLS is involved.
Handshake overhead is generally considered negligible, but with frequent communication it starts to play a significant role.
Keeping M x N connections permanently open in a mesh-like container network is not very wise, though. A simple keep-alive solution with a TTL based on your own containers' communication statistics will work better.
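A keep-alive with TTL could look roughly like this minimal Python sketch: one connection per peer is reused while traffic keeps flowing and closed once it has been idle longer than the TTL. The class name KeepAlivePool and the injectable connect/clock hooks are hypothetical; the TTL value itself is what you would derive from your traffic statistics.

```python
import socket
import time
from typing import Callable


class KeepAlivePool:
    """Reuse one TCP connection per peer; close it after `ttl` idle seconds."""

    def __init__(
        self,
        ttl: float,
        connect: Callable[[tuple[str, int]], socket.socket] = socket.create_connection,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl
        self._connect = connect
        self._clock = clock
        # peer address -> (socket, time it was last used)
        self._conns: dict[tuple[str, int], tuple[socket.socket, float]] = {}

    def get(self, addr: tuple[str, int]) -> socket.socket:
        """Return an open connection to `addr`, connecting only on a miss."""
        self.evict_idle()
        entry = self._conns.get(addr)
        sock = entry[0] if entry else self._connect(addr)  # handshake only here
        self._conns[addr] = (sock, self._clock())  # refresh last-used time
        return sock

    def evict_idle(self) -> None:
        """Close connections that have been idle longer than the TTL."""
        now = self._clock()
        for addr, (sock, last_used) in list(self._conns.items()):
            if now - last_used > self.ttl:
                sock.close()
                del self._conns[addr]
```

A TTL slightly above the typical gap between requests to the same peer keeps busy paths warm while letting rarely used mesh links close. The sketch does not detect a peer that has already closed its end; real code would need to handle that on send/receive errors.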
Keep in mind that too many threads holding TCP connections open add an overhead of their own, so make sure you use epoll.
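Instead of one thread per kept-alive connection, a single thread can watch all of them through epoll. A minimal Python sketch using the standard selectors module (DefaultSelector resolves to EpollSelector on Linux); echoing the data back is only a placeholder for real request handling, and the function names are hypothetical.

```python
import selectors
import socket


def make_selector(socks: list[socket.socket]) -> selectors.BaseSelector:
    """Register many connections with one selector instead of one thread each."""
    sel = selectors.DefaultSelector()  # EpollSelector on Linux
    for s in socks:
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)
    return sel


def serve_once(sel: selectors.BaseSelector, timeout: float = 1.0) -> int:
    """Handle one round of ready sockets; return how many were serviced."""
    serviced = 0
    for key, _events in sel.select(timeout):
        sock: socket.socket = key.fileobj
        data = sock.recv(4096)
        if data:
            # Placeholder handling; a full server would buffer writes instead of
            # assuming sendall() on a non-blocking socket never hits a full buffer.
            sock.sendall(data)
        else:
            sel.unregister(sock)  # peer closed the connection
            sock.close()
        serviced += 1
    return serviced
```

Calling serve_once() in a loop gives one thread that services every registered connection, which is the point: the cost of an idle connection becomes a kernel data structure entry rather than a thread.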
Does SDN somehow mitigate such problems or does it add even more overlays and extra headers to packets?
It does add more overlays, and many are vendor-locked, but there is at least one SDN-related solution for Docker environments, pipework, described below.
Do I really need the described process to communicate between VM-1 and VM-2, or is there a special Linux "less-secure-more-performance-use-on-your-own-risk" build?
I didn't find a "special" Linux build with a trustworthy enough community and update support, but problems with the slow Linux TCP stack are not new, and there are many options for kernel bypass. Cloudflare, for example, does that.
From the articles I found, the slowness of the Linux TCP stack is well known, and there is no option to just drop in a Linux server and win: you have to fine-tune Torvalds' child to solve your particular problem one way or another every time.
Do I have to stick with Linux at all? Are there any faster *BSD-like systems with Docker support?
I have found no evidence that Windows, macOS or any *BSD-like system has better networking than the latest Linux with kernel bypass applied to its slow TCP stack.
What are the best practices for mitigating that bottleneck so that more VMs with micro-services fit on the same host?
So, there are two bottlenecks: the guest Linux and the host Linux.
For the host Linux, if it is used for more than just hosting containers, there is the kernel bypass strategy, with a wide variety of options ranging from those described on the Cloudflare blog and in the "Why do we use the Linux kernel's TCP stack?" article to writing your own application-focused TCP stack.
For the guest Linux, Macvlan may be used to bypass Layer 3 and connect a Docker container directly to the NIC with its own MAC address. It is much better than a bridge, because it mitigates a lot of both the guest and host Linux network bottlenecks, but make sure you are ready to fill your router's MAC address table with another hundred or thousand entries; most likely you will have to segment your network.
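Setting up such a network amounts to one docker network create call with the macvlan driver. A minimal Python sketch that builds and runs that command; the network name, parent interface, subnet and gateway in the usage example are hypothetical placeholders, while -d macvlan, --subnet, --gateway and -o parent= are the options Docker documents for this driver.

```python
import ipaddress
import subprocess


def macvlan_network_cmd(name: str, parent: str, subnet: str, gateway: str) -> list[str]:
    """Build a `docker network create` command for a macvlan network.

    Containers attached to it get their own MAC address on `parent`'s segment,
    so every container becomes one more entry in the switch/router MAC table.
    """
    net = ipaddress.ip_network(subnet)  # raises ValueError on a malformed subnet
    if ipaddress.ip_address(gateway) not in net:
        raise ValueError(f"gateway {gateway} is outside {subnet}")
    return [
        "docker", "network", "create",
        "-d", "macvlan",
        f"--subnet={net}",
        f"--gateway={gateway}",
        "-o", f"parent={parent}",
        name,
    ]


def create_macvlan_network(name: str, parent: str, subnet: str, gateway: str) -> None:
    """Run the command; requires Docker and sufficient privileges on the host."""
    subprocess.run(macvlan_network_cmd(name, parent, subnet, gateway), check=True)
```

For example, macvlan_network_cmd("macnet", "eth0", "192.168.10.0/24", "192.168.10.1"). Sizing the subnet per segment is also a rough way to cap how many MAC entries a single host can add to that segment.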
Also, as per the "Virtual switching technologies and Linux bridge" presentation, there is an SR-IOV option, which is even better than Macvlan. It is available for Docker 1.9+ as a plugin for Mellanox Ethernet adapters, is included as a mode in pipework SDN, and has a dedicated SR-IOV plugin from Clear Containers; that is more than enough to start digging into an application-focused solution.
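Before digging into SR-IOV plugins, it helps to know which NICs on the host support it at all. On Linux, SR-IOV capable PCI devices expose sriov_totalvfs and sriov_numvfs under /sys/class/net/<interface>/device/. A minimal Python sketch (Linux only; the function name is hypothetical) that lists them:

```python
from pathlib import Path


def sriov_capable_interfaces(
    sys_class_net: Path = Path("/sys/class/net"),
) -> dict[str, tuple[int, int]]:
    """Map interface name -> (configured VFs, maximum VFs) for SR-IOV capable NICs."""
    result = {}
    for iface in sorted(sys_class_net.iterdir()):
        total = iface / "device" / "sriov_totalvfs"
        if not total.is_file():
            continue  # loopback, bridges, veths and non-SR-IOV NICs lack this file
        numvfs = iface / "device" / "sriov_numvfs"
        configured = int(numvfs.read_text()) if numvfs.is_file() else 0
        result[iface.name] = (configured, int(total.read_text()))
    return result
```

An interface reporting (0, 63), for instance, supports SR-IOV but has no virtual functions enabled yet; VFs are normally enabled by writing the desired count to sriov_numvfs as root.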
Do solutions like Project Calico help, or is that more about the lower level?
It is a different level entirely and will not have a significant impact compared with SR-IOV and Macvlan, but such solutions help simplify network management with almost no overhead on top of whichever bypass solution you choose.
And yes...
Monitor your Docker closely, as it may do unexpected things. For example, it modprobes nf_nat and xt_conntrack even when there is no reason to with Macvlan turned on, which leads to extra CPU ticks being spent.
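One cheap way to keep an eye on this is to check /proc/modules, where the first field of each line is a loaded module's name (lsmod reads the same file). A minimal Python sketch, assuming you run it periodically from your own monitoring; the watched module names come from the text above, and on some older kernels the NAT module may carry a different name (e.g. a per-family variant), so adjust the list to your kernel.

```python
from pathlib import Path

# Modules Docker may load even when Macvlan makes them unnecessary.
WATCHED = ("nf_nat", "xt_conntrack")


def loaded_modules(proc_modules: str) -> set[str]:
    """Parse /proc/modules content; the first field of each line is the module name."""
    return {line.split()[0] for line in proc_modules.splitlines() if line.strip()}


def unexpected_netfilter_modules(proc_modules: str, watched: tuple[str, ...] = WATCHED) -> list[str]:
    """Return the watched modules that are currently loaded, in `watched` order."""
    loaded = loaded_modules(proc_modules)
    return [m for m in watched if m in loaded]


def check_host(path: Path = Path("/proc/modules")) -> list[str]:
    """Read the live module list on a Linux host and report watched modules."""
    return unexpected_netfilter_modules(path.read_text())
```

A non-empty result from check_host() on a Macvlan-only host is a hint that Docker (or something else) pulled in netfilter connection tracking you are not using.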