Maintaining your own distribution is a lot of work. Even if you maintain the backports, you will soon be overwhelmed by security issues to fix, and have to pull low-level libraries to keep updating your software, which might break other things (I maintain servers running 6-year-old distros, it's not fun).
Upgrading is generally a good solution. do-release-upgrade is well made, and you should be able to upgrade without issues (especially if you only used official packages).
My favourite solution though might be the reinstall path. More specifically, your servers should be managed using a configuration management system such as Puppet, Cfengine or Chef. If all your configuration/package needs are specified using such a tool and your data are safe on a separate partition, it's much easier to reinstall quickly. You just install a new distribution without erasing the data partitions, and then run the configuration management tool to reset your packages/configurations. I believe this is the cleanest way to do it, especially if you have several servers to manage.
If you are using non-official packages, you might want to identify them before you upgrade/reinstall. maintenance-check can help you identify the packages that are not officially maintained by Ubuntu:
$ bzr branch lp:ubuntu-maintenance-check
$ cd ubuntu-maintenance-check
$ ./maintenance-check -f n
If you want to reinstall, you can also export the list of installed packages:
$ dpkg --get-selections > myinstall.txt
and your debconf database:
$ debconf-get-selections > debconf.txt # from the debconf-utils package
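On the freshly installed system, both exports can be replayed. What follows is only a minimal sketch: it assumes myinstall.txt and debconf.txt were copied into the current directory and that the package lists are up to date. The function names are made up for illustration. missing_packages compares two selection files, so you can check what the new install still lacks before restoring anything.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, file names come from the exports above.

# Print the names of packages marked "install" in a dpkg selections file.
selected_packages() {
    awk '$2 == "install" { print $1 }' "$1" | sort
}

# Print packages selected in the old export ($1) but not in the new one ($2).
missing_packages() {
    old=$(mktemp) && new=$(mktemp) || return 1
    selected_packages "$1" > "$old"
    selected_packages "$2" > "$new"
    comm -23 "$old" "$new"      # lines that appear only in the old list
    rm -f "$old" "$new"
}

# Replay both exports. Must be run as root on the new system.
restore_selections() {
    debconf-set-selections debconf.txt      # preseed answers before installing
    dpkg --set-selections < myinstall.txt   # mark packages (dpkg may warn about names it does not know)
    apt-get dselect-upgrade                 # install whatever is now marked
}
```

For example, after running dpkg --get-selections > fresh.txt on the new machine, missing_packages myinstall.txt fresh.txt lists what restore_selections would still have to pull in.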
As a note, since you're currently using Karmic, upgrading to Lucid might not be too painful. Lucid is an LTS release, still supported until 2015 for the main server packages. This should leave you enough time to set up a viable automated installation for the future.
When you ask about Launchpad packages, I suppose you mean PPAs. There are tons of different PPAs. Some are experimental, some are stable. Some are maintained by official Ubuntu developers, some are maintained by people who hardly know how to build a package properly. It's hard to say in general whether packages you find on PPAs are good; there's no general rule. The best hint in this case might be to look at the owner of the PPA to get an idea of the likely quality of their packages.
You're in the process of moving from "SMB-management" to "Enterprise management", which can be exciting enough in itself.
Most companies implement some notion of a maintenance window: a period of time during which a given server is allowed to restart and/or perform maintenance tasks. With some careful planning, such as placing domain controllers/DNS servers in separate maintenance window groups (same with cluster nodes), you should be able to design server groups that each get a different maintenance window policy. Some companies use system management tools such as Microsoft System Center Configuration Manager to control both the maintenance windows and patch management, but I know a lot of large companies that just rely on WSUS and control policies using GPO or the registry. For one customer we built GPOs with AD group filtering, so that sysadmins simply had a "day of week group" they could add their servers to. Servers in the "Monday" group would get patched every Monday at 2330, and so on.
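The "day of week group" rule itself is simple enough to sketch. This is not how WSUS or GPO implement it, just the scheduling logic written out as a small shell function. The function name and the 30-minute window length (2330 to 2359) are assumptions made for illustration; only the 2330 start time comes from the setup described above.

```shell
#!/bin/sh
# Sketch only: in_patch_window is a hypothetical helper.
# Usage: in_patch_window GROUP [DAY] [HHMM]
# Succeeds when DAY equals GROUP and HHMM lies between 2330 and 2359.
# DAY and HHMM default to the current day name and time.
in_patch_window() {
    group=$1
    day=${2:-$(LC_ALL=C date +%A)}   # e.g. "Monday"
    hhmm=${3:-$(date +%H%M)}         # e.g. "2345"

    [ "$group" = "$day" ] || return 1

    case $hhmm in
        23[3-5][0-9]) return 0 ;;    # 2330..2359 (assumed window length)
        *)            return 1 ;;
    esac
}
```

A server in the "Monday" group would then run something like in_patch_window Monday && start its patch job, and do nothing on any other day or time.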
So, there's a lot of tooling out there, but the first thing to do is to realize you're now in the enterprise management business and plan accordingly.
Why not look at the concurrent usage of your system historically & determine what times of the day usage is at its lowest? Then stick your change right in the middle of that low usage period.
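As a rough sketch of that analysis, assume you can export historical usage as one sample per line in the form "HH count" (hour of day, then concurrent users). That format and the function name are hypothetical; adapt them to whatever your monitoring actually records.

```shell
#!/bin/sh
# Sketch only: quietest_hour is a hypothetical helper.
# Input file: one sample per line, "HH count" (hour of day, concurrent users).
# Prints the hour with the lowest average number of concurrent users.
quietest_hour() {
    awk '
        { sum[$1] += $2; n[$1]++ }          # accumulate samples per hour
        END {
            best = ""
            for (h in sum) {
                avg = sum[h] / n[h]
                if (best == "" || avg < min) { min = avg; best = h }
            }
            print best
        }
    ' "$1"
}
```

If several hours tie for the lowest average, which one is printed is not defined, so look at the neighbouring hours too before picking the middle of the quiet period.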
When working out how long the change will take, include pre/post implementation testing and production verification testing. In addition, work out how long the change will take to roll back if any testing fails.
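Put as arithmetic, the window you request has to cover the worst case: every step runs, the last test fails, and you then roll back. A small sketch, with a made-up function name and made-up durations in minutes:

```shell
#!/bin/sh
# Sketch only: window_minutes is a hypothetical helper.
# Usage: window_minutes PRE_TEST IMPLEMENT POST_TEST VERIFY ROLLBACK
# Prints the worst-case window length in minutes.
window_minutes() {
    echo $(( $1 + $2 + $3 + $4 + $5 ))
}
```

For instance, window_minutes 15 45 20 30 40 covers 15 minutes of pre-testing, 45 of implementation, 20 of post-testing, 30 of production verification and 40 of rollback.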
IMHO your 'first users' shouldn't be guinea pigs. Having live users effectively do your production verification testing is not a good thing. It destroys the end users' confidence, and unexpected outcomes can mess up production, which means you not only have to roll back the change but also undo any 'damage' the change may have caused.
I don't know of any research papers, but take a look at any IT Service Management (ITSM) framework such as ITIL; you will find lots of standards & best practice on software release management. All systems are different, so how many of the practices you adopt, and how formally, depends on yours. ITSM standards have big systems in mind.