In short: Of all the installed (rpm) packages, I would like to identify the ones that have not been used (for example, in the last 6 months).
In long: I have a number of machines with a respectable service record. Every time I upgrade from one release to another, I'm surprised how well the upgrade procedure goes.
However, over the years many packages were installed (via yum), a number of which I know are no longer used. I want to get rid of these, as they have a negative impact on resource usage and the overall security of the system.
I'm looking for the best method to find unused packages.
One way would be to manually sift through the installed packages. That method works and I learn a lot, but it's extremely time-consuming.
So I'm looking for an automated way to identify unused packages so I can clean them manually.
I guess one way forward would be to monitor all files used on a server, link them to packages, and see what's left over. Is there anything available for this purpose?
Are there more inventive ways to accomplish this?
Best Answer
Given the nature of RPMs and the shared libraries common to multiple packages, I would build a list of the packages I actually use and diff that against the list of installed packages. There are benefits to removing unused packages: freeing up disk space, reducing the number of packages that could facilitate privilege escalation, and shrinking the checksum database of tools such as OSSEC, AIDE, or Tripwire.
Assumption:
Disclaimer: This method carries some risk that you will need to consider. For example, if your server has been up for a couple of years, there could be daemons running that use old files which have not been accessed since the server or daemon started. There are plenty of other risks to factor in, but you asked, so here is one method I might start with. It still requires a human to determine what can safely be removed. You should not automate the removal of packages using this method. This is for educational use only.
Build a list of all installed RPMs.
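This step could be sketched roughly as follows, assuming the standard rpm query options. The helper name list_installed and the file name in the usage comment are placeholders and not from the original answer. Printing only %{NAME} keeps the list easy to compare with other package-name lists later on.

```shell
# Print the name of every installed package, one per line, sorted and
# de-duplicated (packages installed in several versions would otherwise repeat).
list_installed() {
    rpm -qa --qf '%{NAME}\n' | LC_ALL=C sort -u
}

# Usage (the output file name is a placeholder):
#   list_installed > rpms-installed.txt
```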
Build a list of recently accessed files and save a count. We are approaching the new year, so you might want to look at last year.
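One possible way to do this is with find and its -atime test. The sketch below is an assumption about how the step might look; the helper name recent_files and the directories in the usage comment are illustrative only. It only gives meaningful results if access times are actually being recorded on the filesystems involved, and it assumes file names without embedded newlines.

```shell
# Print regular files under the given directories whose last access time
# is less than DAYS days ago, one path per line.
recent_files() {
    days=$1
    shift
    find "$@" -type f -atime "-$days" 2>/dev/null | sort
}

# Usage (directories are examples; pick the ones your packages install into):
#   recent_files 365 /bin /sbin /usr /lib /lib64 /etc /opt > recent.txt
#   wc -l < recent.txt     # the count of recently accessed files
```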
Copy the RPM database to the RAM disk so we don't abuse the server. Ensure you have at least 100 MB or so free; you can check with e.g. df -Ph /dev/shm
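A hedged sketch of the copy, with a free-space check built in. The helper name copy_rpmdb is made up for illustration, and the paths in the usage comment are assumptions: /var/lib/rpm is the traditional database location, but it can differ between releases.

```shell
# Copy an RPM database directory to a scratch location (ideally on tmpfs).
# Refuses to copy unless the destination filesystem has more free space
# than the source directory currently occupies.
copy_rpmdb() {
    src=$1
    dest=$2
    need_kb=$(du -sk "$src/." | cut -f1)
    mkdir -p "$dest" || return 1
    free_kb=$(df -Pk "$dest" | awk 'NR == 2 { print $4 }')
    if [ "$free_kb" -le "$need_kb" ]; then
        echo "not enough free space in $dest" >&2
        return 1
    fi
    # The trailing "/." copies the directory contents, even if src is a symlink.
    cp -a "$src/." "$dest/"
}

# Usage (paths are assumptions):
#   copy_rpmdb /var/lib/rpm /dev/shm/rpm
```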
Find the RPMs associated with the files in our recent.txt list. This will take a while. I bet someone could find a more efficient, faster, and cleverer way to do this step. I would do this in a screen session.
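As a rough illustration, the lookup could be a loop around rpm -qf pointed at a copy of the database via --dbpath. The helper name owners_of and the file names in the usage comment (other than recent.txt) are placeholders. This is the slow, one-query-per-file variant; for files that no package owns, rpm normally prints a "not owned" message on standard output, which ends up in the results and has to be filtered out afterwards.

```shell
# Read file paths on stdin and print what `rpm -qf` reports for each one:
# either the owning package name or rpm's "not owned" message.
# The body runs in a subshell so the locale change does not leak out.
owners_of() (
    dbpath=$1
    LC_ALL=C
    export LC_ALL    # keep rpm's messages in English for later filtering
    while IFS= read -r path; do
        rpm --dbpath "$dbpath" -qf --qf '%{NAME}\n' "$path" 2>/dev/null
    done
)

# Usage (the database path and output file name are assumptions):
#   owners_of /dev/shm/rpm < recent.txt > owners-raw.txt
```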
Remove from the findings the files that are not owned by any RPM package.
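A small filter along these lines might do, assuming the lookup results contain rpm's English "is not owned by any package" message for unowned files (run rpm under LC_ALL=C if your locale translates it). The helper name used_packages and the file names are hypothetical.

```shell
# Read raw `rpm -qf` output on stdin, drop the lines for files that no
# package owns, and print the remaining package names once each, sorted.
used_packages() {
    grep -v 'is not owned by any package' | LC_ALL=C sort -u
}

# Usage (file names are placeholders):
#   used_packages < owners-raw.txt > rpms-used.txt
```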
Diff the output. Again, this is not completely useful by itself. You will need to determine why the files from these packages have not been accessed.
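For the comparison itself, one option is comm rather than diff, since comm -23 prints only the lines unique to the first file. The sketch below assumes two plain-text lists of package names, one per line; the helper name unused_candidates and the file names are illustrative. The result is a list of candidates to review by hand, not a removal list.

```shell
# Print package names that appear in the installed list but not in the
# used list. Both inputs are re-sorted in the C locale because comm
# requires consistently sorted input.
unused_candidates() {
    installed=$1
    used=$2
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    LC_ALL=C sort -u "$installed" > "$a"
    LC_ALL=C sort -u "$used" > "$b"
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage (file names are placeholders):
#   unused_candidates rpms-installed.txt rpms-used.txt
```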
You can list the contents of an RPM with rpm -ql package. Here is the output on one of my VMs. As you can see, this is not entirely useful in my case.
I need to keep filesystem and basesystem around, despite the fact that their files have not been accessed in a while. Note: At some point I enabled noatime, which stops access times from being updated.
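Because the whole approach depends on access times, it may be worth checking for noatime before trusting any results. The following is a minimal sketch that assumes a Linux mounts table in /proc/mounts format; the helper name noatime_mounts is made up for illustration.

```shell
# Print the mount points whose mount options include noatime.
# Reads a table in /proc/mounts format: device, mount point, type, options, ...
noatime_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "noatime") { print $2; break }
    }' "${1:-/proc/mounts}"
}

# Usage: any mount point printed here will have stale access times.
#   noatime_mounts
```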
I removed dhcp-common and its associated dhclient package, since I will never need DHCP in my specific use case. I realize this method is not entirely efficient, but it should give you a starting point on each unique role of your servers. Happy new year!