CentOS – Find (un)used packages on CentOS/Fedora

centos fedora package-management yum

In short: Of all the installed (rpm) packages, I would like to identify the ones that have not been used (for example, in the last 6 months).

In long: I have a number of machines with a respectable service record. Every time I upgrade from one release to another I'm surprised how well the upgrade procedure goes.

However over the years many packages were installed (via yum), a number of which I know are no longer used. I want to get rid of these as they have a negative impact on resource usage and the overall security of the system.

I'm looking for the best method to find unused packages.

One way would be to manually sift through the installed packages. The method works and I learn a lot, but it's extremely time-consuming.

So I'm looking for an automated way to identify unused packages so I can clean them manually.

I guess one way forward would be to monitor all files used on a server, link them to packages and see what's left over. Is there anything available for this purpose?

Are there more inventive ways to accomplish this?

Best Answer

Given the nature of RPMs and the shared libraries common to multiple packages, I would take the approach of building a list of packages that I actually use and diffing that against a list of installed packages. There are benefits to removing unused packages, such as freeing up disk space, reducing the number of packages that could facilitate privilege escalation, and reducing the size of a checksum database (e.g. OSSEC, AIDE, Tripwire).

Assumption:

  • atime is enabled. If you are using a mount option of noatime, then the access times of files will not be updated and could not be used to determine what files are accessed. It is common for noatime to be set on a filesystem to avoid the write penalty.
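A quick way to check that assumption is to look for noatime in the mount options. Here is a minimal sketch; the function name list_noatime_mounts is made up for illustration, and it assumes the usual /proc/mounts layout (device, mount point, type, options). Note that relatime, the default on most current kernels, still updates atime at least about once a day, which is plenty for a per-year comparison.

```shell
# list_noatime_mounts [FILE]
# Print the mount point of every entry in a /proc/mounts-style file whose
# option list contains "noatime". FILE defaults to /proc/mounts.
# (Hypothetical helper name, not part of any package.)
list_noatime_mounts() {
    # Field 2 is the mount point, field 4 the comma-separated option list.
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "noatime") { print $2; break }
    }' "${1:-/proc/mounts}"
}
```

Running list_noatime_mounts with no argument lists the affected mount points; if any of the directories you plan to scan live on one of them, the access times there cannot be trusted.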

Disclaimer: This method has some risk you will need to consider. For example, if your server has been up for a couple of years there could be daemons running that use old files you have not accessed since the server/daemon start time. There are plenty of other risks to factor in, but you asked so here is one method I might start with. This still requires a human to determine what could safely be removed. You should not automate removal of packages using this method. This is for educational use only.
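To reduce the long-running-daemon risk a little, you could also collect the files that running processes currently have mapped and treat those as "in use" regardless of their access time. This is only a rough sketch: mapped_paths is a hypothetical helper, it assumes the standard /proc/PID/maps line layout (address, perms, offset, dev, inode, path), and it only sees mapped binaries and libraries, not every file a daemon has read.

```shell
# mapped_paths
# Read /proc/PID/maps-style lines on stdin and print the unique absolute
# paths of the files they map. (Hypothetical helper name.)
mapped_paths() {
    awk '$6 ~ /^\// {
        path = $0
        # Drop the first five fields; what remains is the path,
        # which may itself contain spaces.
        for (i = 1; i <= 5; i++) sub(/^[^ ]+ +/, "", path)
        # Files replaced on disk while still mapped carry this suffix.
        sub(/ \(deleted\)$/, "", path)
        print path
    }' | LC_ALL=C sort -u
}
```

As root, something like cat /proc/[0-9]*/maps 2>/dev/null | mapped_paths gives a list you can append to the list of accessed files before mapping files to packages.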

Build a list of all installed RPMs.

rpm -qa | sort -n > /dev/shm/all.txt

Build a list of recently accessed files and save a count. We are approaching the new year, so you might want to look at last year.

YEAR=$(date -d "one year ago" '+%Y')
# YEAR=2014
# %AY is the year of last access; print "<year> <path>" and keep the paths last accessed in $YEAR.
find /bin /boot /etc /lib /lib64 /sbin /usr /var -type f ! -name "*~" ! -name "*.gz" ! -name "*.tar" -printf '%AY %p\n' | grep "^${YEAR} " | cut -d' ' -f2- > /dev/shm/recent.txt
# Count the non-empty lines, i.e. the number of files found.
FILECOUNT=$(grep -c . /dev/shm/recent.txt)

Copy our RPM database to the ram disk so we don't abuse the server. Ensure you have at least 100 MB free or so. e.g. df -Ph /dev/shm

mkdir --mode=0700 /dev/shm/rpmdb
rsync -a /var/lib/rpm/. /dev/shm/rpmdb/.

Find the RPMs associated with our recent.txt list. This will take a while. I bet someone could find a faster, more efficient and cleverer way to do this step. I would do this in a screen session.

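# Lower our priority so the loop does not compete with real work.
renice 19 -p $$ > /dev/null 2>&1
printf "%s files to iterate through.\n" "${FILECOUNT}"
# Start with an empty results file.
> /dev/shm/recent_packages.txt
# Read one path per line so that file names containing spaces survive.
while IFS= read -r file
do
    rpm --dbpath /dev/shm/rpmdb -q --whatprovides "${file}" >> /dev/shm/recent_packages.txt 2>/dev/null
    # optional status indicator.
    printf "."
done < /dev/shm/recent.txt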
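One possibly faster variant of the loop above is to dump a package-to-file map with a single rpm query and join it against recent.txt, instead of calling rpm once per file. This is a sketch, not something I have timed: build_file_map and packages_for_files are hypothetical names, and it assumes your rpm accepts the [%{=TAG} ...] iterator form in --queryformat.

```shell
# build_file_map [DBPATH]
# Print "package<TAB>path" for every file recorded in the RPM database.
# The "=" repeats the scalar tags for each entry of the FILENAMES array.
build_file_map() {
    rpm --dbpath "${1:-/var/lib/rpm}" -qa \
        --qf '[%{=NAME}-%{=VERSION}-%{=RELEASE}.%{=ARCH}\t%{FILENAMES}\n]'
}

# packages_for_files MAP LIST
# MAP holds "package<TAB>path" lines, LIST one path per line.
# Print each package that owns at least one path in LIST, once.
packages_for_files() {
    awk -F'\t' '
        FILENAME == ARGV[1] { wanted[$0] = 1; next }  # first file: LIST
        ($2 in wanted)      { print $1 }              # second file: MAP
    ' "$2" "$1" | LC_ALL=C sort -u
}
```

Usage would be along the lines of build_file_map /dev/shm/rpmdb > /dev/shm/filemap.txt followed by packages_for_files /dev/shm/filemap.txt /dev/shm/recent.txt > /dev/shm/recent_sorted.txt. Files that no package owns simply never match, so no "not owned by" filtering is needed afterwards.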

Remove the files that are not owned by any RPM package from the findings.

grep -v "not owned by" /dev/shm/recent_packages.txt | sort -n | uniq > /dev/shm/recent_sorted.txt

Diff the output. Again, this is not completely useful by itself. You will need to determine why the files from these packages have not been accessed.

diff -u /dev/shm/recent_sorted.txt /dev/shm/all.txt | grep '^+'
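If you only want the bare package names, without the diff header and the leading + signs, the same comparison can be sketched as a set difference. The function name unused_candidates is made up for illustration; it assumes both files contain one package per line and does not care how they are sorted.

```shell
# unused_candidates ALL USED
# Print every line of ALL that does not appear, as a whole line, in USED.
# (Hypothetical helper name.)
unused_candidates() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -F -x -v -f "$2" "$1"
}
```

For example, unused_candidates /dev/shm/all.txt /dev/shm/recent_sorted.txt prints the packages with no recently accessed files. The result is still only a list of candidates to review by hand.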

You can list the contents of an RPM with rpm -ql package. Here is the output on one of my VMs. As you can see, this is not entirely useful in my case.

+++ /dev/shm/all.txt    2014-12-31 20:50:06.521227281 +0000
+basesystem-10.0-4.el6.noarch
+dhcp-common-4.1.1-43.P1.el6.centos.x86_64
+filesystem-2.4.30-3.el6.x86_64
+rootfiles-8.1-6.1.el6.noarch

I need to keep filesystem and basesystem around, despite the fact that their files have not been accessed in a while. Note: at some point I enabled noatime on this VM.

I removed dhcp-common and its associated dhclient package, since I will never need DHCP in my specific use case. I realize this method is not entirely efficient, but it should give you a starting point on each unique role of your servers. Happy new year!