Ubuntu – Scripting installation of a virtual package provider on Debian/Ubuntu

Tags: apt, debian, dpkg, ubuntu

First, what's the simplest way to get a list of (real) packages that provide a particular virtual package? 'aptitude show' includes this list in its output for a virtual package, but 'apt-cache show' does not. However, aptitude is not always installed, and grepping for "Provided by:" in a script would be fragile due to localization.

# aptitude show java-sdk
No current or candidate version found for java-sdk
Package: java-sdk
State: not a real package
Provided by: default-jdk, gcj-4.4-jdk, gcj-4.5-jdk, gcj-jdk, openjdk-6-jdk, sun-java6-jdk

# apt-cache show java-sdk
N: Can't select versions from package 'java-sdk' as it is purely virtual
N: No packages found

Second, is there any reasonable way to rank the providers such that I'm likely to choose the latest or "most preferred" one? In the 'java-sdk' case, a script should obviously just use 'default-jdk' to begin with; however, if someone hadn't thought to create that, I could imagine sorting by a combination of Priority, Component/Section, and Version. (Obviously, this would mainly be useful for virtual packages providing a standard API; automatically picking a provider for 'mail-reader' would be silly.)

To be concrete, I'm trying to automate the installation of Cloudera Hadoop using Chef. 'hadoop' is a virtual package, where the corresponding real package is currently 'hadoop-0.20':

# aptitude show hadoop
No current or candidate version found for hadoop
Package: hadoop
State: not a real package
Provided by: hadoop-0.20

When there's more than one provider (e.g. hadoop-0.22), I basically want to automatically pick the latest version if 'hadoop-X.YY' is present. (Or better yet, get the Version for each from apt somehow, rather than trying to parse the names.) I know I could achieve this with some scripting, but it wouldn't surprise me if a more elegant way already existed.

Update: 'apt-cache showpkg' includes "Reverse Provides", which further seems to include full version information. This helps, but any idea how to get only this section?

# apt-cache showpkg hadoop
Package: hadoop
Versions:

Reverse Depends:
  sqoop,hadoop
  hadoop-pig,hadoop
  hadoop-hive,hadoop
Dependencies:
Provides:
Reverse Provides:
hadoop-0.20 0.20.2+923.21-1~maverick-cdh3

Best Answer

If you install the grep-dctrl package (named dctrl-tools in current releases), you can use grep-available:

grep-available -F Provides -s Package <virtual-package-name>

I don't have the Hadoop packages available in my Debian sources.list, so I'll use mail-transport-agent as an example:

$ grep-available -F Provides -s Package  mail-transport-agent
Package: xmail
Package: exim4-daemon-light
Package: exim4-daemon-heavy
Package: esmtp-run
Package: postfix
[...most deleted...]

If you want version numbers too:

$ grep-available -F Provides -s Package,Version  mail-transport-agent
Package: xmail
Version: 1.27-1.1+b1

Package: exim4-daemon-light
Version: 4.76-2

Package: exim4-daemon-heavy
Version: 4.76-2

Package: esmtp-run
Version: 1.2-6
[...]

Note the convenient, easily parsed paragraph mode for each package in this second example.
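
To illustrate how that paragraph output might be consumed in a script, here is a minimal sketch. The function names providers_with_versions and latest_provider are my own, not part of grep-dctrl, and the sketch assumes each paragraph holds exactly one Package line followed by one Version line, as in the output above. Picking "the latest" by comparing versions across differently named packages only makes sense when the providers are releases of the same software, as with the hadoop-X.YY case in the question.

```shell
# Read "Package:"/"Version:" paragraphs on stdin and print one
# "name version" line per package.
providers_with_versions() {
    awk '
        $1 == "Package:" { pkg = $2 }
        $1 == "Version:" { if (pkg != "") print pkg, $2; pkg = "" }
    '
}

# Read "name version" lines on stdin and print the name whose version
# is highest according to dpkg's own version comparison.
# Requires dpkg, so this part only works on a Debian-like system.
latest_provider() {
    best_pkg=
    best_ver=
    while read -r pkg ver; do
        if [ -z "$best_ver" ] || dpkg --compare-versions "$ver" gt "$best_ver"; then
            best_pkg=$pkg
            best_ver=$ver
        fi
    done
    if [ -n "$best_pkg" ]; then
        printf '%s\n' "$best_pkg"
    fi
}
```

With these defined, something like "grep-available -F Provides -s Package,Version hadoop | providers_with_versions | latest_provider" should print a single package name. Keep in mind that grep-available matches the pattern as a substring by default, so a short name such as "hadoop" may also match packages that provide a longer name containing it; check the list before trusting the result.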

There are numerous other options, including one that omits the field names ("Package:", "Version:", etc.). See the man page or --help for more details.
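
If installing an extra package is not an option, the "Reverse Provides" section of 'apt-cache showpkg' from the question's update can be cut out with a little awk. This is only a sketch: reverse_provides is a hypothetical helper name, and it assumes that "Reverse Provides:" is the last section of the output and that each entry starts with the package name followed by its version, as in the transcript in the question.

```shell
# Read `apt-cache showpkg` output on stdin and print "name version" for
# every entry listed after the "Reverse Provides:" header.
reverse_provides() {
    awk '
        in_section && NF >= 2 { print $1, $2 }
        /^Reverse Provides:/  { in_section = 1 }
    '
}
```

A call would look like "LC_ALL=C apt-cache showpkg hadoop | reverse_provides". Setting LC_ALL=C is a precaution against the section header being translated, which is the same localization concern the question raises about aptitude's "Provided by:" line.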