Java Algorithms – Finding Best Combination for Unique Items

algorithmsgenetic algorithmsjava

A couple of months ago I went to a Bruce Springsteen concert. I hadn't heard much of his songs before that, but really liked the concert and bought the live recording of it afterwards (which he sells via live.brucespringsteen.net). A bit later I even bought another live recording of a different concert and after listening to them on repeat I'm liking his music more and more. I like it in fact so much that I want to buy more of his live recordings… and this is where I ran into my problem: he has a lot of recordings!

So now as a developer I want to write an piece of code that will give me the minimum/optimum numbers of recordings to buy that will give me a maximum number of different songs with the least duplicates (taking into account the 2 recordings I already have).

I could brute force it of course, but I'm hoping that there is some algorithm or library that could help me determine the correct set of recordings to buy? Scraping the set lists from the site is the easy part.

I've already tried searching for algorithms and it looks like it might be some sort of Cover Set Problem, which would make it NP-complete, which is a problem, but I'm not looking for a combination that gives me all different songs, just which x recordings to buy additionally to get the best amount of different songs in total (song length doesn't matter).

Clarification: as Doc Brown points out I need to be even more specific. I'd like to know which recordings to buy that give me the most, but not all, songs, with a fixed maximum of duplicates.

Possibility? After thinking about this some more I'm starting to wonder if maybe Genetic Programming can be applied to this? You could start off with a population of randomly generated individuals (combinations of recordings) and then applying a fitness function that gives a score based on the amount of unique songs, duplicates & total recordings needed. After enough generations this could also maybe give a good approximation?

Best Answer

You can use genetic algorithms with a simple binary representation for the individuals:

individual = 01010100000111000011010111001110

Every locus of an individual is related to the recordings of a specific show. If:

individual[k] == 1

then you're going to buy the k-th album.

You should code two / three functions:

unique_songs(individual) = number of unique songs

duplicates(individual) = number of duplicates

cost(individual) = total purchase cost

The fitness function is something like:

fitness(individual) =        unique_songs(individual)
                      - k1 *   duplicates(individual)
                      - k2 *         cost(individual)

k1 / k2 have to be tuned. This will treat duplicates(individual) as a soft constraint.

If you want as many songs as possible with a fixed maximum number of duplicates (limit) you need a hard constraint. The simplest approach is changing the fitness from a scalar value to a bi-dimensional vector:

fitness(individual) = [-penalty(individual),
                       unique_songs(individual) - k * cost(individual)]


                       / duplicates(individual) - limit   constraint not satisfied
penalty(individual) = {
                       \ 0                                otherwise

For further details about this approach you can read An Efficient Constraint Handling Method for Genetic Algorithms - Kalyanmoy Deb.

You can use the same library / framework, just change the fitness comparison operator to a lexicographic comparison function.

Related Topic