Fast set indexing data structure for superset retrieval

data structuresindexingtrie

I am given a set of sets:

{{a,b}, {a,b,c}, {a,c}, {a,c,f}}

I would like to have a data structure to index those sets such that the following "lookup" is executed fast: find all supersets of a given set.

For example, given the set {a,c} the structure would return

{{a,b,c}, {a,c,f}, {a,c}}

but not {a,b}.

Any suggestions? Could this be done with a smart trie-like data structure storing sets after a proper sorting?

This data structures is going to be queried a lot. Thus, I'm searching for a structure that might be expensive in build but rather fast to query.

UPDATE: I have finally used a prefix Trie as described in the paper "A New Method to Index and Query Sets", by Jorg Hoffmann and Jana Koehler.

Best Answer

It seems you're searching for a standard Information Retrieval algorithm. Instead of giving you the answer (which depends on factors such as frequency and cardinality of terms and the number of documents, the type of queries asked), I forward you to the excellent introductory treatise on the topic called: Introduction to Information Retrieval: http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html

Probably the chapter 'Index Construction' contains an algorithm suitable for your needs.

Best Answer

Related Solutions

How to Quickly Search Through a Very Large List of Strings/Records in a Database

Algorithms – Data Structure for Fast StartsWith Filtering of a Dictionary

Related Topic