How to Design Performance Comparison Between Data Structures

data structures, performance

I want to compare the performance of two search trees of integers (an AVL tree and a red-black tree). How should I design/engineer the tests to accomplish this? For example, consider the insert operation: what steps should I follow in order to state that, on average, this operation is faster in the red-black case? What should I take into account to measure CPU time correctly? Should both implementations be optimized, or may I compare an optimized implementation of AVL against a straightforward implementation of red-black?

Any links or papers would be very helpful.

Best Answer

It highly depends on what you plan to do with the data structure. If you will end up filling it with a certain structured input, then you should also test it that way.

If you don't know anything about your future inputs and want to measure average performance, then remember that average-case analysis computes the expected cost over a probability distribution on the inputs, typically a uniform one (for example, every insertion order being equally likely). Hence, an average-case performance test should include many runs with varying random inputs.
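A minimal sketch of such a test loop follows. It assumes keys drawn uniformly at random without repetition, uses `time.process_time` for CPU time, and feeds the same key sequence to every container in each run. `SortedList` and `PlainBST` are hypothetical stand-ins, not AVL or red-black trees; any class with an `insert(key)` method could be plugged in instead.

```python
import random
import statistics
import time
from bisect import insort


class SortedList:
    """Stand-in container A: a sorted Python list, insertion via bisect."""
    def __init__(self):
        self.items = []

    def insert(self, key):
        insort(self.items, key)


class PlainBST:
    """Stand-in container B: an unbalanced binary search tree."""
    def __init__(self):
        self.root = None  # each node is a list: [key, left, right]

    def insert(self, key):
        if self.root is None:
            self.root = [key, None, None]
            return
        node = self.root
        while True:
            side = 1 if key < node[0] else 2
            if node[side] is None:
                node[side] = [key, None, None]
                return
            node = node[side]


def time_inserts(factory, keys):
    """CPU seconds needed to insert all keys into a fresh container."""
    container = factory()
    start = time.process_time()
    for key in keys:
        container.insert(key)
    return time.process_time() - start


def compare(factories, n=5_000, runs=5, seed=42):
    """Time every factory on the same random inputs; return samples per name."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    results = {name: [] for name in factories}
    for _ in range(runs):
        keys = rng.sample(range(10 * n), n)  # uniform, distinct keys
        for name, factory in factories.items():
            results[name].append(time_inserts(factory, keys))
    return results


if __name__ == "__main__":
    timings = compare({"sorted list": SortedList, "plain BST": PlainBST})
    for name, samples in timings.items():
        print(f"{name}: mean={statistics.mean(samples):.4f}s "
              f"stdev={statistics.stdev(samples):.4f}s")
```

Reporting the spread alongside the mean helps judge whether a difference between two containers is larger than the run-to-run noise.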

Depending on the data structures themselves, you may also be interested in comparing certain input structures that are known to be very good or very bad for one of the data structures. Nevertheless, your future application of the data structure may almost never create such inputs, in which case you may well ignore the performance comparison of these cases. (Intuitive example: if your context often requires sorting an already sorted sequence, that input can throw many quicksort implementations off, since a naive pivot choice degrades to quadratic time on it.)
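The quicksort example can be made concrete by counting comparisons instead of measuring time. The sketch below assumes one particular naive variant (first element as pivot, Lomuto-style partitioning); other quicksort implementations behave differently on sorted input.

```python
import random


def quicksort_comparisons(values):
    """Sort a copy with first-element-pivot quicksort; return (sorted, comparisons).

    An explicit stack replaces recursion so that sorted input does not
    run into Python's recursion limit.
    """
    a = list(values)
    comparisons = 0
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[lo]
        i = lo  # a[lo+1..i] holds the elements smaller than the pivot
        for j in range(lo + 1, hi + 1):
            comparisons += 1
            if a[j] < pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[lo], a[i] = a[i], a[lo]  # move the pivot to its final position
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return a, comparisons


n = 1000
shuffled = random.Random(0).sample(range(n), n)
print("random input:", quicksort_comparisons(shuffled)[1])
print("sorted input:", quicksort_comparisons(range(n))[1])  # n*(n-1)/2 = 499500
```

On sorted input every partition step splits off only the pivot, so the count is exactly n(n-1)/2, while a random permutation stays far below that.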

As for your optimization point, the answer again is: it depends. Do you intend to use this data structure in exactly one project right now? Then I'd go for optimized versions. Do you want a comparison to get a general idea of which one might be more suitable for a planned project? Then try to compare reference implementations, but do not waste time on creating super-efficient implementations. Of course, the context in which you execute the tests has to be comparable, so don't try, for example, to compare implementations written in different programming languages to each other. Probably obvious, but I just thought I'd make sure to mention it.

Related Solutions

Data Structures – Understanding the Macro Aspects

I would define them as the interface they provide, including complexity of the operations.

One could, after all, implement a linked list with an array and vice versa (and recursively), but the operations would then no longer keep their original complexity.
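As a small illustration of losing the complexity, here is a sketch of an array-like interface built on top of a singly linked list. The class name `LinkedArray` and its `steps` counter are made up for this example; the point is only that index access now costs a walk proportional to the index rather than constant time.

```python
class LinkedArray:
    """Array-like index access built on a singly linked list."""

    def __init__(self, values):
        self._head = None
        # Build the list back to front; each node is a (value, next) tuple.
        for v in reversed(list(values)):
            self._head = (v, self._head)
        self.steps = 0  # number of links followed so far

    def __getitem__(self, index):
        if index < 0:
            raise IndexError(index)
        node = self._head
        for _ in range(index):
            if node is None:
                break
            node = node[1]
            self.steps += 1
        if node is None:
            raise IndexError(index)
        return node[0]


arr = LinkedArray(range(100))
print(arr[0], arr.steps)   # first element, no links followed
print(arr[99], arr.steps)  # last element, 99 links followed
```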

Update: To clarify: a stack is defined as a data structure with constant-time add at the top and constant-time remove at the top. The definition doesn't say how we implement it. One could implement it using an array, a linked list, or a doubly linked list, but that is irrelevant.
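To illustrate, here is a sketch of two interchangeable stacks, one backed by a Python list (a dynamic array, so `push` is amortized rather than strictly constant time) and one backed by a linked list. The class names and the `drain` helper are invented for this example.

```python
class ArrayStack:
    """Stack backed by a Python list (a dynamic array)."""

    def __init__(self):
        self._items = []

    def push(self, value):
        self._items.append(value)  # amortized O(1)

    def pop(self):
        return self._items.pop()   # O(1); raises IndexError when empty


class LinkedStack:
    """Stack backed by a singly linked list of (value, next) tuples."""

    def __init__(self):
        self._head = None

    def push(self, value):
        self._head = (value, self._head)  # O(1)

    def pop(self):
        if self._head is None:
            raise IndexError("pop from empty stack")
        value, self._head = self._head    # O(1)
        return value


def drain(stack, values):
    """Push all values, then pop them all; works with either implementation."""
    for v in values:
        stack.push(v)
    return [stack.pop() for _ in values]


print(drain(ArrayStack(), [1, 2, 3]))   # [3, 2, 1]
print(drain(LinkedStack(), [1, 2, 3]))  # [3, 2, 1]
```

Code written against `push` and `pop` cannot tell the two apart, which is the sense in which the implementation is irrelevant to the definition.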

The atomic building block we have on a computer is (binary) memory, and the most basic form of memory is an array; thus one can create every kind of data structure using an array.

One could imagine that we had a different atomic building block (maybe a qubit). Then a stack would still be a stack, but maybe we couldn't implement it with that building block.

Java – How would you transfer data between your data structures and databases

At its most basic, what you need to develop to transfer data between data structures in memory and persistent data stores (files, DB, whatever) is a Data Access Layer or DAL. This is one of your "application tiers" in a properly-constructed N-tier application. Objects in this tier know how to create data structures from DB data, and conversely to convert a properly-populated data structure into DB data, so that objects generally lying outside this tier don't have to have this knowledge.

There are several models for building a DAL tier. The most common in my experience is a centralized "front door" that can accept any request to read or write data and produce any needed data structure. This pattern is called the "Repository". You would make a call to this object along the lines of:

int myObjectId = 1243;
MyObject myObjectInstance = myRepository.RetrieveById<MyObject>(myObjectId);

... and then myObjectInstance would be populated with the data from the table representing MyObject records, with the identifying key that we specified. How that happens exactly, nobody outside the Repository really has to know; you could dynamically construct a SQL statement, use a Data Access Object that knows specifically how to retrieve MyObject instances, or you could do as Oded states and use a third-party library called an Object-Relational Mapper or ORM which abstracts away all the details of how the retrieval happens.
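For a runnable illustration of the idea, here is a minimal Repository sketch in Python using the standard-library `sqlite3` module with an in-memory database. The table name `my_object`, the `name` field, and the method names are assumptions made for this example; they are not taken from the snippet above or from any particular framework.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class MyObject:
    id: int
    name: str  # hypothetical field, for illustration only


class MyObjectRepository:
    """Hides how MyObject records are stored and loaded."""

    def __init__(self, connection):
        self._conn = connection

    def add(self, obj):
        self._conn.execute(
            "INSERT INTO my_object (id, name) VALUES (?, ?)",
            (obj.id, obj.name),
        )

    def retrieve_by_id(self, object_id):
        row = self._conn.execute(
            "SELECT id, name FROM my_object WHERE id = ?",
            (object_id,),
        ).fetchone()
        return MyObject(*row) if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_object (id INTEGER PRIMARY KEY, name TEXT)")
repo = MyObjectRepository(conn)
repo.add(MyObject(1243, "example"))
print(repo.retrieve_by_id(1243))  # MyObject(id=1243, name='example')
```

Callers only see `add` and `retrieve_by_id`; swapping the hand-written SQL for an ORM would not change them.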
