Data Structures – How Does a Skip List Work?

data structures

For a homework assignment, I need to understand how a skip list works.

I've been programming for a little over 2 years now (I know that's not that long in reality), and I have never even heard of a skip list.

I've looked over all of the guides that I can find, and I still only barely understand how they work. I even searched Code Review for an example implementation, and found only one review; and it's not even a complete implementation. I looked over the sample implementation supplied by the course, and it's absolutely atrocious. Between the lack of proper methods, and single-letter variable names, I have no clue how it works.

How does a skip list work? Is knowledge of a skip list required to understand more advanced data structures?

Best Answer

In the olden days during data structures class, we learned how AVL trees worked. I would have had that in one of my classes, but the instructor said "you'll never actually use this" and instead had us learn 2-3 trees and b* trees instead. Those were days when memory was tight and processes were singly threaded. You didn't use a deque when a singly linked list would work just as well.

The skip list is much more common today with more available memory and concurrency not being an issue (you don't need to lock much at all when acting as a writer in a skip list - compared to everything with an AVL tree).

Frankly, it's my favorite data structure now in that it's something I can easily reason about how it works underneath and where it will be advantageous or disadvantageous to use.

You aren't going to need to write one from scratch (unless you get it as an interview question - but then you're just as likely to get implement an AVL tree).

You are going to need to understand why you want to select a ConcurrentSkipListMap in Java rather than a HashMap or TreeMap or any of the other map implementations.

To understand how it works, you need to understand how a binary tree works. Wait, let me amend that. You need to understand how a balanced binary tree works. Without balancing a binary tree, you don't get any real advantage with its lookup.

Lets say we've got this tree:

A binary tree

And we insert an '8' into it. Now we've got:

An unbalanced binary tree

And that's not balanced. So, we go and do the magic of balancing it via some implementation...

balanced tree

And you've got a balanced tree again. But that was a lot of magic I waved my hand over.

Lets take a skip list.

ideal skip list

This one happens to be an idealized one. Few are, but it shows the balanced binary tree nature that the skiplist ideal approximates.

Now, we want to insert a 6 into there. This is inserting it much like a linked list. However, we start at the top and go down. The top points to 5. Is 6 > 5? Yes. Ok, the top points off to the end now, so we go down the stack (we're on the 5). The next is 7. Is 6 > 7? Nope. So we go down a level and we're at the base level so we insert 6 to the right of the 5.

We flip a coin - heads we build, tails we stay. Tails. Nothing more needs to be done.

skip list after an insert

Lets insert that 8 now. 8 > 5? yep. 8 > 7? Yep. And now we are at the bottom level again after following arrows and the stack around and we test 8 > 11? Nope. So we insert the 8 to the right of the 7.

We flip a coin - heads we build, tails we stay. Tails. Nothing more needs to be done.

skip list after another insert

In the balanced tree, we'd be getting all worked up about the tree not being balanced now. But this isn't a tree - its a skip list. We approximate a balanced tree.

Now lets insert a 10. I'll avoid all the comparisons.

We flip a coin - heads we build, tails we stay. Heads! And flip it again, Heads again! Flip it again, ok, there's a tails. Two levels above the base linked list.

skip list after yet another insert

The advantage here is that now if we're going to insert a 12, we can skip from 5 to 10 without doing all those other comparisons. We can skip over them with the skip list. And we don't have to worry about the balanced tree - the probabilistic nature of the stacking does that for us.

Why is this so useful? Because when inserting the 10 I can do that by locking the write on the 5 and 7 and 8 pointers rather than the entire structure. And while I am doing that, the readers can still be going through the skip list without having an inconsistent state. For concurrent usage, its faster not having to lock. For iterating over the bottom layer, its faster than a tree (the joys of the BFS and DFS algorithms for tree navigation - you don't have to worry about them).

Will you encounter it? You'll probably see it in use in places. And then you'll know why the author chose that implementation rather than a TreeMap or HashMap for the structure.

Much of this has been borrowed from my blog post: The Skip List

Related Solutions

Linked List – Why Linked List Implementation is Considered Linear

I think what they mean by "linear" is most probably the linked list's performance characteristics. To access the n-th element of a linked list, you need to walk through each element before it one by one. Thus, the time required to do this is a linear function of n (the upper limit of which is the list size). As opposed to e.g. an array, which is random access, i.e. accessing any array element by index requires practically the same, constant time.

OTOH inserting a new element in a known location in the middle of a linked list requires constant time, while doing the same in an array requires shifting the remainder of the array in memory, which is linear to number of subsequent elements (bounded by the size of the array). So linked lists do have performance benefits.

Generic Adjacency List Graph implementation

Here's an implementation that I dug up from when I was dealing with graphs. Although it's in C#, with a few minor adjustments it could compile in Java.

To make it directed you would have to copy v.Adjacents.Add(new Edge(w, cost)); and reverse the direction, thereby taking up double the space.

I don't think it's any better than what you have though.

class Edge
{
    public Vertex Destination { get; set; }
    public double Cost { get; set; }

    public Edge(Vertex destination, double cost)
    {
        Destination = destination;
        Cost = cost;
    }

}

class Vertex
{
    public string Name { get; set; }
    public List<Edge> Adjacents { get; set; }
    public double Distance { get; set; }

    public Vertex(string name)
    {
        Name = name;
        Adjacents = new List<Edge>();
        Distance = double.MaxValue;
    }
}

class Graf
{
    private readonly Dictionary<string, Vertex> vertexmap = new Dictionary<string, Vertex>();

    public void AddEdge(string source, string dest, double cost)
    {
        var v = GetVertex(source);
        var w = GetVertex(dest);
        v.Adjacents.Add(new Edge(w, cost));
    }

    private Vertex GetVertex(string vertexname)
    {
        Vertex v;
        vertexmap.TryGetValue(vertexname, out v);
        if (v == null)
        {
            v = new Vertex(vertexname);
            vertexmap.Add(vertexname, v);
        }
        return v;
    }
}

Best Answer

Related Solutions

Linked List – Why Linked List Implementation is Considered Linear

Generic Adjacency List Graph implementation

Related Topic