Data Structures – How Does a Skip List Work?

data structures

For a homework assignment, I need to understand how a skip list works.

I've been programming for a little over 2 years now (I know that's not that long in reality), and I have never even heard of a skip list.

I've looked over all of the guides that I can find, and I still only barely understand how they work. I even searched Code Review for an example implementation, and found only one review; and it's not even a complete implementation. I looked over the sample implementation supplied by the course, and it's absolutely atrocious. Between the lack of proper methods, and single-letter variable names, I have no clue how it works.

How does a skip list work? Is knowledge of a skip list required to understand more advanced data structures?

Best Answer

In the olden days during data structures class, we learned how AVL trees worked. I would have had that in one of my classes, but the instructor said "you'll never actually use this" and instead had us learn 2-3 trees and b* trees instead. Those were days when memory was tight and processes were singly threaded. You didn't use a deque when a singly linked list would work just as well.

The skip list is much more common today with more available memory and concurrency not being an issue (you don't need to lock much at all when acting as a writer in a skip list - compared to everything with an AVL tree).

Frankly, it's my favorite data structure now in that it's something I can easily reason about how it works underneath and where it will be advantageous or disadvantageous to use.

You aren't going to need to write one from scratch (unless you get it as an interview question - but then you're just as likely to get implement an AVL tree).

You are going to need to understand why you want to select a ConcurrentSkipListMap in Java rather than a HashMap or TreeMap or any of the other map implementations.


To understand how it works, you need to understand how a binary tree works. Wait, let me amend that. You need to understand how a balanced binary tree works. Without balancing a binary tree, you don't get any real advantage with its lookup.

Lets say we've got this tree:

A binary tree

And we insert an '8' into it. Now we've got:

An unbalanced binary tree

And that's not balanced. So, we go and do the magic of balancing it via some implementation...

balanced tree

And you've got a balanced tree again. But that was a lot of magic I waved my hand over.

Lets take a skip list.

ideal skip list

This one happens to be an idealized one. Few are, but it shows the balanced binary tree nature that the skiplist ideal approximates.

Now, we want to insert a 6 into there. This is inserting it much like a linked list. However, we start at the top and go down. The top points to 5. Is 6 > 5? Yes. Ok, the top points off to the end now, so we go down the stack (we're on the 5). The next is 7. Is 6 > 7? Nope. So we go down a level and we're at the base level so we insert 6 to the right of the 5.

We flip a coin - heads we build, tails we stay. Tails. Nothing more needs to be done.

skip list after an insert

Lets insert that 8 now. 8 > 5? yep. 8 > 7? Yep. And now we are at the bottom level again after following arrows and the stack around and we test 8 > 11? Nope. So we insert the 8 to the right of the 7.

We flip a coin - heads we build, tails we stay. Tails. Nothing more needs to be done.

skip list after another insert

In the balanced tree, we'd be getting all worked up about the tree not being balanced now. But this isn't a tree - its a skip list. We approximate a balanced tree.

Now lets insert a 10. I'll avoid all the comparisons.

We flip a coin - heads we build, tails we stay. Heads! And flip it again, Heads again! Flip it again, ok, there's a tails. Two levels above the base linked list.

skip list after yet another insert

The advantage here is that now if we're going to insert a 12, we can skip from 5 to 10 without doing all those other comparisons. We can skip over them with the skip list. And we don't have to worry about the balanced tree - the probabilistic nature of the stacking does that for us.

Why is this so useful? Because when inserting the 10 I can do that by locking the write on the 5 and 7 and 8 pointers rather than the entire structure. And while I am doing that, the readers can still be going through the skip list without having an inconsistent state. For concurrent usage, its faster not having to lock. For iterating over the bottom layer, its faster than a tree (the joys of the BFS and DFS algorithms for tree navigation - you don't have to worry about them).

Will you encounter it? You'll probably see it in use in places. And then you'll know why the author chose that implementation rather than a TreeMap or HashMap for the structure.

Much of this has been borrowed from my blog post: The Skip List