##
Having fun sorting with Haskell *July 11, 2009*

*Posted by haskelladdict in C programming, Haskell.*

trackback

trackback

Recently, I re-discovered my good old copy of Jon Bentley’s *Programming Pearls *and started re-reading some of the chapters. It is a wonderful little book which I can highly recommend to any programmer. Particularly, the chapter on sorting (Column 11) is great and I decided to implement some common sorting routines in C just to stay in practice. Pretty quickly I had written a *bubble sort*, *insertion sort*, and *quicksort* routine and tested them all with a random vector of ints of length 20,000. Of course, *quicksort* (combined with *insertion sort* to sort the smallest sub-array) beat the others hands down.

So far so good but why not implement these sorting routines in Haskell as well, just for fun. *Quicksort* in Haskell is straightforward and strikingly beautiful. Here it is

> quicksort :: [Double] -> [Double]

> quicksort [] = []

> quicksort (x:xs) = (quicksort (filter (<= x) xs)) ++ [x]

> ++ (quicksort (filter (> x) xs))

Interestingly, even though this implementation uses filter twice, it is a tad faster on my system (~ 3%) than the version that uses a single call to partition to partition the list.

Of course, this version is no match for the C implementation in terms of speed due to the appends and the fact that each recursive call to *quicksort* uses its own private copy of the sub-array. Nevertheless, it is quite speedy and code-wise it can’t get much more concise.

My Haskell implementation of *insertion sort* is also not too much longer

> insertion_sort :: [Double] -> [Double]

> insertion_sort [] = []

> insertion_sort (x:xs) = my_insert x (insertion_sort xs)

>

> where

> my_insert a [] = [a]

> my_insert a tot@(b:bs) = case compare a b of

> GT -> b:(my_insert a bs)

> _ -> a:tot

Yes, I know, I could have used prelude’s insert function instead, but

for some reason, using my_insert is about 30% faster for a reason I don’t understand since the functions are almost identical. The only reason I can think of is the additional redirection via insertBy used by insert. Of course, due to its O(n^{2}) behaviour, *insertion sort* is quite a bit slower than *quicksort*.

Finally, there is my version of *bubble sort* and, frankly, it took me some time to find an implementation that ran in a somewhat decent amount of time without blowing my machine’s stack. My code threads a status variable through the bubble function which signals the absence of swaps indicating a sorted list. Here it is

> bubble_sort :: [Double] -> [Double]

> bubble_sort xs = let

> (state,bubbled) = bubble True xs

> in

> if not state

> then bubble_sort bubbled

> else bubbled

>

> where

> bubble s [] = (s,[])

> bubble s [a] = (s,[a])

> bubble s (a:b:as) = case compare a b of

> GT -> let

> (s_n,new) = bubble False (a:as)

> in

> (s_n,b:new)

>

> _ -> let

> (s_n,new) = bubble (s && True) (b:as)

> in

> (s_n,a:new)

This version of* bubble sort* is about 6 times slower than insertion sort. So not too bad but I have the feeling it can still be improved quite a bit. E.g., currently bubble walks through the whole list during each iteration even the already sorted part which is a waste.

Why use filter twice when you can use partition and get the same result with a single pass of the list?

Sorting is great fun. I’m part way through a series on sorting myself.

Cheers.

Hi OJ,

Thanks for the great suggestion. I actually tried using partition at some point but it turned out that (somewhat counterintuitively) the version with two calls to filter is actually faster for me by a few %. Not sure why.

Are you using GHC 6.10 or GHC 6.8, out of curiousity?

The problem with partition is that it returns a lot of lazy pairs. I haven’t tested partition per se, but using drop and take used to be faster than calling splitAt, but no longer if you use 6.10. (although splitAt, drop, and take are a *lot* faster)

That is really interesting. I’m going to have to take a look at the implementation of partition now! Wonder if they’re doing something crazy in it. It seems odd there’d be a negative impact on performance.

Cheers for the info.

OJ

Below are some timings (averages over 3 runs each) for sorting a list of 1e6 random ints using the above code for filter-quicksort and the following code for partition-quicksort (sorry, indenting is messed up)

>quicksort :: [Double] -> [Double]

>quicksort [] = []

>quicksort (x:xs) = let

> (leq,gt) = partition (<= x) xs

> in

> (quicksort leq) ++ [x] ++ (quicksort gt)

1) ghc-6.10.1 on x86_64, compiled with -O2

filter-quicksort : 20.40 s

partition-quicksort : 21.31 s

2) ghc-6.10.3 on x86, compiled with -O2

filter-quicksort : 41.51 s

partition-quicksort : 41.81 s

So it looks like filter-quicksort is a bit faster when using ghc-6.10.1 on x86_64 and only very slightly so when using ghc-6.10.3 on x86.

Using difference-lists (dlist in hackage) will probably help competing with the C version ((++) will become O(1))

Interesting stuff mate. Thanks for posting the timings.