jump to navigation

Having fun sorting with Haskell July 11, 2009

Posted by haskelladdict in C programming, Haskell.

Recently, I re-discovered my good old copy of Jon Bentley’s Programming Pearls and started re-reading some of the chapters. It is a wonderful little book which I can highly recommend to any programmer. Particularly, the chapter on sorting (Column 11) is great and I decided to implement some common sorting routines in C just to stay in practice. Pretty quickly I had written a bubble sort, insertion sort, and quicksort routine and tested them all with a random vector of ints of length 20,000. Of course, quicksort (combined with insertion sort to sort the smallest sub-array) beat the others hands down.

So far so good but why not implement these sorting routines in Haskell as well, just for fun. Quicksort in Haskell is straightforward and strikingly beautiful. Here it is

> quicksort :: [Double] -> [Double]
> quicksort [] = []
> quicksort (x:xs) = (quicksort (filter (<= x) xs)) ++ [x]
>                              ++ (quicksort (filter (> x) xs))

Interestingly, even though this implementation uses filter twice, it is a tad faster on my system (~ 3%) than the version that uses a single call to partition to partition the list.

Of course, this version is no match for the C implementation in terms of speed due to the appends and the fact that each recursive call to quicksort uses its own private copy of the sub-array. Nevertheless, it is quite speedy and code-wise it can’t get much more concise.

My Haskell implementation of insertion sort is also not too much longer

> insertion_sort :: [Double] -> [Double]
> insertion_sort []       = []
> insertion_sort (x:xs) = my_insert x (insertion_sort xs)
>   where
>     my_insert a []               = [a]
>     my_insert a tot@(b:bs) = case compare a b of
>                                                  GT -> b:(my_insert a bs)
>                                                     _ -> a:tot

Yes, I know, I could have used prelude’s insert function instead, but
for some reason, using my_insert is about 30% faster for a reason I don’t understand since the functions are almost identical. The only reason I can think of is the additional redirection via insertBy used by insert. Of course, due to its O(n2) behaviour, insertion sort is quite a bit slower than quicksort.

Finally, there is my version of bubble sort and, frankly, it took me some time to find an implementation that ran in a somewhat decent amount of time without blowing my machine’s stack. My code threads a status variable through the bubble function which signals the absence of swaps indicating a sorted list. Here it is

> bubble_sort :: [Double] -> [Double]
> bubble_sort xs   = let
>                                 (state,bubbled) = bubble True xs
>                               in
>                                 if not state
>                                   then bubble_sort bubbled
>                                   else bubbled
>   where
>     bubble s []           = (s,[])
>     bubble s [a]         = (s,[a])
>     bubble s (a:b:as) = case compare a b of
>                                      GT -> let
>                                                  (s_n,new) = bubble False (a:as)
>                                                in
>                                                  (s_n,b:new)
>                                       _ -> let
>                                                 (s_n,new) = bubble (s && True) (b:as)
>                                               in
>                                                 (s_n,a:new)

This version of bubble sort is about 6 times slower than insertion sort. So not too bad but I have the feeling it can still be improved quite a bit. E.g., currently bubble walks through the whole list during each iteration even the already sorted part which is a waste.



1. OJ - July 11, 2009

Why use filter twice when you can use partition and get the same result with a single pass of the list?

Sorting is great fun. I’m part way through a series on sorting myself.


2. haskelladdict - July 12, 2009

Hi OJ,

Thanks for the great suggestion. I actually tried using partition at some point but it turned out that (somewhat counterintuitively) the version with two calls to filter is actually faster for me by a few %. Not sure why.

3. Leon P Smith - July 12, 2009

Are you using GHC 6.10 or GHC 6.8, out of curiousity?

The problem with partition is that it returns a lot of lazy pairs. I haven’t tested partition per se, but using drop and take used to be faster than calling splitAt, but no longer if you use 6.10. (although splitAt, drop, and take are a *lot* faster)

4. OJ - July 12, 2009

That is really interesting. I’m going to have to take a look at the implementation of partition now! Wonder if they’re doing something crazy in it. It seems odd there’d be a negative impact on performance.

Cheers for the info.

5. haskelladdict - July 12, 2009

Below are some timings (averages over 3 runs each) for sorting a list of 1e6 random ints using the above code for filter-quicksort and the following code for partition-quicksort (sorry, indenting is messed up)

>quicksort :: [Double] -> [Double]
>quicksort [] = []
>quicksort (x:xs) = let
> (leq,gt) = partition (<= x) xs
> in
> (quicksort leq) ++ [x] ++ (quicksort gt)

1) ghc-6.10.1 on x86_64, compiled with -O2

filter-quicksort : 20.40 s
partition-quicksort : 21.31 s

2) ghc-6.10.3 on x86, compiled with -O2

filter-quicksort : 41.51 s
partition-quicksort : 41.81 s

So it looks like filter-quicksort is a bit faster when using ghc-6.10.1 on x86_64 and only very slightly so when using ghc-6.10.3 on x86.

6. Yair - July 13, 2009

Using difference-lists (dlist in hackage) will probably help competing with the C version ((++) will become O(1))

7. OJ - July 13, 2009

Interesting stuff mate. Thanks for posting the timings.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: