jump to navigation

Having fun sorting with Haskell July 11, 2009

Posted by haskelladdict in C programming, Haskell.

Recently, I re-discovered my good old copy of Jon Bentley’s Programming Pearls and started re-reading some of the chapters. It is a wonderful little book which I can highly recommend to any programmer. Particularly, the chapter on sorting (Column 11) is great and I decided to implement some common sorting routines in C just to stay in practice. Pretty quickly I had written a bubble sort, insertion sort, and quicksort routine and tested them all with a random vector of ints of length 20,000. Of course, quicksort (combined with insertion sort to sort the smallest sub-array) beat the others hands down.

So far so good but why not implement these sorting routines in Haskell as well, just for fun. Quicksort in Haskell is straightforward and strikingly beautiful. Here it is

> quicksort :: [Double] -> [Double]
> quicksort [] = []
> quicksort (x:xs) = (quicksort (filter (<= x) xs)) ++ [x]
>                              ++ (quicksort (filter (> x) xs))

Interestingly, even though this implementation uses filter twice, it is a tad faster on my system (~ 3%) than the version that uses a single call to partition to partition the list.

Of course, this version is no match for the C implementation in terms of speed due to the appends and the fact that each recursive call to quicksort uses its own private copy of the sub-array. Nevertheless, it is quite speedy and code-wise it can’t get much more concise.

My Haskell implementation of insertion sort is also not too much longer

> insertion_sort :: [Double] -> [Double]
> insertion_sort []       = []
> insertion_sort (x:xs) = my_insert x (insertion_sort xs)
>   where
>     my_insert a []               = [a]
>     my_insert a tot@(b:bs) = case compare a b of
>                                                  GT -> b:(my_insert a bs)
>                                                     _ -> a:tot

Yes, I know, I could have used prelude’s insert function instead, but
for some reason, using my_insert is about 30% faster for a reason I don’t understand since the functions are almost identical. The only reason I can think of is the additional redirection via insertBy used by insert. Of course, due to its O(n2) behaviour, insertion sort is quite a bit slower than quicksort.

Finally, there is my version of bubble sort and, frankly, it took me some time to find an implementation that ran in a somewhat decent amount of time without blowing my machine’s stack. My code threads a status variable through the bubble function which signals the absence of swaps indicating a sorted list. Here it is

> bubble_sort :: [Double] -> [Double]
> bubble_sort xs   = let
>                                 (state,bubbled) = bubble True xs
>                               in
>                                 if not state
>                                   then bubble_sort bubbled
>                                   else bubbled
>   where
>     bubble s []           = (s,[])
>     bubble s [a]         = (s,[a])
>     bubble s (a:b:as) = case compare a b of
>                                      GT -> let
>                                                  (s_n,new) = bubble False (a:as)
>                                                in
>                                                  (s_n,b:new)
>                                       _ -> let
>                                                 (s_n,new) = bubble (s && True) (b:as)
>                                               in
>                                                 (s_n,a:new)

This version of bubble sort is about 6 times slower than insertion sort. So not too bad but I have the feeling it can still be improved quite a bit. E.g., currently bubble walks through the whole list during each iteration even the already sorted part which is a waste.