Introduction
Example
Analysis
Practical Implementations

 

Introduction

Radix sort is one of the fastest sorting techniques known for keys that can be broken into a sequence of fixed fields, such as the digits of a number or the characters of a string.  It is quite simple to understand, but more difficult than most algorithms to implement.

The technique is simple and based on common human behaviour.  When presented with a large pile of library cards, for example, most people will start to sort them by first dividing them into piles based on the first letter of their title.  They then sort each of the subpiles using the second character of the title, and so forth.  Once all the piles at all levels have been sorted, it's a simple matter to concatenate them all back together into the final sorted list.

So it can be seen from this example that radix sort is somewhat similar in principle to merge or quick sort, except that instead of having two sublists at each level, there are usually many.  In computing terms, placing each item then becomes very fast, because you need only index into an array using some simple key, e.g. a single character of a larger string.  This will be shown in more detail in the Practical Implementations section.
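
As a small illustration of that indexing step, here is a minimal C++ sketch (the titles are invented for the example) that distributes strings into 26 piles using nothing but their first letter:

#include <iostream>
#include <string>
#include <vector>

int main() {
   // A hypothetical pile of titles; each is assumed to begin with an uppercase letter A-Z.
   std::vector<std::string> titles = { "Emma", "Dracula", "Beloved", "Dune", "Carrie" };

   // One pile per possible first letter - indexing, not comparing.
   std::vector<std::vector<std::string>> pile(26);
   for (const std::string &t : titles) {
      pile[t[0] - 'A'].push_back(t);
   }

   // Concatenating the piles gives the titles grouped by first letter:
   // Beloved, Carrie, Dracula, Dune, Emma.
   for (const auto &p : pile) {
      for (const auto &t : p) {
         std::cout << t << "\n";
      }
   }
}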

 

Example

Consider the following list of integers.







We begin by sorting the numbers into groups, based on their first digit.  This produces the following:





We then sort each of these three sublists in the same manner, but using the second digit this time.


Finally, the process continues for the last list, using the third digit.

And now it's a simple matter of putting all these back together in the final sorted list.







Voila.  As you've seen, it takes at most 3 'passes' for a series of 3-digit numbers, and each pass is relatively simple - there are no comparisons, just separation into groups.  This is what makes radix sort extremely fast, since for complex objects comparisons can be quite expensive.
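
To make the passes concrete, here is a small C++ sketch of the same idea, run on a made-up list of 3-digit numbers; it groups on each digit in turn, most significant first, exactly as described above:

#include <iostream>
#include <vector>

// Recursively group 'nums' by the digit selected by 'divisor' (100, then 10, then 1).
void msdSort(std::vector<int> &nums, int divisor) {
   if (divisor == 0 || nums.size() <= 1) {
      return;                                   // no digits left, or nothing to separate
   }

   std::vector<std::vector<int>> group(10);     // one group per digit value - no comparisons
   for (int n : nums) {
      group[(n / divisor) % 10].push_back(n);
   }

   nums.clear();
   for (std::vector<int> &g : group) {
      msdSort(g, divisor / 10);                 // sort each group on the next digit
      nums.insert(nums.end(), g.begin(), g.end());
   }
}

int main() {
   std::vector<int> nums = { 523, 153, 88, 554, 235, 114, 530 };   // hypothetical input
   msdSort(nums, 100);
   for (int n : nums) {
      std::cout << n << "\n";                   // 88 114 153 235 523 530 554
   }
}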

 

Analysis

Radix sort is extremely difficult to analyse.  This is partly because instead of simple comparisons and swaps, you are doing appends to multiple lists, possibly generating keys for each item, and so on.  There are a lot of variations on the basic algorithm, and many details specific to each implementation and use, so analysis of an arbitrary general case is not entirely useful anyway.  As such, I won't concern myself with too many of the details.

There are up to m passes, where m is the maximum number of fields in an item's key - in other words, the maximum length of the items to be sorted.  In the previous example this was 3, as they were all 3-digit numbers.  Longer numbers, or long strings of characters, could take many more passes.

Each pass indexes some subset of the list into up to p different sublists, where p is the number of possible values of the key field.  For numerical digits, this is 10 - 0 through 9.  For characters, this could be 26 (a through z), or 52 (a through z, A through Z) if your sort is case-sensitive.
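
To make 'key field' concrete, here is a small sketch of two possible key-field extractors (digitField assumes 3-digit numbers, letterField assumes purely alphabetic strings; both names are made up for this example):

#include <cctype>
#include <iostream>
#include <string>

// The k-th digit of a 3-digit number, counting from the most significant (k = 0, 1, 2).
// Each field can take p = 10 possible values.
int digitField(int n, int k) {
   static const int divisor[3] = { 100, 10, 1 };
   return (n / divisor[k]) % 10;
}

// The k-th character of a string, folded to p = 26 possible values (case-insensitive).
int letterField(const std::string &s, int k) {
   return std::tolower(static_cast<unsigned char>(s[k])) - 'a';
}

int main() {
   std::cout << digitField(523, 1) << " " << letterField("Emma", 0) << "\n";   // prints: 2 4
}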

So if we take a saturated list (one where every possible value is in the list), e.g. every number from 0 to 999, we can work out roughly how many indexes & appends, and sublists we need.

In the first pass, there are 1000 items, each being indexed into one of 10 different sublists.  Thus there are a thousand indexes & appends, and 10 sublists created.

In the second pass, each of those 10 sublists holds 100 items, and each item is again indexed into one of 10 new sublists.  So we have another thousand indexes & appends, and another 100 sublists created.

In the final pass, each of the 100 sublists holds 10 items, and each item is indexed into one of 10 new sublists.  So again, a thousand indexes & appends, this time with a thousand sublists created.

The end result is that it takes 3000 indexes & appends, and 1110 sublists in order to sort our list of 1000 items.
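
Those figures are easy to check mechanically.  The following sketch counts one append per item per pass, and one sublist per distinct digit prefix, for the saturated 0-999 list:

#include <iostream>
#include <set>
#include <string>

int main() {
   long appends = 0;
   std::set<std::string> sublists;              // one sublist per distinct digit prefix

   for (int n = 0; n < 1000; ++n) {
      int digit[3] = { n / 100, (n / 10) % 10, n % 10 };
      std::string prefix;

      for (int pass = 0; pass < 3; ++pass) {
         prefix += char('0' + digit[pass]);
         ++appends;                              // each item is appended once per pass
         sublists.insert(prefix);                // the sublist it lands in may or may not be new
      }
   }

   std::cout << appends << " appends, " << sublists.size() << " sublists\n";
   // Prints: 3000 appends, 1110 sublists.  Counting only even numbers instead gives
   // 1500 appends and 610 sublists, matching the next paragraph.
}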

If we have an unsaturated list, say every even number from 0 to 998 (thus, 500 items), then it takes 1500 indexes & appends, and 610 sublists.  However, if we instead choose 500 distinct numbers from 0 to 999 at random, we could need anywhere from 555 to 610 sublists, and fewer still if duplicates are allowed.  So, as you can see, how many sublists you end up creating depends entirely on the exact list you're sorting.  This is what makes it very difficult to produce a reliable average case for radix sort, because you no longer have a nice single answer, but instead a range of possibilities.  Given that creating sublists is usually relatively expensive, the time required to sort a list can vary considerably.

As a rule, however, the number of indexes & appends for a saturated list is exactly m*n, where n is the number of items and m is the number of fields in each item's key - each item is appended once per pass.  It should be clear from this that you get the smallest number of indexes & appends by keeping m small, i.e. by using as large a radix as is practical.  So for example, if we indexed each item into one of a thousand different sublists (n=1000 still, but m=1 instead of 3), we would perform just 1000 indexes & appends.  We would also only create 1000 sublists, versus 1110 from before, because we essentially skip the first 2 passes from before.

For an unsaturated list, the number of indexes & appends is anything less than or equal to m*n.  For a saturated list, n = p^m, where p is the number of possible values for each field, so the figure above can also be written as m*(p^m); for our 0-999 example, this is 3*(10^3) = 3000.  Our 500-item list needs at most 3*500 = 1500, and can need fewer if some sublists shrink to a single item and need no further processing.  Exactly how many depends on the list itself, as we've discussed.

As a further optimization, if we discount the possibility of duplicate items, we could create a list with space for 1000 items and index each item into it directly.  So we'd create just a single sublist, rather than a thousand.  This is in effect a hash table, and a very simple one at that.  It is incredibly fast, but not usually applicable in the real world.  If you were sorting complex objects, for instance, where the key could be perhaps tens of fields long, and each field could take one of maybe hundreds of values, the p^m slots required very quickly become gigantic, and simply not practical in any sense.
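
In code, that direct-indexing idea is just the following sketch (assuming distinct keys in the range 0 to 999):

#include <iostream>
#include <vector>

int main() {
   std::vector<int> keys = { 523, 153, 88, 554, 235, 114 };   // hypothetical distinct keys
   std::vector<int> slot(1000, -1);                           // -1 marks an empty slot

   for (int k : keys) {
      slot[k] = k;                    // one index per item, no sublists at all
   }

   for (int v : slot) {
      if (v != -1) {
         std::cout << v << "\n";      // walking the table in order gives sorted output
      }
   }
}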

 

Practical Implementations

How you implement radix sort is extremely case-specific, so I won't try to cater for everyone here.  I'm simply going to write a generic pseudo-code implementation, which you can adapt for your own needs.

struct radixNode {
   LIST items;                                   // items whose key ends at this node
   radixNode *subnode[NUMBER_OF_KEY_VALUES];     // one subnode per possible key-field value

   radixNode() {
     for (int i = 0; i < NUMBER_OF_KEY_VALUES; ++i) {
       subnode[i] = NULL;                        // subnodes are only created when needed
     }
   }
};

// File 'item' into the tree of sublists rooted at 'node', one level per key field.
void radixFile(radixNode *node, ITEM *item, int keyField) {
   if (keyField < item->numberOfKeyFields) {
     int value = item->valueOfKeyField(keyField);

     if (node->subnode[value] == NULL) {
       node->subnode[value] = new radixNode;
     }

     radixFile(node->subnode[value], item, keyField + 1);
   } else if (keyField == item->numberOfKeyFields) {
     node->items.append(item);                   // all key fields consumed; the item belongs here
   } else {
     // error condition; keyField should never progress this far
   }
}

// Walk the tree in key order, appending every item back onto 'list'.
void radixConcatenate(radixNode *node, LIST *list) {
   list->append(node->items);

   for (int i = 0; i < NUMBER_OF_KEY_VALUES; ++i) {
     if (node->subnode[i] != NULL) {
       radixConcatenate(node->subnode[i], list);
     }
   }
}

void performRadixSort(LIST *list, int listSize) {
   radixNode supernode;

   // File every item into the tree of sublists.
   for (int i = 0; i < listSize; ++i) {
     radixFile(&supernode, list->item(i), 0);
   }

   // Rebuild the original list in sorted order.
   list->erase();
   radixConcatenate(&supernode, list);
}

That code isn't fantastic, but it's about as simple as it gets, and thus easier to read than faster versions.  LIST and ITEM are placeholders for whatever container and record types you're actually using: LIST is assumed to provide append, erase and indexed access (item), while ITEM exposes the number of key fields and the value of each one.  As you can see, this implementation is recursive - a faster one would do the same thing iteratively.  This is indeed quite possible, and could be an exercise for the reader.

One thing that's lacking from this is a good definition of what the key is for each item.  It could be an array of integers, with each field being a digit, or it could be a string, with each field being an individual character.  In the latter case this implementation would usually be very inefficient: it allocates 256 subnode pointers in each radixNode, whereas if you're sorting human-readable data you probably have at most around 75 values that actually occur, so you could be using over three times as much memory as necessary.  You can solve this problem by mapping the 256 potential key values down to, say, the 75 you actually allow.  One of these (value 0, perhaps) can be an invalid-character marker: the 74 characters you care about map to 1 through 74, while any of the other 182 ASCII values map to 0.  This uses much less memory, and despite the compressed character set it still works for its purpose.  If it does get handed bad data (data outside its valid input, and thus mapped to 0) it will still run, as opposed to crashing or aborting, but whether the results are then logically valid is arguable, as you have effectively discarded the parts of the key you found inconvenient.
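
As a sketch of such a mapping (the exact set of characters you allow, and the name keyValueOf, are just examples):

// Map the 256 possible byte values down to a small, dense range of key values.
// 0 is the invalid-character marker; everything we care about maps to 1 and up.
int keyValueOf(unsigned char c) {
   if (c >= 'a' && c <= 'z') return 1 + (c - 'a');      // 1..26
   if (c >= 'A' && c <= 'Z') return 27 + (c - 'A');     // 27..52
   if (c >= '0' && c <= '9') return 53 + (c - '0');     // 53..62
   if (c == ' ') return 63;
   // ...map any punctuation you actually need here...
   return 0;                                            // everything else is 'invalid'
}

With a mapping like this, NUMBER_OF_KEY_VALUES shrinks from 256 to 64 (or to whatever your allowed set requires), and with it the size of every radixNode.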

Thus, while you can usually optimize radix sort for your particular data, especially if you're sorting strings, you need to be careful that you won't one day need those other characters you effectively toss away.

Looking at another aspect of this implementation, it should be noted that it is not in-place.  That is, it creates temporary data structures outside the original list.  This is a disadvantage when sorting, because it means you need enough memory for the list itself, plus any extra structures you build while sorting.  If you use too much memory, you'll have some of your data swapped out (or worse - run out of memory completely and crash).  Once your data starts getting paged, your sort will run like molasses no matter what algorithm you use.  So radix sort is not always faster than other methods, if memory is particularly tight.

There are some radix sort implementations that are effectively in place, or at least reduce the amount of extra memory required.  However, true in-place methods are exceedingly complicated and often much slower - they often remove & insert elements all over the place, both of which are very expensive operations on many list or array structures.  So, an in-place radix sort is not as desirable as it sounds.  What is more desirable, however, are implementations that perform nearly as well as our previous implementation, but use much less memory.

One such implementation works as shown in the example below.  The technique is simple - sort the list into appropriate groups using the last key field of each item, then concatenate the result back into a list.  Now do the same again using the second-last field, and concatenate the results back into a list.  Repeat until you reach the first field.  Then you're done, and have a nicely sorted list that only ever needed a single level of sublists.
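
Here is a minimal sketch of this technique for fixed-length integer keys (3-digit numbers again), using only a single level of ten sublists per pass:

#include <iostream>
#include <vector>

// Sort 3-digit numbers with one distribution pass per digit, least significant first.
void lsdSort(std::vector<int> &nums) {
   for (int divisor = 1; divisor <= 100; divisor *= 10) {
      std::vector<std::vector<int>> sublist(10);      // only ever one level of sublists

      // Distribute: appending preserves the existing order of items with equal digits (stable).
      for (int n : nums) {
         sublist[(n / divisor) % 10].push_back(n);
      }

      // Concatenate straight back into a single list before the next pass.
      nums.clear();
      for (const std::vector<int> &s : sublist) {
         nums.insert(nums.end(), s.begin(), s.end());
      }
   }
}

int main() {
   std::vector<int> nums = { 523, 153, 88, 554, 235, 114, 530 };   // hypothetical input
   lsdSort(nums);
   for (int n : nums) {
      std::cout << n << "\n";                   // 88 114 153 235 523 530 554
   }
}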

This technique works because radix sort is what's known as a stable sorting method - that is, if two 'identical' items appear in a list, they will reappear in exactly the same relative order in the sorted list.  Remember that just because two items have identical keys, they are not necessarily identical - they may have other non-key data that is unique.  So whether or not a sort is stable is sometimes very important.  In this case, we rely on the stability of each individual pass.  It's best to simply show this through an example, using the following list:







If we sort this by its last field, we get the following sublists:



We then concatenate these back into list form, to produce:







We now sort into sublists based on the second last digit, yielding the following:




Notice how in each of the sublists, especially the 3rd one above, the last digits are still in the right order.  This is a result of the stability, and will always be the case.  Concatenating these back into a single list, we have:







Finally, sorting into groups based on the first digit gives:





Now putting these back into list form gives the final, sorted list, below:







Done.  Notice that at any one time, we only ever had at most 5 sublists.  Were we to use a larger sample of 3 digit numbers - e.g. 0 to 999 - we'd still only ever need at most 10 sublists.  Compare this to our 1110 sublists from before!

The disadvantage, of course, is that we have to concatenate our sublists 3 times, versus just once for the original implementation.  So for very long keys, this technique could incur a significant penalty.

Also, this technique simply doesn't work in the case where the number of key fields is not the same for each item.  While it wasn't stated explicitly, the original technique had no issues with variable-length keys.  So this technique is often impractical for lists of strings or other such data.  Note that while you can further modify this implementation to allow for variable-length keys, doing so really starts to slow it down, and in any case it is not commonly done.  So, I won't cover it here.

Exercise for the Reader

Implement both the traditional method and the modified one (as shown above).  Compare the two for various extreme list types (e.g. a list with few items but very long keys, a list with many items and short keys, etc).  See for yourself in which cases each technique works better.

If you want more work still, try to devise your own version of the second technique, allowing for variable length keys.  Compare it with the first technique, to see if it's really worth all the extra trouble after all.  In some cases it will be, but for most real world scenarios, even with little memory, it won't be as fast.

 
