Sorting Glossary

Glossary

Here's a list of some special words in the context of sorting algorithms. Where possibly, the meaning is explained when they are used, so you shouldn't need to refer to this very much. But, here it is, for completeness.

Append

The action of placing an item at the end of the list, such that it becomes the last item in that list, and the previous end of the list becomes second last, etc. Appends are usually much faster than inserts, for most real world list implementations. They are, however, generally still more expensive than swaps. Thus, algorithms which utilize swaps moreso than appends generally have an advantage in this area.

Average

It's very important to be specific about the definition of average in terms of sorting algorithm analysis. When we analyse the 'average case', we really do mean the actual mathematical average. This is given by working out what the result would be for every single possible value, and dividing this by the number of cases. The average case is not some special subset of the possabilities that may or may not represent your data - see Real World Average for that.

Big-O, O(f)

A mathematical definition whereby the dominating term of any function can be expressed as O(f), where f is the dominant term within the function. This has uses in defining the function behaviour and limits, and is used significantly in the mathematical analysis of sorting. For example, a generic quadratic of the form ax^2+bx+c belongs to the Big-O class O(x^2), because x^2 is the dominating term of a quadratic. 'Dominating' in this sense means having most influence on, for values of x greater than some number.

There is an entire mathematical field devoted around the Big-O area, so I'll leave further explanation to maths texts. You can, however, view a handy graph of some common Big-O classes here, if you're comparing algorithms.

Comparison

The action by which two items are compared based on their keys, and the ordering of the two determined. In computing terms, this can range from being a very simple operation (in the case of integer lists, for example), to very expensive, in the case of complicated data structures. Generally it is taken that the real world implementations will be relatively expensive, given the assumption that most real world implementations will be operating on complex real world data.

Insert

The placement of an item into a list at a particular point. Generally this implies the the particular point is not the end of the list - this is usually labelled specifically an append. In many list implementations, inserts can be incredibly expensive - e.g. arrays - as compared to appends, which are usually relatively fast.

Item

As used on this website, 'item' refers to the discrete elements that make up any list. For instance, in a list of English words, each word is an item in the list. In a list of people as you might find in an address book, each item is some sort of data structure representing a single person (and all their associated data). An item or it's data is not necessarily the key used to sort it, especially in complex real world implementations.

Key

Some data associated with an item, by which the item can be identified and compared. The most important property in terms of sorting is the ability to define order - that one key may be before or after - bigger or smaller, above or below, etc - another key. This ordering may be numerical, lexical, or some other arbitrary definition.

In simple systems, the key may well be the actual data of the item, e.g. words or sentences. In more complex systems, such as databases or other complex data structures, the key may be derived from the data, or even entirely unrelated - consider for example your bank account number, which is obviously not based on any information about you, but very intimately and importantly related to that information.

List

Any grouping of items of the same type. E.g. the words "the", "it", "then" and "we" can be considered a list, in this case an arbitrary example one. The ordering or position of elements within a list is not limited. Thus in the context of sorting algorithms we usually refer to the type of list, whether unsorted or sorted, in order to make clear the purpose of the list. A sorted list, of course, has all it's items in increasing or decreasing order.

In programming terms, a list may be implemented as an array, a linked list, or any of several other data structures. Practically, these have a huge impact on the relative performance of different sorting algorithms. However, in mathematical analysis sense the details of how a list is implemented is irrelevant, so long as it satisfies the criteria of holding a variable number of items, and of being able to manipulate these in the context of the list - i.e. remove, insert, append, swap, etc.

Pass

Sorting algorithms are usually all recursive. That is, they repeat the same process many times over. A pass is defined as one repetition of any process which operates on the entire list. For example, a single pass in selection sort is the process of progressing through the list from start to finish, finding the next item to be selected. In radix sort, a single pass is either the process of sorting every item into the next level of sublists, or the process of sorting one item completely through it's sublists to it's final position. Which depends on the exact approach used in the implementation, but both are validly described as passes.

Real World Average

This is the term given to averages that assume only a particular subset of all possible values. For instance, while a character in C can hold all the ASCII character values from 0 to 255, you're not likely to find most of them in a human-readable document. Thus, when considering the "real world average" - where you're probably dealing with human-readable documents - you might discard the unlikely characters, leaving the 75 or so 'normal' characters.

In terms of sorting algorithm analysis, the 'real world average' is often useful because it gives a better real world representation than the mathematical average, for what is assumed to be the average real world usage. Thus, unlike the mathematical average it is not proven nor constant, but also unlike the mathematical average it usually gives a much more accurate portrayal of the kind of results most people can expect to see.

Remove

The action of taking an item that exists in a list, and taking it out, such that the list no longer contains it. This is usually a very expensive operation for most real world list implementations.

Note that the item is simply removed, which does not necessarily imply it is destroyed. Lists usually contain pointers to each item, rather than the item itself. So removal from the list actually removes the pointer, not the item itself. The item lives on, possibly in another list or lists.

Sort

While of course everyone knows the general meaning of this, it should be noted that in the context of computational methods, 'sort' implies the ordering of items by some key unique to each item. The direction of the ordering is generally considered irrelevant in most discussion, and thus only stated specifically as required.

Swap

The action of interchanging two items within a list. In terms of practical implementations, a swap usually infers exchanging the values of two pointers. It is thus a relatively cheap operation, and usually highly preferrable to inserts, removals, comparisons, or any other common operation.