Sorting lower bounds #

Defining a valid sorting algorithm #

Comparison-based model of computation (requires Ω(n log n) steps):

Another model:

Theorem:

Any deterministic comparison-based sorting algorithm must take Ω(n log n) steps
Any randomized comparison-based sorting algorithm must take Ω(n log n) steps in expectation

Argument:

All comparison-based sorting algorithms give rise to a decision tree
- Internal nodes correspond to yes/no (>= / <) decisions
- Leaf nodes correspond to outputs (in this case, all possible orderings of the items)
Running an algorithm on a particular input corresponds to a particular path through the tree
In all such trees, the longest path is at least log(n!) ~= log((n/e)n) = n ((log n) - 1) = O(n log n)
Thus traversing a path on the decision tree is worst case O(n log n)
- MergeSort is optimal!

Assumptions:

Need to be able to know what bucket to put something in
- Need to evaluate items directly, not just by comparison
Need to know what values might show up ahead of time
Need to assume there are not too many such values

For sorting integers up to size M, or more generally for lexicographically sorting strings
Can use less space than CountingSort
Idea: CountingSort on least significant digit, then next significant, etc
- Treat each bucket as FIFO, then concatenate
Running time: O((logrM + 1)(n+r))
- Choose n = r: O(n(lognM + 1))
- If M <= nc for some constant c, then this is O(n)
- If M >= nc for some constant c, then this is O(n2/log n)