Sorting lower bounds #
Defining a valid sorting algorithm #
Comparison-based model of computation (requires Ω(n log n) steps):
- Input: array
- Output: sorted array
- Operations allowed: comparisons
Another model:
- CountingSort and RadixSort
- Run in time O(n)
Ω(n log n) bound for comparison sorting #
Theorem:
- Any deterministic comparison-based sorting algorithm must take Ω(n log n) steps
- Any randomized comparison-based sorting algorithm must take Ω(n log n) steps in expectation
Argument:
- All comparison-based sorting algorithms give rise to a decision tree
- Internal nodes correspond to yes/no (>= / <) decisions
- Leaf nodes correspond to outputs (in this case, all possible orderings of the items)
- Running an algorithm on a particular input corresponds to a particular path through the tree
- In all such trees, the longest path is at least log(n!) ~= log((n/e)n) = n ((log n) - 1) = O(n log n)
- Thus traversing a path on the decision tree is worst case O(n log n)
- MergeSort is optimal!
O(n) sorting* #
CountingSort #
- Suppose we have an array of numbers (e.g. [9, 6, 3, 5, 2, 1, 2])
- Set up buckets for values 1..9
- Foreach number in the array, put the number in its respective buckets
- Concatenate the buckets
Assumptions:
- Need to be able to know what bucket to put something in
- Need to evaluate items directly, not just by comparison
- Need to know what values might show up ahead of time
- Need to assume there are not too many such values
RadixSort #
- For sorting integers up to size M, or more generally for lexicographically sorting strings
- Can use less space than CountingSort
- Idea: CountingSort on least significant digit, then next significant, etc
- Treat each bucket as FIFO, then concatenate
- Running time: O((logrM + 1)(n+r))
- Choose n = r: O(n(lognM + 1))
- If M <= nc for some constant c, then this is O(n)
- If M >= nc for some constant c, then this is O(n2/log n)