Median and Selection

# The k-SELECT problem #

• Let A be an array of integers and let k be an integer
• Goal: Return the k’th smallest element of A
• e.g.
• A = [7, 4, 3, 8, 1, 5, 9, 14]
• SELECT(A, 1) = 1
• SELECT(A, 2) = 3
• SELECT(A, 3) = 4
• SELECT(A, 8) = 14
• SELECT(A, 1) = MIN(A)
• SELECT(A, n/2) = MEDIAN(A)

## O(n log n) algorithm #

• Very simple! Sort the array, return A[k-1]

## What we want: an O(n) algorithm #

Ex. MIN(A) = SELECT(A, 1)

• ret = MAX_INT
• for i in 0..n-1:
• ret = min(ret, A[i])
• return ret

### Divide and conquer #

Algorithm SELECT(A, k)

• First, pick a “pivot”
• Next, partition the array into “bigger than pivot” (L) and “less than pivot” (R)
• if k = len(L) + 1:
• return A[pivot]
• if k < len(L) + 1:
• return SELECT(L, k)
• if k > len(L) + 1:
• return SELECT(R, k - len(L) - 1)

### Running time #

• T(n) =
• T(len(L)) + O(n) if len(L) > k-1
• T(len(R)) + O(n) if len(L) < k-1
• O(n) if len(L) = k-1
• Ideal pivot would split the input in half; however, finding such a pivot is itself an application of SELECT
• Yields O(n) algorithm
• “Good enough” pivot approximates median: median of medians of groups of size 5, which takes O(n) time to find
• Alternatively, random pivot performs fine on average