Worst-case and asymptotic analysis #
Worst-case analysis #
- An algorithm must be correct on all possible inputs
- The running time of the algorithm is the worst possible running time over all inputs of size n
- That is, if we design a purposefully adversarial input, the algorithm should still work on that input
- Pro and con: a very strong guarantee, though it can be overly pessimistic when the worst case is rare in practice
Asymptotic analysis: Big-O notation #
- Meaningful way to talk about the runtime of an algorithm, independent of programming language or computing platform, without having to count all of the operations
- Focuses on how the runtime scales with input size n
- Only the highest-degree term in a sum counts; constant factors and lower-order terms are dropped
e.g.
(n^2)/10 + 100 -> O(n^2)
0.063n^2 - 0.5n + 12.7 -> O(n^2)
100n^1.5 - 10^10000 sqrt(n) -> O(n^1.5)
11 n log(n) + 1 -> O(n log(n))
Pros:
- Abstracts away hardware- and language-specific issues
- Makes algorithm analysis more tractable
- Allows us to meaningfully compare how algorithms will perform on large inputs
Cons:
- Only makes sense if n is large compared to the constant factors
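e.g., an algorithm taking 1000n steps is O(n) and one taking n^2 steps is O(n^2), so the first wins asymptotically; but for n < 1000 the quadratic algorithm actually performs fewer steps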
Definition for O(…) #
- Let T(n), g(n) be functions of positive integers
- Consider T(n) as a runtime: positive and increasing in n
- We say “T(n) is O(g(n))” if for large enough n, T(n) is at most some constant multiple of g(n)
- Formally, T(n) = O(g(n)) if there exist constants c, n0 > 0 such that for each n > n0, T(n) <= c g(n)
Definition for Ω(…) #
- We say “T(n) is Ω(g(n))” if for large enough n, T(n) is at least as big as a constant multiple of g(n)
- Formally, T(n) = Ω(g(n)) if there exist constants c, n0 > 0 such that for each n > n0, T(n) >= c g(n)
Definition for Θ(…) #
- We say “T(n) is Θ(g(n))” iff both
- T(n) = O(g(n))
- T(n) = Ω(g(n))
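- e.g., (n^2)/10 + 100 = Θ(n^2): it is O(n^2) (take c = 1, n0 = 10) and Ω(n^2) (take c = 1/10, n0 = 1)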
Proofs #
- To prove T(n) = O(g(n)), need to come up with c and n0 such that the definition is satisfied
- To prove T(n) is not O(g(n)), use proof by contradiction:
- Suppose toward contradiction that some c and n0 satisfy the definition, then exhibit an n > n0 with T(n) > c g(n), contradicting that choice
Examples #
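Two illustrative instances (the functions here are my own picks, in the spirit of the definitions above):
- Claim: 2n^2 + 3n = O(n^2). Proof: take c = 5, n0 = 1; for each n > 1, 2n^2 + 3n <= 2n^2 + 3n^2 = 5n^2.
- Claim: n^2 is not O(n). Proof: suppose some c, n0 > 0 gave n^2 <= c n for each n > n0; then n <= c for each n > n0, which fails at n = max(n0, c) + 1.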
Sorting #
Basics #
- Idea: take an array of unsorted items (e.g., [6, 4, 3, 8, 1, 5, 2, 7]) and sort it in nondecreasing order (e.g., [1, 2, 3, 4, 5, 6, 7, 8])
- For convenience, assume all items are distinct and length of list is n
Insertion Sort #
- At each iteration i, starting with i = 1:
- Swap A[i] toward the beginning of the array until the element before it is smaller or it reaches the front (see the sketch below)
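A minimal runnable sketch in Python (the function name and in-place style are my own choices; the pseudocode above doesn't pin them down):

```python
def insertion_sort(A):
    """Sort A in place in nondecreasing order."""
    for i in range(1, len(A)):
        j = i
        # Swap A[j] toward the front until the element before it
        # is smaller (or j reaches the front of the array).
        while j > 0 and A[j - 1] > A[j]:
            A[j - 1], A[j] = A[j], A[j - 1]
            j -= 1

A = [6, 4, 3, 8, 1, 5, 2, 7]
insertion_sort(A)
print(A)  # [1, 2, 3, 4, 5, 6, 7, 8]
```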
Worst-case analysis on Insertion Sort #
Claim: Insertion Sort is correct.
Proof: Induction.
- Base case: A[:1] (a single element) is trivially sorted.
- Inductive hypothesis: After the i’th iteration of the outer loop, A[:i+1] is sorted, for 0 <= i < k
- Inductive step: Consider the k’th iteration. By the hypothesis, A[:k] is sorted, and insertion sort moves A[k] to the position p where A[p-1] < A[p] < A[p+1] (dropping whichever comparison falls off an end). Thus A[:k+1] is sorted at the end of the k’th iteration, as required.
- Conclusion: After iteration n-1, A[:n] = A is sorted, so insertion sort is correct.
Asymptotic analysis #
- O(n^2)
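- Why: iteration i of the outer loop performs at most i swaps, so the worst case (a reverse-sorted input) does at most 1 + 2 + ... + (n-1) = n(n-1)/2 = O(n^2) work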
Merge sort #
- Algorithm MergeSort(A):
- if len(A) <= 1, return A
- L = MergeSort(A[:len(A)//2])
- R = MergeSort(A[len(A)//2:])
- return Merge(L, R)
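The pseudocode leaves Merge unspecified. Here is a runnable Python sketch with one standard two-pointer merge (the merge helper is an assumed implementation, not given in the notes):

```python
def merge(L, R):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    # Repeatedly take the smaller front element of L and R.
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            out.append(L[i])
            i += 1
        else:
            out.append(R[j])
            j += 1
    # At most one of the two lists has leftovers; append them.
    out.extend(L[i:])
    out.extend(R[j:])
    return out

def merge_sort(A):
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    return merge(merge_sort(A[:mid]), merge_sort(A[mid:]))

print(merge_sort([6, 4, 3, 8, 1, 5, 2, 7]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```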
Worst-case analysis on Merge Sort #
Claim: Merge Sort is correct
Proof: Induction.
- Base case (i = 1): a one-element array is always sorted.
- Inductive hypothesis: On every input of length at most i, MergeSort returns a sorted array
- Inductive step: Assume the hypothesis holds for all lengths k < i. On an input of length i, both recursive calls are on arrays of length less than i, so L and R are sorted; merging two sorted arrays yields a sorted array, so MergeSort returns a sorted array on inputs of length i. By induction, the top-level call returns a sorted array.
Asymptotic analysis on MergeSort #
- Runs in O(n log(n))
- Why?
- At each step, we divide A into L and R arrays, where |L| = |R| = |A|/2
- The recursion goes O(log(n)) levels deep, since the subarray size halves at each level
- At each level, the Merge calls across all subproblems do O(n) total work, since every element is touched once
- Thus, O(n log(n))
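- Equivalently, as a recurrence (assuming for simplicity that n is a power of 2): T(n) = 2 T(n/2) + c n with T(1) = c, which unrolls to T(n) = c n (log2(n) + 1) = O(n log(n))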