Linear Time Median

05 Jan 2020

Median is considered one of the important measures in Statistics. Unlike Mean, it is not skewed by a few very large values in the dataset. Computing Median can seem like a trivial problem and it is trivial if the dataset (array) is sorted. If it is not sorted, it may not be trivial. Sorting an array, as you know, will take O(nlogn) time. But thinking about median computation, we don’t need to sort the entire array. We just need to make sure that median elements (middle one if length of array is odd and middle two elements if array length is even) is in their right place.

Putting elements in right place reminds us of Quick Sort algo which was invented by Tony Hoare. The algorithm employed to find median in linear time is a variation of Quick Sort called Quick Select which was also developed by Tony Hoare.

The guts of the algorithm is as follows.


def partition(arr, lowIdx, highIdx, pivotIdx):
    pivot = arr[pivotIdx]
    arr[highIdx], arr[pivotIdx] = arr[pivotIdx], arr[highIdx]
    i = lowIdx-1
    for j in range(lowIdx, highIdx):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i+1], arr[highIdx] = arr[highIdx], arr[i+1]
    return arr

Essentially, partition puts the pivot element in its right place and everything greater after it and everything lesser/equal below it. Complete code which finds median employing the above guts is at Quick Select Median.

As Andrei Alexandrescu says in one of his CppCon 2019 talks, sorting (axiomatically selection) algos are one of the most researched problems in Computer Science. For a programmer, its good to implement or analyse these algos every now and then.

Keep Calm and Start Coding