2016/01/22

Median of two sorted arrays

Problem: Let a and b be two sorted arrays. Find the median of the array that would be obtained by merging a and b.

Note: The median of a sorted array with an even number of elements is the average of the two elements in the middle of the array. E.g., the median of (1,2,3,4) is (2+3)/2=2.5

Solution: A naive solution would be processing a and b using the same technique of the merge routine of the merge-sort algorithm. We do not need to materialize the merged array, we can just save the element/s that would be in the middle. This approach requires O(|a|+|b|) time and constant auxiliary space.

Indeed, the problem can be solved in time O(log(|a|)+log(|b|)) and constant auxiliary space.

Let's think to a simple case that can be solved in constant time. Assume that the size of one of the arrays, say a, as length not greater then 2 (|a| ≤ 2). By means of some examples, it is easy to see that we can focus our attention the three (|b| is odd) or four (|b| is even) elements in the middle of b, or all the elements of b, if |b|<3.
In this case, the naive approach can be applied, and since it will work on at most 6 elements, whatever |a|+|b| is, its complexity can be considered constant.
Summarizing, we are able to solve the problem in constant time and space when one array has at most two elements, whatever the size of the other array is.

Now, lets try to efficiently reduce the general problem to the above situation, i.e., one of the two arrays has size not larger than two. Let ma and mb be the median of a and b, respectively. We can observe that if m=ma=mb, then m is also the value of the median of merged array.
But assume that we are not so lucky, and ma<mb (the case ma>mb can be handled symmetrically).
Let h be the minimum between the number of elements on the left of ma and the number of elements on the right of mb. The global median cannot be either in the first h elements of a or in the last h elements of b.
An informal intuition is the following. Being ah the h-th element of a, the number of elements less than ah in a plus the number of elements less than ah in b is less than one half of |a|+|b|. Similarly, being bh the h-th last element of b, the number of elements greater than bh in b plus the number of elements greater than bh in b is less than one half of |a|+|b|.
Thus, we can discard the first h elements of a and the last h elements of b. Observe that, in this way, the length of at least one array has been halved (indeed, it is almost halved, since the median element/s is/are kept). We do not actually need to remove the elements of the arrays. We can just keep two indexes for each array, denoting the lower and upper bound of the range that can contain the global median.

//median of the elements of a with index between lo and hi
public double median(int[] a, int[] r) {
    return (a[(r[0] + r[1]) / 2] + a[(r[0] + r[1] + 1) / 2]) * 0.5;
}

//naive O(|a|+|b|) algorithm
public double naiveMedian(int[] a, int[] ra, int[] b, int[] rb) {
    int totalLength = ra[1] - ra[0] + 1 + rb[1] - rb[0] + 1;
    int pos1 = (totalLength - 1) / 2;
    int pos2 = totalLength / 2; //when totalLength is odd, pos1=pos2
    int i = ra[0];
    int j = rb[0];
    int k=0;
    double median = 0;
    while (k <= pos2) {
        if (i <= ra[1] && (j>rb[1] || a[i] <= b[j]) ) {
            if (k == pos1 || k == pos2) {
                median += a[i];
            }
            i++;
        } else {
            if (k == pos1 || k == pos2) {
                median += b[j];
            }
            j++;
        }
        k++;
    }
    if (pos1 != pos2) {
        median *= 0.5;
    }
    return median;
}

public double efficientMedian(int[] a, int[] b) {
    int[] ra = new int[]{0, a.length - 1}; //range of a
    int[] rb = new int[]{0, b.length - 1}; //range of b
    if (ra[1] < ra[0]) { //a is empty
        return median(b, rb);
    }
    if (rb[1] < rb[0]) { //b is empty
        return median(a, ra);
    }
    double ma = median(a, ra);
    double mb = median(b, rb);
    while (ma != mb && ra[1] - ra[0] > 1 && rb[1] - rb[0] > 1) {
        int delta = Math.min(ra[1] - ra[0], rb[1] - rb[0]) / 2;
        if (ma < mb) { //ignore the leftmost elements of a and the rightmost elements of b
            ra[0] += delta;
            rb[1] -= delta;
        } else { //ignore the rightmost elements of a and the leftmost elements of b
            rb[0] += delta;
            ra[1] -= delta;
        }
        ma = median(a, ra);
        mb = median(b, rb);
    }
    if (ma == mb) {
        return ma;
    }
    if (ra[1] - ra[0] <= 1) { //restrict the subarray of b to its elements in the middle
        return naiveMedian(a, ra, b, restrictRange(rb));
    } else { //rb[1] - rb[0] <= 1
        return naiveMedian(a, restrictRange(ra), b, rb);
    }
}

public int[] restrictRange(int[] r) {
    if (r[1] - r[0] <= 1) {
        return r;
    }
    int mid = (r[0] + r[1]) / 2;
    return new int[]{mid - 1, (r[1] - r[0]) % 2 == 0 ? mid + 1 : mid + 2};
}

No comments:

Post a Comment